BERT for sentiment analysis on Sustainability Reports

This is from KPMG; they have a sustainability team of some kind.

# The problem
    Different people have different opinions on the sentiment a report expresses.
    Finding problematic examples is very time consuming.
    There is also no good comparison we can make.

# Can we quantify the balance?
    --> SENTIMENT ANALYSIS!

    They tried a LOT of different models. Commercial:
        Haven OnDemand
        Rosetta
        text-processor.com
      Open source:
        Stanford Sentiment Treebank
        Textblob
      Self-trained:
        tf RNN model

    None of these worked well, because the data they were trained on is unlike sustainability-report text.
    The team used "www.menti.com" to annotate/classify data for their reports.

    They had to normalize how sentiment gets ranked. Also, negative sentiment in public reports is written in a "positive" way
        (we will see improvements on.., new opportunities, or challenges on..)

# BERT
    Let's start simple
    Vector representations
        one-hot encoding: one bit per word, in a long vector with one position for every word in the vocabulary
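
Sketch (mine, not from the talk) of what one-hot encoding looks like on a toy vocabulary. The sentences and the helper names `build_vocab` and `one_hot` are made up for illustration.

```python
def build_vocab(sentences):
    """Map each distinct lowercase word to an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(word, vocab):
    """Return a vector of len(vocab) zeros with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[vocab[word.lower()]] = 1
    return vector


# Toy data, invented for this sketch.
sentences = ["emissions fell this year", "emissions rose last year"]
vocab = build_vocab(sentences)
print(vocab)
print(one_hot("fell", vocab))
```

Every word is equally far from every other word in this representation, which is why the talk moves on to learned embeddings.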

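    Word2Vec models: presented as a variation of an autoencoder.
        Shallow NN with an hourglass structure (narrow hidden layer). An autoencoder tries to predict its own input; Word2Vec predicts a word from its context (CBOW) or the context from a word (skip-gram).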

    Word2Vec gives each word a single vector, so it doesn't work for words with multiple meanings (river bank vs. HSBC bank)
    Sequence models: predict based on sequence of words
        basically: RNN

    BERT:
        Masked Language Modelling (MLM)
        mask a word and try to predict it from the context of everything around it (both left and right).
        .. she's going quite fast, something about CLS tokens and how they are important for classification..

        They use BERT and feed its output into their own classifier model.
        BERTbase and BERTlarge
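
Sketch (mine, not from the talk) of the masking step behind MLM. It is simplified: every selected token becomes `[MASK]`, whereas BERT's actual pre-training selects about 15% of tokens and replaces only 80% of those with `[MASK]` (10% get a random token, 10% stay unchanged). The function name `mask_tokens` and the example sentence are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and a dict {position: original token},
    which is what the model would be trained to predict.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


tokens = "we expect new challenges on water use".split()
masked, targets = mask_tokens(tokens, mask_prob=0.5, seed=1)
print(masked)
print(targets)
```

The model sees `masked` and is scored only on the positions in `targets`, so it has to use the words on both sides of each gap.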

        TinyBERT
        ALBERT
        RoBERTa
        CamemBERT

        PyTorch, Keras, Hugging Face, ..
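
Sketch (mine, not from the talk) of "BERT output into their own classifier": a single linear layer plus softmax on top of the sentence vector taken at the [CLS] position. The talk did not show their classifier, so the layer shape, the weights, and the 4-dimensional stand-in vector are all invented (BERT-base would give 768 dimensions). Plain Python only, no BERT is actually run here.

```python
import math

LABELS = ["negative", "neutral", "positive"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(cls_vector, weights, biases):
    """Linear layer + softmax over a sentence vector; returns (label, probabilities)."""
    logits = [
        sum(w * x for w, x in zip(row, cls_vector)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs


# Hypothetical numbers: a 4-dim stand-in for the [CLS] vector and made-up weights.
cls_vector = [0.2, 1.0, 0.5, 0.1]
weights = [
    [0.0, -1.0, 0.0, 0.0],  # negative
    [0.0, 0.0, 0.0, 0.0],   # neutral
    [0.0, 1.0, 0.0, 0.0],   # positive
]
biases = [0.0, 0.0, 0.0]

label, probs = classify(cls_vector, weights, biases)
print(label, probs)
```

During fine-tuning the weights of a head like this are learned from the labelled sentences, usually together with BERT's own weights.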

# Finetuning BERT for their data
    ~800 sustainability reports
    approx. 90 pages each
    extract all text and split it into sentences.
    no stemming or lemmatization
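
Sketch (mine, not from the talk) of the "split into sentences" step. The talk did not say which tool they used; this is a naive regex splitter with a hypothetical name `split_sentences`, and it will break on abbreviations such as "e.g." followed by a capital letter.

```python
import re


def split_sentences(text):
    """Collapse whitespace (PDF text is full of hard line breaks), then split
    after ., ! or ? when the next character is an uppercase letter."""
    text = re.sub(r"\s+", " ", text).strip()
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p for p in parts if p]


# Invented example text, mimicking line breaks left over from PDF extraction.
page = "We reduced emissions by 12%.\nWater use rose. New targets\nwere set for 2025!"
for sentence in split_sentences(page):
    print(sentence)
```

No stemming or lemmatization is applied, matching the notes: BERT's own tokenizer works on the raw wording.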

    They don't like TensorFlow and ended up building their code on Keras;
    this was before TF 2.0.

    ~8000 labelled sentences
    2 days of labelling by humans
    Negative sentiment: only 3%, Positive: 21%, Neutral: 76%
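
Sketch (mine, not from the talk) of what that label split means in practice, assuming exactly 8000 sentences: approximate counts per class, the accuracy of always guessing "neutral", and inverse-frequency class weights. The weights use the common `n_samples / (n_classes * count)` heuristic; the talk did not say whether they re-weighted classes at all.

```python
total = 8000  # "~8000 labelled sentences", taken as exact for the arithmetic
shares = {"negative": 0.03, "positive": 0.21, "neutral": 0.76}

# Approximate number of sentences per label.
counts = {label: round(total * share) for label, share in shares.items()}

# A model that always answers with the most common label.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

# Inverse-frequency weights (an assumption, not something the talk mentioned).
class_weights = {
    label: total / (len(counts) * count) for label, count in counts.items()
}

print(counts)
print(majority_label, baseline_accuracy)
print(class_weights)
```

With only about 240 negative examples, plain accuracy is dominated by the neutral class, so per-class numbers matter more than the headline figure.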

# Results
Accuracy: 82%
    Negative: 71%
    Positive: 80%
    Neutral: 94%
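
Sketch (mine, not from the talk) relating the overall figure to the per-class ones. The talk did not say how the 82% was computed. The unweighted mean of the three per-class numbers comes to about 81.7%, while weighting them by the label shares (3% / 21% / 76%, assuming the test set had the same split as the labelled data) gives about 90%. So the 82% looks like a macro average, but that is my guess.

```python
per_class = {"negative": 0.71, "positive": 0.80, "neutral": 0.94}
shares = {"negative": 0.03, "positive": 0.21, "neutral": 0.76}

# Unweighted (macro) mean: every class counts the same.
macro = sum(per_class.values()) / len(per_class)

# Mean weighted by how often each label occurs (assumed test-set split).
weighted = sum(per_class[label] * shares[label] for label in per_class)

print(f"macro:    {macro:.3f}")
print(f"weighted: {weighted:.3f}")
```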

    it never confused negative with positive, so errors were only between adjacent labels

    BERT was able to correctly predict negative sentences that humans didn't spot.

# Back to the problem
    The analysts don't always agree and can't evaluate reports consistently.
    But BERT agreed with them and seems to be more consistent than the humans.
    This means bulk sentiment analysis of the reports is now easy to do.
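
Sketch (mine, not from the talk) of what "quantifying the balance" of a report could look like once every sentence has a predicted label: the share of each sentiment per report. The function name `sentiment_balance` and the example labels are hypothetical, and the talk did not show how they aggregate.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def sentiment_balance(labels):
    """Return the fraction of sentences per sentiment label for one report."""
    if not labels:
        raise ValueError("report has no labelled sentences")
    counts = Counter(labels)
    return {label: counts.get(label, 0) / len(labels) for label in LABELS}


# Invented predictions for a 10-sentence report.
report_labels = ["neutral"] * 7 + ["positive"] * 2 + ["negative"]
print(sentiment_balance(report_labels))
```

Because the model labels every sentence the same way each time, shares like these can be compared across reports, which the analysts could not do consistently by hand.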

# Demo
    PDF parser
    POS tags
    Named Entities
    Sentiment (highlighted color on sentences)
    Report overview, with topics (probably TF-IDF)
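
Sketch (mine, not from the talk) of TF-IDF, since that is my guess for how the demo picked topics. This is the textbook version (term frequency times log of inverse document frequency) on invented sentences; the function name `tf_idf` is hypothetical and the demo may well use something else.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Invented mini "reports".
docs = ["water usage fell", "water emissions rose", "water targets set"]
scores = tf_idf(docs)
print(scores[1])
```

A word that appears in every document ("water" here) scores zero, so the terms that survive are the ones specific to each report.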


.. this was a very fast talk, and she skimmed a lot over important bits.
 My interpretation: this was a very good execution and very smart team, but it was so fast I don't understand the "insight" or learnings.

 NOTE TO SELF: when you do talks, think about what is the LEARNING for your audience