"Are you interested in how people reacted emotionally to COVID-19 last summer? Here I build a sentiment model using over 100 thousand tweets to predict public emotion about COVID-19. But before we jump into the model, let's first do some fun exploratory analysis on the tweet data."
Data source: Kaggle/Twitter, 170 thousand COVID tweets posted from 07/2020 to 08/2020.
Across the 170 thousand tweets worldwide, people mostly posted from the Twitter Web App and Android.
The top tags are #COVID19, #Covid19, #Coronavirus... Hmm, that is pretty much what we expected :)
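Since #COVID19 and #Covid19 differ only in capitalization, it can help to count tags both ways. Here is a minimal sketch of that counting, using only the standard library. The `top_hashtags` helper and the sample tweets are made up for illustration and are not the code or data behind the chart.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")

def top_hashtags(tweets, n=3, ignore_case=False):
    """Return the n most common hashtags across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        for tag in HASHTAG_RE.findall(text):
            # Optionally fold case so #COVID19 and #Covid19 count as one tag
            counts["#" + (tag.lower() if ignore_case else tag)] += 1
    return counts.most_common(n)

# Toy tweets, made up for illustration only
sample = [
    "Stay safe everyone #COVID19",
    "New numbers today #Covid19 #Coronavirus",
    "Wear a mask #COVID19",
]
print(top_hashtags(sample))
print(top_hashtags(sample, ignore_case=True))
```

With case folding turned on, the two spellings merge into a single tag, which changes the ranking slightly but not the overall picture.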
After data cleaning and grouping, we find that the most used words from U.S. users center on politics (such as "Trump") and education (such as "school"). In contrast, Indian users tend to discuss new cases and vaccines for COVID-19.
Now let's do the data modeling!
Data source: Kaggle/Twitter, over 1 million random tweets with pre-assigned sentiment scores. We used 100,000 of these tweets as the training set for our model and 20,000 as the validation set.
Well, the data are pretty messy. We first removed punctuation, stopwords, and URLs, and lowercased our tweets. Finally, we tokenized our tweets.
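The grouping step could look roughly like the sketch below. It assumes each tweet has already been cleaned into a list of tokens and tagged with a country. Both the `(group, tokens)` record shape and the `top_words_by_group` helper are hypothetical, and the records are toy data.

```python
from collections import Counter, defaultdict

def top_words_by_group(records, n=2):
    """records: iterable of (group, tokens) pairs.

    Returns {group: [(word, count), ...]} with the n most common words per group.
    """
    counts = defaultdict(Counter)
    for group, tokens in records:
        counts[group].update(tokens)
    return {group: c.most_common(n) for group, c in counts.items()}

# Made-up, already-cleaned toy records (not taken from the real dataset)
records = [
    ("US", ["trump", "school", "reopen"]),
    ("US", ["trump", "mask"]),
    ("India", ["new", "cases", "vaccine"]),
    ("India", ["cases", "rise"]),
]
result = top_words_by_group(records, n=1)
print(result)
```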
data cleaning
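As a rough sketch of these cleaning steps, the function below strips URLs, lowercases, removes punctuation, drops stopwords, and splits on whitespace. The tiny `STOPWORDS` set and the `clean_tweet` name are assumptions for illustration. A real pipeline would likely use a fuller stopword list and a proper tokenizer.

```python
import re
import string

# Tiny illustrative stopword list; an assumption, not the list used in the post
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# string.punctuation covers ASCII punctuation only
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet(text):
    """Return a list of lowercase tokens with URLs, punctuation and stopwords removed."""
    text = URL_RE.sub(" ", text)        # drop URLs before punctuation is stripped
    text = text.lower()
    text = text.translate(PUNCT_TABLE)  # delete punctuation characters
    tokens = text.split()               # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]

tokens = clean_tweet("The cases are rising in Texas! Read more: https://example.com/news #COVID19")
print(tokens)
```

URLs are removed first on purpose: once punctuation is stripped, a link turns into one long word that is hard to detect.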
We used an RNN to build our deep learning model. In our training data, 1 is pre-labeled as "positive" emotion and 0 as "negative" emotion. After converting our tokenized sentences into vectors (using GloVe to build the embedding matrix), we found that the 3rd epoch was our best choice.
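To make the embedding-matrix idea concrete, here is a small sketch in plain Python. It relies on the GloVe text format, where each line is a word followed by its vector values. The `load_glove` and `build_embedding_matrix` helpers, the word indices, and the 3-dimensional toy vectors are all assumptions for illustration. Real pretrained GloVe vectors are much larger.

```python
def load_glove(lines):
    """Parse GloVe text-format lines ('word v1 v2 ...') into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i.

    Row 0 (reserved for padding) and words missing from GloVe stay all zeros.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix

# Toy vectors, made up for illustration
glove_lines = ["good 0.1 0.2 0.3", "death -0.4 0.5 -0.6"]
word_index = {"good": 1, "death": 2, "covid19": 3}  # hypothetical vocabulary
matrix = build_embedding_matrix(word_index, load_glove(glove_lines), dim=3)
print(matrix)
```

A matrix like this is typically passed to the model's embedding layer as its initial weights, so the network starts from pretrained word meanings instead of random vectors.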
padded sentences
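Padding simply makes every tokenized tweet the same length so the tweets can be stacked into one array. Below is a minimal sketch. It assumes index 0 is reserved for padding and that zeros are added at the end, since the post does not say which side was padded. The helper names are hypothetical.

```python
def build_word_index(token_lists):
    """Assign each distinct token an integer id, starting at 1 (0 is reserved for padding)."""
    index = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in index:
                index[tok] = len(index) + 1
    return index

def encode_and_pad(token_lists, word_index, max_len):
    """Map tokens to ids, truncate to max_len, and pad with zeros at the end."""
    padded = []
    for tokens in token_lists:
        ids = [word_index[t] for t in tokens if t in word_index][:max_len]
        padded.append(ids + [0] * (max_len - len(ids)))
    return padded

# Toy tokenized tweets, made up for illustration
tweets = [["stay", "safe"], ["stay", "home", "stay", "healthy", "everyone"]]
word_index = build_word_index(tweets)
padded = encode_and_pad(tweets, word_index, max_len=4)
print(padded)
```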
validation accuracy peaked around the 3rd epoch (epoch = 2, counting from 0)
After training our model, we fed in our COVID tweets to make predictions. We set our threshold at 0.5. That is, a tweet with a sentiment score over 0.5 is classified as 1 (positive), and a tweet below it as 0 (negative).
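Picking the epoch can also be done programmatically rather than by eye. The sketch below takes a list of per-epoch validation accuracies and returns the index of the highest one. The numbers are invented for illustration and are not the values from the plot.

```python
def best_epoch(val_accuracy):
    """Return the 0-based index of the epoch with the highest validation accuracy."""
    return max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])

# Made-up accuracies, not the post's actual numbers
val_accuracy = [0.71, 0.74, 0.76, 0.75, 0.73]
epoch = best_epoch(val_accuracy)
print(f"best epoch index = {epoch} (epoch number {epoch + 1})")
```

Training past this point tends to keep improving training accuracy while validation accuracy drifts down, which is the usual sign of overfitting.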
our prediction dataframe
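The thresholding rule is simple enough to sketch in a few lines. The `classify` and `positive_share` helpers and the scores are illustrative assumptions. Note that a score of exactly 0.5 is labeled 0 here, which is a choice of this sketch since the post only covers scores above and below the threshold.

```python
def classify(scores, threshold=0.5):
    """Label a score 1 (positive) if it is above the threshold, else 0 (negative)."""
    return [1 if s > threshold else 0 for s in scores]

def positive_share(labels):
    """Fraction of labels that are 1."""
    return sum(labels) / len(labels)

# Made-up sentiment scores for illustration
scores = [0.91, 0.12, 0.55, 0.49, 0.50]
labels = classify(scores)
print(labels, positive_share(labels))
```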
Now let's see the results! Out of 10,000 tweets, 46% are positive and 54% are negative. This fairly even split was not what I expected: there are indeed many people who stay optimistic amid this pandemic.
Among positive tweets, if you zoom in a little bit, you can see that people often used "health", "thank", and "good". Among negative tweets, people often used words such as "death". In the future, we could apply this model to predict public reaction to more specific events such as COVID vaccines and lockdowns. Currently, I am exploring the COVID vaccine datasets and will update this website soon :)