Covid Tweet Analysis








Out of about 170 thousand tweets worldwide, people tend to post from the Twitter Web App and Android.




The top hashtags are #COVID19, #Covid19, #Coronavirus... Hmm, that is pretty much what we expected :)
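As a rough illustration of how such a ranking can be produced, here is a minimal sketch that counts hashtags with a regular expression. The function name `top_hashtags` and the sample tweets are made up for this example and are not from the original analysis. Counting case-sensitively keeps #COVID19 and #Covid19 as separate tags, which matches the list above; the optional flag shows what merging them would look like.

```python
import re
from collections import Counter

# Matches a '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#\w+")

def top_hashtags(tweets, n=3, case_sensitive=True):
    """Count hashtags across tweets and return the n most common."""
    counts = Counter()
    for text in tweets:
        tags = HASHTAG_RE.findall(text)
        if not case_sensitive:
            # Merge variants such as #COVID19 and #Covid19.
            tags = [t.lower() for t in tags]
        counts.update(tags)
    return counts.most_common(n)

# Hypothetical sample tweets, for illustration only.
sample = [
    "Stay safe everyone #COVID19",
    "New numbers today #Covid19 #Coronavirus",
    "Wear a mask #COVID19",
]
print(top_hashtags(sample))
print(top_hashtags(sample, case_sensitive=False))
```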



After data cleaning and grouping, we find that the words most used by U.S. users center on politics (such as "Trump") and education (such as "school"). In contrast, Indian users tend to discuss new cases and vaccines for COVID-19.
Now let's do the data modeling!

Well, the data are pretty messy. We first removed punctuation, stopwords, and URLs, and lowercased our tweets. Finally, we tokenized them.
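The cleaning steps could look roughly like the sketch below. It uses only the Python standard library; the tiny stopword set and the function name `clean_tweet` are assumptions for illustration, and a real run would likely use a fuller stopword list from an NLP library.

```python
import re
import string

# Tiny illustrative stopword list (an assumption, not the list used in the project).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on", "we"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet(text):
    """Strip URLs, lowercase, strip punctuation, tokenize, drop stopwords."""
    text = URL_RE.sub(" ", text)        # remove URLs before punctuation is stripped
    text = text.lower()
    text = text.translate(PUNCT_TABLE)  # also removes '#' and '@'
    tokens = text.split()               # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]

print(clean_tweet("We are thankful for the nurses! https://t.co/abc #COVID19"))
```

URLs are removed first here because stripping punctuation beforehand would break them into fragments that are harder to match.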



We used a recurrent neural network (RNN) to build our deep learning model. In our training data, we pre-labeled "positive" emotion as 1 and "negative" emotion as 0. After converting our tokenized sentences into vectors (using GloVe to build the embedding matrix), we identified the 3rd epoch as our best choice.
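To make the embedding step more concrete, here is a minimal sketch of building an embedding matrix from GloVe-style text lines, where each line is a word followed by its vector components. It uses plain Python lists so it runs without extra libraries. The helper names, the 3-dimensional toy vectors, and the choice to reserve index 0 for padding and leave unknown words as zero vectors are all assumptions for illustration, not necessarily what the project did.

```python
def load_glove(lines):
    """Parse GloVe-style lines ('word v1 v2 ...') into a dict of word -> vector."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_vocab(tokenized_tweets):
    """Assign each distinct token an integer index, starting at 1 (0 = padding)."""
    vocab = {}
    for tokens in tokenized_tweets:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def build_embedding_matrix(vocab, vectors, dim):
    """Row i holds the vector for the word with index i; unknown words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, idx in vocab.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
    return matrix

# Toy 3-dimensional vectors standing in for a real GloVe file.
toy_glove = ["good 0.1 0.2 0.3", "death -0.4 0.0 0.5"]
vectors = load_glove(toy_glove)
vocab = build_vocab([["good", "health"], ["death"]])
matrix = build_embedding_matrix(vocab, vectors, dim=3)
print(vocab)
print(matrix)
```

A matrix like this is typically passed to the embedding layer of the network as its initial weights.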


After training the model, we fed in our COVID tweets to make predictions. We set the threshold at 0.5: tweets with a sentiment score over 0.5 were classified as 1, and those below as 0.
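The thresholding rule can be sketched in a few lines. The function names and the sample scores are hypothetical. The text does not say how a score of exactly 0.5 is handled, so this sketch assumes it falls into class 0.

```python
def classify(scores, threshold=0.5):
    """Map sentiment scores to labels: 1 if score > threshold, else 0.

    Assumption: a score exactly equal to the threshold is labeled 0.
    """
    return [1 if s > threshold else 0 for s in scores]

def positive_share(labels):
    """Fraction of labels equal to 1 (0.0 for an empty list)."""
    return sum(labels) / len(labels) if labels else 0.0

# Hypothetical model outputs, for illustration only.
scores = [0.91, 0.12, 0.50, 0.73]
labels = classify(scores)
print(labels)
print(positive_share(labels))
```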



Now let's see the results! Out of 10,000 tweets, 46% are positive and 54% are negative. This fairly even split surprised me: there are indeed many people who remain optimistic amid the pandemic.



Among positive tweets, if you zoom in a little, you can see that people often used "health", "thank" and "good". Among negative tweets, people often used words such as "death". In the future, we could apply this model to predict public reaction to more specific events such as COVID vaccines and lockdowns. Currently, I am exploring the COVID vaccine datasets and will update this website soon :)