This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is Director of Globefish Consultancy Services, which provides scientific advice to organisations that currently include STECF (the Scientific, Technical and Economic Committee for Fisheries) and ICCAT.

Monday, 8 December 2014

Twittering about climate change and Bangladesh

Points of post:
  • To describe how twitteR can be used, together with Word Clouds (wordcloud), to reveal what people are twittering about.
  • To show how rapidly interest in various subjects can change.

Twitter is a social media network that allows users to send short messages of up to 140 characters, called ‘tweets’. Twitter started up in 2006 and has had phenomenal success, with some 340 million tweets being sent each day by 2012.

In an earlier post on our blog, we described how archives of Google internet searches can tell us what people are interested in, and how these interests change over time. It is also possible to assess what factors might be driving this interest, e.g. a disease outbreak or climatic disaster. Similarly, Twitter might be viewed as a massive, passive opinion poll.

Again, wonderful R has a library for analyzing the world’s tweets (twitteR), which can be used to perform social media mining and to reveal what people are thinking (twittering) about, whether it’s Brad Pitt’s facial hair or climate variability and change.

Jason and I are very interested in how climate change will affect the people of Bangladesh, in particular, and we’ve already explored the temporal and spatial dependence of various weather variables such as temperature and sea-level in the country. 

Hence we experimented by supplying the two keywords ‘climate change’ and ‘Bangladesh’ as our search parameters to twitteR. Our aim was to reveal which other words or phrases these keywords were associated with. ‘Word clouds’ are a useful way to summarize the results: font sizes are scaled according to the frequencies at which words occur, in this case how often they are ‘tweeted’. Below is a word cloud for September 2014, which shows that ‘URBANISATION’ and ‘ROUND TABLE’ occurred most often in tweets.

By December 2014, however, when we tried again, the identical keywords were associated with ‘FLOATING GARDENS’, and the increasingly nebulous ‘ADAPTATION’ and ‘RESILIENCE’.

Clearly social media such as Twitter, combined with R’s software packages and plotting facilities, are fantastic tools for understanding the often capricious winds affecting the movement of global public interest and opinion.

To carry out your own social media mining using the twitteR package, you will first need a Twitter account and a registered Twitter ‘app’, created via Twitter’s developer site, which supplies the API credentials for your searches.

You will also need to complete Twitter’s OAuth authentication process in R before you can perform keyword searches with twitteR.
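As a rough sketch of that authentication step (the exact workflow depends on your twitteR version, and all of the key strings below are placeholders you must replace with the values from your own app’s page on the Twitter developer site), recent versions of twitteR provide `setup_twitter_oauth()`:

```r
# Load the package needed for authentication and searching
library(twitteR)

# Placeholder credentials -- substitute the real values shown on your
# app's "Keys and Access Tokens" page on the Twitter developer site
consumer_key    <- "YOUR_CONSUMER_KEY"
consumer_secret <- "YOUR_CONSUMER_SECRET"
access_token    <- "YOUR_ACCESS_TOKEN"
access_secret   <- "YOUR_ACCESS_SECRET"

# Authorise this R session; searchTwitter() calls will work afterwards
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
```

Older versions of twitteR instead built a credential object with the ROAuth package and saved it to an .Rdata file, which is what the load() call in the code below restores.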

The code used to produce the above word clouds is outlined below.

# Load required libraries
library(twitteR)
library(tm)
library(wordcloud)
library(RCurl)
library(wesanderson)
# Load credentials
load("D:/ClimData/Twitter/twitter authentification.Rdata")
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
tweets <- searchTwitter("Climate Change Bangladesh", n=1500, lang="en") 
tweets.text <- sapply(tweets, function(x) x$getText())
# Convert all text to lower case
tweets.text <- tolower(tweets.text)
# Remove links
tweets.text <- gsub("http\\S+", "", tweets.text)
# Remove @UserName mentions
tweets.text <- gsub("@\\w+", "", tweets.text)
# Remove the retweet marker "rt"
tweets.text <- gsub("\\brt\\b", "", tweets.text)
# Remove "amp", left over from the HTML entity &amp;
tweets.text <- gsub("\\bamp\\b", "", tweets.text)
# Remove punctuation and any other remaining non-alphanumeric characters
tweets.text <- gsub("[^a-z0-9 ]", "", tweets.text)
# Collapse tabs and repeated blanks into single spaces
tweets.text <- gsub("[ \t]{2,}", " ", tweets.text)
# Remove blank spaces at the beginning and end
tweets.text <- gsub("^ +| +$", "", tweets.text)
# Create corpus
tweets.text.corpus <- Corpus(VectorSource(tweets.text))
# Clean up by removing stop words
tweets.text.corpus <- tm_map(tweets.text.corpus, removeWords, stopwords("english"))
# Create document term matrix applying some transformations
tdm = TermDocumentMatrix(tweets.text.corpus,
      control = list(removePunctuation = TRUE,
      removeNumbers = TRUE, tolower = TRUE,
      stopwords = c("climate", "change", "bangladesh", stopwords("english"))))
# Define tdm as matrix
m = as.matrix(tdm)
# Get word counts in decreasing order
word_freqs = sort(rowSums(m), decreasing=TRUE) 
# Create a data frame with words and their frequencies
dm = data.frame(word=names(word_freqs), freq=word_freqs)
# Plot and save the image in png format
png("BGD_ClimateChange_Dec2014.png", width=5, height=5, units="in", res=500)
# Note: newer versions of the wesanderson package use wes_palette("Darjeeling1", 5) instead
wordcloud(dm$word, dm$freq, random.order=FALSE, min.freq=2, scale=c(4,0.5), max.words=100, colors=wes.palette(5,"Darjeeling"))
dev.off()