This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services, which provides scientific advice to organisations that currently include the STECF (Scientific, Technical and Economic Committee for Fisheries, https://stecf.jrc.europe.eu/) and ICCAT (https://www.iccat.int/en/).

Monday, 8 December 2014

Twittering about climate change and Bangladesh

Points of post:
  • To describe how twitteR can be used, together with Word Clouds (wordcloud), to reveal what people are twittering about.
  • To show how rapidly interest in various subjects can change.

Twitter (https://twitter.com/) is a social media network that allows users to send short (140-character) messages called ‘tweets’. Twitter started up in 2006 and has had phenomenal success, with some 340 million tweets sent each day in 2012.

In an earlier post on our blog, we described how archives of Google internet searches can tell us what people are interested in, and how these interests change over time. It is also possible to assess what factors might be affecting this interest, e.g. a disease outbreak or climatic disaster. Similarly, Twitter might be viewed as a massive, passive opinion poll.

Again, wonderful R has libraries (twitteR) to analyze the world’s tweets, which can be used for social media mining and to reveal what people are thinking (twittering) about; whether it’s Brad Pitt’s facial hair (http://www.dailymail.co.uk/tvshowbiz/article-2792839/brad-pitt-continues-wear-rakish-moustache-long-sideburns-grew-sea-new-york.html) or climate variability and change.

Jason and I are very interested in how climate change will affect the people of Bangladesh in particular, and we’ve already explored the temporal and spatial dependence of various weather variables such as temperature and sea level in the country.

Hence we experimented by inserting the two keywords ‘climate change’ and ‘Bangladesh’ as our search parameters into twitteR. Our aim was to reveal which other words or phrases these keywords were associated with. ‘Word clouds’ are a useful way to summarize the results: in a word cloud, font sizes are scaled according to the frequencies at which words occur, in this case how often they are ‘tweeted’. Below is a word cloud for September 2014, which shows that ‘URBANISATION’ and ‘ROUND TABLE’ occurred most often in tweets.

[Word cloud: September 2014]
By December 2014, however, when we tried again, the identical keywords were associated with ‘FLOATING GARDENS’, and the increasingly nebulous ‘ADAPTATION’ and ‘RESILIENCE’.

[Word cloud: December 2014]
Clearly, social media such as Twitter, combined with R’s software packages and plotting facilities, are fantastic tools for understanding the often capricious winds affecting the movement of global public interest and opinion.

To carry out your own social media mining with the twitteR package, you will first need a Twitter account and an 'app' created on Twitter's developer site, which provides the consumer key and secret used in the authentication step.

You will also need to complete Twitter's OAuth authentication process before twitteR can perform keyword searches on your behalf.
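For reference, here is a minimal sketch of one way the credentials file loaded in the code further down could be created, using the ROAuth package (the approach twitteR relied on at the time of writing). The consumer key and secret shown are placeholders; copy the real values from your own app's settings page on Twitter's developer site.

# One-off OAuth handshake to create the credentials file loaded below.
# The key and secret are placeholders - replace them with the values
# from your own app's settings page.
library(ROAuth)
library(RCurl)

twitCred <- OAuthFactory$new(
  consumerKey    = "YOUR_CONSUMER_KEY",
  consumerSecret = "YOUR_CONSUMER_SECRET",
  requestURL     = "https://api.twitter.com/oauth/request_token",
  accessURL      = "https://api.twitter.com/oauth/access_token",
  authURL        = "https://api.twitter.com/oauth/authorize")

# Opens a browser prompt; enter the PIN Twitter gives you
twitCred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

# Save the credentials so they can simply be loaded in future sessions
save(twitCred, file = "D:/ClimData/Twitter/twitter authentification.Rdata")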

The code used to produce the above word clouds is outlined below.

# Load required libraries
library(RCurl)
library(stringr)
library(tm)
library(wordcloud)
library(RColorBrewer)
library(twitteR)
library(streamR)
library(grid)
library(ggplot2)
library(wesanderson)
 
# Load credentials
load("D:/ClimData/Twitter/twitter authentification.Rdata")
registerTwitterOAuth(twitCred)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
 
# Search for up to 1500 recent English-language tweets matching the keywords
tweets <- searchTwitter("Climate Change Bangladesh", n=1500, lang="en")
 
# Extract the text of each tweet
tweets.text <- sapply(tweets, function(x) x$getText())
 
# Remove links (before stripping punctuation, so the pattern still matches)
tweets.text <- gsub("http\\S+", "", tweets.text)
 
# Remove @UserName mentions
tweets.text <- gsub("@\\w+", "", tweets.text)
 
# Remove all remaining non-alphanumeric characters (this also strips punctuation)
tweets.text <- gsub("[^a-zA-Z0-9 ]", "", tweets.text)
 
# Convert all text to lower case
tweets.text <- tolower(tweets.text)
 
# Remove the retweet marker "rt" (as a whole word only, so that words
# containing "rt" are left untouched)
tweets.text <- gsub("\\brt\\b", "", tweets.text)
 
# Remove "amp", the remnant of the HTML entity "&amp;"
tweets.text <- gsub("\\bamp\\b", "", tweets.text)
 
# Collapse runs of spaces and tabs into a single space
tweets.text <- gsub("[ \t]{2,}", " ", tweets.text)
 
# Remove blank spaces at the beginning and end
tweets.text <- gsub("^ +| +$", "", tweets.text)
 
# Create corpus
tweets.text.corpus <- Corpus(VectorSource(tweets.text))
 
# Clean up by removing English stop words
tweets.text.corpus <- tm_map(tweets.text.corpus, removeWords, stopwords("english"))
 
# Create term-document matrix, applying some transformations; the search
# keywords themselves are added to the stop words so they do not dominate the cloud
tdm = TermDocumentMatrix(tweets.text.corpus,
      control = list(removePunctuation = TRUE,
                     stopwords = c("climate", "change", "bangladesh", stopwords("english")),
                     removeNumbers = TRUE, tolower = TRUE))
 
# Define tdm as matrix
m = as.matrix(tdm)
 
# Get word counts in decreasing order
word_freqs = sort(rowSums(m), decreasing=TRUE) 
 
# Create a data frame with words and their frequencies
dm = data.frame(word=names(word_freqs), freq=word_freqs)
 
# Plot and save the image in png format
png("BGD_ClimateChange_Dec2014.png", width=5, height=5, units="in", res=500)
 
# Note: newer versions of the wesanderson package replace
# wes.palette(5, "Darjeeling") with wes_palette("Darjeeling1", 5)
wordcloud(dm$word, dm$freq, random.order = FALSE, min.freq = 2, scale = c(4, 0.5),
          max.words = 100, colors = wes.palette(5, "Darjeeling"))
 
dev.off()
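
As an optional extension (not part of the word clouds above), the term frequencies held in dm can also be shown as a simple bar chart using ggplot2, which is already loaded above. A minimal sketch:

# Optional extra: bar chart of the 20 most frequent terms
top_terms <- head(dm, 20)

p <- ggplot(top_terms, aes(x = reorder(word, freq), y = freq)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Term", y = "Frequency in tweets")

print(p)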
