This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services, which provides scientific advice to organisations including the STECF (Scientific, Technical and Economic Committee for Fisheries, https://stecf.jrc.europe.eu/) and ICCAT (https://www.iccat.int/en/).


Monday, 23 February 2015

Why are the British tweeting about climate change at night?

Points of post:

  • To demonstrate how the twitteR library can be combined with the mapping capabilities of ggplot2.
  • To reveal the locations of people tweeting about climate change at two points in time.
  • To show that climate change tweeters are mostly from the developed world.
  • To show that some (sad) tweeters are twittering about climate change in the middle of the night.


In a previous blog post, we used twitteR and wordcloud to summarize which other words occurred in tweets (combined from around the world) alongside the keywords 'climate-change' and 'Bangladesh' at two arbitrarily selected time-points.

The geo-locations of tweets are, however, also often available and are potentially very interesting and revealing. 

Fifteen hundred (1500) is the maximum number of tweets that can be captured with a single call using the twitteR library. There are probably ways to get more data, but we suspect you would have to pay for them.
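If you need a longer series, one free workaround is to split the search into daily windows using searchTwitter's since and until arguments and bind the batches together. The sketch below assumes you are already authenticated; the dates are placeholders, and bear in mind that Twitter's free search index only reaches back about a week.

# A minimal sketch: gather several daily batches and stack them into one data frame
library(twitteR)

days <- seq(as.Date("2014-12-08"), as.Date("2014-12-12"), by = "day")
batches <- lapply(seq_len(length(days) - 1), function(i) {
  twListToDF(searchTwitter("#climatechange", n = 1500,
                           since = as.character(days[i]),
                           until = as.character(days[i + 1])))
})
all_tweets <- do.call(rbind, batches)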

Here we used #climatechange because it coincided with the last days of COP20 in Lima, Peru, which ran from 1st to 12th December 2014. COP20 was an extremely important global forum at which nations could meet and discuss their options for reducing carbon emissions (see http://unfccc.int/meetings/lima_dec_2014/meeting/8141.php).


The first ‘tweet map’ (below) is based on approximately 1500 geo-located tweets that contained the hash-tag #climatechange and were ‘tweeted’ at about 10am GMT on 11th December 2014. It shows that #climatechange tweets were coming from four main areas: North America, Europe, India and Australia. There didn’t appear to be many tweets coming out of Lima, which surprised us. Maybe the delegates were too busy enjoying the South American hospitality and catching up with old mates to take much interest in climate change!

Geo-located tweets with #climatechange tweeted at around 10am GMT 11th December 2014

The second ‘tweet-map’ (below), also based on approximately 1500 geo-located #climatechange tweets, is a snapshot taken about 17 hours later, at around 3am GMT on the final day of the conference (12th December 2014). The overall pattern is much the same, but the relative frequency of #climatechange tweeters from Europe, as compared with North America, has increased. People in the United Kingdom were particularly keen, twittering like mad about climate change at 3am. Why? We don’t know.

Geo-located tweets with #climatechange tweeted at around 3am GMT 12th December 2014

Note that tweets are geo-located either from the user’s location as given in their profile, or from the exact position (if the user allows it) obtained from GPS-enabled software on a smart-phone or from an IP address. This means that not all tweets can be geo-located with any great precision. Some are only geo-located at the national and/or regional level, as is evident from the large circle in the middle of Australia. That’s to say, these cautious tweeters only gave ‘Australia’ as their location.
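You can see this coarseness for yourself by geocoding a country-level location string. The snippet below is just an illustration using ggmap's geocode() (it needs a network connection and, in recent versions of ggmap, a registered API key):

# Geocoding a coarse profile location returns a single country centroid,
# which is why locations like 'Australia' pile up as one big circle
library(ggmap)
geocode("Australia")     # one point near the centre of the country
geocode("London, UK")    # a more precise, city-level point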

As we explained in an earlier blog post on word clouds and twitteR, to pull data from Twitter using its API you will need a Twitter account and to carry out a 'twitter authentication'. The R code to search Twitter for the selected term(s) and map the results is detailed below.
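For completeness, the credentials file loaded below can be created with a one-off OAuth handshake along the following lines. This is a minimal sketch using the ROAuth package; the consumer key and secret are placeholders that you obtain by registering an app on Twitter.

# One-off OAuth handshake; save the credential object for later sessions
library(ROAuth)
library(RCurl)

twitCred <- OAuthFactory$new(consumerKey    = "YOUR_CONSUMER_KEY",
                             consumerSecret = "YOUR_CONSUMER_SECRET",
                             requestURL = "https://api.twitter.com/oauth/request_token",
                             accessURL  = "https://api.twitter.com/oauth/access_token",
                             authURL    = "https://api.twitter.com/oauth/authorize")
twitCred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
save(twitCred, file = "D:/ClimData/Twitter/twitter authentification.Rdata")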

# Load required libraries
 
library(RCurl)
library(maps)
library(stringr)
library(tm)
library(twitteR)
library(streamR)
library(grid)
library(ggplot2)
library(rgdal)
library(ggmap)
 
# Set working directory
 
setwd("D:/ClimData/")
 
#### Fonts on Windows ####
windowsFonts(ClearSans="TT Clear Sans")
 
# Load Credentials
 
load("D:/ClimData/Twitter/twitter authentification.Rdata")
registerTwitterOAuth(twitCred)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
 
# Search term on twitter
 
searchTerm <- "#climatechange"
searchResults <- searchTwitter(searchTerm,n=1500,since='2014-12-11', until='2014-12-12')  
tweetFrame <- twListToDF(searchResults) 
 
userInfo <- lookupUsers(tweetFrame$screenName)  
userFrame <- twListToDF(userInfo)
 
# Keep only users with a non-empty location (twitteR returns "" when no location is given)
locatedUsers <- !is.na(userFrame$location) & userFrame$location != ""
 
# Geocode locations using 'ggmap' library
 
locations <- geocode(userFrame$location[locatedUsers])
 
locations_robin <- project(as.matrix(locations), "+proj=robin")
 
locations_robin_df <- as.data.frame(locations_robin)
 
# Import world boundaries
 
world <- readOGR(dsn="D:/Data/ne_10m_admin_0_countries", layer="ne_10m_admin_0_countries")
 
world_robin <- spTransform(world, CRS("+proj=robin"))
 
world_robin_df <- fortify(world_robin)
 
counts <- aggregate(locations_robin_df$V1,by=list(x=locations_robin_df$V1,y=locations_robin_df$V2),length)
names(counts)[3] <- "count"
 
# Theme options for Map
 
theme_opts <- list(theme(panel.grid.minor = element_blank(),
                         panel.grid.major = element_blank(),
                         panel.background = element_blank(),
                         panel.border = element_blank(),
                         plot.background = element_blank(),
                         axis.line = element_blank(),
                         axis.text.x = element_blank(),
                         axis.text.y = element_blank(),
                         axis.ticks = element_blank(),
                         axis.title.x = element_blank(),
                         axis.title.y = element_blank(),
                         legend.position = "bottom",
                         legend.key = element_blank(),
                         legend.title = element_text(colour="black", size=12, face="bold", family="ClearSans"),
                         legend.text = element_text(colour="black", size=10, face="bold", family="ClearSans"),
                         plot.title = element_text(size=15, face="bold", lineheight=0.5, family="ClearSans")))
 
# Plot map and tweet counts 
 
tp <- ggplot(world_robin_df)+
      geom_polygon(aes(x = long, y = lat, group = group), fill = "grey20")+
      geom_path(aes(x = long, y = lat, group = group),colour = "grey40", lwd = 0.2)+
      geom_point(data= counts, aes(x=x,y=y,size=count),color="#32caf6", alpha=I(8/10))+
      scale_size_continuous(name="Number of tweets")+
      ggtitle("Twitter Map of #climatechange\n")+
      xlab("")+ ylab("")+
      coord_equal()+
      theme_bw() + 
      guides(size = guide_legend(title.position = "top",title.hjust =0.5))+
      theme_opts
 
tp
 
# Save to png
 
ggsave(tp,file="D:/Twitter_ClimateChange_Map.png",dpi=500,w=10,h=6,unit="in",type="cairo-png")

Monday, 8 December 2014

Twittering about climate change and Bangladesh

Points of post:
  • To describe how twitteR can be used, together with Word Clouds (wordcloud), to reveal what people are twittering about.
  • To show how rapidly interest in various subjects can change.

Twitter (https://twitter.com/) is a social media network that allows users to send short (140-character) messages called ‘tweets’. Twitter started up in 2006 and has been phenomenally successful: some 340 million tweets were being sent each day in 2012.

In an earlier post on our blog, we described how archives of Google internet searches can tell us what people are interested in, and how these interests change over time. It is also possible to assess what factors might be affecting this interest, e.g. a disease outbreak or climatic disaster. Similarly Twitter might be viewed as a massive, passive opinion poll. 

Again, wonderful R has a library for analyzing the world’s tweets (twitteR). It can be used for social media mining, revealing what people are thinking (twittering) about, whether it’s Brad Pitt’s facial hair (http://www.dailymail.co.uk/tvshowbiz/article-2792839/brad-pitt-continues-wear-rakish-moustache-long-sideburns-grew-sea-new-york.html) or climate variability and change.

Jason and I are very interested in how climate change will affect the people of Bangladesh, in particular, and we’ve already explored the temporal and spatial dependence of various weather variables such as temperature and sea-level in the country. 

Hence we experimented by inserting the two keywords, ‘climate-change’ and ‘Bangladesh’, as our search parameters into twitteR. Our aim was to reveal which other words or phrases these keywords were associated with. ‘Word Clouds’ are a useful way to summarize the results: font sizes are scaled according to the frequencies at which words occur, in this case how often they are ‘tweeted’. Below is a Word Cloud for September 2014, which shows that ‘URBANISATION’ and ‘ROUND TABLE’ occurred most often in tweets.



By December 2014, however, when we tried again, the identical keywords were associated with ‘FLOATING GARDENS’, and the increasingly nebulous ‘ADAPTATION’ and ‘RESILIENCE’.


Clearly social media such as Twitter, combined with R’s software packages and plotting facilities, are fantastic tools for understanding the often capricious winds affecting the movement of global public interest and opinion.
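If you want to see the font-scaling idea in isolation before wading into the Twitter plumbing, here is a toy example with made-up words and counts:

# A toy word cloud: font size is scaled to each word's frequency
library(wordcloud)
words <- c("urbanisation", "roundtable", "adaptation", "resilience", "floods")
freqs <- c(40, 35, 12, 8, 5)
wordcloud(words, freqs, scale = c(4, 0.5), min.freq = 1, random.order = FALSE)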

To carry out your own social media mining using the twitteR package, you will first need a Twitter account and to create an 'app'. Please see this and this link to get a better idea of how that is done.

You will also find information in the above links on the 'twitter authentication' process, which is required before you can perform keyword searches with twitteR.

The code used to produce the above word clouds is outlined below.

# Load required libraries
library(RCurl)
library(stringr)
library(tm)
library(wordcloud)
library(RColorBrewer)
library(twitteR)
library(streamR)
library(grid)
library(ggplot2)
library(wesanderson)
 
# Load credentials
load("D:/ClimData/Twitter/twitter authentification.Rdata")
registerTwitterOAuth(twitCred)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
 
tweets <- searchTwitter("Climate Change Bangladesh", n=1500, lang="en") 
 
tweets.text <- sapply(tweets, function(x) x$getText())
 
# Remove non alphanumeric characters
tweets.text <- gsub("[^a-zA-Z0-9 ]","",tweets.text)
 
# Convert all text to lower case
tweets.text <- tolower(tweets.text)
 
# Remove the retweet marker "rt" (word boundaries stop it clipping words like "shirt")
tweets.text <- gsub("\\brt\\b", "", tweets.text)
 
# Replace @UserName
tweets.text <- gsub("@\\w+", "", tweets.text)
 
# Remove punctuation
tweets.text <- gsub("[[:punct:]]", "", tweets.text)
 
# Remove links
tweets.text <- gsub("http\\w+", "", tweets.text)
 
# Remove tabs
tweets.text <- gsub("[ |\t]{2,}", "", tweets.text)
 
# Remove blank spaces at the beginning
tweets.text <- gsub("^ ", "", tweets.text)
 
# Remove blank spaces at the end
tweets.text <- gsub(" $", "", tweets.text)
 
# Remove the word "amp" (residue of HTML-escaped ampersands)
tweets.text <- gsub("\\bamp\\b", "", tweets.text)
 
# Create corpus
tweets.text.corpus <- Corpus(VectorSource(tweets.text))
 
# Clean up by removing stop words
tweets.text.corpus <- tm_map(tweets.text.corpus, removeWords, stopwords("english"))
 
# Create document term matrix applying some transformations
tdm = TermDocumentMatrix(tweets.text.corpus,
      control = list(removePunctuation = TRUE,
      stopwords = c("climate change", "bangladesh", "climate", "change", stopwords("english")),
      removeNumbers = TRUE, tolower = TRUE))
 
# Define tdm as matrix
m = as.matrix(tdm)
 
# Get word counts in decreasing order
word_freqs = sort(rowSums(m), decreasing=TRUE) 
 
# Create a data frame with words and their frequencies
dm = data.frame(word=names(word_freqs), freq=word_freqs)
 
# Plot and save the image in png format
png("BGD_ClimateChange_Dec2014.png", width=5, height=5, units="in", res=500)
 
wordcloud(dm$word, dm$freq, random.order=FALSE, min.freq = 2,scale=c(4,0.5),max.words = 100,colors=wes.palette(5,"Darjeeling"))
 
dev.off()