This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services, which provides scientific advice to organisations that currently include the STECF [Scientific, Technical and Economic Committee for Fisheries, https://stecf.jrc.europe.eu/] and ICCAT [https://www.iccat.int/en/].

Wednesday, 26 November 2014

Trending climate-related keywords with GTrendsR





Key points of post

  • Google stores data on millions of internet searches throughout the world and these data are publicly available. 
  • Google Trends (http://www.google.com/trends/?hl=en-GB) is a fascinating system that summarizes information from all these searches, revealing what subjects are ‘trending’ worldwide.
  • Conveniently R has a library (GTrendsR) capable of extracting, analyzing and plotting these data. 
  • Here we explore time series data (2004-2014) summarizing Google searches for the keywords ‘climate change’, ‘global warming’, ‘ocean acidification’ and ‘sea-level rise’.
  • All four series have trends and have either unimodal or bimodal seasonal cycles.

The other day Jason decided to play around (nerdy I know) with R’s relatively new GTrendsR library and analyze the time-series statistics for the following four keywords: ‘climate change’, ‘ocean acidification’, ‘global warming’, and ‘sea-level rise’. The results are displayed below for the last decade (details of how Google calculates these statistics are available here: https://support.google.com/trends/answer/4355164?hl=en).


First let’s talk about what has happened to the ‘popularity’ of these searches over the long term, i.e. between 2004 and 2014. Searches for ‘climate change’ were fairly static between 2004 and 2008 and have since increased in popularity. ‘Global warming’ was similarly stable but then fell. We think that, in the past, ‘global warming’ was the more commonly used term to describe ‘anthropogenic heating’. As the wider consequences of increasing temperatures for the entire global weather and climate system became more widely recognized, however, it became clear that a more general term was needed, hence ‘climate change’.

Searches for ‘ocean acidification’ (OA), a product of CO2 pollution also known as the ‘ugly twin’ of climate change, have not changed substantially over the last decade or so, apart from a massive spike in mid-2013. My pal Prof. Hall-Spencer in Plymouth suggested the spike might be due to interest around this article (http://www.nbcnews.com/video/ann-curry-reports/54882960#54882960), and/or the fact that ocean acidification started getting discussed a lot in Washington DC in mid-2012. Furthermore, some charitable organizations advertised funding calls for OA research at that time.

Interest in sea-level rise (SLR) started to increase in 2011 and we have no idea why. Suggestions welcome!

The time-series displayed here are all available at weekly resolution. What particularly interests us in our time-series work is seasonality. Both the ocean-acidification and sea-level rise series are strongly seasonal with a single peak each year (unimodal) occurring between June and July. 
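
One simple way to check for a single annual peak like this is to average the weekly values by calendar month and see where the maximum falls. The sketch below is only an illustration: it uses a synthetic weekly series (the cycle, noise level and the names `dat`, `monthly_mean` and `peak_month` are our assumptions), not the Google data themselves.

```r
# Illustrative sketch only: a synthetic weekly series stands in for Google Trends data.
set.seed(42)

# Weekly dates covering roughly the same span as the series in this post
start <- seq(as.Date("2004-01-04"), as.Date("2014-11-23"), by = "week")

# Assumed signal: an annual cycle peaking around day 180 (late June) plus noise
doy   <- as.numeric(format(start, "%j"))
trend <- 50 + 20 * cos(2 * pi * (doy - 180) / 365.25) + rnorm(length(start), sd = 3)
dat   <- data.frame(start = start, trend = trend)

# Average the weekly values by calendar month
dat$month    <- as.integer(format(dat$start, "%m"))
monthly_mean <- tapply(dat$trend, dat$month, mean)
names(monthly_mean) <- month.abb

# Monthly climatology and the month with the highest mean
round(monthly_mean, 1)
peak_month <- names(which.max(monthly_mean))
peak_month
```

With real data, `dat` would be replaced by the extracted series for one search term; a strong long-term trend would ideally be removed first so that it does not distort the monthly means.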

What causes this seasonality? Is it related to educational cycles in the northern hemisphere dictating when students undertake science related projects? Are there more climate change conferences in the northern hemisphere summer? Or perhaps it’s because there’s less football on TV distracting people from more pressing concerns? 

The other two series (‘climate change’ and ‘global warming’) are also seasonal but differ in having two peaks each year (bimodal). This is difficult to see in the plot above, so we plotted the same data on another graph for a shorter time period (2010-2014, below), which shows clearly that searches for ‘climate change’ are most popular in March and October each year, with troughs in January and July. Does anyone have any idea why?


Could it be a confounding of seasonal educational cycles in the northern hemisphere (NH) and southern hemisphere (SH)? The argument might go like this: (i) numbers of internet searches are far greater in developed countries; (ii) the majority of developed countries are in the northern hemisphere; (iii) ‘climate change’ and ‘global warming’ are much more popularly searched than the other two terms; (iv) the overwhelming predominance of northern-hemisphere searches somehow ‘drowns out’ any seasonal signal from the south; and (v) the bimodal seasonal signals we see for ‘climate change’ and ‘global warming’ are a combination of separate unimodal cycles from the NH and SH.
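
To show how two single-peaked cycles could, in principle, add up to a double-peaked one, here is a minimal sketch with made-up numbers. The peak months, relative weights and the helper functions `bump()` and `is_peak()` are all hypothetical; this illustrates the idea and is not evidence that it explains the Google data.

```r
# Hypothetical illustration only: synthetic seasonal cycles, not Google Trends data.
months <- 1:12

# A single-peaked (unimodal) annual cycle centred on `peak`, wrapping around the year
bump <- function(month, peak, width = 1.5) {
  d <- pmin(abs(month - peak), 12 - abs(month - peak))  # circular distance in months
  exp(-d^2 / (2 * width^2))
}

nh <- 1.0 * bump(months, peak = 3)   # assumed northern-hemisphere cycle
sh <- 0.6 * bump(months, peak = 10)  # assumed, weaker southern-hemisphere cycle
combined <- nh + sh

# Flag local maxima, treating December and January as neighbours
is_peak <- function(x) {
  prev <- c(x[length(x)], x[-length(x)])
  nxt  <- c(x[-1], x[1])
  x > prev & x > nxt
}

sum(is_peak(nh))               # number of peaks in the NH cycle alone
sum(is_peak(sh))               # number of peaks in the SH cycle alone
month.abb[is_peak(combined)]   # months where the combined cycle peaks
```

Note that the shape of the cycles matters here: two pure sine waves with the same period always sum to another single sine wave, so a bimodal result needs peaks that are narrower than a sinusoid.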

Clearly we need to subset and examine these data by location. This will form the subject of future blog articles.

In the meantime please feel free to suggest any alternative explanations.

As usual the R-code describing the extraction and plotting of the data is outlined below.

# Load required libraries
 
library(gtrend)   # supplies gtrend_scraper() and trend2long(); we understand it to build on GTrendsR
library(dplyr)
library(ggplot2)
library(scales)
library(ggthemes)
 
# Define terms to use for trend searches
 
terms <- c("Climate Change","Ocean Acidification","Sea Level Rise","Global Warming")
 
out <- gtrend_scraper("youremail@gmail.com", "yourpassword", terms) ## replace with your own google username and password
 
out %>%  trend2long() %>% plot() 
 
# Get plot of trends
 
a <- out %>%
     trend2long() %>%
     ggplot(aes(x=start, y=trend, color=term)) +
     xlab("\nYear") + ylab("Trend\n")+
     geom_line() + theme_stata()+
     facet_wrap(~term) + ggtitle("Phrase Search Trends on Google\n")+
     guides(color="none")
 
a
 
# Save file to png
 
ggsave(filename="GoogleTrend_Climate.png",plot=a,dpi=500,width=10,height=6,units="in",type="cairo-png")
 
 
# Extract just trends of 'climate change' search for years 2010 to 2014 and plot
# to observe seasonality
 
 
dat <- out[[1]][["trend"]]
colnames(dat)[3] <- "trend"
 
dat2 <- dat[dat[["start"]] > as.Date("2010-01-01"),]
 
# Start and end date of each year, used to draw the shaded background bands
rects <- dat2 %>%
     mutate(year=format(as.Date(start), "%y")) %>%
     group_by(year) %>%
     summarize(xstart = as.Date(min(start)), xend = as.Date(max(end)))
 
cc_plot <- ggplot() +
     # Shaded band per year
     geom_rect(data = rects, aes(xmin = xstart, xmax = xend, ymin = -Inf, 
     ymax = Inf, fill = factor(year)), alpha = 0.4) + theme_bw()+
     ylab ("Trend\n") + xlab("\nDate")+
     ggtitle("Search Trends of 'Climate Change' on Google (2010-2014)\n")+
     # Weekly values as a dashed line with points
     geom_line(data=dat2, aes(x=start, y=trend), size=0.75,colour="grey20",linetype="dashed") + 
     geom_point(data=dat2, aes(x=start, y=trend), size=1.25,colour="grey20") + 
     scale_x_date(labels = date_format("%b-%Y"), 
     breaks = date_breaks("3 months"),
     expand = c(0,0), 
     limits = c(as.Date("2010-01-01"), as.Date("2014-12-31"))) +
     # Loess smoother with a small span to follow the seasonal cycle
     stat_smooth(data=dat2,aes(x=start, y=trend), col="blue",method = "loess",span=0.1,se=TRUE,size=1,linetype='twodash',alpha=0.5,fill="grey60")+
     scale_fill_discrete(guide = "none")+
     theme(axis.text.x = element_text(angle = -30, hjust =0,size=10),
     plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
     panel.border = element_rect(colour = "black",fill=NA,size=0.5),
     panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
     panel.grid.minor = element_blank(),
     axis.title.y=element_text(size=10,colour="grey20"),
     axis.title.x=element_text(size=10,colour="grey20"),
     axis.text.y=element_text(size=10,colour="grey20"),
     panel.background = element_rect(fill = NA,colour = "black"))
 
cc_plot
 
# Save plot to file
 
ggsave(filename="GoogleTrend_ClimateChange2010-14.png",plot=cc_plot,dpi=500,width=10,height=6,units="in",type="cairo-png")