This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services which provides scientific advice to organisations that currently include the STECF [Scientific, Technical and Economic Committee for Fisheries, https://stecf.jrc.europe.eu/] and ICCAT, https://www.iccat.int/en/

Wednesday 26 November 2014

Trending climate-related keywords with GTrendsR





Key points of post

  • Google stores data on millions of internet searches throughout the world and these data are publicly available. 
  • Google trends (http://www.google.com/trends/?hl=en-GB)’ is a fascinating system that summarizes information from all these searches revealing what subjects are ‘trending’ worldwide.
  • Conveniently R has a library (GTrendsR) capable of extracting, analyzing and plotting these data. 
  • Here we explore time series data (2004-2014) summarizing Google searches for the keywords, ‘climate-change’, ‘global-warming’, ‘ocean-acidification’ and ‘sea-level rise’
  • All four series have trends and have either unimodal or bimodal seasonal cycles.
The other day Jason decided to play around (nerdy I know) with R’s relatively new GTrendR library and analyze the time-series statistics for the following four key words: ‘climate change’, ‘ocean acidification’, ‘global warming’, and ‘sea-level rise’. The results are displayed below for the last decade (details of how Google calculates these statistics are available here: https://support.google.com/trends/answer/4355164?hl=en). 


First let’s talk about what has happened to the ‘popularity’ of these searches over the long-term, ie. between 2004 and 2014. Searches for ‘climate change’ were fairly static between 2004 and 2008 and have since increased in popularity. ‘Global warming’ was similarly stable but then fell. In the past we think that ‘global warming’ was the more commonly used term to describe ‘anthropogenic heating’. As the wider consequences of increasing temperatures for the entire global weather and climate system became increasingly more recognized, however, it became clear that a more general term was needed and hence, ‘climate-change’.

Searches for ‘ocean-acidification (OA)’ – a product of CO2 pollution – and also known as the ‘ugly twin’ of climate-change have not changed substantially over the last decade or so, apart from a massive spike in mid-2013. My pal Prof. Hall-Spencer in Plymouth suggested the spike might be due to interest around this article (http://www.nbcnews.com/video/ann-curry-reports/54882960#54882960), and/or the fact that ocean acidification started getting discussed a lot in Washington DC in mid 2012. Furthermore some charitable organizations advertised research funding calls for OA research at that time.

Interest in sea-level rise (SLR) started to increase in 2011 and we have no idea why. Suggestions welcome!

The time-series displayed here are all available at weekly resolution. What particularly interests us in our time-series work is seasonality. Both the ocean-acidification and sea-level rise series are strongly seasonal with a single peak each year (unimodal) occurring between June and July. 

What causes this seasonality? Is it related to educational cycles in the northern hemisphere dictating when students undertake science related projects? Are there more climate change conferences in the northern hemisphere summer? Or perhaps it’s because there’s less football on TV distracting people from more pressing concerns? 

The other two series (‘climate change and ‘global warming’) are also seasonal but are different in having two peaks (bimodal). This is difficult to see in the plot above so we plotted the same data on another graph for a shorter time-period (2010-2014 below) which shows clearly that searches for climate-change most popular in March and October each year with troughs in January and July ? Does anyone have any idea why ? 


Could it be a confounding between seasonal educational cycles in the northern hemispheres (NH) and southern hemispheres (SH)? The argument might go like this: (i) Numbers of internet searches are far greater in developed countries; (ii) ‘The majority of developed countries are in the northern hemisphere; (iii) ‘Climate-change’ and ‘global-warming’ are much more popularly searched than the other two terms; (iv) The overwhelming predominance of northern-hemisphere searches somehow ‘drowns-out’ any seasonal signal from the south; and (v) The bimodal seasonal signals we see for OA and SLR are a combination of separate unimodal cycles from the NH and SHs. 

Clearly we need to subset and examine these data by location. This will form the subject of future blog articles.

In the meantime please feel free to suggest any alternative explanations.

As usual the R-code describing the extraction and plotting of the data is outlined below.

# Load required libraries
 
library(gtrend)
library(dplyr)
library(ggplot2)
library(scales)
library(ggthemes)
 
# Define terms to use for trend searches
 
terms <- c("Climate Change","Ocean Acidification","Sea Level Rise","Global Warming")
 
out <- gtrend_scraper("youremail@gmail.com", "yourpassword", terms) ## replace with your own google username and password
 
out %>%  trend2long() %>% plot() 
 
# Get plot of trends
 
a <- out %>%
     trend2long() %>%
     ggplot(aes(x=start, y=trend, color=term)) +
     xlab("\nYear") + ylab("Trend\n")+
     geom_line() + theme_stata()+
     facet_wrap(~term) + ggtitle("Phrase Search Trends on Google\n")+
     guides(color=FALSE)
 
a
 
# Save file to png
 
ggsave(a,file="GoogleTrend_Climate.png",dpi=500,w=10,h=6,unit="in",type="cairo-png")
 
 
# Extract just trends of 'climate change' search for years 2010 to 2014 and plot
# to observe seasonality
 
 
dat <- out[[1]][["trend"]]
colnames(dat)[3] <- "trend"
 
dat2 <- dat[dat[["start"]] > as.Date("2010-01-01"),]
 
rects <- dat2  %>% 
mutate(year=format(as.Date(start), "%y")) %>%
group_by(year) %>%
summarize(xstart = as.Date(min(start)), xend = as.Date(max(end)))
 
c <- ggplot() +
     geom_rect(data = rects, aes(xmin = xstart, xmax = xend, ymin = -Inf, 
     ymax = Inf, fill = factor(year)), alpha = 0.4) + theme_bw()+
     ylab ("Trend\n") + xlab("\nDate")+
     ggtitle("Search Trends of 'Climate Change' on Google (2010-2014)\n")+
     geom_line(data=dat2, aes(x=start, y=trend), size=0.75,colour="grey20",type="dashed") + 
     geom_point(data=dat2, aes(x=start, y=trend), size=1.25,colour="grey20") + 
     scale_x_date(labels = date_format("%b-%Y"), 
     breaks = date_breaks("3 months"),
     expand = c(0,0), 
     limits = c(as.Date("2010-01-01"), as.Date("2014-12-31"))) +
     stat_smooth(data=dat2,aes(x=start, y=trend), col="blue",method = "loess",span=0.1,se=T,size=1,linetype='twodash',alpha=0.5,fill="grey60")+
     scale_fill_discrete(guide = FALSE)+
     theme(axis.text.x = element_text(angle = -30, hjust =0,size=10),
     plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
     panel.border = element_rect(colour = "black",fill=F,size=0.5),
     panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
     panel.grid.minor = element_blank(),
     axis.title.y=element_text(size=10,colour="grey20"),
     axis.title.x=element_text(size=10,colour="grey20"),
     axis.text.y=element_text(size=10,colour="grey20"),
     panel.background = element_rect(fill = NA,colour = "black"))
 
c
 
# Save plot to file
 
ggsave(c,file="GoogleTrend_ClimateChange2010-14.png",dpi=500,w=10,h=6,unit="in",type="cairo-png")
Created by Pretty R at inside-R.org

Saturday 8 November 2014

Come hell with high water - Sea level rises at two stations in Bangladesh

Key points of post
  • Bangladesh is low-lying
  • Many people live along its river and coasts
  • These communities are vulnerable to rises in sea-level which is predicted to occur as a consequence of climate change

Coastal Bangladesh is extremely low-lying and its coastal communities in particular are vulnerable from any future rises in sea level which can cause flooding. Jason and I wondered, therefore, if there were any time-series on sea-level for Bangladesh available, and what information they might contain.

Via Google we discovered the University of Hawaii Sea Level Centre, which monitors seawater levels at stations throughout the world, exploiting its network and relationships with a range of local partners.  In this case it’s the Bangladesh Inland Water Transport Authority who provided the original data.


Here we plot average daily sea level available on the UHSLC site for two stations (please map below) at Charchanga (at the mouth of the Meghna River) and Hiron Point (in the Sundarbans). [Note there are data for 5 other stations in Bangladesh but the time-series data available were, unfortunately, too short a duration]. The data are also color-coded to correspond to the six seasons of Bangladesh (winter, spring, summer, monsoon, autumn, & late autumn). 

MapID1f246f024be6
Data: stations.google • Chart ID: MapID1f246f024be6googleVis-0.5.6
R version 3.1.2 (2014-10-31) • Google Terms of UseDocumentation and Data Policy

The first interesting feature, noted at both locations, is the pronounced seasonality with peaks in sea-level being recorded in the  monsoon, autumn and late autumn periods.  Winter and spring correspond to lowest average daily water levels at both locations.  The seasonal variation in sea-level is due primarily to the effect of the Monsoon; ie. the south-westerly winds tend to cause the seas to ‘pile-up’ along the coast of Bangladesh, while the flow from the major rivers pouring through the delta into the Bay of Bengal increases substantially also as a result of the Monsoon. 

The most worrying aspect of these data is the substantial change in long-term trend summarized here with a simple linear model (red line).  At Charchanga average daily sea-levels recorded rose by approximately 7 mm's per year between 1980 and 2000 while at Hiron Point the figure was approximately 5 mm's per year (1977 to 2003). Unfortunately more recent data were unavailable from the Sea Level Center and we are currently unsure if these trends are continuing. But please if you know of more data for Bangladesh let us know.

Certain aspects of the series are, however, rather curious and it’s not a straightforward story of increasing trends.  At Hiron Point, for example, extreme sea-level events (say above 2600 mm's) were relatively frequently between 1980 and 1995 whereas, since 1996 they have been observed less often. Contrast this with Charchanga where there were some exceptional peaks recorded in 1996, ’97, and ’98. 

All such changes could have profound implications for the livelihoods of coastal Bangladeshis.

Details of how to obtain the data and plot the graphs here in R are demonstrated below.  

As mentioned earlier in the post, the data was obtained from the University of Hawaii Sea Level Centre and we have used the research quality dataset which can be found at this link - http://uhslc.soest.hawaii.edu/data/download/rq

You can download the dataset in various formats which include csv and NetCDF and the associated metadata for each station is also there for you to check of any inconsistencies in the data collection or station relocation and instrumentation issues. The data can also be downloaded at two different time scales, which is hourly and daily. 

# Load required libraries
 
library(ggplot2)
library(scales)
library(grid)
library(plyr)
library(lubridate)
library(zoo)
 
# Set working directory
 
setwd("D:/ClimData/SeaLevel")
 
# Read csv file
 
sl<-read.csv("rqd0138a.csv",header=FALSE)
 
# Rename columns
 
colnames(sl)<-c("year","month","day","sl_mm")
 
# Format date columns
 
sl$date <- as.Date(paste(sl$year,sl$month,sl$day),format="%Y%m%d")
sl$month <- as.numeric(format(sl$date,"%m"))
sl$year <- as.numeric(format(sl$date,"%Y"))
sl$monthf <- factor(sl$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
sl$mday <- strptime(sl$date, "%Y-%m-%d")$mday
sl$jday <- strptime(sl$date, "%Y-%m-%d")$yday+1
sl$daymth <- as.character(paste(sl$month,sl$day,sep="-"))
sl$daymth <-as.Date(sl$daymth,format="%m-%d")
 
# Classify data into seasons
 
sl$season <- "Season"
 
sl$season[sl$month == 1 & sl$mday >= 1 |  sl$month == 2 & sl$mday <= 13| sl$month == 1]<-'Winter'
sl$season[sl$month == 2 & sl$mday >= 14 |  sl$month == 4 & sl$mday <= 14 | sl$month == 3]<-'Spring'
sl$season[sl$month == 4 & sl$mday >= 15 |  sl$month == 6 & sl$mday <= 14 | sl$month == 5]<-'Summer'
sl$season[sl$month == 6 & sl$mday >= 15 |  sl$month == 8 & sl$mday <= 17 | sl$month == 7]<-'Monsoon'
sl$season[sl$month == 8 & sl$mday >= 18 |  sl$month == 10 & sl$mday <= 18| sl$month == 9]<-'Autumn'
sl$season[sl$month == 10 & sl$mday >= 19 |  sl$month == 12 & sl$mday <= 16| sl$month == 11]<-'Late Autumn'
sl$season[sl$month == 12 & sl$mday >= 17 |  sl$month == 12 & sl$mday <= 31| sl$month == 1]<-'Winter'
 
sl$season = factor(sl$season, c("Winter", "Spring", "Summer", "Monsoon","Autumn","Late Autumn"))
 
## Plot Sea Level
 
hp_sl <-   ggplot(sl, aes(date, sl_mm,colour=season))+
           #geom_line(size=0.5)+
           geom_point(shape=5,size=1)+
           geom_smooth(method="lm",size=0.5,col="red")+
           scale_x_date(name="\n\n\n Source: University of Hawaii Sea Level Centre / Bangladesh Inland Water Transport Authority (BIWTA) - 2014",labels=date_format("%Y"),breaks = date_breaks("2 years"))+
           ylab("Milimetres (mm)\n")+
           xlab("\nYear")+
           theme_bw()+
           ggtitle("Sea Level at Charchanga - Bangladesh (1980-2000)\n")+
           theme(plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
           panel.border = element_rect(colour = "black",fill=F,size=1),
           panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
           panel.grid.minor = element_blank(),
           axis.title.y=element_text(size=11,colour="grey20"),
           axis.title.x=element_text(size=9,colour="grey20"),
           panel.background = element_rect(fill = NA,colour = "black"))
 
hp_sl
 
# Get gradient and add to plot
 
m <- lm(sl_mm~year, data=sl )
ms <- summary(m)
 
slope <- coef(m)[2]
lg <- list(slope = format(slope, digits=3))
eq <- substitute(italic(Gradient)==slope,lg)
eqstr <-as.character(paste(as.expression(eq),"/year"))
hp_sl <- hp_sl + annotate(geom="text",as.Date(-Inf, origin = '1970-01-01'), y = Inf, 
         hjust = -0.1, vjust = 2, label = eqstr,parse = TRUE,size=3)
 
hp_sl
 
# Save plot to png
 
ggsave(hp_sl, file="Charchanga_SeaLevel_Plot_Seasons.png", width=10, height=6,dpi=400,unit="in",type="cairo")
 
# Code to produce html code of embedded sea level stations map using googleVis
 
# Load libraries
 
library(RCurl)
library(XML)
library(leafletR)
library(googleVis)
 
# Convert html table into data frame
 
theurl <- "http://uhslc.soest.hawaii.edu/data/download/rq"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
 
tbl <- tables[[which.max(n.rows)]]
 
bgd.tbl <- subset(tbl, Country =="Bangladesh")
 
bgd.tbl$Latitude <- as.numeric(levels(bgd.tbl$Latitude)[bgd.tbl$Latitude])
bgd.tbl$Longitude <- as.numeric(levels(bgd.tbl$Longitude)[bgd.tbl$Longitude])
 
google.location <- paste(bgd.tbl$Latitude, bgd.tbl$Longitude, sep = ":")
stations.google <- data.frame(bgd.tbl, google.location)
 
# Plot map
 
map <- gvisMap(data = stations.google, locationvar = "google.location",tipvar = "Location",
       options=list(showTip=TRUE, enableScrollWheel=TRUE,mapType='terrain', useMapTypeControl=TRUE,width=100,height=400,
       icons=paste0("{","'default': {'normal': 'http://i.imgur.com/f3q6Oaj.gif',\n",
       "'selected': 'http://i.imgur.com/f3q6Oaj.gif'","}}")))
 
plot(map)
Created by Pretty R at inside-R.org