This blog is run by Jason Benedict and Doug Beare who are based in Penang, Malaysia to share insights and developments on open source software that can be used to analyze patterns and trends in in all types of data from the natural world; with particular emphasis on developing countries. The work featured here was undertaken as part of the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS) -

However, the views presented here are of the authors only, and do not reflect the official position of CCAFS, CGIAR, its members, partners or donors.

Wednesday, 26 November 2014

Trending climate-related keywords with GTrendsR

Key points of post

  • Google stores data on millions of internet searches throughout the world and these data are publicly available. 
  • Google trends (’ is a fascinating system that summarizes information from all these searches revealing what subjects are ‘trending’ worldwide.
  • Conveniently R has a library (GTrendsR) capable of extracting, analyzing and plotting these data. 
  • Here we explore time series data (2004-2014) summarizing Google searches for the keywords, ‘climate-change’, ‘global-warming’, ‘ocean-acidification’ and ‘sea-level rise’
  • All four series have trends and have either unimodal or bimodal seasonal cycles.
The other day Jason decided to play around (nerdy I know) with R’s relatively new GTrendR library and analyze the time-series statistics for the following four key words: ‘climate change’, ‘ocean acidification’, ‘global warming’, and ‘sea-level rise’. The results are displayed below for the last decade (details of how Google calculates these statistics are available here: 

First let’s talk about what has happened to the ‘popularity’ of these searches over the long-term, ie. between 2004 and 2014. Searches for ‘climate change’ were fairly static between 2004 and 2008 and have since increased in popularity. ‘Global warming’ was similarly stable but then fell. In the past we think that ‘global warming’ was the more commonly used term to describe ‘anthropogenic heating’. As the wider consequences of increasing temperatures for the entire global weather and climate system became increasingly more recognized, however, it became clear that a more general term was needed and hence, ‘climate-change’.

Searches for ‘ocean-acidification (OA)’ – a product of CO2 pollution – and also known as the ‘ugly twin’ of climate-change have not changed substantially over the last decade or so, apart from a massive spike in mid-2013. My pal Prof. Hall-Spencer in Plymouth suggested the spike might be due to interest around this article (, and/or the fact that ocean acidification started getting discussed a lot in Washington DC in mid 2012. Furthermore some charitable organizations advertised research funding calls for OA research at that time.

Interest in sea-level rise (SLR) started to increase in 2011 and we have no idea why. Suggestions welcome!

The time-series displayed here are all available at weekly resolution. What particularly interests us in our time-series work is seasonality. Both the ocean-acidification and sea-level rise series are strongly seasonal with a single peak each year (unimodal) occurring between June and July. 

What causes this seasonality? Is it related to educational cycles in the northern hemisphere dictating when students undertake science related projects? Are there more climate change conferences in the northern hemisphere summer? Or perhaps it’s because there’s less football on TV distracting people from more pressing concerns? 

The other two series (‘climate change and ‘global warming’) are also seasonal but are different in having two peaks (bimodal). This is difficult to see in the plot above so we plotted the same data on another graph for a shorter time-period (2010-2014 below) which shows clearly that searches for climate-change most popular in March and October each year with troughs in January and July ? Does anyone have any idea why ? 

Could it be a confounding between seasonal educational cycles in the northern hemispheres (NH) and southern hemispheres (SH)? The argument might go like this: (i) Numbers of internet searches are far greater in developed countries; (ii) ‘The majority of developed countries are in the northern hemisphere; (iii) ‘Climate-change’ and ‘global-warming’ are much more popularly searched than the other two terms; (iv) The overwhelming predominance of northern-hemisphere searches somehow ‘drowns-out’ any seasonal signal from the south; and (v) The bimodal seasonal signals we see for OA and SLR are a combination of separate unimodal cycles from the NH and SHs. 

Clearly we need to subset and examine these data by location. This will form the subject of future blog articles.

In the meantime please feel free to suggest any alternative explanations.

As usual the R-code describing the extraction and plotting of the data is outlined below.

# Load required libraries
# Define terms to use for trend searches
terms <- c("Climate Change","Ocean Acidification","Sea Level Rise","Global Warming")
out <- gtrend_scraper("", "yourpassword", terms) ## replace with your own google username and password
out %>%  trend2long() %>% plot() 
# Get plot of trends
a <- out %>%
     trend2long() %>%
     ggplot(aes(x=start, y=trend, color=term)) +
     xlab("\nYear") + ylab("Trend\n")+
     geom_line() + theme_stata()+
     facet_wrap(~term) + ggtitle("Phrase Search Trends on Google\n")+
# Save file to png
# Extract just trends of 'climate change' search for years 2010 to 2014 and plot
# to observe seasonality
dat <- out[[1]][["trend"]]
colnames(dat)[3] <- "trend"
dat2 <- dat[dat[["start"]] > as.Date("2010-01-01"),]
rects <- dat2  %>% 
mutate(year=format(as.Date(start), "%y")) %>%
group_by(year) %>%
summarize(xstart = as.Date(min(start)), xend = as.Date(max(end)))
c <- ggplot() +
     geom_rect(data = rects, aes(xmin = xstart, xmax = xend, ymin = -Inf, 
     ymax = Inf, fill = factor(year)), alpha = 0.4) + theme_bw()+
     ylab ("Trend\n") + xlab("\nDate")+
     ggtitle("Search Trends of 'Climate Change' on Google (2010-2014)\n")+
     geom_line(data=dat2, aes(x=start, y=trend), size=0.75,colour="grey20",type="dashed") + 
     geom_point(data=dat2, aes(x=start, y=trend), size=1.25,colour="grey20") + 
     scale_x_date(labels = date_format("%b-%Y"), 
     breaks = date_breaks("3 months"),
     expand = c(0,0), 
     limits = c(as.Date("2010-01-01"), as.Date("2014-12-31"))) +
     stat_smooth(data=dat2,aes(x=start, y=trend), col="blue",method = "loess",span=0.1,se=T,size=1,linetype='twodash',alpha=0.5,fill="grey60")+
     scale_fill_discrete(guide = FALSE)+
     theme(axis.text.x = element_text(angle = -30, hjust =0,size=10),
     plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
     panel.border = element_rect(colour = "black",fill=F,size=0.5),
     panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
     panel.grid.minor = element_blank(),
     panel.background = element_rect(fill = NA,colour = "black"))
# Save plot to file
Created by Pretty R at

Saturday, 8 November 2014

Come hell with high water - Sea level rises at two stations in Bangladesh

Key points of post
  • Bangladesh is low-lying
  • Many people live along its river and coasts
  • These communities are vulnerable to rises in sea-level which is predicted to occur as a consequence of climate change

Coastal Bangladesh is extremely low-lying and its coastal communities in particular are vulnerable from any future rises in sea level which can cause flooding. Jason and I wondered, therefore, if there were any time-series on sea-level for Bangladesh available, and what information they might contain.

Via Google we discovered the University of Hawaii Sea Level Centre, which monitors seawater levels at stations throughout the world, exploiting its network and relationships with a range of local partners.  In this case it’s the Bangladesh Inland Water Transport Authority who provided the original data.

Here we plot average daily sea level available on the UHSLC site for two stations (please map below) at Charchanga (at the mouth of the Meghna River) and Hiron Point (in the Sundarbans). [Note there are data for 5 other stations in Bangladesh but the time-series data available were, unfortunately, too short a duration]. The data are also color-coded to correspond to the six seasons of Bangladesh (winter, spring, summer, monsoon, autumn, & late autumn). 

Data: • Chart ID: MapID1f246f024be6googleVis-0.5.6
R version 3.1.2 (2014-10-31) • Google Terms of UseDocumentation and Data Policy

The first interesting feature, noted at both locations, is the pronounced seasonality with peaks in sea-level being recorded in the  monsoon, autumn and late autumn periods.  Winter and spring correspond to lowest average daily water levels at both locations.  The seasonal variation in sea-level is due primarily to the effect of the Monsoon; ie. the south-westerly winds tend to cause the seas to ‘pile-up’ along the coast of Bangladesh, while the flow from the major rivers pouring through the delta into the Bay of Bengal increases substantially also as a result of the Monsoon. 

The most worrying aspect of these data is the substantial change in long-term trend summarized here with a simple linear model (red line).  At Charchanga average daily sea-levels recorded rose by approximately 7 mm's per year between 1980 and 2000 while at Hiron Point the figure was approximately 5 mm's per year (1977 to 2003). Unfortunately more recent data were unavailable from the Sea Level Center and we are currently unsure if these trends are continuing. But please if you know of more data for Bangladesh let us know.

Certain aspects of the series are, however, rather curious and it’s not a straightforward story of increasing trends.  At Hiron Point, for example, extreme sea-level events (say above 2600 mm's) were relatively frequently between 1980 and 1995 whereas, since 1996 they have been observed less often. Contrast this with Charchanga where there were some exceptional peaks recorded in 1996, ’97, and ’98. 

All such changes could have profound implications for the livelihoods of coastal Bangladeshis.

Details of how to obtain the data and plot the graphs here in R are demonstrated below.  

As mentioned earlier in the post, the data was obtained from the University of Hawaii Sea Level Centre and we have used the research quality dataset which can be found at this link -

You can download the dataset in various formats which include csv and NetCDF and the associated metadata for each station is also there for you to check of any inconsistencies in the data collection or station relocation and instrumentation issues. The data can also be downloaded at two different time scales, which is hourly and daily. 

# Load required libraries
# Set working directory
# Read csv file
# Rename columns
# Format date columns
sl$date <- as.Date(paste(sl$year,sl$month,sl$day),format="%Y%m%d")
sl$month <- as.numeric(format(sl$date,"%m"))
sl$year <- as.numeric(format(sl$date,"%Y"))
sl$monthf <- factor(sl$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
sl$mday <- strptime(sl$date, "%Y-%m-%d")$mday
sl$jday <- strptime(sl$date, "%Y-%m-%d")$yday+1
sl$daymth <- as.character(paste(sl$month,sl$day,sep="-"))
sl$daymth <-as.Date(sl$daymth,format="%m-%d")
# Classify data into seasons
sl$season <- "Season"
sl$season[sl$month == 1 & sl$mday >= 1 |  sl$month == 2 & sl$mday <= 13| sl$month == 1]<-'Winter'
sl$season[sl$month == 2 & sl$mday >= 14 |  sl$month == 4 & sl$mday <= 14 | sl$month == 3]<-'Spring'
sl$season[sl$month == 4 & sl$mday >= 15 |  sl$month == 6 & sl$mday <= 14 | sl$month == 5]<-'Summer'
sl$season[sl$month == 6 & sl$mday >= 15 |  sl$month == 8 & sl$mday <= 17 | sl$month == 7]<-'Monsoon'
sl$season[sl$month == 8 & sl$mday >= 18 |  sl$month == 10 & sl$mday <= 18| sl$month == 9]<-'Autumn'
sl$season[sl$month == 10 & sl$mday >= 19 |  sl$month == 12 & sl$mday <= 16| sl$month == 11]<-'Late Autumn'
sl$season[sl$month == 12 & sl$mday >= 17 |  sl$month == 12 & sl$mday <= 31| sl$month == 1]<-'Winter'
sl$season = factor(sl$season, c("Winter", "Spring", "Summer", "Monsoon","Autumn","Late Autumn"))
## Plot Sea Level
hp_sl <-   ggplot(sl, aes(date, sl_mm,colour=season))+
           scale_x_date(name="\n\n\n Source: University of Hawaii Sea Level Centre / Bangladesh Inland Water Transport Authority (BIWTA) - 2014",labels=date_format("%Y"),breaks = date_breaks("2 years"))+
           ylab("Milimetres (mm)\n")+
           ggtitle("Sea Level at Charchanga - Bangladesh (1980-2000)\n")+
           theme(plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
           panel.border = element_rect(colour = "black",fill=F,size=1),
           panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
           panel.grid.minor = element_blank(),
           panel.background = element_rect(fill = NA,colour = "black"))
# Get gradient and add to plot
m <- lm(sl_mm~year, data=sl )
ms <- summary(m)
slope <- coef(m)[2]
lg <- list(slope = format(slope, digits=3))
eq <- substitute(italic(Gradient)==slope,lg)
eqstr <-as.character(paste(as.expression(eq),"/year"))
hp_sl <- hp_sl + annotate(geom="text",as.Date(-Inf, origin = '1970-01-01'), y = Inf, 
         hjust = -0.1, vjust = 2, label = eqstr,parse = TRUE,size=3)
# Save plot to png
ggsave(hp_sl, file="Charchanga_SeaLevel_Plot_Seasons.png", width=10, height=6,dpi=400,unit="in",type="cairo")
# Code to produce html code of embedded sea level stations map using googleVis
# Load libraries
# Convert html table into data frame
theurl <- ""
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
tbl <- tables[[which.max(n.rows)]]
bgd.tbl <- subset(tbl, Country =="Bangladesh")
bgd.tbl$Latitude <- as.numeric(levels(bgd.tbl$Latitude)[bgd.tbl$Latitude])
bgd.tbl$Longitude <- as.numeric(levels(bgd.tbl$Longitude)[bgd.tbl$Longitude])
google.location <- paste(bgd.tbl$Latitude, bgd.tbl$Longitude, sep = ":") <- data.frame(bgd.tbl, google.location)
# Plot map
map <- gvisMap(data =, locationvar = "google.location",tipvar = "Location",
       options=list(showTip=TRUE, enableScrollWheel=TRUE,mapType='terrain', useMapTypeControl=TRUE,width=100,height=400,
       icons=paste0("{","'default': {'normal': '',\n",
       "'selected': ''","}}")))
Created by Pretty R at

Monday, 20 October 2014

Rainfall event frequencies recorded by the Worldfish PWS in September 2014

Key points of post

·         Wednesday and Saturdays were far wetter.
·         Far more rain fell between 6 and 6.59 am and between 11 and 11.59pm

It has been just over a month and a half since we got the Weather Station here at WorldFish up and running and we thought it would fun to have a quick look at our first complete month of data, ie. data collected at 5 minute intervals for all of September (2014) typically one of the wettest months in Penang. Remember that a couple of months ago we looked at the frequency distribution of rainfall events by hour of day based on data from Bayan Lepas International airport.

After aggregating of the rainfall data, we looked at the frequency distribution of rainfall events by hour of day, and also by the day of week and we noticed some features we thought were worth sharing. One might be forgiven for assuming that there would not be too much variation in the amount of rain that falls either during a specific hour of the day, or a specific day of the week. However, the rainfall data captured here at the WorldFish office at Batu Maung show that certain times of  day and days of the week saw significantly more rainfall. In September 2014, we saw much more rain between 6 and 6.59am (65.9 mm) and between 11 and 11.59pm (48.5 mm). The rain falling at these times contributed over  30% of the rainfall for that entire month (354.5 mm) with 19% of the total monthly rainfall falling between 6 and 6.59am. Bizarrely it was also much more likely to rain on Wednesdays (102.6mm) and Saturdays (106.2mm's). To put this into perspective over 60% of September 2014’s  rain fell on either Saturday or Wednesday!

The plots also show that there was very little rainfall on Fridays during September 2014 (5.6 mm). Interestingly north-east Penang Island experienced a once-in-40-year rainfall event recently (137 mm's in a few hours) on FRIDAY night (3 October 2014). At the same time, however, very little rain was recorded at our PWS at Worldfish. Rainfall events can thus be extremely localized indicating the importance of establishing well-distributed and densely placed weather stations for studying weather and climate.

Granted, the rainfall data we looked at only covers a short period of time and the large variations could be pure coincidence. Conversely, these could also indicate consistent patterns or trends with scientific explanations possible (lunar periodicity, seasonal effects?). We intend to keep an eye on these features of the data and it will be interesting to see what emerges when time periods are examined.

As usual, the R code used the produce the plots in this post is provided below. The data is pulled from Weather Underground using the 'weatherData' package. 
## Load required libraries
# Get data for PWS using weatherData package
pws <- getWeatherForDate("IPENANGB2", "2014-09-01","2014-09-30", station_type = "id",opt_detailed=T, opt_custom_columns=T, custom_columns=c(1,2,6,7,10,13))
# Rename columns
## Adding date columns
pws$time<-as.POSIXct(pws$time1,format="%Y-%m-%d %H:%M:%S",tz="Australia/Perth")
pws$year <- as.numeric(format(pws$time,"%Y"))
pws$date <-as.Date(pws$time,format="%Y-%m-%d",tz="Australia/Perth")
pws$year <- as.numeric(as.POSIXlt(pws$date)$year+1900)
pws$month <- as.numeric(as.POSIXlt(pws$date)$mon+1)
pws$monthf <- factor(pws$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
pws$weekday <- as.POSIXlt(pws$date)$wday
pws$weekdayf <- factor(pws$weekday,levels=rev(0:6),labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")))
pws$yearmonth <- as.yearmon(pws$date)
pws$yearmonthf <- factor(pws$yearmonth)
pws$week <- as.numeric(format(as.Date(pws$date),"%W"))
pws$weekf<- factor(pws$week)
pws$hour <- as.numeric(format(strptime(pws$time, format = "%Y-%m-%d %H:%M"),format = "%H"))
pws$min <- as.numeric(format(strptime(pws$time, format = "%Y-%m-%d %H:%M"),format = "%M"))
# Remove duplicate values
# Function to compute consecutive differences 
dif <- function(x) c(diff(x), NA) 
# Apply the function to each individual 
pws<-ddply(pws, 'jday', transform, actrain = dif(rain)) 
pws$actrain[pws$actrain<0]<- 0
pws$actrain[$actrain)] <- 0
# Summarize rainfall data by hour and by day
rd <- ddply(pws,.(weekday,weekdayf),summarize, raintot = sum(actrain,na.rm=T))
rh <- ddply(pws,.(hour),summarize, raintot = sum(actrain,na.rm=T))
# Plot rainfall by hour
h <-  ggplot(rh,aes(hour,raintot,fill=raintot))+geom_bar(stat="identity")+
      geom_text(aes(label=round(raintot,3)), vjust = -0.4, size = 2.5)+
      ggtitle("Total rainfall by hour for September 2014 - WorldFish Weather Station\n")+
      scale_fill_continuous(low='grey90', high='steelblue') + theme_bw()+
      xlab("\nHour of Day")+
      ylab("Total Precipitation (mm)\n")+
      theme(axis.text.x  = element_text(size=10), legend.position="none",
      panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
      plot.title = element_text(lineheight=1, face="bold",size = 13, colour = "grey20"))
# Plot rainfall by Week Day
d <- ggplot(rd,aes(weekday,raintot,fill=raintot))+geom_bar(stat="identity")+
     geom_text(aes(label=round(raintot,3)), vjust = -0.4, size = 2.5)+
     ggtitle("Total rainfall by day for September 2014 - WorldFish Weather Station\n")+
     scale_fill_continuous(low='grey90', high='steelblue') + theme_bw()+
     xlab("\nDay of Week")+
     ylab("Total Precipitation (mm)\n")+
     theme(axis.text.x  = element_text(size=10), legend.position="none",
     panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
     plot.title = element_text(lineheight=1, face="bold",size = 13, colour = "grey20"))
# Save plots to png
Created by Pretty R at