This blog is run by Jason Benedict and Doug Beare who are based in Penang, Malaysia to share insights and developments on open source software that can be used to analyze patterns and trends in in all types of data from the natural world; with particular emphasis on developing countries. The work featured here was undertaken as part of the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS) -

However, the views presented here are of the authors only, and do not reflect the official position of CCAFS, CGIAR, its members, partners or donors.

Monday, 13 April 2015

Boat Lights from the VIIRS: Shedding some light on night-time fishing activity in the Indo-Malayan waters

Light emitted at night due to human activities can easily be seen from space and detected by satellite. Many compelling pictures of the Earth at night have been taken revealing cities, roads, and conurbations. Light emitted at sea has, however, received less attention. Applications relating to fisheries and their management might prove particularly useful. NOAA/NASA and partners have recently developed a boat detection product derived from satellite imagery at high temporal and spatial resolution and the data can be downloaded for free from here as either kml/csv files and processed in R. 

For many years (since 1972 to be precise), global low light night imaging was collected by the US Air Force's Defence Meteorological Satellite Programmes Operational Linescan System (DMSP-OLS). While night-time lights collected from DMSP-OLS has major shortcomings, there were no reasonable alternative products for mapping night-lights of lit fishing boats. This all changed in 2011 with the launch of NASA/NOAA Suomi National Polar Partnership satellite equipped with the Visible Infrared Radiometer Suite (VIIRS) as the primary imaging sensor. You can find out many of the advantages the new VIIRS sensor has over the DMPS-OLS here. One of the most important improvements is that the 'Day/Night band' on the VIIRS that is used for night-time light detection has a detection footprint 45 times smaller than that on the DMSP, which ultimately means it can detect light sources a lot more smaller than what the DMPS-OLS could! The VIIRS also has a better dynamic range and lower detection limits which means it can identify and discern more sources of light at night with less saturation when it gets too bright with surrounding lights. These two improvements are key in enabling better detection of boat lights at night.

Commercial fishermen the world over use very bright light to attract squid and small pelagics to their nets, for example, and these are visible from space. Here we use the NOAA boat light data to animate nightly boat lights during February 2015 (cumulative for entire night) for a part of south-east Asia including Indonesia, Malaysia and the northern tip of Australia. We think that many of these lights reflect fishing activity, but more work needs to be done to detect and isolate shipping etc. A concerted fishers' knowledge/consultation process would be essential for enabling us to understand these data properly. We think it would it would be worth the effort, though, and could yield fascinating insights into fleet behavior and the species targeted.

As usual, we provide the R code below on how we produced the above animation of the boat lights detection product.

##### Code to read and produce animation of boat night light detection data from NGDC site #####
# Load required libraries
# Set working directory
# Load fonts
# Set date range for csv files to be downloaded from NGDC site
date.range <- seq.Date(from=as.Date('2015-02-01'),as.Date('2015-02-28'),by='1 day')
# Get dates into relevant format
date <- as.character(date.range)
date1 <- gsub("-","",date)
# Get urls to scrape boat light csv files off  NGDC site
urls <- sprintf((paste("",c(date1),"/VNB_npp_d",c(date1),
# Read boat light data files in csv format and bind them
boats <-,lapply(urls,read.csv))
# Extract useful columns
boats.df <- boats.csv[-c(18,19)]
# Reformat date and add other date and time columns
boats.df$Date <- as.POSIXct(boats.df$Date_Mscan, format="%Y-%m-%d %H:%M:%S",tz="GMT")
boats.df$Dates <- as.Date(boats.df$Date)
boats.df$Year <- as.numeric(as.POSIXlt(boats.df$Date)$year+1900)
boats.df$Month <- as.numeric(as.POSIXlt(boats.df$Date)$mon+1)
boats.df$Monthf <- factor(boats.df$Month,levels=as.character(1:12),
boats.df$Weekday = as.numeric(format(as.POSIXlt(boats.df$Date),"%u"))
boats.df$Yearmonth <- as.yearmon(boats.df$Date)
boats.df$Yearmonthf <- factor(boats.df$Yearmonth)
boats.df$Week <- as.numeric(format(as.POSIXlt(boats.df$Date),"%W"))
boats.df$Day <- strftime(boats.df$Date,'%A')
boats.df$Jday <- strptime(boats.df$Date, "%Y-%m-%d")$yday+1
boats.df$Hour <- as.numeric(format(strptime(boats.df$Date, format = "%Y-%m-%d %H:%M"),format = "%H"))
# Extract only detections that are classified as 'boat' from quality flag for nightboat detection algorithm
boats.df1 <- subset(boats.df,QF_Detect==1)
# Import shapefiles
world <- readOGR(dsn="D:/GeoSpatial/Data/World_Boundaries", layer="TM_WORLD_BORDERS-0.3") # World boundary shp's download from
eez <- readOGR(dsn="D:/Projects/BoatLights", layer="EEZs") # EEZ boundary shp's downloaded from
mpa <- readOGR(dsn="D:/Projects/BoatLights", layer="MPAs") # MPA boundary shp's downloaded from CT Atlas website -
# Fortify layers for ggplot mapping
world.df <- fortify(world)
eez.df <- fortify(eez)
mpa.df <- fortify(mpa)
# Set theme options for map
theme_opts <- list(theme(panel.grid.minor = element_blank(),
                         panel.grid.major = element_blank(),
                         panel.background = element_rect(fill="black"),
                         panel.border = element_rect(colour="black"),
                         legend.text = element_text(size = 8,face="bold",family="Myriad Pro"),
                         legend.position = "right",
                         axis.line = element_blank(),
                         axis.text.x = element_blank(),
                         axis.text.y = element_blank(),
                         axis.ticks = element_blank(),
                         axis.title.x = element_blank(),
                         plot.title = element_text(size=20,face="bold",family="Graph Black",hjust=0,vjust=-0.5),
                         axis.title.y = element_blank()))
# Run loop to produce boat light maps for each night in February 2015
for (i in date)
b <- ggplot(subset(boats.df1,Dates ==i),aes(Lon_DNB,Lat_DNB))+
     geom_polygon(data=mpa.df, aes(long,lat, group=group,fill="MPA"),colour=NA,alpha=0.5)+
     geom_point(color = "gold", size = 0.5, alpha = 0.5) +
     geom_polygon(data=eez.df, aes(long,lat, group=group,colour="EEZ"),fill=NA,size=0.25,linetype="dashed")+
     geom_polygon(data=world.df, aes(long,lat, group=group),fill="black",colour="darkgray",size=0.35)+
     theme_bw()+ ylab("")+ xlab("")+
     ggtitle((paste("Boat Lights detected from VIIRS on ",i,"\n"))) +
bg <- arrangeGrob(b, sub = textGrob("Source: VIIRS Boat Detection (VBD) Data (PROTOTYPE) - NGDC (2015)\n", hjust = -0.33,
      vjust=-0.3,gp = gpar(fontface = "bold", fontsize = 8, fontfamily="Tw Cen MT Condensed")))
# Set path for Image Magick convert program <- paste0(shortPathName("C:/Program Files/ImageMagick-6.9.0-Q16)/"),"convert.exe")
# Convert png files to GIF animation
files = sprintf('boatlights_%s.png', c(date))
im.convert(files, output = 'BoatLights_Feb2015.gif')
Created by Pretty R at

Friday, 6 March 2015

Will I choke? – when is the best time for outdoor activities on Penang Island

Key points of post
  • To assess whether air pollution index (API) levels in Penang during 2014 are unhealthy
  • To assess how API varies seasonally and diurnally

Last year our blog post on air quality on Penang Island was particularly popular. In that post we used a comparatively crude ‘visibility index’ collected at Penang International Airport since 2001.  The data suggested that pollution (visibility) had gotten worse since 2001 and that it tended to be worse in June and July.

We discovered recently, however, that the Malaysian Government’s Department of the Environment (DOE) collects much more detailed data on air pollution at 52 stations across the country, see  Conveniently, one of the stations is located at Universiti Sains Malaysia (USM) and we've managed to acquire this data scraped off the DOE site by Keith Rozario (nice work Keith!) using python and MySQL. He provides a nice summary of the nationwide API data from August 2013 to February 2015 and the link to all the scripts he used (on github) to scrape the data on his site - please see He was also generous enough to make a zipped copy of the dataset available for the public to use under the Creative Commons Attribution 4.0 license.

Inspired by an analysis ( by ‘Typhoon Tony’ from Taiwan, we sought to investigate the magnitude of air pollution on Penang Island, and how it varies seasonally and diurnally

Air pollution is typically quantified using the air pollution index (API) which is calculated from the following five major air pollutants: sulphur dioxide, nitrogen dioxide, carbon monoxide, particulate matter, and ozone, for details see APIs between 101 and 200 are considered to be ‘unhealthy’ especially for those with existing heart and/or lung problems.

So how does Penang stack up? We’re always hearing how terrible pollution is on the island, but is it actually true ? In the ‘calendar’ plot below, we merged the API data with hourly observations on wind-speed and direction from Weather Underground.  The seasonality in API is clear. We had high API values in February and March 2014, but the highest were in June and July which corresponds to the south-west monsoon transporting smoke from forest fires in Sumatera.  

We summarize diurnal variability in API (and to a certain extent seasonality too) in the ‘rose’ plot below.  It suggests that only rarely can air pollution on the island be considered ‘unhealthy’ since it is typically below 100.  If you are, however, concerned your best bet for strenuous outdoor activities is during the night or in the morning up until 1pm when levels of API start to increase.  The least healthy time of day to be out is rush hour between 4pm and 6pm. You may also want to avoid February, March, June and July!

Do note, however, that the API monitoring station at USM is relatively isolated from the island’s urban pell-mell and may not be a good overall reflection/summary of air pollution on the island. 

The R code used to produce the plots shown above are provided below. We are currently working on trying to scrape the API data off the Malaysian DOE website using R (which would make things a lot more convenient!) and once we've managed to do that, we will update the code accordingly.

# Load required libraries
# Set working directory
# Read data from csv file 
df <- read.table('USM.csv',header=F,sep=":")
# Rename columns
colnames(df) <- c("station_id","station_name","dates","hour","api","unk","unk1")
# Remove last 2 columns - not relevant information
df <- df[-c(6,7)]
# Add minute column
df$minute <- as.character("00")
df$time <- as.character(paste(df$hour,df$minute,sep = ":" ))
df$date <- as.POSIXct(paste(df$dates,df$time, format = "%Y-%m-%d %H:%M" ))
# Convert 0 to NA
df$api[df$api == 0] <- NA
# Date and time columns
df$weekday = as.numeric(format(as.POSIXlt(df$date),"%u"))
df$week <- as.numeric(format(as.POSIXlt(df$date),"%W"))
df$day = strftime(df$date,'%A')
df$jday <- strptime(df$date, "%Y-%m-%d")$yday+1
df$hour <- as.numeric(format(strptime(df$date, format = "%Y-%m-%d %H:%M"),format = "%H"))
##### Creating roseplot of pollutant magnitude over 24 hours #####
# Subset data for 2014
df2014 <- subset(df,year==2014)
# Set Color Table
colortable = c("#99FFFF", "#00FFFF", "#00FF00", "#CCFF33", "#FFFF00",
               "#FFCC00", "#FF6600", "#FF3333", "#FF33CC", "#660033")
# Cut data into ten parts
Time_of_Day = df2014$hour[df2014$station_name==STN]
mag = cut_number(round(df2014$api,100)[df2014$station_name==STN],n = 10)
rosedata = data.frame(dir=Time_of_Day,mag=mag)
# Plot rose chart
rc <- ggplot(rosedata,aes(x=Time_of_Day,fill=mag))+  geom_bar(binwidth = 1, drop = TRUE) +
      coord_polar() + xlab("") + ylab("") + 
      scale_x_continuous(breaks = seq(0,23),
      "10:00pm","11:00pm")) +
      ggtitle("\nDaily API readings in 2014") +  scale_fill_manual(values=colortable,name="API Reading")+
      legend.title=element_text(size=13,colour="grey20",face="bold",family="Myriad Pro"),
      legend.text=element_text(size=11,colour="grey20",face="bold",family="Myriad Pro"),
      axis.text.y=element_text(size=10,colour="grey20",family="Clear Sans"),
      axis.text.x=element_text(size=10,colour="grey20",face="bold",family="Clear Sans"),
      plot.title = element_text(lineheight=1, face="bold",size = 20, colour = "grey20",family="Graphik-Black"),
      plot.margin = unit(c(-0.5,0.25,-0.25,0.25), "in"),
# Save plot to png
### Reading weather data from Weather Underground using weatherData library ####
# Get weather data for 2014
we <- getWeatherForDate("WMKP", "2014-01-01","2015-12-31", opt_detailed=T, opt_custom_columns=T, custom_columns=c(1,6,7,8,9,13,14))
# Rename columns
colnames(we) <- c("date","date_myt","vis_km","wda","ws","gspd_kmh","wd","date_utc")
# Create date and time columns
we$hour <- as.numeric(format(strptime(we$date, format = "%Y-%m-%d %H:%M"),format = "%H"))
we$min <- as.numeric(format(strptime(we$date, format = "%Y-%m-%d %H:%M"),format = "%M"))
# Only use data on the hour
we_df<- subset(we, min == 0)
# Remove duplicate data
# Merge API and weather data
we_api <- merge(we_df, df2014, by=c("date","date")) 
# Reformat yearmonth column for use in openair library
we_api$yearmonth <- as.factor(we_api$yearmonth)
# Reformat wind speed column
we_api$ws[we_api$ws == "Calm"] <- 0
we_api$ws <- as.numeric(we_api$ws)
# Plot 'calendar plot' and save as png
png(filename = "D:/Projects/AP/USM_API_2014_CalendarPlot.png",height=8,width=10,
    bg = "white",units='in', res = 400, family = "",  type = "cairo-png")
calendarPlot(we_api, pollutant = "api",year="2014",
             main = "Daily average API readings at USM\n
             with wind-speed scaled wind direction overlay (2014)\n",
             key.header = "API\n(Air Pollutant Index)",
             cols="heat", annotate = "ws")
Created by Pretty R at

Monday, 23 February 2015

Why are the British tweeting about climate change at night?

Points of post:

  •  To demonstrate how the library twitteR can be combined with mapping capabilities of ggplot
  • To reveal the locations of people tweeting about climate change at two points in time.
  • To show that climate change tweeters are mostly from the developed world.
  • To show that some (sad) tweeters are twittering about climate change in the middle of the night.

In a previous blog post, we used twitteR and wordcloud to summarize which other words were occurring in tweets (combined from around the world) together with the keywords, ‘climate-change’ and ‘Bangladesh’ at two arbitrarily selected time-points. 

The geo-locations of tweets are, however, also often available and are potentially very interesting and revealing. 

Fifteen-hundred (1500) is the maximum number of tweets that can be captured using the twitteR library with one call.  There probably are ways to get more data but we guess you probably have then to spend money. 

Here we used #climatechange because it coincided with the last days of COP20 in Lima, Peru which ran from 1st to 12th December 2014.  It is an extremely important global forum at which nations can meet and discuss their options for reducing carbon emissions, see

The first ‘tweet map’ (below) we produced is based on approximately 1500 geo-located tweets that contained the hash-tag, #climatechange, and which were ‘tweeted’ at about 10am GMT on the 11th December 2014. It shows that #climatechange tweets were coming from 4 main areas: North America, Europe, India and Australia. There didn’t appear either to be too many tweets coming out of Lima which surprised us. Maybe the delegates were too busy enjoying the South-American hospitality, and catching up with old mates to take much interest in Climate Change!

Geo-located tweets with #climatechange tweeted at around 10am GMT 11th December 2014

The second ‘tweet-map’ (below), also based on approximately 1500 geo-located #climatechange tweets, is for a snapshot that took place at 5 hours later at around 3am GMT on the final day of the conference (12th December 2014).  The overall pattern between the maps remains the same but the relative frequency of #climatechange tweeters from Europe, as compared to North America, has increased. People in the United Kingdom were particularly keen, twittering like mad about climate change at 3am. Why? We don’t know.

Geo-located tweets with #climatechange tweeted at around 3am GMT 12th December 2014

Note that tweets are geo-located, either by exploiting the users’ location as defined in their profile, or by ascertaining the exact location directly  if allowed by the user. This can be effected, either from GPS-enabled software which many people have installed on their smart-phones, or by using an IP-address.  This means that not all tweets can be geo-located with any great precision. Some are only geo-located at the National and/or regional levels, as evident from the large circle in the middle of Australia. That’s to say these cautious tweeters only gave ‘Australia’ as their location.

As we have explained in an earlier blog post on worldclouds and twitteR, to pull the data from Twitter using its API, you will need to have a Twitter account and carry out a 'twitter authentication'.  The R code to perform a search on twitter for the selected 'term(s)' and mapping them out is detailed below.

# Load required libraries
# Set working directory
#### Fonts on Windows ####
windowsFonts(ClearSans="TT Clear Sans")
# Load Credentials
load("D:/ClimData/Twitter/twitter authentification.Rdata")
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
# Search term on twitter
searchTerm <- "#climatechange"
searchResults <- searchTwitter(searchTerm,n=1500,since='2014-12-11', until='2014-12-12')  
tweetFrame <- twListToDF(searchResults) 
userInfo <- lookupUsers(tweetFrame$screenName)  
userFrame <- twListToDF(userInfo)
locatedUsers <- !$location)
# Geocode locations using 'ggpmap' library
locations <- geocode(userFrame$location[locatedUsers])
locations_robin <- project(as.matrix(locations), "+proj=robin")
locations_robin_df <-
# Import world boundaries
world <- readOGR(dsn="D:/Data/ne_10m_admin_0_countries", layer="ne_10m_admin_0_countries")
world_robin <- spTransform(world, CRS("+proj=robin"))
world_robin_df <- fortify(world_robin)
counts <- aggregate(locations_robin_df$V1,by=list(x=locations_robin_df$V1,y=locations_robin_df$V2),length)
names(counts)[3] <- "count"
# Theme options for Map
theme_opts <- list(theme(panel.grid.minor = element_blank(),
                         panel.grid.major = element_blank(),
                         panel.background = element_blank(),
                         panel.border = element_blank(),
                         plot.background = element_blank(),
                         axis.line = element_blank(),
                         axis.text.x = element_blank(),
                         axis.text.y = element_blank(),
                         axis.ticks = element_blank(),
                         axis.title.x = element_blank(),
                         axis.title.y = element_blank(),
                         legend.position = "bottom",
                         legend.key = element_blank(),
                         legend.title = element_text(colour="black", size=12, face="bold",family="Clear Sans"),
                         legend.text = element_text(colour="black", size=10, face="bold",family="Clear Sans"),
                         plot.title = element_text(size=15,face="bold",lineheight=0.5,family="Clear Sans")))
# Plot map and tweet counts 
tp <- ggplot(world_robin_df)+
      geom_polygon(aes(x = long, y = lat, group = group), fill = "grey20")+
      geom_path(aes(x = long, y = lat, group = group),colour = "grey40", lwd = 0.2)+
      geom_point(data= counts, aes(x=x,y=y,size=count),color="#32caf6", alpha=I(8/10))+
      scale_size_continuous(name="Number of tweets")+
      ggtitle("Twitter Map of #climatechange\n")+
      xlab("")+ ylab("")+
      theme_bw() + 
      guides(size = guide_legend(title.position = "top",title.hjust =0.5))+
# Save to png
Created by Pretty R at