This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services which provides scientific advice to organisations that currently include the STECF [Scientific, Technical and Economic Committee for Fisheries, https://stecf.jrc.europe.eu/] and ICCAT, https://www.iccat.int/en/

Tuesday, 10 June 2014

A Days’ Weather in Penang: the utility of Weather Underground

Key points of post


  • Weather Underground combines a website, database and a global network of airport and Personal Weather Stations maintained largely by amateurs (http://www.wunderground.com/). 
  • The network is growing fast, and the data can be accessed freely.
  • The utility of Weather Underground is illustrated below using an example from one day’s weather in Penang, Malaysia.

Recently we’ve discovered Weather Underground (http://www.wunderground.com/) where you can access data from a global network of weather stations. Weather Underground is a ‘citizen science’ website specializing in the collection and dissemination of meteorological data.  It promotes a Personal Weather Station network whereby subscribers buy their own weather station, install it in their garden or wherever, and then plug into the global network via the internet. The data they collect are then automatically stored in the ‘cloud’ where they can be accessed, analyzed and visualized by anyone with a computer. Most Personal Weather Stations are in the United States, but other countries are catching up fast. There are three on Penang Island so far; one at Dalat International School, another in Tanjung Tokong, and the International Airport at Bayan Lepas is also connected. 

The data collected via the Personal Weather Stations can be imported automatically into our software of choice, R, (http://cran.r-project.org/) where they can be analyzed using the extensive statistical and data-visualisation tools available. 

As an example of the type of data available on Weather Underground, we downloaded and plotted data for a single day, collected at Bayan Lepas International Airport on 2nd June 2014.  Data for six important meteorological parameters  were selected and have been plotted below at half-hourly intervals. 



Sometimes it can seem that weather in Penang is rather uniform throughout the day (ie. very hot), but this is actually not the case.  Starting at midnight, Air Temperatures were ~ 27°C until 8am when they rose rapidly to peak between 1.30 and 3.30pm (best time for a siesta).  After that temperature fell slowly, changing little between 6pm and 10pm remaining at 29°C.  In the afternoon of 2nd June 2014 the wind speed increased from the south-west, after blowing very lightly from the north all night.  The rise in wind-speed during the day was related to a fall in barometric pressure, and this is of course the ‘South West Monsoon’ wind.  By dusk the wind speed fell away again.  ‘Heat-index’ is a measure of the comfort a human is likely to experience, combining measures of temperature and humidity. On 2nd June this was clearly, directly related to the overall ambient temperature (top graph versus bottom) but this is not always the case.
 
Note: as usual the R-code for accessing Weather Underground and creating the graphs is given below. Readers can use this to plot data for their own location, dependent on the proximity of a Personal Weather Station.

We have used the 'weatherData' package developed by Ram Narasimhan to fetch the data from Weather Underground. The package can be installed directly from the CRAN site of from his site at http://ram-n.github.io/weatherData/

You can also get more information about the package from the github site that goes into further detail with examples and some of the other additional functions that come with it.  
 
## Load required libraries
 
library(weatherData)
library(ggplot2)
library(mgcv)
library(scales)
library(plyr)
library(reshape2)
library(gridExtra)
library(lubridate)
library(weathermetrics)
 
### Find required 4 letter station code using function below
 
# Example getStationCode("Penang") or getStationCode("Honiara")
 
getStationCode("Penang")
 
station.id="WMKP"
date="2014-06-02"
 
s<-getStationCode(station.id)
 
s1<-(strsplit(s,split= " "))[[1]]
 
station.name<-paste(s1[4],s1[5])
 
# Get detailed weather station data
 
wd<-getDetailedWeather(station.id,date, opt_all_columns=T)
 
str(wd)
 
# Rename columns
 
colnames(wd)<-c("time","mytime","tempc","dewpc","hum","slp","vsb","wd","wspd","guspd","prcp","events",
                "conditions","wdd","date_utc")
 
 
## Plot temperatures
 
dt_p <-   ggplot(wd, aes(time, tempc)) + 
  xlab("Time") + ylab(as.expression(expression( paste("Temperature (", degree,"C)","\n")))) + 
  geom_line(colour="red",size=0.5) +
  geom_point(colour="red",size=3)+
  theme_bw() +
  ggtitle(paste('Plot of weather variables for',station.name,"(",station.id,")",  'on',strftime(date,"%d-%b-%Y"),'\n\n',"Temperature\n")) +
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dt_p
 
## Plot windspeed
 
wd$wspd[wd$wspd=="Calm"] <- 0
 
wd$wspd<-as.numeric(wd$wspd)
 
winds <- subset(melt(wd[,c("time","wspd")], id = "time"), value > 0)
 
dws_p <- ggplot(winds, aes(time, value))+
  geom_point(col="seagreen",size=3)+
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  ylab("km/hour\n")+
  xlab("Time")+
  scale_y_continuous(limits=c(0,30))+
  stat_smooth(aes(group = 1), col="seagreen4",method = "loess",span=0.3,se=T,size=1.2)+
  theme_bw()+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))+        
  ggtitle("Wind speed\n")
 
dws_p
 
## Plot wind vectors
 
wd$u <- (-1 * wd$wspd) * sin((wd$wdd) * pi / 180.0)
wd$v <- (-1 * wd$wspd) * cos((wd$wdd) * pi / 180.0)
dw = subset(wd, u != 0 & v != 0)
 
v_breaks = pretty_breaks(n = 5)(min(dw$v):max(dw$v))
v_labels = abs(v_breaks)
 
dwd_p <-  ggplot(data = dw, aes(x = time, y = 0)) +
  theme_bw() +
  theme(plot.margin = unit(c(0, 1, 0.5, 0.5), 'lines')) +
  geom_segment(aes(xend = time + u*360, yend = v), arrow = arrow(length = unit(0.25, "cm")), size = 0.75) + 
  geom_point(data = subset(dw, !is.na(u)), alpha = 0.5,size=3) +
  scale_x_datetime(name="Time",labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours")) +
  ggtitle("Wind vectors\n")+
  scale_y_continuous(name = "Wind vectors (km/h)\n", labels = v_labels, breaks = v_breaks)+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dwd_p
 
## Plot Sea Level Pressure
 
wd$slp <- as.numeric(as.character(wd$slp))
 
dslp_p <- ggplot(wd, aes(time, slp)) + 
  geom_point(col="purple4",size=3) + 
  stat_smooth(span = 0.3,method="loess",col="purple",size=1.2)+
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  ggtitle("Sea Level Pressure\n")+
  ylab("hPa\n\n")+
  xlab("Time")+
  theme_bw()+ 
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dslp_p
 
 
## Plot Relative Humidity
 
dh_p <-  ggplot(wd, aes(time, hum)) + 
  #geom_step(colour="royalblue",size=0.5) +
  geom_point(colour="royalblue4",size=3) +
  xlab("Time") + ylab("Relative Humidity (%)\n") + 
  ggtitle("Relative humidity\n")+
  geom_smooth(aes(group = 1), col="royalblue",method = "loess",span=0.3,se=T,size=1.2)+
  theme_bw() +
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dh_p
 
## Calculate Heat Index using weathermetrics package
 
wd$hi <- heat.index(t = wd$tempc,rh = wd$hum,temperature.metric = "celsius",output.metric = "celsius",round = 2)
 
### Plot Heat Index
 
dhi_p <- ggplot(wd, aes(time, hi)) + 
  xlab("Time") + ylab(as.expression(expression(paste("Temperature (", degree,"C)",)))) + 
  geom_line(colour="red",size=0.5) +
  geom_point(colour="red",size=3)+
  theme_bw() +
  ggtitle("Heat Index\n") +
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  scale_y_continuous(breaks=c(25,30,35,40,45))+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dhi_p
 
plot.name1 <- paste(station.id,"_WU_",date,".png",sep="")
 
tiff(plot.name1,width = 13, height = 18, units = "in",
     compression = "lzw",bg = "white", res = 600, family = "", restoreConsole = TRUE,
     type = "cairo")
 
grid.draw(rbind(ggplotGrob(dt_p),ggplotGrob(dh_p),ggplotGrob(dslp_p),ggplotGrob(dws_p),ggplotGrob(dwd_p),ggplotGrob(dhi_p),size="first"))
 
dev.off()
Created by Pretty R at inside-R.org

Thursday, 1 May 2014

Penang rainfall: is 2014 now catching up?

Key points of post

  • January, February and March 2014 were unusually dry and hot (see post from 12 March 2014);
  • Rainfall in April 2014 has been substantial, and in terms of cumulative rainfall, 2014 is now catching up with other years.

In previous articles we discussed the very dry start to 2014, but how have things progressed since?

A repetitive theme of these articles is how important it is to examine data in different ways, from different ‘vantage’ points,using different styles of plot.


Cumulative rainfall  is plotted in the graph above for selected years for the first five months of the year for the last 10 years. [This means that the rainfall each day is added to the day before and so on]. 2012 and 2009 were both wet years overall up to the end of May; but note that 2012 also had an exceptionally dry January (purple line). 2010 was the driest year out of these 10 for the five month period.

Despite the low rainfall in January and February 2014, relatively high rainfall since the start of April (day 90) has propelled the cumulative total rapidly upwards. It follows that, by December, 2014 could end up anywhere in terms of total rainfall.

We will keep you updated.

The data used for the cumulative rainfall plot is again sourced from the Global Surface Summary of the Day (GSOD) product developed by the Federal Climate Complex at the National Climatic Data Centre (NCDC).

And the R code used to create the above plot is shown below

# Load required libraries 
library(plyr)
library(ggplot2)
library(lubridate)
library(date)
 
# Setting work directory
 
setwd("d:\\ClimData")
 
# Reading and reformatting GSOD raw data downloaded from NCDC
 
dat<-read.table("CDO2812586929956.txt",header=F,skip=1)
 
colnames(dat)<-c("stn","wban","yearmoda","temp","tempc","dewp","dewpc","slp","slpc","stp","stpc","visib","visibc","wdsp","wdspc","mxspd","gust","maxtemp","mintemp","prcp","sndp","frshtt")
 
dat$yearmoda <- strptime(dat$yearmoda,format="%Y%m%d")
 
dat$prcp <- as.character(dat$prcp)
dat$prcp1 <-as.numeric(substr(dat$prcp,1,4))
dat$prcpflag <- substr(dat$prcp,5,5)
 
# Convert precipitation from inches to mms
 
dat$rain  <- dat$prcp1*25.4
 
# Remove erronous values
 
dat$rain[dat$rain > 1000 ] <- NA
 
dat$year <- as.numeric(format(dat$yearmoda,"%Y"))
dat$month <- as.numeric(format(dat$yearmoda,"%m"))
dat$day <- as.numeric(format(dat$yearmoda,"%d"))
 
# Getting cumulative sum of rain/year
 
dat$date<-as.Date(dat$yearmoda)
 
 
# Subsetting required period
 
dat2 <- subset(dat, year >= 2005 & month <= 4 )
 
# Extracting required columns for transforming data
 
dat3 <- dat2[, c(25,29)]
 
# Replace na's with 0's for ddply function
 
dat3$rain[is.na(dat3$rain)] <- 0
 
dat3 <- ddply(dat3,.(year(date)),transform, cumRain = cumsum(rain))
 
dat4 <- ddply(dat3,.(date,year(date)),summarize, max = max(cumRain))
 
dat5 <- dat4[c(diff(as.numeric(substr(dat4$date, 9, 10))) < 0, TRUE), ]
 
dat5$year <- as.numeric(format(dat5$date,"%Y"))
dat5$month <- as.numeric(format(dat5$date,"%m"))
dat5$day <- as.numeric(format(dat5$date,"%d"))
 
# Calculate julian day for labeling
 
dat5$jday <- strptime(dat5$date, "%Y-%m-%d")$yday+1
 
 
# Plot cumulative rainfall
 
plot.title = 'Cumulative Rainfall by Year - January to May (2005-2014)'
plot.subtitle = 'Data source : Federal Climate Complex, Global Surface Summary Of Day Data Version 7'
 
cr<-  ggplot(dat3, aes(x = yday(date), y = cumRain, color = factor(year(date)))) +
      geom_line(size=0.5,linetype='solid') + geom_point(size=1.5) + theme_bw() +
      ggtitle(bquote(atop(.(plot.title), atop(italic(.(plot.subtitle)), "")))) + theme(plot.title = element_text(face = "bold",size = 16,colour="black")) +
      guides(color = guide_legend(title = "Year", title.position = "top")) +
      scale_x_continuous(breaks=c(0,30,60,90,120,150,180,210,240,270,300,330,360)) +
      scale_y_continuous(limits=c(0,800))+
      xlab("Julian Day") + ylab("Rainfall (mm)\n")+
      theme(panel.border = element_rect(colour = "black",fill=F,size=1),
      panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
      panel.grid.minor = element_blank(),
      panel.background = element_rect(fill = "ivory",colour = "black"),
      legend.position="right") 
 
cr <- cr + geom_text(data = subset(dat5, jday > 100 | year == 2014 & jday > 120 ), (aes(x = yday(date), y = max, label = year(date))),size=3,vjust=-0.2, hjust=-0.2)
 
cr 
 
ggsave(cr, file="Cumulative_RF_Penang.png", width=13, height=7,type = "cairo-png")
Created by Pretty R at inside-R.org