This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services which provides scientific advice to organisations that currently include the STECF [Scientific, Technical and Economic Committee for Fisheries, https://stecf.jrc.europe.eu/] and ICCAT, https://www.iccat.int/en/

Showing posts with label Rainfall. Show all posts
Showing posts with label Rainfall. Show all posts

Monday, 20 October 2014

Rainfall event frequencies recorded by the Worldfish PWS in September 2014

Key points of post


·         Wednesday and Saturdays were far wetter.
·         Far more rain fell between 6 and 6.59 am and between 11 and 11.59pm


It has been just over a month and a half since we got the Weather Station here at WorldFish up and running and we thought it would fun to have a quick look at our first complete month of data, ie. data collected at 5 minute intervals for all of September (2014) typically one of the wettest months in Penang. Remember that a couple of months ago we looked at the frequency distribution of rainfall events by hour of day based on data from Bayan Lepas International airport.

After aggregating of the rainfall data, we looked at the frequency distribution of rainfall events by hour of day, and also by the day of week and we noticed some features we thought were worth sharing. One might be forgiven for assuming that there would not be too much variation in the amount of rain that falls either during a specific hour of the day, or a specific day of the week. However, the rainfall data captured here at the WorldFish office at Batu Maung show that certain times of  day and days of the week saw significantly more rainfall. In September 2014, we saw much more rain between 6 and 6.59am (65.9 mm) and between 11 and 11.59pm (48.5 mm). The rain falling at these times contributed over  30% of the rainfall for that entire month (354.5 mm) with 19% of the total monthly rainfall falling between 6 and 6.59am. Bizarrely it was also much more likely to rain on Wednesdays (102.6mm) and Saturdays (106.2mm's). To put this into perspective over 60% of September 2014’s  rain fell on either Saturday or Wednesday!




The plots also show that there was very little rainfall on Fridays during September 2014 (5.6 mm). Interestingly north-east Penang Island experienced a once-in-40-year rainfall event recently (137 mm's in a few hours) on FRIDAY night (3 October 2014). At the same time, however, very little rain was recorded at our PWS at Worldfish. Rainfall events can thus be extremely localized indicating the importance of establishing well-distributed and densely placed weather stations for studying weather and climate.

Granted, the rainfall data we looked at only covers a short period of time and the large variations could be pure coincidence. Conversely, these could also indicate consistent patterns or trends with scientific explanations possible (lunar periodicity, seasonal effects?). We intend to keep an eye on these features of the data and it will be interesting to see what emerges when time periods are examined.

As usual, the R code used the produce the plots in this post is provided below. The data is pulled from Weather Underground using the 'weatherData' package. 
## Load required libraries
 
library(weatherData)
library(ggplot2)
library(scales)
library(plyr)
library(reshape2)
library(lubridate)
library(zoo)
 
# Get data for PWS using weatherData package
 
pws <- getWeatherForDate("IPENANGB2", "2014-09-01","2014-09-30", station_type = "id",opt_detailed=T, opt_custom_columns=T, custom_columns=c(1,2,6,7,10,13))
 
# Rename columns
 
colnames(pws)<-c("time","time1","tempc","wdd","wspd","prcp","rain")
 
## Adding date columns
 
pws$time<-as.POSIXct(pws$time1,format="%Y-%m-%d %H:%M:%S",tz="Australia/Perth")
pws$year <- as.numeric(format(pws$time,"%Y"))
pws$date <-as.Date(pws$time,format="%Y-%m-%d",tz="Australia/Perth")
pws$year <- as.numeric(as.POSIXlt(pws$date)$year+1900)
pws$month <- as.numeric(as.POSIXlt(pws$date)$mon+1)
pws$monthf <- factor(pws$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
pws$weekday <- as.POSIXlt(pws$date)$wday
pws$weekdayf <- factor(pws$weekday,levels=rev(0:6),labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")))
pws$yearmonth <- as.yearmon(pws$date)
pws$yearmonthf <- factor(pws$yearmonth)
pws$week <- as.numeric(format(as.Date(pws$date),"%W"))
pws$weekf<- factor(pws$week)
pws$jday<-yday(pws$date)
pws$hour <- as.numeric(format(strptime(pws$time, format = "%Y-%m-%d %H:%M"),format = "%H"))
pws$min <- as.numeric(format(strptime(pws$time, format = "%Y-%m-%d %H:%M"),format = "%M"))
 
# Remove duplicate values
 
# Function to compute consecutive differences 
dif <- function(x) c(diff(x), NA) 
 
# Apply the function to each individual 
 
pws<-ddply(pws, 'jday', transform, actrain = dif(rain)) 
 
pws$actrain[pws$actrain<0]<- 0
 
pws$actrain[is.na(pws$actrain)] <- 0
 
# Summarize rainfall data by hour and by day
 
rd <- ddply(pws,.(weekday,weekdayf),summarize, raintot = sum(actrain,na.rm=T))
 
rh <- ddply(pws,.(hour),summarize, raintot = sum(actrain,na.rm=T))
 
# Plot rainfall by hour
 
h <-  ggplot(rh,aes(hour,raintot,fill=raintot))+geom_bar(stat="identity")+
      geom_text(aes(label=round(raintot,3)), vjust = -0.4, size = 2.5)+
      ggtitle("Total rainfall by hour for September 2014 - WorldFish Weather Station\n")+
      scale_fill_continuous(low='grey90', high='steelblue') + theme_bw()+
      xlab("\nHour of Day")+
      ylab("Total Precipitation (mm)\n")+
      theme(axis.text.x  = element_text(size=10), legend.position="none",
      panel.background=element_blank(),
      axis.title.x=element_text(size=10,colour="grey20"),
      axis.title.y=element_text(size=10,colour="grey20"),
      panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
      axis.text.x=element_text(size=10,colour="grey20",face="bold"),
      plot.title = element_text(lineheight=1, face="bold",size = 13, colour = "grey20"))
 
h
 
# Plot rainfall by Week Day
 
d <- ggplot(rd,aes(weekday,raintot,fill=raintot))+geom_bar(stat="identity")+
     geom_text(aes(label=round(raintot,3)), vjust = -0.4, size = 2.5)+
     ggtitle("Total rainfall by day for September 2014 - WorldFish Weather Station\n")+
     scale_fill_continuous(low='grey90', high='steelblue') + theme_bw()+
     xlab("\nDay of Week")+
     ylab("Total Precipitation (mm)\n")+
     scale_x_continuous(breaks=c(0,1,2,3,4,5,6),labels=c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))+
     theme(axis.text.x  = element_text(size=10), legend.position="none",
     panel.background=element_blank(),
     axis.title.x=element_text(size=10,colour="grey20"),
     axis.title.y=element_text(size=10,colour="grey20"),
     panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
     axis.text.x=element_text(size=10,colour="grey20",face="bold"),
     plot.title = element_text(lineheight=1, face="bold",size = 13, colour = "grey20"))
 
d
 
# Save plots to png
 
ggsave(h,file="WF-PWS-RFbyHr.png",width=7,height=5,dpi=400,type="cairo")
ggsave(d,file="WF-PWS-RFbyDay.png",width=7,height=5,dpi=400,type="cairo")
Created by Pretty R at inside-R.org

Monday, 4 August 2014

Will I get soaked? - Best time of day to commute on foot or by bike on Penang Island

Key points of post

  • Chances of getting wet are always highest during March, April, October and November;
  • On average 8am-9am is a particularly wet time of day on Penang Island.
  • The wettest hour is 8pm-9pm; especially in October and November.

Unfortunately not nearly enough of us cycle to work in Penang. If you do, however, what is the best time of day to go to avoid getting drenched, and how does this vary over the year ? 

To answer this question we needed to find rainfall data at sufficient resolution (hourly). We searched through a lot of online databases but eventually it was the citizen scientists who came up with the goods via Weather Underground.

From Weather Underground, Jason managed to download hourly rainfall ‘event’ data for Penang for 2002-2013. These data describe simply whether it was raining or not each hour of the day and these numbers can be summed into frequencies.

Unfortunately we do not how much rain fell per hour and it is difficult to acquire such data since rain gauges are usually set up to record daily quantities of rain. Nevertheless we think that the frequency data recording such rainfall ‘events’ are still instructive about whether or not you will get a soaking. 

The ‘event’ data (2002-2013) are plotted in the circular plot below. These types of plot have been discussed in previous blog post. The plot shows that the chance of rain depends on both the month (see previous blogs) and the time of day. As we’ve seen before, January, February, May, June and July are the driest while October and November are the wettest in Penang. In all months there seems to be slightly more chance of getting wet at 8am so it would be best to come into work either earlier or later. 2am and 2pm also seem to be particularly wet times of day. The worst time of day to commute by bike or on foot, however is 8pm and this is particularly true for the months of March, April, October, and November. Luckily most of us are at home by then!


As yet we have no idea what causes these differences but the data certainly suggest that they exist.

The code used to prepare this is very similar to the ones we produced in our previous post here, although you will need the 'weatherData' package to pull the hourly rainfall 'event' data from Weather Underground into R. The hourly data is particularly huge in this case and I would suggest running it over a stable internet connection overnight or getting the datasets year by year and then 'rbind-ing' the data together.  The R code used to produce the above plot is as follows. 

# Load required libraries
 
library(weatherData)
library(zoo)
library(lubridate)
library(plyr)
library(ggplot2)
library(circular)
library(grid)
library(RColorBrewer)
 
# Download hourly weather data from Weather Underground 
 
we <- getWeatherForDate("WMKP", "2002-01-01","2013-12-31", opt_detailed=T, opt_custom_columns=T, custom_columns=c(11))
 
# Convert to characters
 
we$Events<-as.character(we$Events)
 
# Assign values of '1' to Rain Event and 'NA' to Non-Rain Event
 
we$Events[we$Events == "" ] <- NA
 
we$Events[we$Events == "Rain"] <- 1
 
we$Events[we$Events == "Rain-Thunderstorm"] <- 1
 
we$Events[we$Events == "Thunderstorm"] <- NA
 
# Convert to numeric
 
we$Events<-as.numeric(we$Events)
 
# Create date and time columns
 
we$Dates<-as.POSIXct(we$Time)
 
we$year <- as.numeric(as.POSIXlt(we$Dates)$year+1900)
we$month <- as.numeric(as.POSIXlt(we$Dates)$mon+1)
we$monthf <- factor(we$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
we$weekday <- as.POSIXlt(we$Dates)$wday
we$weekdayf <- factor(we$weekday,levels=rev(0:6),labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")),ordered=TRUE)
we$week <- as.numeric(format(as.Date(we$Dates),"%W"))
we1$hour <- as.numeric(format(strptime(we1$Dates, format = "%Y-%m-%d %H:%M"),format = "%H"))
we1$min <- as.numeric(format(strptime(we1$Dates, format = "%Y-%m-%d %H:%M"),format = "%M"))
 
## Use only data on the hour
 
we1<- subset(we, min == 0)
 
we2 <- ddply(we1,.(monthf,hour),summarize, event = sum(Events,na.rm=T))
 
# Define colour palette
 
col<-brewer.pal(9,"Blues")
 
# Plot circular chart of rain event frequency
 
r1 <-  ggplot(we2, aes(x=monthf, y=hour, fill=event)) +
       geom_tile(colour="grey70") +
       scale_fill_gradientn(colours=col,guide=FALSE)+
       scale_y_continuous(breaks = seq(0,23),
       labels=c("12.00am","1:00am","2:00am","3:00am","4:00am","5:00am","6:00am",
       "7:00am","8:00am","9:00am","10:00am","11:00am","12:00pm",
       "1:00pm","2:00pm","3:00pm","4:00pm","5:00pm","6:00pm","7:00pm",
       "8:00pm","9:00pm","10:00pm","11:00pm")) +
       coord_polar(theta="x") +
       ylab("HOUR OF DAY")+
       xlab("Source: Weather Underground (2014)")+
       ggtitle("Rain Event Frequency by Month and Hour\n(Bayan Weather Station)\n")+
       guides(colour = guide_legend(show = FALSE)) +
       theme(panel.background=element_blank(),
       axis.title.y=element_text(size=10,hjust=0.75,colour="grey20"),
       axis.title.x=element_text(size=7,colour="grey20"),
       panel.grid=element_blank(),
       axis.ticks=element_blank(),
       axis.text.y=element_text(size=5,colour="grey20"),
       axis.text.x=element_text(size=10,colour="grey20",face="bold"),
       plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
       plot.margin = unit(c(0.5,0.5,0.5,0.5), "cm"))
 
r1
 
# Save to png file
 
ggsave(r1,file="Rain_Event_Frequency_Plot.png", width=6, height=6, dpi=400,type="cairo-png")
Created by Pretty R at inside-R.org

Tuesday, 10 June 2014

A Days’ Weather in Penang: the utility of Weather Underground

Key points of post


  • Weather Underground combines a website, database and a global network of airport and Personal Weather Stations maintained largely by amateurs (http://www.wunderground.com/). 
  • The network is growing fast, and the data can be accessed freely.
  • The utility of Weather Underground is illustrated below using an example from one day’s weather in Penang, Malaysia.

Recently we’ve discovered Weather Underground (http://www.wunderground.com/) where you can access data from a global network of weather stations. Weather Underground is a ‘citizen science’ website specializing in the collection and dissemination of meteorological data.  It promotes a Personal Weather Station network whereby subscribers buy their own weather station, install it in their garden or wherever, and then plug into the global network via the internet. The data they collect are then automatically stored in the ‘cloud’ where they can be accessed, analyzed and visualized by anyone with a computer. Most Personal Weather Stations are in the United States, but other countries are catching up fast. There are three on Penang Island so far; one at Dalat International School, another in Tanjung Tokong, and the International Airport at Bayan Lepas is also connected. 

The data collected via the Personal Weather Stations can be imported automatically into our software of choice, R, (http://cran.r-project.org/) where they can be analyzed using the extensive statistical and data-visualisation tools available. 

As an example of the type of data available on Weather Underground, we downloaded and plotted data for a single day, collected at Bayan Lepas International Airport on 2nd June 2014.  Data for six important meteorological parameters  were selected and have been plotted below at half-hourly intervals. 



Sometimes it can seem that weather in Penang is rather uniform throughout the day (ie. very hot), but this is actually not the case.  Starting at midnight, Air Temperatures were ~ 27°C until 8am when they rose rapidly to peak between 1.30 and 3.30pm (best time for a siesta).  After that temperature fell slowly, changing little between 6pm and 10pm remaining at 29°C.  In the afternoon of 2nd June 2014 the wind speed increased from the south-west, after blowing very lightly from the north all night.  The rise in wind-speed during the day was related to a fall in barometric pressure, and this is of course the ‘South West Monsoon’ wind.  By dusk the wind speed fell away again.  ‘Heat-index’ is a measure of the comfort a human is likely to experience, combining measures of temperature and humidity. On 2nd June this was clearly, directly related to the overall ambient temperature (top graph versus bottom) but this is not always the case.
 
Note: as usual the R-code for accessing Weather Underground and creating the graphs is given below. Readers can use this to plot data for their own location, dependent on the proximity of a Personal Weather Station.

We have used the 'weatherData' package developed by Ram Narasimhan to fetch the data from Weather Underground. The package can be installed directly from the CRAN site of from his site at http://ram-n.github.io/weatherData/

You can also get more information about the package from the github site that goes into further detail with examples and some of the other additional functions that come with it.  
 
## Load required libraries
 
library(weatherData)
library(ggplot2)
library(mgcv)
library(scales)
library(plyr)
library(reshape2)
library(gridExtra)
library(lubridate)
library(weathermetrics)
 
### Find required 4 letter station code using function below
 
# Example getStationCode("Penang") or getStationCode("Honiara")
 
getStationCode("Penang")
 
station.id="WMKP"
date="2014-06-02"
 
s<-getStationCode(station.id)
 
s1<-(strsplit(s,split= " "))[[1]]
 
station.name<-paste(s1[4],s1[5])
 
# Get detailed weather station data
 
wd<-getDetailedWeather(station.id,date, opt_all_columns=T)
 
str(wd)
 
# Rename columns
 
colnames(wd)<-c("time","mytime","tempc","dewpc","hum","slp","vsb","wd","wspd","guspd","prcp","events",
                "conditions","wdd","date_utc")
 
 
## Plot temperatures
 
dt_p <-   ggplot(wd, aes(time, tempc)) + 
  xlab("Time") + ylab(as.expression(expression( paste("Temperature (", degree,"C)","\n")))) + 
  geom_line(colour="red",size=0.5) +
  geom_point(colour="red",size=3)+
  theme_bw() +
  ggtitle(paste('Plot of weather variables for',station.name,"(",station.id,")",  'on',strftime(date,"%d-%b-%Y"),'\n\n',"Temperature\n")) +
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dt_p
 
## Plot windspeed
 
wd$wspd[wd$wspd=="Calm"] <- 0
 
wd$wspd<-as.numeric(wd$wspd)
 
winds <- subset(melt(wd[,c("time","wspd")], id = "time"), value > 0)
 
dws_p <- ggplot(winds, aes(time, value))+
  geom_point(col="seagreen",size=3)+
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  ylab("km/hour\n")+
  xlab("Time")+
  scale_y_continuous(limits=c(0,30))+
  stat_smooth(aes(group = 1), col="seagreen4",method = "loess",span=0.3,se=T,size=1.2)+
  theme_bw()+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))+        
  ggtitle("Wind speed\n")
 
dws_p
 
## Plot wind vectors
 
wd$u <- (-1 * wd$wspd) * sin((wd$wdd) * pi / 180.0)
wd$v <- (-1 * wd$wspd) * cos((wd$wdd) * pi / 180.0)
dw = subset(wd, u != 0 & v != 0)
 
v_breaks = pretty_breaks(n = 5)(min(dw$v):max(dw$v))
v_labels = abs(v_breaks)
 
dwd_p <-  ggplot(data = dw, aes(x = time, y = 0)) +
  theme_bw() +
  theme(plot.margin = unit(c(0, 1, 0.5, 0.5), 'lines')) +
  geom_segment(aes(xend = time + u*360, yend = v), arrow = arrow(length = unit(0.25, "cm")), size = 0.75) + 
  geom_point(data = subset(dw, !is.na(u)), alpha = 0.5,size=3) +
  scale_x_datetime(name="Time",labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours")) +
  ggtitle("Wind vectors\n")+
  scale_y_continuous(name = "Wind vectors (km/h)\n", labels = v_labels, breaks = v_breaks)+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dwd_p
 
## Plot Sea Level Pressure
 
wd$slp <- as.numeric(as.character(wd$slp))
 
dslp_p <- ggplot(wd, aes(time, slp)) + 
  geom_point(col="purple4",size=3) + 
  stat_smooth(span = 0.3,method="loess",col="purple",size=1.2)+
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  ggtitle("Sea Level Pressure\n")+
  ylab("hPa\n\n")+
  xlab("Time")+
  theme_bw()+ 
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dslp_p
 
 
## Plot Relative Humidity
 
dh_p <-  ggplot(wd, aes(time, hum)) + 
  #geom_step(colour="royalblue",size=0.5) +
  geom_point(colour="royalblue4",size=3) +
  xlab("Time") + ylab("Relative Humidity (%)\n") + 
  ggtitle("Relative humidity\n")+
  geom_smooth(aes(group = 1), col="royalblue",method = "loess",span=0.3,se=T,size=1.2)+
  theme_bw() +
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dh_p
 
## Calculate Heat Index using weathermetrics package
 
wd$hi <- heat.index(t = wd$tempc,rh = wd$hum,temperature.metric = "celsius",output.metric = "celsius",round = 2)
 
### Plot Heat Index
 
dhi_p <- ggplot(wd, aes(time, hi)) + 
  xlab("Time") + ylab(as.expression(expression(paste("Temperature (", degree,"C)",)))) + 
  geom_line(colour="red",size=0.5) +
  geom_point(colour="red",size=3)+
  theme_bw() +
  ggtitle("Heat Index\n") +
  scale_x_datetime(labels = date_format("%I:%M:%p"),breaks = date_breaks("4 hours"))+
  scale_y_continuous(breaks=c(25,30,35,40,45))+
  theme(plot.title = element_text(lineheight=1.2, face="bold",size = 16, colour = "grey20"),
  panel.border = element_rect(colour = "black",fill=F,size=1),
  panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
  panel.grid.minor = element_blank(),
  panel.background = element_rect(fill = "ivory",colour = "black"))
 
dhi_p
 
plot.name1 <- paste(station.id,"_WU_",date,".png",sep="")
 
tiff(plot.name1,width = 13, height = 18, units = "in",
     compression = "lzw",bg = "white", res = 600, family = "", restoreConsole = TRUE,
     type = "cairo")
 
grid.draw(rbind(ggplotGrob(dt_p),ggplotGrob(dh_p),ggplotGrob(dslp_p),ggplotGrob(dws_p),ggplotGrob(dwd_p),ggplotGrob(dhi_p),size="first"))
 
dev.off()
Created by Pretty R at inside-R.org