This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services, which provides scientific advice to organisations that currently include the STECF (Scientific, Technical and Economic Committee for Fisheries) and ICCAT.

Monday, 4 August 2014

Will I get soaked? - Best time of day to commute on foot or by bike on Penang Island

Key points of post

  • Chances of getting wet are always highest during March, April, October and November;
  • On average 8am-9am is a particularly wet time of day on Penang Island;
  • The wettest hour is 8pm-9pm, especially in October and November.

Unfortunately, not nearly enough of us cycle to work in Penang. If you do, however, what is the best time of day to go to avoid getting drenched, and how does this vary over the year?

To answer this question we needed to find rainfall data at sufficient resolution (hourly). We searched through a lot of online databases but eventually it was the citizen scientists who came up with the goods via Weather Underground.

From Weather Underground, Jason managed to download hourly rainfall ‘event’ data for Penang for 2002-2013. These data describe simply whether it was raining or not each hour of the day and these numbers can be summed into frequencies.
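As a minimal sketch of what "summing into frequencies" means, the snippet below uses a small hypothetical data frame (`obs`, not the Penang data) with an hour column and a text event label. It flags each record as wet or not, then adds up the flags per hour. The column names and the use of `grepl()` are assumptions made for illustration. The coding mirrors the one used later in this post: "Rain" and "Rain-Thunderstorm" count as rain, while a thunderstorm on its own does not.

```r
# Hypothetical hourly records (illustrative only, not the Penang data):
# one row per observation, "" means no event was reported.
obs <- data.frame(
  hour   = c(8, 8, 8, 20, 20, 20),
  Events = c("Rain", "", "Rain-Thunderstorm", "Rain", "Rain", "Thunderstorm"),
  stringsAsFactors = FALSE
)

# Flag a record as wet (1) if its label mentions rain, otherwise 0.
obs$wet <- as.integer(grepl("Rain", obs$Events, fixed = TRUE))

# Sum the flags to get the number of rain events per hour of day.
freq <- aggregate(wet ~ hour, data = obs, FUN = sum)
print(freq)
```

Dividing these counts by the number of records per hour would turn them into proportions, but the plot in this post works with the raw counts.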

Unfortunately we do not know how much rain fell per hour, and such data are difficult to acquire since rain gauges are usually set up to record daily quantities of rain. Nevertheless, we think that the frequency data recording such rainfall ‘events’ are still instructive about whether or not you will get a soaking.

The ‘event’ data (2002-2013) are plotted in the circular plot below. These types of plot have been discussed in a previous blog post. The plot shows that the chance of rain depends on both the month (see previous blog posts) and the time of day. As we’ve seen before, January, February, May, June and July are the driest months while October and November are the wettest in Penang. In all months there seems to be a slightly higher chance of getting wet at 8am, so it would be best to come into work either earlier or later. 2am and 2pm also seem to be particularly wet times of day. The worst time of day to commute by bike or on foot, however, is 8pm, and this is particularly true for the months of March, April, October and November. Luckily most of us are at home by then!
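Reading the wettest hour off a plot can also be done directly from a month-by-hour summary table. The sketch below assumes a table with columns `monthf`, `hour` and `event`, filled with made-up counts (not the Penang results), and picks the hour with the most rain events in each month.

```r
# Hypothetical month-by-hour summary with made-up counts (illustration only).
summary_tbl <- data.frame(
  monthf = rep(c("Oct", "Nov"), each = 3),
  hour   = rep(c(8, 14, 20), times = 2),
  event  = c(30, 25, 41, 28, 22, 39),
  stringsAsFactors = FALSE
)

# Split the table by month and keep the row with the largest event count.
wettest <- do.call(
  rbind,
  lapply(split(summary_tbl, summary_tbl$monthf),
         function(d) d[which.max(d$event), ])
)
print(wettest)
```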

As yet we have no idea what causes these differences but the data certainly suggest that they exist.

The code used to prepare this plot is very similar to the code in our previous post, although you will need the 'weatherData' package to pull the hourly rainfall 'event' data from Weather Underground into R. The hourly dataset is very large in this case, so I would suggest either running the download overnight over a stable internet connection or getting the datasets year by year and then 'rbind-ing' them together. The R code used to produce the above plot is as follows.

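# Load required libraries
library(weatherData)  # getWeatherForDate()
library(plyr)         # ddply()
library(ggplot2)      # plotting
library(grid)         # unit()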
# Download hourly weather data from Weather Underground 
we <- getWeatherForDate("WMKP", "2002-01-01","2013-12-31", opt_detailed=T, opt_custom_columns=T, custom_columns=c(11))
# Convert to characters
we$Events <- as.character(we$Events)
# Assign values of '1' to Rain Event and 'NA' to Non-Rain Event
we$Events[we$Events == "" ] <- NA
we$Events[we$Events == "Rain"] <- 1
we$Events[we$Events == "Rain-Thunderstorm"] <- 1
we$Events[we$Events == "Thunderstorm"] <- NA
# Convert to numeric
we$Events <- as.numeric(we$Events)
# Create date and time columns
# (assumes the timestamps returned by weatherData are in a 'Time' column)
we$Dates <- as.POSIXct(we$Time)
we$year <- as.numeric(as.POSIXlt(we$Dates)$year+1900)
we$month <- as.numeric(as.POSIXlt(we$Dates)$mon+1)
we$monthf <- factor(we$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
we$weekday <- as.POSIXlt(we$Dates)$wday
# POSIXlt weekdays run from 0 (Sunday) to 6 (Saturday)
we$weekdayf <- factor(we$weekday,levels=rev(0:6),labels=rev(c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")),ordered=TRUE)
we$week <- as.numeric(format(as.Date(we$Dates),"%W"))
we$hour <- as.numeric(format(strptime(we$Dates, format = "%Y-%m-%d %H:%M"),format = "%H"))
we$min <- as.numeric(format(strptime(we$Dates, format = "%Y-%m-%d %H:%M"),format = "%M"))
## Use only data on the hour
we1<- subset(we, min == 0)
we2 <- ddply(we1,.(monthf,hour),summarize, event = sum(Events,na.rm=T))
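# Define colour palette (illustrative light-to-dark blues; substitute your own)
col <- c("#f7fbff", "#6baed6", "#08306b")
# Plot circular chart of rain event frequency
r1 <-  ggplot(we2, aes(x=monthf, y=hour, fill=event)) +
       geom_tile(colour="grey70") +
       scale_fill_gradientn(colours=col) +
       scale_y_continuous(breaks = seq(0,23),
       labels=c("12:00am","1:00am","2:00am","3:00am","4:00am","5:00am",
       "6:00am","7:00am","8:00am","9:00am","10:00am","11:00am",
       "12:00pm","1:00pm","2:00pm","3:00pm","4:00pm","5:00pm",
       "6:00pm","7:00pm","8:00pm","9:00pm","10:00pm","11:00pm")) +
       coord_polar(theta="x") +
       ylab("HOUR OF DAY")+
       xlab("Source: Weather Underground (2014)")+
       ggtitle("Rain Event Frequency by Month and Hour\n(Bayan Weather Station)\n")+
       theme(plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
       plot.margin = unit(c(0.5,0.5,0.5,0.5), "cm"))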
# Save to png file
ggsave(filename="Rain_Event_Frequency_Plot.png", plot=r1, width=6, height=6, dpi=400, type="cairo-png")
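The year-by-year download suggested earlier can be sketched as follows. `fetch_year()` is a hypothetical stand-in that returns two dummy rows per year so the example runs offline. In practice it would wrap a call such as `getWeatherForDate()` covering 1 January to 31 December of the given year. The pattern of interest is simply `lapply()` followed by `do.call(rbind, ...)`.

```r
# Hypothetical stand-in for a per-year download (dummy rows, not real data).
fetch_year <- function(yr) {
  data.frame(
    Dates  = as.POSIXct(sprintf("%d-01-01 %02d:00", yr, c(8, 20)), tz = "UTC"),
    Events = c("Rain", ""),
    stringsAsFactors = FALSE
  )
}

years  <- 2002:2004
pieces <- lapply(years, fetch_year)  # one data frame per year
we_all <- do.call(rbind, pieces)     # stack them into a single data frame
print(nrow(we_all))
```

Fetching one year at a time also means a dropped connection only costs a single year's download rather than the whole twelve-year request.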