This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services which provides scientific advice to organisations that currently include the STECF [Scientific, Technical and Economic Committee for Fisheries, https://stecf.jrc.europe.eu/] and ICCAT, https://www.iccat.int/en/

Saturday, 19 July 2014

Examining seasonal & daily change in weather on Penang Island using circular plots


Key points of post

  • Circular plots are useful for summarizing weather data simultaneously exposing seasonal and daily changes;
  • The most uncomfortable (heat index) time of year for people living in Penang are the months of April and May although January and February are the hottest in absolute terms.

In this blog we demonstrate the use of circular plots which are handy for summarizing how any one variable is affected simultaneously by two others. The four circular plots below show how changes in average (based on the years 1980-2014) air temperature, wind-speed, relative humidity and heat index typically depend on both month and time of day. The circles are divided into 12 segments for each month, and into 24 concentric circles for each hour of the day, midnight being the center. The magnitudes of each variable are coded by color. The warmest months are February, March, and April, temperatures peaking between around 1pm and 4pm every day. The nights after about 11pm are clearly much cooler. Penang is only around 5 degrees north of the equator but there is nevertheless some variation around day-length and this is reflected in the relatively large light blue band on the outer most rings of the plot in October, November, and December. 

The patterns for wind speed are rather curious, presumably reflecting the Monsoons which typically switch between the North-East (December to February) and the South-West (April to August). Wind-speeds in Penang are apparently highest between January and March with a smaller peak seen in July and August.  They are also strongest between late morning and early afternoon in all seasons/months. Each day winds typically ‘get up’ around 11am blowing relatively strongly until around 5pm coinciding with the hottest part of the day.

Circular plots for selected weather variables showing how average (1980-2014) daily and seasonal patterns tend to covary


Typical seasonal daily changes in relative humidity are described by the third circular plot. Relative humidity is based on the difference between ‘dry bulb’ and ‘wet bulb’ air temperatures.  It describes how easily water can evaporate from any surface.  Since humans cool by sweating (the change in state of water as it evaporates from human skin causes a change in state from liquid to gas which requires a considerable amount of energy/heat, link to article on latent heat of evaporation) relative humidity is an important variable to consider.  High levels of relative humidity indicate that the air is already saturated with water vapour, and absorbing more will be difficult.  Similarly water will evaporate into air with low relative humidity more readily leading to improved function of the human air-conditioning system.  Air movement (or wind) also encourages evaporative loss from human skin improving cooling efficacy as anyone who has switched on a fan will know.

The combination of (dry bulb) air temperatures with data for relative humidity, and wind-speed facilitates the calculation of a ‘heat index’ which is more directly related to ‘human comfort’ than any one of its constituent variables. When the heat index is high the weather will be uncomfortable and, when low, more bearable.  In a previous blog post, we examined average air temperatures and rainfall in Penang suggesting that January and February were the worst times of year to visit Penang since they are usually the hottest and October the best since it is usually coolest. The heat index plotted in the circular plot suggests a slightly different story, however.  April and May have the highest heat indices and are probably potentially the most uncomfortable times of year to visit Penang while the entire period between July and December have similar heat indices.

To produce the plot above, we have again sourced the data from NOAA's National Climatic Data Centre with the hourly dataset coming from the Integrated Surface Database.

We also provide the R code as usual (below) if you would like to produce similar plots for your area of interest.

# Load package libraries
 
library(ggplot2)
library(scales)
library(plyr)
library(dplyr)
library(reshape2)
library(circular)
library(lubridate)
library(grid)
library(zoo)
library(gridExtra)
library(weathermetrics)
library(RColorBrewer)
 
# Setting work directory
 
setwd("d:\\ClimData")
 
# Reading and reformatting raw hourly data downloaded from NCDC
 
dat<-read.table("831677118578dat.txt",header=TRUE,fill=TRUE,na.strings=c("*","**","***","****","*****","******","0.00T*****"))
 
colnames(dat)<-tolower(colnames(dat))
 
Sys.setenv(TZ = "UTC")
dat$dates <- as.POSIXct(strptime(dat$yr..modahrmn,format="%Y%m%d%H%M"))  + 8 * 60 * 60
 
dat$year <- as.numeric(format(dat$dates,"%Y"))
 
dat$month <- as.numeric(format(dat$dates,"%m"))
 
dat$hour<-substring(as.character(dat$dates),12,13)
 
dat$min<-substr(dat$dates,15,16)
 
dat$time<-paste(dat$hour,dat$min,sep=":")
 
dat$tempc <- (dat$temp-32) * (5/9)
 
dat$dewpc <- (dat$dewp-32) * (5/9)
 
dat$tempc[dat$tempc<=10] <- NA
 
dat$tempc[dat$tempc>=40] <- NA
 
dat$dir[dat$dir == 990.0] <- NA 
 
dat$wspd <- (dat$spd)*0.44704
 
# Convert precipitation from inches to mms
 
dat$rain  <- dat$pcp24*25.4
 
# Calculate relative humidity & heat index using weathermetrics package
 
dat$rh <- dewpoint.to.humidity(t = dat$tempc, dp = dat$dewpc, temperature.metric = "celsius")
 
dat$hi <- heat.index(t = dat$tempc,rh = dat$rh,temperature.metric = "celsius",output.metric = "celsius",round = 2)
 
# Commands to reformat dates
 
dat$year <- as.numeric(as.POSIXlt(dat$dates)$year+1900)
dat$month <- as.numeric(as.POSIXlt(dat$dates)$mon+1)
dat$monthf <- factor(dat$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
dat$weekday <- as.POSIXlt(dat$dates)$wday
dat$weekdayf <- factor(dat$weekday,levels=rev(0:6),labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")),ordered=TRUE)
dat$yearmonth <- as.yearmon(dat$dates)
dat$yearmonthf <- factor(dat$yearmonth)
dat$week <- as.numeric(format(as.Date(dat$dates),"%W"))
dat$hour <- as.numeric(format(strptime(dat$dates, format = "%Y-%m-%d %H:%M"),format = "%H"))
 
dat <- ddply(dat,.(yearmonthf),transform,monthweek=1+week-min(week))
 
# Extract data from 1980 onwards
 
dat1 <- subset(dat, year >= 1980 )
 
# Summarize data for weather variables by hour and month
 
dat2 <- ddply(dat1,.(monthf,hour),summarize, wspd = mean(wspd,na.rm=T))
 
dat3 <- ddply(dat1,.(monthf,hour),summarize, temp = mean(tempc,na.rm=T))
 
dat4 <- ddply(dat1,.(monthf,hour),summarize, hi = mean(hi,na.rm=T))
 
dat6 <- ddply(dat1,.(monthf,hour),summarize, rh = mean(rh,na.rm=T))
 
## Plot Temperature circular Plot
 
p1 = ggplot(dat3, aes(x=monthf, y=hour, fill=temp)) +
  geom_tile(colour="grey70") +
  scale_fill_gradientn(colours = c("#99CCFF","#81BEF7","#FFFFBD","#FFAE63","#FF6600","#DF0101"),name="Temperature\n(Degree C)\n")+
  scale_y_continuous(breaks = seq(0,23),
                     labels=c("12.00am","1:00am","2:00am","3:00am","4:00am","5:00am","6:00am","7:00am","8:00am","9:00am","10:00am","11:00am","12:00pm",
                              "1:00pm","2:00pm","3:00pm","4:00pm","5:00pm","6:00pm","7:00pm","8:00pm","9:00pm","10:00pm","11:00pm")) +
  coord_polar(theta="x") +
  ylab("HOUR OF DAY")+
  xlab("")+
  ggtitle("Temperature")+
  theme(panel.background=element_blank(),
  axis.title.y=element_text(size=10,hjust=0.75,colour="grey20"),
  axis.title.x=element_text(size=7,colour="grey20"),
  panel.grid=element_blank(),
  axis.ticks=element_blank(),
  axis.text.y=element_text(size=5,colour="grey20"),
  axis.text.x=element_text(size=10,colour="grey20",face="bold"),
  plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
  plot.margin = unit(c(-0.25,0.1,-1,0.25), "in"),
  legend.key.width=unit(c(0.2,0.2),"in"))
 
p1
 
 
## Plot Wind Speed circular plot
 
p2 = ggplot(dat2, aes(x=monthf, y=hour, fill=wspd)) +
  geom_tile(colour="grey70") +
  scale_fill_gradientn(colours = rev(topo.colors(7)),name="Wind Speed\n(meter/sec)\n")+
  scale_y_continuous(breaks = seq(0,23),
                     labels=c("12.00am","1:00am","2:00am","3:00am","4:00am","5:00am","6:00am","7:00am","8:00am","9:00am","10:00am","11:00am","12:00pm",
                              "1:00pm","2:00pm","3:00pm","4:00pm","5:00pm","6:00pm","7:00pm","8:00pm","9:00pm","10:00pm","11:00pm")) +
  coord_polar(theta="x") +
  ylab("HOUR OF DAY")+
  xlab("")+
  ggtitle("Wind Speed")+
  theme(panel.background=element_blank(),
  axis.title.y=element_text(size=10,hjust=0.75,colour="grey20"),
  axis.title.x=element_text(size=7,colour="grey20"),
  panel.grid=element_blank(),
  axis.ticks=element_blank(),
  axis.text.y=element_text(size=5,colour="grey20"),
  axis.text.x=element_text(size=10,colour="grey20",face="bold"),
  plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
  plot.margin = unit(c(-0.25,0.25,-1,0.25), "in"),
  legend.key.width=unit(c(0.2,0.2),"in"))
 
p2
 
 
# Plot Heat Index circular plor
 
p3 = ggplot(dat4, aes(x=monthf, y=hour, fill=hi)) +
  geom_tile(colour="grey70") +
  scale_fill_gradientn(colours = c("#99CCFF","#81BEF7","#FFFFBD","#FFAE63","#FF6600","#DF0101"),name="Temperature\n(Degree C)\n")+
  scale_y_continuous(breaks = seq(0,23),
                     labels=c("12.00am","1:00am","2:00am","3:00am","4:00am","5:00am","6:00am","7:00am","8:00am","9:00am","10:00am","11:00am","12:00pm",
                              "1:00pm","2:00pm","3:00pm","4:00pm","5:00pm","6:00pm","7:00pm","8:00pm","9:00pm","10:00pm","11:00pm")) +
  coord_polar(theta="x") +
  ylab("HOUR OF DAY")+
  xlab("")+
  ggtitle("Heat Index")+
  theme(panel.background=element_blank(),
  axis.title.y=element_text(size=10,hjust=0.75,colour="grey20"),
  axis.title.x=element_text(size=7,colour="grey20"),
  panel.grid=element_blank(),
  axis.ticks=element_blank(),
  axis.text.y=element_text(size=5,colour="grey20"),
  axis.text.x=element_text(size=10,colour="grey20",face="bold"),
  plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
  plot.margin = unit(c(-0.5,0.25,-0.25,0.25), "in"),
  legend.key.width=unit(c(0.2,0.2),"in"))
 
p3
 
 
# Plot Relative Humidity circular plor
 
col<-brewer.pal(11,"Spectral")
 
p4 = ggplot(dat6, aes(x=monthf, y=hour, fill=rh)) +
  geom_tile(colour="grey70") +
  scale_fill_gradientn(colours=col,name="Relative Humidity\n(%)\n")+
  scale_y_continuous(breaks = seq(0,23),
                     labels=c("12.00am","1:00am","2:00am","3:00am","4:00am","5:00am","6:00am","7:00am","8:00am","9:00am","10:00am","11:00am","12:00pm",
                              "1:00pm","2:00pm","3:00pm","4:00pm","5:00pm","6:00pm","7:00pm","8:00pm","9:00pm","10:00pm","11:00pm")) +
  coord_polar(theta="x") +
  ylab("HOUR OF DAY")+
  xlab("")+
  ggtitle("Relative Humidity")+
  theme(panel.background=element_blank(),
  axis.title.y=element_text(size=10,hjust=0.75,colour="grey20"),
  axis.title.x=element_text(size=7,colour="grey20"),
  panel.grid=element_blank(),
  axis.ticks=element_blank(),
  axis.text.y=element_text(size=5,colour="grey20"),
  axis.text.x=element_text(size=10,colour="grey20",face="bold"),
  plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"),
  plot.margin = unit(c(-0.5,0.1,-0.25,0.25), "in"),
  legend.key.width=unit(c(0.2,0.2),"in"))
 
p4
 
## Plot and export to png file
 
png(file="Weather_Variables_Circular_Plot.png",width = 12, height = 10, units = "in",
    bg = "white", res = 400, family = "", restoreConsole = TRUE,
    type = "cairo")
 
grid.arrange(p1,p2,p4,p3,nrow=2,main=textGrob("\nWeather Variables by Month and Hour\n(Bayan Lepas Weather Station)",
gp=gpar(fontsize=18,col="grey20",fontface="bold")),sub=textGrob("Source: NOAA National Climatic Data Centre (NCDC)\n",
                                                                                                              gp=gpar(fontsize=9,col="grey20",fontface="italic")))
 
dev.off()
Created by Pretty R at inside-R.org

Friday, 11 July 2014

Hot events in Malaysia

Key points of post

  • On Penang Island, the number of particularly hot days at or above 35⁰C has reduced since 1975, while near Kuala Lumpur, in the Klang Valley (Subang Weather Station) the trend has been strongly upwards.
  • Long-term changes in average temperatures are not necessarily related to the number of hot events.

Commonly in reports and papers about climate change it is stated that, ‘climate variability and the number extreme events (very hot days, typhoons) has increased in the past and will continue to do so in the future’ but we wonder whether or not it is actually true everywhere.  

Jason recently chanced upon a post published on RPubs by James Elsner that produced analyses to examine this question. We decided to adopt this approach to examine whether or not the frequency of exceptionally hot days on Penang Island has actually gone up over the last few decades.  We know from our previous post that average air temperatures have increased and I (Doug) naively assumed that this must be true also for extremely hot days. Not so, however. In the picture below, using data from Bayan Lepas airport, we plot the number of days when air temperature was at or above 35C at least once (top), the average duration of those hot periods (middle), and the number of such hot events with a line summarizing long-term trend (bottom). Clearly the number of such extreme events has gone down, contrary to popular wisdom about the effects of climate change. Interestingly the middle plot which summarizes successive days which had maximum temperatures of 35⁰C or more show that the average is only 1 day, the exception occurring in 2002 which had an average of 2. Surprisingly the average temperatures as shown in our post from February, were unexceptional.





But have the numbers of hot events decreased everywhere in Malaysia? At the Subang Weather Station, near Kuala Lumpur the answer is an emphatic, “No”. In 1998 which was a hot year (also see average air temperatures) there were 86 days when temperatures (at least once) were at or above 35C when the average contiguous duration of hot days was ~ 2.5.  We don’t know the reason for these differences. The point is to be aware that these sort of differences exist. Subang located in the Klang Valley (refer to location map above) is inland, and is much less influenced by the moderating effects of the sea than Penang Island.  Differential ‘urban heat island’ effects may also be a factor. Or it could even be climate change.

I know where I’d rather live!

The R code used to produce the plots are shown below and the hourly weather data was again sourced from NOAA's National Climatic Data Centre (NCDC).

# Load required packages
 
library(ggplot2)
library(scales)
library(plyr)
library(dplyr)
library(reshape2)
library(lubridate)
library(grid)
library(zoo)
library(gridExtra)
 
# Setting work directory
 
setwd("d:\\ClimData")
 
# Reading and reformatting raw daily data downloaded from NCDC
 
dat<-read.table("CDO2812586929956.txt",header=F,skip=1)
 
colnames(dat)<-c("stn","wban","yearmoda","temp","tempc","dewp","dewpc","slp","slpc","stp","stpc","visib","visibc","wdsp","wdspc","mxspd","gust","maxtemp","mintemp","prcp","sndp","frshtt")
 
dat$yearmoda <- strptime(dat$yearmoda,format="%Y%m%d")
 
# Create date columns
 
dat$dates <- as.Date(dat$yearmoda)
 
dat$year <- as.numeric(format(dat$yearmoda,"%Y"))
dat$month <- as.numeric(format(dat$yearmoda,"%m"))
dat$day <- as.numeric(format(dat$yearmoda,"%d"))
dat$monthf <- factor(dat$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
 
# Remove erronous values & convert units
 
dat$maxtemp[dat$maxtemp == 9999.9 ] <- NA
dat$mintemp[dat$mintemp == 9999.90 ] <- NA
 
dat$tempdc <- (dat$temp-32) * (5/9)
 
dat$maxtemp <- gsub("[*]","", dat$maxtemp) 
dat$mintemp <- gsub("[*]","", dat$mintemp) 
 
dat$maxtemp <- as.numeric(dat$maxtemp)
dat$mintemp <- as.numeric(dat$mintemp)
 
dat$maxtempc <- (dat$maxtemp-32) * (5/9)
dat$mintempc <- (dat$mintemp-32) * (5/9)
 
dat$maxtempc[is.na(dat$maxtempc)] <- as.numeric(dat$tempdc[is.na(dat$maxtempc)])
 
dat$mintempc[is.na(dat$mintempc)] <- as.numeric(dat$tempdc[is.na(dat$mintempc)])
 
str(dat)
 
dat<- dat[-c(3)]
 
# Subset data from 1975 onwards
 
dat <- subset(dat, year > 1975)
 
## Create a data frame table
 
dat = tbl_df(dat)
 
hd = dat %>% 
  group_by(year) %>%
  summarize(N35 = sum(maxtempc >= 35, na.rm = TRUE),
            N34 = sum(maxtempc >= 35, na.rm = TRUE),
            avgTmaxC = mean(maxtempc, na.rm = TRUE))
 
c1 <- ggplot(hd, aes(x = year, y = N35, fill = N35)) + 
      theme_bw() + 
      geom_bar(stat='identity') + 
      scale_fill_continuous(low='orange', high='red') +
      geom_text(aes(label = N35), vjust = 1.5, size = 3) +
      scale_x_continuous(breaks = seq(1975, 2015, 5)) +
      ylab("Days\n") +
      xlab("") +
      ggtitle("Number of days at or above 35°C\n")+
      theme(axis.text.x  = element_text(size=11), legend.position="none",
      panel.background=element_blank(),
      axis.title.x=element_text(size=7,colour="grey20"),
      panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
      axis.text.x=element_text(size=10,colour="grey20",face="bold"),
      plot.title = element_text(lineheight=1, face="bold",size = 12, colour = "grey20"))
 
c1
 
# Determine hot events
 
hotEvents = rle(dat$maxtempc >= 35)
eventLength = hotEvents$lengths[hotEvents$values]
eventNo = rep(1:length(eventLength), eventLength)
Events.df = subset(dat, maxtempc >= 35)
Events.df$eventNo = eventNo
 
# Determine the number of days between successive 35C days. Add this as another column.
 
t1 = Events.df$dates[-length(Events.df$dates)]
t2 = Events.df$dates[-1]
dd = difftime(t2, t1, units = "days")
Events.df$dbe = c(NA, dd)
Events.df = Events.df %>% select(maxtempc, mintempc, dates, year, month, eventNo, dbe)
 
# Length and intensity of events
 
LI.df = Events.df %>%
  group_by(eventNo) %>%
  summarize(eventLength = length(maxtempc),
            avgEventT = mean(maxtempc),
            maxEventT = max(maxtempc),
            whenMaxEvT = which.max(maxtempc),
            Year = year[1])
 
cor(LI.df$eventLength, LI.df$maxEventT)
 
ggplot(LI.df, aes(x = eventLength, y = whenMaxEvT)) +
  geom_point() +
  geom_smooth(method = "lm") +
  xlab("Event Length (days)") +
  ylab("Day of Event When Maximum Occurs") +
  scale_x_continuous(breaks = 1:10) +
  theme_bw()
 
# Average length of hot events
 
LI.df2 = LI.df %>%
  group_by(Year) %>%
  summarize(count = length(Year),
            avgEL = mean(eventLength))
 
AllYears = data.frame(Year = 1975:2014)
LI.df3 = merge(AllYears, LI.df2, by = "Year", all.x = TRUE)
LI.df3$count[is.na(LI.df3$count)] = 0
 
# Number of hot events

suppressMessages(library(MASS))
 
ne <- ggplot(LI.df3, aes(x = Year, y = count)) +
      geom_bar(stat = "identity") +
      ylab("Count\n") +
      scale_x_continuous(breaks = seq(1975, 2015, 5)) +
      ggtitle("Number of Hot Events in Penang since 1975\n")+
      stat_smooth(method = "glm.nb",formula = y ~ x, data = LI.df3, se = TRUE) + 
      theme_bw()+
      theme(legend.position = "none",
      axis.text.x  = element_text(size=11), legend.position="none",
      panel.background=element_blank(),
      axis.title.x=element_text(size=12,colour="grey20"),
      panel.grid.major = element_line(colour = "grey",size=0.25,linetype='longdash'),
      axis.text.x=element_text(size=10,colour="grey20",face="bold"),
      plot.title = element_text(lineheight=1, face="bold",size = 12, colour = "grey20"))
 
ne
 
var(LI.df3$count)/mean(LI.df3$count)
 
summary(glm.nb(count ~ Year, data = LI.df3))
 
## Export file to png
 
png(file="Hot_Events_Penang_1975-2014.png",width = 12, height = 10, units = "in",
    bg = "white", res = 500, family = "", restoreConsole = TRUE,type = "cairo")
 
grid.arrange(c1,el,ne,ncol=1,main=textGrob("\nHot Events in Penang (Bayan Weather Station) from 1975 - 2014",gp=gpar(fontsize=18,col="grey20",fontface="bold")),sub=textGrob("Source: NOAA National Climatic Data Centre (NCDC)\n",gp=gpar(fontsize=9,col="grey20",fontface="italic")))
dev.off()
Created by Pretty R at inside-R.org