This blog is run by Jason Jon Benedict and Doug Beare to share insights and developments on open source software that can be used to analyze patterns and trends in in all types of data from the natural world. Jason currently works as a geospatial professional based in Malaysia. Doug lives in the United Kingdom and is currently Director of Globefish Consultancy Services which provides scientific advice to organisations that currently include the STECF [Scientific, Technical and Economic Committee for Fisheries, https://stecf.jrc.europe.eu/] and ICCAT, https://www.iccat.int/en/

Showing posts with label Haze. Show all posts
Showing posts with label Haze. Show all posts

Friday, 6 March 2015

Will I choke? – when is the best time for outdoor activities on Penang Island

Key points of post
  • To assess whether air pollution index (API) levels in Penang during 2014 are unhealthy
  • To assess how API varies seasonally and diurnally

Last year our blog post on air quality on Penang Island was particularly popular. In that post we used a comparatively crude ‘visibility index’ collected at Penang International Airport since 2001.  The data suggested that pollution (visibility) had gotten worse since 2001 and that it tended to be worse in June and July.

We discovered recently, however, that the Malaysian Government’s Department of the Environment (DOE) collects much more detailed data on air pollution at 52 stations across the country, see http://bit.ly/1wUMxYG.  Conveniently, one of the stations is located at Universiti Sains Malaysia (USM) and we've managed to acquire this data scraped off the DOE site by Keith Rozario (nice work Keith!) using python and MySQL. He provides a nice summary of the nationwide API data from August 2013 to February 2015 and the link to all the scripts he used (on github) to scrape the data on his site - please see http://bit.ly/1EnwTLs. He was also generous enough to make a zipped copy of the dataset available for the public to use under the Creative Commons Attribution 4.0 license.

Inspired by an analysis (http://bit.ly/1B6o2eM) by ‘Typhoon Tony’ from Taiwan, we sought to investigate the magnitude of air pollution on Penang Island, and how it varies seasonally and diurnally

Air pollution is typically quantified using the air pollution index (API) which is calculated from the following five major air pollutants: sulphur dioxide, nitrogen dioxide, carbon monoxide, particulate matter, and ozone, for details see http://bit.ly/1wUMxYG. APIs between 101 and 200 are considered to be ‘unhealthy’ especially for those with existing heart and/or lung problems.

So how does Penang stack up? We’re always hearing how terrible pollution is on the island, but is it actually true ? In the ‘calendar’ plot below, we merged the API data with hourly observations on wind-speed and direction from Weather Underground.  The seasonality in API is clear. We had high API values in February and March 2014, but the highest were in June and July which corresponds to the south-west monsoon transporting smoke from forest fires in Sumatera.  


We summarize diurnal variability in API (and to a certain extent seasonality too) in the ‘rose’ plot below.  It suggests that only rarely can air pollution on the island be considered ‘unhealthy’ since it is typically below 100.  If you are, however, concerned your best bet for strenuous outdoor activities is during the night or in the morning up until 1pm when levels of API start to increase.  The least healthy time of day to be out is rush hour between 4pm and 6pm. You may also want to avoid February, March, June and July!



Do note, however, that the API monitoring station at USM is relatively isolated from the island’s urban pell-mell and may not be a good overall reflection/summary of air pollution on the island. 

The R code used to produce the plots shown above are provided below. We are currently working on trying to scrape the API data off the Malaysian DOE website using R (which would make things a lot more convenient!) and once we've managed to do that, we will update the code accordingly.

# Load required libraries
 
library(ggplot2)
library(reshape)
library(lubridate)
library(zoo)
library(openair)
library(scales)
library(grid)
library(extrafont)
library(weatherData)
 
loadfonts()
 
# Set working directory
 
setwd("D:/Projects/AP/ReadingsByRegion/")
 
# Read data from csv file 
 
df <- read.table('USM.csv',header=F,sep=":")
 
# Rename columns
 
colnames(df) <- c("station_id","station_name","dates","hour","api","unk","unk1")
 
# Remove last 2 columns - not relevant information
 
df <- df[-c(6,7)]
 
# Add minute column
 
df$minute <- as.character("00")
 
df$time <- as.character(paste(df$hour,df$minute,sep = ":" ))
 
df$date <- as.POSIXct(paste(df$dates,df$time, format = "%Y-%m-%d %H:%M" ))
 
# Convert 0 to NA
 
df$api[df$api == 0] <- NA
 
# Date and time columns
 
df$year<-as.numeric(as.POSIXlt(df$date)$year+1900)
df$month<-as.numeric(as.POSIXlt(df$date)$mon+1)
df$monthf<-factor(df$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)
df$weekday = as.numeric(format(as.POSIXlt(df$date),"%u"))
df$yearmonth<-as.yearmon(df$dates)
df$yearmonthf<-factor(df$yearmonth)
df$week <- as.numeric(format(as.POSIXlt(df$date),"%W"))
df$day = strftime(df$date,'%A')
df$jday <- strptime(df$date, "%Y-%m-%d")$yday+1
df$hour <- as.numeric(format(strptime(df$date, format = "%Y-%m-%d %H:%M"),format = "%H"))
 
 
##### Creating roseplot of pollutant magnitude over 24 hours #####
 
# Subset data for 2014
 
df2014 <- subset(df,year==2014)
 
# Set Color Table
 
colortable = c("#99FFFF", "#00FFFF", "#00FF00", "#CCFF33", "#FFFF00",
               "#FFCC00", "#FF6600", "#FF3333", "#FF33CC", "#660033")
 
# Cut data into ten parts
 
STN = "USM"
Time_of_Day = df2014$hour[df2014$station_name==STN]
mag = cut_number(round(df2014$api,100)[df2014$station_name==STN],n = 10)
rosedata = data.frame(dir=Time_of_Day,mag=mag)
 
# Plot rose chart
 
rc <- ggplot(rosedata,aes(x=Time_of_Day,fill=mag))+  geom_bar(binwidth = 1, drop = TRUE) +
      coord_polar() + xlab("") + ylab("") + 
      scale_x_continuous(breaks = seq(0,23),
      labels=c("12.00am","1:00am","2:00am","3:00am","4:00am","5:00am","6:00am",
      "7:00am","8:00am","9:00am","10:00am","11:00am","12:00pm","1:00pm",
      "2:00pm","3:00pm","4:00pm","5:00pm","6:00pm","7:00pm","8:00pm","9:00pm",
      "10:00pm","11:00pm")) +
      ggtitle("\nDaily API readings in 2014") +  scale_fill_manual(values=colortable,name="API Reading")+
      theme(panel.background=element_blank(),
      axis.title.y=element_text(size=10,hjust=0.75,colour="grey20"),
      axis.title.x=element_text(size=7,colour="grey20"),
      legend.title=element_text(size=13,colour="grey20",face="bold",family="Myriad Pro"),
      legend.text=element_text(size=11,colour="grey20",face="bold",family="Myriad Pro"),
      panel.grid=element_blank(),
      axis.ticks=element_blank(),
      axis.text.y=element_text(size=10,colour="grey20",family="Clear Sans"),
      axis.text.x=element_text(size=10,colour="grey20",face="bold",family="Clear Sans"),
      plot.title = element_text(lineheight=1, face="bold",size = 20, colour = "grey20",family="Graphik-Black"),
      plot.margin = unit(c(-0.5,0.25,-0.25,0.25), "in"),
      legend.key.width=unit(c(0.2,0.2),"in"))
 
rc
 
# Save plot to png
 
ggsave(file="D:/rc.png",dpi=300,w=10,h=8,unit="in",type="cairo-png")
 
 
### Reading weather data from Weather Underground using weatherData library ####
 
# Get weather data for 2014
 
we <- getWeatherForDate("WMKP", "2014-01-01","2015-12-31", opt_detailed=T, opt_custom_columns=T, custom_columns=c(1,6,7,8,9,13,14))
 
# Rename columns
 
colnames(we) <- c("date","date_myt","vis_km","wda","ws","gspd_kmh","wd","date_utc")
 
# Create date and time columns
 
we$date<-as.POSIXct(we$date)
we$hour <- as.numeric(format(strptime(we$date, format = "%Y-%m-%d %H:%M"),format = "%H"))
we$min <- as.numeric(format(strptime(we$date, format = "%Y-%m-%d %H:%M"),format = "%M"))
 
# Only use data on the hour
 
we_df<- subset(we, min == 0)
 
# Remove duplicate data
 
we_df$date[duplicated(we_df$date)]
 
# Merge API and weather data
 
we_api <- merge(we_df, df2014, by=c("date","date")) 
 
# Reformat yearmonth column for use in openair library
 
we_api$yearmonth <- as.factor(we_api$yearmonth)
 
# Reformat wind speed column
 
we_api$ws[we_api$ws == "Calm"] <- 0
we_api$ws <- as.numeric(we_api$ws)
 
# Plot 'calendar plot' and save as png
 
png(filename = "D:/Projects/AP/USM_API_2014_CalendarPlot.png",height=8,width=10,
    bg = "white",units='in', res = 400, family = "",  type = "cairo-png")
 
calendarPlot(we_api, pollutant = "api",year="2014",
             main = "Daily average API readings at USM\n
             with wind-speed scaled wind direction overlay (2014)\n",
             key.header = "API\n(Air Pollutant Index)",
             cols="heat", annotate = "ws")
 
dev.off()
Created by Pretty R at inside-R.org

Friday, 8 August 2014

Blowing in the wind - How many times must we keep experiencing the haze before the Governments do something about it?

Key points of post

  • Visibility in Penang is affected by the seasonal monsoon winds
  • Visibility has got worse since 2001.
  • The governments of south-east Asia must urgently solve this potentially serious health issue.
Every year South-East Asians suffer from the effects of fires burning in the forests of Indonesia. The more intense fires are lit deliberately to clear land for palm oil plantations. It is illegal to use fire to clear land in Indonesia, but the practice is unfortunately widespread, leading to habitat destruction and loss of biodiversity. A satellite image captured by the Moderate Resolution Imaging Spectroradiomater (MODIS) from NASA below for March 2014 illustrates when the fires were particularly bad (fires shown in red). 

Source: NASA Earth Observatory (2014) - http://1.usa.gov/1kqfdYA

Most fires occur during the dry season between April and October. The fires in 2014 started in February, and actually forced Indonesia to declare a state of emergency because of poor air quality. 

But what about the situation for us in Penang ? Our island is subject to two seasonal monsoon winds: the north-east and the south-west. These patterns are clear from the wind-rose diagram below which shows the average monthly wind-speeds and directions for the period 1949-2014. 


Given the location of Indonesia relative to Penang, it follows that visibility will be worst during the south-west monsoon which blows hardest, and most consistently, between May and August each year. Conversely we might expect visibility to be best during the north-east monsoon. This is indeed exactly what we observe. Average visibility readings from Bayan Lepas International Airport by week and year 2001-2014 are plotted below. The white squares represent good visibility and cleaner air; the dark squares poor visibility and high levels of air pollution.


It is clear that visibility has gotten worse since 2001, ie. the numbers of white squares have decreased. The pattern is also strongly seasonal which reflects: (a) the monsoons; and (b) the fact that fires are used more in Indonesia’s dry season. The relatively long periods of good visibility (>10 km's) that Penangites must have enjoyed especially during January, November and December seem now to be a thing of the past. I’m (Doug) a keen hiker regularly gazing down on the Island from the heights of Tea Station 5, 89, and Penang Hill. I’ve only been here since January 2012 and, judging from these data, I guess I've probably never seen a really good view!

Let’s hope the Government of Malaysia / Penang will continue to apply pressure and work with the Indonesians and other countries in the region to solve this terrible problem.

If you're keen on producing similar plots like the ones we've included in this blog post, you can use the code provided below. The wind-rose plot was produced using the 'openair' package for R which is an open-source air pollution analysis tool. There are a number of useful plots that can be produced using this package to show magnitude and direction of wind / air pollutants which you can further explore on your own. We also provide you with the code to produce the 'heatmap' plot of the visibility data which was pulled into R from Weather Underground (using the 'weatherData' package).  

# Load libraries
 
library(ggplot2)
library(RColorBrewer)
library(openair)
library(weatherData)
library(mgcv)
library(scales)
library(plyr)
library(reshape2)
library(circular)
library(gridExtra)
library(lubridate)
library(weathermetrics)
library(zoo)
 
# Setting work directory
 
setwd("d:\\ClimData")
 
# Reading and reformatting raw daily data downloaded from NCDC
 
dat<-read.table("2422706962434dat.txt",header=TRUE,fill=TRUE,na.strings=c("*","**","***","****","*****","******","0.00T*****"))
 
colnames(dat)<-tolower(colnames(dat))
 
Sys.setenv(TZ = "UTC")
dat$date <- as.POSIXct(strptime(dat$yr..modahrmn,format="%Y%m%d%H%M"))  + 8 * 60 * 60
 
dat$dir[dat$dir == 990.0] <- NA 
 
# Convert windspeed in miles/hour to metres/second
 
dat$wspd <- (dat$spd)*0.44704
 
# Create columns for year, month and dat
 
dat$year <- as.numeric(format(dat$date,"%Y"))
 
dat$month <- as.numeric(format(dat$date,"%m"))
 
dat$day <- as.numeric(format(dat$date,"%d"))
 
# Subset year of interest
 
datsub <- subset(dat, year >= 1949 )
 
# Remove NA's
 
datsub$dir[is.na(datsub$dir)] <- 0
 
# Plot and produce png file of Wind Rose
 
png(filename = "Monthly_WindRose_Penang_1949-2014.png",height=10,width=10,
    bg = "white",units='in', res = 600, family = "", restoreConsole = TRUE,
    type = "cairo-png")
 
windRose(datsub,ws="wspd",wd="dir",bias.corr = TRUE,border=TRUE,type = "month",width = 0.5,grid.line = 10,
         statistic = "prop.mean",offset = 10, paddle =FALSE,cols="hue",annotate=FALSE,auto.text=FALSE,
         ws.int=2,breaks=c(0,2,4,6,8,10,12),key = TRUE, key.footer = "(meter/second)", key.position = "bottom",na.action=NULL,
         key.header = "Wind Speed",main="       Average Monthly Wind Rose plots for Penang (Bayan Station)\n from years 1949 - 2014 \n",font.main=1,
         sub="\nData source: Integrated Surface Database (ISD) - National Climatic Data Centre (NCDC)",font.sub=2)
 
dev.off()
 
###########################################
#### Produce visibility 'heatmap' plot ####
###########################################
 
### Find required 4 letter station code using function below
 
# Example getStationCode("Penang") or getStationCode("Honiara")
 
# Information can also be acquired from the following link - http://weather.rap.ucar.edu/surface/stations.txt
 
getStationCode("Butterworth")
 
# Use station code below to get required plot data and parameters
 
station.id="WMKP"
 
s<-getStationCode(station.id)
 
s1<-(strsplit(s,split= " "))[[1]]
 
station.name<-paste(s1[4],s1[5])
 
### Getting summarized weather data for WU
 
ws2001<-getSummarizedWeather(station.id, "2001-01-01", "2001-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2002<-getSummarizedWeather(station.id, "2002-01-01", "2002-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2003<-getSummarizedWeather(station.id, "2003-01-01", "2003-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2004<-getSummarizedWeather(station.id, "2004-01-01", "2004-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2005<-getSummarizedWeather(station.id, "2005-01-01", "2005-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2006<-getSummarizedWeather(station.id, "2006-01-01", "2006-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2007<-getSummarizedWeather(station.id, "2007-01-01", "2007-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2008<-getSummarizedWeather(station.id, "2008-01-01", "2008-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2009<-getSummarizedWeather(station.id, "2009-01-01", "2009-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2010<-getSummarizedWeather(station.id, "2010-01-01", "2010-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2011<-getSummarizedWeather(station.id, "2011-01-01", "2011-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2012<-getSummarizedWeather(station.id, "2012-01-01", "2012-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2013<-getSummarizedWeather(station.id, "2013-01-01", "2013-12-31", opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
ws2014<-getSummarizedWeather(station.id, "2014-01-01", Sys.Date(), opt_custom_columns=T,
custom_columns=c(1:23), opt_verbose=T)
 
ws<-rbind(ws2001,ws2002,ws2003,ws2004,ws2005,ws2006,ws2007,ws2008,ws2009,ws2010,ws2011,ws2012,ws2013,ws2014) 
 
summary(ws)
 
str(ws)
 
### List of variables from Wunderground ###
 
# [1] "MYT"                        "Max_TemperatureC"           "Mean_TemperatureC"          "Min_TemperatureC"          
# [5] "Dew_PointC"                 "MeanDew_PointC"             "Min_DewpointC"              "Max_Humidity"              
# [9] "Mean_Humidity"              "Min_Humidity"               "Max_Sea_Level_PressurehPa"  "Mean_Sea_Level_PressurehPa"
# [13] "Min_Sea_Level_PressurehPa"  "Max_VisibilityKm"           "Mean_VisibilityKm"          "Min_VisibilitykM"          
# [17] "Max_Wind_SpeedKm_h"         "Mean_Wind_SpeedKm_h"        "Max_Gust_SpeedKm_h"         "Precipitationmm"           
# [21] "CloudCover"                 "Events"                     "WindDirDegrees"            
 
colnames(ws)<-c("date","date1","maxtemp","meantemp","mintemp","dewp","meandewp","maxdewp","maxhum","meanhum","minhum","maxslp","meanslp",
                "minslp","maxvsb","meanvsb","minvsb","maxwspd","meanwspd","maxgust","prcp","cc","events","wd")
 
## Adding date columns

ws$dates <- as.Date(ws$date)
ws$year <- as.numeric(as.POSIXlt(ws$dates)$year+1900) ws$month <- as.numeric(as.POSIXlt(ws$dates)$mon+1) ws$monthf <- factor(ws$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE) ws$weekday <- as.POSIXlt(ws$dates)$wday ws$weekdayf <- factor(ws$weekday,levels=rev(0:6),labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")),ordered=TRUE) ws$yearmonth <- as.yearmon(ws$date) ws$yearmonthf <- factor(ws$yearmonth) ws$week <- as.numeric(format(as.Date(ws$dates),"%W")) ws$weekf<- factor(ws$week) ws$jday<-yday(ws$dates)   # Define colour palette   col<-c("black","grey10","grey20","grey50","white")   # Plot 'heatmap'   v <- ggplot(data=ws,aes(x=week,y=year,fill=meanvsb))+ geom_tile(colour="black",size=0.65)+ theme_bw()+ scale_fill_gradientn(colours=col,name="Visibility\n(km)\n")+coord_equal(ratio=1)+ ylab("YEAR\n")+ xlab("\nWEEK OF YEAR\n\nSource: Weather Underground (2014)")+ scale_y_continuous(expand = c(0,0),breaks = seq(2000, 2015, 1)) + scale_x_discrete(expand = c(0,0),breaks = seq(0,52,2))+ ggtitle("Average weekly visibility in Penang\n")+ theme(panel.background=element_rect(fill="transparent"), panel.border=element_blank(), axis.title.y=element_text(size=10,colour="grey20"), axis.title.x=element_text(size=10,colour="grey20"), axis.text.y=element_text(size=10,colour="grey20",face="bold"), axis.text.x=element_text(size=10,colour="grey20",face="bold"), plot.title = element_text(lineheight=1.2, face="bold",size = 14, colour = "grey20"), panel.grid.major = element_blank(), legend.key.width=unit(c(0.2,0.2),"in"))   v   # Save plot to png   ggsave(v, file="Penang_Weekly_Average_Visibility.png", width=15, height=5,dpi=400,type = "cairo-png")
Created by Pretty R at inside-R.org