About the Exercise
In this exercise, we will explore building part of the data analysis tool Indiana Precipitation Explorer (INCIP), which is available on the myGeoHUB.org website at https://mygeohub.org/tools/incip. Along the way, we will get to know a few core components of the R Programming Language like the data frame data structure, learn a little bit about manipulating data with R using the dplyr library, and build a web application interface that can be deployed on the HUB.
In the previous presentation, you saw examples of Shiny applications, and RMarkdown Documents that could be compiled into static web pages, used as notebooks for a more literate style of computing, or combined with Shiny widgets to create interactive documents. Today we will use historical precipitation data from the National Oceanic and Atmospheric Administration (NOAA) to build a map showing the mean hourly precipitation for hours with measurable precipitation. As we transform our map, we’ll keep the old versions and, in the end, we’ll build a web page to display them using one of the RMarkdown templates.
RStudio IDE And RMarkdown Tips
If you are new to the RStudio IDE, here are some tips to get you started. If you know all this stuff, skip down to the next section labeled 1. Loading the Data.
Expand the Code Pane
When you first open RStudio, give yourself some room by expanding the code pane. Press the console pane’s minimize button to shrink it and pull the main horizontal divider to the right to expand the code pane.
Markdown and Code Chunks
The RMarkdown document we will be editing has both Markdown and code chunks in it. The markdown looks like plain text or wiki syntax. The code chunks start and end with three backticks and have a grey background, like this:
```{r}
I'm a code chunk!
```
In a code chunk, there are two controls you can use to run code. They are both located in the upper right corner of the code chunk. Pressing the first button will execute all code chunks above the current chunk. Pressing the second, the green play button, will execute the current code chunk.
Knitting the Document
To compile or render the document, use the Knit button located just above the Code Pane. This will turn your completed RMarkdown file into an HTML document and launch the document in a new window for you to view.
Clear the Session’s Variables
If you run the code chunks out of order, you may find that some of the variables have been modified unexpectedly. This could be the cause of maps not showing up correctly in this tutorial. One possible resolution is to clear the session’s variables and run all of the code chunks again in order. Use the Session->Clear Workspace menu item from the top of the RStudio IDE to clear out the values for all variables. Then press the first button in the chunk controls to run the code chunks above the current chunk, and press the green play button to run the current code chunk.
1. Loading the Data
The precipitation data is held in comma-separated values (CSV) files that are stored on MyGeoHUB. From the RStudio IDE running on MyGeoHUB, we can access the files in the directory /data/tools/incip/indiana_precipitation. Inside the tutorial’s setup code chunk, on line 19, you’ll see the variable precip.dir is set up to hold the path of the directory where the data files live. It looks like this:
# setup the directory where the precipitation data lives
data.dir = "/data/tools/incip"
precip.dir = file.path(data.dir,'indiana_precipitation')
The directory holds a file named stations.csv.clean with the id, name, latitude, and longitude for each station. On lines 25-27, the CSV file is read into a data structure called a Data Frame. You can think of a Data Frame as a table that you can query. The R language’s dplyr library has a number of functions, like filter, mutate, and select, that are helpful for querying and manipulating information in data frames.
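As a small illustration of these three functions, here is a sketch that assumes the dplyr package is installed. The station rows and the toy.* variable names are made up for the example; they are not the real contents of stations.csv.clean.

```r
library(dplyr)

# Made-up station rows for illustration only (not the real INCIP stations).
toy.stations.df <- data.frame(
  id        = c("A1", "B2", "C3"),
  name      = c("ALPHA", "BRAVO", "CHARLIE"),
  latitude  = c(41.7, 39.8, 38.3),
  longitude = c(-86.2, -86.1, -87.6),
  stringsAsFactors = FALSE
)

# filter() keeps the rows that match a condition.
northern.df <- toy.stations.df %>%
  filter(latitude > 39)

# mutate() adds a new column (or changes an existing one).
labeled.df <- toy.stations.df %>%
  mutate(label = paste(id, name, sep = ": "))

# select() keeps only the named columns.
coords.df <- toy.stations.df %>%
  select(longitude, latitude)

northern.df
```

Each function takes a data frame and returns a new one, which is why they chain together so naturally with the pipe operator.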
The stations.df data frame looks like this:
Along with the stations.csv.clean file, the precip.dir directory holds the precipitation data files for each station. The data files have two columns, Time and Precipitation. Lines 29-59 read in the precipitation data files and merge them with information about the monitoring station like the id, name, latitude and longitude. On lines 52-54, we use the filter function to find the row with information about the monitoring station we are currently processing and the mutate function to add a new column, named data, to the row. At the end we store the modified row in the precip.details.df data frame.
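The merge step described above can be sketched roughly as follows. This is not the tutorial's actual code: the station rows, the readings, and the toy.* names are invented, and the readings come from an in-memory list rather than from CSV files. It assumes the dplyr package is installed.

```r
library(dplyr)

# Invented station table standing in for stations.df.
toy.stations.df <- tibble(
  id        = c("A1", "B2"),
  name      = c("ALPHA", "BRAVO"),
  latitude  = c(41.7, 39.8),
  longitude = c(-86.2, -86.1)
)

# Invented readings standing in for the per-station CSV files.
toy.readings <- list(
  A1 = data.frame(Time = c("2001-01-01 01:00", "2001-01-01 02:00"),
                  Precipitation = c(0.10, 0.30)),
  B2 = data.frame(Time = c("2001-01-01 01:00", "2001-01-01 02:00",
                           "2001-01-01 03:00"),
                  Precipitation = c(0.20, 0.40, 0.60))
)

# For each station: find its row with filter(), then attach its readings
# as a one-element list in a new column named data with mutate().
toy.rows <- lapply(names(toy.readings), function(station.id) {
  toy.stations.df %>%
    filter(id == station.id) %>%
    mutate(data = list(toy.readings[[station.id]]))
})

# Stack the modified rows into a single data frame.
toy.details.df <- bind_rows(toy.rows)

# Index into the list column to reach one station's readings.
toy.details.df$data[[1]]$Precipitation
```

Wrapping the readings in list() is what lets a whole data frame sit inside a single cell of the data column.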
Our new data frame, precip.details.df, holds station information and precipitation data in a single location. If we take a look at it, we see that the precipitation data is stored in a column named data with the type list. We can index into the list to access the Time and Precipitation columns of the data frame for each monitor station. This is what precip.details.df looks like:
The monitor station information comes with longitude and latitude coordinates. Let’s see what the monitor station locations look like when plotted on a map. We can use R’s Leaflet library to set up a view of the state of Indiana and place markers at the location of each monitor station. By default, the markers will be colored blue.
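A minimal sketch of such a map, assuming the leaflet package is installed. The station rows are invented for illustration, and instead of setting an explicit view of Indiana the map is left to fit itself around the markers, so the tutorial's own chunk may differ.

```r
library(leaflet)

# Invented station locations (not the real INCIP stations).
toy.stations.df <- data.frame(
  name      = c("ALPHA", "BRAVO", "CHARLIE"),
  latitude  = c(41.7, 39.8, 38.3),
  longitude = c(-86.2, -86.1, -87.6),
  stringsAsFactors = FALSE
)

# Build the map: base tiles first, then one default marker per station.
# The ~ prefix tells leaflet to look the value up as a column of the data.
toy.station.map <- leaflet(toy.stations.df) %>%
  addTiles() %>%
  addMarkers(lng = ~longitude, lat = ~latitude, label = ~name)

# In an interactive RStudio session, typing toy.station.map displays the map.
```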
Your Turn: Display the Station Locations
Find the code chunk labeled create_map_object, around line 68, and execute it by pressing the green play button in the upper right corner of the chunk. Your map will show up underneath the code chunk and should look like this:
2. Calculating Precipitation Means
Seeing the locations of each monitor station only tells half the story. It would be interesting to find out if there are trends in precipitation amounts across the state. Since we have the precipitation data for each monitor station, we can find the average hourly precipitation over the years the data was collected. Then we can color each marker based on this average to help us visually compare the precipitation around the state.
Let’s start by updating our data frame, precip.details.df, with the mean precipitation for each station. Here is a reminder of what the data frame looks like:
In the code chunk labeled calculate_precipitation_means, line 87, we calculate the mean of the precipitation values held in the data column and store the values back into our data frame in a column named mean. The updated data frame is shown below.
precip.details.df %<>%
  group_by(id) %>%
  mutate(mean = mean(data[[1]][2]$Precipitation)) %>%
  ungroup()
precip.details.df
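To see why grouping by id matters here, the following sketch repeats the calculation on a tiny invented data frame and cross-checks it with base R. It assumes the dplyr and magrittr packages are installed; the toy.* names and readings are made up. Because each group holds exactly one station, data[[1]] inside mutate() is that station's own table of readings.

```r
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Invented data frame with one list-column cell of readings per station.
toy.details.df <- tibble(
  id   = c("A1", "B2"),
  data = list(
    data.frame(Time = c("01:00", "02:00"),
               Precipitation = c(0.10, 0.30)),
    data.frame(Time = c("01:00", "02:00", "03:00"),
               Precipitation = c(0.20, 0.40, 0.60))
  )
)

# Same pattern as the tutorial: one group per station, one mean per group.
toy.details.df %<>%
  group_by(id) %>%
  mutate(mean = mean(data[[1]]$Precipitation)) %>%
  ungroup()

# Cross-check with base R: apply mean() to each station's readings in turn.
check.means <- vapply(
  toy.details.df$data,
  function(readings) mean(readings$Precipitation),
  numeric(1)
)

toy.details.df$mean
```

Without group_by(), data[[1]] would refer to the first station's readings for every row, and all stations would receive the same mean.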
Your Turn: Calculate the Means
For the code chunk labeled calculate_precipitation_means, press the green play button to run the code chunk and update the data frame. There is no output from this code chunk.
A convenient way of displaying the mean precipitation data on the map is to color the markers based on a range of precipitation values. This allows users to visually inspect the map and estimate which stations may have a higher or lower mean precipitation. Let’s use the colorBin() function to group the mean values into five equal-sized bins and assign a color to each bin. The colorBin() function accepts a color palette, a list of values, and the number of bins to group values into. It returns a function, which we will store in the variable qpal, that can be called to look up the color of the bin a value falls into.
For example, in the code below, we create a queryable palette, named qpal, by specifying a color palette (palette = "Blues"), the domain or range of our values (a random uniform distribution of 10 numbers between 0 and 1), and the number of bins we want the values to fall within (nbins = 5):
palette = "Blues"
values = runif(10,min=0,max=1)
nbins = 5
qpal = colorBin(palette,values,nbins)
Above, we used random numbers for our values variable. Let’s see what color bins they end up falling into:
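One way to inspect the result is to call the palette function on the values and line the colors up next to them. The sketch below assumes the leaflet package is installed. The demo.* names and the fixed random seed are additions for this example, so the numbers will not match a run of the tutorial's own chunk.

```r
library(leaflet)

# Fixed seed so the sketch is repeatable (the tutorial does not set one).
set.seed(42)
demo.values <- runif(10, min = 0, max = 1)

# Build the queryable palette exactly as in the example above.
demo.qpal <- colorBin("Blues", demo.values, 5)

# Calling the palette function returns one hex color string per value.
demo.colors.df <- data.frame(
  value = round(demo.values, 3),
  color = demo.qpal(demo.values),
  stringsAsFactors = FALSE
)

# Sort by value so that values sharing a bin appear next to each other.
demo.colors.df[order(demo.colors.df$value), ]
```

Values that land in the same bin share a color. Note that colorBin() rounds its cut points to convenient numbers by default (its pretty argument), so the number of bins it actually uses can differ slightly from the number requested.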
Your Turn: Create a Queryable Palette
Now it’s your turn to create a queryable palette for our map markers. In the code chunk labeled calculate_color_from_mean, line 97, use the colorBin() function to generate a palette for our precipitation means. Be sure to fill in these three items:
- Choose a color palette (Line 105)
- Specify the domain for our precipitation mean values (Line 116)
- Set the number of bins the values can fall into (Line 121)
Your code should look something like this:
# YOUR TURN:
# On the next line, try setting palette to one of the following:
# "RdYlBu"
# "Blues"
# "YlOrRd"
# check out http://colorbrewer2.org/ for more palettes
# I like "YlOrRd"
palette = "YlOrRd"
# YOUR TURN:
# Set the domain of values to the mean column in our data frame:
# precip.details.df$mean
values = precip.details.df$mean
# YOUR TURN:
# Choose the number of bins. A good choice is an integer between 2 and 7
# I like 5
nbins = 5
# Create a palette we can query for which bin a value falls into.
qpal = colorBin(palette,values,nbins)
# Add a column named 'color' to our data frame. It holds
# the bin each station's mean precipitation falls into.
precip.details.df %<>%
  mutate(color = qpal(mean))
When you have finished updating the code chunk, run it by pressing the green play button in the code chunk controls.
Great! Our updated data frame now has a column named color, which describes which bin the mean precipitation value falls into. Let’s have a look:
The next task is to update our map to show our custom color for the marker instead of the default blue color. In the code chunk labeled plot_station_locations_with_colors, line 135, we create a map using the leaflet() function and then use the addCircleMarkers() function to place circle markers on the map at the locations of the monitor stations. We need to update this map declaration to use the values we stored in the color column of our data frame.
Your Turn: Color the Map Markers
Add the following argument to the addCircleMarkers() function to tell the map to color the markers using the color column we generated earlier from our custom color palette and bins:
color = ~color
After you have made the change, your code chunk should look like this:
map <- leaflet(precip.details.df %>% select(-data)) %>%
  addTiles()
map %>%
  # create the binned color marker
  addCircleMarkers(
    label = ~name,
    lng = ~longitude,
    lat = ~latitude,
    radius = 4,
    opacity = 1,
    fillOpacity = 1,
    # YOUR TURN:
    # On the next line, add the argument: color = ~color,
    color = ~color,
    weight = 1
  )
Your Turn: Generate an Updated Map
Run the code chunk, using the green play button in the upper right corner of the chunk, to see what the updated map looks like. The markers should now be a range of colors, similar to this one:
3. Add Marker Borders
The markers now show another dimension of information from our data frame, the binned mean precipitation. If you used the "YlOrRd" palette, you may notice that some of the markers are a little hard to see. For example, the light yellow marker in the north central part of Indiana, near the city of South Bend, almost blends in with the base map background. One easy way to solve this problem is to place a slightly larger black marker behind the marker displaying our bin color. This will give the effect of the marker having a border, helping to distinguish it from other artifacts on the map.
In the code chunk labeled add_black_background_marker, line 161, we create two types of markers on the map. The first marker is a black marker that will act as a border. The second marker sits on top of the black marker and shows the binned color for the mean precipitation value.
Your Turn: Create Background Markers
Modify the first marker in the code chunk to:
- Set addCircleMarkers()’s color argument to "#000000", the hexadecimal value for the HTML color representing black.
- Set the line width (weight) to 3, a value slightly larger than that of the second marker.
After you have made the change, the code chunk should look something like this:
map %<>%
  # create the black border marker
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    radius = 4,
    opacity = 1,
    fillOpacity = 0,
    # YOUR TURN:
    # 1. Set the color argument to "#000000"
    color = "#000000",
    # YOUR TURN:
    # 2. Set the weight argument to 3
    weight = 3
  ) %>%
  # create the bin color marker
  addCircleMarkers(
    label = ~name,
    lng = ~longitude,
    lat = ~latitude,
    radius = 4,
    opacity = 1,
    fillOpacity = 1,
    color = ~color,
    weight = 1
  )
Your Turn: View Map with Updated Markers
Now run the code chunk to confirm that the updated map shows a border around the markers. Press the green play button in the upper right corner of the chunk to execute the code.
4. Add a Legend to the Map
The black border around each colored marker helps users distinguish it from a similarly colored background. The map is missing one more item. No map is complete without a legend, and it is especially important in this case. Having the colored markers on the map looks nice, but without the legend, users don’t know what the colors mean.
We’ll use the addLegend() function to add a legend to the map. In the code chunk labeled add_legend_to_map, line 194, the addLegend() function accepts a number of parameters telling the map how to configure the legend, like title, placement, and background opacity.
Your Turn: Setup Map Legend
Adjust the following parameters:
- Set the pal argument to qpal, the name of our custom binned color palette function.
- Set the values argument to ~mean, the column in our data frame that holds the calculated mean precipitation values.
After making the adjustments, the code chunk should look like this:
map %<>%
  addLegend(
    "bottomright",
    title = "<center>Mean Hourly<br>
             Precipitation<br/>
             <small>
             for hours with<br/>
             measurable<br/>
             precipitation
             </small></center>",
    opacity = 1,
    labFormat = labelFormat(suffix="in"),
    # YOUR TURN:
    # Set the pal argument to our custom binned palette function, qpal
    pal = qpal,
    # YOUR TURN:
    # Set the values argument to the mean precipitation values, ~mean
    values = ~mean
  )
Your Turn: View Map with Legend
Execute the code in the code chunk using the green play button in the upper right corner of the chunk. Do you see our new legend on the map? It should show what values the marker colors correspond to.
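If you want to try the coloring, border, and legend steps away from the HUB, the sketch below strings them together on a small invented data frame. It assumes the leaflet package is installed. The toy.* names, station rows, and mean values are made up, and the legend title is shortened, so treat it as an approximation of the tutorial's chunks rather than a copy of them.

```r
library(leaflet)

# Invented stations with invented mean precipitation values (inches).
toy.df <- data.frame(
  name      = c("ALPHA", "BRAVO", "CHARLIE", "DELTA"),
  longitude = c(-86.2, -86.1, -87.6, -85.4),
  latitude  = c(41.7, 39.8, 38.3, 40.2),
  mean      = c(0.05, 0.07, 0.09, 0.11),
  stringsAsFactors = FALSE
)

# Binned palette, and a color column derived from it.
toy.qpal <- colorBin("YlOrRd", toy.df$mean, 5)
toy.df$color <- toy.qpal(toy.df$mean)

toy.map <- leaflet(toy.df) %>%
  addTiles() %>%
  # black, unfilled, slightly thicker ring that acts as the border
  addCircleMarkers(
    lng = ~longitude, lat = ~latitude,
    radius = 4, opacity = 1, fillOpacity = 0,
    color = "#000000", weight = 3
  ) %>%
  # filled marker in the bin color, drawn on top of the border
  addCircleMarkers(
    label = ~name,
    lng = ~longitude, lat = ~latitude,
    radius = 4, opacity = 1, fillOpacity = 1,
    color = ~color, weight = 1
  ) %>%
  # legend built from the same palette function and the same values
  addLegend(
    "bottomright",
    title = "Mean Hourly<br/>Precipitation",
    opacity = 1,
    labFormat = labelFormat(suffix = "in"),
    pal = toy.qpal,
    values = ~mean
  )

# In an interactive RStudio session, typing toy.map displays the map.
```

Passing the same palette function to both the markers and the legend is what keeps the legend's colors in step with the colors drawn on the map.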
5. Putting it all Together
You may not have noticed, but the file we have been editing is actually an RMarkdown document. We have been mixing our R code with Markdown wiki syntax and executing the code chunks just as we would in a literate computing style Notebook.
If you go back through the code you may notice syntax in between the code chunks like this:
Row
-------------------------------------
### Step #1 - Precipitation Monitor Stations
and like this:
### Step #4 - Add a legend to the map
This is RMarkdown syntax that gives hints about how to format the file when rendered as a web page. At the top of the file is this YAML declaration:
---
title: Exploring Precipitation Across Indiana
output:
  flexdashboard::flex_dashboard:
    orientation: rows
---
All RMarkdown files have a YAML declaration at the top. This one states that the file should be rendered as a FlexDashboard, which is one of the RMarkdown templates that allow developers to quickly generate a dashboard from the R and Markdown code in the file. There are a number of different styles of FlexDashboards and you can learn more about them at http://rmarkdown.rstudio.com/flexdashboard/.
Our FlexDashboard contains a 2x2 grid of the maps we have created today. We can render it to HTML by using the Knit button near the top of the RStudio IDE.
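The same rendering can also be started from the R console with the rmarkdown package's render() function. In the sketch below the file name is hypothetical, so substitute the path of the RMarkdown file you have been editing. The checks around the call are there only so that the snippet skips quietly when the file, the package, or pandoc is missing.

```r
# Hypothetical file name: replace it with the path to your own .Rmd file.
rmd.file <- "precipitation_tutorial.Rmd"

can.render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  file.exists(rmd.file) &&
  rmarkdown::pandoc_available()

if (can.render) {
  # render() uses the output format named in the file's YAML declaration
  # and returns the path of the file it produced.
  html.file <- rmarkdown::render(rmd.file)
  message("Rendered: ", html.file)
} else {
  message("Skipping render: rmarkdown, pandoc, or ", rmd.file, " not found.")
}
```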
Your Turn: Generate an HTML Document
Press the Knit button, just above the code pane, to generate and view our HTML document. A new window should pop up with all of the maps you generated in this exercise.
6. Use the Map for Analysis
Now that we have built a web page that displays our station locations on a map and shows us mean hourly precipitation for hours with measurable precipitation, let’s use that map to learn something new!
Visually inspect the map.
- Is there a trend in the recorded mean hourly precipitation across the state of Indiana?
- Can you tell which station looks like it has the highest mean hourly precipitation?
We can use the filter() and select() functions to query this information from the data frame to confirm our suspicion. The filter() function finds the row where the mean column is equal to the max() of all values in the column. The select() function returns the name column for that row.
highmean.station.name <- precip.details.df %>%
  filter(mean == max(mean)) %>%
  select(name) %>%
  .[[1]]
Did you pick the PRINCETON 1 W station?
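The same kind of query can be tried on a small invented data frame, alongside a base R equivalent. The sketch assumes the dplyr package is installed; the toy.* names, station names, and values are made up.

```r
library(dplyr)

# Invented stations and means for illustration only.
toy.means.df <- tibble(
  name = c("ALPHA", "BRAVO", "CHARLIE"),
  mean = c(0.05, 0.11, 0.08)
)

# dplyr version: keep the row(s) holding the largest mean, then pull out
# the name column as a plain character vector.
via.filter <- toy.means.df %>%
  filter(mean == max(mean)) %>%
  pull(name)

# Base R version: which.max() gives the position of the first maximum.
via.which.max <- toy.means.df$name[which.max(toy.means.df$mean)]

via.filter
```

The two approaches differ when several stations tie for the highest mean: filter() keeps every tied row, while which.max() returns only the first.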
Exercise Completed!
Congratulations, you have completed this exercise. In the next section, you will learn more about publishing applications like this on the HUBzero Platform.
