class: center, middle, inverse, title-slide

# Data Visualization in R with ggplot2
## Stanford University Libraries
### Josh Quan
### 23 February 2021
cidr.link/ggplot2

---
class: center, middle

# Data visualization

---

## Data visualization

> *"The simple graph has brought more information to the data analyst’s mind than any other device."
> — John Tukey*

- Data visualization is the creation and study of the visual representation of data.
- Many tools for visualizing data (R is one of them)
- Many approaches/systems within R for making data visualizations, **ggplot2** is one of them

---

## Data visualization

![](https://d2f99xq7vri1nk.cloudfront.net/Anscombe_1_0_0.png)

---

## Data visualization

![](https://d2f99xq7vri1nk.cloudfront.net/CloudToCircle.gif)

---

## Data visualization

![](https://d2f99xq7vri1nk.cloudfront.net/DinoSequentialSmaller.gif)

---

## ggplot2 `\(\in\)` tidyverse

.pull-left[
<img src="./img/ggplot2-part-of-tidyverse.png" width="80%" />
]

.pull-right[
- **ggplot2**: tidyverse's data visualization package
- `gg` in "ggplot2" stands for Grammar of Graphics
- Inspired by the book **Grammar of Graphics** by Leland Wilkinson
- A grammar of graphics is a tool that enables concise description of components of a graphic

<img src="./img/grammar-of-graphics.png" width="80%" />
]

---

## ggplot2 `\(\in\)` tidyverse

.pull-left[
<img src="./img/tidy-workflow.png" width="100%" />
]

.pull-right[
<img src="./img/tidy-workflow-packages.png" width="90%" />
]

---

## Following along...
- Download the materials at cidr.link/ggplot2 and launch `ggplot2.Rproj`

```r
install.packages("ggplot2", repos = 'http://cran.us.r-project.org')
install.packages("dplyr", repos = 'http://cran.us.r-project.org')
install.packages("readr", repos = 'http://cran.us.r-project.org')
install.packages("ggrepel", repos = 'http://cran.us.r-project.org')
install.packages("plotly", repos = 'http://cran.us.r-project.org')
install.packages("knitr", repos = 'http://cran.us.r-project.org')
install.packages("patchwork", repos = 'http://cran.us.r-project.org')
```

- Load libraries

```r
library(ggplot2)
library(dplyr)
library(readr)
library(plotly)
library(ggrepel)
library(patchwork)
```

- Open `ggplot2.Rmd` to follow along in the R Markdown document

---

## Dataset

[Stanford Open Policing Project](https://openpolicing.stanford.edu/)

[Police Searches Drop Dramatically in States that Legalized Marijuana](https://www.nbcnews.com/news/us-news/police-searches-drop-dramatically-states-legalized-marijuana-n776146)

* Police Stop Data - state, driver race, stop rate, marijuana legalization status

```r
stops <- read_csv("../data/opp-search-marijuana_state.csv") %>%
  filter(state %in% c("WA", "CO")) %>%
  mutate(legalization_status = ifelse(quarter <= "2013-01-01", "pre", "post"),
         search_rate_100 = search_rate * 100)
```

---
class: center, middle

# Layer up!

---

```
## `geom_smooth()` using formula 'y ~ x'
```

<img src="index_files/figure-html/unnamed-chunk-8-1.png" width="100%" />

---

## Basic ggplot2 syntax

* DATA
* MAPPING
* GEOM

---

## Your turn!

**Exercise:** Determine which variable of the dataset is mapped to which aesthetic (x-axis, y-axis, etc.).
<img src="index_files/figure-html/unnamed-chunk-9-1.png" width="70%" />

---
class: center, middle

# Step-by-step

---

```r
ggplot(data = stops)
```

![](index_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

```r
ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100))
```

![](index_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

---

```r
ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100)) +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

---

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

---

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth()
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

---

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(method = "loess")
```

```
## `geom_smooth()` using formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-15-1.png)<!-- -->

---

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(method = "loess", se = FALSE)
```

```
## `geom_smooth()` using formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d()
```

```
## `geom_smooth()` using formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-17-1.png)<!-- -->

---

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d() +
  theme_minimal()
```

```
## `geom_smooth()` using formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

---

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(method = "loess", se = FALSE) +
  scale_color_viridis_d() +
  theme_minimal() +
  labs(x = "Year", y = "Search Rate", color = "Driver Race",
       title = "Washington Highway Patrol Searches",
       subtitle = "Searches Per Hundred stops")
```

```
## `geom_smooth()` using formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-19-1.png)<!-- -->

---

## ggplot, the making of

1. "Initialize" a plot with ggplot()
2. Add layers with geom_ functions

```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
  geom_point(mapping = aes(x = displ, y = hwy))
```

---
class: center, middle

# Mapping

---

## Size data points by a numerical variable

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, size = search_rate_100)) +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-20-1.png)<!-- -->

---

## Set alpha value

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, size = search_rate_100)) +
  geom_point(alpha = 0.5)
```

![](index_files/figure-html/unnamed-chunk-21-1.png)<!-- -->

---

## Your turn!

**Exercise:** Using information from https://ggplot2.tidyverse.org/articles/ggplot2-specs.html add color, size, alpha, and shape aesthetics to your graph. Experiment. Do different things happen when you map aesthetics to discrete and continuous variables? What happens when you use more than one aesthetic?
```r
stops %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme_minimal(base_size = 12)
```

---

<img src="./img/aesthetic-mappings.png" width="80%" />

---

## Mappings can be at the `geom` level

```r
ggplot(data = stops) +
  geom_point(mapping = aes(x = quarter, y = search_rate_100))
```

![](index_files/figure-html/unnamed-chunk-24-1.png)<!-- -->

---

## Different mappings for different `geom`s

```r
ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100)) +
  geom_point() +
  geom_smooth(aes(color = driver_race), method = "loess", se = FALSE)
```

```
## `geom_smooth()` using formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-25-1.png)<!-- -->

---

### Set vs. map

- To **map** an aesthetic to a variable, place it inside `aes()`

```r
ggplot(data = stops,
       mapping = aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-26-1.png)<!-- -->

---

- To **set** an aesthetic to a value, place it outside `aes()`

```r
ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100)) +
  geom_point(color = "red")
```

![](index_files/figure-html/unnamed-chunk-27-1.png)<!-- -->

---

- Can specify HTML color codes (tip: Add-ins > colour picker)

```r
ggplot(data = stops, mapping = aes(x = quarter, y = search_rate_100)) +
  geom_point(color = "#63B3E8")
```

![](index_files/figure-html/unnamed-chunk-28-1.png)<!-- -->

---

## Data can be passed in

```r
stops %>%
  ggplot(aes(x = quarter, y = search_rate_100)) +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-29-1.png)<!-- -->

---

## Parameters can be unnamed

```r
ggplot(stops, aes(x = quarter, y = search_rate_100)) +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-30-1.png)<!-- -->

---

## Assign ggplot() to objects for layering

```r
p <- ggplot(stops, aes(x = quarter, y = search_rate_100)) +
  geom_point()

p + geom_smooth()
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```
![](index_files/figure-html/unnamed-chunk-31-1.png)<!-- -->

---
class: center, middle

# Common early pitfalls

---

## Mappings that aren't

```r
ggplot(data = stops) +
  geom_point(aes(x = quarter, y = search_rate_100, color = "blue"))
```

![](index_files/figure-html/unnamed-chunk-32-1.png)<!-- -->

---

## Mappings that aren't

```r
ggplot(data = stops) +
  geom_point(aes(x = quarter, y = search_rate_100), color = "blue")
```

![](index_files/figure-html/unnamed-chunk-33-1.png)<!-- -->

---

## Your turn!

**Exercise:** What is wrong with the following?

```r
stops %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = legalization_status)) %>%
  geom_point()
```

---

## + and %>%

What is wrong with the following?

```r
stops %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = legalization_status)) %>%
  geom_point()
```

```
## Error: `mapping` must be created by `aes()`
## Did you use %>% instead of +?
```

---

## Building up layer by layer

### Basic plot

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-36-1.png)<!-- -->

---

## Two layers!
```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_point() +
  geom_line()
```

![](index_files/figure-html/unnamed-chunk-37-1.png)<!-- -->

---

## Grouping with colors

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  geom_line()
```

![](index_files/figure-html/unnamed-chunk-38-1.png)<!-- -->

---

## Now we've got it

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(span = .2, se = FALSE)
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-39-1.png)<!-- -->

---

## Control data by layer

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = filter(stops, search_rate_100 < .2), size = 5, color = "gray") +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-40-1.png)<!-- -->

---

## Your turn!

**Exercise:** Work with your neighbor to sketch what the following plots will look like. Do not run the code, just think through the code for the time being.
```r
pre_legalization_high <- stops %>%
  filter(quarter < "2013-01-01" & search_rate_100 > 1.0)
```

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high,
            aes(y = search_rate_100 + .05, label = search_rate_100),
            size = 2, color = "black")
```

---

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-43-1.png)<!-- -->

---

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  geom_point(data = pre_legalization_high, size = 5, color = "gray")
```

![](index_files/figure-html/unnamed-chunk-44-1.png)<!-- -->

---

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point()
```

![](index_files/figure-html/unnamed-chunk-45-1.png)<!-- -->

---

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high,
            aes(y = search_rate_100, label = search_rate_100),
            size = 2, color = "black")
```

![](index_files/figure-html/unnamed-chunk-46-1.png)<!-- -->

---

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text(data = pre_legalization_high,
            aes(y = search_rate_100 + .05, label = search_rate_100),
            size = 2, color = "black")
```

![](index_files/figure-html/unnamed-chunk-47-1.png)<!-- -->

---

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_text_repel(data = pre_legalization_high,
                  aes(x = quarter, y = search_rate_100, label = as.character(quarter)),
                  size = 3, color = "black")
```

![](index_files/figure-html/unnamed-chunk-48-1.png)<!-- -->

---

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point(data = pre_legalization_high, size = 5, color = "gray") +
  geom_point() +
  geom_label_repel(data = pre_legalization_high,
                   aes(x = quarter, y = search_rate_100, label = as.character(quarter)),
                   size = 3, color = "black")
```

![](index_files/figure-html/unnamed-chunk-49-1.png)<!-- -->

---

**Exercise:** How would you fix the following plot?

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(color = "blue")
```

```
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

![](index_files/figure-html/unnamed-chunk-50-1.png)<!-- -->

---

### More on colors

```r
ggplot(stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  scale_color_manual(values = c("#FF6EB4", "#00BFFF", "#008B8B")) +
  geom_smooth(se = FALSE)
```

![](index_files/figure-html/unnamed-chunk-51-1.png)<!-- -->

---

### Splitting over facets

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_smooth() +
  facet_wrap( ~ driver_race)
```

![](index_files/figure-html/unnamed-chunk-52-1.png)<!-- -->

---

### facet_grid

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_line() +
  facet_grid(state ~ driver_race)
```

![](index_files/figure-html/unnamed-chunk-53-1.png)<!-- -->

---

### facet_grid

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100)) +
  geom_line() +
  facet_grid(driver_race ~ state)
```

![](index_files/figure-html/unnamed-chunk-54-1.png)<!-- -->

---

## facet_wrap vs. facet_grid

![](https://i.stack.imgur.com/oUXZU.png)

---

# Scales and legends

---

## Scale transformation

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_reverse()
```

![](index_files/figure-html/unnamed-chunk-55-1.png)<!-- -->

---

## Scale transformation

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_sqrt()
```

![](index_files/figure-html/unnamed-chunk-56-1.png)<!-- -->

---

## Scale details

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  scale_y_continuous(breaks = c(0, 0.25, 0.5, .75, 1.0))
```

![](index_files/figure-html/unnamed-chunk-57-1.png)<!-- -->

---

## Overall themes

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme_bw()
```

![](index_files/figure-html/unnamed-chunk-58-1.png)<!-- -->

---

## Overall themes

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme_dark()
```

![](index_files/figure-html/unnamed-chunk-59-1.png)<!-- -->

---

## Customizing theme elements

```r
ggplot(data = stops, aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 45))
```

![](index_files/figure-html/unnamed-chunk-60-1.png)<!-- -->

---

## Combining several plots to a grid

```r
wa_stops <- stops %>%
  filter(state == "WA") %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(se = FALSE) +
  labs(title = "Washington")

co_stops <- stops %>%
  filter(state == "CO") %>%
  ggplot(aes(x = quarter, y = search_rate_100, color = driver_race)) +
  geom_smooth(se = FALSE) +
  labs(title = "Colorado") +
  theme(legend.position = "none")
```

---

## Combining several plots to a grid

```r
(wa_stops / co_stops)
```

<img src="index_files/figure-html/unnamed-chunk-62-1.png" width="75%" />

---

## Combining several plots to a grid
```r
wa_stops + co_stops
```

<img src="index_files/figure-html/unnamed-chunk-63-1.png" width="75%" />

---

### Interactivity

```r
wa_stops <- wa_stops + geom_point()

plotly::ggplotly(wa_stops)
```
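---

### Interactivity

A minimal, self-contained sketch of the same conversion, assuming the ggplot2 and plotly packages are installed. It uses ggplot2's built-in `economics` data as a stand-in for `stops` (an assumption, so the code runs without the workshop CSV) and the `tooltip` argument of `ggplotly()` to limit the hover text to the x and y values.

```r
library(ggplot2)
library(plotly)

# Built-in ggplot2 dataset, used here only as a stand-in for `stops`
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  geom_point(size = 0.5)

# Convert the static plot into an interactive htmlwidget;
# `tooltip` restricts the hover text to the listed aesthetics
interactive_p <- ggplotly(p, tooltip = c("x", "y"))

interactive_p
```

Keeping the static plot in `p` and the widget in `interactive_p` means the same ggplot object can still be extended with further layers or saved as an image.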
---

## Your turn!

---

.pull-left[
**Final Exercise:** [Recreate this chart](https://www.nbcnews.com/news/us-news/police-searches-drop-dramatically-states-legalized-marijuana-n776146)

<img src="./img/cnbc.png" width="75%" height="85%" />
]

.pull-right[
#### Some code to get you started:

```r
stops %>%
  filter(state == "WA") %>%
  ggplot(aes(quarter, search_rate_100, color = driver_race)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE)
```

- `?labs` layer controls title, subtitle, caption, etc.
- `?scale_color_manual` layer allows you to specify your own colors for the levels
- `?geom_vline` layer draws a vertical line across the plot. (hint: the x-axis is a date data type)
- `?theme` controls the non-data elements of the plot like size of text, angle of axis ticks, etc.
- `?annotate` creates a text annotation layer. Same trick with coordinates as geom_vline
- Experiment with [themes](https://ggplot2.tidyverse.org/reference/ggtheme.html)
]

---

## Keep practicing

[Tidy Tuesday](https://github.com/rfordatascience/tidytuesday)

<img src="./img/tidytuesday.png" width="75%" />

---

## Get help

[SSDS Consulting](https://ssds.stanford.edu/consulting-workshops/walk-consulting)

<img src="./img/cidr.png" width="50%" />

---

## Themes Vignette

To really master themes:

[ggplot2.tidyverse.org/articles/extending-ggplot2.html#creating-your-own-theme](https://ggplot2.tidyverse.org/articles/extending-ggplot2.html#creating-your-own-theme)

---
class: center, middle

# Recap

---

## The basics

* map variables to aesthetics
* add "geoms" for visual representation layers
* scales can be independently managed
* legends are automatically created
* statistics are sometimes calculated by geoms

---

## ggplot2 template

Make any plot by filling in the parameters of this template

<img src="./img/ggplot2-template.png" width="100%" />

---

## Learn more

* Books:
  - [ggplot2 documentation](https://ggplot2.tidyverse.org/reference/)
  - [R for Data Science](https://r4ds.had.co.nz) by Grolemund and Wickham
  - [R Graphics Cookbook](http://www.cookbook-r.com/Graphs/) by Chang
  - [Data Visualization: A Practical Introduction](https://kieranhealy.org/publications/dataviz/) by Healy
* [ggplot2.tidyverse.org](https://ggplot2.tidyverse.org/)
* [ggplot2 Cheat sheet](https://rstudio.com/resources/cheatsheets/)

---

## Thanks