---
title: "Highlighting Different Taxa in ggplot"
author: "Zach Compton"
date: "11/29/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
I got stuck on another comparative oncology script, so I thought I would render a markdown based on a Medium article I read this morning about how to emphasize certain data groups in a beautiful way. The original article can be found here: https://medium.com/@giulia-ruggeri/better-data-communication-with-ggplot2-part-2-615a5180ccb
Before we start, remember that this is not a statistical test: we won't be using any phylogenetic trees or constructing any regression models. This is simply an exercise in data visualization. Of course, these visualizations can always be paired with actual statistical tests, but I have covered those in other tutorials.
Today's scatterplot visualization focuses on highlighting, or emphasizing, particular groups or taxa within your comparative data. This could mean emphasizing one class, clade, family, or species within your scatterplot. You will see what I mean by emphasize shortly.
### Libraries
Several new packages are used for this, forgive me. In the original article the packages 'readr', 'ggplot2', and 'stringr' are all listed separately. However, as ACE scholars you are all well aware that these packages are included when we load the 'tidyverse' library.
The 'devtools' package lets you install packages that are hosted elsewhere (usually on GitHub) and have not been published on CRAN.
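```{r, message=FALSE}
# One-time installs; comment these out once the packages are installed
install.packages("devtools")
install.packages("ggfx")
install.packages("ggforce")
install.packages("gghighlight")
install.packages("ggtext")  # needed later for ggtext::element_markdown()
# Development version of ggfx from GitHub (replaces the CRAN version above)
devtools::install_github("thomasp85/ggfx")

# Load packages only after everything is installed
library(tidyverse)
library(ggfx)
library(ggforce)
library(gghighlight)
```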
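Re-running `install.packages()` on every knit downloads everything again. One option, sketched here as a suggestion rather than something from the original article, is to check which packages are missing first. The helper name `missing_packages()` is hypothetical, and the install call is left commented out so the chunk has no side effects.

```{r}
# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("devtools", "ggfx", "ggforce", "gghighlight", "ggtext", "tidyverse")
to_install <- missing_packages(needed)
to_install  # packages that still need installing (empty if none)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```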
Load the dataset used in these tutorials.
```{r}
Data <- read.csv("dummyData.csv")
head(Data)
```
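The plots in this tutorial rely on three columns that appear in the plotting code: `adult_weight`, `NeoplasiaPrevalence`, and `Clade`. A minimal sketch of a sanity check before plotting is shown below; `check_columns()` is a hypothetical helper, not part of any package.

```{r}
# Hypothetical helper: stop early if a column the plots rely on is missing
check_columns <- function(df, required) {
  absent <- setdiff(required, names(df))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage with the tutorial data:
# check_columns(Data, c("adult_weight", "NeoplasiaPrevalence", "Clade"))
# table(Data$Clade)  # number of rows per clade
```

If your own dataset uses different column names, change the names in the check and in the `aes()` calls together.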
Before we build our graph, let's take advantage of an awesome set of color palettes inspired by National Parks! These palettes come from Katie Jolly at https://github.com/katiejolly/nationalparkcolors
```{r}
# devtools is installed in the Libraries chunk above
# The repository name must be quoted
devtools::install_github("katiejolly/nationalparkcolors")
library(nationalparkcolors)
```
Obviously, we will be using the color palette inspired by Saguaro National Park.
```{r}
palette <- park_palette("Saguaro")
```
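When `scale_color_manual()` receives an unnamed vector, colors are assigned in the order of the groups, which is alphabetical for a character column. Naming the vector pins each color to a specific clade. The sketch below shows one way to do that; `name_palette()` is a hypothetical helper, and the clade order in the usage comment is only an example.

```{r}
# Hypothetical helper: keep the first length(groups) colors and name them by group
name_palette <- function(colors, groups) {
  stopifnot(length(colors) >= length(groups))
  setNames(as.character(colors)[seq_along(groups)], groups)
}

# Usage with the Saguaro palette defined above:
# clade_colors <- name_palette(palette, c("Mammalia", "Sauropsida", "Amphibia"))
# ... + scale_color_manual(values = clade_colors)
```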
Okay, let's build our ggplot graph using a different color for each of the 3 clades (Mammalia, Sauropsida, Amphibia). As a reminder, Sauropsida is the clade that contains reptiles and birds.
```{r}
ggplot(data = Data,
       aes(x = log10(adult_weight),
           y = NeoplasiaPrevalence,
           color = Clade)) +
  geom_point() +
  # Redraw the mammals on top so other clades do not cover them
  geom_point(data = Data %>% filter(Clade == "Mammalia")) +
  scale_color_manual(values = palette) +
  labs(y = "Neoplasia Prevalence %",
       x = "Log10 Body Mass (g)",
       color = "",
       caption = "Just want to show how to add a caption below") +
  theme_minimal() +
  theme(legend.position = "top",
        plot.title = ggtext::element_markdown(),
        plot.caption = element_text(size = 8))
```
#### What do we mean by *emphasizing* one clade?
Okay, this does a decent job of showing the different clades by color. But what if you want to emphasize the trend of body mass vs. neoplasia prevalence in mammals within the context of the other, non-mammal data points? Emphasizing one clade while still showing the rest of the data can help reveal a pattern that is unique to that clade.
One way of accomplishing this is to wrap `geom_point()` in the `with_blur()` function from 'ggfx', then add a second `geom_point()` layer containing only the clade you want to stay sharp.
```{r}
ggplot(data = Data,
       aes(x = log10(adult_weight),
           y = NeoplasiaPrevalence,
           color = Clade)) +
  with_blur(
    geom_point(), sigma = unit(0.9, 'mm')  # sigma sets the amount of blur
  ) +
  # Sharp layer: only the clade we want to emphasize
  geom_point(data = Data %>% filter(Clade == "Mammalia")) +
  scale_color_manual(values = palette) +
  labs(y = "Neoplasia Prevalence %",
       x = "Log10 Body Mass (g)",
       color = "",
       caption = "Just want to show how to add a caption below") +
  theme_minimal() +
  theme(legend.position = "top",
        plot.title = ggtext::element_markdown(),
        plot.caption = element_text(size = 8))
```
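The 'gghighlight' package loaded earlier offers another way to get a similar effect: instead of blurring the other clades, it greys them out. This is a rough sketch and not taken from the original article; `highlight_clade()` and its `target` argument are hypothetical names, and the column names are the ones used in the plots above.

```{r}
library(ggplot2)
library(gghighlight)

# Hypothetical helper: grey out every clade except `target`
highlight_clade <- function(df, target) {
  ggplot(df, aes(x = log10(adult_weight),
                 y = NeoplasiaPrevalence,
                 color = Clade)) +
    geom_point() +
    # Filter row by row (no grouping) and skip the automatic text labels
    gghighlight(Clade == target,
                use_group_by = FALSE,
                use_direct_label = FALSE) +
    theme_minimal()
}

# Usage with the tutorial data:
# highlight_clade(Data, "Mammalia")
```

Blurring keeps the original colors of the background points, while greying them out removes color entirely, so which one reads better depends on how much context you want to keep.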
The mammal data points are rather spread out here, which doesn't work well for the next method of emphasizing a data group: circling it. If you are going to circle a data group, its points need to sit fairly close together on the graph. So for this next graph, let's emphasize the amphibians.
```{r}
ggplot(data = Data,
       aes(x = log10(adult_weight),
           y = NeoplasiaPrevalence,
           color = Clade)) +
  with_blur(
    geom_point(), sigma = unit(0.9, 'mm')  # sigma sets the amount of blur
  ) +
  # Draw a labelled ellipse around the amphibians only
  geom_mark_ellipse(aes(label = Clade,
                        filter = Clade == 'Amphibia'),
                    con.cap = 0,
                    con.arrow = arrow(ends = "last",
                                      length = unit(0.5, "cm")),
                    show.legend = FALSE) +
  # Sharp layer: only the clade we want to emphasize
  geom_point(data = Data %>% filter(Clade == "Amphibia")) +
  scale_color_manual(values = palette) +
  labs(y = "Neoplasia Prevalence %",
       x = "Log10 Body Mass (g)",
       color = "",
       caption = "Just want to show how to add a caption below") +
  theme_minimal() +
  theme(legend.position = "top",
        plot.title = ggtext::element_markdown(),
        plot.caption = element_text(size = 8))
```
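All three plots repeat the same color scale, axis labels, and theme. Since ggplot2 accepts a list of components on the right-hand side of `+`, one possible tidy-up is to collect the shared pieces in a function. This is only a sketch; `clade_plot_style()` is a hypothetical name, and the caption and `plot.title` settings are left out for brevity.

```{r}
library(ggplot2)

# Hypothetical helper: the scale, labels, and theme shared by the plots above
clade_plot_style <- function(colors) {
  list(
    scale_color_manual(values = colors),
    labs(y = "Neoplasia Prevalence %",
         x = "Log10 Body Mass (g)",
         color = ""),
    theme_minimal(),
    theme(legend.position = "top",
          plot.caption = element_text(size = 8))
  )
}

# Usage with the tutorial data and palette:
# ggplot(Data, aes(x = log10(adult_weight), y = NeoplasiaPrevalence, color = Clade)) +
#   geom_point() +
#   clade_plot_style(palette)
```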