---
title: "Final Project - Draft"
author: "Stephanie Gluck"
date: "2/24/2020"
output: html_document
---
```{r setup}
#General packages
library(rio)
library(here)
library(ggplot2)
library(tidyverse)
library(readr)
library(pander)
library(psych)
#Alluvial plot
#library(alluvial)
# devtools::install_github('thomasp85/ggforce')
library(ggforce)
# devtools::install_github("corybrunson/ggalluvial")
library(ggalluvial)
library(ggparallel)
#Venn diagram
library(VennDiagram)
#Mosaic plot
library(ggmosaic)
```
```{r load_data}
d <- read_csv(here("data", "adj_percent.csv"))
d_valence <- read_csv(here("data", "adj_percent_valence.csv"))
mmd <- read_csv(here("data", "proj_data.csv"))
```
```{r data_wrangling}
d <- d %>% mutate(self_total = o_self + o_both,
other_total = o_other + o_both)
d_valence <- d_valence %>% mutate(self_total = o_self + o_both,
other_total = o_other + o_both)
```
## Project Summary
### Background
I collected a dataset that examines how individuals evaluate social relationships that vary in closeness (a person they are close with, an acquaintance, and a disliked person) and the extent to which they report self-other overlap with each person. In close relationships, self-other overlap can be broadly defined as the extent to which one sees the other person as part of the self: a sense of shared identity or interconnectedness.
Participants completed the same measures of relationship closeness and self-other overlap for all three targets: 1) a person they are close with (significant other or best friend), 2) an acquaintance, and 3) a disliked person.
Based on previous studies (Myers & Hodges, 2011), the questionnaires I used are meant to assess two separate factors of self-other overlap (see the section below for sample items): a more direct, conscious perception of closeness, and a less direct measure meant to tap the cognitive representations of the self and other (e.g., a trait adjective checklist).
### Sample Items
**Sample items for direct perception of closeness:**
Please circle the picture below which best describes your relationship with [the target]
Inclusion of Other in the Self Scale (Aron et al., 1992)
![IOS](IOS.png)
Indicate on 7-pt scale – 1 (not at all) to 7 (extremely) – the extent to which:
You would use the term “we” to describe your relationship with this person.
You think this person is similar to you
You and this person share many of the same interests
**Sample items for cognitive representation of the self and other:**
Trait Adjective Checklist sample items (total of 114 adjectives).
Which of the following adjectives do you consider to be descriptive of yourself / [target]? Select all that apply.
Adaptable
Aggressive
Clever
Curious
Quiet
Reserved
Outspoken
Rude
Tense
Wholesome
## Visualization Idea
For the final project, I plan to create three different types of visualization to help me explore my relationship data:
1) Venn Diagram
2) Alluvial / Sankey Diagram
3) Mosaic Plot
With the visualization, I hope to summarize the number of trait adjectives (total of 114) that a person selects for themselves relative to the number of adjectives the person selects for the other targets (close, acquaintance and dislike) and the amount of overlap between those trait adjectives. The trait adjectives are also categorized by valence (positive, neutral, negative) so I also hope to be able to visualize my data by valence and types of social relationship (e.g., positive adjectives for acquaintance or negative adjectives for a disliked person).
I also want to calculate a self-percentage and an other-percentage score from the Adjective Checklist. The self-percentage is the proportion of the traits one attributes to the self that are also ascribed to the target, while the other-percentage is the proportion of the target's traits that are shared with the self.
For example, Sally selected 20 adjectives for both herself and her best friend (close target), 10 adjectives that were unique to Sally, and 30 adjectives that were unique to her best friend.
The total number of adjectives Sally selected for herself would be:
20 (shared) + 10 (self unique) = 30
The total number of adjectives for her best friend would be:
20 (shared) + 30 (close unique) = 50
Self-percentage:
20 / 30 ≈ 0.67 or 67%
Other-percentage:
20 / 50 = 0.4 or 40%
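The same arithmetic can be sketched in R. This is a minimal illustration, not the scoring code behind the CSV files: the one-row `sally` data frame and the `other_percent` column are hypothetical, and the other column names simply mirror those in the `data_wrangling` chunk, on the assumption that the stored `self_percent` values were computed this way.

```{r sally_example}
# Hypothetical one-row data frame holding Sally's counts from the example.
# o_self / o_other / o_both mirror the column names in the data_wrangling chunk.
sally <- data.frame(o_self = 10, o_other = 30, o_both = 20)

# Total adjectives selected for each person = unique + shared
sally$self_total  <- sally$o_self + sally$o_both
sally$other_total <- sally$o_other + sally$o_both

# Share of each person's adjectives that fall in the overlapping set
sally$self_percent  <- sally$o_both / sally$self_total   # 20 / 30
sally$other_percent <- sally$o_both / sally$other_total  # 20 / 50

sally[, c("self_total", "other_total", "self_percent", "other_percent")]
```

Because the shared count is the numerator in both scores, the person with fewer total adjectives always has the higher percentage.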
## Venn diagram
I envision the Venn diagram with each circle sized relative to the number of adjectives selected for the self and for the other person (close, acquaintance, or disliked person).
Taking the Sally example from above, the self circle will be smaller because fewer adjectives were selected in total (30), while the close-person circle will be bigger because more adjectives were selected (50). Relative to its overall size, the self circle will show more overlap (about two-thirds, 20 of 30) than the target circle (40%, 20 of 50).
30 = adjectives unique to the close target
20 = adjectives shared by self and target
10 = adjectives unique to the self
```{r venn_diagram, fig.width = 5, fig.height = 5}
grid.newpage()
draw.pairwise.venn(area1 = 50,
area2 = 30,
cross.area = 20,
category = c("Close Person", "Self"),
fill = c("#e7298a", "#1b9e77"),
cex = 2)
```
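To draw the same diagram for other participants or targets, the three counts could be converted into the arguments that `draw.pairwise.venn()` expects. The helper below is a hypothetical sketch (the name `venn_areas` and its arguments are not part of VennDiagram or of the project data); it assumes the counts arrive as unique-to-self, unique-to-target, and shared.

```{r venn_helper, fig.width = 5, fig.height = 5}
# Hypothetical helper: turn unique/shared adjective counts into the
# circle areas and overlap that draw.pairwise.venn() takes as input.
venn_areas <- function(self_unique, target_unique, shared) {
  list(area1      = target_unique + shared,  # total adjectives for the target
       area2      = self_unique + shared,    # total adjectives for the self
       cross.area = shared)                  # adjectives chosen for both
}

# Sally's counts from the worked example
a <- venn_areas(self_unique = 10, target_unique = 30, shared = 20)

# Only draw if VennDiagram is installed
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  grid::grid.newpage()
  VennDiagram::draw.pairwise.venn(area1 = a$area1,
                                  area2 = a$area2,
                                  cross.area = a$cross.area,
                                  category = c("Close Person", "Self"),
                                  fill = c("#e7298a", "#1b9e77"),
                                  cex = 2)
}
```

Keeping the conversion in one place makes it harder to mix up totals and unique counts when the same plot is repeated for the acquaintance and disliked-person targets.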
## Alluvial / Sankey diagram
For the alluvial diagram, I first want to visualize the overall average self-percentage across all participants, separated by relationship closeness (close, acquaintance, disliked person) and valence (positive, neutral, negative). Later, I hope to select around three specific participants and plot their data on top of the overall average, to show how a specific individual's self-percentage score compares to the overall scores. For my draft, I'm not quite at the individual participant level yet, as I'm still learning how to plot the alluvial diagram.
```{r alluvial_data}
alluvial_d <- d_valence %>%
  group_by(relationship, valence) %>%
  # Mean self-percentage per relationship x valence cell, on a 0-100 scale
  summarize(mean = mean(self_percent, na.rm = TRUE) * 100) %>%
  # Constant column that serves as the first ("Self") axis of the plot
  mutate(self = "Self") %>%
  ungroup() %>%
  mutate(valence = str_to_title(valence),
         relationship = as.factor(relationship),
         valence = as.factor(valence)) %>%
  # Order the levels from closest to most distant, and positive to negative
  mutate(relationship = fct_relevel(relationship, "close_overlap",
                                    "acq_overlap",
                                    "dislike_overlap"),
         valence = fct_relevel(valence, "Positive", "Neutral", "Negative"),
         relationship = recode(relationship,
                               "close_overlap" = "Close Person",
                               "acq_overlap" = "Acquaintance",
                               "dislike_overlap" = "Disliked Person"))
# Check that the factor levels are in the intended order
levels(alluvial_d$relationship)
levels(alluvial_d$valence)
```
This is the data I intend to summarize with my alluvial plot.
```{r alluvial_table}
pander(alluvial_d)
```
```{r alluvial, fig.width = 10, fig.height = 10}
ggplot(alluvial_d, aes(y = mean, axis1 = self, axis2 = relationship, axis3 = valence)) +
geom_alluvium(aes(fill = relationship), width = 1/12, color = "gray40", knot.pos = .2) +
geom_stratum(width = 1/6, fill = "gray70", color = "gray40") +
geom_label(stat = "stratum", infer.label = TRUE) +
scale_x_discrete(limits = c("Self", "Relationship", "Valence"),
expand = c(.05, .05)) +
theme_minimal(15) +
scale_fill_brewer(palette = "Dark2") +
guides(fill = "none") +
labs(x = "",
y = "Percent",
title = "Self-Other Overlap Scores from Adjective Checklist",
subtitle = "By Relationship Closeness and Valence",
caption = "N = 155") +
coord_flip()+
theme(plot.title.position = "plot",
legend.position = "none",
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank())
```
Hi Stephanie,
I'm really excited to learn about this ggalluvial package, but somehow I cannot produce this plot. If your computer has no problem with it, please ignore this message.
From your code, I can see (and I really like) the way you modified the labs and theme to make this plot more interpretable on its own. And the alluvial plot is amazing!
I know nothing about this package, but I find the seven white boxes labelling the three sections ("Close Person" and the other six) a bit distracting; they pull my attention away from the beautiful relationship flow. My suggestion is to coord_flip the plot so that you have more space inside each bucket for the labels. I also changed fig.height to make the plot more proportional.
Claire
## Mosaic Plot
For the mosaic plot, I want to visualize the responses that participants (N = 155) selected on the Inclusion of Other in the Self Scale (Aron et al., 1992; IOS) for the different targets that varied in closeness. I hypothesize that participants are more likely to indicate closeness (greater overlap between the circles) for the close relationship (a best friend or significant other) than for an acquaintance or a disliked person.
The IOS is a one-item measure consisting of seven pairs of circles, with one circle representing the self and the other representing another person, that vary in the extent to which the circles overlap.
Inclusion of Other in the Self Scale (Aron et al., 1992)
![IOS](IOS.png)
1 = Self and Other as separate circles that do not overlap (top left)
7 = Self and Other as the most overlapped circles (bottom right)
```{r mosaic_data}
#data wrangling
mosaic_d <- mmd %>% select(contains("IOS")) %>%
pivot_longer(1:3, names_to = "relationship", values_to = "IOS") %>%
mutate_if(is.numeric, as.factor) %>%
mutate(relationship = as.factor(relationship)) %>%
mutate(relationship = recode(relationship, "close_IOS" = "Best Friend or Significant Other",
"acq_IOS" = "Acquaintance",
"dislike_IOS" = "Disliked Person"),
relationship = fct_relevel(relationship, "Best Friend or Significant Other",
"Acquaintance",
"Disliked Person"))
```
This is the data I intend to summarize with my mosaic plot.
```{r mosaic_table}
# Count how many participants chose each IOS response within each relationship
mosaic_table <- mosaic_d %>%
  count(relationship, IOS)
pander(mosaic_table)
```
```{r mosaic_plot, fig.width = 10, fig.height = 8}
#plot
ggplot(data = mosaic_d) +
geom_mosaic(aes(x = product(IOS), fill=IOS), na.rm=TRUE) +
facet_wrap(~relationship, nrow = 3) +
scale_fill_viridis_d() +
theme_minimal(15) +
labs(x = "Frequency",
y = "",
title = "Distribution of IOS Response by Relationship Closeness",
caption = "N = 155") +
theme(plot.title.position = "plot",
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank())
```
Hi Stephanie,
I got an error message again, so this plot doesn't display on my screen. The error message is `"plot.title.position" is not a valid theme element name.`
I ran the plot WITHOUT theme() and can still see that it is BEAUTIFUL. I can say it's perfect.
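One possible cause of this error, offered as an assumption rather than a diagnosis of this particular machine: `plot.title.position` is a theme element added in ggplot2 3.3.0, so an older installed version would reject it. The sketch below checks for that; the helper name `supports_title_position` is hypothetical.

```{r title_position_check}
# Hypothetical helper: TRUE if a ggplot2 version string is at least 3.3.0,
# the release assumed here to have introduced `plot.title.position`.
supports_title_position <- function(version) {
  package_version(version) >= package_version("3.3.0")
}

# Compare against the locally installed ggplot2, if there is one
if (requireNamespace("ggplot2", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("ggplot2"))
  if (!supports_title_position(installed)) {
    message("ggplot2 ", installed, " is older than 3.3.0; update ggplot2 or ",
            "drop `plot.title.position` from theme().")
  }
}
```

If the check reports an older version, updating ggplot2 or removing that one theme argument should let the rest of the `theme()` call run.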
```{r, include = F, eval = F}
labs(x = "Difference in log odds of a crime being committed",
y = "",
title = "Probability of differential crime rates between neighborhoods",
subtitle = "<span style = 'color : #009E73'>Regis</span> compared to <span style = 'color : #CC79A7'>Barnum</span>",
caption = "Each ball represents 5% probability") +
theme(plot.title.position = "plot",
legend.position = "none",
plot.subtitle = element_markdown(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank())
grid.newpage()
draw.triple.venn(area1 = 3472, area2 = 3528, area3 = 3492, n12 = 3176, n23 = 3323, n13 = 3182, n123 = 3096, category = c("sample1", "sample2", "sample3"), lty = "blank", fill = c("skyblue", "pink1", "mediumorchid") , cex=2, cat.cex=2, cat.fontfamily = rep("serif", 3))
grid.newpage()
venn.diagram(list(B = 1:1800, A = 1571:2020),fill = c("red", "green"),
alpha = c(0.5, 0.5), cex = 2, cat.fontface = 4,lty =2, fontfamily =3,
filename = "trial2.emf")
```
It's so unfortunate that I cannot display this plot, which I believe will be amazing. The error message I got is "Warning message:
In grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
font family not found in Windows font database", and then it got stuck there.
Overall, this is excellent work! I really learned a lot from your code. Thanks!!
Claire