analysis.Rmd

---
title: "Responses to Virtual Environments - Preliminary Analysis"
author: Tiernan J. Cahill
output: html_notebook
---

```{r setup, include=FALSE}
library(dplyr)
library(knitr)
library(ggplot2)
library(rstatix)
library(ggpubr)
library(rcompanion)

study_data <- readRDS("data/congruity_data.rds")
```

> **AUTHOR'S NOTE**: This notebook was originally prepared based on results derived from the unweighted calculation of SSQ. Consequently, not all commentary may apply to computed results at present.

# Diagnostics
Before we beginning our analysis, we check to see if the measured variables of interest are nearly-normally distributed, so we can make decisions about what sort of models will be appropriate. We can begin with a quick visual inspection using histograms.

```{r diag}
# ITQ (Control variable)
ggplot(study_data, aes(x = ITQ)) +
  geom_histogram(binwidth = 5, fill = "#0AA57F", alpha = 0.8) +
  labs(x="Immersive Tendencies", y = "Count") 

# Spatal presence (Resposne variable)
ggplot(study_data, aes(x = SPQ)) +
  geom_histogram(binwidth = 0.5, fill = "skyblue", alpha = 0.8) +
  facet_grid(rows = vars(condition)) +
  labs(x = "Spatial Presence", y = "Count")

# Spatial situation (Response variable)
ggplot(study_data, aes(x = SSM)) +
  geom_histogram(binwidth = 0.5, fill = "#5964B3", alpha = 0.8) +
  facet_grid(rows = vars(condition)) +
  labs(x = "Spatial Situation", y = "Count")

# Self-presence (Response variable)
ggplot(study_data, aes(x = SPSL)) +
  geom_histogram(binwidth = 0.5, fill = "#3439C8", alpha = 0.8) +
  facet_grid(rows = vars(condition)) +
  labs(x="Self-Presence", y = "Count")

# Suspension of disbelief (Response variable)
ggplot(study_data, aes(x = SoD)) +
  geom_histogram(binwidth = 0.5, fill = "#004DC3", alpha = 0.8)+
  facet_grid(rows = vars(condition)) +
  labs(x="Suspension of Disbelief", y = "Count")

# Simulator sickness (Response variable)
ggplot(study_data, aes(x = sim_sick)) +
  geom_histogram(binwidth = 5, fill = "#490415", alpha = 0.8) +
  labs(x="Simulator Sickness", y = "Count")
```

From the visualization, it appears that most of the variables of interest allow us the assumption of near-normality, with the exception of simulator sickness, which is skewed left (since many more participants experience a few minor symptoms than many major symptoms). We will deal with this later. We can confirm this with a Shapiro-Wilk test.

```{r sw-test, warning=FALSE}
shapiro.test(study_data$sim_sick)
```

# Hypothesis Tests
## Presence
We are interested in whether measures of presence vary significantly between control and breach conditions in each of the three experiments. This is a one-sided test, since we can reasonably hypothesize that presence will be lower in the breach condition.

### Covariates
Establish, firstly, that there is a correlation between ITQ and SPQ, indicating that this is a relavent control variable.
```{r cor-test}
cor.test(study_data$ITQ, y = study_data$SPQ, alternative = "greater", method = "pearson")
```

### Overall Breaching Effects

```{r presence-viz}
study_data %>%
  ggplot(aes(x = control, y = SPQ)) + 
  geom_violin(fill = "lightgreen", alpha = 0.8) +
  xlab("Condition") + ylab("Spatial Presence") +
  stat_summary(fun.y=mean, geom="point", shape=23, size=2)
```

```{r presence-test}
bartlett.test(SPQ ~ control, study_data)
t.test(SPQ ~ control, study_data, alternative = "greater", var.equal = T)
study_data %>%
  anova_test(SPQ ~ ITQ + control , type = 2, effect.size = "ges", observed = "ITQ")
```

For the sake of simplicity, it's worth looking at an aggregate measure of spatial presence.

Let's see how this measure stacks up, visually, in the different breaching experiments.
```{r spq-viz}
sensory.plot <- study_data %>%
  filter(condition == "SENSORY") %>%
  ggplot(aes(x = control, y = SPQ)) +
  geom_boxplot(fill = "#E69F00", alpha = 0.8) + 
  ylim(1,5) +
  labs(title="Sensory", x = "Condition", y = "Spatial Presence")

enviro.plot <- study_data %>%
  filter(condition == "ENVIRONMENTAL") %>%
  ggplot(aes(x = control, y = SPQ)) +
  geom_boxplot(fill = "#56B4E9", alpha = 0.8) + 
  ylim(1,5) +
  labs(title="Environmental", x = "Condition", y = "Spatial Presence")

thematic.plot <- study_data %>%
  filter(condition == "THEMATIC") %>%
  ggplot(aes(x = control, y = SPQ)) +
  geom_boxplot(fill = "#009E73", alpha = 0.8) + 
  ylim(1,5) +
  labs(title="Thematic", x = "Condition", y = "Spatial Presence")

ggarrange(sensory.plot, enviro.plot, thematic.plot,
        # labels = c("Sensory", "Environmental", "Thematic"),
          ncol = 3)
```

<!-- This looks like it might give us a clearer picture of the differences between the conditions; however, it also appears that there may be some outliers in the data. Let's first remove those. -->

<!-- ```{r spq-clean} -->
<!-- study_data.rm_out <- study_data %>% -->
<!--   filter(SPQ < median(study_data$SPQ, na.rm=T) + -->
<!--            2*IQR(study_data$SPQ, na.rm=T)) %>% -->
<!--   filter(SPQ > median(study_data$SPQ, na.rm=T) - -->
<!--            2*IQR(study_data$SPQ, na.rm=T)) -->

<!-- sensory_data.rm_out <- study_data.rm_out %>% -->
<!--   filter(condition == "SENSORY") -->

<!-- enviro_data.rm_out <- study_data.rm_out %>% -->
<!--   filter(condition == "ENVIRONMENTAL") -->

<!-- thematic_data.rm_out <- study_data.rm_out %>% -->
<!--   filter(condition == "THEMATIC") -->
<!-- ``` -->

Let's try running some tests.

```{r spq-tests}
sensory_data <- study_data %>%
  filter(condition == "SENSORY")

enviro_data <- study_data %>%
  filter(condition == "ENVIRONMENTAL")

thematic_data <- study_data %>%
  filter(condition == "THEMATIC")

# Sensory breach
bartlett.test(SPQ ~ control, sensory_data)
t.test(SPQ ~ control, sensory_data, alternative = "greater", var.equal = TRUE)

# Environmental breach
bartlett.test(SPQ ~ control, enviro_data)
t.test(SPQ ~ control, enviro_data, alternative = "greater", var.equal = TRUE)

# Thematic breach
bartlett.test(SPQ ~ control, thematic_data)
t.test(SPQ ~ control, thematic_data, alternative = "greater", var.equal = TRUE)
```

This seems to indicate that the differences in overall spatial presence are negligible for the spatial and environmental breaches, but quite significant for the thematic breach. Let's examine this again, controlling for individual differences in immersive tendencies.

```{r spq-tests-control}
sensory_data %>%
  anova_test(SPQ ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")

enviro_data %>%
  anova_test(SPQ ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")

thematic_data %>%
  anova_test(SPQ ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")
```

The results from the previous tests are mirrored here.

### Sensory Congruity
First, looking at the experiment in which sensory congruity was breached, we can visualize the presence measures reported by the two groups (in the control and breach conditions).

```{r sensory-viz}
# Spatial situation
sensory_data %>%
  ggplot(aes(x = control, y = SSM)) +
  geom_boxplot(fill = "#5964B3", alpha = 0.8) + 
  xlab("Condition") + ylab("Spatial Situation")

# Self-presence / Self-location
sensory_data %>%
  ggplot(aes(x = control, y = SPSL)) +
  geom_boxplot(fill = "#3439C8", alpha = 0.8) + 
  xlab("Condition") + ylab("Self-Presence / Self-Location")

# Suspension of disbelief
sensory_data %>%
  ggplot(aes(x = control, y = SoD)) +
  geom_boxplot(fill = "#004DC3", alpha = 0.8) + 
  xlab("Condition") + ylab("Suspension of Disbelief")
```

Next, we can run some statistical tests, starting with simple t-tests.

```{r sensory-tests}
bartlett.test(SSM ~ control, sensory_data)
t.test(SSM ~ control, sensory_data, alternative = "greater", var.equal = TRUE)

bartlett.test(SPSL ~ control, sensory_data)
t.test(SPSL ~ control, sensory_data, alternative = "greater", var.equal = TRUE)

bartlett.test(SoD ~ control, sensory_data)
t.test(SoD ~ control, sensory_data, alternative = "greater", var.equal = TRUE)
```

It looks like we can assume equal variances based on the results of Bartlett's tests, but that the differences between groups aren't significant. Let's try again, controlling for individual differences in immersive tendencies using an ANCOVA.

```{r sensory-tests-control}
sensory_data %>%
  anova_test(SSM ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")

sensory_data %>%
  anova_test(SPSL ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")

sensory_data %>%
  anova_test(SoD ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")
```

It still doesn't look like there's much of a significant difference between the experimental groups when controlling for immersive tendencies.

### Environment Congruity
We repeat the above procedure for the second experiment, where environmental congruity was breached.

```{r enviro-viz, warnings = FALSE}
# Spatial situation
enviro_data %>%
  ggplot(aes(x = control, y = SSM)) +
  geom_boxplot(fill = "#5964B3", alpha = 0.8) + 
  xlab("Condition") + ylab("Spatial Situation")

# Self-presence / Self-location
enviro_data %>%
  ggplot(aes(x = control, y = SPSL)) +
  geom_boxplot(fill = "#3439C8", alpha = 0.8) + 
  xlab("Condition") + ylab("Self-Presence / Self-Location")

# Suspension of disbelief
enviro_data %>%
  ggplot(aes(x = control, y = SoD)) +
  geom_boxplot(fill = "#004DC3", alpha = 0.8) + 
  xlab("Condition") + ylab("Suspension of Disbelief")
```

```{r enviro-tests}
bartlett.test(SSM ~ control, enviro_data)
t.test(SSM ~ control, enviro_data, alternative = "greater", var.equal = TRUE)

bartlett.test(SPSL ~ control, enviro_data)
t.test(SPSL ~ control, enviro_data, alternative = "greater", var.equal = TRUE)

bartlett.test(SoD ~ control, enviro_data)
t.test(SoD ~ control, enviro_data, alternative = "greater", var.equal = TRUE)
```

It looks like there might be a significant difference in suspension of disbelief, based on the T test. Let's see what happens when we control for immersive tendencies.

```{r enviro-tests-control}
enviro_data %>%
  anova_test(SSM ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")

enviro_data %>%
  anova_test(SPSL ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")

enviro_data %>%
  anova_test(SoD ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")
```

In this case, because the test is one-sided, a p-value < 0.1 is considered significant, which remains the case for Suspension of Disbelief in the environmnental breach condition, even when controlling for immersive tendencies.


### Thematic
Finally, let's repeat all of the above steps for the thematic breach condition.

```{r thematic-viz, warnings = FALSE}
# Spatial situation
thematic_data %>%
  ggplot(aes(x = control, y = SSM)) +
  geom_boxplot(fill = "#5964B3", alpha = 0.8) + 
  xlab("Condition") + ylab("Spatial Situation")

# Self-presence / Self-location
thematic_data %>%
  ggplot(aes(x = control, y = SPSL)) +
  geom_boxplot(fill = "#3439C8", alpha = 0.8) + 
  xlab("Condition") + ylab("Self-Presence / Self-Location")

# Suspension of disbelief
thematic_data %>%
  ggplot(aes(x = control, y = SoD)) +
  geom_boxplot(fill = "#004DC3", alpha = 0.8) + 
  xlab("Condition") + ylab("Suspension of Disbelief")
```

```{r thematic-tests}
bartlett.test(SSM ~ control, thematic_data)
t.test(SSM ~ control, thematic_data, alternative = "greater", var.equal = TRUE)

bartlett.test(SPSL ~ control, thematic_data)
t.test(SPSL ~ control, thematic_data, alternative = "greater", var.equal = TRUE)

bartlett.test(SoD ~ control, thematic_data)
t.test(SoD ~ control, thematic_data, alternative = "greater", var.equal = TRUE)
```

It appears that self-presence / self-location is significantly lower in the thematic breach condition. Let's now control for immersive tendencies, as before.

```{r thematic-tests-control}
thematic_data %>%
  anova_test(SSM ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")

thematic_data %>%
  anova_test(SPSL ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")

thematic_data %>%
  anova_test(SoD ~ ITQ + control, type = 2, effect.size = "ges", observed = "ITQ")
```

The previous findings are mirrored in the ANCOVA test.

## Simulator Sickness
We are also interested in whether simulator sickness varies significantly between breaching and control conditions. Because the data is so heavily skewed left, we will begin by performing a log transform.

```{r ss-clean}
ggplot(study_data, aes(x = sim_sick)) +
  geom_histogram(binwidth = 20, fill = "#490415", alpha = 0.8) +
  labs(x = "Simulator Sickness", y = "Frequency")

study_data <- study_data %>% 
  mutate(sim_sick.log = log10(sim_sick+1)) # This is necessary because of the possibility of a 0 value

# Simulator sickness (Response variable)
ggplot(study_data, aes(x = sim_sick.log)) +
  geom_histogram(binwidth = 0.25, fill = "#490415", alpha = 0.8) +
  labs(x="Simulator Sickness [Log-Transformed]", y = "Count")

shapiro.test(study_data$sim_sick.log)
```

```{r ss-viz}
sensory.ss.plot <- study_data %>%
  filter(condition == "SENSORY") %>%
  ggplot(aes(x = control, y = sim_sick.log)) +
  ylim(0, 4) +
  geom_boxplot(fill = "#E69F00", alpha = 0.8) + 
  labs(title="Sensory", x = "Condition", y = "Simulator Sickness")

enviro.ss.plot <- study_data %>%
  filter(condition == "ENVIRONMENTAL") %>%
  ggplot(aes(x = control, y = sim_sick.log)) +
  ylim(0, 4) +
  geom_boxplot(fill = "#56B4E9", alpha = 0.8) +
  labs(title="Environmental", x = "Condition", y = "Simulator Sickness")

thematic.ss.plot <- study_data %>%
  filter(condition == "THEMATIC") %>%
  ggplot(aes(x = control, y = sim_sick.log)) +
  ylim(0, 4) +
  geom_boxplot(fill = "#009E73", alpha = 0.8) + 
  labs(title="Thematic", x = "Condition", y = "Simulator Sickness")

ggarrange(sensory.ss.plot, enviro.ss.plot, thematic.ss.plot,
          ncol = 3)
```

<!-- We can also remove some outliers here as well, using the same IQR method. -->
<!-- ```{r ss-clean2} -->
<!-- study_data.ss_rm_out <- study_data %>% -->
<!--   filter(sim_sick < median(study_data$sim_sick, na.rm=T) + -->
<!--            2*IQR(study_data$sim_sick, na.rm=T)) %>% -->
<!--   filter(sim_sick > median(study_data$sim_sick, na.rm=T) - -->
<!--            2*IQR(study_data$sim_sick, na.rm=T)) -->
<!-- ``` -->

This is certainly better, but we probably still shouldn't assume normality even for the transformed variable. This means using non-parametric tests (below, a Mann-Whitney U Test).

```{r ss-tests}
study_data %>%
  wilcox.test(sim_sick ~ control, data =., correct = FALSE, alternative = "less")

study_data %>% 
  cliffDelta(sim_sick ~ control, data=.)

study_data %>%
  filter(condition== "SENSORY") %>%
  wilcox.test(sim_sick ~ control, data=., correct = FALSE, alternative = "less")

# Calculate Cliff's delta for effect size
study_data %>%
  filter(condition== "SENSORY") %>%
  cliffDelta(sim_sick ~ control, data=.)

study_data %>%
  filter(condition== "ENVIRONMENTAL") %>%
  wilcox.test(sim_sick ~ control, data=., correct = FALSE, alternative = "less")

# Calculate Cliff's delta for effect size
study_data %>%
  filter(condition== "ENVIRONMENTAL") %>%
  cliffDelta(sim_sick ~ control, data=.)

study_data %>%
  filter(condition== "THEMATIC") %>%
  wilcox.test(sim_sick ~ control, data=., correct = FALSE, alternative = "less")

# Calculate Cliff's delta for effect size
study_data %>%
  filter(condition== "THEMATIC") %>%
  cliffDelta(sim_sick ~ control, data=.)

```

It appears that these findings roughly mirror those of the Presence tests, in that only the thematic breach resulted in a significantly higher level of simulator sickness in participants.