# R Packages
Here I will go into a bit of detail regarding <span class="pack">rstanarm</span> and <span class="pack">brms</span>. For standard models, these should be your first choice rather than using Stan directly. Why? For one, the underlying code will be more optimized and efficient than what you would come up with yourself; it has also had multiple individuals developing it and hundreds actually using it. Furthermore, you can still, and probably should, set your priors as you wish.
The nice thing about both is that you use the same syntax that you do for [R modeling](https://m-clark.github.io/R-models/) in general. Here is a basic GLM in both.
```{r pack_demo, eval=FALSE}
stan_glm(y ~ x + z, data = d)
brm(y ~ x + z, data = d)
```
And here are a couple of complexities thrown in to show some minor differences. For example, the priors are specified a bit differently, and you may have options for one that you won't have in the other, but both allow passing standard arguments, like cores, chains, etc., to <span class="pack">rstan</span>.
```{r pack_demo2, eval=FALSE}
stan_glmer(
  y ~ x + z + (1|g),
  data = d,
  family = binomial(link = "logit"),
  prior = normal(0, 1),
  prior_covariance = decov(regularization = 1, concentration = .5),
  QR = TRUE,
  chains = 2,
  cores = 2,
  iter = 2000
)
brm(
  y ~ x + z + (1|g),
  data = d,
  family = bernoulli(link = 'logit'),
  prior = c(
    prior(normal(0, 1), class = b),
    prior(cauchy(0, 5), class = sd)
  ),
  chains = 2,
  cores = 2,
  iter = 2000
)
```
So the syntax is easy to use for both of them, and up to a point identical to standard R modeling syntax, and both take the same <span class="pack">rstan</span> arguments. However, you'll need to know what's available to tweak, and how to do so, specifically for each package.
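For example, a minimal sketch of how you might see what's tweakable in each, assuming a fitted <span class="pack">rstanarm</span> model `fit` and a data frame `d`, both hypothetical:
```{r pack_priors, eval=FALSE}
# rstanarm: summarize the priors a fitted model actually used
prior_summary(fit)

# brms: list the parameters you can set priors on, and their defaults,
# before fitting anything
get_prior(y ~ x + z + (1|g), data = d)
```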
## Standard Regression and GLM
A good starting point for getting more comfortable with Bayesian analysis is to use it on models you already know well, e.g. the standard linear or generalized linear model, and <span class="pack">rstanarm</span> and <span class="pack">brms</span> will both do this for you. In general, for these models I would suggest <span class="pack">rstanarm</span>, as it will run much faster and is optimized for them.
It's not a good thing that for the most common linear models R has multiple functions spread across multiple packages. So we have the following for standard linear, GLM, and categorical models:
- <span class="func">aov</span>: ANOVA
- <span class="func">lm</span>: standard regression (linear model)
- <span class="func">glm</span>: generalized linear model
- <span class="pack">MASS</span>::<span class="func">glm.nb</span>: negative binomial for count data
- <span class="pack">MASS</span>::<span class="func">polr</span>: ordinal regression model
- <span class="pack">nnet</span>::<span class="func">nnet</span>: multinomial regression model
- <span class="pack">biglm</span>::<span class="func">biglm</span>: big data lm
<span class="pack">rstanarm</span> keeps this nomenclature unfortunately, and currently doesn't offer anything for [multinomial models](https://github.com/stan-dev/rstanarm/issues/20). Thus we have:
- <span class="func">stan_aov</span>: ANOVA
- <span class="func">stan_lm</span>: standard regression (linear model)
- <span class="func">stan_glm</span>: generalized linear model
- <span class="func">stan_glm.nb</span>: negative binomial for count data (equivalently, the neg_binomial_2 family with <span class="func">stan_glm</span>; see the sketch after the list)
- <span class="func">stan_polr</span>: ordinal regression model
- <span class="func">stan_biglm</span>: big data lm
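To make the negative binomial note concrete, a hypothetical sketch with a count outcome `y` in data `d` shows the two equivalent routes:
```{r stan_nb, eval=FALSE}
# these two calls fit the same negative binomial model
stan_glm.nb(y ~ x, data = d)
stan_glm(y ~ x, data = d, family = neg_binomial_2())
```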
Contrast this with <span class="pack">brms</span>, which only requires the <span class="func">brm</span> function and an appropriate family, e.g. 'poisson' or 'categorical', and which can also do multinomial models.
However, if you want to do a standard linear regression, I would probably not recommend using <span class="func">stan_lm</span>, as it requires a prior for the $R^2$, which is unfamiliar and only explained in technical ways that are likely going to be lost on those less comfortable with, or new to, statistical or Bayesian analysis[^stan_lm_vignette]. The good news is that you can simply run <span class="func">stan_glm</span> instead, and work with the prior on the regression coefficients as we have discussed, and you can use <span class="func">bayes_R2</span> to get the $R^2$.
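A minimal sketch of that alternative workflow, again with hypothetical data `d`:
```{r glm_r2, eval=FALSE}
# linear regression via stan_glm, with a prior on the coefficients
fit = stan_glm(y ~ x + z, data = d, prior = normal(0, 1))

# posterior draws of R-squared, summarized
quantile(bayes_R2(fit), c(.025, .5, .975))
```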
You can certainly use <span class="pack">brms</span> for GLM, but it would have to compile the code first, and so will always be relatively, but not prohibitively, slower. For linear models with interactions or GLM generally, you may prefer it for the marginal effects plots and other additional functionality.
## Categorical Models
If you're just doing a standard logistic regression, I'd suggest <span class="func">stan_glm</span>, again for the speed. In addition, it has a specific model function for conditional logistic regression (<span class="func">stan_clogit</span>). Beyond that, I'd recommend <span class="pack">brms</span>. For ordinal regression, <span class="func">stan_polr</span> goes back to requiring a prior for $R^2$, which is now the $R^2$ for the underlying latent variable of the ordinal outcome[^r2_polr]. Furthermore, <span class="pack">brms</span> has some ordinal-specific plots, as well as other types of ordinal regression (e.g. adjacent category) that allow the proportional odds assumption to be relaxed. It can also do multi-category models.
```{r ordinal, eval=FALSE}
brm(y ~ x, data = d, family = 'categorical')  # nominal outcome with > 2 levels
brm(y ~ cs(x), data = d, family = 'acat')     # ordinal model with category-specific effects for x
```
The <span class="pack">brms</span> families for categorical outcomes, a couple of which are sketched after the list:
- <span class="func">bernoulli</span>: binary target
- <span class="func">categorical</span>: nominal target
- <span class="func">cumulative</span>, <span class="func">sratio</span>, <span class="func">cratio</span>, and <span class="func">acat</span>: ordinal outcome (cumulative, stopping ratio, continuation-ratio, adjacent category)
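For instance, hypothetical sketches of a standard proportional odds model and a stopping ratio alternative:
```{r ordinal_families, eval=FALSE}
brm(y ~ x, data = d, family = cumulative('logit'))  # proportional odds
brm(y ~ x, data = d, family = sratio('logit'))      # stopping ratio
```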
## Extended Count Models
For going beyond the binomial, Poisson, and negative binomial distributions for count data, <span class="pack">brms</span> has a lot more in the way of common extensions to those models, and beyond. It also has zero-altered counterparts for continuous distributions (e.g. <span class="func">hurdle_gamma</span>); a couple of these are demonstrated after the list.
- <span class="func">hurdle_poisson</span>
- <span class="func">hurdle_negbinomial</span>
- <span class="func">hurdle_gamma</span>
- <span class="func">hurdle_lognormal</span>
- <span class="func">zero_inflated_poisson</span>
- <span class="func">zero_inflated_negbinomial</span>
- <span class="func">zero_inflated_binomial</span>
- <span class="func">zero_inflated_beta</span>
- <span class="func">zero_one_inflated_beta</span>
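A hypothetical sketch of two of these; note that in <span class="pack">brms</span> the zero (or hurdle) part can get its own predictors via <span class="func">bf</span>:
```{r zero_demo, eval=FALSE}
# hurdle model: one process for zero vs. not-zero, another for the positive counts
brm(y ~ x + z, data = d, family = hurdle_poisson())

# zero-inflated model where the inflation probability has its own predictor
brm(bf(y ~ x + z, zi ~ x), data = d, family = zero_inflated_negbinomial())
```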
As mentioned previously, there is currently no direct way to do multinomial count models[^nomulti], except indirectly via the Poisson.
## Mixed Models
The Bayesian approach really shines for mixed models in my opinion, where the random effects are estimated like other parameters; complicated structures are notably easier to deal with, and extending such models to other distribution families is straightforward. For the usual speed boost you can use <span class="pack">rstanarm</span>:
- <span class="func">stan_lmer</span>: standard <span class="pack">lme4</span> style mixed model
- <span class="func">stan_glmer</span>: glmm
- <span class="func">stan_glmer.nb</span>: for negative binomial
- <span class="func">stan_nlmer</span>: <span class="pack">nlme</span> (but see stan_gamm4)
- <span class="func">stan_mvmer</span>: multivariate outcome
- <span class="func">stan_gamm4</span>: generalized additive mixed model in <span class="pack">lme4</span> style
I would probably recommend <span class="pack">rstanarm</span> only for the <span class="func">stan_lmer</span> and <span class="func">stan_glmer</span> cases, as <span class="pack">brms</span> has more flexibility, and would be preferred even for those standard models if you want to estimate a residual (co-)variance structure, e.g. autocorrelation. It will also do multivariate models, and one can use <span class="pack">mgcv</span>::<span class="func">s</span> for smooth terms in any <span class="pack">brms</span> model.
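A hypothetical sketch of the two side by side, plus an <span class="pack">mgcv</span>-style smooth term in <span class="pack">brms</span>:
```{r mixed_demo, eval=FALSE}
# the same lme4-style mixed logit model in each package
stan_glmer(y ~ x + (1|g), data = d, family = binomial)
brm(y ~ x + (1|g), data = d, family = bernoulli)

# brms also accepts mgcv-style smooth terms alongside random effects
brm(y ~ s(x) + (1|g), data = d)
```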
## Other Models and Related
Along with all of those, <span class="pack">rstanarm</span> has specific functions for [beta regression](http://mc-stan.org/rstanarm/articles/betareg.html), [joint mixed/survival models](http://mc-stan.org/rstanarm/articles/jm.html), and [regularized linear regression](http://mc-stan.org/rstanarm/articles/lm.html). <span class="pack">brms</span> has many more distributional families, can do hypothesis testing, has marginal effects plots, and more. Both have plenty of tools for diagnostics, posterior predictive checks, and more of what has been discussed previously.
In general, <span class="pack">rstanarm</span> is a great tool for translating your standard models into Bayesian ones in an efficient fashion. On the other hand, <span class="pack">brms</span> uses a simplified syntax and is notably more flexible. Here is a brief summary of my recommended use for common regression models.
```{r arm_brms_comparison, echo=FALSE}
tibble(
  Analysis = c(
    'lm',
    'glm',
    'multinomial',
    'ordinal',
    'mixed',
    'additive',
    'regularized',
    'beyond'
  ),
  rstanarm = c('√', '√', '', '', '√', '√', '√', ''),
  brms = c('', '', '√', '√', '√', '√', '√', '√')
) %>%
  kable(width = '50%', align = 'lcc') %>%
  kable_styling(full_width = FALSE)
```
Besides that, if you need to model complexity not found within those, you can still use them to generate highly optimized starter code, as they have functions for generating the underlying Stan code without fitting the model.
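In <span class="pack">brms</span>, for example, <span class="func">make_stancode</span> and <span class="func">make_standata</span> produce the Stan code and the corresponding data list without fitting anything (sketched with hypothetical data `d`):
```{r stancode_demo, eval=FALSE}
# emit the underlying Stan code and data for a model, with no fitting
make_stancode(y ~ x + z + (1|g), data = d)
make_standata(y ~ x + z + (1|g), data = d)
```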
## Even More Packages
I've focused on the two widely-used general-purpose packages, but nothing can stop Stan at this point. As of the latest update to this document, there are now `r length(devtools::revdep('rstan'))` other packages depending on <span class="pack">rstan</span> in some way. There are also interfaces for Python, Julia, and more. Odds are good you'll find one to suit your needs.
```{r stan_getgraph, eval=FALSE, echo=FALSE}
# no longer works, and wasn't easily duplicated with e.g. devtools::revdep
get_depgraph = function(pack) {
  require(httr)
  url = paste0('http://rdocumentation.org/api/packages/', pack, '/reversedependencies')
  out = GET(url) %>%
    content()
  node_df = data.table::rbindlist(out$nodes, fill = TRUE)
  edge_df = data.table::rbindlist(out$links, fill = TRUE)
  list(node_df = node_df, edge_df = edge_df)
}
stan_depgraph = get_depgraph('rstan')
```
```{r stan_graph, echo=F, eval=F, cache=FALSE}
library(visNetwork)

nodes = stan_depgraph$node_df %>%
  rename(label = name) %>%
  rownames_to_column(var = 'id') %>%
  mutate(id = as.integer(id) - 1)

edges = stan_depgraph$edge_df %>%
  rename(from = source,
         to = target)

listed_packs = c('brms', 'bayesplot', 'rstanarm', 'shinystan', 'loo', 'tidybayes', 'rethinking')

edges_trim = edges %>%
  filter(to == 0) %>%
  mutate(value = if_else(from %in% nodes$id[nodes$label %in% listed_packs], 30L, value),
         length = if_else(from %in% nodes$id[nodes$label %in% listed_packs], 0, 250))

nodes_trim = nodes %>%
  filter(id %in% c(0, edges_trim$from)) %>%
  mutate(value = c(100, rep(0, nrow(.) - 1)),
         value = if_else(label %in% listed_packs, 30, value))

visNetwork(nodes = nodes_trim,
           edges = edges_trim) %>%
  visEdges(color = list(opacity = .10),
           # length = c(200, 10),
           physics = TRUE)
```
[^stan_lm_vignette]: The developers note in their [vignette for ANOVA](https://cran.r-project.org/web/packages/rstanarm/vignettes/aov.html):<br><br>'but it is reasonable to expect a researcher to have a plausible guess for R2 before conducting an ANOVA.'<br><br> Actually, I'm not sure how reasonable this is. In consulting I've seen many, many researchers of varying levels of expertise, and I don't think even a large minority of them would be able to hazard much of a guess about $R^2$ before running a model, unless they're essentially duplicating previous work. I also haven't come across an explanation in the documentation (which is otherwise great) of how to specify it that would be very helpful to people just starting out with Bayesian analysis or even statistics in general. If the result is that one then has to try a bunch of different priors, then that becomes the focus of the analytical effort, which likely won't appeal to people just wanting to run a standard regression model.
[^r2_polr]: If someone tells me they know what the prior should be for that, I probably would not believe them.