---
title: "BV to Weight Analysis"
format:
  html:
    embed-resources: true
    toc: true
execute:
  echo: false
  warning: false
  message: false
---
```{r}
#| label: load-libraries
#| include: false
library(here)
library(tidyverse)
library(readr)
library(modelsummary)
library(stringr)
library(gt)
```
```{r}
#| label: read-data
#| echo: false
# Read the pipe-delimited unit export and standardize column names
units <- read_delim(here("data_raw", "Units.txt"), delim = "|")
units <- units |>
  rename(unit_name = Combined, clan = Clan, weight = Weight,
         date = `Intro Date`, unit_type = `Unit Type`, bv = BV,
         tech = `Tech Rating`) |>
  mutate(
    # First character of the tech rating
    quality = str_sub(tech, 1, 1),
    # Collapse closely related unit types into a single category
    unit_type = case_when(
      unit_type %in% c("Biped Mech", "Quad Mech", "Tripod Mech",
                       "Landair Mech") ~ "Mech",
      unit_type %in% c("Tank", "Superheavy Tank") ~ "Tank",
      unit_type %in% c("Support Tank", "Large Support Tank") ~ "Support Tank",
      TRUE ~ unit_type
    ),
    unit_type = relevel(factor(unit_type), "Mech"),
    # Assign each unit type to an analysis group
    group = case_when(
      unit_type %in% c("Mech", "ProtoMech", "Tank", "VTOL", "Battlearmor",
                       "Infantry", "Aerospace fighter",
                       "Conventional Fighter") ~ "Combat",
      str_detect(unit_type, "Support") | unit_type == "Small Craft" ~ "Support",
      unit_type == "Gun Emplacement" ~ "Gun Emplacement",
      TRUE ~ "Large Craft"
    )
  ) |>
  select(unit_name, clan, weight, date, quality, unit_type, group, bv)
```

## Introduction

The goal of this analysis is to find a relatively simple equation that predicts the Battle Value (BV) of a unit from its weight (i.e., tonnage) and unit type. This equation lets us assign a "generic" BV to every unit. Generic BV can be used to balance forces when you don't want unit or pilot quality to determine that balance.

For the analysis, we downloaded information on all units in the current MegaMek database using existing Java features. We combined some unit types for ease of analysis:

* All mech types, including quad mechs, tripod mechs, and land-air mechs (LAMs), were combined into a single Mech category.
* Superheavy Tanks were combined with Tanks.
* Large Support Tanks were combined with Support Tanks.

We then split unit types into three groups for the analysis: Combat, Support, and Large Craft. Gun emplacements, which have no weight, are handled separately.

We considered two basic models for predicting BV. The first is a linear model:

$$\hat{bv}_i = \beta_0 + \beta_1 \, weight_i$$

This model predicts BV as a linear function of weight for all units of the same type. The slope ($\beta_1$) is the expected increase in BV for a one-ton increase in weight. The intercept ($\beta_0$) is the predicted BV at zero weight. It is not a meaningful value here, but it is needed to anchor the fitted line.
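As a minimal sketch of how the slope is read, the block below fits the linear model to simulated data. The weights, the slope of about 15 BV per ton, and the noise level are invented for illustration and are not MegaMek values. The difference between two predictions ten tons apart is exactly ten times $\beta_1$.

```r
# Hypothetical simulated data (not MegaMek units): BV rises by about
# 15 per ton, plus random noise
set.seed(42)
sim <- data.frame(weight = seq(20, 100, by = 5))
sim$bv <- 100 + 15 * sim$weight + rnorm(nrow(sim), sd = 40)

# Fit the linear model: bv = beta_0 + beta_1 * weight
fit_linear <- lm(bv ~ weight, data = sim)
slope <- coef(fit_linear)[["weight"]]

# Predicted BV for a 50-ton and a 60-ton unit
pred <- predict(fit_linear, newdata = data.frame(weight = c(50, 60)))

# The 10-ton difference in predicted BV equals 10 * beta_1
diff_10t <- unname(pred[2] - pred[1])
print(c(slope = slope, diff_10t = diff_10t, ten_times_slope = 10 * slope))
```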
We also considered an "elasticity" model:

$$\log(\hat{bv}_i) = \beta_0 + \beta_1 \log(weight_i)$$

In this model, we take the natural log of both the dependent and the independent variable. The slope ($\beta_1$) can then be interpreted, approximately, as the percent change in BV for a one percent increase in tonnage. In short, the model estimates relative change on relative change rather than absolute change on absolute change. The model is somewhat less intuitive to interpret, but, as we show below, it often fits this kind of data better.
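The sketch below illustrates these interpretations on simulated data that follow a power law, $bv = 30 \cdot weight^{1.2}$, with small multiplicative noise (the constants are invented for illustration). Exponentiating both sides of the elasticity model gives $\hat{bv}_i = e^{\beta_0} \, weight_i^{\beta_1}$, which is the form one would use to compute a generic BV, and doubling weight multiplies predicted BV by $2^{\beta_1}$.

```r
# Hypothetical simulated data following bv = 30 * weight^1.2,
# with multiplicative (log-scale) noise
set.seed(1)
sim <- data.frame(weight = seq(20, 100, by = 5))
sim$bv <- 30 * sim$weight^1.2 * exp(rnorm(nrow(sim), sd = 0.05))

# Fit the elasticity model: log(bv) = beta_0 + beta_1 * log(weight)
fit_elastic <- lm(log(bv) ~ log(weight), data = sim)
b0 <- coef(fit_elastic)[["(Intercept)"]]
b1 <- coef(fit_elastic)[["log(weight)"]]

# Back-transformed prediction: exp(b0) * weight^b1
generic <- function(w) exp(b0) * w^b1

# Doubling tonnage multiplies predicted BV by 2^b1;
# a 1% increase multiplies it by 1.01^b1, roughly 1 + 0.01 * b1
ratio_double <- generic(100) / generic(50)
print(c(b1 = b1, ratio_double = ratio_double, two_pow_b1 = 2^b1))
```

If the log-scale residuals are roughly symmetric, exponentiating a log-scale prediction estimates something closer to a typical (median) BV at that tonnage than the mean BV.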
## Combat Units

We focus first on the combat units, as these are the most likely to need to be randomly generated. @fig-combat-linear-fit shows a scatterplot of BV against weight for every unit, faceted by combat unit type. The blue lines show the fit of a linear model, while the red lines are flexible smoothed fits (ggplot2's default smoother, LOESS or a GAM depending on the number of points) that can reveal non-linear relationships.
```{r}
#| label: fig-combat-linear-fit
#| fig-cap: Linear fit of weight to BV for all combat unit types. Blue line shows the linear fit and the red line shows a smoothed line.
units |>
  filter(group == "Combat") |>
  ggplot(aes(x = weight, y = bv)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(color = "red", se = FALSE) +
  facet_wrap(~unit_type, scales = "free") +
  theme_bw()
```
A linear fit works well for most of the unit types in @fig-combat-linear-fit. The two biggest discrepancies are the diminishing-returns shape at very high tonnages for mechs and the strongly non-linear relationship for infantry. The mech issue is likely the result of a single non-canonical 200-ton "Orca" mech that was part of an [April Fools' joke](https://www.sarna.net/wiki/Orca_%28BattleMech%29). The infantry issue is harder to resolve. Most infantry units weigh very little and so are concentrated at the lower left of the figure, but there is a spike around 50 tons. This spike comes largely from field gunner units, which are both much heavier and more effective. The few points around 150 tons are beast-mounted infantry.

@fig-combat-elasticity-fit shows the same plots, but with both axes on a log scale. The straight-line fit on these plots is therefore the fit of an elasticity model. This model fits very well across all unit types. Most notably, the log transformation largely removes the nonlinearity for infantry units. We do now see slight nonlinearity for mechs at low tonnages, but it involves only a very small set of industrial mechs.
```{r}
#| label: fig-combat-elasticity-fit
#| fig-cap: Fit of weight to BV for all combat unit types using a log-scale on both axes. Blue line shows the linear fit and the red line shows a smoothed line.
units |>
  filter(group == "Combat") |>
  ggplot(aes(x = weight, y = bv)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(color = "red", se = FALSE) +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~unit_type, scales = "free") +
  theme_bw()
```
To get a sense of how these models compare, we fit each type of model to each unit type. The results are shown in @tbl-model-linear-combat for the linear models, and @tbl-model-elasticity-combat for the elasticity models.
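The table chunks in this document fit one model per unit type with `group_split()` and `map()`, then assign the list names in a separate step. As a sketch of the same pattern rather than the code used for the tables, it can be written in base R with `split()`, which names the list by unit type automatically; `summary(fit)$r.squared` extracts the $R^2$ reported in the tables. The data frame and its two unit types are simulated for illustration.

```r
# Hypothetical simulated data with two unit types
set.seed(7)
sim <- data.frame(
  unit_type = rep(c("Mech", "Tank"), each = 30),
  weight = rep(seq(20, 100, length.out = 30), times = 2)
)
sim$bv <- ifelse(sim$unit_type == "Mech", 25, 15) * sim$weight^1.1 *
  exp(rnorm(nrow(sim), sd = 0.1))

# One elasticity model per unit type; split() names the list by unit type
models <- lapply(split(sim, sim$unit_type),
                 function(d) lm(log(bv) ~ log(weight), data = d))

# Extract R^2 from each model (vapply keeps the unit-type names)
r2 <- vapply(models, function(m) summary(m)$r.squared, numeric(1))
print(round(r2, 3))
```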
```{r}
#| label: tbl-model-linear-combat
#| tbl-cap: Linear models predicting unit BV by its weight for all combat unit types.
models <- units |>
  filter(group == "Combat") |>
  group_by(unit_type) |>
  group_split() |>
  map(.f = function(x) {
    lm(bv ~ weight, data = x)
  })
names(models) <- levels(droplevels(units[units$group == "Combat", ]$unit_type))
modelsummary(models, fmt = 2, gof_map = c("nobs", "r.squared"),
             statistic = NULL,
             coef_rename = c("(Intercept)" = "Intercept", "weight" = "Tonnage"))
```
```{r}
#| label: tbl-model-elasticity-combat
#| tbl-cap: Elasticity models predicting unit BV by its weight for all combat unit types.
models <- units |>
  filter(group == "Combat") |>
  group_by(unit_type) |>
  group_split() |>
  map(.f = function(x) {
    # Elasticity model: regress log BV on log weight
    lm(log(bv) ~ log(weight), data = x)
  })
names(models) <- levels(droplevels(units[units$group == "Combat", ]$unit_type))
modelsummary(models, fmt = 3, gof_map = c("nobs", "r.squared"),
             statistic = NULL,
             coef_rename = c("(Intercept)" = "Intercept",
                             "log(weight)" = "Tonnage (logged)"))
```
For every combat unit type, the goodness-of-fit measure ($R^2$) is either better or effectively the same for the elasticity model as for the linear model. Combined with its ability to fix the nonlinearity problem for infantry, these results strongly favor using an elasticity model to generate generic BV.

The goodness of fit does vary substantially across unit types in both models. While this is not ideal, it does not necessarily indicate an underlying problem with the models. Rather, it indicates that matching forces on generic BV will leave more variation in actual BV for unit types with a lower $R^2$.
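One way to translate a lower $R^2$ into practical terms is sketched here on simulated data with assumed noise levels. In the elasticity model the residual standard deviation $\sigma$ is on the log scale, so if residuals are roughly normal, about 68% of units of that type have an actual BV within a factor of $e^{\sigma}$ of their generic BV.

```r
# Two hypothetical unit types that differ only in how noisy BV is
set.seed(3)
weight <- seq(20, 100, length.out = 200)
tight <- 20 * weight * exp(rnorm(200, sd = 0.1))
loose <- 20 * weight * exp(rnorm(200, sd = 0.4))

fit_tight <- lm(log(tight) ~ log(weight))
fit_loose <- lm(log(loose) ~ log(weight))

# sigma() is the residual standard deviation on the log scale;
# exp(sigma) is the typical multiplicative spread around generic BV
spread <- c(tight = exp(sigma(fit_tight)), loose = exp(sigma(fit_loose)))
r2 <- c(tight = summary(fit_tight)$r.squared,
        loose = summary(fit_loose)$r.squared)
print(round(rbind(spread, r2), 3))

# Share of units whose actual BV is within a factor exp(sigma) of the prediction
within <- mean(abs(residuals(fit_loose)) <= sigma(fit_loose))
print(within)
```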
## Support Units

@fig-support-linear-fit and @fig-support-elasticity-fit show the same plots as above for support units.
```{r}
#| label: fig-support-linear-fit
#| fig-cap: Linear fit of weight to BV for all support unit types. Blue line shows the linear fit and the red line shows a smoothed line.
units |>
  filter(group == "Support") |>
  ggplot(aes(x = weight, y = bv)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(color = "red", se = FALSE) +
  facet_wrap(~unit_type, scales = "free") +
  theme_bw()
```
```{r}
#| label: fig-support-elasticity-fit
#| fig-cap: Fit of weight to BV for all support unit types using a log-scale on both axes. Blue line shows the linear fit and the red line shows a smoothed line.
units |>
  filter(group == "Support") |>
  ggplot(aes(x = weight, y = bv)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(color = "red", se = FALSE) +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~unit_type, scales = "free") +
  theme_bw()
```
Because there are far fewer units of these types and they are not necessarily optimized for combat, the smoothed lines are noisier. Even so, both the linear and the elasticity models fit reasonably well. The elasticity model appears to improve the fit substantially for support tanks in particular.

@tbl-model-linear-support and @tbl-model-elasticity-support provide tables analogous to those for combat units. In all four cases, an elasticity model substantially improves fit.
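A caveat when comparing the two tables: the linear model's $R^2$ is computed on BV, whereas the elasticity model's $R^2$ is computed on log(BV), so the two numbers are not on the same scale. A minimal sketch of one way to put them on a common footing, computing $1 - SSE/SST$ on the BV scale for the back-transformed elasticity predictions, is shown here using simulated power-law data rather than MegaMek units.

```r
# Hypothetical power-law data with multiplicative noise
set.seed(11)
sim <- data.frame(weight = runif(150, 5, 100))
sim$bv <- 40 * sim$weight^1.3 * exp(rnorm(150, sd = 0.2))

fit_linear  <- lm(bv ~ weight, data = sim)
fit_elastic <- lm(log(bv) ~ log(weight), data = sim)

# R^2 on the original BV scale: 1 - SSE / SST
r2_bv_scale <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

comparison <- c(
  linear      = r2_bv_scale(sim$bv, fitted(fit_linear)),
  elastic_log = summary(fit_elastic)$r.squared,                # log scale
  elastic_bv  = r2_bv_scale(sim$bv, exp(fitted(fit_elastic)))  # BV scale
)
print(round(comparison, 3))
```

For the linear model this BV-scale measure equals the usual $R^2$, so if the elasticity model still comes out ahead on the BV scale, its advantage in the tables is not just an artifact of the log transformation.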
```{r}
#| label: tbl-model-linear-support
#| tbl-cap: Linear models predicting unit BV by its weight for all support unit types.
models <- units |>
  filter(group == "Support") |>
  group_by(unit_type) |>
  group_split() |>
  map(.f = function(x) {
    lm(bv ~ weight, data = x)
  })
names(models) <- levels(droplevels(units[units$group == "Support", ]$unit_type))
modelsummary(models, fmt = 2, gof_map = c("nobs", "r.squared"),
             statistic = NULL,
             coef_rename = c("(Intercept)" = "Intercept", "weight" = "Tonnage"))
```
```{r}
#| label: tbl-model-elasticity-support
#| tbl-cap: Elasticity models predicting unit BV by its weight for all support unit types.
models <- units |>
  filter(group == "Support") |>
  group_by(unit_type) |>
  group_split() |>
  map(.f = function(x) {
    # Elasticity model: regress log BV on log weight
    lm(log(bv) ~ log(weight), data = x)
  })
names(models) <- levels(droplevels(units[units$group == "Support", ]$unit_type))
modelsummary(models, fmt = 3, gof_map = c("nobs", "r.squared"),
             statistic = NULL,
             coef_rename = c("(Intercept)" = "Intercept",
                             "log(weight)" = "Tonnage (logged)"))
```
## Large Craft
@fig-large-craft-linear-fit and @fig-large-craft-elasticity-fit show the analogous plots for large craft.

The dropship and space station fits are particularly poor here. The space station fit is poor because there are very few points and one massive outlier. The dropship data also contain a couple of outliers that throw off the results. Visually, however, switching to an elasticity model improves the fit substantially in both cases.
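A minimal sketch of how outliers like these can be flagged, using simulated data with one planted outlier (not actual dropship data). `cooks.distance()` measures how much each point moves the fitted line; the $4/n$ cutoff used here is a common rule of thumb, not a rule taken from this analysis.

```r
# Hypothetical data: 30 well-behaved units plus one planted outlier
set.seed(5)
sim <- data.frame(weight = seq(1000, 30000, length.out = 30))
sim$bv <- 0.5 * sim$weight * exp(rnorm(30, sd = 0.1))
sim <- rbind(sim, data.frame(weight = 25000, bv = 60000))  # row 31

fit_linear <- lm(bv ~ weight, data = sim)

# Flag rows whose Cook's distance exceeds the 4 / n rule of thumb
flagged <- which(cooks.distance(fit_linear) > 4 / nobs(fit_linear))
print(flagged)

# Refit without the flagged rows; setdiff() avoids the sim[-integer(0), ]
# trap, which would drop every row if nothing were flagged
keep <- setdiff(seq_len(nrow(sim)), flagged)
refit <- lm(bv ~ weight, data = sim[keep, ])
print(c(with_outliers = coef(fit_linear)[["weight"]],
        without       = coef(refit)[["weight"]]))
```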
```{r}
#| label: fig-large-craft-linear-fit
#| fig-cap: Linear fit of weight to BV for all large craft unit types. Blue line shows the linear fit and the red line shows a smoothed line.
units |>
  filter(group == "Large Craft") |>
  ggplot(aes(x = weight, y = bv)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(color = "red", se = FALSE) +
  facet_wrap(~unit_type, scales = "free") +
  theme_bw()
```
```{r}
#| label: fig-large-craft-elasticity-fit
#| fig-cap: Fit of weight to BV for all large craft unit types using a log-scale on both axes. Blue line shows the linear fit and the red line shows a smoothed line.
units |>
  filter(group == "Large Craft") |>
  ggplot(aes(x = weight, y = bv)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(color = "red", se = FALSE) +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~unit_type, scales = "free") +
  theme_bw()
```
@tbl-model-linear-large-craft and @tbl-model-elasticity-large-craft show the analogous tables for large craft. For Jumpships and Warships, the elasticity model fits substantially better. The space station is the only unit type, in any group, for which the elasticity model has a lower $R^2$ than the linear model. However, given how sparse the data are, we don't think randomly generating space stations makes much sense anyway.
```{r}
#| label: tbl-model-linear-large-craft
#| tbl-cap: Linear models predicting unit BV by its weight for all large craft unit types.
models <- units |>
  filter(group == "Large Craft") |>
  group_by(unit_type) |>
  group_split() |>
  map(.f = function(x) {
    lm(bv ~ weight, data = x)
  })
names(models) <- levels(droplevels(units[units$group == "Large Craft", ]$unit_type))
modelsummary(models, fmt = 3, gof_map = c("nobs", "r.squared"),
             statistic = NULL,
             coef_rename = c("(Intercept)" = "Intercept", "weight" = "Tonnage"))
```
```{r}
#| label: tbl-model-elasticity-large-craft
#| tbl-cap: Elasticity models predicting unit BV by its weight for all large craft unit types.
models <- units |>
  filter(group == "Large Craft") |>
  group_by(unit_type) |>
  group_split() |>
  map(.f = function(x) {
    # Elasticity model: regress log BV on log weight
    lm(log(bv) ~ log(weight), data = x)
  })
names(models) <- levels(droplevels(units[units$group == "Large Craft", ]$unit_type))
modelsummary(models, fmt = 4, gof_map = c("nobs", "r.squared"),
             statistic = NULL,
             coef_rename = c("(Intercept)" = "Intercept",
                             "log(weight)" = "Tonnage (logged)"))
```
Interestingly, among large craft the $R^2$ values are lowest for dropships. This suggests that, regardless of the model, generating dropships by generic BV will produce a high level of variability in actual BV.

## Gun Emplacements

Gun emplacements need to be handled differently because they currently have no weight. More generally, gun emplacements need reworking in several respects, including balance. For now, we simply use the mean BV of gun emplacements as a static generic BV.
```{r}
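#| label: gun-emplacement-bv
# Gun emplacements have no weight, so their generic BV is the group's mean BV
units |>
  filter(group == "Gun Emplacement") |>
  pull(bv) |>
  mean()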
```
## Conclusions
Overall, the results indicate that we should use an elasticity model for every unit type that has a weight, with gun emplacements assigned a fixed mean BV. The elasticity model improved fit, sometimes substantially, for most unit types and did not produce any noticeably poor fits.
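Putting these conclusions together, a generic BV calculator could look like the sketch below. The function name `generic_bv()`, the coefficient table layout, and every number in it are hypothetical placeholders, not the estimates from the tables above; in practice the intercepts and slopes would be taken from the elasticity models and the gun emplacement value from the computed mean. Rounding to a whole number is an assumed design choice.

```r
# Hypothetical coefficient table: one row per unit type.
# These numbers are placeholders, NOT estimates from this analysis.
coefs <- data.frame(
  unit_type = c("Mech", "Tank"),
  b0 = c(3.0, 2.5),  # intercept on the log scale
  b1 = c(1.2, 1.1)   # elasticity of BV with respect to weight
)
gun_emplacement_bv <- 500  # placeholder for the mean gun emplacement BV

# Generic BV from the elasticity model: exp(b0) * weight^b1
# (handles one unit at a time)
generic_bv <- function(unit_type, weight) {
  if (unit_type == "Gun Emplacement") return(gun_emplacement_bv)
  row <- coefs[coefs$unit_type == unit_type, ]
  if (nrow(row) != 1) stop("no coefficients for unit type: ", unit_type)
  round(exp(row$b0) * weight^row$b1)
}

print(generic_bv("Mech", 50))
print(generic_bv("Gun Emplacement", NA))
```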