# The Multinomial Logit Model
```{r setup}
library(tidyverse)
library(kableExtra)
#' Create a tibble that evaluates the named utility expressions passed in
#' and computes the corresponding exponents and MNL probabilities.
utility_table <- function(expressions){
top <- tibble(
Alternative = names(expressions),
Expression = expressions
) %>%
rowwise() %>%
mutate(
Value = eval(parse(text = Expression)),
Exponent = exp(Value)
) %>%
ungroup() %>%
mutate( Probability = Exponent / sum(Exponent) )
bottom <- tibble(
Alternative = "Total",
Expression = "",
Value = NA,
Exponent = sum(top$Exponent),
Probability = sum(top$Probability)
)
bind_rows(top, bottom)
}
```
## Overview Description and Functional Form
The mathematical form of a discrete choice model is determined by the
assumptions made regarding the error components of the utility function for each
alternative as described in section 3.5. The specific assumptions that lead to
the Multinomial Logit Model are (1) the error components are extreme-value (or
Gumbel) distributed, (2) the error components are identically and independently
distributed across alternatives, and (3) the error components are identically
and independently distributed across observations/individuals. We discuss each
of these assumptions below.
The most common assumption for error distributions in the statistical and
modeling literature is that errors are distributed normally. There are good
theoretical and practical reasons for using the normal distribution for many
modeling applications. However, in the case of choice models the normal
distribution assumption for error terms leads to the Multinomial Probit Model
(MNP) which has some properties that make it difficult to use in choice
analysis.[^numericalproblems] The Gumbel distribution is selected because it
has computational advantages in a context where maximization is important,
closely approximates the normal distribution (see Figure \@ref(fig:gumbelpdf)
and Figure \@ref(fig:gumbelcdf)) and
produces a closed-form[^withoutnumint] probabilistic choice model.
```{r gumbelpdf, fig.cap="Probability density function for normal and Gumbel distributions."}
pdf <- tibble(
x = seq(-5, 5, by = 0.01),
Normal = dnorm(x, sd = sqrt(pi^2/6)),
Gumbel = exp(-x) * exp(-exp(-x))
)
ggplot(pdf %>% gather(key = "Distribution", value = "Probability", -x),
aes(x = x, y = Probability, color = Distribution)) +
geom_line() +
theme_bw()
```
```{r gumbelcdf, fig.cap = "Cumulative distribution function for normal and Gumbel distributions."}
cdf <- tibble(
x = seq(-4, 5, by = 0.01),
Normal = pnorm(x, sd = sqrt(pi^2 / 6)),
Gumbel = exp(-exp(-x))
)
ggplot(cdf %>% gather(key = "Distribution", value = "Probability", -x),
aes(x = x, y = Probability, color = Distribution)) +
geom_line() +
theme_bw()
```
The Gumbel has the following cumulative distribution and probability density
functions:
\begin{equation}
F(\epsilon) = e^{-e^{-\mu(\epsilon-\eta)}}
(\#eq:gumbelcumdist)
\end{equation}
\begin{equation}
f(\epsilon) = \mu e^{-\mu(\epsilon-\eta)} \times e^{-e^{-\mu(\epsilon-\eta)}}
(\#eq:gumbelprobdens)
\end{equation}
where
- $\mu$ is the scale parameter which determines the variance of the distribution and
- $\eta$ is the location (mode) parameter.
The mean and variance of the distribution are:
\begin{equation}
Mean = \eta + \frac{0.577}{\mu}
(\#eq:meandistribution)
\end{equation}
\begin{equation}
Variance = \frac{\pi^2}{6\mu^2}
(\#eq:variancedistribution)
\end{equation}
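As a rough numerical check of the mean and variance expressions above, the sketch below integrates the Gumbel density on a fine grid. The function name `gumbel_pdf()`, the grid limits, and the parameter values ($\mu = 2$, $\eta = 1$) are assumptions chosen for illustration; they are not part of the original text.

```r
# Gumbel density with scale mu and location (mode) eta.
# Written as exp(-z - exp(-z)) to avoid overflow in the lower tail.
gumbel_pdf <- function(eps, mu = 1, eta = 0) {
  z <- mu * (eps - eta)
  mu * exp(-z - exp(-z))
}

mu  <- 2   # illustrative scale parameter (assumed)
eta <- 1   # illustrative location parameter (assumed)

# Fine grid that covers essentially all of the probability mass
step <- 0.0005
eps  <- seq(eta - 20 / mu, eta + 50 / mu, by = step)
dens <- gumbel_pdf(eps, mu, eta)

total_mass <- sum(dens) * step
num_mean   <- sum(eps * dens) * step
num_var    <- sum((eps - num_mean)^2 * dens) * step

# Closed-form values; 0.577... is the Euler-Mascheroni constant
euler_gamma <- -digamma(1)
c(numerical = num_mean, formula = eta + euler_gamma / mu)
c(numerical = num_var,  formula = pi^2 / (6 * mu^2))
```

The two pairs of values should agree to several decimal places, which also confirms that a larger $\mu$ gives a smaller variance.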
The second and third assumptions state the location and variance of the
distribution just as $\mu$ and $\sigma^2$ indicate the location and variance of
the normal distribution. We will return to the discussion of the independence
between/among alternatives in [Chapter 8](#nested-logit-model).
The three assumptions, taken together, lead to the mathematical structure known
as the Multinomial Logit Model (MNL), which gives the choice probabilities of
each alternative as a function of the systematic portion of the utility of all
the alternatives. The general expression for the probability of choosing an
alternative ‘*i*’ (*i = 1,2,.., J*) from a set of *J* alternatives is:
\begin{equation}
Pr(i) = \frac{\exp(V_i)}{\sum_{j=1}^{J}\exp(V_j)}
(\#eq:mnl)
\end{equation}
where
- $Pr(i)$ is the probability of the decision-maker choosing alternative *i* and
- $V_j$ is the systematic component of the utility of alternative *j*.
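A minimal sketch of Equation \@ref(eq:mnl) in R follows. The helper name `mnl_probs()` and the utility values are assumptions made for illustration only.

```r
# Multinomial logit probabilities for a named vector of systematic utilities
mnl_probs <- function(v) {
  exp(v) / sum(exp(v))
}

# Illustrative utilities for three alternatives (assumed values)
v <- c(DA = 0.0, SR = -1.5, TR = -0.5)
p <- mnl_probs(v)

round(p, 3)
sum(p)  # probabilities over the full choice set sum to one
```

Passing the whole utility vector at once keeps the denominator identical for every alternative, which is what guarantees that the probabilities sum to one.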
The exponential function is described in Figure \@ref(fig:viwithexpVi) which shows the relationship
between $\exp(V_i)$ and $V_i$. Note that $\exp(V_i)$ is always positive and
increases monotonically with $V_i$.
```{r viwithexpVi, fig.cap="Relationship Between $V_i$ and $\\exp(V_i)$"}
tibble( x = seq(-3,3,.1), y = exp(x)) %>%
ggplot(aes(x = x, y = y)) +
geom_line() +
xlab(bquote(V[i])) +
ylab(bquote(exp(V[i]))) +
theme_bw()
```
The multinomial logit (MNL) model has several important properties. We
illustrate these for a case in which the decision maker has three available
alternatives: Drive Alone (DA), Shared Ride (SR), and TRansit (TR). The
probabilities of each alternative are given by modifying Equation \@ref(eq:mnl) for each
alternative to obtain:
\begin{equation}
Pr(DA) = \frac{\exp(V_{DA})}{\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})}
(\#eq:prDAformnl)
\end{equation}
\begin{equation}
Pr(SR) = \frac{\exp(V_{SR})}{\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})}
(\#eq:prSRformnl)
\end{equation}
\begin{equation}
Pr(TR) = \frac{\exp(V_{TR})}{\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})}
(\#eq:prTRformnl)
\end{equation}
where $Pr(DA)$, $Pr(SR)$, and $Pr(TR)$ are the probabilities of the
decision-maker choosing drive alone, shared ride and transit, respectively, and
$V_{DA}$, $V_{SR}$ and $V_{TR}$ are the systematic components of the utility for
drive alone, shared ride, and transit alternatives, respectively. It is common
to replace these three equations by a single general equation to represent the
probability of any alternative and to simplify the equation by replacing the
explicit summation in the denominator by the summation over alternatives as:
\begin{equation}
Pr(i) = \frac{\exp(V_{i})}{\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})}
(\#eq:mnlprob)
\end{equation}
\begin{equation}
Pr(i) = \frac{\exp(V_{i})}{\sum_{j=DA,SR,TR}\exp(V_j)}
(\#eq:mnlprobgeneral)
\end{equation}
where *i* indicates the alternative for which the probability is being computed.
This formulation implies that the probability of choosing an alternative
increases monotonically with an increase in the systematic utility of that
alternative and decreases with increases in the systematic utility of each of
the other alternatives. This is illustrated in Table \@ref(tab:dralonefordralone) showing the
probability of DA as a function of its own utility (with the utilities of other
alternatives held constant) and in Table \@ref(tab:draloneforshrridetransit)
as a function of the utility of other alternatives with its own utility fixed.
```{r dralonefordralone}
tibble(
case = 1:5,
vda = c(-3.0, -1.5, 0.0, 1.5, 3.0),
vsr = -1.5,
vtr = -0.5
) %>%
mutate(
p = exp(vda) / (exp(vda) + exp(vsr) + exp(vtr))
) %>%
kableExtra::kbl(
caption = "Probability Values for Drive Alone as a Function of Drive Alone Utility (Shared Ride and Transit Utilities held constant)",
col.names = c("$Case$", "$V_{DA}$", "$V_{SR}$", "$V_{TR}$", "$Pr(DA)$")) %>%
kableExtra::kable_styling()
```
```{r draloneforshrridetransit}
tibble(
case = 6:11,
vda = 0.0,
vsr = c(-1.5, -1.5, -1.5, -0.5, -0.5, -0.5),
vtr = c(-1.5, -1.0, -0.5, -1.5, -1.0, -0.5)
) %>%
mutate(
p = exp(vda) / (exp(vda) + exp(vsr) + exp(vtr))
) %>%
kableExtra::kbl(
caption = "Probability Values for Drive Alone as a Function of Shared Ride and Transit Utilities",
col.names = c("$Case$", "$V_{DA}$", "$V_{SR}$", "$V_{TR}$", "$Pr(DA)$")) %>%
kableExtra::kable_styling()
```
We use this three-alternative example to illustrate three important properties
of the MNL: (1) its sigmoid or *S* shape, (2) dependence of the alternative
choice probabilities on the differences in the systematic utility and (3)
independence of the ratio of the choice probabilities of any pair of
alternatives from the attributes and availability of other alternatives.
### The Sigmoid or S shape of Multinomial Logit Probabilities
The *S* shape of the MNL probabilities is illustrated in Figure
\@ref(fig:choiceprob) where the
probability of choosing Drive Alone is shown as a function of its own utility,
with the utilities of the other alternatives held constant. The *S*-shape
limits the probability range between zero when the utility of DA is very low,
relative to other alternatives, and one when the utility of DA is very high,
relative to other alternatives. This function has a very gradual slope at extreme
values of DA utility, relative to the other alternatives, and is much steeper
when its utility reaches a value such that its choice probability is close to
one-half. This implies that if the representative utility of one alternative is
very low or very high, compared with the others, a small increase in the utility
of this alternative will not substantially affect its probability of being
chosen. The point at which an increase in the representative utility of an
alternative has the greatest effect on its probability of being chosen (*i.e.*,
the point of maximum slope along the curve) is when its representative utility
is equivalent to the combined utility of the other alternatives. When this is
true, a small increase in the utility of one alternative can ‘tip the balance’
and induce a large increase in the probability of the alternative being chosen
(Train, 1993).
```{r mnl_prob}
#' Function to compute the MNL probability of one alternative
#' @param vi utility of the target alternative
#' @param ... utilities of all alternatives in the choice set, including the target
mnl_prob <- function(vi, ...){
vj <- c(...)
exp(vi) / sum(exp(vj))
}
```
```{r choiceprob, fig.cap = "Probability of choosing alternatives while holding other utilities constant, $V_{transit} = 0, V_{walk} = -0.5$"}
u <- tibble(
v_da = seq(-3, 3, by = 0.01),
v_tr = 0, v_walk = -0.5
) %>%
rowwise() %>%
mutate(
drive = mnl_prob(v_da, v_da, v_tr, v_walk),
transit = mnl_prob(v_tr, v_da, v_tr, v_walk),
walk = mnl_prob(v_walk, v_da, v_tr, v_walk)
)
ggplot(u %>% gather(Mode, Probability, drive, transit, walk),
aes(x = v_da, y = Probability, color = Mode)) +
geom_line() +
xlab("Utility of Drive") +
theme_bw()
```
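The location of the steepest part of the curve can be checked numerically. The sketch below relies on the derivative of the MNL probability with respect to an alternative's own utility, $\partial Pr(i) / \partial V_i = Pr(i)[1 - Pr(i)]$, which is largest when $Pr(i) = 0.5$. It interprets the "combined utility" of the other alternatives as the log of the sum of their exponentiated utilities; that interpretation and the variable names are assumptions, while the utility values match the figure above.

```r
# Utilities of the other alternatives, as in the figure above
v_tr   <- 0
v_walk <- -0.5

# Assumed reading of "combined utility": log of the summed exponentials
v_others <- log(exp(v_tr) + exp(v_walk))

# Probability of drive and its slope over a grid of drive utilities
v_da    <- seq(-3, 3, by = 0.001)
p_drive <- exp(v_da) / (exp(v_da) + exp(v_tr) + exp(v_walk))
slope   <- p_drive * (1 - p_drive)

v_da[which.max(slope)]  # grid point with the steepest slope
v_others                # log-sum of the other alternatives
max(slope)              # approaches 0.25, the slope where Pr = 0.5
```

The grid point with the maximum slope should coincide, up to the grid spacing, with the log-sum of the other utilities.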
### The Equivalent Differences Property
A fundamental property of the multinomial logit and other choice models is that
the choice probabilities of the alternatives depend only on the differences in
the systematic utilities of different alternatives and not their actual values.
This can be illustrated in two ways. First, we show that the choice probability
equations are unchanged if the same incremental value, say $\Delta V$, is added
to the utility of each alternative. The original probabilities for the three
alternatives in the example are given by:
\begin{equation}
Pr(i) = \frac{\exp(V_{i})}{\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})}
(\#eq:originalprob)
\end{equation}
where *i* is the alternative for which the probabilities are being computed.
Adding $\Delta V$ to the systematic components of $V_{DA}$, $V_{SR}$ and
$V_{TR}$ gives[^sumexp]:
\begin{equation}
\begin{split}
Pr(i)&= \frac{\exp(V_i + {\Delta V })}
{\exp(V_{DA}+{\Delta V }) + \exp(V_{SR}+{\Delta V }) + \exp(V_{TR}+{\Delta V })} \\
&= \frac{\exp(V_i) \times \exp({\Delta V })}
{\exp(V_{DA}) \times \exp({\Delta V }) + \exp(V_{SR}) \times \exp({\Delta V }) + \exp(V_{TR}) \times \exp({\Delta V })} \\
&= \frac{\exp(V_i) \times \exp({\Delta V })}{[\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})] \times \exp({\Delta V })} \\
&= \frac{\exp(V_i)}{\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})}
\end{split}
(\#eq:probdeltav)
\end{equation}
which is the same probability as before $\Delta V$ was added to each of the
utilities. This result applies to any value of $\Delta V$. We also illustrate
this property through use of a numerical example for the three alternative
choice problem used earlier. The following equations represent the case when
the utility values for Drive Alone, Shared Ride and Transit equal -0.5, -1.5 and
-3.0, respectively:
\begin{equation}
Pr(DA) = \frac {\exp(-0.5)}{\exp(-0.5) + \exp(-1.5) + \exp(-3.0)} = 0.690
(\#eq:DAutilityvalues)
\end{equation}
\begin{equation}
Pr(SR) = \frac {\exp(-1.5)}{\exp(-0.5) + \exp(-1.5) + \exp(-3.0)} = 0.254
(\#eq:SRutilityvalues)
\end{equation}
\begin{equation}
Pr(TR) = \frac {\exp(-3.0)}{\exp(-0.5) + \exp(-1.5) + \exp(-3.0)} = 0.057
(\#eq:TRutilityvalues)
\end{equation}
Similarly, if the utility of each alternative is increased by one, the
probabilities are:
\begin{equation}
Pr(DA) = \frac {\exp(0.5)}{\exp(0.5) + \exp(-0.5) + \exp(-2.0)} = 0.690
(\#eq:DAplusoneutilityvalues)
\end{equation}
\begin{equation}
Pr(SR) = \frac {\exp(-0.5)}{\exp(0.5) + \exp(-0.5) + \exp(-2.0)} = 0.254
(\#eq:SRplusoneutilityvalues)
\end{equation}
\begin{equation}
Pr(TR) = \frac {\exp(-2.0)}{\exp(0.5) + \exp(-0.5) + \exp(-2.0)} = 0.057
(\#eq:TRplusoneutilityvalues)
\end{equation}
As expected, the choice probabilities are identical to those obtained before the
addition of the constant utility to each mode. The calculations supporting this
comparison are shown in Table 4.3 and Table 4.4. Table 4.3 shows the
computation of the choice probabilities based on the initial set of modal
utilities and Table 4.4 shows the same computation after each of the utilities
is increased by one.[^illustratecalcs]
```{r tabel4-3, echo = F}
utility_table(
c("Drive Alone" = "-0.50","Shared Ride" = "-1.50","Transit" = "-3.00")
) %>%
kbl(caption = "Numerical Example Illustrating Equivalent Difference Property: Probability of Each Alternative Before Adding Delta") %>%
kable_styling() %>%
add_header_above(c(" "=1, "Utility" = 2, " " = 2), align = "center")
```
where the sum of the exponent variable is equal to 0.879.
```{r tabel4-4, echo = F}
utility_table(
c("Drive Alone" = "-0.50 + 1.00",
"Shared Ride" = "-1.50 + 1.00",
"Transit" = "-3.00 + 1.00")
) %>%
kbl(caption = "Numerical Example Illustrating Equivalent Difference Property: Probability of Each Alternative After Adding Delta (=1.0)") %>%
kable_styling() %>%
add_header_above(c(" "=1, "Utility" = 2, " " = 2), align = "center")
```
where the sum of the exponent variable is equal to 2.391.
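The same comparison can be reproduced in a few lines of R. This is a sketch using the utilities from the two tables above; the variable names are illustrative.

```r
# Utilities from the numerical example
v     <- c(DA = -0.5, SR = -1.5, TR = -3.0)
delta <- 1.0  # constant added to every utility

p_before <- exp(v) / sum(exp(v))
p_after  <- exp(v + delta) / sum(exp(v + delta))

round(rbind(before = p_before, after = p_after), 3)
all.equal(p_before, p_after)

# Only the sum of exponents changes, by a factor of exp(delta)
denominator_ratio <- sum(exp(v + delta)) / sum(exp(v))
denominator_ratio
```

Changing `delta` to any other value leaves the probabilities unchanged, while the sum of the exponents is always scaled by `exp(delta)`.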
The expression for the probability equation of the logit model (equation 4.9)
can also be presented in a different form which makes the equivalent difference
property more apparent. For the drive alone alternative, this expression can be
obtained by multiplying the numerator and denominator of the standard
probability expression by $\exp(-V_{DA})$ as shown in the following equations.
\begin{equation}
\begin{split}
Pr(DA) &= \frac{\exp(V_{DA})}{\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})} \times \frac {\exp(-V_{DA})}{\exp(-V_{DA})}\\
&= \frac {\exp(V_{DA})\times \exp(-V_{DA})}{[\exp(V_{DA}) + \exp(V_{SR}) + \exp(V_{TR})] \times \exp(-V_{DA})}\\
&= \frac {\exp(0)}{\exp(0) + \exp(V_{SR}-V_{DA}) + \exp(V_{TR}-V_{DA})}
\end{split}
(\#eq:fournineteen)
\end{equation}
which simplifies to:
\begin{equation}
Pr(DA) = \frac{1}{1 + \exp(V_{SR}-V_{DA}) + \exp(V_{TR}-V_{DA})}
(\#eq:fourtwenty)
\end{equation}
This formulation explicitly shows that the probability of the drive alone
alternative is a function of the differences in systematic utility between the
drive alone alternative and each other alternative. This can be applied to the
general case for alternative $i$ which can be represented in terms of the
pairwise difference in its utility and the utility of each of the other
alternatives by the following equation:
\begin{equation}
Pr(i) = \frac{1}{1 + \sum_{j \ne i}\exp(V_j - V_i)} \qquad \forall i \in J
(\#eq:fourtwentyone)
\end{equation}
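As a quick check that the difference form gives the same answer as the standard form, the sketch below evaluates both for the utilities used in the numerical example. The helper name `mnl_prob_diff()` is hypothetical.

```r
# Probability of alternative i computed from pairwise utility differences
mnl_prob_diff <- function(v, i) {
  others <- v[names(v) != i]
  1 / (1 + sum(exp(others - v[[i]])))
}

v <- c(DA = -0.5, SR = -1.5, TR = -3.0)

p_diff <- sapply(names(v), function(i) mnl_prob_diff(v, i))
p_std  <- exp(v) / sum(exp(v))

round(rbind(difference_form = p_diff, standard_form = p_std), 3)
all.equal(p_diff, p_std)
```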
#### Implication of Constant Differences for Alternative Specific Constants and Variables
The constant difference property of logit models has an important implication
for the specification of the utilities of the alternatives. Recall that the
systematic portion of the utility of an individual, ‘$t$’, and alternative ‘$i$’
is the sum of decision-maker related bias, mode attribute related utility, and
interactions between these. That is:
\begin{equation}
V_{DA} = V_{DA}(S_t) + V(X_{DA}) + V(S_t,X_{DA})
(\#eq:fourtwentytwo)
\end{equation}
\begin{equation}
V_{SR} = V_{SR}(S_t) + V(X_{SR}) + V(S_t,X_{SR})
(\#eq:fourtwentythree)
\end{equation}
\begin{equation}
V_{TR} = V_{TR}(S_t) + V(X_{TR}) + V(S_t,X_{TR})
(\#eq:fourtwentyfour)
\end{equation}
Each term on the right hand side of each equation can be replaced by an explicit
function of the relevant variables. For example, if the decision-maker
preferences are a function of income, attribute based utility is a function of
travel time and there are no interaction terms, the utility function becomes:
\begin{equation}
V_{DA} = \beta_{DA,0} + \beta_{DA,1} \times Income_t + \gamma \times TT_{DA}
(\#eq:fourtwentyfive)
\end{equation}
\begin{equation}
V_{SR} = \beta_{SR,0} + \beta_{SR,1} \times Income_t + \gamma \times TT_{SR}
(\#eq:fourtwentysix)
\end{equation}
\begin{equation}
V_{TR} = \beta_{TR,0} + \beta_{TR,1} \times Income_t + \gamma \times TT_{TR}
(\#eq:fourtwentyseven)
\end{equation}
and the differences between pairs of alternatives for prediction of DA
probability become:
\begin{equation}
\begin{split}
V_{SR} - V_{DA} &= (\beta_{SR,0} - \beta_{DA,0}) + (\beta_{SR,1} - \beta_{DA,1}) \times Income_t \\
&\quad + \gamma \times (TT_{SR} - TT_{DA})
\end{split}
(\#eq:fourtwentyeight)
\end{equation}
\begin{equation}
\begin{split}
V_{TR} - V_{DA} &= (\beta_{TR,0} - \beta_{DA,0}) + (\beta_{TR,1} - \beta_{DA,1}) \times Income_t \\
&\quad + \gamma \times (TT_{TR} - TT_{DA})
\end{split}
(\#eq:fourtwentynine)
\end{equation}
It is not possible to estimate all of the constants ($\beta_{DA,0}$,
$\beta_{SR,0}$ and $\beta_{TR,0}$) and all of the income parameters
($\beta_{DA,1}$, $\beta_{SR,1}$ and $\beta_{TR,1}$) in these equations because
adding the same value to each of the constants or to each of the income
parameters does not cause any change in the probabilities of any of the
alternatives. This phenomenon is common to all utility-based choice models and
follows directly from the equivalent differences property discussed above. The
solution to this problem is to place a single constraint on each set of
parameters; in this case, the constants and the income parameters. Any
constraint can be adopted for each set of parameters; however, the simplest and
most widely used is to set the preference related parameters for one
alternative, called the base or reference alternative, to zero and to
re-interpret the remaining parameters to represent preference differences
relative to the base alternative.
The selection of the reference alternative is arbitrary and does not affect the
overall quality or interpretation of the model; however, the equations and the
estimation results will appear to be different. For example, if we set TRansit
as the reference alternative by setting $\beta_{TR,0}$ and $\beta_{TR,1}$ equal
to zero, the utility functions become:
\begin{equation}
V_{DA,t} = \beta_{DA-TR,0} + \beta_{DA-TR,1} \times Inc_t + \gamma \times TT_{DA}
(\#eq:fourthirty)
\end{equation}
\begin{equation}
V_{SR,t} = \beta_{SR-TR,0} + \beta_{SR-TR,1} \times Inc_t + \gamma \times TT_{SR}
(\#eq:fourthirtyone)
\end{equation}
\begin{equation}
V_{TR,t} = \qquad \qquad \qquad 0 \qquad \qquad \qquad + \gamma \times TT_{TR}
(\#eq:fourthirtytwo)
\end{equation}
where the modified notation for the remaining constants and income parameters is
used to emphasize that these parameters are ‘relative to the TRansit
alternative.’ Alternatively, if we select Drive Alone as the reference
alternative, we obtain:
\begin{equation}
V_{DA,t} = \qquad \qquad \qquad 0 \qquad \qquad \qquad + \gamma \times TT_{DA}
(\#eq:fourthirtythree)
\end{equation}
\begin{equation}
V_{SR,t} = \beta_{SR-DA,0} + \beta_{SR-DA,1} \times Inc_t + \gamma \times TT_{SR}
(\#eq:fourthirtyfour)
\end{equation}
\begin{equation}
V_{TR,t} = \beta_{TR-DA,0} + \beta_{TR-DA,1} \times Inc_t + \gamma \times TT_{TR}
(\#eq:fourthirtyfive)
\end{equation}
where the constants and income parameters are relative to the Drive Alone alternative.
These two models are equivalent as shown in Table \@ref(tab:transit-base) and Table \@ref(tab:da-base) which
correspond to the TRansit reference and Drive Alone reference examples,
respectively, for an individual from a household with \$50,000 annual income and
facing travel times of 30, 35 and 50 minutes for Drive Alone, Shared Ride and
TRansit, respectively.
```{r transit-base}
utility_table(
c("Drive Alone" = "1.1 + 0.008 * 50 - 0.02 * 30",
"Shared Ride" = "0.8 + 0.006 * 50 - 0.02 * 35",
"Transit" = "0.0 + 0.000 * 50 - 0.02 * 50")
) %>%
kableExtra::kbl(caption = "Utility and Probability calculation with TRansit as Base Alternative") %>%
kable_styling() %>%
add_header_above(c(" "=1, "Utility" = 2, " " = 2), align = "center")
```
where the sum of the exponent variable equals 4.319.
```{r da-base}
utility_table(
c( "Drive Alone" = "0.0 + 0.000 * 50 - 0.02 * 30",
"Shared Ride" = "-0.3 - 0.002 * 50 - 0.02 * 35",
"Transit" = "-1.1 - 0.008 * 50 - 0.02 * 50")
) %>%
kableExtra::kbl(caption = "Utility and Probability calculation with Drive Alone as Base Alternative") %>%
kable_styling() %>%
add_header_above(c(" "=1, "Utility" = 2, " " = 2), align = "center")
```
where the sum of the exponent variable equals 0.964.
As expected, the resultant probabilities are identical in both cases. Table \@ref(tab:compareutilities)
shows that the differences in the alternative specific constants and income
parameters between alternatives are the same for the TRansit base case and the
Drive Alone base case.
```{r compareutilities}
tibble(
  alternative = c("Drive Alone", "Shared Ride", "Transit"),
  constant_tr = c("1.1","0.8","0.0"),
  income_tr = c("0.008", "0.006", "0.000"),
  constant_change = c("-1.1","-1.1","-1.1"),
  income_change = c("-0.008", "-0.008", "-0.008"),
  constant_da = c("0.0","-0.3","-1.1"),
  income_da = c("0.000", "-0.002", "-0.008")) %>%
  # Tibble column names must be unique, so the repeated display labels are set here
  kbl(caption = "Changes in Alternative Specific Constants and Income Parameters",
      col.names = c("Alternative", "Constant", "Income", "Constant", "Income", "Constant", "Income")) %>%
  kable_styling() %>%
  add_header_above(c(" " = 1, "TRansit as Base Alternative"= 2, "Change in Parameters" = 2, "Drive Alone as Base Alternative" = 2), align = "center")
```
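The change of base alternative can also be sketched directly in R: subtracting the Drive Alone parameters from every alternative's parameters converts the TRansit-base model into the Drive Alone-base model. The parameter values come from the example above; the variable and function names are illustrative, and income is assumed to be entered in thousands of dollars, as the tables suggest.

```r
# Parameters with TRansit as the base alternative
const_tr_base  <- c(DA = 1.1,   SR = 0.8,   TR = 0.0)
income_tr_base <- c(DA = 0.008, SR = 0.006, TR = 0.000)
gamma <- -0.02                           # generic travel time parameter

# Re-base on Drive Alone by subtracting the Drive Alone parameters
const_da_base  <- const_tr_base  - const_tr_base[["DA"]]
income_da_base <- income_tr_base - income_tr_base[["DA"]]

income <- 50                             # assumed units: thousands of dollars
tt     <- c(DA = 30, SR = 35, TR = 50)   # travel times in minutes

# MNL probabilities for a given set of constants and income parameters
probs <- function(const, inc_par) {
  v <- const + inc_par * income + gamma * tt
  exp(v) / sum(exp(v))
}

p_tr_base <- probs(const_tr_base, income_tr_base)
p_da_base <- probs(const_da_base, income_da_base)

rbind(const_da_base, income_da_base)
round(rbind(p_tr_base, p_da_base), 3)
```

The re-based constants and income parameters reproduce the Drive Alone-base values, and the two rows of probabilities are identical.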
## Independence of Irrelevant Alternatives Property {#IIA-section}
One of the most widely discussed aspects of the multinomial logit model is its
independence from irrelevant alternatives (IIA) property. The IIA property
states that for any individual, the ratio of the probabilities of choosing two
alternatives is independent of the presence or attributes of any other
alternative. The premise is that other alternatives are irrelevant to the
decision of choosing between the two alternatives in the pair. To illustrate
this, consider a multinomial logit model for the choice among three intercity
travel modes – automobile, rail, and bus. The probabilities of choosing
automobile, bus and rail are:
\begin{equation}
Pr(Auto) = \frac{\exp(V_{Auto})}{\exp(V_{Auto})+\exp(V_{Bus})+\exp(V_{Rail})}
(\#eq:probchooseauto)
\end{equation}
\begin{equation}
Pr(Bus) = \frac{\exp(V_{Bus})}{\exp(V_{Auto})+\exp(V_{Bus})+\exp(V_{Rail})}
(\#eq:probchoosebus)
\end{equation}
\begin{equation}
Pr(Rail) = \frac{\exp(V_{Rail})}{\exp(V_{Auto})+\exp(V_{Bus})+\exp(V_{Rail})}
(\#eq:probchooserail)
\end{equation}
The ratios of each pair of probabilities are:
\begin{equation}
\frac{Pr(Auto)}{Pr(Bus)} = \frac{\exp(V_{Auto})}{\exp(V_{Bus})} = \exp(V_{Auto} - V_{Bus})
(\#eq:probratioab)
\end{equation}
\begin{equation}
\frac{Pr(Auto)}{Pr(Rail)} = \frac{\exp(V_{Auto})}{\exp(V_{Rail})} = \exp(V_{Auto} - V_{Rail})
(\#eq:probratioar)
\end{equation}
\begin{equation}
\frac{Pr(Bus)}{Pr(Rail)} = \frac{\exp(V_{Bus})}{\exp(V_{Rail})} = \exp(V_{Bus} - V_{Rail})
(\#eq:probratiobr)
\end{equation}
The ratios of probabilities for each pair of alternatives depend only on the
attributes of those alternatives and not on the attributes of the third
alternative and would remain the same regardless of whether that third
alternative is available or not. This formulation can be generalized to any
pair of alternatives by:
\begin{equation}
\frac{Pr(i)}{Pr(k)} = \frac{\exp(V_i)}{\exp(V_k)} = \exp(V_i - V_k)
(\#eq:probratioalts)
\end{equation}
which, as before, is independent of the number or attributes of other alternatives in the choice set.
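A small numerical sketch makes the property concrete. The utility values below are assumed purely for illustration, and `mnl_probs()` is a hypothetical helper.

```r
mnl_probs <- function(v) exp(v) / sum(exp(v))

# Assumed utilities for the three intercity modes
v <- c(Auto = 1.0, Bus = -0.5, Rail = 0.2)

p_all       <- mnl_probs(v)                       # full choice set
p_no_rail   <- mnl_probs(v[c("Auto", "Bus")])     # rail unavailable
v_fast_rail <- replace(v, "Rail", 2.0)            # rail improved
p_fast_rail <- mnl_probs(v_fast_rail)

ratio_all       <- p_all[["Auto"]] / p_all[["Bus"]]
ratio_no_rail   <- p_no_rail[["Auto"]] / p_no_rail[["Bus"]]
ratio_fast_rail <- p_fast_rail[["Auto"]] / p_fast_rail[["Bus"]]

c(ratio_all, ratio_no_rail, ratio_fast_rail, exp(v[["Auto"]] - v[["Bus"]]))
```

All four values coincide: removing rail or changing its utility alters the individual probabilities but not the auto-to-bus ratio.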
The IIA property has some important ramifications in the formulation, estimation
and use of multinomial logit models. The independence of irrelevant
alternatives property allows the addition or removal of an alternative from the
choice set without affecting the structure or parameters of the model. The
flexibility of applying the model to cases with different choice sets has a
number of advantages. First, the model can be estimated and applied in cases
where different members of the population (and sample) face different sets of
alternatives. For example, in the case of intercity mode choice, individuals
traveling between some city pairs may not have air service and/or rail service.
Second, this property simplifies the estimation of the parameters in the
multinomial logit model (as will be discussed later). Third, this property is
advantageous when applying a model to the prediction of choice probabilities for
a new alternative.
On the other hand, the IIA property may not properly reflect the behavioral
relationships among groups of alternatives. That is, other alternatives may not
be irrelevant to the ratio of probabilities between a pair of alternatives. In
some cases, this will result in erroneous predictions of choice probabilities.
An extreme example of this problem is the classic “red bus/blue bus paradox.”
### The Red Bus/Blue Bus Paradox {#bus-paradox}
Consider the case of a commuter who has a choice of going to work by auto or
taking a blue bus. Assume that the attributes of the auto and the blue bus are
such that the probability of choosing auto is two-thirds and blue bus is
one-third so the ratio of their choice probabilities is 2:1. Now suppose that a
competing bus operator introduces red bus service (the bus is painted red,
rather than blue) on the same route, operating the same vehicle type, using the
same schedule and serving the same stops as the blue bus service. Thus, the
only difference between the red and blue bus services is the color of the buses.
The most reasonable expectation, in this case, is that the same share of people
will choose auto and bus and that the bus riders will split equally between the
red and blue bus services. That is, the addition of the red bus to the
commuters’ choice set should have no, or very little, effect on the share of
commuters choosing auto since this change does not affect the relative quality
of auto and bus. Therefore, we expect choice probabilities following the
initiation of red bus service to be auto, two-thirds; blue bus, one-sixth and
red bus, one-sixth. However, due to the IIA property, the multinomial logit
model will maintain the relative probability of auto and blue bus as 2:1. If we
assume that people are indifferent to color of their transit vehicle, the two
bus services will have the same representative utility and consequently, their
relative probabilities will be 1:1 and the share probabilities for the three
alternatives will be: Pr(Auto) = 1/2, Pr(Blue Bus) = 1/4, and Pr(Red Bus) = 1/4.
That is, the probability (share) of people choosing auto will decline from
two-thirds to one half as a result of introducing an alternative which is
identical to an existing alternative[^busfreqincr]. The red bus/blue bus
paradox provides an important illustration of the possible consequences of the
IIA property. Although this is an extreme case, the IIA property can be a
problem in other, less extreme cases.
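The paradox can be reproduced with a short sketch. The utilities are assumptions chosen so that the initial probabilities are two-thirds and one-third; `mnl_probs()` is a hypothetical helper.

```r
mnl_probs <- function(v) exp(v) / sum(exp(v))

# Utilities chosen so that Pr(Auto) = 2/3 and Pr(Blue Bus) = 1/3
v_before <- c(Auto = log(2), BlueBus = 0)
p_before <- mnl_probs(v_before)

# The red bus is identical to the blue bus, so it gets the same utility
v_after <- c(v_before, RedBus = v_before[["BlueBus"]])
p_after <- mnl_probs(v_after)

round(p_before, 3)
round(p_after, 3)

# The auto-to-blue-bus ratio is preserved at 2:1
p_after[["Auto"]] / p_after[["BlueBus"]]
```

The MNL prediction for auto falls from two-thirds to one-half, rather than staying at two-thirds as intuition suggests.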
## Example: Prediction with Multinomial Logit Model
We illustrate the application of multinomial logit models with different
specifications in the context of mode choice analysis. Consider a commute trip
by an individual who has three available modes in the choice set: drive alone,
carpool, and bus. The examples in this section illustrate the manner in which
different utility specifications and the estimated parameters associated with
them are used to predict choice probabilities based on characteristics of the
traveler (decision-maker) and attributes of the alternatives. These examples
progress from the simplest models to moderately complex models.
**Example 1 -- Constants Only Model**
The simplest specification of the multinomial logit model is the ‘constants
only’ model, in which the utility of each alternative has a fixed value for all
decision-makers. Typically, the alternative specific constants are considered
to represent the average effect of all factors that influence the choice but are
not included in the utility specification. For example, factors such as
comfort, safety, privacy and reliability may be excluded due to the difficulty
associated with their measurement. In the constants only model, it is
implicitly assumed that the constants reflect the average effects of all the
variables affecting the choice decision, since no variables are included
explicitly in the utility specification. If these constants are 0.0, -1.6 and
-1.8 for drive alone, shared ride and transit, respectively, the probability
calculation is as shown in Table \@ref(tab:constants).
```{r constants, echo = F}
utility_table(
c("Drive Alone" = "0.0",
"Shared Ride" = "-1.60",
"Transit" = "-1.80")
) %>%
kbl(caption = "MNL Probabilities for Constants Only Model") %>%
kable_styling() %>%
add_header_above(c(" " = 1, "Utility"= 2, " " = 2), align = "center")
```
As expected, Drive Alone has the highest probability, followed by Shared Ride and then Transit.
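For readers who want to reproduce the calculation by hand, the following is a minimal sketch in base R. It does not use the `utility_table()` helper from this text; `mnl_prob()` is a hypothetical stand-in that simply exponentiates the utilities and normalizes them.

```r
# Hypothetical helper: multinomial logit probabilities from a utility vector.
mnl_prob <- function(v) {
  ev <- exp(v - max(v))  # subtract the maximum for numerical stability
  ev / sum(ev)
}

# Alternative specific constants from Example 1 (drive alone is the base).
v <- c(drive_alone = 0.0, shared_ride = -1.6, transit = -1.8)

p <- mnl_prob(v)
print(round(p, 3))
```

With these constants the probabilities come out at roughly 0.73, 0.15 and 0.12 for drive alone, shared ride and transit.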
**Example 2 -- Including Mode Related Variables - Travel Time and Travel Cost**
Two key attributes that influence choice of mode are travel time and travel
cost. We include these variables in the deterministic component of the utility
function of each mode with the parameter for time (in minutes) equal to -0.045
and for cost (in cents) equal to -0.004 for all three modes, Table 4.9. This
implies that a minute of travel time (or a cent of cost) has the same marginal
disutility regardless of the mode; such variables are referred to as *generic*
variables. The negative signs of the travel time and travel cost coefficients
imply that the utility of a mode and the probability that it will be chosen
decrease as the travel time or travel cost of that mode increases. Positive
coefficients would be inconsistent with our understanding of travel behavior and
therefore any specification which results in a positive sign for travel time or
travel cost should be rejected. Such counter-intuitive results are most likely
due to an incorrect or inadequate model specification; however, it is possible
that the data from any particular sample leads to such counter-intuitive
results.
The inclusion of travel time and travel cost variables induces a change in the
alternative specific constants, to -1.865 for shared ride and -0.650 for
transit, as the effect of excluding these time and cost variables is removed
from the constants. Such changes in alternative specific constants, as a result
of the introduction of new variables or the elimination of included variables,
preserve the sample shares[^preservation] and are expected.
To illustrate the application of the multinomial logit model for the above
utility equation, we assume travel time and travel cost values as follows:
| Mode | Travel Time | Travel Cost |
|:------------|:------------|:------------|
| Drive Alone | 25 minutes | $1.75 |
| Shared Ride | 28 minutes | $0.75 |
| Transit     | 55 minutes  | $1.25       |
The utilities and probabilities are calculated as shown in Table \@ref(tab:utimecost).
```{r utimecost}
utility_table(
c("Drive Alone" = "-0.045*25-0.004*175",
"Shared Ride" = "-1.865-0.045*28-0.004*75",
"Transit" = "-0.650-0.045*55-0.004*125")
) %>%
kbl(caption = "MNL Probabilities for Time and Cost Model") %>%
kable_styling() %>%
add_header_above(c(" " = 1, "Utility"= 2, " " = 2), align = "center")
```
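The same calculation can be written out step by step. This is a sketch under the assumptions stated in the text (time in minutes, cost in cents, generic coefficients); the data frame layout and the `mnl_prob()` helper are illustrative choices, not part of the original material.

```r
# Hypothetical helper: multinomial logit probabilities from a utility vector.
mnl_prob <- function(v) {
  ev <- exp(v - max(v))
  ev / sum(ev)
}

# Attribute values and constants from Example 2.
modes <- data.frame(
  mode       = c("Drive Alone", "Shared Ride", "Transit"),
  asc        = c(0, -1.865, -0.650),   # alternative specific constants
  time_min   = c(25, 28, 55),          # total travel time, minutes
  cost_cents = c(175, 75, 125)         # travel cost, cents
)

beta_time <- -0.045  # generic travel time coefficient
beta_cost <- -0.004  # generic travel cost coefficient

modes$utility <- with(modes, asc + beta_time * time_min + beta_cost * cost_cents)
modes$probability <- mnl_prob(modes$utility)
print(modes)
```

At these attribute values the shared ride and transit utilities differ from the drive alone utility by -1.6 and -1.8, the same differences as the constants in Example 1, so the predicted probabilities for this traveler match those of the constants only model.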
This specification can be refined further by decomposing travel time into its
two major components: (1) in-vehicle travel time, and (2) out-of-vehicle travel
time. In-vehicle time (IVT) is defined as the time spent inside the vehicle,
and out-of-vehicle time (OVT) is the time not spent inside the vehicle
(including access time, waiting time, and egress time). There is an abundance
of empirical evidence that travelers are much more sensitive to out-of-vehicle
time than to in-vehicle time and therefore a minute of out-of-vehicle time will
generate a higher disutility than a minute of in-vehicle time. This will be
reflected in the modal utilities by a larger negative coefficient on
out-of-vehicle time than on in-vehicle time. Introduction of this refinement
will usually result in a less negative parameter for in-vehicle time and a more
negative parameter for out-of-vehicle time than for total time, say -0.031 and
-0.062, respectively. If the travel times are split as follows:
| Mode | IVT | OVT | Travel Cost |
|:------------|:------------|:------------|:------------|
| Drive Alone | 21 minutes | 4 minutes | $1.75 |
| Shared Ride | 23 minutes | 5 minutes | $0.75 |
| Transit     | 25 minutes  | 30 minutes  | $1.25       |
the new systematic utilities and choice probabilities are as computed in Table
\@ref(tab:uivtovt):
```{r uivtovt}
utility_table(
c("Drive Alone" = "-0.031*21-0.062*4-0.004*175",
"Shared Ride" = "-1.90-0.031*23-0.062*5-0.004*75",
"Transit" = "-0.80-0.031*25-0.062*30-0.004*125")
) %>%
kbl(caption = "MNL Probabilities for In and Out of Vehicle Time and Cost Model") %>%
kable_styling() %>%
add_header_above(c(" " = 1, "Utility"= 2, " " = 2), align = "center")
```
**Example 3 -- Including Decision-Maker Related Biases - Income**
The preceding examples do not include any characteristics of the traveler in the
modal utilities. However, we know that choice probabilities of the available
modes also depend on characteristics of the traveler, such as his/her income.
Economic theory and empirical evidence suggest that higher income travelers are
less likely to choose transit than drive alone or carpool. We can incorporate
this behavior in the model by including an alternative specific income variable
in the utility of up to two of the alternatives; in this case, we include income
in the transit alternative with a negative coefficient. That is, everything
else held constant, the utility of transit decreases as the income of the
traveler increases. Consequently, a higher income traveler will have a lower
probability of choosing transit than a lower income traveler. The absence of an
alternative specific parameter for the carpool alternative implies that the
choice of carpool, relative to drive alone, is unaffected by a traveler’s
income. The alternative specific constant of the transit utility changes
substantially from the preceding example as it no longer reflects the average
effect of excluding income from the transit utility specification. The
calculation of utilities and probabilities for this model for a person from a
household with $50,000 annual income (entered in the utility function in
thousands of dollars) is shown in Table \@ref(tab:uivtovtcost).
```{r uivtovtcost}
utility_table(
c("Drive Alone" = " -0.031*21-0.062*4-0.004*175",
"Shared Ride" = "-1.90-0.031*23-0.062*5-0.004*75",
"Transit" = "-0.50-0.031*25-0.062*30-0.004*125-0.0087*50")
) %>%
kbl(caption = "MNL Probabilities for In and Out of Vehicle Time, Cost and Income Model") %>%
kable_styling() %>%
add_header_above(c(" " = 1, "Utility"= 2, " " = 2), align = "center")
```
The probability of choosing transit is smaller for this traveler than would have
been predicted using the model reported in the preceding example. This model
will give decreasing transit probabilities for higher income travelers and
increasing transit probabilities for lower income travelers. That is, the lower
the traveler’s income, the greater his/her probability of choosing the least
expensive mode of travel (transit), an intuitive and reasonable result.
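The income response can be checked numerically. The sketch below evaluates the Example 3 utilities at a few illustrative income levels; income is assumed to be in thousands of dollars (consistent with the value 50 used for a $50,000 household), and `mnl_prob()` and `transit_prob()` are hypothetical helpers.

```r
# Hypothetical helper: multinomial logit probabilities from a utility vector.
mnl_prob <- function(v) {
  ev <- exp(v - max(v))
  ev / sum(ev)
}

# Transit probability under the Example 3 specification.
# income: annual household income in thousands of dollars (assumed unit).
transit_prob <- function(income) {
  v <- c(
    drive_alone = -0.031 * 21 - 0.062 * 4 - 0.004 * 175,
    shared_ride = -1.90 - 0.031 * 23 - 0.062 * 5 - 0.004 * 75,
    transit     = -0.50 - 0.031 * 25 - 0.062 * 30 - 0.004 * 125 - 0.0087 * income
  )
  mnl_prob(v)[["transit"]]
}

incomes <- c(20, 50, 100)                  # illustrative income levels
p_transit <- sapply(incomes, transit_prob)
print(data.frame(income = incomes, p_transit = round(p_transit, 4)))
```

The transit probability falls as income rises, as the negative income coefficient in the transit utility implies.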
**Example 4 -- Interaction of Mode Attributes and Decision-Maker Related Biases**
An alternative method of including income in the utility specification is to use
income as a deflator of cost by forming a variable by dividing cost by income.
This formulation reflects the rationale that cost becomes a less important
factor in the choice of a travel mode as the income of the traveler increases.
The revised utility functions and calculations are shown in Table \@ref(tab:uinteraction)
using the same modal attribute and income values as in the preceding
example.
```{r uinteraction}
utility_table(
c("Drive Alone" = " -0.031*21-0.062*4-0.153*(175/50) ",
"Shared Ride" = " -1.90-0.031*23-0.062*5-0.153*(75/50) ",
"Transit" = "-0.45-0.031*25-0.062*30-0.153*(125/50)")
) %>%
kbl(caption = "MNL Probabilities for In and Out of Vehicle Time, Cost/Income Model") %>%
kable_styling() %>%
add_header_above(c(" " = 1, "Utility"= 2, " " = 2), align = "center")
```
This specification of income in the utility function also predicts that lower
income travelers have a higher probability of choosing transit; it
also suggests that such travelers will increase their probability of choosing
carpool, the least expensive mode. The reader should compute the probabilities
for different income values and verify the response pattern.
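One way to carry out this exercise is sketched below. It evaluates the Example 4 utilities over a few illustrative income levels, assuming cost in cents and income in thousands of dollars as in the table above; `mnl_prob()` and `probs_by_income()` are hypothetical helpers introduced only for this illustration.

```r
# Hypothetical helper: multinomial logit probabilities from a utility vector.
mnl_prob <- function(v) {
  ev <- exp(v - max(v))
  ev / sum(ev)
}

# Choice probabilities under the cost/income specification of Example 4.
# income: annual household income in thousands of dollars (assumed unit).
probs_by_income <- function(income) {
  v <- c(
    drive_alone = -0.031 * 21 - 0.062 * 4 - 0.153 * (175 / income),
    shared_ride = -1.90 - 0.031 * 23 - 0.062 * 5 - 0.153 * (75 / income),
    transit     = -0.45 - 0.031 * 25 - 0.062 * 30 - 0.153 * (125 / income)
  )
  mnl_prob(v)
}

incomes <- c(20, 50, 100)                      # illustrative income levels
result <- t(sapply(incomes, probs_by_income))  # one row per income level
rownames(result) <- paste0("income_", incomes)
print(round(result, 3))
```

Over this range of incomes, the drive alone probability rises with income while the shared ride and transit probabilities fall, which matches the response pattern described above.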
## Measures of Response to Changes in Attributes of Alternatives
Choice probabilities in logit models are a function of the values of the
attributes that define the utility of the alternatives; therefore, it is useful
to know the extent to which the probabilities change in response to changes in
the value of those attributes. For example, in a traveler’s mode choice
decision, an important question is to what extent the probability of choosing a
mode (rail, for example) will decrease/increase, if the fares of that mode are
increased by a certain amount. Similarly, a transit agency may want to know the
gain in ridership that is likely to occur in response to service improvements
(increased frequency). This section describes various aspects of understanding
and quantifying the response to changes in attributes of alternatives.
### Derivatives of Choice Probabilities
One measure for evaluating the response to changes is to calculate the
*derivatives* of the choice probabilities of each alternative with respect to
the variable in question. Usually, one is concerned about the change in
probability of an alternative, $P_i$, with respect to the change in attributes
of that alternative $X_i$. This measure, the *direct derivative*, is computed
by differentiating $P_i$ with respect to $X_{ik}$, the $k^{th}$ attribute of
alternative $i$. The mathematical expression for the *direct derivative* of
$P_i$ with respect to $X_{ik}$ is:
\begin{equation}
\displaystyle \frac{\partial P_{i}}{\partial X_{ik}} = \displaystyle \frac{\partial V_{i}}{\partial X_{ik}} \times (P_{i}) \times (1-P_{i})
(\#eq:expressionfordirectderivative)
\end{equation}
where $V_i$ is the utility of the alternative.[^derivationTrain]
Typically the utility function is specified to be linear in parameters; that is:
\begin{equation}
V_{i} = \beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \cdots + \beta_{k}X_{ik} + \cdots + \beta_{K}X_{iK}
(\#eq:linearutility)
\end{equation}
In this case, the expression for the *direct derivative* of $P_i$ with respect to $X_{ik}$ reduces to:
\begin{equation}
\displaystyle \frac{\partial P_{i}}{\partial X_{ik}} = \beta_{k} \times (P_{i}) \times (1-P_{i})
(\#eq:reducesexpressionfordirectderivative)
\end{equation}
where $\beta_{k}$ is the coefficient of attribute $k$.
The magnitude of the derivative is largest at $P_{i} = \frac{1}{2}$ and becomes smaller as $P_{i}$ approaches zero or one. This implies that the response to a change in an attribute will be greatest when the choice probability for the alternative under consideration is 0.5, and that this response diminishes as the probability approaches zero or one. The direct derivative is simply the slope of the logit model probability curve illustrated in Figure 4.4, and its mathematical properties are consistent with the qualitative discussion of the S-shape of the logit probability curve in section 4.1.1. The sign of the derivative is the same as the sign of the parameter describing the impact of $X_{ik}$ on the utility of alternative $i$. Thus, an increase in $X_{ik}$ will increase (decrease) $P_{i}$ if $\beta_{k}$ is positive (negative).
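The direct derivative formula can be checked against a numerical slope. The sketch below reuses the time and cost specification of Example 2 and compares $\beta_k P_i (1 - P_i)$ for transit travel time with a central finite difference; the helper names and the step size `h` are illustrative assumptions.

```r
# Hypothetical helper: multinomial logit probabilities from a utility vector.
mnl_prob <- function(v) {
  ev <- exp(v - max(v))
  ev / sum(ev)
}

beta_time <- -0.045
beta_cost <- -0.004
asc  <- c(drive_alone = 0, shared_ride = -1.865, transit = -0.650)
time <- c(drive_alone = 25, shared_ride = 28, transit = 55)    # minutes
cost <- c(drive_alone = 175, shared_ride = 75, transit = 125)  # cents

# Choice probabilities as a function of transit travel time only.
prob_given_transit_time <- function(tt) {
  time_new <- time
  time_new["transit"] <- tt
  mnl_prob(asc + beta_time * time_new + beta_cost * cost)
}

p_transit <- prob_given_transit_time(55)[["transit"]]

# Analytic direct derivative: beta_k * P_i * (1 - P_i).
analytic <- beta_time * p_transit * (1 - p_transit)

# Central finite difference approximation of the same slope.
h <- 1e-4
finite_diff <- (prob_given_transit_time(55 + h)[["transit"]] -
                prob_given_transit_time(55 - h)[["transit"]]) / (2 * h)

print(c(analytic = analytic, finite_diff = finite_diff))
```

The two values should agree closely, and both are negative because the travel time coefficient is negative.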
Often it is important to understand how the choice probabilities of the other
alternatives change in response to a given change in the attribute level of the
action alternative. This measure, termed the *cross derivative*, is obtained by
computing the derivative of the choice probability of an alternative, $P_j$,
with respect to the attribute of the changed alternative, $X_{ik}$. This *cross
derivative* for linear utility functions is:[^derivationTrain2]
\begin{equation}
\displaystyle \frac{\partial P_{j}}{\partial X_{ik}} = -\beta_{k} \times (P_{i}) \times (P_{j}) \quad \forall\, j \ne i
(\#eq:linearutilityforcrossderivative)
\end{equation}
where $\beta_{k}$ is the coefficient of the $k^{th}$ attribute,
$P_{i}$ is the probability of alternative $i$, and
$P_{j}$ is the probability of alternative $j$.
In this case, the sign of the derivative is opposite to the sign of the
parameter describing the impact of $X_{ik}$ on the utility of alternative $i$.
Thus an increase in $X_{ik}$ will decrease (increase) the probability of
choosing alternative $j$, $P_j$, if the parameter $\beta_k$ is positive (negative).
It is useful to recognize that the sum of the derivatives over all the
alternatives must be equal to zero. That is,
\begin{align}
\sum_{\forall_{j}}\frac{\partial P_{j}}{\partial X_{ik}}&=
\frac{\partial P_{i}}{\partial X_{ik}} + \sum_{\forall_{j \ne i}}\frac{\partial P_{j}}{\partial X_{ik}}\\
&= \beta_{k}P_{i}(1-P_{i}) - \sum_{\forall_{j \ne i}}\beta_{k}P_{i}P_{j}\\
&= \beta_{k}P_{i}(1-P_{i}) - \beta_{k}P_{i}\sum_{\forall_{j \ne i}}P_{j}\\
&= \beta_{k}P_{i}(1-P_{i}) - \beta_{k}P_{i}(1-P_{i})\\
&= 0
(\#eq:sumofderivativesequaltozero)
\end{align}
This is as expected. Since the sum of all probabilities is fixed at one, the
sum of the derivatives of the probability due to a change in any attribute of
any alternative must be equal to zero.
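A short numerical check makes the cross derivative and the zero-sum property concrete. The sketch below uses the time and cost specification of Example 2 and takes transit travel time as the changed attribute; `mnl_prob()` and the variable names are illustrative assumptions.

```r
# Hypothetical helper: multinomial logit probabilities from a utility vector.
mnl_prob <- function(v) {
  ev <- exp(v - max(v))
  ev / sum(ev)
}

beta_time <- -0.045
beta_cost <- -0.004
v <- c(
  drive_alone = beta_time * 25 + beta_cost * 175,
  shared_ride = -1.865 + beta_time * 28 + beta_cost * 75,
  transit     = -0.650 + beta_time * 55 + beta_cost * 125
)
p <- mnl_prob(v)
p_i <- p[["transit"]]   # probability of the changed alternative i

# Cross derivatives for every j != i: -beta_k * P_i * P_j.
dP <- -beta_time * p_i * p
# Direct derivative for alternative i itself: beta_k * P_i * (1 - P_i).
dP["transit"] <- beta_time * p_i * (1 - p_i)

print(signif(dP, 4))
print(sum(dP))   # zero up to floating point rounding
```

Because the travel time coefficient is negative, an increase in transit travel time lowers the transit probability and raises the drive alone and shared ride probabilities by exactly offsetting amounts.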
### Elasticities of Choice Probabilities