03-trends.Rmd

# Trends

## Least squares estimation for linear regression trend

We begin by taking the partial derivatives with respect to $\beta_0$.

\[
\frac{\partial}{\partial{\beta_0}} \mathcal{Q}(\beta_0, \beta_1) =
  -2\sum_{t=1}^n (Y_t - \beta_0 - \beta_1 t)
\]

We set it to $0$ and from this retrieve

\begin{align*}
-2\sum_{t=1}^n (Y_t - \beta_0 - \beta_1 t) = & 0 \implies \\
\sum_{t=1}^n Y_t - n\beta_0 - \beta_1 \sum_{t=1}^n t = & 0 \implies \\
\beta_0 = \frac{\sum_{t=1}^n Y_t - \beta_1 \sum_{t=1}^n t}{n} = &
  \bar{Y} - \beta_1 \bar{t}
\end{align*}

Next, we take the partial derivative with respect to $\beta_1$;

\[
\frac{\partial}{\partial{\beta_1}} \mathcal{Q}(\beta_0, \beta_1) =
  -2\sum_{t=1}^n t(Y_t - \beta_0 - \beta_1 t)
\]

Setting this to $0$ as well, multiplying both sides with $-1/2$ and rearranging
results in

\begin{align*}
-2\sum_{t=1}^n t (Y_t - \beta_0 - \beta_1 t) = & 0 \implies \\
\beta_1 \sum_{t=1}^n t^2 = & \sum_{t=1}^n Y_t t - \beta_0 \sum_{t=1}^n t
\end{align*}

Then, substituting with the result gained previously for $\beta_0$, we get

\begin{align*}
\beta_1 \sum_{t=1}^n t^2 = & \sum_{t=1}^n Y_t t - 
 \left( \frac{\sum_{t=1}^n Y_t}{n} - \beta_1 \frac{\sum_{t=1}^n}{n} \right)
 \sum_{t=1}^n t \iff \\
\beta_1 \left( \sum_{t=1}^n t^2 - \frac{(\sum_{t=1}^n t)^2}{n} \right) = & 
  \sum_{t=1}^n Y_t t - \frac{\sum_{t=1}^n Y_t \sum_{t=1}^n t}{n} \iff \\
\beta_1 = & \frac{n\sum_{t=1}^n Y_tt - \sum_{t=1}^nY_t \sum_{t=1}^n t}{n \sum_{t=1}^n t^2 - \left( \sum_{t=1}^n t \right)^2} = 
  \frac{\sum_{t=1}^n (Y_t - \bar{Y})(t-\bar{t})}{\sum_{t=1}^n (t-\bar{t})^2} \quad \square
\end{align*}


## Variance of mean estimator

\[
  \bar{Y} = \frac{1}{n}\sum_{t=1}^n Y_t = \frac{1}{n} \sum_{t=1}^n(\mu + e_t - e_{t-1}) = 
    \mu + \frac{1}{n} \sum_{t=1}^n (e_t - e_{t-1}) = \mu + \frac{1}{n}(e_n - e_0)
\]

\[
  \text{Var}[\bar{Y}] = \text{Var}[\mu + \frac{1}{n}(e_n - e_0)] =
    \frac{1}{n^2}(\sigma_e^2 + \sigma_e^2) = \frac{2\sigma_e^2}{n^2}
\]

It is uncommon for the sample size to have such a large impact on the variance
estimator for the sample mean.

Setting $Y_t = \mu + e_t$ instead gives

\[
  \bar{Y} = \frac{1}{n}\sum_{t=1}^n Y_t = \frac{1}{n} \sum_{t=1}^n(\mu + e_t) = 
    \mu + \frac{1}{n} \sum_{t=1}^n e_t
\]

\[
  \text{Var}[\bar{Y}] = \text{Var} \left[ \mu + \frac{1}{n} \sum_{t=1}^n e_t \right] =
    0 + \frac{1}{n^2} \times n \sigma_e^2 = \frac{\sigma_e^2}{n}.
\]

## Variance of mean estimator #2

\[
  \bar{Y} = \frac{1}{n} \sum_{t=1}^n(\mu + e_t + e_{t-1}) = 
    \mu + \frac{1}{n} \sum_{t=1}^n (e_t + e_{t-1}) = \mu + \frac{1}{n} \left( e_n + e_0 + 2 \sum_{t=1}^{n-1} t \right)
\]

\[
  \text{Var}[\bar{Y}] = \frac{1}{n^2}(\sigma_e^2 + \sigma_e^2 + 4(n-1) \sigma_e^2 ) = \frac{1}{n^2}2(2n-1)\sigma_e^2
\]

Setting $Y_t = \mu + e_t$ instead gives the result from 3.2. We note that for
large $n$ the variance if approximately four times larger with
$Y_t = \mu + e_t + e_{t-1}$.

## Hours

### a {-}

```{r, fig.cap="Monthly values of the average hours worked per week in the U.S. manufacturing sector."}
library(TSA)
data("hours")
xyplot(hours)
```

In Figure 1 we see a steep incline between 83 and 84. There also appears to be a seasonal
trend with generally longer work hours later in the year apart from the summer;
1984, however, does not exhibit as clear a pattern.

### b {-}

```{r, fig.cap="Monthly values of average hours worked per week with superposed initials of months."}
months <- c("J", "A", "S", "O", "N", "D", "J", "F", "M", "A", "M", "J")

xyplot(hours, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.text(x = x, y = y, labels = months)
})
```

Here, in Figure 2, our interpretation is largely the same. It is clear that
December stands out as the month with the longest weekly work hours whilst
February and January are low-points, demonstrating a clear trend.

## Wages

### a {-}

```{r, fig.cap="Monthly average hourly wages for workers in the U.S. apparel and textile industry."}
data("wages")
xyplot(wages, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.text(x, y, labels = months)
})
```

There is a positive trend with seasonality: August is a low-point for wages. 
Generally, there seems to be larger increases in the fall.

### b {-}

```{r}
wages_fit1 <- lm(wages ~ time(wages))
summary(wages_fit1)
wages_rst <- rstudent(wages_fit1)
```

### c {-}

```{r wages-resid, fig.cap="Residual plot"}
xyplot(wages_rst ~ time(wages_rst), type = "l",
       xlab = "Time", ylab = "Studentized residuals")
```

We still seem to have autocorrelation related to the time and not white noise.

### d {-}

```{r}
wages_fit2 <- lm(wages ~ time(wages) + I(time(wages)^2))
summary(wages_fit2)
wages_rst2 <- rstudent(wages_fit2)
```

### e {-}

```{r wages_quad_resid, fig.cap="Residual plot for our quadratic model."}
xyplot(wages_rst2 ~ time(wages_rst), type = "l",
       xlab = "Time", ylab = "Studentized residuals")
```

This looks more like random noise but there is still clear autocorrelation 
between the fitted residuals that we have yet to capture in our model.

## Beer sales

### a {-}

```{r, fig.cap = "Monthly U.S. beer sales."}
data(beersales)
xyplot(beersales)
```

Clear seasonal trends. There is an initial positive trend from 1975 to around
1981 that then levels out.

### b {-}

```{r, fig.cap = "Monthly U.S. beer sales annotated with the months' initials."}
months <- c("J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D")

xyplot(beersales,
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.text(x, y, labels = months)
       })
```

It is now evident that the peaks are in the warm months and the slump in the
winter and fall months. December is a particular low point, while May, June,
and July seem to be the high points.

### c {-}

```{r}
beer_fit1 <- lm(beersales ~ season(beersales))
pander(summary(beer_fit1))
```

All comparisons are made against january. The model helpfully explains 
approximately 0.71 of the variance and is statistically significant. Most 
of the factors are significant (mostly the winter months as expected).

### d {-}

```{r rst-beer, fig.cap="Beer sales residual plot."}
xyplot(rstudent(beer_fit1) ~ time(beersales), type = "l",
       xlab = "Time", ylab = "Studentized residuals",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, y, pch = as.vector(season(beersales)), col = 1)
       })
```

Looking at the residuals in \@ref(fig:rst-beer) We don't have a good fit to our
data; in particular, wee're not capturing the long-term trend.

### e {-}

```{r}
beer_fit2 <- lm(beersales ~ season(beersales) + time(beersales) +
                  I(time(beersales) ^ 2))
pander(summary(beer_fit2))
```

This model fits the data better, explaining roughly 0.91 of the variance.

### f {-}

```{r rst-beer2, fig.cap="Beer sales residual plot from the quadratic fit."}
xyplot(rstudent(beer_fit2) ~ time(beersales), type = "l",
       xlab = "Time", yla = "Studentized residuals",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, y, pch = as.vector(season(beersales)), col = 1)
       })
```

Many of the values are still not being predicted successfully but at least
we're able to model the long term trend better.

## Winnebago

### a {-}

```{r winnebago, fig.cap = "Monthly unit sales of recreational vehicles from Winnebago."}
data(winnebago)
xyplot(winnebago)
```

### b {-}

```{r}
winn_fit1 <- lm(winnebago ~ time(winnebago))
pander(summary(winn_fit1))
```

The model is significant and explains 0.69 of the variance.

```{r winnebago-lm-res, fig.cap = "Residuals for the linear fit for the winnebago data."}
xyplot(rstudent(winn_fit1) ~ time(winnebago), type = "l",
       xlab = "Time", ylab = "Studentized residuals")
```

The fit is poor (Figure \@ref(fig:winnebago-lm-res). It is not random and it is
clear that we're making worse predictions for later yers.

### c {-}

To produce a better fit, we transform the outcome with the natural logarithm.

```{r}
winn_fit_log <- lm(log(winnebago) ~ time(winnebago))
pander(summary(winn_fit_log))
```

The model is better, explaining almost 0.8 of the variance.

### d {-}

```{r winnebago-log-res, fig.cap = "Residual plot after natural log transformation."}
xyplot(rstudent(winn_fit_log) ~ time(winnebago), type = "l",
       xlab = "Time", ylab = "Studentized residuals",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, y, pch = as.vector(season(winnebago)), col = 1)
       })
```

This looks more like random noise (Figure \@ref(fig:winnebago-log-res). Values still
cling together somewhat but it
is certainly better than the linear model. We're still systematically
overpredictinig the values for some months, however.

### e {-}

```{r}
winn_fit_seasonal <- lm(log(winnebago) ~ season(winnebago) + time(winnebago))
pander(summary(winn_fit_seasonal))
```

The fit is improved further. We have a R^2^ of 0.89 and significance for most
of our seasonal means as well as the time trend.

### f {-}

```{r winnebago-seasonal-res}
xyplot(rstudent(winn_fit_seasonal) ~ time(winnebago), type = "l",
       xlab = "Time", ylab = "Studentized residuals",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, y, col = 1, pch = as.vector(season(winnebago)))
       })
```

This is acceptable even if our residuals are quite large for some of the values,
notably at the start of the series.

## Retail

### a {-}

```{r retail, fig.cap="Total retail sales in the U.K. in billions pounds."}
data(retail)
xyplot(retail, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.xyplot(x, y, pch = as.vector(season(retail)), col = 1)
})
```

Plotting the retail sales trend there seems to be a long-term linear trend
as well as heavy seasonality in tht December -- and to slighter extent also
November and October -- exhibit regular surges in retail sales.

### b {-}

```{r}
retail_lm <- lm(retail ~ season(retail) + time(retail))
pander(summary(retail_lm))
```

This *seems* like an effective model, explaining 0.98 of the variance in
retail sales.

### c {-}

```{r retail-res, fig.cap = "Studentized residuals for our seasonality + linear model of retail sales."}
xyplot(rstudent(retail_lm) ~ time(retail), type = "l",
       xlab = "Time", ylab = "Studentized residuals",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, y, pch = as.vector(season(retail)), col = 1)
       })
```

The residual plot (Figure \@ref(fig:retail-res)) tells a different story:
we're underpredicting values for early period and overpredicting values for the
later years -- however, this should be an easy fix.

## Prescriptions

### a {-}

```{r prescrip, fig.cap="Monthly U.S. prescription costs."}
data(prescrip)
xyplot(prescrip, ylab = "Prescription costs",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, y, pch = as.vector(season(prescrip)), col = 1)
       })
```

Figure \@ref(fig:prescrip) shows a clear, smooth, and cyclical seasonal trend.
Values are genereally higher for the summer months and there seems to be an
exponential increase long-term.

### b {-}

```{r prescrip-diff, fig.cap="Percentage changes from month-to-month in prescription costs."}
pchange <- diff(prescrip) / prescrip
xyplot(pchange ~ time(prescrip), type = "l",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, y, pch = as.vector(season(pchange)), col = 1)
       })
```

The monthly percentage difference series looks rather stationary.

### c {-}

```{r}
pres_cos <- lm(pchange ~ harmonic(pchange))
pander(summary(pres_cos))
```

We explain 0.31 of the variance. The model is
significant though.

### d {-}

```{r cos-resid, fig.cap="Residuals for our cosine model."}
xyplot(rstudent(pres_cos) ~ time(prescrip), type = "l")
```

The residual plot in Figure \@ref(fig:cos-resid) looks rather random.

## Hours (revisited)

### a {-}

```{r}
data(hours)
hours_quad <- lm(hours ~ time(hours) + I(time(hours)^2))
pander(summary(hours_quad))
```

Both the linear and quadratic trends are significant. We explain 59% of the
variance. 

### b {-}

```{r hours-quad-res, fig.cap = "Studentized residuals for our quadratic fit for the hours series."}
xyplot(rstudent(hours_quad) ~ seq_along(hours), type = "l",
       xlab = "Index", ylab = "Studentized residuals",
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, y, pch = as.vector(season(hours)), col = 1)
       })
```

We're clearly missing the seasonal trend here. February is underpredicted
and December overpredicted, for instance.

### c {-}

We run the *Runs test* to check for dependence between our observations.

```{r}
runs(rstudent(hours_quad))
```

We have more runs than expected and a significant test at $p = 0.00012$,
confirming out suspicions from (b).

### d {-}

```{r hours-acf, fig.cap = "Autocorrelation plot for the hours dataset."}
lat_acf(rstudent(hours_quad))
```

Figure \@ref(fig:hours-acf) makes the autocorrelation clear: for the first 5--6
values there is positive correlation, which then seems to reverse for the later
values. Some of these are significant.

### e {-}

```{r normality-tests, fig.cap="Normality plots", fig.subcap=c("Q-Q-plot", "Densityplot"), fig.show="hold", fig.width=3, fig.height=3}
qqmath(rstudent(hours_quad), asp = 1,
       xlab = "Theoretical quantities", ylab = "Studentized residuals",
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })
densityplot(rstudent(hours_quad), xlab = "Studentized residuals", 
            ylab = "Density")
```

The distribution is somewhat light-tailed but otherwise look quite normal.

## Wages (revisisted)

### a {-}

```{r}
# data(wages)
wages_quad <- lm(wages ~ time(wages) + I(time(wages)^2))
pander(summary(wages_quad))
```

This quadratic fit explains much of the variance (0.99).

### b {-}

```{r}
runs(rstudent(wages_quad))
```

Juding from the output of the Runs test, however, there is evidence
to suggest that we have dependence among variables.

### c {-}

```{r wages_acf, fig.cap="Autocorrelation for the quadratic fit on the wages time series."}
lat_acf(rstudent(wages_quad))
```

However, the autocorrelation plot (Figure \@ref(fig:wages_acf)) makes clear
that we are dealing with a lot of auttocorrelation and this is obviously
because we haven't accounted for the seasonal trend in the series.

### d {-}

Let's look at some normality plots as well.

```{r wages-norm, fig.height=3, fig.cap="Normality plots for the wages data with a quadratic fit."}
figa <- 
  qqmath(rstudent(wages_quad), xlab = "Theoretical quantities",
       asp = 1,
       ylab = "Studentized residuals",
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })

figb <- densityplot(rstudent(wages_quad), xlab = "Studentized residuals")
gridExtra::grid.arrange(figa, figb, ncol = 2)
```

The normality plots (Figure \@ref(fig:wages-norm)) testifies that the 
distribution of the residuals is somewhat heavy-tailed and ever-so-slightly
left-skewed.

## Beersales (revisited)

### a {-}

First, we just collect the residuals.

```{r}
#data(beersales)
beer_quad_seasonal <- lm(beersales ~ time(beersales) + I(time(beersales)^2) +
                           season(beersales))
beer_resid <- rstudent(beer_quad_seasonal)
```

### b {-}

Next, we perform a Runs test.

```{r}
runs(beer_resid)
```

The test is significant ($p = `r runs(beer_resid)$pvalue`$).

### c {-}

```{r beer-acf, fig.cap="Autocorrelation for the beer sales model."}
lat_acf(beer_resid)
```

Correlations are significant for several of the lags, leading us to
question independence.

### d {-}

```{r beer-norm, fig.cap="Normality plots for the beersales series after a linear, quadratic and seasonal fit."}
figa <- 
  qqmath(beer_resid, xlab = "Theoretical quantities",
       asp = 1,
       ylab = "Studentized residuals",
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })

figb <- densityplot(beer_resid, xlab = "Studentized residuals")
gridExtra::grid.arrange(figa, figb, ncol = 2)
```

## Winnebago (revisited)

### a {-}

```{r}
winn_resid <- rstudent(winn_fit_seasonal)
```

### b {-}

```{r}
runs(winn_resid)
```

The Runs test is signficant. We have fewer runs than expected.

### c {-}

```{r winn-acf, fig.cap="Autcorrelation for the winnebago model."}
lat_acf(winn_resid)
```

There is evidence of dependence which we have so far not taken into account in
the model

### d {-}

```{r, winn-norm, fig.height=3, fig.cap="Normality plots for the winnebago series fit with a logarithimg and seasonal fit."}
figa <- 
  qqmath(winn_resid, xlab = "Theoretical quantities",
       asp = 1,
       ylab = "Studentized residuals",
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })

figb <- densityplot(winn_resid, xlab = "Studentized residuals")
gridExtra::grid.arrange(figa, figb, ncol = 2)
```

There is left skew, a large outlier, but otherwise approximate normality.

## Retail (revisited)

### a {-}

```{r}
retail_lm_seasonal <- lm(retail ~ time(retail) + season(retail))
retail_resid <- rstudent(retail_lm_seasonal)
```

### b {-}

```{r}
runs(retail_resid)
```

The Runs test is signficant and we have fewer runs than expected.

### c {-}

```{r retail-acf, fig.cap="Autcorrelation for the retail model."}
lat_acf(retail_resid)
```

There is evidence of dependence which we have so far not taken into account in
the model. All of the lags are positive and several are significant too.

### d {-}

```{r, retail-norm, fig.height=3, fig.cap="Normality plots for the winnebago series fit with a logarithimg and seasonal fit."}
figa <- 
  qqmath(retail_resid, xlab = "Theoretical quantities",
       asp = 1,
       ylab = "Studentized residuals",
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })

figb <- densityplot(retail_resid, xlab = "Studentized residuals")
gridExtra::grid.arrange(figa, figb, ncol = 2)
```

The distributin of the residuals is considerably light-tailed.

## Prescriptions (revisited)

### a {-}

```{r}
pres_resid <- rstudent(pres_cos)
```

### b {-}

```{r}
runs(pres_resid)
```

The Runs test is signficant and we have fewer runs than expected.

### c {-}

```{r pres-acf, fig.cap="Autcorrelation for the prescriptions model."}
lat_acf(pres_resid)
```

Some of the lags have correlations that surpass statistical significane. There
may be some alternating trends that we have not taken into account.

### d {-}

```{r, pres-norm, fig.height=3, fig.cap="Normality plots for the prescitions series fit with a cosine model."}
figa <- 
  qqmath(pres_resid, xlab = "Theoretical quantities",
       asp = 1,
       ylab = "Studentized residuals",
       panel = function(x, ...) {
         panel.qqmathline(x, ...)
         panel.qqmath(x, ...)
       })

figb <- densityplot(pres_resid, xlab = "Studentized residuals")
gridExtra::grid.arrange(figa, figb, ncol = 2)
```

The distribution of the residuals is somewhat heavy-tailed and left-skewed.

## Variance estimator for sample mean

### a {-}

We have the variance estimator

$$
\begin{align}
  \text{Var}[\bar{Y}] = & \frac{\gamma_0}{n} \left( 1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\phi^k \right) \\
                      = & \frac{\gamma_0}{n} \left( 1 + 2\sum_{k=0}^{n-1}\left(1 - \frac{k}{n}\right)\phi^k -
                          2\sum_{k=0}\left(1 - \frac{k}{n}\right)\phi^k \right) \\
                      = & \frac{\gamma_0}{n} \left( 1 + 2\sum_{k=0}^{n-1}\left(1 - \frac{k}{n}\right)\phi^k - 2 \right) \\
                      = & \frac{\gamma_0}{n} \left( -1 + 2\sum_{k=0}^{n-1}\phi^k - \frac{2}{n}\sum_{k=0}^{n-1}k\phi^k \right) \\
                      = & \frac{\gamma_0}{n} \left( -1 + 2 \frac{1-\phi^n}{1-\phi} - \frac{2\phi}{n} \sum_{k=0}^{n-1}k\phi^{k-1} \right) \\
                      = & \frac{\gamma_0}{n} \left( -1 + 2 \frac{1-\phi^n}{1-\phi} - \frac{2\phi}{n} \frac{\partial}{\partial{\phi}} \sum_{k=0}^{n-1}\phi^k \right) \\
                      = & \frac{\gamma_0}{n} \left( -1 + 2 \frac{1-\phi^n}{1-\phi} - \frac{2\phi}{n} \frac{(1-\phi)(-n\phi^{n-1}) - (1-\phi^n)(-1)}{(1-\phi)^2} \right) \\
                      = & \frac{\gamma_0}{n} \left( -1 + 2 \frac{1-\phi^n}{1-\phi} - \frac{2\phi}{n} \frac{(1-\phi)(-n\phi^{n-1}) - (1-\phi^n)(-1)}{(1-\phi)^2} \right) \\
                      = & \frac{\gamma_0}{n} \left( -1 + 2 \frac{1-\phi^n}{1-\phi} - \frac{2\phi}{n} \frac{1-\phi^n}{(1-\phi)^2} + \frac{2\phi^n}{1-\phi} \right) \\
                      = & \frac{\gamma_0}{n} \left( \frac{2-2\phi^n+2\phi^n-1+\phi}{1-\phi} - \frac{2\phi}{n} \frac{1-\phi^n}{(1-\phi)^2} \right) \\
                      = & \frac{\gamma_0}{n} \left( \frac{1+\phi}{1-\phi} - \frac{2\phi}{n} \frac{1-\phi^n}{(1-\phi)^2} \right) \quad \square
\end{align}
$$

### b {-}

$$
\lim_{n \rightarrow \infty}\text{Var}[\bar{Y}] = \frac{\gamma_0}{n}\left( \frac{1+\phi}{1-\phi} - 0 \right) = \frac{\gamma_0}{n}\left( \frac{1+\phi}{1-\phi}\right)
$$

since $\phi \in [-1,1]$.

### c {-}

```{r varbary, fig.cap="Variance estimation for different values of $\\phi$."}
phi <- seq(-1, 1, 0.01)
var_ybar <- (1 + phi) / (1 - phi)

xyplot(var_ybar ~ phi, ylab = expression(Var(bar(Y))), xlab = expression(phi),
       type = "l")
```

Plotting $\text{Var}[\bar{Y}]$ for values of $\phi$ in $[-1, 1]$. Shows that
variance increases exponentially as $\phi$ approaches 1, in which case
our estimates of $\bar{Y}$ become increasingly uncertain.

## Equation 3.2.6

$$
\begin{align}
  \text{Var}[\bar{Y}] = & \frac{\gamma_0}{n} \sum_{k= -\infty}^\infty \rho_k \quad \text{when} \quad \rho_k = \phi^{|k|} \implies \\
                      = & \frac{\gamma_0}{n} \sum_{k= -\infty}^\infty \phi^{|k|} \\
                      = & \frac{\gamma_0}{n} \left( \sum_{k = 0}^\infty \phi^k + \sum_{0}^\infty \phi^{-k} \right) \\
                      = & \frac{\gamma_0}{n} \left( \frac{1}{1-\phi} - \frac{1}{1-\frac{1}{\phi}} \right) \\
                      = & \frac{\gamma_0}{n} \frac{1+\phi}{1-\phi} \tag*{$\square$}
\end{align}
$$

## Equation 3.2.7

$$
\begin{gather}
  \text{Var}[\bar{Y}] = \frac{1}{n^2} \text{Var}\left[ \sum_{i=1}^n Y_i \right] = \text{Var}\left[ \sum_{i=1}^n \sum_{j=1}^i e_j \right] = \\
    \frac{1}{n^2}\text{Var}[e_1 + 2e_2 + 3e_3 + \dots + ne_n] = \frac{\sigma_e^2}{n}\sum_{k=1}^n k^2 = \\
    \frac{\sigma_e^2}{n} \frac{n(n+1)(2n + 1)}{6} = \sigma_e^2 \frac{(n+1)(2n + 1)}{6} \tag*{$\square$}
\end{gather}
$$