% STEP 1: Choose oneside or twoside. Use the 'draft' option a lot when writing.
\documentclass[english, twoside]{HYgradu}
\usepackage[utf8]{inputenc} % For UTF8 support. Use UTF8 when saving your file.
%\usepackage{lmodern} % Font package
\usepackage[T1]{fontenc} % better font package
\usepackage{textcomp}
\usepackage[pdftex]{color, graphicx} % For pdf output and jpg/png graphics
\usepackage[pdftex, plainpages=false]{hyperref} % For hyperlinks and pdf metadata
\usepackage{fancyhdr} % For nicer page headers
%\usepackage{tikz} % For making vector graphics (hard to learn but powerful)
%\usepackage{wrapfig} % For nice text-wrapping figures (use at own discretion)
\usepackage{amsmath, amssymb} % For better math
\usepackage[sort,colon]{natbib} % For bibliography
\usepackage[bf]{caption} % For more control over figure captions
\usepackage{marginnote} % margin notes for myself
%\usepackage{svg} % vector graphics
\usepackage{graphicx}
\graphicspath{{kuvat/tex/}}
\usepackage{wasysym}
\usepackage{tabularx}
\usepackage{adjustbox}
\usepackage{makecell}
\usepackage{rotating}
\usepackage[shortcuts]{extdash}
\usepackage{physics}
\usepackage[super]{nth}
\usepackage{gensymb}
\usepackage{layouts}
% titlesec and fix for no numbering on titlesec 2.10.2
\usepackage{titlesec}
\usepackage{etoolbox}
\makeatletter
\patchcmd{\ttlh@hang}{\parindent\z@}{\parindent\z@\leavevmode}{}{}
\patchcmd{\ttlh@hang}{\noindent}{}{}{}
\makeatother
\fussy % Probably not needed but you never know...
\renewcommand{\topfraction}{.75} % less single-float pages
\renewcommand{\floatpagefraction}{.75} % less single-float pages
\newcommand{\matr}[1]{\mathbf{#1}}
\hypersetup{
bookmarks=true, % show bookmarks bar first?
unicode=true, % to show non-Latin characters in Acrobat’s bookmarks
pdftoolbar=true, % show Acrobat’s toolbar?
pdfmenubar=true, % show Acrobat’s menu?
pdffitwindow=false, % window fit to page when opened
pdfstartview={FitH}, % fits the width of the page to the window
pdftitle={Statistical Estimation of the Local Group Mass}, % title
pdfauthor={Anni Järvenpää}, % author
% pdfsubject={}, % subject of the document
% pdfcreator={}, % creator of the document
pdfproducer={pdfLaTeX}, % producer of the document
% pdfkeywords={something} {something else}, % list of keywords for
% pdfnewwindow=true, % links in new window
colorlinks=true, % false: boxed links; true: colored links
linkcolor=black, % color of internal links
citecolor=black, % color of links to bibliography
filecolor=black, % color of file links
urlcolor=black % color of external links
}
\title{Statistical Estimation of the Local Group Mass}
\author{Anni Aleksandra Järvenpää}
\date{\today}
\level{Master's thesis}
\faculty{Faculty of Science}
\department{Department of Physics}
\address{PL 64 (Gustaf Hällströmin katu 2)\\00014 University of Helsinki}
\subject{Astronomy}
\prof{Dr. Till Sawala}{Professor Peter Johansson}
\censors{Professor Peter Johansson}{Dr. Till Sawala}{}
\depositeplace{}
\additionalinformation{}
\numberofpagesinformation{\pageref*{tab:PCs}\ pages}
\classification{}
\keywords{galaxies: Local Group - galaxies: haloes - methods: statistical - methods: numerical - dark matter}
\quoting{``Each one of you can change the world, for you are made of star stuff, and you are connected to the universe.'' \\---Vera Rubin}
\begin{document}
\maketitle
\onehalfspacing
\begin{abstract}
During the nearly a century for which the approximate structure of the Local Group has been known, many methods for estimating the masses of both the individual galaxies within it and the group as a whole have been constructed. Still, to this day, estimates from different credible sources vary by a factor of 2--3. In this master's thesis I construct a new model for estimating the combined mass of the Milky Way and Andromeda galaxies based on the kinematics of the galaxy pair and the properties of the Hubble flow surrounding it, aiming to improve on the accuracy of the timing argument.
The model is based on a sample of subhalo catalogues from cosmological dark matter only simulations containing a system resembling the Local Group. From these catalogues, I identified the Local Group analogues based on the presence of a pair of dark matter haloes with mutual kinematics and masses resembling those of the Milky Way and Andromeda galaxies in the Local Group, with no other massive objects in the vicinity. For the Hubble flow fitting I used an algorithm for automatically choosing the best range of objects to include in the fit.
The surrounding Hubble flows showed clear signs of anisotropy and of substructures within the flow. In order to capture the properties of these structures, I clustered the subhaloes between 1.5 and 5.0~Mpc from the Milky Way analogue in each simulation using the DBSCAN clustering algorithm. I then fitted, in each simulation, separate Hubble flows for subhaloes outside clusters, within each cluster, and within each cluster with all members less massive than $8 \times 10^{11} \mathrm{M}_{\astrosun}$.
I used twelve variables to construct the model for predicting the Local Group mass. Nine of these were the Hubble constants, Hubble flow zero point distances and velocity dispersions around the fit measured separately for all haloes, clustered haloes and haloes outside clusters. The remaining three consisted of the radial and tangential velocity components and the distance of the Andromeda Galaxy analogue as seen from the Milky Way analogue.
I split the data set consisting of 119 subhalo catalogues into a training set with approximately 60~\% of the whole data set (71 subhalo catalogues) and a test set containing the remaining subhalo catalogues. I then extracted the principal components of the training set and selected the first two to be used in predicting the mass. A linear regression model was fitted to these components using 10\=/fold cross-validation. The error of the resulting model was estimated by applying the model to the test set and comparing its predictions to the known masses of the subhalo pair. The obtained root-mean-square error was $1.06 \times 10^{12}\ \mathrm{M}_{\astrosun}$. This is a clear improvement over the timing argument, which had a root-mean-square error of $1.42 \times 10^{12}\ \mathrm{M}_{\astrosun}$ in the same data set.
\end{abstract}
\mytableofcontents
\chapter{Introduction}
This thesis explores the properties of the Local Group and its surroundings with the aim of constructing a model for predicting its mass based on cosmological simulations. The following sections briefly introduce the topic and the early history of the field, together with a short review of the aim and structure of the thesis.
\section{Galaxies and Galaxy Groups}
Galaxies are the smallest components of the Universe on cosmological scales. They trace the underlying distribution of dark matter, making visible the enormous filaments and walls stretching over tens of Mpc separated by vast voids of very low density. The environment of a galaxy also strongly affects its properties with e.g. large elliptical galaxies being more common in densely populated environments whereas spirals are more often found in more isolated settings.
Many galaxies reside in a group or cluster of galaxies, neighboured by tens to even thousands of other galaxies; as these systems have no distinctive outer edge, the membership of peripheral galaxies in groups can be ambiguous. The distinction between a group and a cluster of galaxies is typically drawn somewhere around 50 bright members, with smaller associations called groups and larger ones clusters \citep{mo2010galaxy, binney2008galactic}. The Milky Way is also located in a galaxy group, which is known as the Local Group.
The Local Group covers a volume with a radius of around 1~Mpc around the Milky Way and Andromeda galaxies, with these two massive spiral galaxies accounting for most of its mass. It contains more than 70 members, most of which are satellites of either the Milky Way or the Andromeda Galaxy \citep{mcconnachie2012observed}. Due to our location within it, the Local Group is the best-studied galaxy group, as even very faint dwarf galaxies can be observed in it.
Even then, the masses of both the Local Group as a whole and its most massive members individually are still quite uncertain. For example, different estimates of the total mass of the Milky Way from the past ten years easily differ by a factor of 3 \citep{wang2015estimating}. This is due to most of the mass being dark matter, which cannot be directly observed. Therefore, instead of relying on e.g. star counts, most of the mass estimates use the kinematics in the system and estimate the mass by analysing orbits of either individual galaxies or point sources such as masers or high velocity stars. This can be done either purely analytically, with a single pair of objects, or statistically by fitting the mass to observations of a large number of objects.
\section{History of Local Group Research}
The Andromeda Galaxy and both the Large and Small Magellanic Clouds are visible to the naked eye and thus have been available to inspire mythologies for millennia, with the first known written record of the Andromeda Galaxy and the Magellanic Clouds dating from 964~CE, when the Persian astronomer Al-Sufi wrote his \textit{Book of Fixed Stars} \citep{schultz2012andromeda, ihsan2011abdul}. Nonetheless, the understanding of them as individual galaxies belonging to the Local Group is relatively recent.
At the time of Al-Sufi, and for centuries to come, they were known only as nebulous objects, with no information about their distances or their physical nature as galaxies analogous to the Milky Way. This is reflected, for example, in the fact that the Andromeda Galaxy was included in the Messier Catalogue, compiled during the years 1771--1784 to help comet hunters filter out known nebulous objects of no interest \citep{longair2006cosmic}. Although numerous other lists of nebulous objects have been compiled since, including ones intended for more scientific uses, the Messier Catalogue still has a strong footing, especially in amateur astronomy. For example, the Andromeda Galaxy is still often called M31, its designation in the catalogue, in both scientific and popular publications.
At the beginning of the \nth{20} century the nature of these objects, now known to be extragalactic, was still unknown, as was the nature of the Milky Way itself and whether it encompasses the whole Universe \citep{longair2006cosmic}. Only after Henrietta Leavitt published her paper on the period-luminosity relation in 1912 could extragalactic distances be measured reasonably accurately. In 1913, Ejnar Hertzsprung calibrated the relation and determined the distance to the Small Magellanic Cloud to be 10~kpc, a gross underestimate of the true distance but still well outside the range of previously measured distances for any object \citep{longair2006cosmic}. Even then, more than ten years of uncertainty about whether these nebulae are a part of the Milky Way ensued, until in 1925 Edwin Hubble used the Cepheids in the Andromeda and Triangulum galaxies to confirm that both of them are galaxies in their own right outside the Milky Way \citep{longair2006cosmic}. Hubble was also the first to use the name \textit{Local Group} when he introduced it in his book \textit{The Realm of the Nebulae} \citep{hubble1936realm}.
The progressively improving estimates of the positions and kinematics of the nearby galaxies have confirmed these objects to be part of the Local Group, together with numerous smaller galaxies. The number of known Local Group members has also increased over time, starting from just the most luminous members and growing to 41 by \citeyear{mateo1998dwarf}, only to be more than doubled to over a hundred known galaxies within 3~Mpc by \citeyear{mcconnachie2012observed} \citep{mateo1998dwarf, mcconnachie2012observed}. Although the list of known Local Group members is substantial, it is likely that unseen dwarf galaxies still lie within the Local Group.
Still, not all questions about even the most massive members of the Local Group have reasonably accurate answers. For example, the masses of the Milky Way and M31 have puzzled astronomers for as long as they have been recognized as separate galaxies, and the uncertainty is still considerable. Nowadays powerful telescopes allow a multitude of objects to be used for tracing the gravitational potential of the galaxies, but this has not always been the case. One of the early ways of estimating the total mass of the Milky Way and M31 galaxies is the timing argument of \citet{kahn1959intergalactic}. It relies on approximating the galaxy pair as isolated point masses, initially moving away from each other due to the expansion of the Universe but eventually coming to a halt and starting to approach each other due to their mutual gravitational pull.
Knowing their positions and velocities and the age of the Universe, the mass can be approximated by fitting a Keplerian orbit to them. \citeauthor{kahn1959intergalactic} obtained a minimum of $1.8 \times 10^{12}\ \mathrm{M}_{\astrosun}$ for the mass, notably being of the same magnitude as modern estimates though a bit on the low side. Applying the same method with modern values for the input variables produces a mass of approximately $4.2 \times 10^{12}\ \mathrm{M}_{\astrosun}$.
The assumption of isolated point masses is of course intrinsically wrong, as galaxies are extended objects, so refining the model is useful. Cosmological simulations offer an attractive tool for this, as it is possible to generate a multitude of systems resembling the Local Group and to calibrate the model based on the observed effects of the environment and the tangential velocities of the extended haloes. The first to calibrate the timing argument in this way were \citet{kroeker1991accuracy}, with more recent similar studies including e.g. that of \citet{li2008masses}. A more detailed description of the timing argument and other mass estimation methods is presented in section \ref{sect:lg_and_mass}.
\section{Aim and Structure of This Thesis}
This thesis focuses on producing a method for estimating the mass of the Local Group that surpasses the accuracy of the timing argument. This is done by applying tools of statistics and machine learning to analyse a sample of 119 Local Group analogues found in cosmological simulations. The performance of the methods is compared only on simulated Local Groups, and applying the new method to observations is left for future work.
First, in chapter \ref{chapt:theoretical_background}, the basic cosmological context for the evolution of the Local Group is covered, together with what is currently known about the Local Group and its mass. This chapter also introduces some of the implications the mass of the Local Group has for cosmological models. Then, in chapter \ref{chapt:simulations}, the simulation techniques used to run the analysed simulations are described, followed by descriptions of the simulation runs. Chapter \ref{chapt:mathematical_and_statistical} describes the analysis methods used for creating and testing the model, after which all the building blocks required for creating the model have been covered.
The actual data analysis is described in chapter \ref{chapt:results}. It starts with an overview of the properties of the found Local Group analogues in section~\ref{sect:properties_of_LGs}, after which the methods and results of analysing the surrounding Hubble flows are presented in section~\ref{sect:hf}. These are followed by a review of the directional dependence of the Hubble flow in section~\ref{sect:hf-anisotropy}, investigating both the anisotropy of the flow around the binary distribution of the mass in the Local Group and the effect of other structures present in the 5~Mpc region surrounding the Local Group. These structures are identified by finding clusters of subhaloes near each other on the sky as seen from the Milky Way analogue.
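To give a concrete picture of this clustering step, the following minimal Python sketch (with randomly generated, hypothetical subhalo directions and illustrative parameter values, not the configuration used in the analysis itself) groups subhaloes by their proximity on the sky using the DBSCAN implementation of the scikit-learn library:
\begin{verbatim}
# A minimal sketch (hypothetical positions, illustrative parameters) of
# identifying groups of subhaloes that lie close together on the sky,
# using the DBSCAN implementation in scikit-learn.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Unit vectors towards each subhalo as seen from the Milky Way analogue.
xyz = rng.normal(size=(500, 3))
xyz /= np.linalg.norm(xyz, axis=1, keepdims=True)

# 'eps' is the neighbourhood radius; with unit vectors and the Euclidean
# metric it corresponds to an angular separation of ~2*arcsin(eps/2).
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(xyz)

# Label -1 marks subhaloes outside any cluster; other labels index clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, "clusters found;", np.sum(labels == -1), "field subhaloes")
\end{verbatim}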
Finally, in section~\ref{sect:statistical_estimate}, a statistical model for predicting the mass of the Local Group based on the measurements conducted in the previous sections is presented.
\chapter{Theoretical Background} \label{chapt:theoretical_background}
Cosmology determines the properties of the Universe, including its origin, the laws of nature by which it evolves and the structures that arise within it \citep{mo2010galaxy}. Thus many fields of astronomy and astrophysics, including the study of galaxies and galaxy groups such as the Local Group and its members, are tightly connected to the study of cosmology. This section gives a brief explanation of the current cosmological understanding, its relevant implications and how the Local Group is currently viewed in a cosmological context.
\section{Basics of Cosmology}
Understanding processes that take place on very large scales or take long times requires understanding some basic cosmology. The following sections will cover the most basic concepts of cosmology, the evolution of the Universe on both large and small scales, and the $\Lambda$CDM model which is currently the cosmological model that best matches the observations on multiple scales \citep{mo2010galaxy}. Sections \ref{universe-evolution} and \ref{universe-composition} cover large scales at which the cosmological principle applies and section \ref{universe-structure} covers smaller scales of individual dark matter haloes.
\subsection{Evolution of the Universe} \label{universe-evolution}
The current cosmological understanding is based on the general theory of relativity together with simple hypotheses such as the cosmological principle, which states that on large scales the Universe is spatially homogeneous and isotropic \citep{mo2010galaxy}. At a given location, an observer might not see this, as is easy to understand when considering two observers located at the same point but moving relative to each other. In this situation, it is clear that at least one of the observers will see a dipole in the surrounding velocities and thus will not observe the universe to be isotropic. Nonetheless, in an isotropic universe, for every point in space we can define a so-called fundamental observer as the observer who sees the universe as isotropic \citep{mo2010galaxy}. These fundamental observers correspond to a cosmological rest frame, which can be determined at a given location e.g. by observing the cosmic microwave background and subtracting the velocity corresponding to the observed dipole component \citep{mo2010galaxy}.
The existence of such fundamental observers has interesting consequences. The existence of any large-scale flows is clearly prohibited, as these would violate the isotropy. In a three-dimensional universe any curl of the velocity field around any fundamental observer is also forbidden. This can be easily seen by considering the hairy ball theorem, which states that a continuous tangential vector field on a sphere must be zero in at least one point \citep{renteln2013manifolds}. Thus if there were a curl in the surrounding velocity field, there would still be at least one direction in which the tangential velocity is zero, and the field would not be isotropic. This means that there cannot be any tangential motion and thus the only allowed motion happens in the radial direction: the Universe can either expand or contract.
The expansion of the Universe can be parametrized using the dimensionless scale factor $a$, whose time evolution is governed by the Friedmann equations
\begin{align}\label{friedmann1}
\frac{\ddot{a}}{a} = -\frac{4\pi G}{3} \left( \rho + 3 \frac{P}{c^2} \right) + \frac{\Lambda c^2}{3} \\ \label{friedmann2}
{\left( \frac{\dot{a}}{a} \right)}^2 = \frac{8\pi G}{3}\rho - \frac{Kc^2}{a^2} + \frac{\Lambda c^2}{3},
\end{align}
where $G$ is the gravitational constant, $P$ is pressure, $\rho$ is energy density and $K$ and $\Lambda$ are cosmology-related parameters \citep{mo2010galaxy}. $K$ specifies the curvature of the Universe, determined by the overall density of the Universe \citep{mo2010galaxy}. The allowed values for $K$ are $-1$ corresponding to a hyperbolic universe, $0$ to a flat one and $1$ to a spherical universe \citep{mo2010galaxy}. These are also often called open, critical and closed universes respectively. Modern measurements suggest that the Universe is flat within the measurement error, but small deviations from $K=0$ cannot be ruled out \citep{planck2016resultsI}. $\Lambda$ is the cosmological constant driving the expansion of the Universe, often described as dark energy or vacuum energy \citep{mo2010galaxy}.
The concept of scale factor $a$ is crucial for this thesis as it affects measured values of both distances and velocities. It increases or decreases as the Universe expands or contracts, and thus it can be used to relate distances at different times \citep{mo2010galaxy}. For a universe that is monotonically expanding, the scale factor can also be used as an alternative time coordinate, as it has a one-to-one correspondence to time.
It is often convenient to measure not the proper distance $l$ between a pair of objects, but instead their so-called comoving distance $r$: \citep{mo2010galaxy}
\begin{equation} \label{comoving}
r = \frac{l}{a}
\end{equation}
For a pair of objects with no relative motion, the comoving distance will remain constant as the size of the Universe changes. A comoving coordinate system is often used in cosmological simulations, as the expansion of the universe is included in the scale factor instead of all of the coordinates having to be recalculated as the size of the universe varies \citep{griebel1002numerical}.
Observations have confirmed that the Universe is indeed expanding, first observed by \citet{hubble1929relation}. The rate of the expansion is denoted with the Hubble parameter $H(t)$, which is defined using the proper distance and the rate of change of the proper distance between a pair of fundamental observers:
\begin{equation}
H(t)\; l = \dv{l}{t}
\end{equation}
The Hubble parameter is also closely related to the scale factor, as \citep{mo2010galaxy}
\begin{equation}\label{reducedhubble}
H(t) = \frac{\dot{a}(t)}{a(t)}.
\end{equation}
In calculations the Hubble parameter is often replaced with the reduced Hubble parameter, denoted by $h$ and defined as \citep{montgomery2012introduction}
\begin{equation}
h(t) = \frac{H(t)}{100\ \mathrm{km/s/Mpc}}.
\end{equation}
The early measurements trying to determine the current value of the Hubble parameter, often called the Hubble constant, were prone to error as the distance estimates to extragalactic objects were inaccurate. For example the first measurement by \citet{hubble1929relation} yielded a value of over 500~km/s/Mpc due to systematically erroneous distance measurements derived using Cepheid variables. Later estimates offer reasonably accurate results, such as the \citet{planck2014resultsXVI} value of $67.77 \pm 0.77$~km/s/Mpc for the current expansion rate, though the results of different experiments have considerable scatter.
Another factor affecting measurements based on extragalactic objects is that the observed proper velocities contain not only the expansion of the Universe but also the peculiar motions of the objects, which is why measurements based on other probes, such as the cosmic microwave background in the case of \citet{planck2016resultsI}, are valuable. Information contained in the peculiar motions can still be interesting. In this thesis, the mass of the Local Group is estimated using radial velocity measurements within a few megaparsecs of a number of simulated Local Universe analogues. At scales this small, the expansion of the Universe is greatly affected by local gravitational fields, and thus expansion measurements can be used to infer the mass enclosed within the Local Group.
While the scale factor is one possible way of expressing time, it is not the only one. As the universe expands and we observe objects receding, the light emitted from them is shifted to longer wavelengths. The further away the emitter is, the more the space between the observer and the emitter will have expanded, making the effect stronger. The relative change in the wavelength is called the redshift $z$, defined as
\begin{equation}
z \equiv \frac{\lambda_o - \lambda_e}{\lambda_e}
\end{equation}
where $\lambda_o$ is the observed and $\lambda_e$ the emitted wavelength \citep{mo2010galaxy}. If the effect of peculiar motions is ignored, redshift is directly related not only to the distance to the emitter but also to the scale factor at the time of the emission. For arbitrary scale factors at the time of the emission and observation, the relation between $z$ and $a$ is \citep{mo2010galaxy}
\begin{equation}
1 + z = \frac{a(t_o)}{a(t_e)}.
\end{equation}
In most situations observations are done at $a(t_o)=1$, in which case the scale factor at the time of the emission can be written
\begin{equation}
a = \frac{1}{1+z}.
\end{equation}
\subsection{Composition of the Universe} \label{universe-composition}
Using the Friedmann equations \ref{friedmann1} and \ref{friedmann2} to model the evolution of the Universe requires not only the equations and cosmological parameters introduced in section \ref{universe-evolution} but also the composition of the Universe to be known in order to determine the density, pressure, cosmological constant and ultimately even the curvature of the Universe. According to the current understanding, the Universe is made of three components: non-relativistic matter, relativistic matter and dark energy \citep{mo2010galaxy}. At present time, about 69 \% of the energy density $\rho$ in the Universe is dark energy and 31 \% is non-relativistic matter, including both baryonic and dark components \citep{planck2016resultsI}. Most of the non-relativistic matter, around 84 \%, is dark matter, with baryonic matter only contributing around 16 \% of total matter energy density \citep{planck2016resultsI}. The energy density of relativistic matter, i.e. photons and standard-model neutrinos, in the present-day universe is negligible \citep{mo2010galaxy}.
This has not always been the case, as these ratios change over time as the Universe evolves. It is easy to see that, as the Universe expands, the energy density of matter behaves as $a^{-3}$, since the volume of the Universe increases as $a^3$ and in an adiabatic system no matter is created or destroyed. Radiation is diluted similarly to matter, but in addition to the effect of increasing volume, the growing universe also causes the wavelength of the radiation to increase, resulting in the energy density decreasing as $a^{-4}$. Dark energy can be thought of as arising from space itself, and thus a change in $a$ does not affect the energy density of this component \citep{mo2010galaxy}. The change of energy density of a component may also correspond to a change in the pressure or temperature of the component \citep{mo2010galaxy}.
The different time evolutions of the components also mean that the dominant component of the Universe changes as the scale factor grows. At very early times the Universe was radiation dominated, but as the radiation energy density decreases faster than the energy densities of the two other components, a matter dominated era followed. As dark energy is the only component with constant energy density, it will be the final dominant energy component. This transition from matter to dark energy dominated era has happened in the recent past \citep{mo2010galaxy}.
The geometry of the Universe is also determined by its contents. A characteristic density known as the critical density, defined as
\begin{equation}
\rho_{crit,0} \equiv \frac{3H_0^2}{8\pi G}
\end{equation}
separates the different geometries \citep{mo2010galaxy}. The subscript zero refers to $z=0$ and denotes present-day values, but all introduced quantities can also be determined at any other time. The overall density of the Universe can be parametrized using this critical density:
\begin{equation} \label{Omega}
\Omega_0 \equiv \frac{\bar \rho_0}{\rho_{crit,0}}
\end{equation}
where $\bar\rho_0$ is the mean density of the Universe and $\Omega_0$ is known as the density parameter \citep{mo2010galaxy}. Different values of $\Omega_0$ correspond to different geometries: for values of $\Omega_0 < 1$ the Universe is hyperbolic, for $\Omega_0 = 1$ flat and for $\Omega_0 > 1$ spherical \citep{mo2010galaxy}.
As the contents of the Universe can be divided into the three categories of dark energy and relativistic and non-relativistic matter, the total density parameter is also a sum of three density parameters:
\begin{equation}
\Omega_0 = \Omega_{m,0} + \Omega_{r, 0} + \Omega_{\Lambda, 0}
\end{equation}
where $m$, $r$ and $\Lambda$ stand for non-relativistic matter (``matter''), relativistic matter (``radiation'') and dark energy \citep{mo2010galaxy}. Each of these can be calculated similarly to equation \ref{Omega} but replacing the overall density with the density of the corresponding component \citep{mo2010galaxy}. As the value of $\Omega_0$ in the Universe is very close to unity, the different density parameters conveniently correspond to the relative densities of the components.
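As a concrete illustration of these definitions, the following minimal Python sketch (with rounded physical constants, written only for this example) evaluates the present-day critical density for the Hubble constant value of $67.77$~km/s/Mpc quoted in section \ref{universe-evolution} and shows how a density parameter follows from it:
\begin{verbatim}
# A minimal sketch: the critical density for H0 = 67.77 km/s/Mpc.
import numpy as np

G = 6.674e-11            # gravitational constant [m^3 kg^-1 s^-2]
Mpc = 3.0857e22          # metres per megaparsec
Msun = 1.989e30          # kilograms per solar mass

H0 = 67.77 * 1.0e3 / Mpc                       # Hubble constant in s^-1

rho_crit = 3.0 * H0**2 / (8.0 * np.pi * G)     # [kg m^-3]
rho_crit_astro = rho_crit * Mpc**3 / Msun      # [Msun Mpc^-3]
print(f"rho_crit = {rho_crit:.2e} kg/m^3 = {rho_crit_astro:.2e} Msun/Mpc^3")
# -> about 8.6e-27 kg/m^3, i.e. about 1.3e11 Msun/Mpc^3

# A density parameter is the component density divided by rho_crit,
# e.g. Omega_m = rho_m / rho_crit for the mean matter density rho_m.
\end{verbatim}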
\subsection{Structure Formation in the Linear Regime} \label{universe-structure}
All structures in the Universe arise from the small perturbations in the matter density in the early universe and the physics that govern their evolution. These primordial density fluctuations, which later develop into structures such as galaxy clusters and the voids separating them, can still be seen in the cosmic microwave background (CMB) \citep{planck2016resultsI}. Starting at the end of inflation, the density contrast of such perturbations in the dark matter begins to grow. While dark matter is collisionless, after the inflationary epoch the baryons are still coupled to radiation via Thomson scattering and experience pressure, which slows their structural evolution down. Consequently, their evolution is delayed relative to the dark matter, and baryons sink into the potential wells already formed by the dark matter.
At $z \approx 1100$, the time of recombination and origin of the CMB photons, these fluctuations are still very small, having $\Delta\rho/\rho \approx 10^{-5}$, but sufficiently overdense regions have already collapsed \citep{mo2010galaxy}. The gas inside these structures, seen as hot spots in the CMB, has just reached a density maximum and is shock-heated, while gas inside smaller fluctuations has already begun expanding again, and larger scales have not yet reached their maximum density at the time of the CMB.
At first, the evolution of overdense regions differs from the evolution of the surroundings only by the speed of the expansion of the Universe in that region: expansion is slower in denser regions. Later some of these volumes will start to collapse. This requires the volume to be sufficiently dense to allow gravity to overcome the expansion of the universe as described by the Friedmann equations.
To understand the evolution of density perturbations at early times when the perturbations are still small and evolve linearly, let us consider an ideal fluid of density $\rho$, moving at proper velocity $\mathbf{v}$ and experiencing the gravitational field with potential $\phi$. Growth of a perturbation in this medium is governed by three equations: the equation of continuity describing the conservation of mass, the Euler equation governing the motions in the fluid and the Poisson equation describing the gravitational field, or
\begin{equation}\label{continuity}
\frac{\textrm{D}\rho}{\textrm{D}t} + \rho \nabla_\mathbf{x}\cdot{\mathbf{v}} = 0 \,,
\end{equation}
\begin{equation}\label{euler}
\frac{\textrm{D}\mathbf{v}}{\textrm{D}t} = - \frac{\nabla_\mathbf{x} P}{\rho} - \nabla_\mathbf{x}\phi
\end{equation}
and
\begin{equation}\label{poisson}
\nabla_\mathbf{x}^2\phi = 4\pi G\rho
\end{equation}
respectively \citep{mo2010galaxy}. Here $\mathbf{x}$ denotes proper coordinates and $\frac{\textrm{D}}{\textrm{D}t}$ is the convective time derivative, defined as
\begin{equation}
\frac{\textrm{D}}{\textrm{D}t} = \pdv{}{t} + \mathbf{v}\cdot\nabla_\mathbf{x}
\end{equation}
and describing the time derivative when moving with the fluid \citep{mo2010galaxy}.
Next let us follow \citet{longair2008galaxy} and introduce a small perturbation by replacing $\mathbf{v}$, $\rho$, $P$ and $\phi$ with $\mathbf{v_0} + \delta\mathbf{v}$, $\rho_0 + \delta\rho$, $P_0 + \delta P$ and $\phi_0 + \delta \phi$ respectively, with subscript zero representing the properties of the unperturbed medium. Now equations \ref{continuity}--\ref{poisson} can be written as
\begin{equation}
\frac{\textrm{D}\rho_0}{\textrm{D}t} + \frac{\textrm{D}\delta\rho}{\textrm{D}t} + \rho_0 \nabla_\mathbf{x}\cdot{\mathbf{v}_0} + \rho_0 \nabla_\mathbf{x} \cdot\delta{\mathbf{v}} + \delta \rho \nabla_\mathbf{x} \cdot{\mathbf{v}}_0 + \delta\rho \nabla_\mathbf{x} \cdot\delta{\mathbf{v}} = 0 \,,
\end{equation}
\begin{equation}
\frac{\textrm{D} \mathbf{v}_0}{\textrm{D}t} + \frac{\textrm{D} \delta\mathbf{v}}{\textrm{D}t}= - \frac{\nabla_\mathbf{x} (P_0 + \delta P)}{\rho_0 + \delta \rho} - \nabla_\mathbf{x}\phi_0 - \nabla_\mathbf{x}\delta\phi
\end{equation}
and
\begin{equation}
\nabla_\mathbf{x}^2\phi_0 + \nabla_\mathbf{x}^2\delta\phi = 4\pi G\rho_0 + 4\pi G\delta\rho.
\end{equation}
These equations can be greatly simplified by subtracting the unperturbed versions from each and assuming that the initial state is homogeneous and isotropic i.e. $\nabla P_0=0$ and $\nabla \rho_0 = 0$. Using the knowledge that in the linear regime the perturbations are small and thus discarding second order terms, the equations take the following forms:
\begin{equation}
\frac{\textrm{D}}{\textrm{D}t} \frac{\delta \rho}{\rho_0} = -\nabla_\mathbf{x} \cdot \delta \mathbf{v}
\end{equation}
\begin{equation}
\pdv{\delta \mathbf{v}}{t} + (\delta \mathbf{v} \cdot \nabla_\mathbf{x}) \mathbf{v}_0 = - \frac{\nabla_\mathbf{x} \delta P}{\rho_0} - \nabla_\mathbf{x} \delta \phi
\end{equation}
\begin{equation}
\nabla_\mathbf{x}^2\delta \phi = 4 \pi G \delta \rho
\end{equation}
As the Universe is expanding, it is natural to transition to comoving coordinates, defined in equation \ref{comoving} and denoted by $\mathbf{r}$. This also affects the velocities, and using the dot to denote a time derivative we can write
\begin{equation}
\mathbf{v} = \dot{\mathbf{x}} = \dot{a}(t)\mathbf{r} + a(t)\dot{\mathbf{r}} = \dot{a}(t)\mathbf{r} + \mathbf{u},
\end{equation}
where $\mathbf{u}$ denotes the perturbed comoving velocity. From this it follows that $\delta \mathbf{v} = a \mathbf{u}$. In comoving coordinates the operator $\nabla_\mathbf{x}$ is also replaced with $\frac{1}{a}\nabla_\mathbf{r}$. Using these and $(a\mathbf{u}\cdot\nabla_\mathbf{x})\dot{a}\mathbf{r}=\mathbf{u}\dot{a}$, the equations can be written as
\begin{equation}\label{continuity-modified}
\frac{\textrm{D}}{\textrm{D}t} \frac{\delta \rho}{\rho_0} = -\nabla_\mathbf{r} \cdot \mathbf{u}
\end{equation}
\begin{equation}\label{euler-modified}
\pdv{\mathbf{u}}{t} + 2 \frac{\dot a}{a}\mathbf{u} = - \frac{\nabla_\mathbf{r} \delta P}{\rho_0 a^2} - \frac{1}{a^2}\nabla_\mathbf{r} \delta \phi.
\end{equation}
\begin{equation}\label{poisson-modified}
\nabla_\mathbf{r}^2\delta \phi = 4 \pi G a^2 \delta\rho.
\end{equation}
Taking the comoving divergence of equation \ref{euler-modified} gives
\begin{equation}
\nabla_\mathbf{r} \cdot \dot{\mathbf{u}} + 2 \frac{\dot a}{a}\nabla_\mathbf{r} \cdot \mathbf{u} = - \frac{\nabla_\mathbf{r}^2 \delta P}{\rho_0 a^2} - \frac{1}{a^2}\nabla_\mathbf{r}^2 \delta \phi.
\end{equation}
Here the last term on the right-hand side contains the left-hand side of equation \ref{poisson-modified}, the first term on the left-hand side can be recognized in equation \ref{continuity-modified} after taking the convective time derivative, and the second term on the left-hand side already contains the right-hand side of equation \ref{continuity-modified}. Inserting these yields
\begin{equation}\label{euler-insertions1}
\frac{\textrm{D}^2}{\textrm{D} t^2} \frac{\delta \rho}{\rho_0} + 2 \frac{\dot a}{a} \frac{\textrm{D}}{\textrm{D} t} \frac{\delta \rho}{\rho_0} = \frac{\nabla_\mathbf{r}^2 \delta P}{\rho_0 a^2} + 4 \pi G \delta \rho.
\end{equation}
The overdensities in the Universe are often denoted using the density contrast $\Delta = \delta\rho/\bar{\rho}$, where the bar denotes the universal mean \citep{mo2010galaxy}. In this case the unperturbed density $\rho_0$ is the mean density, so the left side of equation \ref{euler-insertions1} can be expressed using $\Delta$. If the density perturbations are assumed to be adiabatic, the adiabatic sound speed $c_s$ relates perturbations in density and pressure as $\delta P / \delta \rho = c_s^2$. Inserting these yields
\begin{equation}\label{euler-insertions2}
\frac{\textrm{D}^2\Delta}{\textrm{D} t^2} + 2 \frac{\dot a}{a} \frac{\textrm{D}\Delta}{\textrm{D} t} = \frac{c_s^2 \nabla_\mathbf{r}^2 \Delta}{a^2} + 4 \pi G \rho_0 \Delta.
\end{equation}
Now a trial solution of $\Delta \propto e^{i(\mathbf{k}_r\cdot\mathbf{r}-\omega t)}$ corresponding to a wave with comoving wavevector $\mathbf{k}_r$ yields a wave equation
\begin{equation}\label{wave-equation}
\frac{\textrm{D}^2\Delta}{\textrm{D} t^2} + 2 \frac{\dot a}{a} \frac{\textrm{D}\Delta}{\textrm{D} t} = \Delta(4\pi G \rho_0 - k^2c_s^2).
\end{equation}
Here $\mathbf{k}_r$ has been replaced by $a\mathbf{k}$ to transform it to proper coordinates. The wave equation is a linear second-order partial differential equation that describes the evolution of perturbations.
Whether the waves described by equation \ref{wave-equation} are oscillatory or unstable depends on the sign of the right-hand side of the equation. If $c_s^2k^2 > 4\pi G\rho_0$ the perturbations are oscillating sound waves supported by the internal pressure of the denser regions, but if $c_s^2k^2 < 4\pi G\rho_0$ the wave is unstable and the modes grow \citep{longair2008galaxy}. In a static universe these density perturbations grow exponentially, but in an expanding universe the growth is slower: e.g. in a simple Einstein--de Sitter universe with $\Omega_0 = 1$ and $\Omega_\Lambda = 0$ the growth is only algebraic \citep{longair2008galaxy}.
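This stability criterion is simple enough to evaluate directly. The following minimal Python sketch (with illustrative SI values; the numbers are not taken from this thesis) encodes the boundary between growing and oscillating modes:
\begin{verbatim}
# A minimal sketch of the stability criterion of the wave equation:
# modes with c_s^2 k^2 < 4 pi G rho_0 grow, shorter ones oscillate.
import numpy as np

G = 6.674e-11  # gravitational constant [m^3 kg^-1 s^-2]

def jeans_wavenumber(c_s, rho_0):
    """Proper wavenumber separating growing and oscillating modes."""
    return np.sqrt(4.0 * np.pi * G * rho_0) / c_s

def mode_grows(k, c_s, rho_0):
    """True if a mode with proper wavenumber k is gravitationally unstable."""
    return c_s**2 * k**2 < 4.0 * np.pi * G * rho_0

# Illustrative example: c_s = 10 km/s at roughly the mean matter density today.
k_J = jeans_wavenumber(c_s=1.0e4, rho_0=2.7e-27)
print(f"Jeans length ~ {2 * np.pi / k_J / 3.086e22:.2f} Mpc")  # ~1.4 Mpc
\end{verbatim}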
\subsection{Non-Linear Collapse} \label{universe-collapse}
Equation \ref{wave-equation} represents the perturbations well if all matter behaves as a non-relativistic fluid so that Newtonian physics suffices to describe the perturbations, the perturbations are spatially small compared to the observable universe and the density contrast is smaller than unity \citep{mo2010galaxy}. In the present-day Universe it is clear that many structures with $\Delta \gg 1$ exist, so the linear model alone is not sufficient to explain the evolution of structures. In the general case the evolution of these non-linear perturbations cannot be predicted analytically, and simulations are often used instead to gain insight into their dynamics \citep{mo2010galaxy}.
One special case in which an analytical solution can be presented is the spherical top-hat collapse, in which there is a uniform spherical density perturbation with no angular momentum inside a uniform density field \citep{longair2008galaxy}. The evolution of such a perturbation resembles the evolution of an individual universe with $K = 1$: initially the perturbation expands with the surrounding Universe, but its expansion slows down relative to its surroundings and eventually a sufficiently dense perturbation reaches a point after which it starts to contract \citep{longair2008galaxy}. As the size of the perturbation as a function of time is a cycloid, it is easiest to express it in the parametric form
\begin{align}
a_p &= \frac{\Omega_0}{2(\Omega_0-1)}(1-\cos\theta) \\
t &= \frac{\Omega_0}{2H_0(\Omega_0-1)^{3/2}}(\theta - \sin\theta)
\end{align}
where $a_p$ is the relative size of the perturbation and $t$ is time \citep{longair2008galaxy}. The parameter $\theta$ evolves from 0 to $2\pi$. From these equations one can see that $a_p$ reaches its maximum at $\theta = \pi$. This maximum size is known as the turnaround radius of the perturbation and the corresponding time as the turnaround time \citep{mo2010galaxy}.
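The parametric solution is easy to evaluate numerically; the following minimal Python sketch (with an illustrative $\Omega_0$ and time expressed in units of $1/H_0$, not values used elsewhere in this thesis) traces the expansion and locates the turnaround:
\begin{verbatim}
# A minimal sketch evaluating the parametric top-hat solution above.
import numpy as np

def tophat(theta, omega0, H0=1.0):
    """Relative size a_p and time t of a spherical top-hat perturbation."""
    a_p = omega0 / (2.0 * (omega0 - 1.0)) * (1.0 - np.cos(theta))
    t = omega0 / (2.0 * H0 * (omega0 - 1.0) ** 1.5) * (theta - np.sin(theta))
    return a_p, t

theta = np.linspace(0.0, 2.0 * np.pi, 1001)
a_p, t = tophat(theta, omega0=1.1)   # illustrative overdense region

# Turnaround: the maximum size, reached at theta = pi and equal to
# Omega_0 / (Omega_0 - 1) in these units.
i_max = np.argmax(a_p)
print(theta[i_max] / np.pi, a_p[i_max], t[i_max])
\end{verbatim}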
At $\theta=0$ and $\theta = 2\pi$ the model predicts a radius of zero and therefore an infinite density for the perturbation. Thus it is clear that, in addition to the often unrealistic assumption of a perfectly uniform and spherical density perturbation, the model has other limitations as well. In reality, smaller perturbations present within the larger one cause it to fragment, and the perturbation feels tidal forces from the surrounding perturbations, which apply torques \citep{longair2008galaxy}. Different structures can also collide and merge.
The evolution of these fragments depends on their composition. Collisionless matter such as cold dark matter will simply experience relaxation and end up as virialized structures, but structures containing baryons have more variation \citep{mo2010galaxy}. In gas, the non-linear evolution of the perturbation is more complicated as the gas feels pressure and shocks can occur when the gas compresses \citep{mo2010galaxy}. Unless the gas is able to radiate away energy by effective cooling, the relaxation ends when the structure is in hydrostatic equilibrium \citep{mo2010galaxy}.
In this work, only dark matter is studied. Collapsed objects made of dark matter are called dark matter haloes \citep{mo2010galaxy}. The exact definitions of when an object is dense enough to be considered a halo and of where the edge of a halo lies vary somewhat depending on the source, but for the halo catalogues analysed in this thesis a halo is defined to extend to the radius at which the mean density of a spherical volume drops below 200 times the critical density of the Universe. The mass and radius of such a halo are denoted $M_{200}$ and $r_{200}$.
As the cosmological model and its parameters affect the structure formation, analysing the observations of structures at different stages of their evolution provides a way to compare cosmological models \citep{mo2010galaxy}. Currently the standard model is the so-called $\Lambda$CDM model, according to which the Universe consists mostly of dark energy and cold dark matter in ratios given in section \ref{universe-composition}, with baryonic matter making up only a small fraction of total mass \citep{mo2010galaxy}. Observations at the scale of the Local Group and its surroundings are not well suited for constraining the nature of dark energy, but the existence of dark matter can already be seen and its properties studied at scales of an individual galaxy \citep{mo2010galaxy}.
For example, the mass of a possible warm dark matter particle can be greatly restricted \citep{kennedy2014constraining}, and hot dark matter made of standard-model neutrinos can be excluded as the sole dark matter component based on the fact that non-linear structure has been able to form in the distribution of galaxies by the current age of the Universe \citep{white1984is}. This is due to the properties of hot dark matter particles, i.e. low-mass dark matter particles that would have decoupled from the radiation while still relativistic; their thermal motions would allow them to escape from gravity wells and smooth out structures smaller than some tens of Mpc \citep{mo2010galaxy}. The same connection between the particle mass and the speed of the thermal motions, and thus the size of the smoothed-out structures, also sets the lower limit for a warm dark matter particle mass.
\section{The Local Group and Its Mass} \label{sect:lg_and_mass}
Galaxy groups are systems of galaxies defined by the number of galaxies within a volume. Exact definitions vary, but typically they are required to have at least three galaxies and a volume with a numerical overdensity of the order of 20 \citep{mo2010galaxy}. The upper limit of the size of a galaxy group is set by the least massive galaxy clusters: again, different definitions exist, but typically if a group has over 50 members with apparent magnitudes $m < m_3 +2$, where $m_3$ denotes the magnitude of the third brightest member, it is classified as a galaxy cluster instead \citep{mo2010galaxy}.
The Local Group, the galaxy group containing the Milky Way, is the best known of these galaxy groups, as it offers a great opportunity for precise observations: much fainter dwarf galaxies can be observed in it than in any other galaxy group, and its objects subtend large areas on the sky, allowing smaller details to be observed than in more distant galaxy groups \citep{mo2010galaxy, mcconnachie2012observed}. This makes it an appealing target for testing astrophysical and cosmological models \citep{bullock2017small}. The Local Group has two main members: the Milky Way and the Andromeda Galaxy (M31). In total more than a hundred galaxies are known to exist within 3~Mpc of the Sun, more than half of these being satellites of either of the primaries \citep{mcconnachie2012observed}. As the distance range is quite wide, some of the more distant galaxies are likely not bound to the Local Group, but on the other hand there are likely numerous faint dwarf galaxies that remain unseen \citep{mcconnachie2012observed}. The number of these unknown galaxies within the Local Group could be as high as several hundred \citep{tollerud2008hundreds}.
The number of galaxies is not the only uncertainty regarding the Local Group: the total mass of the system and the individual masses of the Milky Way and M31 galaxies are also still highly uncertain, with different estimates easily differing by a factor of 2--3 \citep{wang2015estimating, carlesi2016constraining}. This is problematic because, in addition to constraining dark matter models as mentioned in section \ref{universe-collapse}, the Local Group has a key role in testing the currently dominant $\Lambda$CDM model. The model is able to make accurate predictions for the large-scale structure of the Universe, but at the distance scales of a single galaxy group the quality of the predictions suffers \citep{bullock2017small}.
Two of the possible discrepancies between observations and $\Lambda$CDM predictions are the missing satellites problem and the too-big-to-fail problem. The first arises from the fact that both analytical calculations and cosmological simulations produce more low-mass satellites in systems similar to the Milky Way or M31 than can be observed \citep{klypin1999missing}. To some extent the number of luminous galaxies and dark matter haloes can be expected to differ as stars do not form in all haloes due to reionization and feedback processes such as supernova feedback \citep{efstathiou1992suppressing, larson1974effects}. The problem is also somewhat alleviated by the high estimated number of satellite galaxies that are luminous but faint and yet unseen. It is still uncertain whether the missing satellites problem actually exists as many modern simulations such as those by \citet{sawala2016apostle} agree well with the observed number of faint satellite galaxies.
Regardless of the existence of the missing satellites problem, the most massive dark matter haloes in dark matter only simulations also seem to be more numerous than observed galaxies in the Local Group are. This is known as the too-big-to-fail problem. For example the Milky Way has only three satellite galaxies with central densities high enough to allow maximum circular velocities of more than 30~km/s, but dark matter only simulations tend to produce double or even triple this number of similar mass dark matter haloes \citep{sawala2016apostle}. In this case the galaxies should be massive enough to form stars regardless of reionization, but central masses of the satellites can still be reduced as a result of satellites interacting with the primary and e.g. tidal stripping and ram pressure stripping lowering the central densities \citep{bullock2017small}. As with the missing satellites problem, the difference between the simulations and observations decreases as baryons and baryonic effects are included in the simulations \citep{sawala2016apostle}.
Both of these possible problems are sensitive to the mass of the Local Group: a massive halo is expected to have more subhaloes than a less massive one has. Thus the mass of the Local Group and its members is an intriguing question not only for building up knowledge about our surroundings but also for testing cosmological models. Numerous studies have been conducted to find out its mass, and the following sections outline some of the methods that have been used.
\subsection{Timing Argument}\label{sect:timing-argument}
One possible way of estimating the lower end of the range of possible Local Group masses is to use the timing argument, first introduced by \citet{kahn1959intergalactic}. It is based on the mutual kinematics of the Milky Way and M31 galaxies and the assumption that they have formed close to each other and are now orbiting their mutual mass centre, approaching each other for the first time. This corresponds to the structure formation model presented in section \ref{universe-structure}: initially the expansion of the Universe drives the pair further away from each other, but eventually the mass of the system is able to overcome the expansion and the galaxies start to approach each other.
For a zero angular momentum system, \citeauthor{kahn1959intergalactic} obtain an estimate of the minimum total mass of the system using Kepler's third law
\begin{equation}
P^2 = \frac{4\pi^2}{GM}a^3
\end{equation}
and the fact that energy is conserved:
\begin{equation}
\frac{GM}{2a} = \frac{GM}{D} - E_k.
\end{equation}
Many of the quantities that appear in the equations are either known or can be approximated: the current distance $D$ between the Milky Way and the centre of mass of the system can be estimated using the distance to the M31 galaxy, an upper limit for the period $P$ can be obtained using the current age of the Universe, and the kinetic energy $E_k$ depends only on the velocity of the galaxy, easily obtained from radial velocity measurements as long as the tangential velocity is assumed to be negligible. The remaining unknowns are the semimajor axis, $a$, and the effective mass at the centre of gravity, $M$. The lower limit for the mass of the system derived by \citeauthor{kahn1959intergalactic} was $1.8 \times 10^{12}~\mathrm{M_{\astrosun}}$. This is clearly less than current estimates
that tend to favour masses around $5 \times 10^{12}~\mathrm{M_{\astrosun}}$ \citep{li2008masses, fattahi2016apostle}, but still significantly larger than the observed baryonic mass content of the two galaxies \citep{kahn1959intergalactic}.
The estimate can be improved by using Kepler's laws in parametric form:
\begin{equation}\label{kepler-r}
r = a(1-\cos(E))
\end{equation}
and
\begin{equation}\label{kepler-t}
t = \left(\frac{a^3}{\mu}\right)^{1/2}(E-\sin(E))
\end{equation}
where $E$ is the eccentric anomaly and $\mu = GM$ is the gravitational parameter of the pair \citep{li2008masses}. These two equations can be combined to determine the value of $\dv{r}{t}$ as
\begin{equation}
\dv{r}{t} = \dv{r}{E} \dv{E}{t} = \dv{r}{E} \left(\dv{t}{E}\right)^{-1}.
\end{equation}
Calculating the derivatives using \ref{kepler-r} and \ref{kepler-t} and inserting them yields the following equation:
\begin{equation}\label{orbit-speed}
\dv{r}{t} = \left(\frac{\mu}{a}\right)^{1/2}\frac{\sin(E)}{1 - \cos(E)}.
\end{equation}
Solving for $a$ and $\sqrt{\mu}$ from equations \ref{kepler-r} and \ref{kepler-t} yields
\begin{equation}\label{timing-a}
a = \frac{r}{1-\cos(E)}
\end{equation}
and
\begin{equation}\label{timing-mu}
\sqrt{\mu} = \frac{a^\frac{3}{2}}{t}(E - \sin(E))
\end{equation}
respectively and inserting them into equation \ref{orbit-speed} gives
\begin{equation}
\frac{vt}{r} = \frac{\sin (E) \left(E - \sin(E)\right)}{(1- \cos(E))^2},
\end{equation}
where the velocity $v$ and distance $r$ can be deduced from observations, and if the Milky Way and M31 are on their first orbit then $t$ is the age of the Universe. Now $E$ can be solved numerically. Inserting equation \ref{timing-a} and the solved value of $E$ into equation \ref{timing-mu} then gives a value for $\mu$ and thus the combined mass of the pair.
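As an illustration, the whole procedure fits in a few lines of Python. The sketch below is not the exact script used in this thesis; the separation, radial velocity and age of the Universe are illustrative present-day values of the kind quoted in this section:
\begin{verbatim}
# A minimal sketch of the timing argument (illustrative input values).
import numpy as np
from scipy.optimize import brentq

G = 4.301e-6                   # gravitational constant [kpc (km/s)^2 / Msun]
s_per_Gyr = 3.156e16
km_per_kpc = 3.0857e16

r = 770.0                      # assumed MW--M31 separation [kpc]
v = -109.3                     # radial velocity [km/s], negative = approaching
t = 13.8 * s_per_Gyr / km_per_kpc   # age of the Universe in kpc/(km/s)

# Solve sin(E)(E - sin E)/(1 - cos E)^2 = v t / r for the eccentric anomaly.
# For an approaching pair on its first orbit, E lies between pi and 2 pi.
f = lambda E: np.sin(E) * (E - np.sin(E)) / (1.0 - np.cos(E))**2 - v * t / r
E = brentq(f, np.pi + 1e-6, 2.0 * np.pi - 1e-6)

a = r / (1.0 - np.cos(E))                    # semimajor axis [kpc]
mu = (a**1.5 / t * (E - np.sin(E)))**2       # gravitational parameter
print(f"timing-argument mass ~ {mu / G:.2e} Msun")  # roughly 4e12 Msun here
\end{verbatim}
With these illustrative inputs the result is of the same magnitude as the modern timing-argument values quoted in this section.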
Getting an estimate is simple, but the results are burdened with the assumptions that are made about the system. Modern values for the velocity and distance yield a virial mass of $(4.23 \pm 0.45) \times 10^{12}~\mathrm{M_{\astrosun}}$ for the Local Group \citep{vandermarel2012m31}. This is a considerably higher mass than calculations of the Milky Way and M31 masses using kinematic tracers of the gravitational field suggest \citep{wang2015estimating}, but at least some of the effect might be explained by the fact that the timing argument is sensitive to a larger volume of mass compared to satellite galaxies and other kinematic tracers \citep{kroeker1991accuracy}.
The assumption of the galaxies being on their first approach is likely to be valid, as a higher number of completed orbits would result in a mass so high that it does not seem physically realistic. Applying Kepler's laws also requires the masses to be approximated as point masses. If the dark matter haloes of the galaxies are assumed to be spherically symmetric, this is not a problem as long as the haloes are sufficiently far from each other, but at times when the centre of one halo is encompassed by the other halo, the accuracy of the model is compromised. Indeed, the mass estimates acquired using the timing argument seem to be most consistent with the mass enclosed in two spheres surrounding the two galaxies with radii of half the distance between the galaxies \citep{kroeker1991accuracy}.
The assumption of a radial orbit is possibly more problematic, as the estimated tangential velocities vary considerably from study to study. For example, \citet{vandermarel2012m31} give a tangential velocity of 17~km/s for M31 with $v_{t} < 34$~km/s at 1~$\sigma$ confidence, which is small compared to their radial velocity of $-109.3 \pm 4.4$~km/s, whereas \citet{vandermarel2018first} estimate the tangential velocity to be as high as 57~km/s, with a 68~\% confidence region containing velocities as high as 92~km/s. It is also possible to carry out the calculations above for elliptical orbits, as is done by \citet{einasto1982mass}. A non-zero tangential velocity increases the resulting mass: for example, \citet{vandermarel2012m31} give a mass of $(4.27 \pm 0.53) \times 10^{12}~\mathrm{M_{\astrosun}}$ when the tangential velocity is included. Even if external forces felt by the galaxies do not result in significant tangential velocities, they can affect the system in other ways. Large-scale structure surrounding the Local Group can apply forces and torques, and smaller members of the Local Group interact with the primary members, as was noted by \citet{kahn1959intergalactic}.
\subsection{Other Mass Estimation Methods} \label{sect:other_mass_estimation}
Other structures can also be used to infer the masses of the Milky Way and M31 galaxies. For example, \citet{zaritsky1989velocities} use Leo I, a Milky Way satellite with a high galactocentric velocity, to estimate the mass of the Milky Way by a calculation similar to the timing argument. This is done by assuming that the satellite is bound to the Milky Way, which is the only body gravitationally influencing it, and that the satellite is on a radial orbit, having passed its periapsis once and now moving away from the Milky Way and towards the apoapsis. This yielded a lower limit of $1.3 \times 10^{12}~\mathrm{M_{\astrosun}}$ for the mass of the Milky Way \citep{zaritsky1989velocities}.
Another option is to use the Large and Small Magellanic Clouds, a pair of well-studied Milky Way satellites. For example, \citet{busha2011mass} use circular velocities within the Magellanic Clouds and their distances and velocities relative to the Milky Way to estimate its mass. This is done by constructing the probability density functions (PDFs, see section \ref{sect:distribution-functions}) of these three measurements from a simulation subhalo catalogue and then using Bayes' theorem to find the Milky Way mass range that is most probable given the observations.
Naturally, this type of analysis can suffer from effects not included in the model. If, for example, the system happens to be peculiar in some sense, its characteristics might cause effects that are not reflected in the PDFs of the selected variables. This is studied by e.g. \citet{gonzalez2013satellites}, who study the effects of the local galaxy density, nearby clusters, and the fact that the Magellanic Clouds are a fairly close pair and thus rare. They find that subhalo pairs similar to the Magellanic Clouds are rare in haloes resembling the Milky Way, and that including the subhalo pair criterion in the analysis used by \citet{busha2011mass} brings the estimated mass down considerably, reducing the most probable mass by a factor of two \citep{gonzalez2013satellites}.
Including more constraints for the sample from which the PDF is determined naturally brings it closer to one that would represent the Milky Way, but the range of allowed values has to be large enough to cover the uncertainty of measurements from the real system. Tightening the criteria also requires a larger simulation in order to find a statistically representative sample of subhaloes. The Milky Way system can of course also be located in one of the tails of the probability distribution, a caveat that applies to all mass estimation methods based on statistics rather than physics.
A larger number of objects can also be used in the analysis. For example, the properties of satellite galaxies can be utilized in mass estimation, as is done by e.g. \citet{sales2007satellites}, who use the velocity dispersion of the satellites as an indicator of the virial velocity of the host halo, and \citet{barber2013orbital}, who use the ellipticity distribution of Milky Way satellites to constrain the range of likely masses. Other useful objects include individual high-velocity stars, from which the galactic escape speed can be estimated \citep{piffl2014rave}, and the orbits of stellar streams \citep{newberg2010orbit}, among others.
The mass can also be determined based on a larger volume than is occupied by the two main galaxies of the Local Group and their satellites: for example, \citet{penarrubia2014dynamical} and \citet{fattahi2016apostle} estimate the mass of the Local Group by studying the local Hubble flow. This is possible because the gravitational interaction of the surrounding galaxies with the Milky Way and M31 causes the surrounding galaxies to recede more slowly than those around empty regions of the Universe, and the strength of this effect can be translated into a mass estimate using e.g. Bayesian analysis. The velocity dispersion around the Hubble flow can also be used, as is done by for example \citet{gonzalez2014mass}. The properties of the Hubble flow are also used in this thesis as one estimator of the Local Group mass.
\begin{figure}
\centering
\includegraphics[width=0.7\textwidth]{kuvat/wang2015-masses.png}
\caption{$M_{200}$ masses of the Milky Way as obtained by different authors using various mass estimation methods, collected and plotted by \citet{wang2015estimating}. Error bars show the $1\sigma$ range where assuming a Gaussian distribution was appropriate. Masses derived using similar methods are plotted in the same colour.}
\label{fig:wang-masses}
\end{figure}
Different papers present varying values for the masses of the Local Group and its galaxies, both due to the strengths and limitations of the methods used and due to differences in the observed and simulated data used. A collection of mass estimates for the Milky Way, compiled by \citet{wang2015estimating}, is shown in figure \ref{fig:wang-masses}. The masses are calculated by different authors using various mass estimation methods, some of which have been briefly introduced in this section. The plot clearly illustrates the fact that the masses of the Local Group galaxies currently have uncertainties of over a factor of two.
The timing argument especially stands out from the other methods. This is in agreement with a study by \citet{gonzalez2014mass}, where timing argument masses were found to overestimate the true mass of a Local Group-like system with a small tangential velocity and a low local overdensity by a mean factor of 1.6. This is due to the timing argument not being sensitive to the tangential velocity or environment of the system.
The other mass estimates also show considerable scatter, sometimes even within the same mass estimation method. At least some of this scatter can be a result of some estimates being more than ten years old and thus based on possibly outdated methods and data, but the scatter also reflects the fact that the exact mass of the Local Group is currently a question without a definitive answer. Thus exploring new ways of estimating the mass and refining the old results is important.
\chapter{Simulations and Simulation Codes} \label{chapt:simulations}
Numerical simulations are a valuable tool in astrophysical research as they bypass many major restrictions characteristic of observational astronomy. For example, the dark matter content of the Universe cannot be directly observed, and following the evolution of a single object is impossible, with the exception of the most rapid events such as supernovae. Much can also be done using analytical models, but they are often valid only for a simplified problem or suffer from other limitations. For example, the Zel'dovich approximation can be used to calculate the non-linear evolution of density perturbations in the Universe up to shell crossing, but after the structures have collapsed to sheets along one of their axes, the model ceases to be valid \citep{mo2010galaxy}.
The data used in this master's thesis also originates from a cosmological N\=/body simulation. This chapter first briefly introduces some fundamental operational principles of N\=/body simulations and then discusses the specifics of the simulations from which the data used here originates. Lastly, the extraction and properties of the resulting data set are discussed.
\section{N\=/Body Simulations}
N\=/body simulations are a type of computer simulation that follows a number of particles interacting with each other \citep{binney2008galactic}. They are often used in computational astrophysics as well as in other fields of physics. For some applications, including for example simulating the motions of planets in a planetary system, it is possible to have each body represented by a single simulation particle. Such simulations are run using collisional N\=/body codes \citep{binney2008galactic}. In cosmological simulations, the number of simulated objects such as stars, let alone dark matter particles, is too great for them to be simulated as separate particles. In this case a collisionless N\=/body simulation is used, meaning that the simulated volume is approximated to contain a smooth density field, the evolution of which is studied by following a number of particles representing the system \citep{binney2008galactic}. These particles do not correspond to any real structures; instead, they can be thought of as sampling the probability distribution (see section \ref{sect:distribution-functions} for more on probability distributions) of positions and velocities \citep{binney2008galactic}. In this thesis, only collisionless simulations are discussed.
The concept of both collisional and collisionless N\=/body simulations is simple: the current positions and velocities of the particles are known and a physical model is used to calculate how they should advance over a small period of time known as the time step \citep{binney2008galactic}. Simple dark matter only simulations might only handle gravitational interactions of the particles, but many simulation codes such as \textsc{gadget-3} \citep{springel2005cosmological} (see section \ref{sect:gadget} for more about \textsc{gadget} family simulation codes) and Enzo \citep{norman2007simulating} are also able to simulate hydrodynamics of baryonic matter and can include star formation, feedback and other baryonic processes.
As the particles of a collisionless system do not represent any real structures, e.g. stars, star clusters or galaxies, the situation in which two particles come near each other and thus feel strong forces resulting in large accelerations is problematic \citep{binney2008galactic}. This is because the underlying density distribution is smooth and thus such two-body scattering is unphysical. The problem can be alleviated by softening, i.e. modifying the gravitational force calculations so that for small interparticle distances the force diminishes \citep{binney2008galactic}. At large separations the force remains unchanged, and the distance within which the force differs from Newtonian gravity is called the softening length \citep{binney2008galactic}.
There are multiple ways to handle both the force calculations and updating the positions and velocities for the particles \citep{binney2008galactic}. For the force calculations, the most popular algorithms are based on either hierarchical trees or particle meshes, but for updating the positions, a wide variety of integrators have been developed for a range of different needs \citep{binney2008galactic}. The simulations discussed in this thesis are run using a modified version of \textsc{gadget-3} and thus use the TreePM method for force calculations accompanied by a leapfrog integrator. TreePM is a mix of a hierarchical tree for short-range forces and a particle mesh for long-range forces. The description of the particle mesh is omitted, but the \citet{barnes1986hierarchical} hierarchical tree algorithm is introduced in the following subsection, followed by a short description of the leapfrog integrator.
\subsection{Hierarchical Tree Algorithm} \label{sect:tree}
In many applications of computational astrophysics the desired number of particles in a simulation is too great to allow calculating interparticle forces by direct summation as its time complexity is $\mathcal{O}(n^2)$, where $n$ is the number of particles in the simulation. One of the alternatives is to organize the particles into a tree data structure, which allows distant particles residing close to each other to be approximated as a single more massive particle. This approach was first introduced by \citet{appel1985efficient} and \citet{barnes1986hierarchical}, the latter of which will be followed here.
Constructing the tree starts with setting the full simulation box as the root of the tree. This root cube is then subdivided into eight equally sized sub-cubes called cells. These eight cells are children of the root node. This starts a recursive process where each new cell is again divided into eight subcells until each of the cells contains either no particles, one particle or eight subcells. When a new cell is created, the total mass of the enclosed particles and the location of their centre of mass are stored as a pseudoparticle to allow easy calculation of the approximate gravitational effect the particles within a cell have on a distant particle.
\begin{figure}
\centering
\input{kuvat/tex/tree-box.pdf_tex}
\caption{A two-dimensional example of particles (black dots) being assigned to cells using the Barnes-Hut algorithm \citep{barnes1986hierarchical}. The outermost square is the simulation box, i.e. the root of the tree, and the smaller squares with decreasing line thicknesses are its descendants. Note how each cell contains either one particle, no particles or four smaller subcells, each of which may be further divided into subcells of their own.}\label{fig:tree-box}
\end{figure}
\begin{figure}
\centering
\def\svgwidth{\columnwidth}
\input{kuvat/tex/tree-solids.pdf_tex}
\caption{Cells of Fig.\ \ref{fig:tree-box} shown as a tree, each level of subcells in Fig.\ \ref{fig:tree-box} traversed from left to right and top to bottom corresponding to each level of nodes from left to right. Cells containing one or more particles are shown as filled circles. Cells with no particles are not used in force calculations and thus they do not need to be stored in computer memory, but they are shown here as non-filled circles to emphasize the structure of the tree.}\label{fig:tree}
\end{figure}
To aid in understanding the process, a two-dimensional simplification of the simulation box divided into cells is shown in Fig.\ \ref{fig:tree-box}, together with the corresponding tree in Fig.\ \ref{fig:tree}. The thick outer line of Fig.\ \ref{fig:tree-box} is the simulation box, corresponding to the topmost node of the tree in Fig.\ \ref{fig:tree}. This root cell is first divided into four subcells (in contrast to the eight subcells in the three-dimensional case), which is shown with the second-thickest line dividing the simulation box into quarters in Fig.\ \ref{fig:tree-box}. These four cells are the four children of the root node in the tree. Each of the subcells contains more than one particle, so each subcell is again split into four quarters, each of them thus receiving four new child nodes. The newly-created quarters of quarters form the third level of the tree in Fig.\ \ref{fig:tree}.
Some of these new cells are empty or only have a single particle. For them the recursion halts and the corresponding nodes of the tree are leaves. The rest are again split and a new level is added to the tree, but as some cells were already complete, the fourth level of the tree is not full. In this particular case only one cell requires division beyond the fourth level, producing the last four leaves of the tree on the fifth level.
Often, as is the case in this example, some of the leaf nodes contain no particles. In the case of a real simulation, these cells of course do not need to be saved as there is no information to store, but in this example case all are drawn to emphasize the regular structure of the tree. In the three-dimensional case, where each internal cell has eight subcells, the formed tree is known as an octree, whose two-dimensional analogue, the quadtree, is constructed in the previous example.
After constructing the tree, it can be utilized to speed up the force calculations. When calculating the gravitational acceleration felt by a single particle, the tree is traversed starting from the root. The ratio $l/D$ of the length of a side of the cell ($l$) and the distance from the particle to the pseudoparticle representing the centre of mass within the cell ($D$) is then calculated. If the ratio is smaller than a predefined accuracy parameter $\theta$ representing the opening angle of the cell, the cell is treated as if everything within it was replaced with the pseudoparticle having the combined mass of the cell. Otherwise the subcells of the cell are examined recursively in the same way until a small enough subcell is found or a leaf of the tree is reached, at which point the pseudoparticle and the real particle in the cell are equivalent and the force can be calculated with the maximum accuracy possible for the simulation.
Using the Barnes-Hut algorithm, the tree construction and the force calculations both have a time complexity of $\mathcal{O}(n \log n)$. This is a significant improvement over the $\mathcal{O}(n^2)$ of direct summation considering that the accuracy cost is fairly small: \citet{barnes1986hierarchical} report an accuracy of about 1~\% for a single force evaluation when $\theta=1$, and the accuracy can be improved by either setting a smaller $\theta$ or including multipole moments in the pseudoparticles \citep{barnes1989error}.
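To make the tree construction and the tree walk described above more concrete, the following Python sketch implements a two-dimensional (quadtree) version of the Barnes-Hut algorithm, matching the example of Figs.\ \ref{fig:tree-box} and \ref{fig:tree}. It is a simplified illustration rather than the \textsc{gadget-3} implementation: the opening angle, the softening length and the random particle set are assumed values, and the pseudoparticles carry only monopole (total mass and centre of mass) information.
\begin{verbatim}
import numpy as np

THETA = 1.0   # opening-angle parameter
EPS = 1e-3    # softening length in box units (assumed value)

class Cell:
    """Square cell of a 2D Barnes-Hut quadtree."""
    def __init__(self, centre, size):
        self.centre = np.asarray(centre, float)
        self.size = size           # side length of the cell
        self.children = []         # subcells created on demand
        self.mass = 0.0            # total mass of particles in the cell
        self.mcom = np.zeros(2)    # mass-weighted sum of positions
        self.body = None           # (position, mass) of a lone particle

    def com(self):
        return self.mcom / self.mass   # pseudoparticle position

    def child_for(self, pos):
        """Return (creating if needed) the subcell containing pos."""
        offset = ((np.asarray(pos) >= self.centre) - 0.5) * self.size / 2
        for c in self.children:
            if np.allclose(c.centre, self.centre + offset):
                return c
        c = Cell(self.centre + offset, self.size / 2)
        self.children.append(c)
        return c

    def insert(self, pos, mass):
        pos = np.asarray(pos, float)
        if self.mass == 0.0:                  # empty cell becomes a leaf
            self.body = (pos, mass)
        elif self.body is not None:           # occupied leaf: subdivide
            old_pos, old_mass = self.body
            self.body = None
            self.child_for(old_pos).insert(old_pos, old_mass)
            self.child_for(pos).insert(pos, mass)
        else:                                 # internal node: pass downwards
            self.child_for(pos).insert(pos, mass)
        self.mass += mass
        self.mcom += mass * pos

def acceleration(cell, pos, G=1.0):
    """Walk the tree, opening a cell only if it appears larger than THETA."""
    if cell.mass == 0.0:
        return np.zeros(2)
    d = cell.com() - pos
    dist = np.sqrt(d @ d + EPS**2)
    if cell.body is not None or cell.size / dist < THETA:
        if cell.body is not None and np.allclose(cell.body[0], pos):
            return np.zeros(2)                # skip self-interaction
        return G * cell.mass * d / dist**3
    return sum(acceleration(c, pos, G) for c in cell.children)

# Example: accelerations for random particles in a unit box.
rng = np.random.default_rng(0)
points = rng.random((100, 2))
root = Cell(centre=(0.5, 0.5), size=1.0)
for p in points:
    root.insert(p, mass=1.0)
acc = np.array([acceleration(root, p) for p in points])
\end{verbatim}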
The algorithm is also straightforward to parallelize as different branches of the tree can be assigned to their own threads, though memory management has to be done carefully when forces outside the current branch of a thread are calculated \citep{binney2008galactic}. One way to circumvent the memory management issue is to use a particle mesh based method for long-range forces, as \textsc{gadget-3} does \citep{springel2005cosmological}. The algorithm can also handle problems where the range of densities in the simulation box is large, which is important for applications such as galaxy mergers and cosmological simulations. This, together with their competitive time complexity, makes tree based codes an appealing tool for many astrophysical simulations \citep{binney2008galactic}.
\subsection{Leapfrog Integrator} \label{sect:leapfrog}
A number of different integrators suitable for astronomical and astrophysical applications have been developed for solving different problems \citep{binney2008galactic}. No integrator is optimal for every task, and thus factors such as the integration time, the amount of memory available per particle, the smoothness of the potential and the cost of a single gravitational field evaluation should be considered \citep{binney2008galactic}. One of the integrators that is well suited for cosmological simulations is the leapfrog integrator, which is also used by the \textsc{gadget-3} simulation code \citep{springel2005cosmological}.
When a fixed time step is used, the leapfrog integrator conserves the energy of the system and is time-reversible \citep{binney2008galactic}. While a variable time step is possible and often used, it requires some modifications to the algorithm, presented in e.g. \citet{springel2005cosmological} and implemented in the \textsc{gadget-3} simulation code among others. Other benefits of the integrator are its second-order accuracy and the fact that it does not require excessive amounts of memory per particle, as only the current state of the system is needed to calculate the next step \citep{binney2008galactic}. The integrator is also symplectic, so numerical dissipation of energy does not occur, and thus it rivals some integrators with higher-order accuracy, such as the fourth-order Runge-Kutta, when the number of simulated time steps is large \citep{binney2008galactic}.
\begin{figure}
\centering
\includegraphics{kuvat/tex/leapfrog.pdf}
\caption{Timesteps taken by the leapfrog algorithm, with positions ($x$) updated as indicated by the lower arrows and velocities ($v$) as indicated by the upper arrows. It is not important for the algorithm which of the two is chosen to step through the integer times; the choice of $x$ in this figure is arbitrary. The dashed short arrows depict the half-steps that are needed when a synchronized output is desired.}\label{fig:leapfrog}
\end{figure}
Timestepping with the leapfrog integrator consists of two phases, the drift and the kick steps, which are alternated with a half-step offset as shown in Fig.\ \ref{fig:leapfrog} \citep{binney2008galactic}. During the kick step, the momenta of the particles are updated, and during the drift step the positions of the particles are changed according to the momenta calculated during the kick step \citep{binney2008galactic}. When a synchronized output of both positions and velocities is desired, a half-timestep advance or backtrack of one of the variables is needed to determine their values at the same point in time. Such synchronization steps at the start and end of the integration are indicated in Fig.\ \ref{fig:leapfrog} with dashed arrows.
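As an illustration of the stepping scheme, the following Python sketch implements a fixed-step kick-drift-kick leapfrog for a simple Kepler potential. It is a minimal example rather than the variable-step implementation of \textsc{gadget-3}; the half kicks at the start and end correspond to the synchronization steps of Fig.\ \ref{fig:leapfrog}.
\begin{verbatim}
import numpy as np

def leapfrog(x0, v0, acc, dt, n_steps):
    """Fixed-step kick-drift-kick leapfrog; acc(x) returns the acceleration."""
    x, v = np.array(x0, float), np.array(v0, float)
    positions = [x.copy()]
    v = v + 0.5 * dt * acc(x)          # initial half kick (synchronization)
    for _ in range(n_steps):
        x = x + dt * v                 # drift: full step in position
        v = v + dt * acc(x)            # kick: full step in velocity
        positions.append(x.copy())
    v = v - 0.5 * dt * acc(x)          # final half kick back to synchronize
    return np.array(positions), v

# Example: a circular orbit around a unit point mass (G = M = 1).
kepler = lambda x: -x / np.linalg.norm(x)**3
orbit, v_end = leapfrog([1.0, 0.0], [0.0, 1.0], kepler, dt=0.01, n_steps=10000)
\end{verbatim}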
\subsection{Halo Finding} \label{sect:halofinding}
The data output by a cosmological simulation consists of individual particles tracing the underlying density field. To make comparisons with the real Universe, a way of matching structures in the simulation to observable objects is needed. In a dark matter only simulation, no structure would of course be directly observable, but many properties of simulated dark matter haloes can be compared with the estimated dark matter haloes of observed galaxies. Making such comparisons naturally requires the structures to be identified first, which makes structure finding a key step in the data analysis for many astrophysical and cosmological applications \citep{knebe2013structure}.
A number of different halo finders designed to read N\=/body data and extract locally overdense gravitationally bound systems have been developed to suit different needs, most based either on locating density peaks or collecting and linking together particles located close to each other based on some metric \citep{knebe2013structure}. Two methods, the friends-of-friends (\textsc{fof}) and \textsc{subfind} algorithms, are discussed here. Both algorithms were developed separately, \textsc{fof} by \citet{davis1985evolution} and \textsc{subfind} by \citet{springel2001populating}, but they work well when used together by inputting \textsc{fof} results to \textsc{subfind} as a starting point for the subhalo finding \citep{springel2005cosmological}.
The \textsc{fof} algorithm is simple and based purely on the spatial separation of the particles: pairs of particles residing closer to each other than a chosen threshold distance called the linking length are marked to reside within the same group by linking them together \citep{davis1985evolution}. When all particles have been processed, each distinct subset of particles linked to each other is defined to be a group \citep{davis1985evolution}. Figure \ref{fig:fof} presents an example of a set of particles grouped using the \textsc{fof} algorithm. Depending on the specifics of the data and its intended usage, one might want to discard the smallest groups with only a few particles. This is because they are more likely than bigger groups to be just realizations of random noise rather than actual physical structures, but also because they might not be massive enough to represent the structures that are being studied.
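A minimal Python sketch of the \textsc{fof} grouping is given below. It is not the implementation used to construct the catalogues of this thesis; the linking length and the mock particle positions are assumed values, and a k-d tree from \texttt{scipy} is used to find all particle pairs closer than the linking length.
\begin{verbatim}
import numpy as np
from scipy.spatial import cKDTree

def friends_of_friends(positions, linking_length):
    """Label particles so that any two closer than the linking length,
    directly or through a chain of friends, share the same group label."""
    n = len(positions)
    parent = np.arange(n)             # union-find forest, initially n groups

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    tree = cKDTree(positions)
    for i, j in tree.query_pairs(linking_length):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj           # link the two groups together

    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels                     # consecutive group labels 0, 1, 2, ...

# Example with mock data and an assumed linking length.
rng = np.random.default_rng(1)
groups = friends_of_friends(rng.random((1000, 3)), linking_length=0.02)
\end{verbatim}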
\begin{figure}
\centering
\input{kuvat/tex/fof.pdf_tex}
\caption{Friends-of-friends groups found from a mock data set using the length of the indicator in the upper right area of the illustration as the linking length. Particles are depicted as black dots and the links connecting particles within groups are shown with black lines.}\label{fig:fof}
\end{figure}
The algorithm is made appealing by its simplicity, small number of free parameters and ability to find structures of arbitrary shape \citep{davis1985evolution}. However, it is prone to linking unrelated structures via thin bridges that might consist of only a single chain of particles. This behaviour can be seen in Fig.\ \ref{fig:fof}: the leftmost group has two dense areas that could be cores of two separate groups connected by a single-particle bridge. Removal or even suitable movement of this connector particle would result in the group separating into two distinct groups. The two existing big groups in the figure also stretch quite close to each other in their lower parts: moving or adding one particle between the chains of particles protruding towards each other could result in the two groups merging. In addition, the algorithm as such cannot be used to detect substructure within larger objects \citep{springel2001populating}. If desired, this can be done with modified versions of the algorithm such as the hierarchical friends-of-friends algorithm of \citet{Gottlober1999halo}.
In contrast to the groups found by \textsc{fof}, the \textsc{subfind} algorithm has been developed to extract physically well-defined subhaloes, which are self-bound and locally overdense, from a given parent group. \textsc{subfind} can work with an arbitrary parent group, but \textsc{fof} groups are well suited for the purpose: the \textsc{fof} algorithm is simple, and with an appropriately long linking length it is unlikely to split a physical structure between two \textsc{fof} groups \citep{springel2001populating}. It is still possible that independent structures end up in the same \textsc{fof} group, but \textsc{subfind} is able to distinguish them as separate objects.
Unlike \textsc{fof}, \textsc{subfind} uses a local density estimate instead of individual particle pairs. It labels all locally overdense regions enclosed by an isodensity contour traversing a saddle point as substructure candidates \citep{springel2001populating}. This is done by lowering an imaginary density threshold through the range of the density field: particles surrounding a common local density maximum are assigned to a common substructure candidate until two separate substructure regions join at a saddle point of the density field \citep{springel2001populating}. When a saddle point is reached, the two substructure candidates it connects are both stored individually to be processed further, and the saddle point particle is added to a new substructure candidate containing the particles from both of the smaller candidates \citep{springel2001populating}. Thus the algorithm is able to identify a hierarchy of substructures within each other \citep{springel2001populating}.
A two-dimensional example of this substructure candidate identification is shown in Fig.\ \ref{fig:subfind}. The algorithm starts from the particle in the densest area of the simulation. At that point, the particle has no neighbours that would be at a higher density than the particle itself, and thus it becomes the first particle of a substructure candidate. All of the particles belonging to this substructure candidate are marked with striping in the figure. The algorithm iterates through the particles in order of decreasing density, always finding that the next particle has one neighbouring particle in a higher density area than the particle itself and thus adding it to the same striped group, until the second local density maximum is reached. At that point, a new substructure candidate, marked with a chequerboard pattern, is created. The following particles are assigned to their respective substructure candidates based on their single higher-density neighbour until a saddle point between the two is reached, at which point the state of the substructure candidates is as shown in Fig.\ \ref{fig:subfind}. At this stage the current substructure candidates are complete and will be saved. Next, the particles belonging to the two structures can be joined by the saddle point particle to form a new, bigger substructure candidate.
\begin{figure}
\centering
\input{kuvat/tex/subfind.pdf_tex}
\caption{Intermediate stage of the \textsc{subfind} algorithm, shown just before it reaches the first saddle point. Circles depict simulation particles and the line the underlying density field. Striped and checkered circles both belong to their own subhalo candidates whereas the white ones are not yet labelled.}\label{fig:subfind}
\end{figure}
Unfortunately, now some particles are assigned to multiple substructure candidates, and it is not clear that all particles within one substructure candidate are part of an actual physical structure \citep{springel2001populating}. It is very much possible that some particles are just passing by, and if the same particles were re-examined at a later time, they would no longer be anywhere near the structure they were supposed to belong to. Hence the next step in the analysis is to eliminate unbound particles from each group by iteratively removing particles with positive total energy until all of the remaining particles are bound to each other by their mutual gravitational attraction \citep{springel2001populating}. At this stage, each particle is labelled only based on the smallest structure it resides in, which solves the problem of a single particle belonging to multiple structures \citep{springel2001populating}.
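The unbinding step can be sketched in Python as follows. This is a direct-summation illustration of the idea rather than the actual \textsc{subfind} implementation, which computes the potential more efficiently; the gravitational constant is given in units of kpc, km/s and $\mathrm{M_{\astrosun}}$, and the mock data is assumed.
\begin{verbatim}
import numpy as np

G = 4.30091e-6   # gravitational constant in kpc (km/s)^2 / Msun

def remove_unbound(pos, vel, mass):
    """Iteratively drop particles with positive total energy until the
    remaining set is self-bound; returns the indices of bound particles."""
    idx = np.arange(len(mass))
    while len(idx) > 1:
        p, v, m = pos[idx], vel[idx], mass[idx]
        # Potential of each particle due to all the others (direct summation).
        dr = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
        np.fill_diagonal(dr, np.inf)
        phi = -G * np.sum(m[None, :] / dr, axis=1)
        # Kinetic energy relative to the centre-of-mass velocity.
        v_com = np.average(v, axis=0, weights=m)
        kin = 0.5 * np.sum((v - v_com)**2, axis=1)
        bound = kin + phi < 0.0
        if bound.all():
            break
        idx = idx[bound]
    return idx

# Example with mock data: positions in kpc, velocities in km/s, masses in Msun.
rng = np.random.default_rng(2)
pos = rng.normal(scale=50.0, size=(500, 3))
vel = rng.normal(scale=100.0, size=(500, 3))
bound_idx = remove_unbound(pos, vel, np.full(500, 1.0e8))
\end{verbatim}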
After the iterative pruning stage, some substructure candidates can vanish completely or be left with very few members. Substructure candidates with fewer than some minimum number of bound particles are discarded \citep{springel2001populating}. The structure candidates surviving the pruning can then be considered to represent physical structures and are labelled as subhaloes \citep{springel2001populating}. In this thesis, all of the analysis is based on catalogues containing such subhaloes.
\section{Simulation Runs} \label{sect:simruns}
All results presented in this thesis are based on cosmological simulations run by Till Sawala. They are cosmological zoom-in simulations, meaning that a higher resolution was used in the regions of interest than in the rest of the simulation box. For these simulations, this was done by identifying regions relevant for the study from an output of a low-resolution simulation at $z=0$, after which the resolution of these volumes was increased and the new simulations were run. In this case, the interesting regions were defined as the surroundings of a pair of dark matter haloes resembling those of the Milky Way and Andromeda galaxies, the two main galaxies of the Local Group.
The simulations were dark matter only simulations, meaning that gravity is the only modelled interaction and all matter behaves as dark matter. For my analysis only the dark matter halo information was needed, so instead of using the full simulation output, the analysis was conducted on subhalo catalogues. The subhalo information was extracted by Till Sawala as described in section \ref{sect:halofinding}. The resulting data set consists of subhalo catalogues from 448 zoom simulations, each centred on one region resembling the Local Group in the low-resolution simulation. The following sections briefly introduce the code used to run the simulations and the parameters used for both stages of the zoom-in simulations.
\subsection{Modified \textsc{gadget-3}} \label{sect:gadget}
This master's thesis is based on data obtained from simulations run using a simulation code called \textsc{gadget-3}, an update to \textsc{gadget-2}, which is described by \citet{springel2005cosmological}. The code is the same as the one used in the \textsc{eagle} project, so detailed descriptions of the changes between the versions can be found in \citet{schaye2015eagle}. However, the changes mostly affect the handling of baryonic matter, so understanding the basics of \textsc{gadget-2} gives a good basis for understanding the simulations \citep{schaye2015eagle}.
\textsc{gadget} uses a TreePM algorithm to compute forces, meaning that short-range forces are calculated using a tree method as described in section \ref{sect:tree} and a particle mesh is employed for long-range forces \citep{springel2005cosmological}. As the code is parallel, the normal tree-construction algorithm is problematic in regard to splitting the nodes of the octree between tasks \citep{springel2005cosmological}. To ensure a balanced workload amongst all processors, the particles are split between tasks by constructing a space-filling fractal curve known as the Peano-Hilbert curve and splitting it into segments with an approximately equal number of particles in each segment \citep{springel2005cosmological}. The properties of the curve ensure that particles close to each other in space are usually also near each other along the curve, which means that close-range forces can frequently be calculated without the need to access memory belonging to other processors \citep{springel2005cosmological}. In regions where an octree constructed from the particles belonging to a processor should contain particles assigned to other processors, a pseudoparticle, resembling in principle the pseudoparticles used when calculating forces for cells with $l/D < \theta$, is inserted instead of the full particle information \citep{springel2005cosmological}.
Updating the positions of the particles is done using the leapfrog integrator described in section \ref{sect:leapfrog}, but the integrator is modified to allow variable time step lengths \citep{springel2005cosmological}. These modifications are important, as in cosmological simulations it is not sensible to use the same time step in all parts of the simulation: there are both high-density regions and sparse void areas, the first of which requires a time step so small that a lot of computational time would be wasted integrating the latter in more detail than needed.
\subsection{Parent Simulation}
The first step in this kind of a zoom-in simulation is to construct initial conditions and run a low-resolution box. A large box with a comoving side length of 542.16~Mpc/$h$, $h$ being the reduced Hubble parameter defined in equation \ref{reducedhubble}, was used to ensure that the box is large enough to contain a sample of more than a hundred Local Group analogues and that the structure of the Universe is represented on all relevant scales in the simulation. At $z = 0$ the simulation volume is $800^3$~Mpc$^3$ in physical units.
The volume contained about $1.3 \times 10^{11}$ dark matter particles, each with a mass of about $1.6 \times 10^8~\mathrm{M_{\astrosun}}$. This determines the mass resolution of the simulation: objects smaller than the mass of a single particle are not seen at all, and reliable analysis cannot be made of objects that consist of fewer than 20--40 particles, the minimum number depending on the halo finder used \citep{knebe2011haloes}. This means that, for example, the distribution of spiral and elliptical galaxies can be studied, but not all dwarf galaxies are resolved. The spatial resolution of the simulation is determined by the softening length: the distance within which the gravitational interaction calculations are modified in order to remove singularities and avoid extreme accelerations. The properties of the parent simulation are shown in Table \ref{tab:parentBox}.
In the parent simulation a comoving softening length of 2.3~kpc/$h$ was used, meaning that the softening length grows with the simulation box. For regions expanding with the simulated universe this is advantageous, but for e.g. virialized structures with no spatial growth this is unwanted behaviour. For simulations aimed at studying objects with a specific mass, and thus a specific redshift at which they collapse, it is possible to specify a redshift after which the softening length is kept constant in physical units, but for simulations exploring a range of different objects this is not possible.
\begin{table}
\centering
\begin{tabular}{| l | l |}
\hline
Box size & $542.16^3$ Mpc$^3/h^3$ \\ \hline
Number of particles & $5040^3 \approx 1.3 \times 10^{11}$ \\ \hline
Particle mass & $1.6 \times 10^8~\mathrm{M_{\astrosun}}$ \\ \hline
Comoving softening length & 2.3 kpc/$h$ \\ \hline
\end{tabular}
\caption{Properties of the parent simulation.} \label{tab:parentBox}
\end{table}
The cosmology of the simulation is a standard $\Lambda$CDM cosmology with cosmological parameters from the Planck space telescope measurements \citep{planck2014resultsXVI}. The most important parameters are shown in Table \ref{tab:cosmopars}. As this is a dark matter only simulation, the matter corresponding to $\Omega_b$, the baryonic component of the Universe, is treated as dark matter. The $\sigma_8$ parameter shown in the table is the rms linear density fluctuation on scales of 8~Mpc/$h$.
\begin{table}
\centering
\begin{tabular}{| l | l |}
\hline
$H_0$ & 67.77 km/s/Mpc\\ \hline
$\Omega_m$ & 0.307 \\ \hline
$\Omega_b$ & 0.0455 \\ \hline
$\Omega_\Lambda$ & 0.693 \\ \hline
$\sigma_8$ & 0.8288 \\ \hline
\end{tabular}
\caption{Most important cosmological parameters of the simulations.} \label{tab:cosmopars}
\end{table}
The low-resolution simulation was then run and when it reached redshift $z=0$, subhaloes were identified using the \textsc{fof} and \textsc{subfind} algorithms as described in section \ref{sect:halofinding}. From the resulting subhalo catalogue, subhalo pairs resembling the Local Group were identified. In the parent simulation, the following properties were required from a pair of subhaloes:
\begin{itemize}
\item Mutual distance of the two subhaloes lies between 700 and 840 kpc.
\item The subhaloes are moving towards each other with a radial velocity between 100 and 170 km/s.
\item Tangential velocity of the subhaloes is smaller than 50 km/s.
\item The subhaloes have masses between $5 \times 10^{11}$ and $5 \times 10^{12}~\mathrm{M}_{\astrosun}$.
\item The two subhaloes are the most massive ones within 2~Mpc.
\end{itemize}
These criteria produced 448 hits in the parent box, and a zoom simulation was created for each of these Local Group analogues.
\subsection{Zoom Simulations} \label{sect:zooms}
The zoom simulations simulated the same box as the parent simulation, but new initial conditions with enhanced resolution around the Local Group analogues and coarser resolution elsewhere were created for each pair of objects. The large scale structure and its evolution in these zooms is similar to the parent simulation, but due to the increased resolution near the Local Group analogues they can be studied in more detail. The change of resolution was implemented by grouping the dark matter particles into three categories, ranging from the least massive type 1 particles to the most massive type 3 particles.
Whereas type 1 particles have a fixed mass in each simulation, type 2 and 3 particles have varying masses, ranging from type 2 particles with only slightly larger masses than the type 1 particles to type 3 particles with masses comparable to a million type 1 particles. Each particle type has its own softening length, with type 1 particles having the smallest and type 3 particles the largest softening length. While the number and mass of type 1 particles varied slightly from simulation to simulation, ranging from $3.7 \times 10^7$ to $4.7 \times 10^7~\mathrm{M}_{\astrosun}$ per particle, every simulation used the same softening length of 1.0~kpc/$h$ for them. The \textsc{gadget-3} simulation code is also capable of handling baryonic matter such as gas and stars, but the simulations used in this thesis were dark matter only simulations, so no particles other than the three types of dark matter particles were present.
The type 1 particles were assigned to regions identified to end up in or pass by the central Local Group analogue, surrounded by a region of type 2 particles. The rest of the box was filled with type 3 particles. Where two different types of particles interact, the larger of the two softening lengths is applied to the interaction in order to reduce unphysical two-body scattering. The number and mass of type 1 particles in a simulation varies based on the extent and density of the regions forming the Local Group analogue. A histogram showing the number of high resolution particles in each simulation is presented in Fig.\ \ref{fig:type1hist}. The number of type 1 particles ranges from about $1.6 \times 10^6$ to nearly $30 \times 10^6$, with lower values being most typical: most of the simulations have fewer than $7 \times 10^6$ high resolution particles. %1,618,600 to 29,791,000
\begin{figure}
\centering
\input{kuvat/tex/type1particles.pdf_tex}
\caption{The number of simulations with given number of type 1 particles. Values near the lower end of the distribution are most common, but many simulations have more than $2 \times 10^7$ particles.}\label{fig:type1hist}
\end{figure}
At $z=0$ a subhalo catalogue was created for each of the zoom simulations. This was done using a combination of \textsc{fof} and \textsc{subfind} algorithms as was done with the parent simulation. A lower limit of 20 particles in each subhalo was used and smaller structures were ignored. Instead of the full particle data, all analysis conducted for this thesis was done using these subhalo catalogues. Initially there were 448 of these catalogues in the data set. Simulations were run and the halo catalogues generated by Till Sawala but all further analysis presented is done as a part of this thesis.
The 448 simulation runs did not directly translate to 448 Local Group analogues. First of all, picking the Local Group analogues from the parent simulation had resulted in most subhalo pairs being picked twice. Two zoom simulations were then run for each of these pairs, resulting in essentially the same simulation being run twice, with the second run providing no new information. Removing one of each of these pairs from the data set resulted in a total of 250 unique zoom simulations.
The analysis started with identifying Local Group analogues in all of the catalogues. This was done by identifying pairs of subhaloes that are made of type 1 particles and fulfil the following criteria (a sketch of such a selection is given after the list):
\begin{itemize}
\item Distances between centres of potential are in range 0.6--1.0~Mpc.
\item The subhaloes are moving towards each other with radial velocity no larger than 170 km/s.
\item Tangential velocity of the subhaloes is smaller than 50~km/s.
\item Masses of individual subhaloes lie between $4 \times 10^{11}$ and $5 \times 10^{12}~\mathrm{M_{\astrosun}}$.
\item The two subhaloes are the most massive ones within 2~Mpc.
\end{itemize}
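As an illustration, a selection like the one above can be sketched in Python as a simple mask over a candidate-pair catalogue. The field names below are hypothetical and do not correspond to the actual catalogue format used in this thesis; the sketch only shows how the listed criteria translate into cuts.
\begin{verbatim}
import numpy as np

def select_local_group_analogues(pairs):
    """Apply the pair-selection criteria listed above to a structured array
    of candidate subhalo pairs (hypothetical field names)."""
    d = pairs["separation_Mpc"]
    v_rad = pairs["radial_velocity_kms"]      # negative when approaching
    v_tan = pairs["tangential_velocity_kms"]
    m1, m2 = pairs["mass1_Msun"], pairs["mass2_Msun"]
    isolated = pairs["most_massive_within_2Mpc"]

    mask = ((d >= 0.6) & (d <= 1.0)
            & (v_rad <= 0.0) & (np.abs(v_rad) <= 170.0)
            & (v_tan < 50.0)
            & (m1 >= 4e11) & (m1 <= 5e12)
            & (m2 >= 4e11) & (m2 <= 5e12)
            & isolated)
    return pairs[mask]
\end{verbatim}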
The criteria were chosen so that the systems resemble the Local Group, but the allowed ranges are broad enough to accommodate uncertainties in the observed values and to produce a large enough data set for statistical analysis. The ranges are somewhat wider than in the parent simulation, which reduces the probability of a Local Group analogue found in the parent simulation no longer fulfilling the criteria after resimulation. This is important for the statistical analysis, as with the parent simulation criteria only 58 Local Group analogues are found in the zooms, compared to the 199 simulations with Local Group analogues found with these criteria.
Even with the loose criteria, the sample size was reduced from the number of unique simulations, as some resimulations did not contain a pair of subhaloes fulfilling the Local Group criteria. This can happen because the zooms are not identical to the parent simulation, and especially Local Group analogues with some value near the edge of the allowed range can end up no longer meeting all criteria after resimulation. Some of the simulations also contained more than one Local Group analogue, in which case the pair with the largest distance to the nearest type 2 or 3 particle was chosen for analysis.
As the analysis conducted for this thesis uses not only the properties of a Local Group analogue but also its surroundings, the extent of the high resolution regions is important. As can be seen in Fig.\ \ref{fig:uncontaminatedDistances}, most of the simulations contain only type 1 particles up to 4~Mpc from the Local Group analogue. For larger distances, the number of simulations uncontaminated by type 2 or 3 particles decreases, until beyond 8~Mpc only a few simulations have only type 1 particles. To ensure sufficient accuracy of the Hubble flow analysis, all of the Local Group analogues used in the analysis were required not to have any type 2 or 3 particles within 5~Mpc of the mass centre of the subhalo pair. This resulted in a sample of 120 subhalo catalogues.
\begin{figure}
\centering
\includegraphics{kuvat/uncontaminatedDistances.pdf}
\caption{Number of simulations that have only high resolution type 1 particles at least up to a given distance as a function of distance.}\label{fig:uncontaminatedDistances}
\end{figure}
\chapter{Mathematical and Statistical Methods} \label{chapt:mathematical_and_statistical}
The analysis presented in chapter~\ref{chapt:results} relies strongly on many tools of mathematics and statistics. This chapter covers the background needed to understand and motivate the use of these methods, starting from the basic concepts of statistics in section~\ref{sect:statistical-background}. Sections~\ref{sect:regression} and \ref{sect:pca} introduce linear regression and principal component analysis used in constructing a model for predicting the Local Group mass. Then, in sections~\ref{sect:cross-validation} and \ref{sect:statistical_tests}, tools used for evaluating the accuracy and significance of results are introduced. Finally, section~\ref{sect:cluster-analysis} covers the clustering algorithm used to find the subhaloes located near each other on the sky.
\section{Statistical Background} \label{sect:statistical-background}
The precision of the equipment used limits the accuracy of all data gathered from physical experiments, simulations or observations. Thus assessing whether, for example, measurements support a model requires the use of statistical methods. The methods most relevant for this thesis are briefly introduced in the following sections, together with the basic statistical concepts that are necessary for understanding them.
\subsection{Hypothesis Testing and p-values}
A common situation in scientific research is that one has to compare a sample of data points to either a model or another sample in order to derive a conclusion from the dataset. In statistics, this is known as hypothesis testing \citep{wall2003practical}. For example, this can mean testing hypotheses such as ``these two variables are not correlated'' or ``this sample is from a population with a mean of 1.0''. The next paragraphs briefly introduce the basic concept of hypothesis testing and methods that can be used to test the hypothesis ``these two samples are drawn from the same distribution'', following the approach of \citet{bohm2010introduction} and \citet{wall2003practical}.
The process of hypothesis testing as described by \citet{bohm2010introduction} begins with forming a null hypothesis $H_0$ that is formulated such that the aim of the next steps is to either reject it or deduce that it cannot be rejected at a chosen significance level. The negation of the null hypothesis is often called the research hypothesis or the alternative hypothesis and denoted $H_1$. For example, this can lead to $H_0$ ``this dataset is sampled from a normal distribution'' and $H_1$ ``this dataset is not sampled from a normal distribution''. Choosing the hypotheses in this manner is done because the research hypothesis is often difficult to define otherwise.
After setting the hypotheses one must choose an appropriate test statistic. Ideally this is chosen such that the difference between the cases $H_0$ and $H_1$ is as large as possible. Then one must choose the significance level $\alpha$, which corresponds to the probability of rejecting $H_0$ in the case where $H_0$ actually is true. This fixes the critical region, i.e.\ the values of the test statistic that lead to the rejection of $H_0$. This kind of probability based decision making is always prone to error. Rejecting $H_0$ despite it being true is known as an error of the first kind. However, this is not the only kind of error possible. It might also occur that $H_0$ is false but does not get rejected, which is known as an error of the second kind.
There is no single optimal way of choosing $\alpha$, but one should try to find a balance between falsely rejecting the null hypothesis and not being able to reject it based on the dataset even if it is false. When the sample size (often denoted $N$) is large, smaller values of $\alpha$ can often be used, as decisions become more accurate as $N$ grows. For example, in all hypothesis testing done in this thesis the value of $\alpha$ is 0.01 and $N=119$.
It is crucial not to look at the test results before choosing $\alpha$, in order to avoid intentional or unintentional fiddling with the data or changing the criterion for accepting or rejecting the null hypothesis in order to produce desired results. Only after these steps should the test statistic be calculated. If the test statistic falls within the critical region, $H_0$ should be rejected; otherwise it should be stated that $H_0$ cannot be rejected at this significance level. The critical values for different test statistics are widely available in statistical textbooks and collections of statistical tables, or they can be calculated using statistical or scientific libraries available for many programming languages.
Despite statistical tests having a binary outcome, ``$H_0$ rejected'' or ``$H_0$ not rejected'', a continuous output is often desired. This is what p-values are used for. The name p-value hints at probability, but despite its name the p-value is not equal to the probability that the null hypothesis is true. A p-value is a function of a test statistic: for a certain observed value $t_{obs}$ of the test statistic, the p-value gives the probability that, under the condition that $H_0$ is true, the value of the test statistic for a randomly drawn sample is at least as extreme as $t_{obs}$. Therefore, if the p-value is smaller than $\alpha$, $H_0$ is to be rejected.
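As a concrete example of this workflow, the sketch below tests the null hypothesis that two samples are drawn from the same distribution using the two-sample Kolmogorov-Smirnov test from \texttt{scipy} (the distribution functions this test is built on are introduced in the next section). The samples here are synthetic, and $\alpha$ is fixed before the test is evaluated.
\begin{verbatim}
import numpy as np
from scipy import stats

alpha = 0.01                       # significance level, chosen in advance

# Synthetic samples; in a real analysis these would be the measured data.
rng = np.random.default_rng(3)
sample_a = rng.normal(loc=0.0, scale=1.0, size=119)
sample_b = rng.normal(loc=0.5, scale=1.0, size=119)

# H0: the two samples are drawn from the same distribution.
statistic, p_value = stats.ks_2samp(sample_a, sample_b)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: H0 is rejected")
else:
    print(f"p = {p_value:.4f} >= {alpha}: H0 cannot be rejected")
\end{verbatim}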
\subsection{Distribution Functions} \label{sect:distribution-functions}
Some statistical tests such as the Kolmogorov-Smirnov test and the Anderson-Darling test make use of distribution functions such as the cumulative distribution function (CDF) and the empirical distribution function (EDF) in determining the distribution from which a sample is drawn. To understand the CDF and the EDF, one must first be familiar with the probability density function (PDF).
As the name suggests, a PDF is a function whose value at some point $x$ represents the likelihood that the value of the random variable would equal $x$. This is often denoted as $f(x)$. Naturally, for continuous distributions the probability of drawing any single value from the distribution is zero, so these values should be interpreted as depicting the relative likelihoods of different values \citep{htk}. For example, if $f(a)=0.3$ and $f(b)=0.6$ we can say that drawing the value $b$ is twice as likely as drawing the value $a$.
Another way to use the PDF is to integrate it over a semi-closed interval from negative infinity to some value $x$ to obtain the CDF, often denoted with $F(x)$:
\begin{equation}
F(x) = \int_{-\infty}^x f(x') \,dx'.
\end{equation}
This gives the probability of a random value drawn from the distribution being at most $x$ \citep{htk}. The relation between the PDF and the CDF is illustrated in Fig.\ \ref{fig:cdf}, where PDFs and CDFs are shown for three different distributions. It is easy to see the integral relation between the PDF and the CDF and how wider distributions have wider CDFs.
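The integral relation between the two functions can also be verified numerically, as in the short sketch below, which integrates the PDF of a standard normal distribution with the trapezoidal rule and compares the result to the analytic CDF from \texttt{scipy}; the grid and the distribution are arbitrary choices for illustration.
\begin{verbatim}
import numpy as np
from scipy import stats

x = np.linspace(-5.0, 5.0, 1001)
pdf = stats.norm.pdf(x)

# Cumulative trapezoidal integration approximates F(x) = int f(x') dx'.
cdf_numeric = np.concatenate(
    [[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))])

# The difference to the analytic CDF is small (truncation at x = -5 and
# the finite grid spacing are the only sources of error).
print(np.max(np.abs(cdf_numeric - stats.norm.cdf(x))))
\end{verbatim}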
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{kuvat/cdf.pdf}
\caption{Cumulative distribution functions (top panel) and probability density functions (bottom panel) for three random samples drawn from different distributions, two of which are normal and one uniform. The parameters $\mu$ and $\sigma$ of the normal distribution describe the mean and the spread of the distribution respectively, with large values of $\sigma$ corresponding to a wide distribution.}
\label{fig:cdf}
\end{figure}
Both the PDF and the CDF apply either to whole populations or to the sets of all possible outcomes of a measurement. In reality, the available sample is almost always smaller than this, and therefore one cannot measure the actual CDF. Nevertheless, it is possible to calculate a similar measure of how big a fraction of the measurements falls under a given value. This empirical counterpart of the CDF is known as the empirical distribution function (EDF), often denoted $\hat F(x)$, and for a dataset $X_1, X_2,\,..., X_n$ containing $n$ samples it is defined to be
\begin{equation}
\hat F(x) = \frac{1}{n}\sum_{i=1}^n I[X_i \leq x]
\end{equation}
where $I$ is the indicator function, the value of which is 1 if the condition in brackets is true and 0 otherwise \citep{feigelson2012modern}.
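For illustration, the EDF of a sample can be evaluated with a few lines of Python; the sample below is synthetic, with the same size $N=35$ as in the example figures.
\begin{verbatim}
import numpy as np

def edf(sample):
    """Return a function giving the fraction of sample values <= x."""
    data = np.sort(np.asarray(sample))
    return lambda x: np.searchsorted(data, x, side="right") / len(data)

# Example: EDF of a small normal sample evaluated at a few points.
rng = np.random.default_rng(4)
F_hat = edf(rng.normal(size=35))
print(F_hat(-1.0), F_hat(0.0), F_hat(1.0))
\end{verbatim}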
Due to the EDF being a result of random sampling, it may deviate considerably from the underlying CDF, as can be seen by comparing the CDFs in Fig.\ \ref{fig:cdf} and the corresponding EDFs in Fig.\ \ref{fig:edf}. This example is somewhat exaggerated with its $N=35$, as the actual dataset used in this thesis has $N>100$, but reducing the sample size makes the effects of random sampling easier to see. The latter figure also shows EDFs corresponding to two random samples drawn from the distribution of the orange curve in the first figure, to further illustrate the differences that can arise from random sampling. This randomness also makes determining whether two samples are drawn from the same distribution difficult.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{kuvat/edf.pdf}
\caption{Empirical distribution functions for four random samples ($N=35$) drawn from the same distributions as in Fig.\ \ref{fig:cdf}. Note that both the orange and the violet data are drawn from the same distribution.}
\label{fig:edf}
\end{figure}
\section{Linear Regression} \label{sect:regression}
Regression analysis is a set of statistical analysis processes that are used to estimate functional relationships between a response variable (denoted with $y$) and one or more predictor variables (denoted with $x$ in the case of a single predictor or $x_1,\ x_2,\ \dots,\ x_i$ if there are multiple predictor variables) \citep{feigelson2012modern}. In this section, both simple regression, where there is only one predictor variable, and multiple linear regression, where there is more than one predictor variable, are introduced briefly. The models also contain an $\varepsilon$ term that represents the scatter of the measured points around the fit. One of the models used is the linear regression model, which can be used to fit any relationship where the response variable is a linear function of the model parameters \citep{montgomery2012introduction}. In addition to the widely known and used models where the relationship is a straight line, such as $y = \beta_0 x + \varepsilon$,
all models where the relationship is linear in the unknown parameters $\beta_i$ are linear \citep{montgomery2012introduction}. Thus for example $y = \beta_0 x^2 + \varepsilon$ and $y = \beta_0 e^x + \beta_1 \tan{x} + \varepsilon$ are linear models. On the other hand, all models where the relationship is not linear in the parameters, for example $y = x^{\beta_0} + \varepsilon$ and $y = \beta_0 x + \cos{(\beta_1 x)} + \varepsilon$, are nonlinear.
Simple linear regression is a model with a single predictor variable and a single response variable with a straight line relationship, i.e.
\begin{equation}
y = \beta_0 + \beta_1 x + \varepsilon
\end{equation}
where the parameter $\beta_0$ represents the $y$ axis intercept of the line and $\beta_1$ is the slope of the line \citep{montgomery2012introduction}. The parameters can be estimated using the method of least squares, i.e. choosing parameter values that minimize the sum of squared differences between the data points and the fitted line \citep{montgomery2012introduction}.
\begin{figure}
\centering
\input{kuvat/tex/OLS.pdf_tex}
\caption{Ordinary least squares fit (black line) to a data set (black circles) with the response variable on the vertical axis. The fitted line minimizes the sum of squares of vertical distances between the data points and the fitted line, shown in grey.}\label{fig:OLS}
\end{figure}
The best-known method of minimizing the sum of squared error is the ordinary least squares (OLS) estimator. The OLS method uses differences in the response variable as shown in Fig.\ \ref{fig:OLS} and thus the minimized sum is
\begin{equation}
\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2
\end{equation}
where $x_i$ and $y_i$ are single values of the measured quantities \citep{feigelson2012modern}. This approach requires that the values of the predictor variable are known exactly without error and that all uncertainty is in the values of the response variable \citep{feigelson2012modern}. In situations where this assumption is not valid, results acquired using OLS may be counter-intuitive. This can be seen for example in Fig.\ \ref{fig:OLSproblem} where OLS is used to calculate two linear fits: one where $x$ is used as the predictor variable and $y$ as the response variable, and another where $y$ is the predictor and $x$ the response. As the minimized distance is different, the fitted lines also differ.
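As a concrete illustration, an OLS fit of a straight line can be computed for example with the least-squares solver of \texttt{numpy}. The sketch below uses made-up data and is not the fitting code of this work:
\begin{verbatim}
import numpy as np

# Made-up data: a straight line with Gaussian scatter in the response only.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Design matrix with a column of ones for the intercept beta_0.
A = np.column_stack([np.ones_like(x), x])
(beta0, beta1), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"intercept = {beta0:.3f}, slope = {beta1:.3f}")
\end{verbatim}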
\begin{figure}
\centering
\input{kuvat/tex/OLSproblem.pdf_tex}
\caption{Ordinary least squares fits using either vertical or horizontal distances, each resulting in a different fit. In order to avoid such ambiguity, ordinary least squares should only be used when the predictor variable is measured exactly.}\label{fig:OLSproblem}
\end{figure}
When dividing the variables into a predictor variable with no error and a response variable with possible measurement error is not a justifiable choice, OLS should not be used. One alternative to OLS is total least squares (TLS, also known as orthogonal least squares in some sources such as \citet{feigelson2012modern}) \citep{markovsky2007overview}. The major difference between OLS and TLS is that instead of the vertical distance, the minimized squared distance is measured between a point and its projection onto the fitted line, thus minimizing the sum of the squared orthogonal distances from the line \citep{feigelson2012modern}. These minimized distances are shown in Fig.\ \ref{fig:TLS}. Unfortunately, as the orthogonal distance is calculated using the Pythagorean theorem, combining the distances along the two axes requires that both are measured in the same units. If they are not but TLS is still used, the variables have to be transformed to the same units, e.g. by standardization, which yields dimensionless variables on all axes.
\begin{figure}
\centering
\input{kuvat/tex/TLS.pdf_tex}
\caption{Total least squares minimizes the sum of squares of orthogonal distances of the data points from the fitted line. Distances between the data points (black circles) and the fitted line (black line) are shown in grey.}\label{fig:TLS}
\end{figure}
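For a straight-line fit with the variables in the same units, the TLS line passes through the centroid of the data and points along the direction of largest variance, i.e.\ the direction of the first principal component (see section~\ref{sect:pca}). A minimal sketch based on this property, with the hypothetical function name \texttt{tls\_line}, could be:
\begin{verbatim}
import numpy as np

def tls_line(x, y):
    """Total least squares fit of a line y = a + b*x.

    Assumes x and y are in the same units (or standardized), so that
    orthogonal distances are meaningful.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    # The eigenvector of the covariance matrix with the largest eigenvalue
    # gives the direction of the TLS line, which passes through the centroid.
    eigvals, eigvecs = np.linalg.eigh(np.cov(x - xm, y - ym))
    direction = eigvecs[:, np.argmax(eigvals)]
    slope = direction[1] / direction[0]
    return ym - slope * xm, slope  # intercept a and slope b
\end{verbatim}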
Multiple linear regression is otherwise analogous to simple linear regression, but more than one predictor variable is used \citep{xin2009linear}. Thus the relationship of the response and predictor variables can be expressed as
\begin{equation}
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon
\end{equation}
where each of the predictor variables $x_n$ has its own regression coefficient $\beta_n$, the value of which is chosen to best model the data. Each of these parameters describes how the response value changes if the corresponding predictor changes while all other predictors remain constant \citep{montgomery2012introduction}.
Least squares estimation and many other methods that can be used with a single predictor are also applicable to multiple linear regression, but with multiple linear regression the properties of the predictor variable set can greatly affect the accuracy of the predictions \citep{montgomery2012introduction}. One such situation is when some of the predictor variables are correlated with each other, known as multicollinearity \citep{montgomery2012introduction}. In this case, the regression coefficients tend to have abnormally large absolute values and they can be very sensitive to small changes in the values of the predictor variables \citep{montgomery2012introduction}.
Ideally the predictor variables are all orthogonal, but this is not always the case.
The quality of the prediction usually does not suffer excessively from weak correlations, but using data sets with strong correlations often requires some approach to lessen the effects of multicollinearity \citep{montgomery2012introduction}. Depending on what is causing the multicollinearity, this can mean e.g. discarding some variables from the analysis or collecting more data \citep{montgomery2012introduction}. However, often increasing the size of the data set is not possible, and in many cases discarding variables does not produce the best possible results. One way of alleviating the effects of multicollinearity while retaining all variables is to first transform the data set into a new set of orthogonal variables, known as the principal components of the data set, and then to construct the regression model using these principal components as predictor variables \citep{montgomery2012introduction}. The extraction process and the properties of the principal components of a data set are described in the following section.
\section{Principal Component Analysis} \label{sect:pca}
Principal component analysis (PCA) is a statistical procedure first introduced by \citet{pearson1901lines} to aid physical, statistical and biological investigations where fitting a line or a plane to an $n$-dimensional dataset is desired. It is widely used in e.g. image compression, face recognition and many fields of machine learning \citep{smith2002tutorial, alpaydin2014introduction}. When performing PCA, one transforms a data set into a new set of uncorrelated variables, i.e.\ ones represented by orthogonal basis vectors. These variables are called principal components (PCs) \citep{jolliffe2002principal}. In addition to eliminating multicollinearity in the data set, this approach can also be used to solve the problem, introduced in the previous section, of the sometimes arbitrary division of the data into predictor and response variables \citep{pearson1901lines}.
PCA can be used both to reduce and to interpret data \citep{johnson2007applied}. Often PCA alone does not produce the desired result, but instead PCs are used as a starting point for other analysis methods such as factor analysis or regression analysis \citep{johnson2007applied}. In this work, the first principal components of a data set are used as predictor variables, both reducing the dimensionality of the problem and ensuring that the predictors are orthogonal. The steps needed for this are briefly described in the following subsections.
\subsection{Extracting Principal Components}
\begin{figure}
\centering
\input{kuvat/tex/pca-illustrated.pdf_tex}
\caption{Extracting the PCs of a two-dimensional data set. First the origin is moved to the centroid of the data, original location of which is shown with a red x in the top panel. Next the line along which the variance of the data points is largest is determined, shown with the black line in the middle panel. This is the first PC, to which the second PC is orthogonal and thus fully determined. In the bottom panel the data is plotted along these principal components. }\label{fig:pca-illustrated}
\end{figure}
In order to understand the process of obtaining principal components of a data set let us follow the procedure on a two-dimensional data set shown in the top panel of Fig.\ \ref{fig:pca-illustrated} with black dots. The first step of finding the PCs is to locate the centroid of the dataset i.e.\ the mean of the data along every axis \citep{smith2002tutorial}. This is marked with a red x in the top panel of Fig.\ \ref{fig:pca-illustrated}.
The best-fit line and therefore the PCs always pass through the centroid of the system \citep{pearson1901lines}, so subtracting the location of the centroid from the data is a natural next step, as this ensures that in the next step only the slope has to be determined. This is done in the middle panel of the Fig.\ \ref{fig:pca-illustrated}. If the variables have different units, each variable should be scaled to have equal standard deviations \citep{james2013introduction} unless the linear algebra based approach with correlation matrices, as explained in e.g. \citet{jolliffe2002principal}, is used.
If this scaling is not performed, the choice of units can arbitrarily skew the principal components. This is easy to see when considering for example a case where one has distances to galaxies in megaparsecs and their masses in units of $10^{12}\ \mathrm{M_{\astrosun}}$, both of which might result in standard deviations of the order of unity, and PCA might thus yield principal components that are dominated by neither variable alone. Now, say another astronomer has a similar data set, but the distances are given in meters. In this case, most of the variation is in the distances, so the distances will also dominate the PCs. If all variables are measured in the same units, scaling can be omitted in some cases \citep{james2013introduction}.
Now the first PC can be located by finding the line that passes through the origin and has the maximum variance of the projected data points \citep{jolliffe2002principal}, shown with a black line in the middle panel of Fig.\ \ref{fig:pca-illustrated} for our data set. PCs are always orthogonal and intersect at the origin, so in the two-dimensional example case the second and final PC is fully determined. The data set can now be represented using the PCs as is shown in the bottom panel of the Fig.\ \ref{fig:pca-illustrated}.
For a data set with more than two dimensions, the second PC is chosen such that it and the first PC are orthogonal and that variance along the new PC is again maximized \citep{jolliffe2002principal}. This can be repeated for each dimension of the data set or, if dimensionality reduction is desired, only for a smaller number of dimensions.
This level of understanding is often enough to successfully apply PCA to a problem, because ready-made PCA implementations exist for many programming languages, such as \texttt{prcomp} in R \citep{james2013introduction} and \texttt{sklearn.decomposition.PCA} in the scikit-learn library for Python \citep{scikit-learn}. If a more mathematical approach is desired, \citet{smith2002tutorial} explains PCA very clearly together with the covariance matrices, eigenvectors and eigenvalues required to understand the process. \citet{jolliffe2002principal} also includes a very thorough description of PCA.
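As an illustration of such an implementation, the following sketch applies \texttt{sklearn.decomposition.PCA} to a made-up three-dimensional data set after standardizing the variables; it is only meant to demonstrate the procedure, not the analysis of this work:
\begin{verbatim}
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up data set: 100 points with two correlated variables and one
# independent variable.
rng = np.random.default_rng(2)
base = rng.normal(size=(100, 1))
data = np.hstack([base,
                  0.5 * base + rng.normal(scale=0.1, size=(100, 1)),
                  rng.normal(size=(100, 1))])

# Standardize so that differing units cannot skew the components.
scaled = StandardScaler().fit_transform(data)

pca = PCA()
scores = pca.fit_transform(scaled)    # data expressed in the PC basis
print(pca.explained_variance_ratio_)  # fraction of variance along each PC
\end{verbatim}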
\subsection{Excluding Less Interesting Principal Components} \label{sect:pca-excluding}
Even though a data set has as many principal components as there are measured variables, one is often not interested in all of them as the last principal components might explain only a tiny fraction of the total variation in the data \citep{james2013introduction}. Reducing the dimensionality of the problem also greatly eases visualizing and interpreting the data. Thus one might want to retain only the first few PCs when PCA is used to, for example, compress, visualize or just interpret a data set \citep{james2013introduction, johnson2007applied}. Unfortunately, many of the rules and methods used to determine the number of PCs to retain are largely without a formal basis or require assuming a certain distribution which is often not justifiable with the data \citep{jolliffe2002principal}. With careful consideration these methods can nevertheless aid a researcher in making informed decisions and reasoned conclusions, so some rules are introduced in this section.
If the PCA is performed to aid visualizing the data set, retaining only the two first PCs can be a justified choice as two is the maximum number of dimensions that are easy to visualize on two-dimensional media such as paper and the two first PCs determine the best-fit plane for the data \citep{jolliffe2002principal}. Of course the question whether the two PCs are sufficient to describe the data reasonably well still remains unanswered in this case.
\begin{figure}
\centering
\input{kuvat/tex/scree.pdf_tex}
\caption{Example of a scree plot of randomly generated normally distributed data. In this case the plot has a clear elbow at the fifth PC with the PCs 5-9 appearing roughly on a line. Thus the last four or five PCs could be omitted if dimensionality reduction is desired.}\label{fig:scree}
\end{figure}
One widely used technique was introduced by \citet{cattell1966scree} to be used in factor analysis, but is also very much applicable to PCA \citep{jolliffe2002principal}. This so called Cattell scree test involves plotting the variance of the data points along each PC versus the index of the PC. These plots tend to look similar to what is shown in Fig.\ \ref{fig:scree}, resembling a steep cliff with eroded material accumulated at the base, which is why these plots are known as scree plots and the nearly linear section of the plot is called the scree.
When the scree plot has two clearly different areas, a steep slope corresponding to the first PCs and a more gently sloping scree for the latter PCs, locating the elbow connecting the two areas gives the number of PCs that should be included \citep{jolliffe2002principal}, which in the case of Fig.\ \ref{fig:scree} would yield five PCs. Some sources such as \citet{cattell1966scree} suggest that in some cases the PC corresponding to the elbow should be discarded, which results in one PC fewer.
Unfortunately, as Cattell also acknowledges in his paper, not all cases are as easy to analyse as the one in Fig.\ \ref{fig:scree}, and they may prove difficult to discern for an inexperienced researcher. This problem might arise for example from noise in the linear part of the plot or from the scree consisting of two separate linear segments with different slopes. The first case has no easy solution, but in the latter case Cattell suggests using the smaller number of PCs.
Another straightforward method for choosing how many PCs to retain is to examine how much of the total variation in the data is explained by the first PCs and to include components only up to the point where a pre-defined percentage of the total variance is explained \citep{jolliffe2002principal}. Whereas the previous method posed a challenge in determining which PC best matches the exclusion criteria, with this approach the problem arises from choosing the threshold for including PCs. \citet{jolliffe2002principal} suggests that a value between 70~\% and 90~\% of the total variation is often a reasonable choice, but admits that the properties of the data set may justify values outside this range. Unfortunately, the suggested range is quite wide, so it may correspond to several possible numbers of PCs, and therefore it is up to the researcher to determine the best number of PCs, while the criterion again acts only as an aid in the process.
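With a fitted \texttt{sklearn.decomposition.PCA} object, for example, the fractions of explained variance are available through the \texttt{explained\_variance\_ratio\_} attribute, and applying such a threshold reduces to a cumulative sum. The sketch below, with a made-up variance profile and a hypothetical helper function name, illustrates the idea:
\begin{verbatim}
import numpy as np

def n_components_for_threshold(explained_variance_ratio, threshold=0.9):
    """Smallest number of leading PCs whose cumulative explained variance
    reaches the given threshold (e.g. 0.7--0.9)."""
    cumulative = np.cumsum(explained_variance_ratio)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Made-up explained variance fractions for a seven-dimensional data set.
ratios = np.array([0.50, 0.20, 0.10, 0.10, 0.05, 0.03, 0.02])
print(n_components_for_threshold(ratios, threshold=0.85))
# -> 4: the first four PCs explain at least 85 % of the variance.
\end{verbatim}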
\section{K-Fold Cross-Validation} \label{sect:cross-validation}
When a model has been constructed, determining the expected error of its predictions is most often required. This is important both when a model is used to make predictions and when the model is compared with other models or with the same model using different parameters. Even when training the model yields an appealing estimate of the error, e.g. the value of the mean-square error in OLS fitting, this training error underestimates the error of the same model applied to unseen data and thus does not represent its actual performance \citep{alpaydin2014introduction}. The problem is even more severe if two models are compared using their training errors, as a more complex model can always conform to the details of the data set better than a simpler one, even when using the complex model results in overfitting.
This is why a portion of the data, called the validation set, must not be used in constructing the model but rather in estimating the expected error when the model is used on new data \citep{alpaydin2014introduction}. Often just training with a portion of the data and then determining the error using the rest is not sufficient though, as the data set might be small or outliers can be present, causing the results to change significantly depending on which data points belong to the validation set \citep{alpaydin2014introduction}. One solution to this problem is to repeat the training and validation multiple times, using different data points for validation each time, and to average the error over these runs \citep{alpaydin2014introduction}. This approach is known as cross-validation. Many variations of it exist, but here only $k$-fold cross-validation is covered. It is also the variant used in this thesis for assessing the expected error of the model constructed for estimating the mass of the Local Group, presented in section~\ref{sect:statistical_estimate}.
In $k$-fold cross-validation, the training data is first divided into $k$ equal-sized parts, called folds. One by one, these folds are then used as validation sets for a model trained using the rest of the folds \citep{alpaydin2014introduction, murphy2012machine}. This means that each data point is used once in validation and $k-1$ times in training. Typical values for $k$ are e.g. 5, 10, or 30, the best value depending on the size of the data set at hand and the cost of a single training iteration \citep{alpaydin2014introduction, murphy2012machine}.
Once the model to be used has been chosen, the final model should be constructed using all of the data in the training set, without $k$-fold cross-validation \citep{alpaydin2014introduction}. It should also be noted that when $k$-fold cross-validation is used to choose which algorithm or model to use or to choose the parameters of the model, the error reported by $k$-fold cross-validation does not represent the actual accuracy of the model \citep{alpaydin2014introduction}. This happens because the model can accidentally be overfitted to produce a small error when the pre-defined data points are used to test it but perform badly when new data is introduced. In order to determine the expected error of a model trained using $k$-fold cross-validation, a portion of the data set, known as the test set, should be excluded from the training \citep{alpaydin2014introduction}. The test set must not be used at all for either training or validation before the final model is constructed, at which point this data can be used to report the error of the model. A typical size for the test set is around a third of the whole data set \citep{alpaydin2014introduction}.
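For example, scikit-learn \citep{scikit-learn} provides ready-made tools for $k$-fold cross-validation. The sketch below cross-validates an ordinary least squares model on made-up data with $k=10$ and is only meant to illustrate the procedure:
\begin{verbatim}
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Made-up data: one predictor with a linear relationship plus scatter.
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 10.0, size=(120, 1))
y = 1.5 * X[:, 0] + rng.normal(scale=1.0, size=120)

# 10 folds: each point is used once for validation and 9 times for training.
folds = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=folds,
                         scoring='neg_mean_squared_error')
print("cross-validated mean squared error:", -scores.mean())
\end{verbatim}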
\section[Comparing Two Samples Drawn from Unknown Distributions]{Comparing Two Samples Drawn from\\Unknown Distributions} \label{sect:statistical_tests}
A common question in multiple fields of science is whether two or more samples are drawn from the same distribution. Such questions can emerge for example when comparing the effectiveness of two procedures, determining whether an instrument has changed over time or whether observed data are compatible with simulations. There are multiple two-sample tests that can address this kind of question, e.g. the Kolmogorov-Smirnov, Cram\'er-von Mises and Anderson-Darling tests.
All of the tests introduced in this section have been designed for numerical data and compare the empirical distribution functions (EDF) of the datasets. In addition to comparing two samples, these tests can be used as one-sample tests to determine whether a sample is consistent with a particular distribution. However, some restrictions apply when using the one-sample variants. In this work, comparing samples is done using the Kolmogorov-Smirnov test. It is introduced in section~\ref{sect:ks}, mostly following \citet{bohm2010introduction} and \citet{feigelson2012modern}. Other similar tests are also described briefly in section~\ref{sect:edf-tests} in order to help motivate the choice.
\subsection{Kolmogorov-Smirnov Test} \label{sect:ks}
For astronomers, one of the best-known statistical tests is the Kolmogorov-Smirnov test, also known as the KS test. It is computationally inexpensive, easy to understand and does not require binning of the data. It is also a nonparametric test, i.e.\ the data do not have to be drawn from a particular distribution. In the astrophysical context this is often important because astrophysical models usually do not fix a specific statistical distribution for observables and it is common to carry out calculations with logarithms of observables, after which residuals that were originally normally distributed will no longer follow a normal distribution. When using the KS test, the values on the x-axis can be freely reparametrized: for example using $2x$ or $\log x$ on the x-axis will result in the same value of the test statistic as using just $x$ \citep{press2007numerical}.
The test can be used as either a one-sample or a two-sample test, both of which are very similar. For the two-sample variant, the test statistic is calculated from the empirical distribution functions $\hat{F}_1$ and $\hat{F}_2$ derived from the two samples as
\begin{equation}
D = \sup_{x} |\hat{F}_1(x) - \hat{F}_2(x)|
\end{equation}
i.e.\ the maximum vertical distance between the EDFs. This test statistic is then used to determine the p-value and thus to decide whether the null hypothesis can be rejected. For the one-sample variant the procedure is similar, but the EDF $\hat{F}_2$ is replaced with the CDF corresponding to the null hypothesis.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{kuvat/kstest.pdf}
\caption{KS test parameter values (brown vertical lines) shown graphically for three samples from Fig.\ \ref{fig:edf}.}
\label{fig:ks}
\end{figure}
As an example, let us consider two pairs of samples from Fig.\ \ref{fig:edf}: orange and blue (two samples drawn from different normal distributions) and orange and violet (two samples drawn from the same normal distribution). We can formulate the null and alternative hypotheses for both pairs as $H_0$=``the two samples are drawn from the same distribution'' and $H_1$=``the two samples are not drawn from the same distribution'' and choose a significance level of for example $\alpha=0.05$ or $\alpha=0.01$.
The test statistic is then calculated. For these samples, we get $D=0.51$ for the orange-blue pair and $D=0.20$ for the orange-violet pair. The test statistics $D$ are illustrated as brown vertical lines in Fig.\ \ref{fig:ks}. According to the Python function \texttt{stats.ks\_2samp} from the \texttt{SciPy} library \citep{scipy}, these values of $D$ correspond to p-values of $9.9\times 10^{-5}$ and $0.44$ respectively, which means that the null hypothesis ``orange and blue samples are drawn from the same distribution'' is rejected at both the 0.05 and 0.01 significance levels, but the null hypothesis ``orange and violet samples are drawn from the same distribution'' cannot be rejected.
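The corresponding calculation takes only a few lines; the sketch below applies \texttt{stats.ks\_2samp} to two made-up samples (not the samples shown in the figures) and is meant only to illustrate the usage:
\begin{verbatim}
import numpy as np
from scipy import stats

# Two made-up samples drawn from normal distributions with different means.
rng = np.random.default_rng(4)
sample1 = rng.normal(loc=0.0, scale=1.0, size=35)
sample2 = rng.normal(loc=1.0, scale=1.0, size=35)

D, p_value = stats.ks_2samp(sample1, sample2)
print(f"D = {D:.2f}, p = {p_value:.3g}")
# Reject the null hypothesis at significance level alpha if p_value < alpha.
\end{verbatim}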
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{kuvat/kstest-error.pdf}
\caption{A graphical representation of the KS test run on another pair of samples drawn from the orange and blue distributions in Fig.\ \ref{fig:cdf}.}
\label{fig:ks-error}
\end{figure}
In this case the KS test produced a result that matches the actual distributions from which the samples were drawn. Using different random realizations might have resulted in a different conclusion: for example the pair of EDFs from different normal distributions shown in Fig.\ \ref{fig:ks-error} results in $D=0.17$, which corresponds to a p-value of $0.64$, i.e.\ the null hypothesis could not have been rejected using the $\alpha$ specified earlier. In a similar manner there can be cases where two samples from one distribution are erroneously determined not to come from the same distribution if the samples differ from each other enough due to random effects.
The example in Fig.~\ref{fig:ks-error} also illustrates one major shortcoming of the KS test: it is not very sensitive to small-scale differences near the tails of the distribution. The orange sample extends much further left, but because every EDF starts at zero and ends at one, the vertical distances near the tails are small and the test is most sensitive to differences near the median of the distribution. On the other hand, the test performs quite well when the samples differ globally or have different means. \citep{feigelson2012modern}
The KS test is also subject to some limitations, and it is important to be aware of them in order to avoid misusing the test. First of all, the KS test is not distribution free if the model parameters, e.g. the mean and standard deviation of a normal distribution, are estimated from the dataset that is tested. Thus the tabulated critical values can be used only if the model parameters are determined from some other source such as a simulation, a theoretical model or another dataset.
Another severe limitation of the KS test is that it is only applicable to one-dimensional data. If the dataset has two or more dimensions, there is no unique way of ordering the points to construct an EDF, and therefore if the KS test is used, it is no longer distribution free. Some variants that can handle two or more dimensions have been developed, such as those by \citet{peacock1983twodimensional} and \citet{fasano1987multidimensional}, but the authors do not provide a formal proof of the validity of these tests.
\subsection{Other Tests Based on EDFs} \label{sect:edf-tests}
The limited sensitivity of the KS test can motivate the use of other, more complex tests on some data sets. Such tests are for example the Cram\'er-von Mises (CvM) test and the Anderson-Darling (AD) test, both of which have their strengths. Similar to the KS test, both can be used in one-sample or two-sample variants.
The first of these tests, the CvM test, integrates over the squared difference between the EDF of the sample and the CDF from the model, or between two EDFs in the case of the two-sample test. The test statistic $W^2$ for the one-sample case can be expressed formally as
\begin{equation}
W^2 = \int_{-\infty}^{\infty}[\hat F_1(x) - F_0(x)]^2\ dF_0(x)
\end{equation}
For the two-sample version, the theoretical CDF $F_0$ has to be replaced with another empirical distribution function $\hat F_2$.
Due to the integration, the CvM test is able to differentiate distributions based on both local and global differences, which causes it to often perform better than the KS test. Similar to the KS test, the CvM test also suffers from the EDFs, or an EDF and a CDF, being equal at the ends of the data range, which again makes the test less sensitive to differences near the tails of the distribution.
In order to achieve constant sensitivity over the entire range of values, the statistic has to be weighted according to the proximity of the ends of the distribution. The AD test does this with its test statistic defined as
\begin{equation}
A^2 = N \int_{-\infty}^{\infty} \frac{[\hat F_1(x) - F_0(x)]^2}{F_0(x)[1-F_0(x)]}\ dF_0(x)
\end{equation}
where $N$ is the number of data points in the sample. This weighting makes the test more powerful than the KS and CvM tests in many cases. \citep{bohm2010introduction, feigelson2012modern}
Other, more specialized tests also exist, such as the Kuiper test which is well suited for cyclic measurements. The test should always be chosen to match the dataset such that it best differentiates between the null and research hypotheses.
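Two-sample versions of both tests are available in recent versions of the \texttt{SciPy} library (\texttt{stats.cramervonmises\_2samp} since version 1.7 and \texttt{stats.anderson\_ksamp} for the $k$-sample AD test); the sketch below with made-up samples only illustrates how they might be called:
\begin{verbatim}
import numpy as np
from scipy import stats

# Two made-up samples from normal distributions with slightly different means.
rng = np.random.default_rng(5)
sample1 = rng.normal(loc=0.0, scale=1.0, size=35)
sample2 = rng.normal(loc=0.5, scale=1.0, size=35)

# Two-sample Cramer-von Mises test (requires SciPy >= 1.7).
cvm = stats.cramervonmises_2samp(sample1, sample2)
print("CvM statistic:", cvm.statistic, "p-value:", cvm.pvalue)

# k-sample Anderson-Darling test applied to the two samples.
ad = stats.anderson_ksamp([sample1, sample2])
print("AD statistic:", ad.statistic,
      "approximate significance level:", ad.significance_level)
\end{verbatim}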
\section{Cluster Analysis with DBSCAN} \label{sect:cluster-analysis}
The aim of cluster analysis is to find groups of similar data points within a data set, these groups being called clusters \citep{han2000data}. Ideally data points within each cluster are as similar to each other as possible and dissimilar to data points outside the cluster \citep{han2000data}. It has many applications in multiple fields ranging from machine learning to biology, and it can also be applied in astronomy and astrophysics to e.g. classify objects and possibly even discover new ways to classify them \citep{ball2010data, han2000data}. This is how, for example, \citet{mukherjee1998three} were able to divide gamma-ray bursts into three distinct categories separated by a combination of their durations, brightnesses and spectra.
The definition of a cluster is intentionally loose, allowing the exact definition to vary between problems to which it is applied \citep{tan2006introduction}. Differences can arise for example from whether one data point can be part of multiple clusters and whether all data points should be part of some cluster or if the dataset contains noise not belonging to any cluster. Many different algorithms have been developed for performing cluster analysis, each best suited for some specific type of data \citep{han2000data}. The algorithm used should always be chosen based on the desired properties of the clusters and the properties of the data set.
One clustering algorithm that is well suited for finding clusters based on a density threshold in a spatial data set is DBSCAN, originally presented by \citet{ester1996density}. It produces a density-based clustering, meaning that the identified clusters consist of data points residing in high-density regions, each cluster separated from others by a low-density region \citep{han2000data}. This definition allows both finding clusters of arbitrary shape and working with noisy data with data points that do not belong to any cluster \citep{ester1996density}.
The algorithm uses two parameters defining the properties of the clusters: $\varepsilon$ (sometimes denoted Eps) and MinPts. The $\varepsilon$ parameter resembles the linking length used by the \textsc{fof} algorithm introduced in section \ref{sect:halofinding} as it sets the size of the neighbourhood used when determining whether a data point is part of a cluster. The distance function can be chosen freely to best fit the problem, possible metrics being e.g. Euclidean or Manhattan (also known as taxicab in some sources) distance. The points within the distance $\varepsilon$ from a given data point are said to belong to its $\varepsilon$\=/neighbourhood. This definition implies that a data point also belongs to its own $\varepsilon$\=/neighbourhood.
If at least MinPts data points, including the point itself, belong to the $\varepsilon$\=/neighbourhood of a data point, it belongs to a cluster and is defined to be a core point of that cluster. If a core point has another core point in its $\varepsilon$\=/neighbourhood, the two core points belong to the same cluster. For example in Fig.\ \ref{fig:DBSCAN-twocores} the red points are all core points of one cluster, but as no blue points are within the $\varepsilon$\=/neighbourhood of any red point, the blue points form their own cluster. When MinPts~=~2, the resulting clustering is identical to that produced by the \textsc{fof} algorithm.
\begin{figure}
\centering
\input{kuvat/tex/DBSCAN-twocores.pdf_tex}
\caption{An example data set shown as dots, surrounded by circles showing the extent of $\varepsilon$\=/neighbourhoods of each data point. To reduce clutter, all but one $\varepsilon$\=/neighbourhood border are shown in grey. When MinPts = 4, the data points are separated to two clusters, shown in red and blue here. As every point has at least four data points within its $\varepsilon$\=/neighbourhood, all data points are core points.}\label{fig:DBSCAN-twocores}
\end{figure}
Some points might belong to an $\varepsilon$\=/neighbourhood of a core point but have $\varepsilon$\=/neighbourhoods with fewer than MinPts data points in them. These points are border points, and they are also members of the cluster containing the core point that lies in the $\varepsilon$\=/neighbourhood of the border point. For example in Fig.\ \ref{fig:DBSCAN-borderpoint} the light blue border point belongs to the cluster defined by the dark blue core points. Cluster membership does not propagate further from border points, meaning that e.g. the grey data point next to the light blue point in Fig.\ \ref{fig:DBSCAN-borderpoint} is not part of any cluster despite belonging to the $\varepsilon$\=/neighbourhood of one of the cluster members, as this member is a border point.
\begin{figure}
\centering
\input{kuvat/tex/DBSCAN-borderpoint.pdf_tex}
\caption{A single cluster with MinPts = 4 and $\varepsilon$ shown with circles. The dark blue points all have at least four points in their $\varepsilon$\=/neighbourhood so they are core points, but the light blue point has an $\varepsilon$\=/neighbourhood of only three points. Thus, despite it being part of the $\varepsilon$\=/neighbourhood of a core point, it is not a core point. Instead, it is a border point of the cluster. The grey point in the $\varepsilon$\=/neighbourhood of the light blue point does not have four points in its $\varepsilon$\=/neighbourhood nor does it belong to an $\varepsilon$\=/neighbourhood of any core point, so it is not part of the cluster. Similarly none of the other grey points have enough particles in their neighbourhoods so they do not belong to any cluster.}\label{fig:DBSCAN-borderpoint}
\end{figure}
\begin{figure}
\centering
\input{kuvat/tex/DBSCAN-ambiquity.pdf_tex}
\caption{In this situation the grey data point could belong to either red or blue cluster. The DBSCAN algorithm adds it to the cluster that is discovered first, so depending on the order of the data points it might be part of either red or blue cluster.}\label{fig:DBSCAN-ambiquity}
\end{figure}
In a situation where a border point is in the $\varepsilon$\=/neighbourhoods of core points belonging to different clusters, as happens for example to the grey data point in Fig.\ \ref{fig:DBSCAN-ambiquity}, it is not clear to which cluster the border point should be assigned. The density-based definition of a cluster presented by \citet{ester1996density} would classify such a border point into all clusters whose core points reside in its $\varepsilon$\=/neighbourhood. Often a clustering where each data point belongs to at most one cluster is desired instead, so the DBSCAN algorithm assigns this kind of border point only to the first cluster in which it is discovered.
This introduces some possibly unexpected behaviours. First of all, while the algorithm is deterministic when run multiple times on the same data set, the clustering depends on the order of the data points \citep{schubert2017dbscan}. If the order of the data points is permuted, the cluster to which these border points between two clusters are assigned might change. Fortunately this is a fairly rare occurrence and the effects of assigning a single border point to one cluster or another are often insignificant \citep{schubert2017dbscan}.
\begin{figure}
\centering
\input{kuvat/tex/DBSCAN-singlecore.pdf_tex}
\caption{Another clustering with MinPts = 4. One border point from red, blue and yellow clusters each resides within the $\varepsilon$\=/neighbourhood of the grey point, but as these are border points (shown with lighter colours), the grey point belongs to none of the other clusters. The grey point has four points in its $\varepsilon$\=/neighbourhood, making it a core point. If the three other clusters are discovered before the cluster membership of the grey data point is determined, it becomes the only data point of its own cluster.}\label{fig:DBSCAN-singlecore}
\end{figure}
Another edge case is even more extreme. Consider the configuration of data points in Fig.\ \ref{fig:DBSCAN-singlecore} where the $\varepsilon$\=/neighbourhood of the grey point contains one border point from each of the three other clusters. As these are only border points, the grey point is not a member of any of the three coloured clusters, but its $\varepsilon$\=/neighbourhood has four points, so it is a core point of its own. Now assuming that the red, blue and yellow clusters are all found before the grey point is processed, the grey data point becomes the core point of a cluster containing only a single data point. These clusters with fewer than MinPts members are of course even rarer than the occasions when the cluster membership of a border point is ambiguous, as they require border points of multiple clusters stretching close to the core area of a small cluster without merging with it. Even with these fairly rare undesirable features, the DBSCAN algorithm is in many cases more robust than the \textsc{fof} algorithm, as choosing an appropriately large value of MinPts, for example, reduces the probability of a thin bridge of data points connecting otherwise separate clusters.
The properties of the identified clusters naturally depend on the parameters $\varepsilon$ and MinPts. Different heuristics exist for determining suitable values for them. For MinPts, \citet{schubert2017dbscan} suggest a value of twice the dimensionality of the dataset, e.g. for two-dimensional data MinPts = 4 should be chosen. For some datasets a higher value can be better, for example if the dataset contains duplicate data points or is very large or high-dimensional \citep{schubert2017dbscan}.
\begin{figure}
\centering
\includegraphics{kuvat/k-distances-singlesim.pdf}
\caption{Six $k^{\text{th}}$ nearest neighbour distance plots depicting the angular distances between subhaloes in the simulations used in this thesis with the values of $k$ ranging from 2 to 7. Variations in the location of the elbow are reasonably small when $k$ is varied.}\label{fig:k-distances-singlesim}
\end{figure}
According to \citet{ester1996density}, $\varepsilon$ can be set after choosing the value of MinPts by inspecting the distances of data points to their $k$\textsuperscript{th} nearest neighbour. Here $k$ can be set to the same value as MinPts, though the results do not seem to vary greatly if $k$ is increased \citep{ester1996density}. When the $k$\=/distances are sorted in descending order and plotted, the resulting plot often shows an elbow similar to the one in the scree plot in Fig.\ \ref{fig:scree}. Examples of such $k$\=/distance plots can be seen in Fig.\ \ref{fig:k-distances-singlesim} that shows $k$\=/distance plots for six different values of $k$ ranging from 2 to 7. \citet{ester1996density} suggest that the distance corresponding to the elbow should be used as a value of $\varepsilon$, with the points to the right of the elbow corresponding to noise. The location of the elbow is usually not very sensitive to small changes in $k$, as can be seen in Fig.\ \ref{fig:k-distances-singlesim} where the location of the elbow changes less than 0.1 radians over the six different values of $k$. If domain knowledge is available, $\varepsilon$ and MinPts can also be chosen differently to best suit the data set \citep{schubert2017dbscan}.
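DBSCAN is implemented, for example, in scikit-learn \citep{scikit-learn} as \texttt{sklearn.cluster.DBSCAN}, and the sorted $k$\=/distances needed for choosing $\varepsilon$ can be computed with \texttt{sklearn.neighbors.NearestNeighbors}. The sketch below uses made-up two-dimensional data and arbitrarily chosen parameter values; it is only meant to illustrate the procedure:
\begin{verbatim}
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Made-up two-dimensional data: two dense blobs plus uniform background noise.
rng = np.random.default_rng(6)
blobs = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(50, 2)),
                   rng.normal([3.0, 3.0], 0.3, size=(50, 2))])
noise = rng.uniform(-2.0, 5.0, size=(30, 2))
data = np.vstack([blobs, noise])

# Sorted k-distance curve (k = MinPts = 4 for two-dimensional data); the
# first returned neighbour of each point is the point itself, matching the
# convention that a point belongs to its own neighbourhood.  The elbow of
# this curve suggests a value for eps.
k = 4
distances, _ = NearestNeighbors(n_neighbors=k).fit(data).kneighbors(data)
k_distances = np.sort(distances[:, -1])[::-1]

# Run DBSCAN with eps read off the k-distance curve (0.5 is just a guess here).
labels = DBSCAN(eps=0.5, min_samples=4).fit(data).labels_
# Label -1 marks noise; the other labels enumerate the clusters.
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
\end{verbatim}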
\chapter{Findings from Halo Catalogue Analysis} \label{chapt:results}
The simulations introduced in section \ref{sect:simruns} offer a wide variety of possibilities for studying galaxy groups resembling the Local Group using statistical tools. In this chapter, the properties of the Milky Way and M31 analogues are studied in order to find out how their properties are distributed over the ranges allowed by the identification criteria. After this, in sections~\ref{sect:hf} and \ref{sect:hf-anisotropy}, the Hubble flows surrounding the Local Group analogues are studied. The gathered information is then used in section \ref{sect:statistical_estimate} to construct a model for estimating the mass of the Local Group.
\section{Properties of the Local Group Analogues} \label{sect:properties_of_LGs}
As the Local Group analogue identification criteria described in section \ref{sect:zooms} allow for a range of different subhalo pairs, the properties of the identified Local Group analogues vary. This can be seen in Fig. \ref{fig:LGproperties} where both the distance between the main subhaloes and the velocity components span nearly the entire allowed range of values. The plot also contains a histogram of the overdensity within 2 Mpc of the Milky Way analogue.
\begin{figure}
\centering
\includegraphics{kuvat/LGproperties.pdf}
\caption{Histograms of the mutual radial and tangential velocities (top row) and distances (bottom left panel) of the main subhaloes in each Local Group analogue found in the simulations. The bottom right panel shows the overdensity within 2 Mpc of the Milky Way analogue in each simulation. The densities are calculated based on the combined mass of subhaloes, so the values of overdensity are somewhat smaller than they would be if all mass within 2~Mpc was included.}\label{fig:LGproperties}
\end{figure}
The peak of the radial velocity histogram is located near 100 km/s, meaning that most of the identified Local Group analogues have radial velocities near the measured value for the Andromeda Galaxy \citep{vandermarel2012m31}. Pairs with low radial velocity are rare, as according to the timing argument in order to have a radial velocity close to zero the mass of the Local Group would typically have to be small, which is excluded by the mass criterion used. For tangential velocities, the distribution is much more uniform, though values near the ends of the allowed interval are still fairly rare. The \citet{vandermarel2012m31} tangential velocity measurement result of 17~km/s would also fall in one of the densely populated bins, though the tangential velocity measurements have much larger uncertainties than radial velocities: the $1\sigma$ confidence region of \citet{vandermarel2012m31} extends to 34.3~km/s, covering more than half of the allowed range in the Local Group analogue identification criterion.
Modern distance estimates for M31 range from 770 to 800 kpc with errors of a few tens of kpc, meaning that the distance is slightly shorter than in most of the simulated Local Group analogues but still near the peak of the distribution \citep{mcconnachie2005distances}. Analogues with shorter distances between the primaries are somewhat rare whereas distances at the longer end of the allowed range are more common. This is likely an effect of the mass and radial velocity criteria together with the fact that a spherical shell of constant thickness has a larger volume at larger distances and thus, if a constant probability density function is assumed, also contains more subhaloes.
As can be seen in the overdensity histogram, most of the Local Group analogues are located in an overdense region. Densities in the histogram only include the mass within subhaloes, so the total densities within the examined 2~Mpc regions are somewhat larger. There is also a small number of Local Group analogues in very underdense regions. This is possible only if the primary subhaloes are at the lower end of the allowed mass range, they have only low-mass satellites and there are no other comparable structures within the 2~Mpc distance. Very high density regions are also rare as they require either a large number of smaller subhaloes or a subhalo with a mass close to that of the primaries of the Local Group to fall within the 2~Mpc region.
\begin{figure}
\centering
\includegraphics{kuvat/LGmasses.pdf}
\caption{The masses of the more massive primary plotted against the combined mass of the two primary galaxies, each black dot representing one Local Group analogue. Grey lines show the allowed range: the more massive primary has to contain at least half of the combined mass and the smaller one has to have a mass of at least $4 \times 10^{11}~\mathrm{M_{\astrosun}}$.}\label{fig:LGmasses}
\end{figure}
The combined mass of the primaries in the Local Group analogues and the distribution of mass between the subhaloes also varies from simulation to simulation. This is illustrated in Fig. \ref{fig:LGmasses} that shows the mass of the more massive of the primaries in each simulation against the combined mass of the pair. The allowed range of the masses is outlined with the grey lines, the lower one showing the threshold where the more massive subhalo holds exactly half of the total mass and the upper one illustrating the maximum possible mass in the more massive subhalo when the Local Group analogue criteria require the smaller one to have a mass of at least $4 \times 10^{11}~\mathrm{M_{\astrosun}}$.
There are Local Group analogues both with almost equal masses and with the smaller subhalo barely exceeding the minimum mass, though as can be seen in the figure, the former case is more common. Especially for total masses larger than $2 \times 10^{12}~\mathrm{M_{\astrosun}}$ the area near the upper limit for the mass of the more massive primary is nearly vacant of data points. This is likely because environments dense enough to host the more massive primary usually contain other massive subhaloes as well. Thus, in dense environments it is common that more than one subhalo with a mass larger than that of the smaller primary is present within the 2~Mpc distance used to check that the two primaries are the dominant subhaloes in their surroundings.
\begin{figure}
\centering
\includegraphics{kuvat/masshistogram.pdf}
\caption{Histograms showing the masses of the identified Local Group analogue primaries. The blue and yellow histograms contain only the less or more massive subhalo of each primary respectively whereas the green histogram shows the masses of all subhaloes in a primary pair of a Local Group analogue.}\label{fig:masshistogram}
\end{figure}
The Local Group identification criteria allow the combined masses of the primaries between $8 \times 10^{11}~\mathrm{M_{\astrosun}}$ and $1 \times 10^{13}~\mathrm{M_{\astrosun}}$, but whereas the lower end of the range is populated, there are no Local Group analogues with combined mass of $7 \times 10^{12}~\mathrm{M_{\astrosun}}$ or more. This is likely because high mass subhaloes are relatively rare in the sample: only about one fifth of all subhaloes in a primary pair of a Local Group analogue have a mass of more than $3 \times 10^{12}~\mathrm{M}_{\astrosun}$. This is illustrated in Fig.~\ref{fig:masshistogram} that shows histograms of masses both separately for the smaller and larger primary in each pair and all primaries together. It can also be seen that the masses do not span the full allowed range, as the most massive subhalo in a Local Group analogue has a mass of $4.3 \times 10^{12}~\mathrm{M}_{\astrosun}$, % 4.31347031218e+12 M_sun
clearly below the maximum allowed mass of $5 \times 10^{12}~\mathrm{M}_{\astrosun}$. It is notable how the distribution of the masses of the larger subhaloes in the primary pairs extends all the way to the lowest masses whereas there are no primary pairs with the mass of the smaller subhalo exceeding $3 \times 10^{12}~\mathrm{M}_{\astrosun}$. % sum(allhalomasses > 3e12)*1.0/len(allhalomasses) = 0.20351758794
Overall, the 199 Local Group analogues chosen from the zoom-in simulations span a wide range of different two-halo systems. Only the values near the very edges of the allowed ranges in each variable are rare. This allows an analysis of correlations of the mass of the system with its other properties.
\section{Hubble Flow Measurements} \label{sect:hf}
The Local Group analogues analysed in this project do not only differ in the properties of the primaries but also in the environments in which they reside. The primaries can for example have few or many satellites around them and the velocity dispersion within the system can vary. Similarly, the volume further out from the system can be densely populated with subhaloes possibly arranged in filaments or walls, or the Local Group analogue can be located in a very low density environment. These environmental factors are explored in this section and a way to quantify a selection of them for further analysis is presented.
\subsection{Different Hubble Flows}
\begin{figure}
\centering
\includegraphics{kuvat/hubblediagrams.pdf}
\caption{Hubble flows around Milky Way analogues in four simulations differing in for example the total number of subhaloes, scatter of the flow, scatter within the Local Group and the number of bound structures present within the flow. Each black dot represents one subhalo in a simulation and subhaloes are plotted up to the distance at which the first type 2 particle resides.}\label{fig:hubblediagrams}
\end{figure}
A selection of these different environments is illustrated in Fig.~\ref{fig:hubblediagrams} showing radial velocities as seen from the lower mass primary plotted against the distance of the subhalo. This corresponds to what a Hubble diagram would look like as seen from the Milky Way analogue of that simulation. Each of the four panels contains all subhaloes in that simulation up to the closest type 2 particle. The simulations shown in this plot are chosen such that they show a variety of different Hubble flows, but naturally four simulations can never show the full range of all 199 analysed simulations nor all structures that can occur.
The simulations on the top row in Fig.~\ref{fig:hubblediagrams} both show fairly small dispersion of radial velocities at each distance, corresponding to a cold Hubble flow. In the upper left plot, there is a region with a much larger velocity dispersion between approximately 4 and 5~Mpc, likely corresponding to a massive subhalo surrounded by satellite galaxies, some of which are receding and some approaching the observer as they orbit the primary or primaries of the system. The upper right plot on the other hand shows no such variation in the velocity dispersion, apart from the dispersion within the nearest 1~Mpc or so being higher than in the rest of the plot due to the velocities of the satellite galaxies within the Local Group analogue, whereas the upper left plot has fairly low velocity dispersion within the Local Group.
The simulations presented in the bottom row of the figure show considerably higher velocity dispersions. Especially the bottom right plot, with radial velocities in it easily spanning a range of 400~km/s at all distances except within 1~Mpc of the Milky Way analogue, differs from the top row plots. The bottom right plot also has a much greater number of subhaloes than the other plots. The bottom left plot, on the other hand, seems to show a Hubble flow that is divided into two parts with a vertical offset of around 100~km/s. This kind of situation might occur, for example, if the Local Group analogue is moving relative to the surrounding structures.
These kinds of differences in the surroundings of the Local Group analogues might be correlated with the mass of the primaries and could thus provide a way to gauge the mass of the real Local Group by observing its surroundings. These correlations are explored in section \ref{sect:statistical_estimate}, but in order to conduct statistical analysis, the properties of the Hubble flow have to be quantified first.
\subsection{Measuring the Hubble Flow} \label{sect:hubblemeasurement}
The first step in quantifying the properties of the Hubble flows around the Local Group analogues is to determine the basic shapes of each flow. In this work, the radial velocity field is modelled as consisting of two parts: at small distances when going outward from the Milky Way analogue, the radial velocity stays nearly constant with a possibly large scatter, and at larger radii a positive linear relationship between distance and radial velocity is expected. Naturally this is not an exact description of the situation, as the transition from a gravitationally bound Local Group analogue to a linear Hubble flow is not instantaneous and the gravitational pull of the Local Group also affects objects outside it.
Fitting a linear relation to the linear Hubble flow observed around the Milky Way analogues is, in principle, simple, as for example the ordinary least squares regression (see section \ref{sect:regression}) can be used. Determining the distance from which the linear model can be applied, on the other hand, is less straightforward and multiple possible schemes exist. Possibly the simplest option would be to set an arbitrary distance threshold and exclude all subhaloes within this distance from the fitting procedure. For some simulations, this will likely exclude more data points than is necessary and for others it is possible that not enough points will be excluded. Even when the exclusion threshold is chosen carefully, in a large sample a constant distance will always be suboptimal for some simulations, so a solution that is able to adapt to the properties of each simulation is desired.
\begin{figure}
\centering
\includegraphics{kuvat/hubblefit.pdf}
\caption{Hubble flow fit (black line) calculated by excluding a number of innermost subhaloes resulting in the smallest mean squared residual. For this simulation, this was achieved when excluding the first 1.4~Mpc of subhaloes, producing a measured Hubble flow slope of 87.0~km/s/Mpc. The excluded subhaloes are shown as grey points and subhaloes used to fit the Hubble flow as black dots.}\label{fig:hubblefit} %86.9929348817 km/s/Mpc, 1.43211533035 Mpc
\end{figure}
When not otherwise stated, in this work the number of nearest subhaloes to be ignored is chosen to minimize the mean of the squared residuals from the fit. This is done by finding the number of innermost subhaloes, up to 2~Mpc away from the Milky Way analogue, to ignore in order to get the smallest mean squared residual for the ordinary least squares fit. The final result of this kind of fitting procedure is illustrated in Fig.~\ref{fig:hubblefit}, where all subhaloes are shown as dots: the excluded ones in grey and the ones used in fitting in black. The best-fit line is also shown. In this simulation the excluded range corresponds well to the distinction a human would likely make when asked to determine the distance at which the motions inside the Local Group are no longer the dominant effect on the radial velocity.
This approach is naturally not without weaknesses. For example, if the innermost subhaloes within the Local Group have a very small radial velocity dispersion, no subhaloes might be excluded, as a large number of data points with small dispersion can reduce the mean squared residual even if they skew the result. Similarly, groups of data points with large scatter can get excluded even if they otherwise follow the Hubble flow neatly. Luckily these problems are usually not severe as even when the maximum of 2 Mpc of the nearest region is excluded from the fitting, most simulations still have more than 3~Mpc worth of data to use for fitting and thus small changes in including or excluding subhaloes in the innermost regions do not produce large effects. This is also confirmed in the following subsection where the dependence of the Hubble flow parameters on the distance is explored.
When the results of the fitting are reported in this work, the Hubble flow fit is not expressed in the most familiar slope-intercept form of $y = kx + b$ giving the slope and the $y$ intercept. Instead the result is expressed in terms of the Hubble flow slope $H_0$ and the distance at which the fitted radial velocity is zero, corresponding to giving the $x$ intercept instead of the $y$ intercept. This convention is chosen as the zero point indicates the distance at which subhaloes are no longer bound to the Local Group, which in turn can be used to probe the mass of the Local Group. Sometimes the velocity dispersion of the Hubble flow is also of interest. In order to calculate it, the Hubble flow fit is first subtracted from the radial velocities, after which the velocity dispersion is obtained as the standard deviation of the differences.
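A schematic Python version of this exclusion procedure is sketched below. It is only an illustration of the description above, not the analysis code actually used in this work; the function name \texttt{fit\_hubble\_flow} and the input arrays are hypothetical, distances are assumed to be in Mpc and radial velocities in km/s:
\begin{verbatim}
import numpy as np

def fit_hubble_flow(distances, radial_velocities, max_excluded_distance=2.0):
    """Fit v_r = H0 * (d - d_zero) by excluding 0, 1, 2, ... innermost
    subhaloes (up to max_excluded_distance) and keeping the exclusion
    that gives the smallest mean squared residual of the OLS fit.

    Returns the slope H0 (km/s/Mpc) and the zero-velocity distance (Mpc).
    """
    order = np.argsort(distances)
    d = np.asarray(distances, dtype=float)[order]
    v = np.asarray(radial_velocities, dtype=float)[order]
    n_max = np.searchsorted(d, max_excluded_distance)  # subhaloes within 2 Mpc

    best = None
    for n_excluded in range(n_max + 1):
        dd, vv = d[n_excluded:], v[n_excluded:]
        if len(dd) < 2:
            break
        # Ordinary least squares fit v = b + k * d on the remaining subhaloes.
        A = np.column_stack([np.ones_like(dd), dd])
        (b, k), *_ = np.linalg.lstsq(A, vv, rcond=None)
        mse = np.mean((vv - (b + k * dd)) ** 2)
        if best is None or mse < best[0]:
            best = (mse, k, b)

    _, H0, intercept = best
    return H0, -intercept / H0  # slope and zero-velocity distance
\end{verbatim}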
\subsection{Dependence of the Hubble Flow Parameters on Distance}
The distance from the Milky Way analogue to the nearest type 2 particle varies from simulation to simulation, as was seen in Fig.~\ref{fig:uncontaminatedDistances} in section~\ref{sect:zooms}. The range of these distances is also quite substantial: in some simulations the nearest type 2 particle is closer than 2 Mpc to the Milky Way analogue, while other simulations contain only type 1 particles within a spherical volume with a radius of more than 10~Mpc. Thus, if the Hubble flow measurements were very sensitive to the distance up to which data is available, simulations might appear to have different Hubble flows just because they are measured up to different distances.
\begin{figure}
\centering
\includegraphics{kuvat/overdensity+H0.pdf}
\caption{Median $H_0$ (blue curve) and overdensity (orange curve) in the simulation sample within a given radius. The plot clearly shows how both quantities evolve within the Local Group, including the effect of M31 and its satellites, and how they stabilize at larger radii. The overdensity is measured using only the mass contained in subhaloes, and thus at large distances it tends to a value smaller than unity. At each radius the plot only contains simulations that are free of type~2 particles up to that distance, so the last measured values are medians over very few simulations.}\label{fig:overdensity+H0}
\end{figure}
The possibility of this affecting the results of the Hubble flow measurements is explored in Fig.~\ref{fig:overdensity+H0}, whose blue curve shows the median value of the Hubble flow slope $H_0$ when all subhaloes within a given radius are used to construct an ordinary least squares fit. As the closest subhaloes are part of the data used in the fit, fits based only on the nearest subhaloes produce negative values of $H_0$. The smallest values are seen at the radius where the M31 analogues reside in the simulations, produced by M31 and its satellite system moving towards the Milky Way. Moving the outer limit of the fit further out, the measured values of $H_0$ grow as more and more of the actual Hubble flow is included, until at approximately 3~Mpc the median value reaches a maximum and no longer changes.
The same plot also includes a curve showing the median overdensity within a given radius, the mass being measured as the combined mass of all subhaloes with a centre of potential within that distance from the Milky Way. Near the Milky Way the overdensity is large, and it decreases as the examined volume grows until M31 is reached, at which point the median overdensity rises again, producing a bump in the curve. After this the decrease in density continues until, beyond 3~Mpc, there is very little mass belonging to subhaloes compared to the volume. As only the mass enclosed in subhaloes is counted, at large radii the overdensity does not tend to unity but to a smaller value.
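Both curves in Fig.~\ref{fig:overdensity+H0} can be reproduced with a simple cumulative calculation. The following Python sketch uses hypothetical NumPy arrays \texttt{distances}, \texttt{radial\_velocities} and \texttt{masses} for the subhalo data, and it assumes that the overdensity is normalized by the mean matter density of the simulation; it is meant only as an illustration, not as the exact code used in this work.
\begin{verbatim}
import numpy as np

def cumulative_profiles(distances, radial_velocities, masses,
                        radii, mean_matter_density):
    """For each radius, fit an OLS slope to all subhaloes inside
    that radius and compute the overdensity from their combined
    mass (units assumed consistent, e.g. Mpc, km/s and Msun)."""
    slopes, overdensities = [], []
    for r in radii:
        inside = distances < r
        if inside.sum() >= 2:
            slope, _ = np.polyfit(distances[inside],
                                  radial_velocities[inside], 1)
        else:
            slope = np.nan
        mass = masses[inside].sum()
        volume = 4.0 / 3.0 * np.pi * r ** 3
        slopes.append(slope)
        overdensities.append(mass / (mean_matter_density * volume))
    return np.array(slopes), np.array(overdensities)
\end{verbatim}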
The two curves in Fig.~\ref{fig:overdensity+H0} correspond to each other nicely: in regions of high overdensity the structures are gravitationally bound and the $H_0$ measurements yield negative values, while at larger distances, as the overdensity decreases, the values of $H_0$ rise closer to the global value of the simulation. At very large distances of more than 7~Mpc the $H_0$ curve shows slight variations, but as only a few simulations are free of type 2 particles out to those distances, these are likely not actual features but rather local variations in the Hubble flow of possibly as few as a single simulation.
It is also notable that the measured median value of $H_0$ rises to more than 80~km/s/Mpc, i.e. clearly above the global value of 67.77~km/s/Mpc of the simulations. This can happen because the Local Group analogue selection criteria also place indirect restrictions on the surroundings of the system. This effect can potentially also be used when determining the mass of the Local Group.
As the point after which the measured median value of $H_0$ no longer changes is reached fairly soon, it seems that, for example, a 5~Mpc region containing only type~1 particles is sufficient for obtaining a reasonably reliable estimate of the shape of the Hubble flow surrounding the Local Group analogue. This is important for further statistical analysis, as only a small fraction of the simulations is free of type 2 particles out to, for example, 7~Mpc, whereas more than half of the simulations still contain only type 1 particles within 5~Mpc.
\begin{figure}
\centering
\includegraphics{kuvat/shelledH0.pdf}
\caption{Median $H_0$ fitted to all subhaloes whose distance from the Milky Way analogue falls within a moving 2~Mpc bin. The grey areas show the ranges containing the middle 90\%, 75\% and 50\% of the measurements. The value of $H_0$ is calculated only for bins that do not contain any type 2 particles, which results in the total number of data points decreasing outwards. For approximately the last megaparsec, the small number of available data points causes the 90\% and 75\% limits to merge.}\label{fig:shelledH0}
\end{figure}
As the $H_0$ values shown in Fig.~\ref{fig:overdensity+H0} include all subhaloes up to a given distance, they differ from the values obtained as described in section~\ref{sect:hubblemeasurement}. The plot also gives no insight into whether the observed relation between distance and radial velocity in the simulations is linear or not. To examine these effects, Fig.~\ref{fig:shelledH0} shows how the median $H_0$ changes as a function of distance when measured using all subhaloes in a moving 2~Mpc bin. The plot also shows the extents of the ranges containing the middle 90\%, 75\% and 50\% of the $H_0$ values in each bin. The merging of the different ranges at large radii is caused by each bin including only those simulations that have no type~2 particles within that bin, leaving very few data points in the last bins.
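The binned measurement can be illustrated with the following Python sketch; the input array names are hypothetical NumPy arrays, the bin centres are chosen freely by the caller, and the sketch is only an approximation of the procedure rather than the exact implementation used here.
\begin{verbatim}
import numpy as np

def binned_hubble_slopes(distances, radial_velocities,
                         bin_centres, bin_width=2.0):
    """OLS slope fitted to all subhaloes whose distance from the
    Milky Way analogue falls inside a moving bin of the given
    width (distances in Mpc, velocities in km/s)."""
    slopes = np.full(len(bin_centres), np.nan)
    for i, centre in enumerate(bin_centres):
        in_bin = np.abs(distances - centre) < bin_width / 2.0
        if in_bin.sum() >= 2:
            slopes[i], _ = np.polyfit(distances[in_bin],
                                      radial_velocities[in_bin], 1)
    return slopes

# The median curve and the 50/75/90 per cent ranges could then be
# collected with np.nanmedian and np.nanpercentile over simulations.
\end{verbatim}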
Like Fig.~\ref{fig:overdensity+H0}, this plot also shows how the values of $H_0$ initially grow as the bin moves away from the Milky Way analogue, but the curve reaches a significantly higher value of $H_0$ than the curve in Fig.~\ref{fig:overdensity+H0}. This is caused by the bin containing both M31 and its satellites, which move towards the Milky Way analogue, and subhaloes outside the Local Group, which move away, so the measured $H_0$ is exaggerated. At larger distances the measurement no longer includes subhaloes bound to the Local Group analogue, and it stabilizes to a value near the one in Fig.~\ref{fig:overdensity+H0}.
Like the unbinned figure, Fig.~\ref{fig:shelledH0} also starts to show some variation in the value of $H_0$ near a bin centre distance of 7~Mpc. This is likely at least partially related to the shrinking number of simulations from which the median is calculated, but actual features in the data set can also affect the result. For example, massive subhaloes moving relative to the Milky Way and hosting their own satellite systems can produce anomalous values of $H_0$ when the distance range used in the measurement is small.