Merge pull request #9 from CEED/setup-vulcan
Improved robustness for high number refinements and DOFs [setup-vulcan]
vladotomov authored Dec 7, 2017
2 parents b379d5c + d2ee50b commit cbc68d3
Showing 19 changed files with 955 additions and 869 deletions.
86 changes: 60 additions & 26 deletions README.md
@@ -17,9 +17,9 @@ discretization and explicit high-order time-stepping.

Laghos is based on the discretization method described in the following article:

> V. Dobrev, Tz. Kolev and R. Rieben <br>
> [High-order curvilinear finite element methods for Lagrangian hydrodynamics](https://doi.org/10.1137/120864672) <br>
> *SIAM Journal on Scientific Computing*, (34) 2012, pp. B606–B641.

Laghos captures the basic structure of many compressible shock hydrocodes,
including the [BLAST code](http://llnl.gov/casc/blast) at [Lawrence Livermore
@@ -54,10 +54,9 @@ Laghos supports two options for deriving and solving the ODE system, namely the
algorithm of interest for high orders. For low orders (e.g. 2nd order in 3D),
both algorithms are of interest.

The full assembly option relies on constructing and utilizing global mass and
force matrices stored in compressed sparse row (CSR) format. In contrast, the
[partial assembly](http://ceed.exascaleproject.org/ceed-code) option defines
only the local action of those matrices, which is then used to perform all
necessary operations. As the local action is defined by utilizing the tensor
structure of the finite element spaces, the amount of data storage, memory
@@ -86,14 +85,14 @@ Other computational motives in Laghos include the following:
preparation and the application costs are important for this operator.
- Domain-decomposed MPI parallelism.
- Optional in-situ visualization with [GLVis](http://glvis.org) and data output
  for visualization and data analysis with [VisIt](http://visit.llnl.gov).

## Code Structure

- The file `laghos.cpp` contains the main driver with the time integration loop
  starting around line 431.
- In each time step, the ODE system of interest is constructed and solved by
  the class `LagrangianHydroOperator`, defined around line 375 of `laghos.cpp`
and implemented in files `laghos_solver.hpp` and `laghos_solver.cpp`.
- All quadrature-based computations are performed in the function
`LagrangianHydroOperator::UpdateQuadratureData` in `laghos_solver.cpp`.
@@ -119,7 +118,7 @@ Other computational motives in Laghos include the following:
Laghos has the following external dependencies:

- *hypre*, used for parallel linear algebra, we recommend version 2.10.0b<br>
  https://computation.llnl.gov/casc/hypre/software.html

- METIS, used for parallel domain decomposition (optional), we recommend [version 4.0.3](http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/OLD/metis-4.0.3.tar.gz) <br>
http://glaros.dtc.umn.edu/gkhome/metis/metis/download
@@ -128,10 +127,10 @@ Laghos has the following external dependencies:
https://github.com/mfem/mfem

To build the miniapp, first download *hypre* and METIS from the links above
and put everything on the same level as the `Laghos` directory:
```sh
~> ls
Laghos/  hypre-2.10.0b.tar.gz  metis-4.0.tar.gz
```

Build *hypre*:
@@ -142,6 +141,8 @@ Build *hypre*:
~/hypre-2.10.0b/src> make -j
~/hypre-2.10.0b/src> cd ../..
```
For large runs (problem size above 2 billion unknowns), add the
`--enable-bigint` option to the above `configure` line.

Build METIS:
```sh
@@ -151,22 +152,29 @@ Build METIS:
~/metis-4.0.3> cd ..
~> ln -s metis-4.0.3 metis-4.0
```
This build is optional, as MFEM can be built without METIS by specifying
`MFEM_USE_METIS = NO` below.

Clone and build the parallel version of MFEM:
```sh
~> git clone [email protected]:mfem/mfem.git ./mfem
~> cd mfem/
~/mfem> git checkout laghos-v1.0
~/mfem> make parallel -j
~/mfem> cd ..
```
The above uses the `laghos-v1.0` tag of MFEM, which is guaranteed to work with
Laghos v1.0. Alternatively, one can use the latest versions of the MFEM and
Laghos `master` branches (provided there are no conflicts). See the [MFEM
building page](http://mfem.org/building/) for additional details.

Build Laghos:
```sh
~> cd Laghos/
~/Laghos> make
```

This can be followed by `make test` and `make install` to check and install the
build, respectively. See `make help` for additional options.

## Running

@@ -181,7 +189,8 @@ mpirun -np 8 laghos -p 1 -m data/square01_quad.mesh -rs 3 -tf 0.8 -no-vis -pa
mpirun -np 8 laghos -p 1 -m data/cube01_hex.mesh -rs 2 -tf 0.6 -no-vis -pa
```

The latter produces the following density plot (when run with the `-vis` instead
of the `-no-vis` option)

![Sedov blast image](data/sedov.png)

@@ -197,7 +206,8 @@ mpirun -np 8 laghos -p 0 -m data/square01_quad.mesh -rs 3 -tf 0.5 -no-vis -pa
mpirun -np 8 laghos -p 0 -m data/cube01_hex.mesh -rs 1 -cfl 0.1 -tf 0.25 -no-vis -pa
```

The latter produces the following velocity magnitude plot (when run with the
`-vis` instead of the `-no-vis` option)

![Taylor-Green image](data/tg.png)

@@ -212,7 +222,8 @@ mpirun -np 8 laghos -p 3 -m data/rectangle01_quad.mesh -rs 2 -tf 2.5 -cfl 0.025
mpirun -np 8 laghos -p 3 -m data/box01_hex.mesh -rs 1 -tf 2.5 -cfl 0.05 -no-vis -pa
```

The latter produces the following specific internal energy plot (when run with
the `-vis` instead of the `-no-vis` option)

![Triple-point image](data/tp.png)

@@ -245,30 +256,53 @@ round-off distance from the above reference values.

## Performance Timing and FOM

Each time step in Laghos contains 3 major distinct computations:

1. The inversion of the global kinematic mass matrix (CG H1).
2. The force operator evaluation from degrees of freedom to quadrature points (Forces).
3. The physics kernel in quadrature points (UpdateQuadData).
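Schematically, the three computations above fit into the step loop as follows. This is a toy sketch, not Laghos code: every function name and all data here are hypothetical placeholders that only mirror the list above (in particular, the mass matrix is taken diagonal so its "inversion" is elementwise, whereas Laghos performs a CG solve).

```python
# Toy sketch of the three major computations in one Laghos-style time step.
# All names and operations are hypothetical stand-ins, not the actual code.

def evaluate_forces(velocity):
    # "Forces": stand-in for the FE force operator (dofs -> quadrature points).
    return [-v for v in velocity]

def invert_kinematic_mass(rhs, mass_diag):
    # "CG H1": in Laghos a global CG solve; here the mass matrix is
    # assumed diagonal, so inversion reduces to elementwise division.
    return [r / m for r, m in zip(rhs, mass_diag)]

def update_quadrature_data(velocity):
    # "UpdateQuadData": pointwise physics evaluation at quadrature points.
    return [v * v for v in velocity]

def time_step(velocity, mass_diag, dt):
    rhs = evaluate_forces(velocity)
    accel = invert_kinematic_mass(rhs, mass_diag)
    quad_data = update_quadrature_data(velocity)
    new_velocity = [v + dt * a for v, a in zip(velocity, accel)]
    return new_velocity, quad_data

v, q = time_step([1.0, 2.0], [2.0, 2.0], dt=0.1)
```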

By default Laghos is instrumented to report the total execution times and rates,
in terms of millions of degrees of freedom per second (megadofs), for each of
these computational phases. (The time for inversion of the local thermodynamic
mass matrices (CG L2) is also reported, but it accounts for only a small part of
the overall computation.)
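To make the reported unit concrete: a megadofs rate is just the number of degrees of freedom a kernel processes, summed over time steps, divided by the time spent and by 10^6. The numbers below are invented for illustration and do not come from any Laghos run:

```python
# Hypothetical illustration of a "megadofs" throughput rate; all inputs
# are made-up example values, not measurements.
dofs = 1_500_000     # degrees of freedom touched by a kernel per time step
steps = 200          # number of time steps
seconds = 25.0       # total time spent in that kernel
megadofs_rate = dofs * steps / seconds / 1e6
print(megadofs_rate)  # 12.0
```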

Laghos also reports the total rate for these major kernels, which is a proposed
**Figure of Merit (FOM)** for benchmarking purposes. Given a computational
allocation, the FOM should be reported for different problem sizes and finite
element orders, as illustrated in the sample scripts in the [timing](./timing)
directory.

A sample run on the [Vulcan](https://computation.llnl.gov/computers/vulcan) BG/Q
machine at LLNL is:

```sh
srun -n 393216 laghos -pa -p 1 -tf 0.6 -no-vis \
     -pt 322 -m data/cube_12_hex.mesh \
     --cg-tol 0 --cg-max-iter 50 --max-steps 2 \
     -ok 3 -ot 2 -rs 5 -rp 3
```
This is a Q3-Q2 3D computation on 393,216 MPI ranks (24,576 nodes) that produces
rates of approximately 168497, 74221, and 16696 megadofs, and a total FOM of
about 2073 megadofs.

To make the above run 8 times bigger, one can either weak scale by using 8 times
as many MPI tasks and increasing the number of serial refinements: `srun -n
3145728 ... -rs 6 -rp 3`, or use the same number of MPI tasks but increase the
local problem on each of them by doing more parallel refinements: `srun -n
393216 ... -rs 5 -rp 4`.
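The factor of 8 comes from uniform refinement of a 3D hexahedral mesh: each refinement level splits every hex into 8 children, so one extra `-rs` or `-rp` level multiplies the element count (and, asymptotically, the problem size) by 8. A small sketch, where `base_elements` is a hypothetical starting count rather than the actual element count of `cube_12_hex.mesh`:

```python
# Element count after rs serial plus rp parallel uniform refinements of a
# 3D hex mesh; each refinement level splits every hexahedron into 8.
def refined_elements(base_elements, rs, rp):
    return base_elements * 8 ** (rs + rp)

small = refined_elements(1000, rs=5, rp=3)            # baseline run
bigger_serial = refined_elements(1000, rs=6, rp=3)    # one more -rs level
bigger_parallel = refined_elements(1000, rs=5, rp=4)  # one more -rp level
assert bigger_serial == 8 * small and bigger_parallel == 8 * small
```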

## Versions

In addition to the main MPI-based CPU implementation in https://github.com/CEED/Laghos,
the following versions of Laghos have been developed:

- A serial version in the [serial](./serial) directory.
- [GPU version](https://github.com/dmed256/Laghos/tree/occa-dev) based on
  [OCCA](http://libocca.org/).
- A [RAJA](https://software.llnl.gov/RAJA/)-based version in the
[raja-dev](https://github.com/CEED/Laghos/tree/raja-dev) branch.

## Contact

