Merge pull request #9 from CEED/setup-vulcan
Improved robustness for high number refinements and DOFs [setup-vulcan]
vladotomov authored Dec 7, 2017
2 parents b379d5c + d2ee50b commit cbc68d3
Showing 19 changed files with 955 additions and 869 deletions.
86 changes: 60 additions & 26 deletions README.md
@@ -17,9 +17,9 @@ discretization and explicit high-order time-stepping.

Laghos is based on the discretization method described in the following article:

> V. Dobrev, Tz. Kolev and R. Rieben <br>
> [High-order curvilinear finite element methods for Lagrangian hydrodynamics](https://doi.org/10.1137/120864672) <br>
> *SIAM Journal on Scientific Computing*, (34) 2012, pp. B606–B641.

Laghos captures the basic structure of many compressible shock hydrocodes,
including the [BLAST code](http://llnl.gov/casc/blast) at [Lawrence Livermore
@@ -54,10 +54,9 @@ Laghos supports two options for deriving and solving the ODE system, namely the
algorithm of interest for high orders. For low orders (e.g. 2nd order in 3D),
both algorithms are of interest.

The full assembly option relies on constructing and utilizing global mass and
force matrices stored in compressed sparse row (CSR) format. In contrast, the
[partial assembly](http://ceed.exascaleproject.org/ceed-code) option defines
only the local action of those matrices, which is then used to perform all
necessary operations. As the local action is defined by utilizing the tensor
structure of the finite element spaces, the amount of data storage, memory
@@ -86,14 +85,14 @@ Other computational motives in Laghos include the following:
preparation and the application costs are important for this operator.
- Domain-decomposed MPI parallelism.
- Optional in-situ visualization with [GLVis](http://glvis.org) and data output
  for visualization and data analysis with [VisIt](http://visit.llnl.gov).

## Code Structure

- The file `laghos.cpp` contains the main driver with the time integration loop
  starting around line 431.
- In each time step, the ODE system of interest is constructed and solved by
  the class `LagrangianHydroOperator`, defined around line 375 of `laghos.cpp`
and implemented in files `laghos_solver.hpp` and `laghos_solver.cpp`.
- All quadrature-based computations are performed in the function
`LagrangianHydroOperator::UpdateQuadratureData` in `laghos_solver.cpp`.
@@ -119,7 +118,7 @@ Other computational motives in Laghos include the following:
Laghos has the following external dependencies:

- *hypre*, used for parallel linear algebra, we recommend version 2.10.0b<br>
  https://computation.llnl.gov/casc/hypre/software.html

- METIS, used for parallel domain decomposition (optional), we recommend [version 4.0.3](http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/OLD/metis-4.0.3.tar.gz) <br>
http://glaros.dtc.umn.edu/gkhome/metis/metis/download
@@ -128,10 +127,10 @@ Laghos has the following external dependencies:
https://github.com/mfem/mfem

To build the miniapp, first download *hypre* and METIS from the links above
and put everything on the same level as the `Laghos` directory:
```sh
~> ls
Laghos/  hypre-2.10.0b.tar.gz  metis-4.0.tar.gz
```

Build *hypre*:
@@ -142,6 +141,8 @@ Build *hypre*:
~/hypre-2.10.0b/src> make -j
~/hypre-2.10.0b/src> cd ../..
```
For large runs (problem size above 2 billion unknowns), add the
`--enable-bigint` option to the above `configure` line.

Build METIS:
```sh
@@ -151,22 +152,29 @@ Build METIS:
~/metis-4.0.3> cd ..
~> ln -s metis-4.0.3 metis-4.0
```
This build is optional, as MFEM can be built without METIS by specifying
`MFEM_USE_METIS = NO` below.

Clone and build the parallel version of MFEM:
```sh
~> git clone [email protected]:mfem/mfem.git ./mfem
~> cd mfem/
~/mfem> git checkout laghos-v1.0
~/mfem> make parallel -j
~/mfem> cd ..
```
The above uses the `laghos-v1.0` tag of MFEM, which is guaranteed to work with
Laghos v1.0. Alternatively, one can use the latest versions of the MFEM and
Laghos `master` branches (provided there are no conflicts). See the [MFEM
building page](http://mfem.org/building/) for additional details.

Build Laghos:
```sh
~> cd Laghos/
~/Laghos> make
```

This can be followed by `make test` and `make install` to check and install the
build, respectively. See `make help` for additional options.

## Running

@@ -181,7 +189,8 @@ mpirun -np 8 laghos -p 1 -m data/square01_quad.mesh -rs 3 -tf 0.8 -no-vis -pa
mpirun -np 8 laghos -p 1 -m data/cube01_hex.mesh -rs 2 -tf 0.6 -no-vis -pa
```

The latter produces the following density plot (when run with the `-vis` instead
of the `-no-vis` option)

![Sedov blast image](data/sedov.png)

@@ -197,7 +206,8 @@ mpirun -np 8 laghos -p 0 -m data/square01_quad.mesh -rs 3 -tf 0.5 -no-vis -pa
mpirun -np 8 laghos -p 0 -m data/cube01_hex.mesh -rs 1 -cfl 0.1 -tf 0.25 -no-vis -pa
```

The latter produces the following velocity magnitude plot (when run with the
`-vis` instead of the `-no-vis` option)

![Taylor-Green image](data/tg.png)

@@ -212,7 +222,8 @@ mpirun -np 8 laghos -p 3 -m data/rectangle01_quad.mesh -rs 2 -tf 2.5 -cfl 0.025
mpirun -np 8 laghos -p 3 -m data/box01_hex.mesh -rs 1 -tf 2.5 -cfl 0.05 -no-vis -pa
```

The latter produces the following specific internal energy plot (when run with
the `-vis` instead of the `-no-vis` option)

![Triple-point image](data/tp.png)

@@ -245,30 +256,53 @@ round-off distance from the above reference values.

## Performance Timing and FOM

Each time step in Laghos contains 3 major distinct computations:

1. The inversion of the global kinematic mass matrix (CG H1).
2. The force operator evaluation from degrees of freedom to quadrature points (Forces).
3. The physics kernel in quadrature points (UpdateQuadData).
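Schematically, the three computations above fit into the step loop as follows. This is a toy sketch, not Laghos code: every function name and all data here are hypothetical placeholders that only mirror the list above (in particular, the mass matrix is taken diagonal so its "inversion" is elementwise, whereas Laghos performs a CG solve).

```python
# Toy sketch of the three major computations in one Laghos-style time step.
# All names and operations are hypothetical stand-ins, not the actual code.

def evaluate_forces(velocity):
    # "Forces": stand-in for the FE force operator (dofs -> quadrature points).
    return [-v for v in velocity]

def invert_kinematic_mass(rhs, mass_diag):
    # "CG H1": in Laghos a global CG solve; here the mass matrix is
    # assumed diagonal, so inversion reduces to elementwise division.
    return [r / m for r, m in zip(rhs, mass_diag)]

def update_quadrature_data(velocity):
    # "UpdateQuadData": pointwise physics evaluation at quadrature points.
    return [v * v for v in velocity]

def time_step(velocity, mass_diag, dt):
    rhs = evaluate_forces(velocity)
    accel = invert_kinematic_mass(rhs, mass_diag)
    quad_data = update_quadrature_data(velocity)
    new_velocity = [v + dt * a for v, a in zip(velocity, accel)]
    return new_velocity, quad_data

v, q = time_step([1.0, 2.0], [2.0, 2.0], dt=0.1)
```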

By default Laghos is instrumented to report the total execution times and rates,
in terms of millions of degrees of freedom per second (megadofs), for each of
these computational phases. (The time for inversion of the local thermodynamic
mass matrices (CG L2) is also reported, but it accounts for only a small part of
the overall computation.)
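To make the reported unit concrete: a megadofs rate is just the number of degrees of freedom a kernel processes, summed over time steps, divided by the time spent and by 10^6. The numbers below are invented for illustration and do not come from any Laghos run:

```python
# Hypothetical illustration of a "megadofs" throughput rate; all inputs
# are made-up example values, not measurements.
dofs = 1_500_000     # degrees of freedom touched by a kernel per time step
steps = 200          # number of time steps
seconds = 25.0       # total time spent in that kernel
megadofs_rate = dofs * steps / seconds / 1e6
print(megadofs_rate)  # 12.0
```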

Laghos also reports the total rate for these major kernels, which is a proposed
**Figure of Merit (FOM)** for benchmarking purposes. Given a computational
allocation, the FOM should be reported for different problem sizes and finite
element orders, as illustrated in the sample scripts in the [timing](./timing)
directory.

A sample run on the [Vulcan](https://computation.llnl.gov/computers/vulcan) BG/Q
machine at LLNL is:

```sh
srun -n 393216 laghos -pa -p 1 -tf 0.6 -no-vis \
     -pt 322 -m data/cube_12_hex.mesh \
     --cg-tol 0 --cg-max-iter 50 --max-steps 2 \
     -ok 3 -ot 2 -rs 5 -rp 3
```
This is a Q3-Q2 3D computation on 393,216 MPI ranks (24,576 nodes) that produces
rates of approximately 168497, 74221, and 16696 megadofs, and a total FOM of
about 2073 megadofs.

To make the above run 8 times bigger, one can either weak scale by using 8 times
as many MPI tasks and increasing the number of serial refinements: `srun -n
3145728 ... -rs 6 -rp 3`, or use the same number of MPI tasks but increase the
local problem on each of them by doing more parallel refinements: `srun -n
393216 ... -rs 5 -rp 4`.
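The factor of 8 comes from uniform refinement of a 3D hexahedral mesh: each refinement level splits every hex into 8 children, so one extra `-rs` or `-rp` level multiplies the element count (and, asymptotically, the problem size) by 8. A small sketch, where `base_elements` is a hypothetical starting count rather than the actual element count of `cube_12_hex.mesh`:

```python
# Element count after rs serial plus rp parallel uniform refinements of a
# 3D hex mesh; each refinement level splits every hexahedron into 8.
def refined_elements(base_elements, rs, rp):
    return base_elements * 8 ** (rs + rp)

small = refined_elements(1000, rs=5, rp=3)            # baseline run
bigger_serial = refined_elements(1000, rs=6, rp=3)    # one more -rs level
bigger_parallel = refined_elements(1000, rs=5, rp=4)  # one more -rp level
assert bigger_serial == 8 * small and bigger_parallel == 8 * small
```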

## Versions

In addition to the main MPI-based CPU implementation in https://github.com/CEED/Laghos,
the following versions of Laghos have been developed:

- A serial version in the [serial](./serial) directory.
- [GPU version](https://github.com/dmed256/Laghos/tree/occa-dev) based on
  [OCCA](http://libocca.org/).
- A [RAJA](https://software.llnl.gov/RAJA/)-based version in the
[raja-dev](https://github.com/CEED/Laghos/tree/raja-dev) branch.

## Contact

