This directory contains three standalone parallel Fortran 2018 programs:
- An unsteady 2D heat equation solver: heat-equation.f90
- A simple "Hello, world!" program: hello.f90
- An asynchronous "Hello, world!" program: async-hello.f90
These programs demonstrate Fortran's parallel features commonly referred to as "Coarray Fortran."
- The GCC, NAG, HPE Cray, or Intel Fortran compilers
- Only if using GCC: The OpenCoarrays compiler wrapper (caf) and program launcher (cafrun)
The numerical algorithm uses 2nd-order-accurate central finite differencing in space and 1st-order-accurate explicit Euler advancement in time.
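Written out, one time step of such a scheme takes roughly the following form. This is a sketch under assumed notation, not a transcription of the solver: T is the temperature at grid point (i, j) and time level n, alpha is the thermal diffusivity, and the grid spacings Delta x and Delta y are assumed uniform.

```latex
T_{i,j}^{n+1} = T_{i,j}^{n}
  + \alpha \, \Delta t \left[
      \frac{T_{i+1,j}^{n} - 2 T_{i,j}^{n} + T_{i-1,j}^{n}}{\Delta x^{2}}
    + \frac{T_{i,j+1}^{n} - 2 T_{i,j}^{n} + T_{i,j-1}^{n}}{\Delta y^{2}}
  \right]
```

For an explicit scheme of this form, the time step is usually chosen so that alpha * Delta t * (1/Delta x^2 + 1/Delta y^2) does not exceed 1/2; larger steps are generally unstable.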
With the GCC Fortran compiler (gfortran) and the
OpenCoarrays parallel runtime library installed, compile this program
as a standalone file and run it as follows:
caf -o heat heat-equation.f90
cafrun -n 2 ./heat
where you may replace 2 in the above line with the desired number of images.
With the Intel ifx Fortran compiler installed, execute the following commands:
ifx -o heat -coarray heat-equation.f90
export FOR_COARRAY_NUM_IMAGES=2
./heat
The HPE Cray compiler can compile the two *hello.f90 programs.
A compiler bug prevents the compilation of the heat equation
solver. To build and run async-hello, execute the following commands:
module load PrgEnv-cray
ftn -o async-hello async-hello.f90
salloc -N1 -t60 -Am2878 -C cpu -q interactive
srun -n32 async-hello
where the salloc command requests one node for interactive use and the
srun command launches the compiled async-hello program in 32 images.
To compile with gfortran alone and run in a single image, execute
gfortran -o heat -fcoarray=single heat-equation.f90
./heat
In addition to demonstrating parallel features of Fortran 2018, this example
shows an object-oriented, functional programming style based on Fortran's
user-defined operators such as the .laplacian. operator defined in this
example. To demonstrate the expressive power and flexibility of this
approach, try modifying the main program to use 2nd-order
Runge-Kutta time advancement:
T_half = T + 0.5*dt*alpha* .laplacian. T
call T%exchange_halo
sync all
T = T + dt*alpha* .laplacian. T_half
call T%exchange_halo
sync all
You'll need to declare T_half by appending it to the existing declaration,
giving type(subdomain_2D_t) T, T_half.
With some care, you could modify the main program to use any desired order of
Runge-Kutta algorithm without changing any of the supporting code.
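To illustrate how user-defined operators make such a change a main-program-only edit, here is a minimal serial sketch. It is not the example's code: field_1D_t, its components s and dx, and the 1D grid are hypothetical stand-ins for the example's 2D subdomain type, and the halo exchanges and sync all statements are omitted because the sketch runs on a single image.

```fortran
module field_1D_m
  !! Hypothetical 1D stand-in for the example's subdomain type
  implicit none
  private
  public :: field_1D_t

  type field_1D_t
    real, allocatable :: s(:) ! nodal values; the end points are held fixed
    real :: dx = 1.           ! uniform grid spacing
  contains
    procedure, private :: laplacian, add
    procedure, private, pass(rhs) :: multiply
    generic :: operator(.laplacian.) => laplacian
    generic :: operator(+) => add
    generic :: operator(*) => multiply
  end type

contains

  pure function laplacian(rhs) result(laplacian_rhs)
    !! 2nd-order central difference at interior points, zero at the end points
    class(field_1D_t), intent(in) :: rhs
    type(field_1D_t) laplacian_rhs
    integer n
    n = size(rhs%s)
    laplacian_rhs%dx = rhs%dx
    allocate(laplacian_rhs%s(n), source = 0.)
    laplacian_rhs%s(2:n-1) = (rhs%s(1:n-2) - 2*rhs%s(2:n-1) + rhs%s(3:n))/rhs%dx**2
  end function

  pure function add(lhs, rhs) result(total)
    class(field_1D_t), intent(in) :: lhs
    type(field_1D_t), intent(in) :: rhs
    type(field_1D_t) total
    total%dx = lhs%dx
    total%s = lhs%s + rhs%s
  end function

  pure function multiply(lhs, rhs) result(scaled)
    !! real * field, so the passed-object argument is the right-hand operand
    real, intent(in) :: lhs
    class(field_1D_t), intent(in) :: rhs
    type(field_1D_t) scaled
    scaled%dx = rhs%dx
    scaled%s = lhs*rhs%s
  end function

end module

program rk2_sketch
  use field_1D_m, only : field_1D_t
  implicit none
  type(field_1D_t) T, T_half
  real, parameter :: alpha = 1., dt = 0.1 ! illustrative values only
  integer step

  T = field_1D_t(s = [0., 0., 1., 0., 0.], dx = 1.)
  do step = 1, 10
    ! Switching time-advancement schemes changes only these two lines
    T_half = T + 0.5*dt*alpha* .laplacian. T
    T      = T + dt*alpha* .laplacian. T_half
  end do
  print *, T%s
end program
```

Because every operator is a pure function that returns a new object, replacing the two lines in the loop with a different Runge-Kutta scheme would leave the module untouched.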
This example also demonstrates a benefit of Fortran's facility for declaring a
procedure to be pure: the semantics of pure procedures essentially guarantee
that the above right-hand-side expressions can be evaluated fully
asynchronously across all images. No operator can modify state that would be
observable by another operator other than via the first operator's result. This
would be true even if an operator executing on one image performs communication
to get data from another image via a coarray. To reduce communication waiting
times, however, each image in our example proactively puts data onto
neighboring images. Puts generally outperform gets because the data can be
shipped off as soon as the data are ready. With the exception of one coarray
allocation in the define procedure, all procedures are asynchronous and all
image control is exposed in the main program.
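The put-based pattern can be sketched in a few lines. This is an illustration, not the example's exchange_halo procedure: the halo array, the periodic neighbor numbering, and the use of the image number as boundary data are all assumptions made to keep the sketch short.

```fortran
program put_halo_sketch
  !! Illustrative only: each image puts one value into a halo slot on each neighbor
  implicit none
  real :: halo(2)[*] ! halo(1) receives from the left neighbor, halo(2) from the right
  real my_value
  integer me, n, left, right

  me = this_image()
  n  = num_images()
  left  = merge(n, me-1, me == 1) ! periodic neighbors (an assumption of this sketch)
  right = merge(1, me+1, me == n)

  my_value = real(me) ! stand-in for freshly computed boundary data

  ! Puts: ship the data as soon as it is ready instead of waiting for neighbors to fetch it
  halo(2)[left]  = my_value ! this image is its left neighbor's right neighbor
  halo(1)[right] = my_value ! this image is its right neighbor's left neighbor

  sync all ! image control: every put completes before any image reads its halo

  print *, "image", me, "received", halo(1), "from the left and", halo(2), "from the right"
end program
```

The sketch should compile with any of the coarray-capable compilers listed above; with a single image, each image is its own neighbor. Note that the only image control statement is the sync all in the main program, mirroring the structure described above.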
Try adjusting the delay_magnitude constant to larger or smaller non-negative
values. For each new value, recompile once and rerun the program multiple times.
Explain the resulting program output.