The ESPResSo benchmarks measure physical quantities, such as pressures and energies, and compare them against reference values. Some algorithms, like the FFT, involve a large number of floating-point operations that inevitably lead to precision loss. On x86 architectures that implement the extended precision format, ESPResSo benefits from the 80-bit representation when computing reductions involving a long sequence of basic arithmetic operations (trilinear interpolation, Taylor series) or exponentials: intermediate values are stored in 80-bit wide registers and only get truncated to 64-bit wide floats when pushed to the stack or heap memory, typically at the end of the calculation.
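The effect can be illustrated with a toy example (not ESPResSo code): a long naive reduction accumulates one rounding error per addition, whereas a correctly rounded sum does not. Wider intermediate registers shrink the error of the naive loop, which is why the same reduction yields different residuals on different FPUs.

```python
import math

# One million summands whose value is not exactly representable in binary.
values = [0.1] * 10**6

# Naive reduction: each += rounds the accumulator to working precision.
naive = 0.0
for v in values:
    naive += v

# Correctly rounded sum of the same floats (Shewchuk's algorithm).
exact = math.fsum(values)

# The relative deviation is the precision lost in the naive reduction.
rel_err = abs(naive - exact) / abs(exact)
print(rel_err)
```

On hardware where the accumulator lives in a wider register, `rel_err` would be correspondingly smaller, mirroring the architecture-dependent deviations described above.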
The ESPResSo testsuite uses different tolerances based on whether the algorithm is implemented for 64-bit or 32-bit floating-point values; the latter is typically used when offloading to the GPU. However, we do not have a mechanism in place to detect whether the FPU uses 80-bit or 64-bit wide registers. For this reason, the ESPResSo team needs to periodically adjust tolerances by running the testsuite on RISC architectures, which don't have 80-bit wide registers.
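The dtype-based part of this scheme could look roughly like the following sketch (the names `TOLERANCES` and `check_observable` are hypothetical, not the actual testsuite API): the tolerance is keyed on the floating-point type the algorithm computes in, but nothing in the key captures the FPU register width.

```python
import numpy as np

# Hypothetical tolerance table: looser bounds for single precision,
# as used e.g. when offloading to the GPU.
TOLERANCES = {np.float64: 1e-10, np.float32: 1e-5}

def check_observable(value, reference, dtype):
    """Compare a measured quantity against its reference value."""
    rtol = TOLERANCES[dtype]
    np.testing.assert_allclose(value, reference, rtol=rtol)

# A float32 result is held to the looser tolerance.
check_observable(np.float32(1.0) + np.float32(1e-6), 1.000001, np.float32)
```

A third key for "extended precision available" would be needed to close the gap, but there is currently no portable way to populate it.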
The recently merged P3M ionic crystal benchmark for ESPResSo revealed relatively large deviations from the expected solutions. While the calculated energy of the crystal is correct, its relative deviation from the reference value is 200 times larger on Deucalion's ARMv8.2-A (Fujitsu A64FX) compared to Vega's Zen2 (AMD EPYC Rome 7H12). While we could update the test tolerance accordingly, this would prevent us from detecting unexpected accuracy losses on hardware where ESPResSo is known to leverage the extended precision format.
If there is a portable way of detecting hardware precision from the Python interface, we could tailor the test tolerances using a table. This might not be trivial, because precision also depends on compiler flags, such as architecture-dependent optimization and "fast math" optimizations. In addition, when using long-range solvers like P3M, the number of mathematical operations increases with the system size, which affects precision loss.
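One candidate probe, with the caveats above, is NumPy's `longdouble`, which mirrors the C compiler's `long double` and therefore hints at the native FPU format. This only reveals what the compiler exposed, not whether intermediate double arithmetic actually runs through wider registers, so it is a heuristic at best:

```python
import numpy as np

# Significand width of the platform's long double:
#   63  -> x87 80-bit extended (x86/x86-64)
#   52  -> plain IEEE 754 double (e.g. MSVC, where long double == double)
#   112 -> IEEE 754 quad (e.g. AArch64 Linux, usually software-emulated)
#   105 -> double-double (e.g. PowerPC)
nmant = np.finfo(np.longdouble).nmant

if nmant == 63:
    fpu = "x87 extended (80-bit)"
elif nmant == 52:
    fpu = "double only (64-bit)"
else:
    fpu = f"other ({nmant} significand bits)"
print(fpu)
```

Such a probe could seed the tolerance table, though the compiler-flag and system-size dependencies mentioned above would still have to be handled separately.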