-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Experimental/corbett/jit #1333
Conversation
@rrsettgast @klevzoff @francoishamon take a look. |
Hi @corbett5 ! Would this |
@TotoGaz yes you will be able to pre-compile things. |
…d jit compilation, testing and configuration still needed
…ss to avoid cumbersome preprocessor usage, only managed to remove the kernel name string from the arglist, which at least enforces the name being correct
…sedov 1-rank passing
…due to file conflicts if multiple runs try to compile a kernel simultaneously
7916cda
to
9bc353d
Compare
#define KernelDispatch KernelDispatchJIT | ||
#else | ||
#define KernelDispatch KernelDispatchTemplate | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the comments, this is the part of the current implementation I want to change the most. It should be possible, but wound up eating hours while I was trying to get it implemented the way I would prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to make the two dispatcher classes similar by replacing NAME
and HEADER
parameters in KernelDispatchJIT
with with just KERNEL_TYPE
parameter (i.e. template template parameter, like in the non-JIT dispatcher).
NAME
could then be extracted as (I'm trying to decide if this is 100% robust):
string const fullName = LvArray::system::demangleType< KERNEL_TYPE< SUBREGION_TYPE, FE_TYPE, CONSTITUTIVE_TYPE > >();
string const name = fullName.substr( 0, fullName.find( '<' ) );
HEADER
could be replaced by adding inside each "leaf" kernel class something like
template< typename SUBREGION_TYPE, typename FE_TYPE, typename CONSTITUTIVE_TYPE >
class MyKernel
{
static constexpr char const source_location[] = __FILE__;
...
};
// Sad part: this is required in C++14 to avoid linker errors
template< typename SUBREGION_TYPE, typename FE_TYPE, typename CONSTITUTIVE_TYPE >
constexpr char const MyKernel< SUBREGION_TYPE, FE_TYPE, CONSTITUTIVE_TYPE >::source_location[];
and accessing as
string const header = KERNEL_TYPE< SUBREGION_TYPE, FE_TYPE, CONSTITUTIVE_TYPE >::source_location;
The downside is having this extra stuff to remember to put in kernels. Maybe there's a better way to associate kernel class with source file name?
These ideas aren't too pretty, just thought I'd mention them. They allow us to gets rid of JITTI_DECL
/JITTI_TPARAM
macros entirely and unify the two dispatchers, possibly even making them specializations as the comment above suggests, and allowing for a per-kernel JITting decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the end goal is to always use JITTI
, although that doesn't mean that we need to always jit things at run-time. We need to have the capability to pre-jit everything at build-time, and I think that would suffice.
I'll need to merge from dev and revert the submodule change prior to merge, but this is functional and in a good place to review. I still need to check some things on lassen, but was able to get 100% of the regression tests passing on quartz. It did require multiple passes as it is possible for multiple tests to try to compile the same kernel simultaneously resulting in file system errors that kill the runs. This had to be avoiding internally via some mpi checks, but when multiple executables are running simultaneously it might require adding sentinel files or something similar, I'm open to ideas to mitigate that issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just have one suggestion (not necessarily a good one) re: unifying the two dispatchers.
) | ||
# | ||
# Specify all sources | ||
# | ||
set( finiteElement_sources | ||
FiniteElementDiscretization.cpp | ||
FiniteElementDiscretizationManager.cpp | ||
kernelInterface/kernelJIT.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is not included in the PR, and I can't see it being generated anywhere either
camp::tuple< ARGS ... > m_args; | ||
}; | ||
|
||
jitti::CompilationInfo getKernelCompilationInfo( const string & header ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find the definition of this function anywhere. Is it in kernelJIT.cpp
?
#define KernelDispatch KernelDispatchJIT | ||
#else | ||
#define KernelDispatch KernelDispatchTemplate | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to make the two dispatcher classes similar by replacing NAME
and HEADER
parameters in KernelDispatchJIT
with with just KERNEL_TYPE
parameter (i.e. template template parameter, like in the non-JIT dispatcher).
NAME
could then be extracted as (I'm trying to decide if this is 100% robust):
string const fullName = LvArray::system::demangleType< KERNEL_TYPE< SUBREGION_TYPE, FE_TYPE, CONSTITUTIVE_TYPE > >();
string const name = fullName.substr( 0, fullName.find( '<' ) );
HEADER
could be replaced by adding inside each "leaf" kernel class something like
template< typename SUBREGION_TYPE, typename FE_TYPE, typename CONSTITUTIVE_TYPE >
class MyKernel
{
static constexpr char const source_location[] = __FILE__;
...
};
// Sad part: this is required in C++14 to avoid linker errors
template< typename SUBREGION_TYPE, typename FE_TYPE, typename CONSTITUTIVE_TYPE >
constexpr char const MyKernel< SUBREGION_TYPE, FE_TYPE, CONSTITUTIVE_TYPE >::source_location[];
and accessing as
string const header = KERNEL_TYPE< SUBREGION_TYPE, FE_TYPE, CONSTITUTIVE_TYPE >::source_location;
The downside is having this extra stuff to remember to put in kernels. Maybe there's a better way to associate kernel class with source file name?
These ideas aren't too pretty, just thought I'd mention them. They allow us to gets rid of JITTI_DECL
/JITTI_TPARAM
macros entirely and unify the two dispatchers, possibly even making them specializations as the comment above suggests, and allowing for a per-kernel JITting decision.
add_custom_command( OUTPUT ${CMAKE_BINARY_DIR}/include/kernelJITCompileCommands.hpp | ||
WORKING_DIRECTORY ${CMAKE_BINARY_DIR} | ||
COMMAND python ${CMAKE_CURRENT_LIST_DIR}/../LvArray/src/jitti/generateCompileCommandsHeader.py | ||
${CMAKE_BINARY_DIR}/compile_commands.json | ||
--cpp ${CMAKE_CURRENT_LIST_DIR}/kernelInterface/kernelJIT.cpp | ||
--hpp ${CMAKE_BINARY_DIR}/include/kernelJITCompileCommands.hpp | ||
--include ${CMAKE_BINARY_DIR}/include | ||
--linker ${CMAKE_CXX_COMPILER} ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of relying on headers that you generate on the file system, could it have been more robust to store the C++ pre-processing output as a string alongside the compile/link commands directly into the lib?
And that would be more "modern" JIT style imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think you could do that, although it would require the user to list out the functions that they want to JIT at configuration time in CMake. Then for each of those functions we could pre-process jitti/templateSource.cpp
. But only up to a point, since we don't know what template params the user will want. Then at run time when we know the params we could compile the pre-processed source with the additional command line definitions of JITTI_TEMPLATE_PARAMS
and JITTI_TEMPLATE_PARAMS_STRING
.
However, how we pass a string embedded in our library to the compiler without using the file system is beyond me. Not to mention that at the end of the day the compiler is going to spit out a library on the file system that we then have to open. So this frees us from having to have access to the same headers used to build, but unless we do something really fancy I think file system access is a requirement.
Also at least with CUDA the cost of the compilation time itself will greatly outweigh the cost of opening and pre-processing the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also at least with CUDA the cost of the compilation time itself will greatly outweigh the cost of opening and pre-processing the source.
It's more a problem of having a consistent self contained GEOSX installation than a performance problem imho.
Embedding the pre-processed source files directly into GEOSX would help at non relying on sources that may be modified behind the scenes.
I do not know all the JIT details so maybe it's not a problem anyway: I do not want to solve problems that do not exist.
option( ENABLE_JITTI "Build all compute kernels just-in-time at runtime." OFF ) | ||
|
||
if ( ENABLE_JITTI ) | ||
message( "JITTI is ENABLED") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way we've been doing this is GEOSX_ENABLE_JITTI
, or maybe it's GEOSX_USE_JITTI
, I forget. But either way I like defined/not defined instead of 1/0.
#define KernelDispatch KernelDispatchJIT | ||
#else | ||
#define KernelDispatch KernelDispatchTemplate | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the end goal is to always use JITTI
, although that doesn't mean that we need to always jit things at run-time. We need to have the capability to pre-jit everything at build-time, and I think that would suffice.
constexpr bool compilerIsNVCC = false; | ||
#endif | ||
|
||
jitti::CompilationInfo getKernelCompilationInfo( const string & header ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my idea is that for each kernel header we'd have a source that includes the header and has the method to get thejitti::CompilationInfo
(s) for the kernel(s) in the header. That way if the kernel (header) changes and the user rebuilds GEOSX then the compile time in the compilation info gets updated and so things will be re-jitted.
I think with this implementation it won't rejit if say the solid mechanics kernels are modified and GEOSX is rebuilt, only if some dependency of KernelBase.hpp
is modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is probably true in the current implementation.
This is the first thing I'll take a look at changing from the current implementation, and then I'll look at adding in the compile-time pre-jit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
This is a proof of concept for a dynamic library based JIT. It takes the constitutive and element dispatch out of
regionBasedKernelApplication
and instead compiles and links in the kernel at runtime. I confirmed that it is passing theSSLE_sedov_01
problem on Quartz with Clang debug and GCC release and on Lassen with Clang debug.It should pass all the one rank problems if I were to
#include
the appropriate headers for all the various kernels at the bottom ofKernelBase.hpp
like I did forSolidMechanicsSmallStrainExplicitNewmarkKernel.hpp
. As is there are some compilation errors (with the main code not the JIT) if I do this, but structuring things like this is a terrible idea so I just proceeded with my hack.Each run re-compiles the kernels it needs, this is wasteful but would be easily fixed.
Multiple rank runs fail because each rank tries to compile the kernel, easily fixed with some synchronization.
Related to
GEOS-DEV/LvArray#225