Following the technique described in Detecting Space Leaks, I found that hvx seems to spend a lot of time in `evaluate`. At every step of, e.g., `subGradLoop`, it has to evaluate the objective, look up functions, and so on. My suggestion is to add an intermediate step that compiles the objective down to a single function, which just takes all the variables and applies the objective to them. This could have the added benefit of providing a layer where other implementations of the arithmetic could be plugged in, e.g. CUDA: instead of evaluating EAdd to GHC's + (and similarly for the other functions), we could generate a piece of code that runs on CUDA. Eventually, maybe the whole of `subgradLoop` could run on the GPU. Do you think such a pre-compilation step would benefit the speed of hvx? Can you suggest other places to improve performance?
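To make the suggestion concrete, here is a minimal sketch of what I mean by compiling the objective once. It does not use hvx's actual types: the `Expr` type, the `Env` map, and the `compile` function are hypothetical stand-ins (only the constructor name `EAdd` is borrowed from hvx), so treat it as an illustration of the idea rather than a patch.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical toy expression type; hvx's real expression type is richer.
data Expr
  = EConst Double
  | EVar String
  | EAdd Expr Expr
  | EMul Expr Expr

-- Hypothetical variable environment: name -> current value.
type Env = Map.Map String Double

-- Baseline interpreter: pattern-matches on the whole tree at every call.
evaluate :: Expr -> Env -> Double
evaluate (EConst c) _   = c
evaluate (EVar v)   env = Map.findWithDefault 0 v env
evaluate (EAdd a b) env = evaluate a env + evaluate b env
evaluate (EMul a b) env = evaluate a env * evaluate b env

-- "Compiled" form: the tree is traversed when the closure is built,
-- and the resulting function only performs arithmetic and variable lookups.
compile :: Expr -> (Env -> Double)
compile (EConst c) = const c
compile (EVar v)   = Map.findWithDefault 0 v
compile (EAdd a b) =
  let fa = compile a   -- sub-closures are built outside the lambda
      fb = compile b
  in \env -> fa env + fb env
compile (EMul a b) =
  let fa = compile a
      fb = compile b
  in \env -> fa env * fb env
```

The intended usage would be to bind `let f = compile objective` once before the loop and call `f env` at each iteration, so the tree traversal is shared across iterations. Whether this actually beats the interpreter depends on how GHC optimises both versions, so it would need profiling. A different backend (e.g. one emitting CUDA code) could in principle replace `compile` while keeping the same `Expr` input.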