Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customized cost functions for faster contractions #171

Closed
jcmgray opened this issue Oct 23, 2021 · 4 comments
Closed

Customized cost functions for faster contractions #171

jcmgray opened this issue Oct 23, 2021 · 4 comments

Comments

@jcmgray
Copy link
Collaborator

jcmgray commented Oct 23, 2021

As part of some work on contracting some very large tensor networks we found that a good heuristic cost to optimize for is not just 'FLOPs' but also 'WRITE' - i.e. the size of the intermediate output tensors regardless of the cost to product it.

For example, an outer product with shapes like (d, 1), (1, d)->(d, d) has the same WRITE cost as matrix multiplication with shapes like (d, d), (d, d) -> (d, d) despite having a FLOPs cost d times lower.

If you take a cost as FLOPS + alpha * WRITE and optimize for this you can get some significant and fairly generic gains across hardware in terms of actual efficiency and contraction time. Relevant probably for e.g. #103.

Here (you can ignore what the contraction is and the 'index sorting' bit), we have FLOPS and WRITE on top and actual contraction time on bottom on a single cpu thread:

cpu-alpha-efficiency

You can see there's about a 5x speedup.

Here's the same for a larger contraction on an A100 (big fast gpu) with a speedup of 8x:

gpu-alpha-efficiency

In both cases choosing alpha=64 is very decent despite probably having different ratios of cpu cycles to memory speed etc etc. Generally there seems to be a window where neither the FLOPS or WRITE cost is compromised too much. EDIT: I should emphasize that the above contractions tend to produce very low arithmetic intensity contractions, so the speedups will not always be this large.

I've had this in my own branch of opt_einsum for a while (just for the dp optimizer, but which one can use to take any large contraction path/tree and iteratively 'reconfigure' subtrees of), but it would be nice to have it in the main opt_einsum and it also might be useful to others.

Currently the interface is like:

# use the default alpha (probably 64)
opt = DynamicProgramming(minimize='combo')

# use a specific alpha
opt = DynamicProgramming(minimize='combo-100')

@dgasmith, thoughts on including?

@jcmgray
Copy link
Collaborator Author

jcmgray commented Oct 23, 2021

Oh the other useful thing to note is that just taking the 'WRITE' cost - i.e. the sum of the sizes of all the intermediates - is the most relevant quantity for computing the space complexity of back propagating through a contraction (since roughly speaking all intermediates must be kept to compute gradients), which is often the more practical limit than actual time cost.

@jcmgray
Copy link
Collaborator Author

jcmgray commented Oct 23, 2021

Mostly for fun, here is the actual effect of changing alpha on the contraction trees (/paths):

trees

[Cost and size of contractions are node and edge width+color respectively.]

On the left is a alpha=0 contraction, with many low-efficiency contraction of a very large tensor with a very small tensor, and on the right is an alpha=64 contraction, where the effort is concentrated into fewer, more balanced and thus more efficient contractions.

@dgasmith
Copy link
Owner

This will be great to get in and seems to be an evolution of the different greedy approaches. We seem to have many choices that we wish to parametrize over time and give them simple names- an alternative approach could be leaning into the object-based count object such as:

cost_fn = CostFunction(alpha=64, ...)
cost = cost_fn(indices, removed, ...)

@jcmgray
Copy link
Collaborator Author

jcmgray commented Jan 17, 2022

These are now available with #181.

@jcmgray jcmgray closed this as completed Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants