Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache refactor to improve performance and footprint #1576

Closed
samuelchassot opened this issue Sep 5, 2024 · 2 comments
Closed

Cache refactor to improve performance and footprint #1576

samuelchassot opened this issue Sep 5, 2024 · 2 comments
Assignees

Comments

@samuelchassot
Copy link
Collaborator

Here are some ideas to improve the Stainless cache implementation to improve its performance and reduce its footprint:

  • Implement some subsumption to remove from VCs assumptions that were not necessary to prove the goal (would improve cache hit rate)
  • Implement LRU-based eviction policy to limit the size of the cache (improve performance to load and footprint)
  • Use hashing instead of storing the entire program + VC in the cache (reduce size even if reducing the trust; a good hash function would make collision very unlikely and is acceptable)
    see: https://stackoverflow.com/questions/46329956/how-to-correctly-generate-sha-256-checksum-for-a-string-in-scala/46332228#46332228
  • Replace the one-file current approach with a multiple files approach, using hash for names, in a similar way to what Apple is doing with MacOS internal files (would reduce the load time, make eviction easier to manage)
@vkuncak vkuncak added the feature label Sep 5, 2024
@vkuncak
Copy link
Collaborator

vkuncak commented Sep 5, 2024

Looks like the first thing to do is storing hashes. Copying and testing the post from stackoverflow, this works. We should perhaps keep bytes; strings are only for human inspection.

import java.security.MessageDigest
import java.util.HexFormat
object Test:
  val bytes = MessageDigest.getInstance("SHA-256")
    .digest("any string".getBytes("UTF-8"))

  @main
  def sha256: Unit =  
    val s = HexFormat.of().formatHex(bytes)
    println(s)
    // 1e57a452a094728c291bc42bf2bc7eb8d9fd8844d1369da2bf728588b46c4e75
    // same as output of:
    //    echo -n "any string" | openssl dgst -sha256

Someone could also write a dedicated hash table for storing fixed-length caches in a bytearray or long array. It should be notably faster than Hashmap, as we only need one bytearray to store hashes, and use the ending of the hash multiplied by some costant (by 32 if we have byte array) to index into this array.

@vkuncak vkuncak added the cache label Sep 6, 2024
@samuelchassot samuelchassot self-assigned this Oct 16, 2024
@samuelchassot
Copy link
Collaborator Author

Implemented in #1591

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants