Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New instruction string.hash #60

Open
Liedtke opened this issue Feb 2, 2023 · 5 comments
Open

New instruction string.hash #60

Liedtke opened this issue Feb 2, 2023 · 5 comments

Comments

@Liedtke
Copy link

Liedtke commented Feb 2, 2023

We have another instruction that is added to v8 for our performance experiments:

  • string.hash(str: string) -> int32, opcode 0xfbaa
  • returns an implementation defined 32 bit hash value

The reasoning for this instruction is that any JavaScript engine already requires hashed strings, so with stringrefs there is already space reserved in the object for it and there are fast engine-internal implementations available handling different kinds of internal string representations etc.
Most languages require hashes for data structures and with stringrefs they can't add a hash property to strings, so keeping an application-defined hash together with the string is difficult / impacts performance.

@dcodeIO
Copy link
Contributor

dcodeIO commented Feb 3, 2023

Wondering: Would it make sense to extend this concept so any kind of host reference can be hashed / reuse what only an engine in control of the references can do? Say, if one wants to implement a Set or Map in WebAssembly, and the key there is an anyref (that might also be a stringref) for example, then there is currently no good way to reliably generate a hash, since there is no numeric pointer and the contents are not necessarily known. Right now seems that one would import JS Set or Map to tap into the engine's hashing, but that won't be portable?

@jakobkummerow
Copy link
Collaborator

Extending built-in hashes to other objects could indeed make sense, but for WasmGC objects is most likely post-MVP material. It could well be that for stringrefs it'll be a post-MVP topic as well; for now we are just enabling experimentation.

Regarding application-defined-ness, it gets even worse: V8's hashes are seeded, and V8 allows embedders to randomize the hash seed on startup, which means that the same string will have a different hash when the program runs in a different process. AFAIK Chrome does not make use of this feature, but Node does (to protect against certain kinds of DoS attacks). It's not clear to me whether this would be a problem for Wasm use cases; it's also not clear to me what could be done differently if it is determined to be a problem: changing the internal hashing strategy would be a regression for use cases that want the randomization, storing an internal and a Wasm-exposed hash separately (1) would be nasty or impossible to implement and (2) would undo most of the efficiency benefits of making string.hash a built-in operation. So this is certainly something that needs to be thought through before standardizing string.hash.

@eqrion
Copy link
Contributor

eqrion commented Feb 6, 2023

SpiderMonkey does not have a pre-computed hash stored on most JS strings. It is computed on demand and then cached through other means.

Also, is this the first time in the web platform we'd directly expose a platform provided hash code for JS strings? If so, this seems like something that may need to be floated with TC39, and also exposed through the String prototype.

@jakobkummerow
Copy link
Collaborator

@eqrion:

V8 also computes string hashes on demand; we certainly shouldn't create an expectation or specification that string.hash is always just field load.

I agree with the concern about exposing internal hashes.

Feel free to drag your feet on implementing this instruction. We don't even know yet how helpful it'll be. We'll report back when we have performance results from experimenting!

@wingo
Copy link
Collaborator

wingo commented Feb 23, 2023

I think it may make sense to define the hash function. The reason is that you would like to be able to generate static hash dispatch tables for dispatch over a known set of strings:

(br_table $L0 $L1 ... $LN-1
   (i32.rem_u (string.hash (local.get $str)) (i32.const N))

But of course you will only be able to do that if you know the hash function at compile-time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants