RDF/XML parser could be easily optimized #25

Open
Tpt opened this issue Jul 17, 2020 · 2 comments

@Tpt
Collaborator

Tpt commented Jul 17, 2020

The current RDF/XML parser is quite naive: it copies the latest context each time an opening tag is read. I believe the parser could easily be sped up by avoiding these copies.
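
One possible way to avoid those copies, sketched under assumptions: keep the contexts on a stack of `Rc` pointers and allocate a new context only when an element actually changes inherited state. The `Context` fields, the `ContextStack` type, and the rule that only `xml:base` and `xml:lang` alter the context are hypothetical and chosen for illustration; the real parser's state is likely richer.

```rust
use std::rc::Rc;

/// Hypothetical inherited state (illustration only, not the parser's real type).
#[derive(Debug, Clone, PartialEq)]
struct Context {
    base_iri: String,
    language: Option<String>,
}

/// One entry per open element; unchanged contexts are shared, not copied.
struct ContextStack {
    stack: Vec<Rc<Context>>,
}

impl ContextStack {
    fn new(root: Context) -> Self {
        Self { stack: vec![Rc::new(root)] }
    }

    /// Called for an opening tag. The arguments are the values of the
    /// `xml:base` and `xml:lang` attributes, if the element carries them.
    fn push(&mut self, xml_base: Option<&str>, xml_lang: Option<&str>) {
        let parent = Rc::clone(self.stack.last().expect("root context is always present"));
        let next = if xml_base.is_none() && xml_lang.is_none() {
            // Common case: nothing changes, so only a reference count is bumped.
            parent
        } else {
            // Rare case: copy once and apply the overrides.
            let mut ctx = (*parent).clone();
            if let Some(base) = xml_base {
                ctx.base_iri = base.to_owned();
            }
            if let Some(lang) = xml_lang {
                ctx.language = Some(lang.to_owned());
            }
            Rc::new(ctx)
        };
        self.stack.push(next);
    }

    /// Called for a closing tag. The root context is never removed.
    fn pop(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    /// The context of the innermost open element.
    fn current(&self) -> &Context {
        self.stack.last().expect("root context is always present")
    }
}
```

With this shape, an opening tag without `xml:base` or `xml:lang` costs one reference-count increment instead of cloning the strings held in the context.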

@thadguidry

thadguidry commented Jan 1, 2021

Another area to consider is verifying that processor intrinsics are actually being used where they are available.

A few things I've thought of as I've perused your code:

Intrinsic String Compare within XML

#[target_feature(enable = "sse4.2")] (supported by Intel processors since Nehalem) could be used for much of the string comparison done in the XML parser.rs.
I don't know the Rust ecosystem well. I looked at the intrinsic functions listed at https://doc.rust-lang.org/std/intrinsics/index.html but didn't see any mm_cmp_xxxxx (string compare) functions there, so I'm not sure how Rust handles this. Perhaps it defers to LLVM at times, in which case the code would need to be conditionally aligned for that and given compiler hints. (I'm more familiar with how Java deals with this, with @IntrinsicCandidate annotations and so on. To see whether intrinsic methods are being used, and where in the compiled code, you add: -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining)

Here are the new string compare functions available with SSE4.2:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=898,2862,2861,2860,2863,2864,2865&techs=SSE4_2&cats=String%20Compare

and discussed in this developer article:

https://software.intel.com/content/www/us/en/develop/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4.html
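
For reference, Rust exposes the SSE4.2 string compare intrinsics in `std::arch::x86_64` (for example `_mm_cmpestri`) rather than in `std::intrinsics`. Below is a rough sketch of how one of them could be used to find the first byte of a haystack that belongs to a small set, with runtime feature detection and a scalar fallback. The function names `find_any`, `find_any_sse42`, and `find_any_scalar` are hypothetical, and this is not taken from the parser's code.

```rust
/// Portable fallback: index of the first byte of `haystack` contained in `set`.
fn find_any_scalar(haystack: &[u8], set: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| set.contains(b))
}

/// SSE4.2 version built on PCMPESTRI. `set` must hold at most 16 bytes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn find_any_sse42(haystack: &[u8], set: &[u8]) -> Option<usize> {
    use std::arch::x86_64::*;
    // Byte-wise "equal any" comparison, reporting the lowest matching index.
    const MODE: i32 = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT;

    debug_assert!(set.len() <= 16);
    let mut set_buf = [0u8; 16];
    set_buf[..set.len()].copy_from_slice(set);
    let needle = _mm_loadu_si128(set_buf.as_ptr() as *const __m128i);

    let mut offset = 0;
    for chunk in haystack.chunks(16) {
        // Copy into a padded buffer so the 16-byte load never reads out of bounds.
        // A tuned version would load full chunks directly from the haystack.
        let mut buf = [0u8; 16];
        buf[..chunk.len()].copy_from_slice(chunk);
        let block = _mm_loadu_si128(buf.as_ptr() as *const __m128i);
        // Explicit lengths mean padding bytes are ignored; 16 signals "no match".
        let idx = _mm_cmpestri::<MODE>(needle, set.len() as i32, block, chunk.len() as i32);
        if idx < 16 {
            return Some(offset + idx as usize);
        }
        offset += chunk.len();
    }
    None
}

/// Picks the SSE4.2 path at runtime when the CPU supports it.
pub fn find_any(haystack: &[u8], set: &[u8]) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if set.len() <= 16 && is_x86_feature_detected!("sse4.2") {
            // SAFETY: the required CPU feature was just detected.
            return unsafe { find_any_sse42(haystack, set) };
        }
    }
    find_any_scalar(haystack, set)
}
```

Whether this beats the scalar loop for the short names typical of XML would need benchmarking; the sketch only shows that the intrinsics are reachable from stable Rust.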

Intrinsic escaping with SIMD

Escaping in api\src\model.rs could possibly be made faster with SIMD instructions, along the lines of https://docs.rs/v_escape/0.15.0/v_escape/, which is written about in more detail at https://brandur.org/nanoglyphs/008-actix#simd-escape (or there might be something in Rust core or the ecosystem that can do that now). I also noticed this in the Rust 1.27 release notes from 2 years ago, https://github.com/rust-lang/rust/blob/master/RELEASES.md#libraries-22:

SIMD (Single Instruction Multiple Data) on x86/x86_64 is now stable. This includes arch::x86 & arch::x86_64 modules which contain SIMD intrinsics, a new macro called is_x86_feature_detected!, the #[target_feature(enable="")] attribute, and adding target_feature = "" to the cfg attribute.
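
To make the escaping idea concrete, here is a minimal sketch that uses SSE2 intrinsics from `std::arch::x86_64` to skip over runs of bytes that need no escaping, and falls back to a scalar scan on other architectures. The set of escaped characters (`\n`, `\r`, `"`, `\`) and the names `escape`, `first_special`, and `is_special` are assumptions for illustration; this is not the code in model.rs, and it is not how v_escape is implemented.

```rust
/// Bytes that this sketch escapes (an assumed set, for illustration).
fn is_special(b: u8) -> bool {
    matches!(b, b'\n' | b'\r' | b'"' | b'\\')
}

/// Index of the first byte that needs escaping, or `bytes.len()` if none does.
#[cfg(target_arch = "x86_64")]
fn first_special(bytes: &[u8]) -> usize {
    use std::arch::x86_64::*;
    let mut i = 0;
    while i + 16 <= bytes.len() {
        // SAFETY: SSE2 is part of the x86_64 baseline, and the 16 bytes
        // starting at `i` are inside the slice.
        let mask = unsafe {
            let v = _mm_loadu_si128(bytes.as_ptr().add(i) as *const __m128i);
            let nl = _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\n' as i8));
            let cr = _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\r' as i8));
            let quote = _mm_cmpeq_epi8(v, _mm_set1_epi8(b'"' as i8));
            let backslash = _mm_cmpeq_epi8(v, _mm_set1_epi8(b'\\' as i8));
            let any = _mm_or_si128(_mm_or_si128(nl, cr), _mm_or_si128(quote, backslash));
            // One bit per byte; bit k is set when byte k matched.
            _mm_movemask_epi8(any)
        };
        if mask != 0 {
            return i + mask.trailing_zeros() as usize;
        }
        i += 16;
    }
    // Scalar scan of the remaining tail (fewer than 16 bytes).
    let tail = &bytes[i..];
    i + tail.iter().position(|b| is_special(*b)).unwrap_or(tail.len())
}

/// Scalar fallback for other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn first_special(bytes: &[u8]) -> usize {
    bytes.iter().position(|b| is_special(*b)).unwrap_or(bytes.len())
}

/// Escapes `input`, copying unescaped runs in bulk.
pub fn escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while !rest.is_empty() {
        let n = first_special(rest.as_bytes());
        // Special bytes are ASCII, so `n` always falls on a char boundary.
        out.push_str(&rest[..n]);
        if n == rest.len() {
            break;
        }
        match rest.as_bytes()[n] {
            b'\n' => out.push_str("\\n"),
            b'\r' => out.push_str("\\r"),
            b'"' => out.push_str("\\\""),
            _ => out.push_str("\\\\"),
        }
        rest = &rest[n + 1..];
    }
    out
}
```

Because SSE2 is always available on x86_64, this variant needs no runtime detection; wider instruction sets would go through `is_x86_feature_detected!` and `#[target_feature]` as described in the release note above.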

@thadguidry

thadguidry commented Dec 18, 2022

@Tpt It looks like there is already a library, jetscii, that handles 8-bit or 16-bit characters, since it uses the PCMPESTRI and PCMPESTRM instructions on CPUs that support SSE4.2 (there are other string compare functions as well, as noted in my previous comment).
Some benchmarks are noted for quick-xml, which we already use in the parser.
So maybe an easy performance win?
The other areas might be serialization and hashing; I leave it to others to find appropriate SIMD libraries for those in the Rust ecosystem.
