Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental support for zstd compression. #3577

Merged
merged 32 commits into from
Apr 4, 2023

Conversation

mulugetam
Copy link
Contributor

@mulugetam mulugetam commented Jun 14, 2022

This PR adds experimental support for lz4 (native) and zstd (with and without dictionary) compression algorithms. The rationale for adding this support is discussed here: #3354.

Users would be able to set the compression type by setting the index.codec with the following values:

  • "LZ4" -- lz4 (native) compression.
  • "ZSTD" -- zstd compression, with dictionary.
  • "ZSTDNODICT" -- zstd compression, without dictionary.

The core classes are packaged under org.apache.lucene.codecs.experimental.

The compression algorithms have been tested and benchmarked using OpenSearch-Benchmark and StoredFieldsBenchmark . The results show significant gains in performance.

Signed-off-by: Mulugeta Mammo [email protected]

Description

A detailed description is available here: #3354

Issues Resolved

#3354

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 9d3cbe0dbd8e41daef93385b6c77aa2ca6c95d6b
Log 5973

Reports 5973

@willyborankin
Copy link
Contributor

Small thing so far:

  • You need to add signature files for new libs + licenses otherwise build fails :-)
    Take a look here: ZSTD snapshotting compression #2996 it will reduce my changes.
    ./gradlew precommit/check will help

@reta
Copy link
Collaborator

reta commented Jun 14, 2022

@dblock @nknize I think we should take the sandbox approach for experimental features. We have the ability to provide CodecService from the plugin(s), using sandbox plugin for that would make sense. Thoughts?

@mulugetam
Copy link
Contributor Author

mulugetam commented Jun 14, 2022

@willyborankin I have added signature files and licenses for the dependencies.

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 759642278598f5f17fe12af5a6efa052e06e9972
Log 5987

Reports 5987

@mulugetam
Copy link
Contributor Author

@reta the PR is labelled "experimental" just because we'd like to make it available for users together with the snapshotting compression that @willyborankin has been doing. The codecs work well and have significant performance gains.

@dblock
Copy link
Member

dblock commented Jun 14, 2022

@dblock @nknize I think we should take the sandbox approach for experimental features. We have the ability to provide CodecService from the plugin(s), using sandbox plugin for that would make sense. Thoughts?

I really don't know anything about this stuff, deferring to @nknize, at a high level, yes.

Overall this PR looks good to me.

@dblock
Copy link
Member

dblock commented Jun 14, 2022

@mulugetam You should get it to green either way, fix DCO, etc.

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 3de2f6b7a00dfe4f700f2d45e314f7a1ca53c7fa
Log 5997

Reports 5997

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 6999a5496e1955e4c632044cee86cddcdb50654b
Log 5998

Reports 5998

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success e95ae663dfd0cda0550d5bb7ffd589590b87312a
Log 6019

Reports 6019

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think license headers need updates, otherwise LGTM.

@dblock dblock requested review from mch2 and kartg June 15, 2022 13:13
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 1a558e751816fb87ddf54b3e529c65ab1a8c3691
Log 6027

Reports 6027

@mulugetam mulugetam requested a review from dblock June 15, 2022 22:29
Copy link
Contributor

@willyborankin willyborankin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments.

@willyborankin
Copy link
Contributor

@dblock @nknize I think we should take the sandbox approach for experimental features. We have the ability to provide CodecService from the plugin(s), using sandbox plugin for that would make sense. Thoughts?

IMHO I think we can add them as a new experimental feature with default behaviour as it works now.

@mulugetam
Copy link
Contributor Author

@willyborankin have fixed the issues your raised. Thanks :-)

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 9c02870d30fb33ae000402d2ace647beba069188
Log 6176

Reports 6176

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 566217cba00fa09825b71c0b8e2dcff414a1c371
Log 6177

Reports 6177

Copy link
Contributor

@willyborankin willyborankin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a small question.

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 11376fadc35b0e83b90bbfe4a5c621963e0c48f9
Log 6191

Reports 6191

Copy link
Contributor

@willyborankin willyborankin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @reta, @kartg, @dblock and @mch2 Could you please review this PR?

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@dblock
Copy link
Member

dblock commented Apr 3, 2023

I'm going to merge on 🟢

@dblock dblock changed the title Add experimental support for zstd and lz4 (native) compression algorithms. Add experimental support for zstd compression. Apr 3, 2023
@dblock
Copy link
Member

dblock commented Apr 3, 2023

@nknize we need you to dismiss your review/approve to merge

Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of good movement here. Still feels open but at least it's sandboxed. LGFN...

@mulugetam
Copy link
Contributor Author

@dblock Only the bc-fips vulnerability issue outstanding now. What is the plan? Remove it?

@dblock
Copy link
Member

dblock commented Apr 4, 2023

@dblock Only the bc-fips vulnerability issue outstanding now. What is the plan? Remove it?

Separate issue from this PR. Care to open one?

@dblock dblock merged commit f071c9b into opensearch-project:main Apr 4, 2023
@dblock dblock added the backport 2.x Backport to 2.x branch label Apr 4, 2023
@dblock
Copy link
Member

dblock commented Apr 4, 2023

I tagged this as backport to 2.x, watch that pass/fail/may need manual labor.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 4, 2023
* Add experimental support for zstd and lz4 (native) compression.

Add experimental support for zstd (with and without dictionary support) and lz4 (native) compressions as discussed in: #3354.

Users would be able to set the index.codec setting with the values "lz4_native" (for lz4 native), "zstd" (for zstd with dictionary), and "zstd_no_dict" (for zstd without a dictionary).

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix license issues, add tests for zstd, zstd_no_dict, and lz4_native compressions.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix DCO and and issues with CodecTests.java.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix forbidden api violation error for lz4-java.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix license headers. Remove and fix unnecessary fields.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix magic numbers. Use more restrictive access modifiers.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Use protected access modifier for Zstd and LZ4 compression mode classes.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Allow negative compression levels for zstd. Use more restrictive access modifiers.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Use a more restrictive permission for loading zstd-jni and lz4-java libraries.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Rename a file (follow a consistent version naming convention).

Signed-off-by: Mulugeta Mammo <[email protected]>

* Refactor and create a new custom-codecs sandbox module.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Remove blank lines.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Restore Lucene92CustomCodec to extend from FilterCodec.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Make use of the compressionLevel argument.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Make Lucene92CustomCodec abstract and use a package-private access modifier.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix missing JavaDoc issues. Remove unused field in PerFieldMappingPostingFormatCodec.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix lint errors.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix the description for the custom-codecs plugin.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix wildcard import and improve documentation.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Access control exception fixed. Removed lz4-java support for now.

- PRs were made to zstd-jni and lz4-java to use AccessController.doPrivileged.
- The zstd-jni PR is merged since version 1.5.4-1.
- The lz4-java support temporarily removed until the PR gets merged.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Upgrade plugin to use Lucene95Codec. Rename files accordingly.

- Upgrade plugin to use Lucene95Codec. Rename files accordingly.
- Fix lint issue with plugin-security.
- Remove thridPartyAudit that was there for supporting lz4-java.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Add test cases for compression/decompression. Other minor changes.

  - add test cases for compression/decompression.
  - rename package.
  - add a CHANGELOG entry.
  - add more checks for signed integer arithmetic.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Remove ES grant in plugin-security.policy. Fix minor javadoc issue.

  - Remove ES grant in plugin-security.policy file.
  - Replace @link and @see to fix javadoc error.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Upgrade jettison version to 1.5.4.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Update SHA for jettison 1.5.4.

Signed-off-by: Mulugeta Mammo <[email protected]>

---------

Signed-off-by: Mulugeta Mammo <[email protected]>
Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]>
(cherry picked from commit f071c9b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@reta
Copy link
Collaborator

reta commented Apr 4, 2023

@dblock Only the bc-fips vulnerability issue outstanding now. What is the plan? Remove it?

Separate issue from this PR. Care to open one?

@dblock it is already in work: #6629

mitrofmep pushed a commit to mitrofmep/OpenSearch that referenced this pull request Apr 5, 2023
* Add experimental support for zstd and lz4 (native) compression.

Add experimental support for zstd (with and without dictionary support) and lz4 (native) compressions as discussed in: opensearch-project#3354.

Users would be able to set the index.codec setting with the values "lz4_native" (for lz4 native), "zstd" (for zstd with dictionary), and "zstd_no_dict" (for zstd without a dictionary).

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix license issues, add tests for zstd, zstd_no_dict, and lz4_native compressions.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix DCO and and issues with CodecTests.java.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix forbidden api violation error for lz4-java.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix license headers. Remove and fix unnecessary fields.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix magic numbers. Use more restrictive access modifiers.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Use protected access modifier for Zstd and LZ4 compression mode classes.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Allow negative compression levels for zstd. Use more restrictive access modifiers.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Use a more restrictive permission for loading zstd-jni and lz4-java libraries.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Rename a file (follow a consistent version naming convention).

Signed-off-by: Mulugeta Mammo <[email protected]>

* Refactor and create a new custom-codecs sandbox module.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Remove blank lines.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Restore Lucene92CustomCodec to extend from FilterCodec.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Make use of the compressionLevel argument.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Make Lucene92CustomCodec abstract and use a package-private access modifier.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix missing JavaDoc issues. Remove unused field in PerFieldMappingPostingFormatCodec.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix lint errors.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix the description for the custom-codecs plugin.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Fix wildcard import and improve documentation.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Access control exception fixed. Removed lz4-java support for now.

- PRs were made to zstd-jni and lz4-java to use AccessController.doPrivileged.
- The zstd-jni PR is merged since version 1.5.4-1.
- The lz4-java support temporarily removed until the PR gets merged.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Upgrade plugin to use Lucene95Codec. Rename files accordingly.

- Upgrade plugin to use Lucene95Codec. Rename files accordingly.
- Fix lint issue with plugin-security.
- Remove thridPartyAudit that was there for supporting lz4-java.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Add test cases for compression/decompression. Other minor changes.

  - add test cases for compression/decompression.
  - rename package.
  - add a CHANGELOG entry.
  - add more checks for signed integer arithmetic.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Remove ES grant in plugin-security.policy. Fix minor javadoc issue.

  - Remove ES grant in plugin-security.policy file.
  - Replace @link and @see to fix javadoc error.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Upgrade jettison version to 1.5.4.

Signed-off-by: Mulugeta Mammo <[email protected]>

* Update SHA for jettison 1.5.4.

Signed-off-by: Mulugeta Mammo <[email protected]>

---------

Signed-off-by: Mulugeta Mammo <[email protected]>
Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]>
Signed-off-by: Valentin Mitrofanov <[email protected]>
reta pushed a commit that referenced this pull request Apr 5, 2023
* Add experimental support for zstd and lz4 (native) compression.

Add experimental support for zstd (with and without dictionary support) and lz4 (native) compressions as discussed in: #3354.

Users would be able to set the index.codec setting with the values "lz4_native" (for lz4 native), "zstd" (for zstd with dictionary), and "zstd_no_dict" (for zstd without a dictionary).



* Fix license issues, add tests for zstd, zstd_no_dict, and lz4_native compressions.



* Fix DCO and and issues with CodecTests.java.



* Fix forbidden api violation error for lz4-java.



* Fix license headers. Remove and fix unnecessary fields.



* Fix magic numbers. Use more restrictive access modifiers.



* Use protected access modifier for Zstd and LZ4 compression mode classes.



* Allow negative compression levels for zstd. Use more restrictive access modifiers.



* Use a more restrictive permission for loading zstd-jni and lz4-java libraries.



* Rename a file (follow a consistent version naming convention).



* Refactor and create a new custom-codecs sandbox module.



* Remove blank lines.



* Restore Lucene92CustomCodec to extend from FilterCodec.



* Make use of the compressionLevel argument.



* Make Lucene92CustomCodec abstract and use a package-private access modifier.



* Fix missing JavaDoc issues. Remove unused field in PerFieldMappingPostingFormatCodec.



* Fix lint errors.



* Fix the description for the custom-codecs plugin.



* Fix wildcard import and improve documentation.



* Access control exception fixed. Removed lz4-java support for now.

- PRs were made to zstd-jni and lz4-java to use AccessController.doPrivileged.
- The zstd-jni PR is merged since version 1.5.4-1.
- The lz4-java support temporarily removed until the PR gets merged.



* Upgrade plugin to use Lucene95Codec. Rename files accordingly.

- Upgrade plugin to use Lucene95Codec. Rename files accordingly.
- Fix lint issue with plugin-security.
- Remove thridPartyAudit that was there for supporting lz4-java.



* Add test cases for compression/decompression. Other minor changes.

  - add test cases for compression/decompression.
  - rename package.
  - add a CHANGELOG entry.
  - add more checks for signed integer arithmetic.



* Remove ES grant in plugin-security.policy. Fix minor javadoc issue.

  - Remove ES grant in plugin-security.policy file.
  - Replace @link and @see to fix javadoc error.



* Upgrade jettison version to 1.5.4.



* Update SHA for jettison 1.5.4.



---------



(cherry picked from commit f071c9b)

Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]>
@reta
Copy link
Collaborator

reta commented Apr 5, 2023

@mulugetam @dblock for 2.7.0, we should wait for ZSTD 1.5.5 update (the JNI one), they fixed rare corruption bug [1], thanks

[1] https://github.com/facebook/zstd/releases/tag/v1.5.5

mulugetam added a commit to mulugetam/documentation-website that referenced this pull request May 1, 2023
  - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577).

Signed-off-by: Mulugeta Mammo <[email protected]>
mulugetam added a commit to mulugetam/documentation-website that referenced this pull request May 1, 2023
  - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577).

Signed-off-by: Mulugeta Mammo <[email protected]>
Naarcha-AWS added a commit to opensearch-project/documentation-website that referenced this pull request Jun 5, 2023
* Add documentation for the custom-codecs plugin.

  - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577).

Signed-off-by: Mulugeta Mammo <[email protected]>

* Update _api-reference/index-apis/create-index.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _install-and-configure/plugins.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _install-and-configure/plugins.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Naarcha-AWS <[email protected]>
opensearch-trigger-bot bot pushed a commit to opensearch-project/documentation-website that referenced this pull request Jun 5, 2023
* Add documentation for the custom-codecs plugin.

  - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577).

Signed-off-by: Mulugeta Mammo <[email protected]>

* Update _api-reference/index-apis/create-index.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _install-and-configure/plugins.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _install-and-configure/plugins.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Naarcha-AWS <[email protected]>
(cherry picked from commit 817b730)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Naarcha-AWS added a commit to opensearch-project/documentation-website that referenced this pull request Jun 5, 2023
* Add documentation for the custom-codecs plugin.

  - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577).



* Update _api-reference/index-apis/create-index.md



* Update _install-and-configure/plugins.md



* Update _install-and-configure/plugins.md



---------




(cherry picked from commit 817b730)

Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <[email protected]>
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
)

* Add documentation for the custom-codecs plugin.

  - Add documentation for the custom-codecs plugin (opensearch-project/OpenSearch#3577).

Signed-off-by: Mulugeta Mammo <[email protected]>

* Update _api-reference/index-apis/create-index.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _install-and-configure/plugins.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _install-and-configure/plugins.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Mulugeta Mammo <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Naarcha-AWS <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants