diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt
index 92644131ea..bcae11fda2 100644
--- a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt
+++ b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt
@@ -84,6 +84,7 @@ Levenshtein
 [Oo]nboarding
 p\d{2}
 pebibyte
+[Pp]erformant
 [Pp]laintext
 [Pp]luggable
 [Pp]reconfigure
diff --git a/_community_members/sb2k16.md b/_community_members/sb2k16.md
new file mode 100644
index 0000000000..80d1c91eda
--- /dev/null
+++ b/_community_members/sb2k16.md
@@ -0,0 +1,24 @@
+---
+short_name: sb2k16
+name: Souvik Bose
+photo: '/assets/media/community/members/sb2k16.jpeg'
+title: 'OpenSearch Community Member: Souvik Bose'
+primary_title: Souvik Bose
+breadcrumbs:
+  icon: community
+  items:
+    - title: Community
+      url: /community/index.html
+    - title: Members
+      url: /community/members/index.html
+    - title: "Souvik Bose's Profile"
+      url: '/community/members/souvik-bose.html'
+github: sb2k16
+job_title_and_company: 'Software Development Engineer at AWS'
+personas:
+  - author
+permalink: '/community/members/souvik-bose.html'
+redirect_from: '/authors/sb2k16/'
+---
+
+Souvik Bose is a Software Development Engineer at AWS working on search services. He is a contributor to the Data Prepper project.
\ No newline at end of file
diff --git a/_posts/2024-10-15-Announcing-Data-Prepper-2.10.0.md b/_posts/2024-10-15-Announcing-Data-Prepper-2.10.0.md
new file mode 100644
index 0000000000..ba227909fb
--- /dev/null
+++ b/_posts/2024-10-15-Announcing-Data-Prepper-2.10.0.md
@@ -0,0 +1,146 @@
+---
+layout: post
+title: Announcing Data Prepper 2.10.0
+authors:
+- sb2k16
+- dvenable
+date: 2024-10-15 12:30:00 -0600
+categories:
+  - releases
+excerpt: Data Prepper 2.10.0 offers an OpenSearch _bulk API and reads from Amazon Kinesis.
+meta_keywords: Data Prepper, OpenSearch bulk API, Kinesis data streams, Kafka, SASL/SCRAM authentication, streaming data ingestion
+meta_description: Data Prepper 2.10.0 adds OpenSearch API and Kinesis Data Streams sources for seamless ingestion, plus Kafka SASL/SCRAM support and OpenTelemetry log parsing.
+---
+
+## Introduction
+
+Data Prepper 2.10 is now available!
+Two major features are a source that lets clients send data to Data Prepper using an API that mimics the OpenSearch `_bulk` API and the ability to read from Amazon Kinesis Data Streams.
+
+
+## OpenSearch API source
+
+Many existing OpenSearch clients that ingest data directly into OpenSearch can now send that data to Data Prepper first.
+This means that you can use Data Prepper's buffering and rich processor set before sending data to OpenSearch without having to change clients that use the OpenSearch `_bulk` API.
+A new Data Prepper source named `opensearch_api` accepts [OpenSearch Document API bulk operation](https://opensearch.org/docs/latest/api-reference/document-apis/bulk/) requests from clients over REST and ingests the data into OpenSearch.
+The behavior of this source is similar to that of the existing `http` source.
+It supports industry-standard encryption in the form of TLS/HTTPS and HTTP basic authentication.
+It also parses incoming requests and creates Data Prepper events and associated event metadata, making it compatible with the `opensearch` sink.
+The request body is compatible with the OpenSearch Document API bulk operation and supports all actions: index, create, delete, and update.
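Because the source behaves like the `http` source, you can secure it in the same way. The following sketch shows TLS and HTTP basic authentication enabled on the `opensearch_api` source. It is illustrative only: the option names (`ssl`, `ssl_certificate_file`, `ssl_key_file`, and `authentication`) are assumed to mirror the `http` source, and the certificate paths and credentials are placeholders, so check the Data Prepper documentation for the exact settings.

```
version: "2"
secured-opensearch-api-pipeline:
  source:
    opensearch_api:
      # Security options assumed to mirror the existing `http` source.
      ssl: true
      ssl_certificate_file: "/usr/share/data-prepper/certs/server.crt"
      ssl_key_file: "/usr/share/data-prepper/certs/server.key"
      authentication:
        http_basic:
          # Placeholder credentials that clients must supply with each request.
          username: "ingest-user"
          password: "ingest-password"
  sink:
    # Replace with a real sink, such as the `opensearch` sink shown in the full example that follows.
    - stdout:
```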
The following two request paths are now supported:

```
POST /_bulk
POST /<index>/_bulk
```

The second path specifies the index, so you don't need to include an index in the request body.

Additionally, the following OpenSearch Document API bulk operation query parameters are supported:

* `pipeline`
* `routing`

The following example demonstrates how to use the source:

```
version: "2"
opensearch-api-pipeline:
  source:
    opensearch_api:
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: "admin"
        password: "admin"
        index: "${getMetadata(\"opensearch_index\")}"
        action: "${getMetadata(\"opensearch_action\")}"
        document_id: "${getMetadata(\"opensearch_id\")}"
        routing: "${getMetadata(\"opensearch_routing\")}"
        pipeline: "${getMetadata(\"opensearch_pipeline\")}"
```

Consider the following example request:

```
POST _bulk
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
```

This request will be ingested into OpenSearch, and a new document will be created in the `movies` index with the document ID `tt1979320` and the document source `{ "title": "Rush", "year": 2013 }`.

The Data Prepper maintainers are interested in further expanding this source to support other indexing APIs, allowing it to stand in for an OpenSearch cluster in ingestion workloads.
To learn more or provide feedback, see [Provide an OpenSearch API source #4180](https://github.com/opensearch-project/data-prepper/issues/4180).


## Kinesis source

[Amazon Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/introduction.html) is a high-speed streaming data service.
Data Prepper 2.10 also introduces a new source named `kinesis` that ingests stream records from multiple Kinesis data streams into OpenSearch clusters.
You can configure it to start reading from either the oldest untrimmed record or the most recent record.
Moreover, if you enable end-to-end acknowledgments, the Kinesis data streams are checkpointed to prevent duplicate processing of records.

The following is an example pipeline:

```
version: "2"
kinesis-pipeline:
  source:
    kinesis:
      codec:
        newline:
      streams:
        - stream_name: "MyStream1"
          initial_position: LATEST
          checkpoint_interval: "PT5M"
        - stream_name: "MyStream2"
          # Use EARLIEST to start ingestion from the beginning of the stream.
          initial_position: EARLIEST
      consumer_strategy: "polling"
      polling:
        max_polling_records: 100
        idle_time_between_reads: "250ms"
  sink:
    - opensearch:
        # A sink is required; this one mirrors the earlier example and uses a placeholder index name.
        hosts: ["https://opensearch:9200"]
        index: "kinesis-index"
```


## Other features and improvements

Data Prepper 2.10 has introduced a number of other improvements:

* The `kafka` source now supports authentication with an Apache Kafka cluster using SASL/SCRAM in addition to the SASL/PLAIN authentication provided in previous versions.
* Data Prepper can now parse OpenTelemetry logs from sources such as Amazon Simple Storage Service (Amazon S3). The new `otel_logs` codec parses data from OpenTelemetry Protocol (OTLP) JSON-formatted files. You can now write OpenTelemetry logs using the [AWS S3 Exporter for OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/awss3exporter/README.md) and read them with Data Prepper, as shown in the example following this list.
* The maintainers have also improved performance by adding an internal cache for event keys. Data Prepper administrators can configure this cache as necessary.
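To illustrate the new codec, the following pipeline sketch reads OTLP JSON files from Amazon S3 and parses them with the `otel_logs` codec. The SQS queue URL, AWS Region, IAM role ARN, and index name are placeholders, and the surrounding `s3` source and `opensearch` sink options reflect their existing configuration formats rather than anything new in this release.

```
version: "2"
otel-logs-s3-pipeline:
  source:
    s3:
      # Parse OTLP JSON objects written by the OpenTelemetry Collector's S3 exporter.
      codec:
        otel_logs:
      compression: "none"
      notification_type: "sqs"
      sqs:
        # Placeholder queue that receives the bucket's object-created notifications.
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/otel-logs-queue"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-s3-role"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        # Placeholder index for the ingested OpenTelemetry logs.
        index: "otel-logs"
```

In this setup, the OpenTelemetry Collector's S3 exporter writes OTLP JSON objects to the bucket, and Data Prepper picks them up through the bucket's event notifications delivered to the SQS queue.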
+ + +## Next steps + +* To download Data Prepper, visit the [OpenSearch downloads](https://opensearch.org/downloads.html) page. +* For instructions on how to get started with Data Prepper, see [Getting started with Data Prepper](https://opensearch.org/docs/latest/data-prepper/getting-started/). +* To learn more about the work in progress for Data Prepper 2.11 and other releases, see the [Data Prepper Project Roadmap](https://github.com/orgs/opensearch-project/projects/221). + +## Thanks to our contributors! + +The following community members contributed to this release. Thank you! + +* [chenqi0805](https://github.com/chenqi0805) -- Qi Chen +* [danhli](https://github.com/danhli) -- Daniel Li +* [dependabot[bot]](https://github.com/apps/dependabot) +* [dinujoh](https://github.com/dinujoh) -- Dinu John +* [dlvenable](https://github.com/dlvenable) -- David Venable +* [franky-m](https://github.com/franky-m) +* [graytaylor0](https://github.com/graytaylor0) -- Taylor Gray +* [jayeshjeh](https://github.com/jayeshjeh) -- Jayesh Parmar +* [KarstenSchnitter](https://github.com/KarstenSchnitter) -- Karsten Schnitter +* [kkondaka](https://github.com/kkondaka) -- Krishna Kondaka +* [LeeroyHannigan](https://github.com/LeeroyHannigan) -- Lee +* [linghengqian](https://github.com/linghengqian) -- Ling Hengqian +* [oeyh](https://github.com/oeyh) -- Hai Yan +* [quanghungb](https://github.com/quanghungb) -- qhung +* [san81](https://github.com/san81) -- Santhosh Gandhe +* [sb2k16](https://github.com/sb2k16) -- Souvik Bose +* [shenkw1](https://github.com/shenkw1) -- Katherine Shen +* [srikanthjg](https://github.com/srikanthjg) -- Srikanth Govindarajan diff --git a/assets/media/community/members/sb2k16.jpeg b/assets/media/community/members/sb2k16.jpeg new file mode 100644 index 0000000000..e607f7ff64 Binary files /dev/null and b/assets/media/community/members/sb2k16.jpeg differ