[YAML] - Kafka write and RAW format #29160

Merged 2 commits on Nov 1, 2023

ffernandez92
Contributor

addresses #28664

@ffernandez92
Contributor Author

ffernandez92 commented Oct 27, 2023

This pull request completes the Kafka integration for YAML by adding the write transform and RAW format support.

In addition to the unit tests, I ran a quick test on Dataflow (DF), and everything works as expected.

[Image: Screenshot 2023-10-27 at 11 41 12]

This is the YAML config I used (recorded here for documentation purposes):

pipeline:
  transforms:
    - type: ReadFromKafka
      config:
        topic: input_topic
        format: RAW
        bootstrap_servers: <kafka_bootstrap_servers>
    - type: WriteToKafka
      config:
        topic: output_topic
        format: RAW
        bootstrap_servers: <kafka_bootstrap_servers>
      input: ReadFromKafka.output

@robertwb Could you please direct me to the location where the YAML documentation is being authored? I've noticed that there are some examples in the README files, but I'm unsure if there is any documentation on the Beam website or if the two are related in any way.

@codecov

codecov bot commented Oct 27, 2023

Codecov Report

Merging #29160 (400ef71) into master (e98e37f) will decrease coverage by 0.04%.
Report is 35 commits behind head on master.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #29160      +/-   ##
==========================================
- Coverage   38.36%   38.33%   -0.04%     
==========================================
  Files         687      688       +1     
  Lines      101741   101844     +103     
==========================================
+ Hits        39036    39037       +1     
- Misses      61126    61228     +102     
  Partials     1579     1579              
Flag Coverage Δ
python 29.93% <ø> (-0.05%) ⬇️

see 5 files with indirect coverage changes

@github-actions
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @liferoad for label python.
R: @robertwb for label java.
R: @bvolpato for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@ffernandez92
Contributor Author

Run Python_Coverage PreCommit

@ffernandez92
Contributor Author

R: @brucearctor

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@ffernandez92
Contributor Author

@brucearctor the Python tests / Python Unit Tests (macos-latest, 3.8, py38) check has failed (the failure doesn't make sense), but it doesn't have the usual "Run" option in the comment section to retry. Do you have permission to re-run it?

@robertwb
Contributor

Looks like all tests are passing now.

As for documentation, yeah, I've been putting things in markdown files right in the directory as a start before we're ready to call this stable (which is coming up with 2.52). I've filed #29165 which could probably have sub-issues filed.

Contributor

@robertwb robertwb left a comment

Thanks!

final SerializableFunction<Row, byte[]> toBytesFn;
if (configuration.getFormat().equals("RAW")) {
  int numFields = inputSchema.getFields().size();
  if (numFields != 1) {
Contributor

Perhaps check its type as well?

Contributor

We can do this as a follow-up.
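
For illustration, here is a minimal sketch of what the suggested follow-up check might look like. It uses plain Java stand-ins (`Field`, `FieldType`, `rawPayloadFieldName`) instead of Beam's `Schema` classes, so every name below is hypothetical and this is not the code in this PR. It also assumes that RAW expects the single field to be bytes-typed.

```java
import java.util.Collections;
import java.util.List;

public class RawFormatCheck {
  // Hypothetical stand-in for a schema field type (not Beam's Schema.FieldType).
  public enum FieldType { BYTES, STRING, INT64 }

  // Hypothetical stand-in for a schema field (not Beam's Schema.Field).
  public static final class Field {
    final String name;
    final FieldType type;

    public Field(String name, FieldType type) {
      this.name = name;
      this.type = type;
    }
  }

  // Returns the name of the single payload field, or throws if the
  // schema does not fit the assumed RAW shape: exactly one BYTES field.
  public static String rawPayloadFieldName(List<Field> fields) {
    int numFields = fields.size();
    if (numFields != 1) {
      throw new IllegalArgumentException(
          "Expecting exactly one field, found " + numFields);
    }
    Field field = fields.get(0);
    // The extra check suggested in review: validate the type as well.
    if (field.type != FieldType.BYTES) {
      throw new IllegalArgumentException(
          "Expecting a BYTES field for RAW format, found " + field.type);
    }
    return field.name;
  }

  public static void main(String[] args) {
    List<Field> ok = Collections.singletonList(new Field("payload", FieldType.BYTES));
    System.out.println(rawPayloadFieldName(ok));

    List<Field> wrongType = Collections.singletonList(new Field("name", FieldType.STRING));
    try {
      rawPayloadFieldName(wrongType);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast at construction time would surface a schema mismatch before the pipeline starts writing to Kafka, rather than as a cast error on the first record.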

@@ -47,6 +50,9 @@ public class KafkaWriteSchemaTransformProviderTest {

private static final Schema BEAMSCHEMA =
    Schema.of(Schema.Field.of("name", Schema.FieldType.STRING));

private static final Schema BEAMRAWSCHEMA =
Contributor

Nit: Why aren't these CAP_UNDERSCORE_CASE?

@robertwb robertwb merged commit 9e65a10 into apache:master Nov 1, 2023
99 checks passed
@ffernandez92 ffernandez92 deleted the kafka-yaml-write branch April 23, 2024 13:38