sync with open source how #118

Draft: wants to merge 5,274 commits into base: li_trunk
Conversation

lesterhaynes

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

Polber and others added 21 commits October 2, 2024 15:36
* [yaml] package kafka_clients 3.1.2 in Kafka Provider jar

Signed-off-by: Jeffrey Kinard <[email protected]>

* fix test failure

Signed-off-by: Jeffrey Kinard <[email protected]>

---------

Signed-off-by: Jeffrey Kinard <[email protected]>
* Delete stale and unused .test-infra/pipelines directory

* Update settings.gradle.kts to remove test-infra-pipelines
* add documentation to IcebergIO's java class

* add example; trigger ITs

* nit
* Update direct_runner.py

Improve the error message for DirectRunner.

* Update direct_runner.py
* Optimize to skip filter application when there is only a single output

* Make SparkTransformOverrides class public for testing

* add related test

* Touch trigger files

* add CHANGES.md
* Adding insertion and enrichment pipeline

* Enhanced Data Schema

* Added Apache License to the notebook

* Adding Chunking Strategy

* removed unused imports

* Modified insertion logic in redis for incorporating chunking strategy

* refactored redis code

* code review changes

* Added chunking code in notebook

* Added code review changes

* Code review changes: using chunking strategy as enum

* Added Code Review Changes

* Code review changes

* Added code review changes

* Added Code Review Changes

* Code review changes

* Ingestion and Enrichment pipeline for OpenSearch Vector DB

* Added logic for reading password from .env file

* Added opensearch vector notebook

* Update credentials.env

* Added code review changes

* Added Description in opensearch notebook

* Added description in opensearch notebook

* Code review changes
Bumps [go.mongodb.org/mongo-driver](https://github.com/mongodb/mongo-go-driver) from 1.17.0 to 1.17.1.
- [Release notes](https://github.com/mongodb/mongo-go-driver/releases)
- [Commits](mongodb/mongo-go-driver@v1.17.0...v1.17.1)

---
updated-dependencies:
- dependency-name: go.mongodb.org/mongo-driver
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The original example was not actually counting the produce but grouping the produce per season. Maybe it's better to rename the variables to reflect this, in order to not confuse the reader.
1. Added links for different handlers and removed code for unreachable conditions
2. Removed patch decorator in test
liferoad and others added 30 commits October 28, 2024 16:43
* Remove remaining Python 3.8 Artifacts

* Clear container files

* Move TensorRT suite to 3.10
* Suppress future warning affecting Dataflow notebook

* fix lint
* use ExecutorService instead of ScheduledExecutorService which swallows exceptions into futures that were not examined

Co-authored-by: Arun Pandian <[email protected]>
* Copy in correct requirements file

* Trigger postcommit
* Reapply disable Gradle cache for expansion service

This reverts commit 379dcd4.

* trigger test
* Upgrade mypy to version 1.13.0

* formatting, yaml io fix
Update MLTransform code to PEP 585 types
…ks (#32856)

Bumps [github.com/nats-io/nats-server/v2](https://github.com/nats-io/nats-server) from 2.10.18 to 2.10.22.
- [Release notes](https://github.com/nats-io/nats-server/releases)
- [Changelog](https://github.com/nats-io/nats-server/blob/main/.goreleaser.yml)
- [Commits](nats-io/nats-server@v2.10.18...v2.10.22)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats-server/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [yaml] Enhance YAML API docs

Signed-off-by: Jeffrey Kinard <[email protected]>

* use single html file

Signed-off-by: Jeffrey Kinard <[email protected]>

* fix test failures

Signed-off-by: Jeffrey Kinard <[email protected]>

* rebase on master

Signed-off-by: Jeffrey Kinard <[email protected]>

---------

Signed-off-by: Jeffrey Kinard <[email protected]>
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.202.0 to 0.203.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](googleapis/google-api-go-client@v0.202.0...v0.203.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* playground precommit move to selfhosted and update

* playground precommit move to selfhosted and update
* Add JobServerOption for jar_cache_dir

Signed-off-by: s21.lee <[email protected]>

* fixed for job_server_test error

- add missing comma

Signed-off-by: s21.lee <[email protected]>

* fix error for missing comma

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix for unit test error

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix test error

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix error for typo

Signed-off-by: s21lee <[email protected]>

* fix test error

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix error

Signed-off-by: s21lee <[email protected]>

* fix error

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix error

Signed-off-by: s21lee <[email protected]>

* fix error

Signed-off-by: s21lee <[email protected]>

* fix error

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix error and lint

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

* fix lint

Signed-off-by: s21lee <[email protected]>

---------

Signed-off-by: s21.lee <[email protected]>
Signed-off-by: s21lee <[email protected]>
Co-authored-by: s21.lee <[email protected]>
* Update docs for MapState and SetState support update

* remove the code formatting

* clarify state support for the Dataflow runner

* clarify state support for the Dataflow runner

* Update website/www/site/data/capability_matrix.yaml

Co-authored-by: Danny McCormick <[email protected]>

---------

Co-authored-by: Danny McCormick <[email protected]>
…32979)

* The motivation for this change is to support caching in Apache Beam.

Apache Beam does the following:
- Pickle Python code
- Send the pickled source code to "worker" VMs
- The workers unpickle and execute the code

In the environment where these Beam pipelines execute, the source code
lives in a temporary directory whose name is random and keeps changing.
The source code paths relative to that temporary directory are constant.
Using absolute paths prevents pickled code from being cached because the
absolute path keeps changing. Using relative paths enables this caching
and promises significant resource savings and speed-ups.

Additionally, the absolute paths leak information about the directory
structure of the machine pickling the source code. When the pickled code
is passed across the network to another machine with a different
directory structure, the absolute paths may no longer be valid.

Relative paths are used rather than omitting the path entirely because
Python uses the co_filename attribute to create stack traces.
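
The idea described above can be sketched in plain Python. This is only an illustration and not the code from this change: the helper `relativize_code`, the `compile_under` function, and the directory names `build-a1b2` and `build-c3d4` are hypothetical. It assumes Python 3.8 or newer, where code objects provide a `replace()` method. The same source compiled under two different temporary roots ends up with identical `co_filename` values, and a stack trace still reports a usable relative path.

```python
import os
import sys
import traceback
import types


def relativize_code(code, root):
    """Return a copy of `code` whose co_filename is relative to `root`.

    Nested code objects (inner functions, lambdas, comprehensions) are
    stored in co_consts, so they are rewritten recursively.
    """
    new_consts = tuple(
        relativize_code(const, root) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    filename = code.co_filename
    if filename.startswith(root + os.sep):
        filename = os.path.relpath(filename, root)
    return code.replace(co_filename=filename, co_consts=new_consts)


SOURCE = "def boom():\n    raise ValueError('boom')\n"


def compile_under(root):
    # Simulate the same module living under a randomly named directory.
    return compile(SOURCE, os.path.join(root, "pkg", "mod.py"), "exec")


# Two hypothetical temporary roots standing in for two pipeline runs.
root_a = os.path.abspath(os.path.join(os.sep, "tmp", "build-a1b2"))
root_b = os.path.abspath(os.path.join(os.sep, "tmp", "build-c3d4"))

code_a = relativize_code(compile_under(root_a), root_a)
code_b = relativize_code(compile_under(root_b), root_b)

# The relative path is still what the traceback machinery reports.
namespace = {}
exec(code_a, namespace)
try:
    namespace["boom"]()
except ValueError:
    last_frame = traceback.extract_tb(sys.exc_info()[2])[-1]
    traceback_file = last_frame.filename
```

With the absolute prefix stripped, `code_a.co_filename` and `code_b.co_filename` are the same string, so anything keyed on the serialized code no longer varies with the temporary directory name, while `traceback_file` still points at `pkg/mod.py` relative to the source root.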

* The motivation for this change is to support caching in Apache Beam
for Google.

Apache Beam does the following:
- Pickle Python code
- Send the pickled source code to "worker" VMs
- The workers unpickle and execute the code

In the environment where these Beam pipelines execute, the source code
lives in a temporary directory whose name is random and keeps changing.
The source code paths relative to that temporary directory are constant.
Using absolute paths prevents pickled code from being cached because the
absolute path keeps changing. Using relative paths enables this caching
and promises significant resource savings and speed-ups.

Additionally, the absolute paths leak information about the directory
structure of the machine pickling the source code. When the pickled code
is passed across the network to another machine with a different
directory structure, the absolute paths may no longer be valid.

Relative paths are used rather than omitting the path entirely because
Python uses the co_filename attribute to create stack traces.

* Simplify.

---------

Co-authored-by: Robert Bradshaw <[email protected]>
* Remove usage of deprecated _serialize

* Correct assignment

* indentation

* lint

* fmt

* fmt