Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration for openhouse-java-iceberg-1.5-runtime and openhouse-spark-3.5-runtime #221

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jiang95-dev
Copy link
Collaborator

@jiang95-dev jiang95-dev commented Oct 3, 2024

Summary

Adding 2 integration artifacts: openhouse-java-iceberg-1.5-runtime and openhouse-spark-3.5-runtime. They are leveraging the iceberg-core:1.5.2 and iceberg-spark-runtime-3.5_2.12, and can only be compiled under Java 9+. This client side upgrade is the step after the server side upgrade, and apps upgrade will be the next step.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

There are many duplicating files. Files worth reviewing are the following:

  • build.gradle: Spark 3.5 needs jackson library to be at least 2.15.
  • openhouse.iceberg-conventions-1.5.gradle: This is a duplicate change from server upgrade.
  • java-runtime/build.gradle
  • java-itest/build.gradle
  • spark-runtime/build.gradle
  • spark-runtime: Changed the interface and implemented new functions for logical plans and execution plans to pass the build.
  • spark-itest/build.gradle: Added jvmArgs for Java 17.
  • spark-itest: Half of the tests need to be refactored since Spark 3.5 brought in some changes.

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

All unit tests and integration tests passed.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@jiang95-dev jiang95-dev changed the title [Draft] Add integration for openhouse-java-iceberg-1.5-runtime and openhouse-spark-3.5-runtime Add integration for openhouse-java-iceberg-1.5-runtime and openhouse-spark-3.5-runtime Oct 11, 2024
@jiang95-dev jiang95-dev marked this pull request as ready for review October 11, 2024 07:16
@@ -0,0 +1,15 @@
plugins {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we should layout directories like

integrations/java/iceberg-1.2/..
integrations/java/iceberg-1.5/..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➕ for maintaining backward compatibility, you can use settings.gradle's module renaming feature.

@@ -0,0 +1,579 @@
package com.linkedin.openhouse.javaclient;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is code duplication the only way? Have we considered inheritance to share the code?

Copy link
Collaborator

@HotSushi HotSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Levi for the PR. Most of this code seem to be duplicated, i've mentioned some tricks to reuse existing code, can we give that a try.
If there were any code changes, could you mark them with a github comment so that its easier to review?

force 'org.apache.orc:orc-core:1.8.3'
force 'com.google.guava:guava:31.1-jre'
if (it.path != ':integrations:spark-3.5:openhouse-spark-3.5-itest') {
configurations.all {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't this be overwritten in openhouse-spark-3.5-itest's build.gradle? and leave the changes in this file be..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration takes precedence over the subproject's build file.

@@ -0,0 +1,15 @@
plugins {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➕ for maintaining backward compatibility, you can use settings.gradle's module renaming feature.

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

public class OpenHouseTableOperationsTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not reuse the source from other module if its unchanged?

sourceSets {
    test {
        java {
            srcDir project(':other-module').sourceSets.test.java.srcDirs
        }
        resources {
            srcDir project(':other-module').sourceSets.test.resources.srcDirs
        }
    }
}

}

dependencies {
compileOnly project(':client:secureclient')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can declare dependency on java moduel for 1.2 if the code is not changing. basically:

dependencies {
         api (java-rntime_iceberg_1.2) {
                  transitive=false
        }
}

// except for source code written in the module. As a result,
// we remove chances of classpath conflicts during runtime/compiletime.
shadowJar {
dependencies {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shadow block will be thin if we reuse code from iceberg 1.2 module

* com.linkedin.openhouse.javaclient.OpenHouseMetricsReporter property
*/
@Slf4j
public class OpenHouseMetricsReporter implements MetricsReporter {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any code changes for this class?

package com.linkedin.openhouse.javaclient;

import com.google.common.annotations.VisibleForTesting;
import com.google.gson.Gson;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any code changes for this class?

@@ -0,0 +1,2 @@
com.linkedin.openhouse.relocated.org.springframework.http.codec.ClientCodecConfigurer=com.linkedin.openhouse.relocated.org.springframework.http.codec.support.DefaultClientCodecConfigurer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't be needed if we reuse 1.2 artifact

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants