Fix for custom logical types in DatastreamIO #1851

Merged
2 changes: 1 addition & 1 deletion pom.xml
@@ -365,7 +365,7 @@
<exclude>**/constants/**</exclude>
<exclude>**/CustomTransformationImplFetcher.*</exclude>
<exclude>**/JarFileReader.*</exclude>
- <exclude>**/CustomTransformationWithShardForIT.*</exclude>
+ <exclude>**/CustomTransformationWithShardFor*IT.*</exclude>
</excludes>
</configuration>
<executions>
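The widened exclude pattern is meant to cover both renamed transformer classes (`...ForLiveIT` and `...ForBulkIT`), and because `*` can match nothing it still covers the original name. The sketch below illustrates that matching behaviour with Java NIO globs. This is an approximation: Maven plugins resolve excludes with Ant-style patterns, which behave similarly for this pattern but are a different implementation, and the sample paths are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;

public class ExcludePatternSketch {
  // NIO glob used as a stand-in for the Ant-style exclude in pom.xml (assumption).
  private static final PathMatcher MATCHER =
      FileSystems.getDefault()
          .getPathMatcher("glob:**/CustomTransformationWithShardFor*IT.*");

  /** Returns true if the relative path would be caught by the exclude pattern. */
  static boolean isExcluded(String relativePath) {
    return MATCHER.matches(Paths.get(relativePath));
  }

  public static void main(String[] args) {
    // Hypothetical class-file paths, for illustration only.
    List<String> samples =
        List.of(
            "com/custom/CustomTransformationWithShardForLiveIT.class",
            "com/custom/CustomTransformationWithShardForBulkIT.class",
            "com/custom/CustomTransformationWithShardForIT.class",
            "com/custom/CustomShardIdFetcher.class");
    for (String sample : samples) {
      System.out.println(sample + " -> excluded=" + isExcluded(sample));
    }
  }
}
```

Only the last sample falls outside the pattern, so a single exclude entry is enough for every `CustomTransformationWithShardFor...IT` variant.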
v2/datastream-common/src/main/java/com/google/cloud/teleport/v2/datastream/transforms/FormatDatastreamRecordToJson.java
@@ -57,8 +57,11 @@
public static class CustomAvroTypes {
public static final String VARCHAR = "varchar";
public static final String NUMBER = "number";
public static final String TIME_INTERVAL_MICROS = "time-interval-micros";
}

static final String LOGICAL_TYPE = "logicalType";

static final Logger LOG = LoggerFactory.getLogger(FormatDatastreamRecordToJson.class);
static final DateTimeFormatter DEFAULT_DATE_FORMATTER = DateTimeFormatter.ISO_LOCAL_DATE;
static final DateTimeFormatter DEFAULT_TIMESTAMP_WITH_TZ_FORMATTER =
@@ -356,10 +359,19 @@

static void putField(
String fieldName, Schema fieldSchema, GenericRecord record, ObjectNode jsonObject) {
// fieldSchema.getLogicalType() returns an org.apache.avro.LogicalType, so it is
// null for custom logical types that Avro does not recognize.
if (fieldSchema.getLogicalType() != null) {
// Logical types should be handled separately.
handleLogicalFieldType(fieldName, fieldSchema, record, jsonObject);
return;
} else if (fieldSchema.getProp(LOGICAL_TYPE) != null) {
// Handling for custom logical types.
boolean isSupportedCustomType =
handleCustomLogicalType(fieldName, fieldSchema, record, jsonObject);

// [Codecov annotation] v2/datastream-common/src/main/java/com/google/cloud/teleport/v2/datastream/transforms/FormatDatastreamRecordToJson.java: added lines #L370-L371 were not covered by tests.
if (isSupportedCustomType) {
return;

// [Codecov annotation] Added line #L373 was not covered by tests.
}
}

switch (fieldSchema.getType()) {
@@ -419,6 +431,45 @@
}
}

static boolean handleCustomLogicalType(
String fieldName, Schema fieldSchema, GenericRecord element, ObjectNode jsonObject) {
if (fieldSchema.getProp(LOGICAL_TYPE).equals(CustomAvroTypes.TIME_INTERVAL_MICROS)) {
Long timeMicrosTotal = (Long) element.get(fieldName);
boolean isNegative = false;

// [Codecov annotation] Added lines #L437-L438 were not covered by tests.
if (timeMicrosTotal < 0) {
timeMicrosTotal *= -1;
isNegative = true;

// [Codecov annotation] Added lines #L440-L441 were not covered by tests.
}
Long nanoseconds = timeMicrosTotal * TimeUnit.MICROSECONDS.toNanos(1);
Long hours = TimeUnit.NANOSECONDS.toHours(nanoseconds);
nanoseconds -= TimeUnit.HOURS.toNanos(hours);
Long minutes = TimeUnit.NANOSECONDS.toMinutes(nanoseconds);
nanoseconds -= TimeUnit.MINUTES.toNanos(minutes);
Long seconds = TimeUnit.NANOSECONDS.toSeconds(nanoseconds);
nanoseconds -= TimeUnit.SECONDS.toNanos(seconds);
Long micros = TimeUnit.NANOSECONDS.toMicros(nanoseconds);

// [Codecov annotation] Added lines #L443-L450 were not covered by tests.
// Zero-pad hours, minutes, and seconds to at least two digits; hours may need more.
String timeString = String.format("%02d:%02d:%02d", hours, minutes, seconds);

// [Codecov annotation] Added line #L454 was not covered by tests.
if (micros > 0) {
// Zero-pad to six digits so that 1000 microseconds renders as .001000, not .1000.
timeString += String.format(".%06d", micros);

// [Codecov annotation] Added line #L456 was not covered by tests.
}
String resultString = isNegative ? "-" + timeString : timeString;
jsonObject.put(fieldName, resultString);
return true;

// [Codecov annotation] Added lines #L459-L460 were not covered by tests.
} else if (fieldSchema.getProp(LOGICAL_TYPE).equals(CustomAvroTypes.NUMBER)) {
String number = element.get(fieldName).toString();
jsonObject.put(fieldName, number);
return true;

// [Codecov annotation] Added lines #L462-L464 were not covered by tests.
} else if (fieldSchema.getProp(LOGICAL_TYPE).equals(CustomAvroTypes.VARCHAR)) {
String varcharValue = element.get(fieldName).toString();
jsonObject.put(fieldName, varcharValue);
return true;

// [Codecov annotation] Added lines #L466-L468 were not covered by tests.
}
return false;

// [Codecov annotation] Added line #L470 was not covered by tests.
}

static void handleLogicalFieldType(
String fieldName, Schema fieldSchema, GenericRecord element, ObjectNode jsonObject) {
// TODO(pabloem) Actually test this.
@@ -456,12 +507,6 @@
jsonObject.put(
fieldName,
timestamp.atOffset(ZoneOffset.UTC).format(DEFAULT_TIMESTAMP_WITH_TZ_FORMATTER));
- } else if (fieldSchema.getLogicalType().getName().equals(CustomAvroTypes.NUMBER)) {
- String number = (String) element.get(fieldName);
- jsonObject.put(fieldName, number);
- } else if (fieldSchema.getLogicalType().getName().equals(CustomAvroTypes.VARCHAR)) {
- String varcharValue = (String) element.get(fieldName);
- jsonObject.put(fieldName, varcharValue);
} else {
LOG.error(
"Unknown field type {} for field {} in {}. Ignoring it.",
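The `time-interval-micros` branch turns a signed microsecond count into a clock-style string. A minimal standalone sketch of that conversion follows, written with plain integer arithmetic instead of `TimeUnit`. The class and method names are hypothetical, and the six-digit padding of the fractional part is a choice made here so that 1000 microseconds reads as `.001000`.

```java
import java.util.Locale;

public class TimeIntervalMicrosSketch {
  private static final long MICROS_PER_SECOND = 1_000_000L;
  private static final long MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND;
  private static final long MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE;

  /**
   * Formats signed microseconds as [-]HH:mm:ss[.ffffff]. Hours are not wrapped at 24,
   * so intervals longer than a day keep their full hour count.
   * Note: Long.MIN_VALUE is not handled, because Math.abs overflows for it.
   */
  static String format(long totalMicros) {
    boolean negative = totalMicros < 0;
    long remaining = Math.abs(totalMicros);

    long hours = remaining / MICROS_PER_HOUR;
    remaining %= MICROS_PER_HOUR;
    long minutes = remaining / MICROS_PER_MINUTE;
    remaining %= MICROS_PER_MINUTE;
    long seconds = remaining / MICROS_PER_SECOND;
    long micros = remaining % MICROS_PER_SECOND;

    StringBuilder sb = new StringBuilder();
    if (negative) {
      sb.append('-');
    }
    sb.append(String.format(Locale.ROOT, "%02d:%02d:%02d", hours, minutes, seconds));
    if (micros > 0) {
      // Pad to six digits so the fraction keeps its place value.
      sb.append(String.format(Locale.ROOT, ".%06d", micros));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(format(29_730_000_000L)); // 08:15:30
    System.out.println(format(-3_600_000_000L)); // -01:00:00
    System.out.println(format(86_399_001_000L)); // 23:59:59.001000
    System.out.println(format(3_020_399_000_000L)); // 838:59:59
  }
}
```

Handling the sign separately keeps the division and remainder steps on non-negative values, which avoids negative minute or second components in the output.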
@@ -87,7 +87,7 @@ public void setUp() throws IOException, InterruptedException {
createAndUploadJarToGcs("DatatypeIT");
CustomTransformation customTransformation =
CustomTransformation.builder(
- "customTransformation.jar", "com.custom.CustomTransformationWithShardForIT")
+ "customTransformation.jar", "com.custom.CustomTransformationWithShardForLiveIT")
.build();
jobInfo =
launchDataflowJob(
@@ -410,7 +410,7 @@ private void assertAllDatatypeColumnsTableBackfillContents() {
row.put("decimal_column", "456.12");
row.put("datetime_column", "2024-02-08T08:15:30Z");
row.put("timestamp_column", "2024-02-08T08:15:30Z");
- row.put("time_column", "29730000000");
+ row.put("time_column", "08:15:30");
row.put("year_column", "2022");
// text, char, tinytext, mediumtext, longtext are BYTE columns
row.put("text_column", "/u/9n58P");
@@ -448,7 +448,7 @@ private void assertAllDatatypeColumnsTableBackfillContents() {
row.put("decimal_column", 123.45);
row.put("datetime_column", "2024-02-09T15:30:45Z");
row.put("timestamp_column", "2024-02-09T15:30:45Z");
- row.put("time_column", "55845000000");
+ row.put("time_column", "15:30:45");
row.put("year_column", "2023");
// text, char, tinytext, mediumtext, longtext are BYTE columns
row.put("text_column", "/u/9n58f");
@@ -496,7 +496,7 @@ private void assertAllDatatypeColumnsTableCdcContents() {
row.put("decimal_column", "456.12");
row.put("datetime_column", "2024-02-08T08:15:30Z");
row.put("timestamp_column", "2024-02-08T08:15:30Z");
- row.put("time_column", "29730000000");
+ row.put("time_column", "08:15:30");
row.put("year_column", "2022");
// text, char, tinytext, mediumtext, longtext are BYTE columns
row.put("text_column", "/u/9n58P");
@@ -545,7 +545,7 @@ private void assertAllDatatypeColumns2TableBackfillContents() {
row.put("decimal_column", 456.12);
row.put("datetime_column", "2024-02-08T08:15:30Z");
row.put("timestamp_column", "2024-02-08T08:15:30Z");
- row.put("time_column", "29730000000");
+ row.put("time_column", "08:15:30");
row.put("year_column", "2022");
row.put("char_column", "char_1");
// Source column value: 74696e79626c6f625f646174615f31 ( in BYTES, "tinyblob_data_1" in STRING)
@@ -578,7 +578,7 @@ private void assertAllDatatypeColumns2TableBackfillContents() {
row.put("decimal_column", 123.45);
row.put("datetime_column", "2024-02-09T15:30:45Z");
row.put("timestamp_column", "2024-02-09T15:30:45Z");
- row.put("time_column", "55845000000");
+ row.put("time_column", "15:30:45");
row.put("year_column", "2023");
row.put("char_column", "char_2");
row.put("tinyblob_column", "dGlueWJsb2JfZGF0YV8y");
@@ -621,8 +621,7 @@ private void assertAllDatatypeTransformationTableBackfillContents() {
row.put("decimal_column", 23457.78);
row.put("datetime_column", "2022-12-31T23:59:58Z");
row.put("timestamp_column", "2022-12-31T23:59:58Z");
- // TODO (b/349257952): update once TIME handling is made consistent for bulk and live.
- // row.put("time_column", "86399001000");
+ row.put("time_column", "00:59:59");
row.put("year_column", "2023");
row.put("blob_column", "V29ybWQ=");
row.put("enum_column", "1");
@@ -643,8 +642,7 @@ private void assertAllDatatypeTransformationTableBackfillContents() {
row.put("decimal_column", 34568.89);
row.put("datetime_column", "2023-12-31T23:59:59Z");
row.put("timestamp_column", "2023-12-31T23:59:59Z");
- // TODO (b/349257952): update once TIME handling is made consistent for bulk and live.
- // row.put("time_column", "1000");
+ row.put("time_column", "01:00:00");
row.put("year_column", "2025");
row.put("blob_column", "V29ybWQ=");
row.put("enum_column", "1");
@@ -665,8 +663,7 @@ private void assertAllDatatypeTransformationTableBackfillContents() {
row.put("decimal_column", 45679.90);
row.put("datetime_column", "2021-11-11T11:11:10Z");
row.put("timestamp_column", "2021-11-11T11:11:10Z");
- // TODO (b/349257952): update once TIME handling is made consistent for bulk and live.
- // row.put("time_column", "40271001000");
+ row.put("time_column", "12:11:11");
row.put("year_column", "2022");
row.put("blob_column", "V29ybWQ=");
row.put("enum_column", "1");
@@ -677,7 +674,7 @@ private void assertAllDatatypeTransformationTableBackfillContents() {

SpannerAsserts.assertThatStructs(
spannerResourceManager.runQuery(
- "SELECT varchar_column, tinyint_column, text_column, date_column, int_column, bigint_column, float_column, double_column, decimal_column, datetime_column, timestamp_column, year_column, blob_column, enum_column, bool_column, binary_column, bit_column FROM AllDatatypeTransformation"))
+ "SELECT varchar_column, tinyint_column, text_column, date_column, int_column, bigint_column, float_column, double_column, decimal_column, datetime_column, timestamp_column, time_column, year_column, blob_column, enum_column, bool_column, binary_column, bit_column FROM AllDatatypeTransformation"))
.hasRecordsUnorderedCaseInsensitiveColumns(events);
}

@@ -695,7 +692,7 @@ private void assertAllDatatypeTransformationTableCdcContents() {
row.put("decimal_column", 23456.79);
row.put("datetime_column", "2023-01-01T12:00:00Z");
row.put("timestamp_column", "2023-01-01T12:00:00Z");
- row.put("time_column", "43200000000");
+ row.put("time_column", "12:00:00");
row.put("year_column", "2023");
row.put("blob_column", "EjRWeJCrze8=");
row.put("enum_column", "3");
@@ -716,7 +713,7 @@ private void assertAllDatatypeTransformationTableCdcContents() {
row.put("decimal_column", 34567.90);
row.put("datetime_column", "2024-01-02T00:00:00Z");
row.put("timestamp_column", "2024-01-02T00:00:00Z");
- row.put("time_column", "3600000000");
+ row.put("time_column", "01:00:00");
row.put("year_column", "2025");
row.put("blob_column", "q83vEjRWeJA=");
row.put("enum_column", "1");
@@ -746,7 +743,7 @@ private void assertAllDatatypeColumns2TableCdcContents() {
row.put("decimal_column", 456.12);
row.put("datetime_column", "2024-02-08T08:15:30Z");
row.put("timestamp_column", "2024-02-08T08:15:30Z");
- row.put("time_column", "29730000000");
+ row.put("time_column", "08:15:30");
row.put("year_column", "2022");
row.put("char_column", "char_1");
// Source column value: 74696e79626c6f625f646174615f31 ( in BYTES, "tinyblob_data_1" in STRING)
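Where an old expectation is replaced directly by a new one, the new string is the same microsecond count rendered as a clock time. For values that fall within a single day this can be cross-checked with `java.time`, as in the sketch below (class and method names are hypothetical). `LocalTime` cannot represent negative values or values of 24 hours and above, both of which MySQL `TIME` allows, so this is only a sanity check for in-range test data and not a substitute for the template's own conversion.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.TimeUnit;

public class TimeColumnExpectationCheck {
  // Explicit pattern: LocalTime.toString() would drop a zero seconds field.
  private static final DateTimeFormatter HMS = DateTimeFormatter.ofPattern("HH:mm:ss");

  /** Converts a microseconds-since-midnight string to HH:mm:ss (0 <= value < 24h only). */
  static String microsToClock(String micros) {
    long nanos = TimeUnit.MICROSECONDS.toNanos(Long.parseLong(micros));
    return LocalTime.ofNanoOfDay(nanos).format(HMS);
  }

  public static void main(String[] args) {
    // Old expectation -> new expectation, taken from the test diff.
    System.out.println(microsToClock("29730000000")); // 08:15:30
    System.out.println(microsToClock("55845000000")); // 15:30:45
    System.out.println(microsToClock("43200000000")); // 12:00:00
    System.out.println(microsToClock("3600000000")); // 01:00:00
  }
}
```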
@@ -104,7 +104,7 @@ public void setUp() throws IOException, InterruptedException {
createAndUploadJarToGcs("shard1");
CustomTransformation customTransformation =
CustomTransformation.builder(
- "customTransformation.jar", "com.custom.CustomTransformationWithShardForIT")
+ "customTransformation.jar", "com.custom.CustomTransformationWithShardForLiveIT")
.build();
if (jobInfo1 == null) {
jobInfo1 =
@@ -99,7 +99,7 @@ public void setUp() throws IOException, InterruptedException {
createAndUploadJarToGcs(gcsResourceManager);
CustomTransformation customTransformation =
CustomTransformation.builder(
- "customTransformation.jar", "com.custom.CustomTransformationWithShardForIT")
+ "customTransformation.jar", "com.custom.CustomTransformationWithShardForLiveIT")
.build();
launchWriterDataflowJob(customTransformation);
}
@@ -81,7 +81,7 @@ public void simpleTest() throws Exception {
createAndUploadJarToGcs("CustomTransformationAllTypes");
CustomTransformation customTransformation =
CustomTransformation.builder(
- "customTransformation.jar", "com.custom.CustomTransformationWithShardForIT")
+ "customTransformation.jar", "com.custom.CustomTransformationWithShardForBulkIT")
.build();
jobInfo =
launchDataflowJob(
@@ -0,0 +1,145 @@
/*
* Copyright (C) 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy of
* the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
*/
package com.custom;

import com.google.cloud.teleport.v2.spanner.exceptions.InvalidTransformationException;
import com.google.cloud.teleport.v2.spanner.utils.ISpannerMigrationTransformer;
import com.google.cloud.teleport.v2.spanner.utils.MigrationTransformationRequest;
import com.google.cloud.teleport.v2.spanner.utils.MigrationTransformationResponse;
import java.text.SimpleDateFormat;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomTransformationWithShardForBulkIT implements ISpannerMigrationTransformer {

private static final Logger LOG =
LoggerFactory.getLogger(CustomTransformationWithShardForBulkIT.class);

@Override
public void init(String parameters) {
LOG.info("init called with {}", parameters);
}

@Override
public MigrationTransformationResponse toSpannerRow(MigrationTransformationRequest request)
throws InvalidTransformationException {
if (request.getTableName().equals("Customers")) {
Map<String, Object> row = new HashMap<>(request.getRequestRow());
row.put("full_name", row.get("first_name") + " " + row.get("last_name"));
row.put("migration_shard_id", request.getShardId() + "_" + row.get("id"));
MigrationTransformationResponse response = new MigrationTransformationResponse(row, false);
return response;
} else if (request.getTableName().equals("AllDatatypeTransformation")) {
Map<String, Object> row = new HashMap<>(request.getRequestRow());
// Filter event in case "varchar_column" = "example1"
if (row.get("varchar_column").equals("example1")) {
return new MigrationTransformationResponse(null, true);
}
// In case of update events, return request as response without any transformation
if (request.getEventType().equals("UPDATE-INSERT")) {
return new MigrationTransformationResponse(null, false);
}
// For backfill events, update the values of all columns in every row except the
// filtered row.
row.put("tinyint_column", (Long) row.get("tinyint_column") + 1);
row.put("text_column", row.get("text_column") + " append");
row.put("int_column", (Long) row.get("int_column") + 1);
row.put("bigint_column", (Long) row.get("bigint_column") + 1);
row.put("float_column", (double) row.get("float_column") + 1);
row.put("double_column", (double) row.get("double_column") + 1);
Double value = Double.parseDouble((String) row.get("decimal_column"));
row.put("decimal_column", String.valueOf(value + 1));
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("HH:mm:ss");
row.put("bool_column", 1);
row.put("enum_column", "1");
row.put("blob_column", "576f726d64");
row.put("binary_column", "0102030405060708090A0B0C0D0E0F1011121314");
row.put("bit_column", 13);
row.put("year_column", (Long) row.get("year_column") + 1);
try {
SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
SimpleDateFormat dateTimeFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssX");
Date date = dateFormat.parse((String) row.get("date_column"));
Calendar calendar = Calendar.getInstance();
calendar.setTime(date);
calendar.add(Calendar.DAY_OF_MONTH, 1);
row.put("date_column", dateFormat.format(calendar.getTime()));
Date dateTime = dateTimeFormat.parse((String) row.get("datetime_column"));
calendar.setTime(dateTime);
calendar.add(Calendar.SECOND, -1);
row.put("datetime_column", dateTimeFormat.format(calendar.getTime()));
dateTime = dateTimeFormat.parse((String) row.get("timestamp_column"));
calendar.setTime(dateTime);
calendar.add(Calendar.SECOND, -1);
row.put("timestamp_column", dateTimeFormat.format(calendar.getTime()));

} catch (Exception e) {
throw new InvalidTransformationException(e);
}

// These types are currently only used in bulk ITs for custom jars.
if (row.containsKey("varbinary_column")) {
row.put("varbinary_column", "0102030405060708090A0B0C0D0E0F1011121314");
}
if (row.containsKey("char_column")) {
row.put("char_column", "newchar");
}
if (row.containsKey("longblob_column")) {
row.put("longblob_column", "576f726d64");
}
if (row.containsKey("longtext_column")) {
row.put("longtext_column", row.get("longtext_column") + " append");
}
if (row.containsKey("mediumblob_column")) {
row.put("mediumblob_column", "576f726d64");
}
if (row.containsKey("mediumint_column")) {
row.put("mediumint_column", (Long) row.get("mediumint_column") + 1);
}
if (row.containsKey("mediumtext_column")) {
row.put("mediumtext_column", row.get("mediumtext_column") + " append");
}
if (row.containsKey("set_column")) {
row.put("set_column", "v3");
}
if (row.containsKey("smallint_column")) {
row.put("smallint_column", (Long) row.get("smallint_column") + 1);
}
if (row.containsKey("tinyblob_column")) {
row.put("tinyblob_column", "576f726d64");
}
if (row.containsKey("tinytext_column")) {
row.put("tinytext_column", row.get("tinytext_column") + " append");
}
if (row.containsKey("json_column")) {
row.put("json_column", "{\"k1\": \"v1\", \"k2\": \"v2\"}");
}
MigrationTransformationResponse response = new MigrationTransformationResponse(row, false);
return response;
}
return new MigrationTransformationResponse(null, false);
}

@Override
public MigrationTransformationResponse toSourceRow(MigrationTransformationRequest request)
throws InvalidTransformationException {
return new MigrationTransformationResponse(null, false);
}
}
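The transformer above shifts `date_column` forward by one day and `datetime_column` and `timestamp_column` back by one second using `SimpleDateFormat` and `Calendar`, which parse and format in the JVM's default time zone unless one is set explicitly. As an alternative sketch (not the code in this PR), the same arithmetic can be written with `java.time`, which keeps the offset from the input string. The class name, method names, and sample inputs are hypothetical.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class DateShiftSketch {
  // Pattern letter X prints "Z" for a zero offset, matching the strings used in the tests.
  private static final DateTimeFormatter DATE_TIME =
      DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssX");

  /** Adds one day to an ISO date such as 2023-12-31. */
  static String plusOneDay(String isoDate) {
    return LocalDate.parse(isoDate).plusDays(1).toString();
  }

  /** Subtracts one second from an ISO offset date-time such as 2023-01-01T00:00:00Z. */
  static String minusOneSecond(String isoDateTime) {
    return OffsetDateTime.parse(isoDateTime).minusSeconds(1).format(DATE_TIME);
  }

  public static void main(String[] args) {
    System.out.println(plusOneDay("2023-12-31")); // 2024-01-01
    System.out.println(minusOneSecond("2023-01-01T00:00:00Z")); // 2022-12-31T23:59:59Z
  }
}
```

Because the offset travels with the parsed value, the result does not depend on the time zone of the worker that runs the transformation.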