-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a node id to all spans #896
Add a node id to all spans #896
Conversation
…r random UUID (as per env variables). Put that worker/node id into all of the spans as a resource and continue to use it to thread the MDC environment for the work coordinator. Signed-off-by: Greg Schohn <[email protected]>
6a2d28e
to
be81e7d
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #896 +/- ##
============================================
- Coverage 78.66% 78.58% -0.09%
- Complexity 2515 2517 +2
============================================
Files 386 387 +1
Lines 14981 15012 +31
Branches 920 923 +3
============================================
+ Hits 11785 11797 +12
- Misses 2636 2651 +15
- Partials 560 564 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@gregschohn Thanks for doing this - just in time for me to use. I'm adding this to my review queue. |
var workCoordinator = new OpenSearchWorkCoordinator( | ||
new CoordinateWorkHttpClient(connectionContext), | ||
TOLERABLE_CLIENT_SERVER_CLOCK_DIFFERENCE_SECONDS, | ||
workerId | ||
); | ||
MDC.put(LOGGING_MDC_WORKER_ID, workerId); // I don't see a need to clean this up since we're in main | ||
TryHandlePhaseFailure.executeWithTryCatch(() -> { | ||
log.info("Running RfsMigrateDocuments with workerId = " + workerId); | ||
log.info("Running RfsMigrateDocuments"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we can't still log the workerId, even if it's the nodeInstanceName?
// @Test | ||
// public void testDocumentMigrationForBigMonolithicShardWorks() throws Exception { | ||
// testDocumentMigration(1, | ||
// SearchClusterContainer.OS_V2_14_0.getImageName(), | ||
// SearchClusterContainer.OS_V2_14_0, | ||
// GENERATOR_BASE_IMAGE, | ||
// new String[]{"tail", "-f", "/dev/null"}); | ||
// } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this here/commented out?
final var workerId = UUID.randomUUID().toString(); | ||
System.err.println("Starting Traffic Replayer with id=" + workerId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing UUID workerIds here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should use the ProcessHelpers.getNodeInstanceName()
to get the id
@@ -18,7 +20,15 @@ public class RootWorkCoordinationContext extends RootOtelContext { | |||
public final WorkCoordinationContexts.AcquireNextWorkItemContext.MetricInstruments acquireNextWorkMetrics; | |||
|
|||
public RootWorkCoordinationContext(OpenTelemetry sdk, IContextTracker contextTracker) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets remove this overload to prevent null from leaking out
final var workerId = UUID.randomUUID().toString(); | ||
System.err.println("Starting Traffic Replayer with id=" + workerId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we should use the ProcessHelpers.getNodeInstanceName()
to get the id
) | ||
.setMeterProvider( | ||
SdkMeterProvider.builder().setResource(serviceResource).registerMetricReader(metricReader).build() | ||
SdkMeterProvider.builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 The new format is much easier to read.
Give me newlines + nesting that make it easy to see what an object is composed of.
@@ -35,13 +35,9 @@ public class RootOtelContext implements IRootOtelContext { | |||
|
|||
public static OpenTelemetry initializeOpenTelemetryForCollector( | |||
@NonNull String collectorEndpoint, | |||
@NonNull String serviceName | |||
@NonNull String serviceName, | |||
String nodeName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why isn't this nonnull too?
public static String getNodeInstanceName() { | ||
String id = null; | ||
try { | ||
id = System.getenv("ECS_TASK_ID"); // for ECS deployments | ||
if (id != null) { | ||
return id; | ||
} | ||
id = System.getenv("HOSTNAME"); // for any kubernetes deployed pod | ||
if (id != null) { | ||
return id; | ||
} | ||
// add additional fallbacks here | ||
id = DEFAULT_NODE_ID; | ||
return id; | ||
} finally { | ||
if (id != null) { | ||
String finalId = id; | ||
log.atInfo().setMessage(() -> "getNodeInstanceName()=" + finalId).log(); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
public static String getNodeInstanceName() { | |
String id = null; | |
try { | |
id = System.getenv("ECS_TASK_ID"); // for ECS deployments | |
if (id != null) { | |
return id; | |
} | |
id = System.getenv("HOSTNAME"); // for any kubernetes deployed pod | |
if (id != null) { | |
return id; | |
} | |
// add additional fallbacks here | |
id = DEFAULT_NODE_ID; | |
return id; | |
} finally { | |
if (id != null) { | |
String finalId = id; | |
log.atInfo().setMessage(() -> "getNodeInstanceName()=" + finalId).log(); | |
} | |
} | |
} | |
public static String getNodeInstanceName() { | |
var nodeId = Optional.of("ECS_TASK_ID").map(System::getenv) | |
.or(() -> Optional.of("HOSTNAME").map(System::getenv)) | |
.orElseGet(DEFAULT_NODE_ID); | |
log.atInfo().setMessage(() -> "getNodeInstanceName()=" + nodeId).log(); | |
return nodeId; | |
} |
|
Signed-off-by: Greg Schohn <[email protected]> # Conflicts: # DocumentsFromSnapshotMigration/src/main/java/com/rfs/RfsMigrateDocuments.java # DocumentsFromSnapshotMigration/src/test/java/com/rfs/ParallelDocumentMigrationsTest.java # MetadataMigration/src/main/java/com/rfs/MetadataMigration.java
Signed-off-by: Greg Schohn <[email protected]>
9cca6d1
to
8847c03
Compare
DocumentsFromSnapshotMigration/src/main/java/com/rfs/RfsMigrateDocuments.java
Show resolved
Hide resolved
Also removed the empty MetadataMigration file that was a result of the last merge (the file was moved) Signed-off-by: Greg Schohn <[email protected]>
3655888
to
fd9b306
Compare
The value is either an ECS taskId, a Kubernetes hostname, or random UUID (as per env variables). Adding something more dynamic like shardId isn't possible to do as a resource as those need to be defined at the time that the SDK is initialized.
Put that worker/node id into all of the spans as a resource and continue to use it to thread the MDC environment for the work coordinator.
Description
Issues Resolved
https://opensearch.atlassian.net/browse/MIGRATIONS-1940
Testing
gradle tests
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.