Annotation based telemetry #691

peternied · 2024-05-31T15:10:40Z

Description

This is an alternative approach to adding telemetry by using annotations. I've only validated the metrics side of the system, still need to

Todo to leave draft

Validate the span processor
Refactor test configuration
Refactor WriteMeteringHandler/ReadMeteringHandler or defer
Compare and contrast approach with Rfs instrumentation #678

Issues Resolved

Resolves https://opensearch.atlassian.net/browse/MIGRATIONS-1738

Check List

New functionality includes testing
- All tests pass, including unit test, integration test and doctest
New functionality has been documented
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peter Nied <[email protected]>

peternied · 2024-05-31T15:11:47Z

RFS/build.gradle

+    implementation platform("io.opentelemetry:opentelemetry-bom:1.34.1")
+
+    implementation('io.opentelemetry:opentelemetry-api')
+    implementation('io.opentelemetry:opentelemetry-sdk')
+    implementation('io.opentelemetry:opentelemetry-sdk-metrics')
+
+    implementation("io.opentelemetry:opentelemetry-exporter-logging")
+    implementation('io.opentelemetry:opentelemetry-sdk-testing')
+    implementation('io.opentelemetry:opentelemetry-sdk-extension-autoconfigure')
+    implementation('io.opentelemetry.instrumentation:opentelemetry-instrumentation-annotations:2.4.0')
+
+    testImplementation testFixtures(project(path: ':coreUtilities'))
+    testImplementation testFixtures(project(path: ':testHelperFixtures'))


TODO: pare this down to only whats needed

RFS/src/main/java/com/rfs/common/DocumentReindexer.java

RFS/src/main/java/com/rfs/common/RestClient.java

RFS/src/test/java/com/rfs/common/RestClientTest.java

RFS/src/main/java/com/rfs/common/DocumentReindexer.java

gregschohn · 2024-05-31T17:19:15Z

RFS/src/main/java/com/rfs/common/DocumentReindexer.java

+    private static final LongCounter documentsProcessedCounter;
+    private static final LongCounter batchesProcessedCounter;
+    static {
+        final var meter = GlobalOpenTelemetry.getMeter("DocumentReindexer");


This requires you to register a global static, which will have some big impacts on tests. Some tests might want to use no instrumentation at all and others might want to use in-memory structures. One test run might nearly blow out memory due to combinatoric explosion and another subsequent test might hold the bag for it. This could also have an impact on development speed for some too.

gregschohn · 2024-05-31T17:32:14Z

RFS/src/main/java/com/rfs/common/OpenSearchClient.java

@@ -289,7 +291,8 @@ public Optional<ObjectNode> updateDocument(String indexName, String documentId,
        }
    }

-    public Mono<BulkResponse> sendBulkRequest(String indexName, String body) {
+    @WithSpan


How do you nest spans with this annotation approach?
This isn't as valuable as it could be without understanding how it ties to the context above it. My guess is that it might rely upon static methods, which might work for now, but could fall apart as more of this codebase becomes asynchronous.

RFS/src/main/java/com/rfs/common/RestClient.java

gregschohn · 2024-05-31T17:43:11Z

RFS/src/test/java/com/rfs/common/RestClientTest.java

+
+    private InMemoryMetricReader metricReader;
+
+    @BeforeEach


This won't work when tests get run in parallel

RFS/src/test/java/com/rfs/common/RestClientTest.java

RFS/src/main/java/com/rfs/common/DocumentReindexer.java

gregschohn · 2024-05-31T18:13:28Z

RFS/build.gradle

@@ -5,6 +5,7 @@ plugins {
    id "com.avast.gradle.docker-compose" version "0.17.4"
    id 'com.bmuschko.docker-remote-api'
    id 'io.freefair.lombok' version '8.6'
+    id 'com.ryandens.javaagent-application' version "0.5.1"


A couple of deficiencies that concern me (and concerned me originally when I looked into alternate otel approaches).

I don't think that there's an easy way to hook all of your spans into the ActiveContextTracker. As it is, 3 lines would do that for RFS. The ActiveContextTracker is a lifesaver into visibility when work items never terminate.

I don't know how to chain/nest spans. Attribute propagation looks like overkill in the coreUtils packages, but is invaluable when a top-level span doesn't finish.

How would span linking work when parents are no longer in the backtrace of a given method call?

It seems like there are about 20 lines of code added to each file that emits a metric. The coreUtils solution takes that down to about 3 (import, context param, pass context param as is or by creating a new child). If a developer spends most of their day perfecting the class that they're working in, why require them to also pollute the file with lots of instrumentation details. 20 lines of code doesn't take into account projecting a metric with different sets of attributes so that we can do analysis over more scenarios (like we do with status codes for the replayer).

The clarity of the overall instrumentation story seems much better for the coreUtils packages. With this code, I don't know how spans are related and it's hard to know where, if at all, a metric is emitted. That sounds like it should be obvious, but we could easily add the # of bytes across the wire in 3 places and the contributors for each one may have no idea that other codepaths did the same (with the same or different metric names).

Within instrumented code, spans and metrics are disjoint, so the implementor needs to know to keep these things in sync/manually make changes.

So, what are the benefits of this approach over direct utilization of the tools provided by org.opensearch.migrations.tracing? As per the README for the coreUtils package... In which ways is this alternative approach better or what was missing from that list?

This library adapts the parts of OpenTelemetry to make it more natural and more foolproof throughout the rest of the TrafficCapture packages. This package introduces the concept of "Contexts" to build manage all tracing and metering instrumentation.

Just as the otel metering and tracing can be efficiently disabled by not configuring them, this library provides some future-proofing by defining interfaces to track attributes, activities, exceptions, etc - but through descriptive interfaces where callers describe which actions they're performing, preventing the code from becoming overly complex.

The goals of the instrumentation package are to

make instrumentation classes easy to use.

make it easy to create new safe and easy to use instrumentation classes.

be efficient enough to use in most cases and flexible enough to tune in cases where the cost is too high.

The third point is still a work in progress as the exact performance penalty isn't understood yet. However, work for point #2 dovetails into #3. As context creations are chained together, a no-op uber-context can be created with zero memory footprint and minimal CPU penalty. The first couple points are accomplished by putting contextual information alongside other data as first class parameters and fields. For example, where a method might require an identifier, a context might be passed instead so that the function can retrieve identifying information via the context AND have the ability to instrument activity within the appropriate context.

peternied · 2024-06-04T14:49:13Z

@gregschohn Thanks for looking into this and providing some thoughts on what has already been done in the Traffic Capture space - I am going to close out this PR. It (mostly) served its purpose in getting me familiar with one of the options and helps clarify the advantages to the time investment in with the manual instrumentation and span definition.

peternied added 3 commits May 31, 2024 02:26

Instrumentation based telemetry

1c4dd9a

Signed-off-by: Peter Nied <[email protected]>

Include metrics on http client

fd596b1

Signed-off-by: Peter Nied <[email protected]>

Validate metrics creation

381a65b

Signed-off-by: Peter Nied <[email protected]>

peternied commented May 31, 2024

View reviewed changes

gregschohn reviewed May 31, 2024

View reviewed changes

peternied closed this Jun 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Annotation based telemetry #691

Annotation based telemetry #691

peternied commented May 31, 2024 •

edited

Loading

peternied May 31, 2024

gregschohn May 31, 2024

gregschohn May 31, 2024

gregschohn May 31, 2024

gregschohn May 31, 2024

peternied commented Jun 4, 2024

Annotation based telemetry #691

Annotation based telemetry #691

Conversation

peternied commented May 31, 2024 • edited Loading

Description

Todo to leave draft

Issues Resolved

Check List

peternied May 31, 2024

Choose a reason for hiding this comment

gregschohn May 31, 2024

Choose a reason for hiding this comment

gregschohn May 31, 2024

Choose a reason for hiding this comment

gregschohn May 31, 2024

Choose a reason for hiding this comment

gregschohn May 31, 2024

Choose a reason for hiding this comment

peternied commented Jun 4, 2024

peternied commented May 31, 2024 •

edited

Loading