OTEP: Process Context: Sharing Resource Attributes with External Readers #4719
This OTEP introduces a standard mechanism for OpenTelemetry SDKs to publish process-level resource attributes for access by out-of-process readers such as the OpenTelemetry eBPF Profiler.

External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. We propose a mechanism for OpenTelemetry SDKs to publish process-level resource attributes, through a standard format based on Linux anonymous memory mappings.

When an SDK initializes (or updates its resource attributes) it publishes this information to a small, fixed-size memory region that external processes can discover and read. The OTEL eBPF profiler will then, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples taken from a given process.

_I'm opening this PR as a draft with the intention of sharing with the Profiling SIG for an extra round of feedback before asking for a wider review._

_This OTEP is based on [Sharing Process-Level Resource Attributes with the OpenTelemetry eBPF Profiler](https://docs.google.com/document/d/1-4jo29vWBZZ0nKKAOG13uAQjRcARwmRc4P313LTbPOE/edit?tab=t.0), big thanks to everyone that provided feedback and helped refine the idea so far._
|
Marking as ready for review! |
|
So this would be a new requirement for eBPF profiler implementations? My issue is the lack of safe support for doing this in Erlang/Elixir, whereas something that could just be accessed as a file or socket wouldn't have that issue. We'd have to pull in a third-party library, or implement one ourselves, that is a NIF to make these calls. That brings in instability many would rather not have when the goal of our SDK is to not be able to bring down a user's program if the SDK crashes -- unless they specifically configure it to do so. |
No, a hard requirement should not be the goal: for starters, this is Linux-only (for now), so right out of the gate this means it's not going to be available everywhere. Having this discussion is exactly why it was included as one of the open questions in the doc 👍 Our view is that we should go for recommended to implement and recommended to enable by default. In languages/runtimes where it's easy to do so (Go, Rust, Java 22+, possibly Ruby, ...etc?) we should be able to deliver this experience. For others, such as Erlang/Elixir and Java 8-21 (which requires a native library, similar to Erlang/Elixir), the goal would be to make it very easy to enable/use for users that want it, but still optional so as to not impact anyone that is not interested. We should probably record the above guidance on the OTEP, if/once we're happy with it 🤔 |
|
cc @open-telemetry/specs-entities-approvers for extra eyes |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
|
||
| External readers like the OpenTelemetry eBPF Profiler operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. This creates several problems: | ||
|
|
||
| - **Missing cross-signal correlation identifiers**: Runtime-generated attributes ([`service.instance.id`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) being a key example) are often inaccessible to external readers, making it hard to correlate profiles with other telemetry (such as traces and spans!) from the same service instance (especially in runtimes that employ multiple processes). |
| - **Missing cross-signal correlation identifiers**: Runtime-generated attributes ([`service.instance.id`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) being a key example) are often inaccessible to external readers, making it hard to correlate profiles with other telemetry (such as traces and spans!) from the same service instance (especially in runtimes that employ multiple processes). | |
| - **Missing cross-signal correlation identifiers**: Runtime-generated attributes ([`service.instance.id`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id) being a key example) are often inaccessible to external readers, making it hard to correlate various signals with each other. |
What do you think about keeping the comment about the runtimes with multiple processes? I think that's one good use-case where it's especially hard to map what multiple pids seen from the outside actually are.
| | Field | Type | Description | | ||
| |-------------------|-----------|----------------------------------------------------------------------| | ||
| | `signature` | `char[8]` | Set to `"OTEL_CTX"` when the payload is ready (written last) | | ||
| | `version` | `uint32` | Format version. Currently `2` (`1` was used for development) | |
Development versions should not matter at this point as this OTEP is the point of introduction. All previous work is just for experimentation.
| | `version` | `uint32` | Format version. Currently `2` (`1` was used for development) | | |
| | `version` | `uint32` | Format version. Currently `1`. | |
Starting at 2 would make it really easy to distinguish from the earlier experiments that we deployed in a lot of spots already...
Since there's space for uint32 different versions, do you see starting at 2 as a big blocker? (I can still remove the comment explaining what 1 was, I agree it's TMI)
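
For readers following the version discussion, here is a minimal Python sketch of how a reader might check the two header fields quoted above before trusting anything else in the region. It assumes that `signature` and `version` sit back to back at the start of the mapping, in native byte order with no padding; the remaining header fields are not modeled, and `read_version` is a hypothetical helper name rather than part of the proposal.

```python
import struct

SIGNATURE = b"OTEL_CTX"

# Assumption: 8-byte signature immediately followed by a uint32 version,
# native byte order, no padding. Other header fields are not modeled here.
_PREFIX = struct.Struct("=8sI")


def read_version(region: bytes):
    """Return the format version, or None if the region is not (yet) published."""
    if len(region) < _PREFIX.size:
        return None
    signature, version = _PREFIX.unpack_from(region, 0)
    if signature != SIGNATURE:
        # Still zeroed or unrelated memory: treat as "nothing published".
        return None
    return version
```

A reader built this way can reject any version it does not understand, which is one reason the exact starting number matters less than keeping experimental and final layouts distinguishable.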
| // Additional key/value pairs as resources https://opentelemetry.io/docs/specs/otel/resource/sdk/ | ||
| // Similar to baggage https://opentelemetry.io/docs/concepts/signals/baggage/ / https://opentelemetry.io/docs/specs/otel/overview/#baggage-signal | ||
| // | ||
| // Providing resources is optional. | ||
| // | ||
| // If a key in this field would match one of the attributes already defined as a first-class field below (e.g. `service.name`), | ||
| // the first-class field must always take priority. | ||
| // Readers MAY choose to fallback to a value in `resources` if its corresponding first-class field is empty, or they CAN ignore it. | ||
| map<string, string> resources = 1; | ||
|
|
||
| // We strongly recommend that the following first-class fields are provided, but they can be empty if needed. | ||
| // In particular for `deployment_environment_name` and `service_version` often need to be configured for a given application | ||
| // and cannot be inferred. For the others, see the semantic conventions documentation for recommended ways of setting them. | ||
|
|
||
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/deployment/#deployment-environment-name | ||
| string deployment_environment_name = 2; | ||
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id | ||
| string service_instance_id = 3; | ||
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-name | ||
| string service_name = 4; | ||
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-version |
As semantic conventions evolve and requirements change, shared attributes should be more generic and differentiate between identifying and descriptive attributes.
For correlation, identifying attributes are significant and should therefore be preferred, I think.
| // Additional key/value pairs as resources https://opentelemetry.io/docs/specs/otel/resource/sdk/ | |
| // Similar to baggage https://opentelemetry.io/docs/concepts/signals/baggage/ / https://opentelemetry.io/docs/specs/otel/overview/#baggage-signal | |
| // | |
| // Providing resources is optional. | |
| // | |
| // If a key in this field would match one of the attributes already defined as a first-class field below (e.g. `service.name`), | |
| // the first-class field must always take priority. | |
| // Readers MAY choose to fallback to a value in `resources` if its corresponding first-class field is empty, or they CAN ignore it. | |
| map<string, string> resources = 1; | |
| // We strongly recommend that the following first-class fields are provided, but they can be empty if needed. | |
| // In particular for `deployment_environment_name` and `service_version` often need to be configured for a given application | |
| // and cannot be inferred. For the others, see the semantic conventions documentation for recommended ways of setting them. | |
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/deployment/#deployment-environment-name | |
| string deployment_environment_name = 2; | |
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-instance-id | |
| string service_instance_id = 3; | |
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-name | |
| string service_name = 4; | |
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/service/#service-version | |
| // Unique identifying resource attributes as defined by OpenTelemetry Semantic Convention. | |
| map<string, string> resources = 1; |
I'm slightly worried that without guidance we may end up sometimes missing these key attributes in some implementations.
What do you think of using resources but still specifying as a comment that deployment.environment.name/service.instance.id/service.name are highly recommended?
(Also I'm curious, did you mean to leave fields 5->8 or would you see only a single resources field and that's it?)
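
To make the priority rule in the original proto comments concrete, here is a minimal Python sketch of how a reader might combine the first-class fields with the `resources` map. The function name `merge_attributes`, the `fallback` flag, and the plain-dict representation of the decoded message are assumptions for illustration only; the mapping from field names to attribute keys follows the semantic-convention links in the diff.

```python
# Field name in the proposed message -> semantic-convention attribute key.
FIRST_CLASS_KEYS = {
    "deployment_environment_name": "deployment.environment.name",
    "service_instance_id": "service.instance.id",
    "service_name": "service.name",
    "service_version": "service.version",
}


def merge_attributes(first_class: dict, resources: dict, fallback: bool = True) -> dict:
    """Combine first-class fields and `resources` into one attribute dict.

    A non-empty first-class field always wins. When it is empty, the reader
    either falls back to `resources` (fallback=True) or ignores that key.
    """
    merged = dict(resources)
    for field, key in FIRST_CLASS_KEYS.items():
        value = first_class.get(field, "")
        if value:
            merged[key] = value
        elif not fallback:
            merged.pop(key, None)
    return merged
```

For example, with `service_name="checkout"` set and `service_version` empty, a `resources` entry for `service.name` is overridden while `service.version` is taken from `resources` only when the reader opts into the fallback.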
|
|
||
| ### Publication Protocol | ||
|
|
||
| Publishing the context should follow these steps: |
Since context sharing also provides an opportunity for others, what is the plan for operating systems other than Linux (or, more generally, for operating systems that don't have an mmap syscall)?
For Windows, we've experimented at Datadog with using an in-memory file. For macOS it's a bit more nebulous: we can still use mmap, and maybe combine it with mach_vm_region to discover the region?
While this mechanism can be extended to other OSes in the future, our thinking so far was that since the eBPF profiler is Linux-only, the main focus should be on getting Linux support in really amazing shape and then later extend as-needed.
| 8. **Set read-only**: Apply `mprotect(..., PROT_READ)` to mark the mapping as read-only | ||
| 9. **Name mapping** (Linux ≥5.17): Use `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ..., "OTEL_CTX")` to name the mapping | ||
|
|
||
| The signature MUST be written last to ensure readers never observe incomplete or invalid data. Once the signature is present and the mapping set to read-only, the entire mapping is considered valid and immutable. |
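
Step 9 above names the mapping, which suggests one way a reader could locate it. The Python sketch below is an assumption about reader behavior, not something the quoted text prescribes: it scans the text of `/proc/<pid>/maps` for a private anonymous mapping that the kernel lists as `[anon:OTEL_CTX]`. The helper name `find_named_mapping` is hypothetical, and on kernels older than 5.17 the name is unavailable, so a reader would need some other discovery strategy there.

```python
def find_named_mapping(maps_text: str, name: str = "OTEL_CTX"):
    """Return (start, end) addresses of the mapping named `name`, or None.

    `maps_text` is the content of /proc/<pid>/maps. Named private anonymous
    mappings appear with a final column of the form "[anon:<name>]".
    """
    marker = f"[anon:{name}]"
    for line in maps_text.splitlines():
        # Columns: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip() == marker:
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end
    return None
```

A reader could call it as `find_named_mapping(open(f"/proc/{pid}/maps").read())` and then read the returned address range from the target process.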
Would it simplify the publication protocol to require the writer to set published_at_ns to a time in the future, when writing the data is guaranteed to be finished?
I don't think so. In theory a "malicious"/buggy/overloaded scheduler could always schedule out the thread after writing the timestamp and before it finished the rest of the steps...
One really nice property is that the pages are zeroed out by the kernel so it shouldn't be possible to observe anything else other than zeroes or valid data.
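
The ordering argument can be illustrated with a small Python sketch. It uses an anonymous, zero-filled mapping from the standard `mmap` module and splits publication into two hypothetical helpers, `write_body` and `write_signature`, so the intermediate state is visible. `PAYLOAD_OFFSET` is an invented offset for illustration, the version is assumed to follow the signature directly, and the `mprotect` and `prctl` steps are deliberately left out; a native implementation would also have to make sure the signature store is not reordered before the other writes.

```python
import mmap
import struct

SIGNATURE = b"OTEL_CTX"
VERSION_OFFSET = 8    # assumption: version directly follows the signature
PAYLOAD_OFFSET = 64   # hypothetical offset, for illustration only


def is_published(region) -> bool:
    """A reader only trusts the region once the full signature is present."""
    return bytes(region[0:8]) == SIGNATURE


def write_body(region, version: int, payload: bytes) -> None:
    """Write everything except the signature."""
    struct.pack_into("=I", region, VERSION_OFFSET, version)
    region[PAYLOAD_OFFSET:PAYLOAD_OFFSET + len(payload)] = payload


def write_signature(region) -> None:
    """Final step: the signature marks the region as valid."""
    region[0:8] = SIGNATURE


region = mmap.mmap(-1, mmap.PAGESIZE)  # anonymous mapping, zero-filled
write_body(region, 2, b"example payload")
# If the writer were descheduled here, a reader would see eight zero bytes
# where the signature belongs and skip the region.
write_signature(region)
```

Because the mapping starts out as zeroes, a reader probing at any point before `write_signature` sees no signature, whatever the scheduler does in between.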
Co-authored-by: Florian Lehner <[email protected]>
|
|
||
| ## Motivation | ||
|
|
||
| External readers like OpenTelemetry eBPF Profiler or OpenTelemetry eBPF Instrumentation operate outside the instrumented process and cannot access resource attributes configured within OpenTelemetry SDKs. This creates several problems: |
OpenTelemetry eBPF Instrumentation ... cannot access resource attributes configured within OpenTelemetry SDKs
@MrAlias Is this actually true? My understanding is that we are hoping to add support for many "cooperative" APIs for eBPF to interact with in similar fashion to https://github.com/open-telemetry/opentelemetry-go/blob/main/trace/auto.go. I wonder if a similar approach would work for resource attributes.
| // If a key in this field would match one of the attributes already defined as a first-class field below (e.g. `service.name`), | ||
| // the first-class field must always take priority. | ||
| // Readers MAY choose to fallback to a value in `resources` if its corresponding first-class field is empty, or they CAN ignore it. | ||
| map<string, string> resources = 1; |
Why not just regular attributes from OTLP proto?
| // and cannot be inferred. For the others, see the semantic conventions documentation for recommended ways of setting them. | ||
|
|
||
| // https://opentelemetry.io/docs/specs/semconv/registry/attributes/deployment/#deployment-environment-name | ||
| string deployment_environment_name = 2; |
Unclear why some attributes need to be hard coded. Why not use semconv with attributes key/values declared above like everything else in OTLP does?
|
|
||
| When an SDK initializes (or updates its resource attributes) it publishes this information to a small, fixed-size memory region that external processes can discover and read. | ||
|
|
||
| The OTEL eBPF profiler will then, upon observing a previously-unseen process, probe and read this information, associating it with any profiling samples taken from a given process. |
Could you please describe how it would/could/(or won't) work when an application is instrumented with OBI (https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation)?