Skip to content

Conversation

@aaronc
Copy link
Member

@aaronc aaronc commented Oct 29, 2025

Description

This PR:

  • adds out of the box global OpenTelemetry declarative configuration support to the telemetry/ package via https://pkg.go.dev/go.opentelemetry.io/contrib/otelconf/v0.3.0
  • deprecates all the existing methods in telemetry/ which are based on https://pkg.go.dev/github.com/hashicorp/go-metrics which is under-maintained and requires mutex locks and map lookups on every telemetry method
  • provides default routing of go-metrics telemetry data to OpenTelemetry when OpenTelemetry is enabled
  • instruments BaseApp with OpenTelemetry spans and block and tx counter metrics
  • integrates OpenTelemetry shutdown into server and provides a TestingMain function for tests to export telemetry data
  • configures the log/slog default logger to send logs to OpenTelemetry and allows otelslog bridged to be used for logging. NOTE: this leaves a bit of a disconnect between the SDK's existing logging infrastructure which currently just writes to stdout - this can either be default with in this PR or dealt with in a follow up

The OpenTelemetry go libraries are very actively maintained, most vendors in the space are adding OpenTelemetry support and generally it seems like the industry is headed in this direction. Much of our existing telemetry code is to configure basic telemetry exporting, but with otelconf declarative config, we don't need to maintain any of this ourselves and the out of the box experience is quite simple even for usage in testing.

Closes: SDK-427

@github-actions github-actions bot removed the C:log label Oct 31, 2025
@technicallyty
Copy link
Contributor

How do i test this?

see telemetry/readme.md

@technicallyty technicallyty marked this pull request as ready for review November 26, 2025 00:27
@technicallyty
Copy link
Contributor

reviewers:

telemetry is automagically instantiated via side-effect import init from this package file: https://github.com/cosmos/cosmos-sdk/blob/cfaffe2b286442a262e506845cdf2d05fb6ed2dc/telemetry/config.go

to setup tracers/meters you then do a var/init pattern:

var (
	tracer   = otel.Tracer("cosmos-sdk/baseapp")
	meter    = otel.Meter("cosmos-sdk/baseapp")
	blockCnt metric.Int64Counter
	txCnt    metric.Int64Counter
)

func init() {
	var err error
	blockCnt, err = meter.Int64Counter("block.count")
	if err != nil {
		panic(err)
	}
	txCnt, err = meter.Int64Counter("tx.count")
	if err != nil {
		panic(err)
	}
}

please share opinions on this pattern. the other method we discussed was having some sort of telemetry struct which houses our meters/tracers which gets passed around. but seems more non-standard based on otel docs.

)

const (
OtelConfigEnvVar = "OTEL_EXPERIMENTAL_CONFIG_FILE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked previously about how using init avoided atomic reference checks via the delegation field in otel, but I do agree that it would be nice to keep things more idiomatic.

Is there no way to parse the node dir passed via app config or cli and assume that the otel config is in the node's config directory?

Would that require us storing all of the meter variables internal to the app structs?

Copy link
Contributor

@technicallyty technicallyty Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would just require us to initialize telemetry somewhere after the clientContext has been built with all the flags, envs, tomls, etc.

perhaps if we go this way, we can just chuck a default otel config in the home as part of appd init?

Copy link
Member Author

@aaronc aaronc Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how any code inside of init can safely be aware of anything besides environment variables. We could technically have it parse os.Args, but that would be super brittle IMHO.

Waiting to intialize until after we've parsed flags, toml, etc. would just mean that you always pay the atomic.Value cost even with an otherwise noop config. Not the end of the world and maybe it will never show up in any benchmarks given everything else going on, but there is some non-zero cost

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i just debugged this, if i initialize this in the start command, it ends up as an siCounter meter, which has the delegate atomic load.

if we just do init, we get the non-delegate loading version. but yeah, not sure how impactful this will be

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: after finding that the delegate meter is only a few ns (2-5) we aligned on a late-initialization of the otel SDK in favor of improved devx. The otel SDK now reads from ~/.node_home/config/otel.yaml as part of appd start. An empty otel.yaml file is written to ~/.node_home/config during appd init.

@linear
Copy link

linear bot commented Dec 2, 2025

@github-actions github-actions bot added the C:x/genutil genutil module issues label Dec 2, 2025
_ "cosmossdk.io/api/cosmos/app/v1alpha1"
fmt "fmt"
io "io"
reflect "reflect"

Check notice

Code scanning / CodeQL

Sensitive package import Note

Certain system packages contain functions which may be a possible source of non-determinism
_ "cosmossdk.io/api/cosmos/app/v1alpha1"
fmt "fmt"
io "io"
reflect "reflect"

Check notice

Code scanning / CodeQL

Sensitive package import Note

Certain system packages contain functions which may be a possible source of non-determinism
import (
fmt "fmt"
io "io"
reflect "reflect"

Check notice

Code scanning / CodeQL

Sensitive package import Note

Certain system packages contain functions which may be a possible source of non-determinism
_ "cosmossdk.io/api/cosmos/app/v1alpha1"
fmt "fmt"
io "io"
reflect "reflect"

Check notice

Code scanning / CodeQL

Sensitive package import Note

Certain system packages contain functions which may be a possible source of non-determinism
// InitializeOpenTelemetry initializes the OpenTelemetry SDK.
// We assume that the otel configuration file is in `~/.<your_node_home>/config/otel.yaml`.
// An empty otel.yaml is automatically placed in the directory above in the `appd init` command.
func InitializeOpenTelemetry(homeDir string) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer we just pass in a filename here instead of a dir. Then the user can use whatever filename they want. That would be better for testing too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a more future proof approach here would be either an options struct, ex:

type Options struct {
  ConfigFilename string
}

func InitializeOpenTelemetry(Options) error {

or vararg Options ex:

type Option interface {
  applyOption(*options)
}

func WithConfigFile(filename string) Option { ...}

func InitializeOpenTelemetry(...Option) error {

otel go universally uses the ...Option pattern FWIW

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C:CLI C:x/genutil genutil module issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants