Skip to content

Conversation

@nabeelalam
Copy link
Contributor

Description:

This PR adds three new Docker source metrics for improved visibility into image enumeration and scanning progress.
It also updates all Docker metrics to include the "Job ID" label alongside "Source Name" for better observability of the statistics.

New Metrics:

  • dockerImagesNumerated: Total number of images discovered for a namespace.
  • dockerLayersEnumerated: Total number of layers discovered per image.
  • dockerHistoryEntriesEnumerated: Total number of image history entries enumerated.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@nabeelalam nabeelalam self-assigned this Nov 11, 2025
@nabeelalam nabeelalam requested a review from a team November 11, 2025 06:08
@nabeelalam nabeelalam requested review from a team as code owners November 11, 2025 06:08
Copy link
Contributor

@mustansir14 mustansir14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Help: "Total number of Docker layers scanned.",
},
[]string{"source_name"})
[]string{"source_name", "job_id"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add job_id for all of these? I know that we use it as a metric label in many other places, but that kind of contravenes best Prometheus practices, so I'd like to be very deliberate about extending the pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @rosecodym! Thanks for reviewing this and calling that out.

The original requirement I was given was “add the source’s Job ID to all metrics calls” so I pushed job_id onto the shared label set for all Docker metrics, the reason I believe is so that metrics from different scan jobs can be distinguished instead of being aggregated into the same series.

Though, hmm... thinking about it from a Prometheus perspective I do agree we probably don’t need job_id on every one of these series, especially the histogram.

I can perhaps scope this down so that job_id is only applied to the progress counters/gauges and leave docker_list_images_api_duration_seconds with just the source_name?

I would like to know if that sound reasonable to you, or do you have any other suggestion? Let me know if I have completely misunderstood your comment 😅.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add job_id for all of these?

This was my idea, I wanted to have a way of noticing big changes in these numbers between jobs. But, I'm a Prometheus dummy so if this is a bad idea or there's better ways to do it, I'm into it. I think there's one metric we don't need it for though; I'll comment there.

Buckets: prometheus.DefBuckets,
},
[]string{"source_name"})
[]string{"source_name", "job_id"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one I don't think we need job_id for

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah missed this, yes this is what I had in mind as well, wouldn't need this on the histogram

Copy link
Contributor

@camgunz camgunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove job_id from "docker_list_images_api_duration_seconds", then I think we're good to go.

// Chunks emits data over a channel that is decoded and scanned for secrets.
func (s *Source) Chunks(ctx context.Context, chunksChan chan *sources.Chunk, _ ...sources.ChunkingTarget) error {
ctx = context.WithValues(ctx, "source_type", s.Type(), "source_name", s.name)
jobIDStr := fmt.Sprint(s.jobId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow also TIL I learned about fmt.Sprint; verrrrry handy

@nabeelalam nabeelalam requested a review from a team as a code owner November 24, 2025 09:22
@nabeelalam nabeelalam merged commit 75c18f2 into main Dec 3, 2025
15 of 17 checks passed
@nabeelalam nabeelalam deleted the update/source/add-new-metrics-and-job-id-in-docker branch December 3, 2025 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants