-
Notifications
You must be signed in to change notification settings - Fork 3k
add variable support to extractors #6635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
WalkthroughAdded runtime variable resolution to regex and JSON extractors. When patterns contain unresolved variables, compilation is deferred. During extraction, expressions are evaluated against a provided data map to resolve variables before pattern matching. Changes
Sequence Diagram(s)sequenceDiagram
participant Protocol Handler as Protocol Handler
participant Extractor as Extractor
participant Compiler as Compiler
participant Expressions as Expressions
Protocol Handler->>Extractor: ExtractRegex(corpus, data)
activate Extractor
loop For each regex pattern with index i
Extractor->>Expressions: ContainsUnresolvedVariables(regex[i])
activate Expressions
alt Variables found
Expressions-->>Extractor: true
Extractor->>Expressions: Evaluate(regex[i], data)
Expressions-->>Extractor: resolved pattern
Extractor->>Compiler: Compile(resolved pattern)
alt Compilation success
Compiler-->>Extractor: compiled regex
else Compilation fails
Extractor->>Extractor: Log warning, continue
end
else No variables
Expressions-->>Extractor: false
Extractor->>Extractor: Use precompiled regex
end
deactivate Expressions
end
Extractor-->>Protocol Handler: matches map
deactivate Extractor
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
|
This is based on the work @@murat-kekij did. Let's make sure it gets merged into the main, previously approved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
pkg/operators/extractors/compile.go (1)
12-13: Unresolved-variable handling looks correct; verify runtime nil-path + consider caching of resolved expressionsThe new
ContainsUnresolvedVariablespre-checks for regex and JSON queries make sense: skipping compile and storingnilin the compiled slices is a clean way to defer work until variables are resolved at runtime while preserving index alignment betweene.Regex/e.JSONand their compiled counterparts.Two follow-ups are worth double‑checking:
- Ensure the runtime paths in
ExtractRegex/ExtractJSONexplicitly handlenilentries ine.regexCompiled/e.jsonCompiled(e.g., resolve variables againstdata, compile the resolved string once, and avoid any nil dereferences or silent skips).- From a performance perspective, for expressions whose resolved value is stable across items (like CLI/template vars), consider caching the compiled, post‑substitution pattern keyed by the resolved string so you don’t regress into compiling on every extraction. Based on learnings, keeping compile-time caching effective is important for production performance.
Also applies to: 25-28, 46-49
pkg/operators/extractors/extract_test.go (1)
9-19: Tests updated for new signatures; consider adding coverage for variable-aware extractionUpdating
ExtractRegex/ExtractJSONcalls to pass an emptymap[string]interface{}{}matches the new method signatures and keeps existing assertions valid.To lock in the new feature, it would be useful to add at least one test each for:
- Regex extractor where the pattern includes a template variable resolved from
data.- JSON extractor where the jq path includes a template variable (e.g., dynamic top-level key as in the examples from the PR/issue).
These can live alongside the existing tests and will help prevent regressions in the variable-interpolation logic.
Also applies to: 68-78
pkg/operators/extractors/extract.go (1)
19-61: Consider enhancing error messages with more context.The warning messages on lines 27, 32 could benefit from additional context to help with debugging, such as the extractor name or position in the extraction chain.
For example:
- gologger.Warning().Msgf("Could not evaluate expression: %s, error: %s", e.Regex[i], err.Error()) + gologger.Warning().Msgf("Could not evaluate regex expression (index %d): %s, error: %s", i, e.Regex[i], err.Error())Similarly for line 32 and the corresponding lines in
ExtractJSON(171, 176, 181).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
pkg/operators/extractors/compile.go(3 hunks)pkg/operators/extractors/extract.go(3 hunks)pkg/operators/extractors/extract_test.go(2 hunks)pkg/protocols/dns/operators.go(1 hunks)pkg/protocols/file/operators.go(1 hunks)pkg/protocols/headless/operators.go(1 hunks)pkg/protocols/http/operators.go(1 hunks)pkg/protocols/network/operators.go(1 hunks)pkg/protocols/offlinehttp/operators.go(1 hunks)pkg/protocols/protocols.go(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-16T21:27:14.937Z
Learnt from: hdm
Repo: projectdiscovery/nuclei PR: 6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: To make the template caching mechanism in pkg/templates/compile.go production-ready, DSLs need to be updated to use runtime options instead of cached variables, rather than restoring the Compile() calls on each request.
Applied to files:
pkg/operators/extractors/compile.go
📚 Learning: 2025-07-16T21:27:14.937Z
Learnt from: hdm
Repo: projectdiscovery/nuclei PR: 6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: In pkg/templates/compile.go, the template caching mechanism intentionally skips calling Compile() on copied requests to achieve performance benefits. This is the intended design, not a bug. The current implementation isn't production-ready but represents the desired direction.
Applied to files:
pkg/operators/extractors/compile.go
🧬 Code graph analysis (1)
pkg/protocols/protocols.go (1)
pkg/operators/extractors/extractor_types.go (2)
KValExtractor(19-19)JSONExtractor(23-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Lint
🔇 Additional comments (7)
pkg/protocols/file/operators.go (1)
40-53: Data context forwarding into regex/JSON extractors is correct and consistentPassing
dataintoExtractRegexandExtractJSONhere aligns with the new extractor API and with other protocol operators; it should safely enable variable-aware extraction without changing existing matching semantics.pkg/protocols/network/operators.go (1)
47-55: Regex extractor now correctly receives the full data mapWiring
extractor.ExtractRegex(itemStr, data)here is consistent with the updated signature and other protocol implementations, and preserves the existing control flow aroundgetMatchPart/SupportsMap.pkg/protocols/offlinehttp/operators.go (1)
68-75: Regex extractor wiring with data map is consistent with HTTP pathUpdating
extractor.ExtractRegex(item, data)here matches the new extractor signature and the HTTP operator behavior, giving regex extractors access to the same context used by matchers/DSL without changing existing control flow.pkg/protocols/headless/operators.go (1)
72-83: Headless extractors now correctly receive full context for regex/JSONPassing
datainto bothExtractRegexandExtractJSONbrings the headless path in line with other protocols and should enable variable interpolation inside extractors without altering existing selection logic.pkg/protocols/http/operators.go (1)
69-80: HTTP extractors receive data map as intended; central path looks goodUsing
extractor.ExtractRegex(item, data)andextractor.ExtractJSON(item, data)here is the right way to expose the full HTTPdatacontext (including variables) to extractors while preserving existing behavior for non-variable patterns.pkg/protocols/dns/operators.go (1)
58-65: DNS regex extractor now gets both corpus and data contextPassing
types.ToString(item)plusdataintoExtractRegexcleanly adapts the DNS path to the new extractor API and should enable variable-driven regex extraction without changing current matching behavior.pkg/protocols/protocols.go (1)
389-401: LGTM! Extractor API calls updated correctly.The changes properly pass the
datamap to bothExtractRegexandExtractJSONmethods, enabling variable resolution during extraction. The implementation is consistent with the updated extractor API signatures.
| groupPlusOne := e.RegexGroup + 1 | ||
| for _, regex := range e.regexCompiled { | ||
| for i, regex := range e.regexCompiled { | ||
| if varErr := expressions.ContainsUnresolvedVariables(e.Regex[i]); varErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about what might happen if legitimate extraction patterns might collide with variable names.
There is some failing test to review.
Non mandatory note: I think that post-scan json result elaboration (filtering, data enrichment and so on should be out of scope of the tool). What do you think?
Proposed changes by @murat-kekij
Closes #2647
Test Server
Test Json Extractor
Test Regex Extractor
Command
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.