@dogancanbakir
Member

@dogancanbakir dogancanbakir commented Nov 25, 2025

Proposed changes by @murat-kekij

Closes #2647

  • Adds variable support to regex extractors
  • Adds variable support to json extractors

Test Server

package main

import (
  "encoding/json"
  "fmt"
  "log"
  "net/http"
  "strings"
)

// DomainResponse represents the structure of the response where the domain name is a key
type DomainResponse map[string]interface{}

// Static data for the example
var exampleData = map[string]interface{}{
  "subdomains": []string{"api", "www", "test"},
  "ip":         "192.168.1.1",
  "region":     "us-east",
}

func main() {
  http.HandleFunc("/json-test", jsonHandler)
  http.HandleFunc("/regex-test", regexHandler)

  fmt.Println("Server is running on http://127.0.0.1:5005")
  log.Fatal(http.ListenAndServe(":5005", nil))
}

// jsonHandler returns a JSON object whose single top-level key is the request's
// host name (taken from the Host header, with any port stripped)
func jsonHandler(w http.ResponseWriter, r *http.Request) {
  // Get the host from the request
  host := strings.Split(r.Host, ":")[0]

  // Create the response with the dynamic host as the key
  response := DomainResponse{
  	host: exampleData,
  }

  // Set response header to application/json
  w.Header().Set("Content-Type", "application/json")
  w.WriteHeader(http.StatusOK)
  json.NewEncoder(w).Encode(response)
}

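// regexHandler returns an HTML page whose script tag points at a tokenized
// asset path (/static/main.abc123.js) for the regex extractor to pick up
func regexHandler(w http.ResponseWriter, r *http.Request) {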
  nonce := "abc123"
  scriptSrc := fmt.Sprintf("/static/main.%s.js", nonce)

  // HTML content with the script tag
  htmlContent := fmt.Sprintf(`
  <html>
  <head>
  	<title>Test Page</title>
  </head>
  <body>
  	<h1>Test Page With Dynamic Script Tag</h1>
  	<script src="%s"></script>
  </body>
  </html>
  `, scriptSrc)

  // Write HTML response
  w.Header().Set("Content-Type", "text/html")
  w.WriteHeader(http.StatusOK)
  w.Write([]byte(htmlContent))
}

Test Json Extractor

id: http-variable-json-extractor

info:
  name: HTTP Variable JSON Extractor
  author: pdteam
  severity: info

http:
- method: GET
  path:
    - "{{BaseURL}}/json-test"

  extractors:
    - type: json
      part: body
      name: subdomains
      json:
        - '."{{FQDN}}".subdomains[]'
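
To make the intent of the dynamic key concrete, here is a minimal sketch of what the template above is expected to extract from the test server's `/json-test` response. It is an illustration under assumptions, not the extractor's implementation: the helper `subdomainsFor` is hypothetical, the placeholder is substituted with a plain string replace, and a map lookup via `encoding/json` stands in for the jq-style query. The value `127.0.0.1` for `FQDN` is assumed for the local test setup.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// subdomainsFor is a hypothetical helper. It substitutes {{name}} markers in
// keyTemplate with values from vars, then returns the "subdomains" array
// stored under the resulting top-level key of the JSON body.
func subdomainsFor(body []byte, keyTemplate string, vars map[string]string) ([]string, error) {
	key := keyTemplate
	for name, value := range vars {
		key = strings.ReplaceAll(key, "{{"+name+"}}", value)
	}

	// Only the "subdomains" field is decoded; "ip" and "region" are ignored.
	var doc map[string]struct {
		Subdomains []string `json:"subdomains"`
	}
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	entry, ok := doc[key]
	if !ok {
		return nil, fmt.Errorf("key %q not found in response", key)
	}
	return entry.Subdomains, nil
}

func main() {
	// Shape of the test server's response when the Host header is 127.0.0.1:5005.
	body := []byte(`{"127.0.0.1":{"ip":"192.168.1.1","region":"us-east","subdomains":["api","www","test"]}}`)

	subs, err := subdomainsFor(body, "{{FQDN}}", map[string]string{"FQDN": "127.0.0.1"})
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(subs) // [api www test]
}
```

Without variable support the top-level key would have to be hard-coded per target, which is the limitation the JSON extractor change addresses.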

Test Regex Extractor

id: http-variable-regex-extractor

info:
  name: HTTP Variable Regex Extractor
  author: pdteam
  severity: info

http:
- method: GET
  path:
    - "{{BaseURL}}/regex-test"

  extractors:
    - type: regex
      part: body
      name: mainjs
      regex:
        - '{{script_regex}}'

Command

nuclei -t ./http-variable-regex-extractor.yaml -u http://127.0.0.1:5005 -var "script_regex=/static/main\.[a-zA-Z0-9]+\.js"
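
As a quick sanity check of the pattern passed via `-var`, the sketch below applies the same regex to the script tag that the test server's `/regex-test` handler renders. It uses only Go's standard `regexp` package; the helper name `firstMatch` is made up for this example and is not part of nuclei.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch compiles pattern and returns the first match in body,
// or an empty string if nothing matches.
func firstMatch(pattern, body string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	return re.FindString(body), nil
}

func main() {
	// Value supplied on the command line as script_regex.
	pattern := `/static/main\.[a-zA-Z0-9]+\.js`
	// Script tag as rendered by the test server.
	body := `<script src="/static/main.abc123.js"></script>`

	match, err := firstMatch(pattern, body)
	if err != nil {
		fmt.Println("invalid pattern:", err)
		return
	}
	fmt.Println(match) // /static/main.abc123.js
}
```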

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Summary by CodeRabbit

  • New Features

    • Extractors now support dynamic variable resolution, allowing regex and JSON patterns to reference variables from response data during extraction.
    • Patterns containing unresolved variables are evaluated against runtime context for more flexible, context-aware extraction rules.
  • Bug Fixes

    • Improved error handling for extraction patterns with unresolved variables during evaluation.


@dogancanbakir dogancanbakir self-assigned this Nov 25, 2025
@auto-assign auto-assign bot requested a review from dwisiswant0 November 25, 2025 03:45
@coderabbitai
Contributor

coderabbitai bot commented Nov 25, 2025

Walkthrough

Added runtime variable resolution to regex and JSON extractors. When patterns contain unresolved variables, compilation is deferred. During extraction, expressions are evaluated against a provided data map to resolve variables before pattern matching.
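
The deferred-compilation flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `extractor` type, `hasPlaceholder`, and `resolve` are hypothetical stand-ins (the PR uses the `expressions` package for detection and evaluation), and substitution here is a plain string replace.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// extractor is a hypothetical stand-in for the real extractor type.
type extractor struct {
	patterns []string
	compiled []*regexp.Regexp // a nil entry means compilation was deferred
}

// hasPlaceholder is a simplified check for a {{name}} marker.
func hasPlaceholder(s string) bool {
	open := strings.Index(s, "{{")
	return open >= 0 && strings.Contains(s[open+2:], "}}")
}

// resolve substitutes {{name}} markers with values from data.
func resolve(s string, data map[string]interface{}) string {
	for name, value := range data {
		s = strings.ReplaceAll(s, "{{"+name+"}}", fmt.Sprint(value))
	}
	return s
}

// compile precompiles static patterns and appends nil for patterns with
// placeholders, so patterns[i] and compiled[i] stay index-aligned.
func (e *extractor) compile() error {
	for _, p := range e.patterns {
		if hasPlaceholder(p) {
			e.compiled = append(e.compiled, nil)
			continue
		}
		re, err := regexp.Compile(p)
		if err != nil {
			return err
		}
		e.compiled = append(e.compiled, re)
	}
	return nil
}

// extract resolves and compiles deferred patterns against data, then matches.
func (e *extractor) extract(corpus string, data map[string]interface{}) []string {
	var out []string
	for i, re := range e.compiled {
		if re == nil {
			resolved := resolve(e.patterns[i], data)
			var err error
			re, err = regexp.Compile(resolved)
			if err != nil {
				fmt.Printf("warning: could not compile %q: %v\n", resolved, err)
				continue // skip this pattern, keep processing the rest
			}
		}
		out = append(out, re.FindAllString(corpus, -1)...)
	}
	return out
}

func main() {
	e := &extractor{patterns: []string{`<title>[^<]+</title>`, `{{script_regex}}`}}
	if err := e.compile(); err != nil {
		fmt.Println("compile failed:", err)
		return
	}
	corpus := `<title>Test Page</title><script src="/static/main.abc123.js"></script>`
	data := map[string]interface{}{"script_regex": `/static/main\.[a-zA-Z0-9]+\.js`}
	fmt.Println(e.extract(corpus, data))
}
```

Storing nil instead of dropping the entry is what lets the runtime path find the original pattern text by index.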

Changes

| Cohort / File(s) | Summary |
| Core extractor compilation logic: pkg/operators/extractors/compile.go | Pre-checks for unresolved variables in regex and JSON patterns; skips compilation and appends nil when variables are detected. Imported expressions package for variable detection. |
| Core extractor runtime logic: pkg/operators/extractors/extract.go | Extended ExtractRegex and ExtractJSON method signatures to accept a data map parameter. Added dynamic expression evaluation at runtime: if unresolved variables are present, expressions are evaluated against the data map, then recompiled; failures log warnings and continue. |
| Extractor unit tests: pkg/operators/extractors/extract_test.go | Updated test calls to ExtractRegex and ExtractJSON to pass empty data maps as second argument. |
| Protocol handler call sites: pkg/protocols/dns/operators.go, pkg/protocols/file/operators.go, pkg/protocols/headless/operators.go, pkg/protocols/http/operators.go, pkg/protocols/network/operators.go, pkg/protocols/offlinehttp/operators.go, pkg/protocols/protocols.go | Updated all ExtractRegex and ExtractJSON invocations to pass the data map as second argument, enabling extractors to access runtime context during pattern evaluation. |

Sequence Diagram(s)

sequenceDiagram
    participant Protocol Handler as Protocol Handler
    participant Extractor as Extractor
    participant Compiler as Compiler
    participant Expressions as Expressions
    
    Protocol Handler->>Extractor: ExtractRegex(corpus, data)
    activate Extractor
    
    loop For each regex pattern with index i
        Extractor->>Expressions: ContainsUnresolvedVariables(regex[i])
        activate Expressions
        alt Variables found
            Expressions-->>Extractor: true
            Extractor->>Expressions: Evaluate(regex[i], data)
            Expressions-->>Extractor: resolved pattern
            Extractor->>Compiler: Compile(resolved pattern)
            alt Compilation success
                Compiler-->>Extractor: compiled regex
            else Compilation fails
                Extractor->>Extractor: Log warning, continue
            end
        else No variables
            Expressions-->>Extractor: false
            Extractor->>Extractor: Use precompiled regex
        end
        deactivate Expressions
    end
    
    Extractor-->>Protocol Handler: matches map
    deactivate Extractor

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Key areas requiring attention:
    • Expression evaluation logic in pkg/operators/extractors/extract.go for both regex and JSON paths; verify error handling and warning logs are appropriate
    • Verify all seven protocol handler call sites correctly pass the data map without inadvertently creating scoping issues
    • Test coverage for runtime variable resolution edge cases (e.g., failed expression evaluation, missing variables in data map)
    • Ensure pre-check logic in compile.go correctly identifies unresolved variables and doesn't accidentally skip valid compilation

Poem

🐰 Variables dance through regex and JSON bright,
Runtime resolution brings patterns to light,
Data flows softly through handlers we know,
Expressions bloom where dynamic needs grow! 🌱

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'add variable support to extractors' clearly and concisely describes the main change in the pull request: adding variable interpolation support to regex and JSON extractors throughout the codebase. |
| Linked Issues check | ✅ Passed | The pull request implements variable support for extractors by accepting a data map parameter in ExtractRegex and ExtractJSON methods, resolving unresolved variables at extraction time against provided data, and enables the exact use cases requested in issue #2647. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing variable support in extractors as specified in issue #2647; no out-of-scope modifications are present. |

Comment @coderabbitai help to get the list of available commands and usage tips.

@dogancanbakir
Member Author

This is based on the work @murat-kekij did. Let's make sure it gets merged into main; it was previously approved.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
pkg/operators/extractors/compile.go (1)

12-13: Unresolved-variable handling looks correct; verify runtime nil-path + consider caching of resolved expressions

The new ContainsUnresolvedVariables pre-checks for regex and JSON queries make sense: skipping compile and storing nil in the compiled slices is a clean way to defer work until variables are resolved at runtime while preserving index alignment between e.Regex / e.JSON and their compiled counterparts.

Two follow-ups are worth double‑checking:

  • Ensure the runtime paths in ExtractRegex / ExtractJSON explicitly handle nil entries in e.regexCompiled / e.jsonCompiled (e.g., resolve variables against data, compile the resolved string once, and avoid any nil dereferences or silent skips).
  • From a performance perspective, for expressions whose resolved value is stable across items (like CLI/template vars), consider caching the compiled, post‑substitution pattern keyed by the resolved string so you don’t regress into compiling on every extraction. Based on learnings, keeping compile-time caching effective is important for production performance.

Also applies to: 25-28, 46-49
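
One possible shape for the caching idea in the second bullet, sketched under assumptions: a small mutex-guarded map keyed by the resolved pattern string. The `regexCache` type and its `compiles` counter are hypothetical and exist only to show that a stable resolved value is compiled once; nothing like this is in the PR as reviewed.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexCache memoizes compiled patterns keyed by their resolved text.
type regexCache struct {
	mu       sync.Mutex
	compiled map[string]*regexp.Regexp
	compiles int // number of actual regexp.Compile calls, for illustration
}

func newRegexCache() *regexCache {
	return &regexCache{compiled: make(map[string]*regexp.Regexp)}
}

// get returns the cached regex for resolved, compiling it on first use.
// Failed compilations are not cached.
func (c *regexCache) get(resolved string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if re, ok := c.compiled[resolved]; ok {
		return re, nil
	}
	re, err := regexp.Compile(resolved)
	if err != nil {
		return nil, err
	}
	c.compiled[resolved] = re
	c.compiles++
	return re, nil
}

func main() {
	cache := newRegexCache()
	for i := 0; i < 3; i++ {
		// The same CLI-provided value resolves to the same string every time.
		if _, err := cache.get(`/static/main\.[a-zA-Z0-9]+\.js`); err != nil {
			fmt.Println("compile failed:", err)
			return
		}
	}
	fmt.Println("compile calls:", cache.compiles) // compile calls: 1
}
```

The map in this sketch is unbounded, so patterns that resolve to a different string per response would need an eviction policy or should bypass the cache.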

pkg/operators/extractors/extract_test.go (1)

9-19: Tests updated for new signatures; consider adding coverage for variable-aware extraction

Updating ExtractRegex / ExtractJSON calls to pass an empty map[string]interface{}{} matches the new method signatures and keeps existing assertions valid.

To lock in the new feature, it would be useful to add at least one test each for:

  • Regex extractor where the pattern includes a template variable resolved from data.
  • JSON extractor where the jq path includes a template variable (e.g., dynamic top-level key as in the examples from the PR/issue).

These can live alongside the existing tests and will help prevent regressions in the variable-interpolation logic.

Also applies to: 68-78
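
As a rough outline of the cases such tests could cover, the sketch below exercises three outcomes: a variable that resolves to a valid pattern, a variable missing from the data map, and a variable that resolves to an invalid regex. `resolveStrict` and `compileResolved` are hypothetical helpers written for this illustration; real tests would call ExtractRegex / ExtractJSON with a populated data map instead.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches a simplified {{name}} marker.
var placeholder = regexp.MustCompile(`\{\{([A-Za-z0-9_]+)\}\}`)

// resolveStrict substitutes every {{name}} marker from data and returns an
// error naming any variable that is not present.
func resolveStrict(pattern string, data map[string]interface{}) (string, error) {
	var missing []string
	resolved := placeholder.ReplaceAllStringFunc(pattern, func(m string) string {
		name := m[2 : len(m)-2] // strip the surrounding braces
		value, ok := data[name]
		if !ok {
			missing = append(missing, name)
			return m
		}
		return fmt.Sprint(value)
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unresolved variables: %s", strings.Join(missing, ", "))
	}
	return resolved, nil
}

// compileResolved resolves the pattern, then compiles the result.
func compileResolved(pattern string, data map[string]interface{}) (*regexp.Regexp, error) {
	resolved, err := resolveStrict(pattern, data)
	if err != nil {
		return nil, err
	}
	return regexp.Compile(resolved)
}

func main() {
	data := map[string]interface{}{
		"script_regex": `/static/main\.[a-zA-Z0-9]+\.js`,
		"broken":       "main.(", // resolves, but is not a valid regex
	}
	for _, pattern := range []string{"{{script_regex}}", "{{missing}}", "{{broken}}"} {
		_, err := compileResolved(pattern, data)
		fmt.Printf("%-20s ok=%v\n", pattern, err == nil)
	}
}
```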

pkg/operators/extractors/extract.go (1)

19-61: Consider enhancing error messages with more context.

The warning messages on lines 27 and 32 could benefit from additional context to help with debugging, such as the extractor name or the pattern's position in the extraction chain.

For example:

-			gologger.Warning().Msgf("Could not evaluate expression: %s, error: %s", e.Regex[i], err.Error())
+			gologger.Warning().Msgf("Could not evaluate regex expression (index %d): %s, error: %s", i, e.Regex[i], err.Error())

Similarly for line 32 and the corresponding lines in ExtractJSON (171, 176, 181).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2997735 and 80321bc.

📒 Files selected for processing (10)
  • pkg/operators/extractors/compile.go (3 hunks)
  • pkg/operators/extractors/extract.go (3 hunks)
  • pkg/operators/extractors/extract_test.go (2 hunks)
  • pkg/protocols/dns/operators.go (1 hunks)
  • pkg/protocols/file/operators.go (1 hunks)
  • pkg/protocols/headless/operators.go (1 hunks)
  • pkg/protocols/http/operators.go (1 hunks)
  • pkg/protocols/network/operators.go (1 hunks)
  • pkg/protocols/offlinehttp/operators.go (1 hunks)
  • pkg/protocols/protocols.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-16T21:27:14.937Z
Learnt from: hdm
Repo: projectdiscovery/nuclei PR: 6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: To make the template caching mechanism in pkg/templates/compile.go production-ready, DSLs need to be updated to use runtime options instead of cached variables, rather than restoring the Compile() calls on each request.

Applied to files:

  • pkg/operators/extractors/compile.go
📚 Learning: 2025-07-16T21:27:14.937Z
Learnt from: hdm
Repo: projectdiscovery/nuclei PR: 6322
File: pkg/templates/compile.go:79-81
Timestamp: 2025-07-16T21:27:14.937Z
Learning: In pkg/templates/compile.go, the template caching mechanism intentionally skips calling Compile() on copied requests to achieve performance benefits. This is the intended design, not a bug. The current implementation isn't production-ready but represents the desired direction.

Applied to files:

  • pkg/operators/extractors/compile.go
🧬 Code graph analysis (1)
pkg/protocols/protocols.go (1)
pkg/operators/extractors/extractor_types.go (2)
  • KValExtractor (19-19)
  • JSONExtractor (23-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Lint
🔇 Additional comments (7)
pkg/protocols/file/operators.go (1)

40-53: Data context forwarding into regex/JSON extractors is correct and consistent

Passing data into ExtractRegex and ExtractJSON here aligns with the new extractor API and with other protocol operators; it should safely enable variable-aware extraction without changing existing matching semantics.

pkg/protocols/network/operators.go (1)

47-55: Regex extractor now correctly receives the full data map

Wiring extractor.ExtractRegex(itemStr, data) here is consistent with the updated signature and other protocol implementations, and preserves the existing control flow around getMatchPart / SupportsMap.

pkg/protocols/offlinehttp/operators.go (1)

68-75: Regex extractor wiring with data map is consistent with HTTP path

Updating extractor.ExtractRegex(item, data) here matches the new extractor signature and the HTTP operator behavior, giving regex extractors access to the same context used by matchers/DSL without changing existing control flow.

pkg/protocols/headless/operators.go (1)

72-83: Headless extractors now correctly receive full context for regex/JSON

Passing data into both ExtractRegex and ExtractJSON brings the headless path in line with other protocols and should enable variable interpolation inside extractors without altering existing selection logic.

pkg/protocols/http/operators.go (1)

69-80: HTTP extractors receive data map as intended; central path looks good

Using extractor.ExtractRegex(item, data) and extractor.ExtractJSON(item, data) here is the right way to expose the full HTTP data context (including variables) to extractors while preserving existing behavior for non-variable patterns.

pkg/protocols/dns/operators.go (1)

58-65: DNS regex extractor now gets both corpus and data context

Passing types.ToString(item) plus data into ExtractRegex cleanly adapts the DNS path to the new extractor API and should enable variable-driven regex extraction without changing current matching behavior.

pkg/protocols/protocols.go (1)

389-401: LGTM! Extractor API calls updated correctly.

The changes properly pass the data map to both ExtractRegex and ExtractJSON methods, enabling variable resolution during extraction. The implementation is consistent with the updated extractor API signatures.

 	groupPlusOne := e.RegexGroup + 1
-	for _, regex := range e.regexCompiled {
+	for i, regex := range e.regexCompiled {
+		if varErr := expressions.ContainsUnresolvedVariables(e.Regex[i]); varErr != nil {
Member

I'm a bit concerned about what happens when a legitimate extraction pattern collides with variable syntax or a variable name.
At least one test is failing and needs review.

Non-mandatory note: I think post-scan elaboration of JSON results (filtering, data enrichment, and so on) should be out of scope for the tool. What do you think?

Development

Successfully merging this pull request may close these issues.

extractors add variable support

3 participants