Description
Hey @cfc4n I'm experiencing a fundamental conflict when trying to capture HTTPS traffic from short-lived processes in a multi-container Kubernetes environment. The recommendations from issue #862 (use --pid=0) and #863 (use --pid=SPECIFIC_PID with container paths) are mutually exclusive.
Background
Following the guidance from:
Issue #862: Use --pid=0 to capture short-lived processes that spawn and exit quickly
Issue #863: Use --pid=SPECIFIC_PID with /proc/PID/root/... paths for multi-container environments
However, these approaches conflict in Kubernetes environments where:
Processes are short-lived (<1 second lifespan, e.g., curl commands)
Multiple containers run on the same node with different filesystem namespaces
Process detection and eCapture startup take ~800-1000ms
Current Implementation
Based on advice from #863, I'm using per-PID eCapture instances:
// Detection code
func (o *AutoOrchestrator) startCaptureForLibrary(lib *LibraryInfo) error {
	// Build command with specific PID
	cmd := exec.Command("/ecapture", "tls",
		fmt.Sprintf("--libssl=/proc/%d/root/usr/lib/x86_64-linux-gnu/libssl.so.1.1", lib.PID),
		fmt.Sprintf("--pid=%d", lib.PID), // Specific PID, not --pid=0
		"-m", "text",
		"--hex=false",
		fmt.Sprintf("--ecaptureq=ws://127.0.0.1:%d/", wsPort))
	// Surface launch failures instead of silently dropping them
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("start ecapture for PID %d: %w", lib.PID, err)
	}
	// ... WebSocket connection logic
	return nil
}

Detection loop: Scans /proc every 30 seconds to detect new processes with SSL libraries
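For reference, the detection step described above could look roughly like the sketch below. It is not the orchestrator's actual scanner: `findSSLLibs` and `containerPath` are hypothetical names, and the caller is assumed to read `/proc/<pid>/maps` itself (for example with `os.ReadFile`) and pass the contents in as a string.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// findSSLLibs takes the contents of a /proc/<pid>/maps file and returns the
// distinct pathnames of mapped files whose base name starts with "libssl.so".
// Hypothetical helper for illustration only.
func findSSLLibs(maps string) []string {
	seen := make(map[string]bool)
	var libs []string
	for _, line := range strings.Split(maps, "\n") {
		// Line layout: address perms offset dev inode [pathname]
		fields := strings.Fields(line)
		if len(fields) < 6 {
			continue // anonymous mapping or blank line: no pathname
		}
		// Simplification: pathnames containing spaces are not handled.
		p := fields[5]
		if strings.HasPrefix(path.Base(p), "libssl.so") && !seen[p] {
			seen[p] = true
			libs = append(libs, p)
		}
	}
	return libs
}

// containerPath rewrites a path seen inside the target's mount namespace
// into a path that resolves from the host via /proc/<pid>/root.
func containerPath(pid int, libPath string) string {
	return fmt.Sprintf("/proc/%d/root%s", pid, libPath)
}
```

Note that a path built by `containerPath` only resolves while that PID is still alive, which is part of the problem described in the next section.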
What's Happening - The Race Condition
Timeline of Events:
T+0ms: Curl process spawns (PID 275721)
T+50ms: SSL library loaded
T+200ms: HTTPS request made
T+500ms: Curl exits ✅ (request complete)
T+30000ms: Scanner detects PID 275721 in /proc/275721/maps
T+30200ms: eCapture command launched
T+30900ms: eBPF hooks attached
T+31000ms: WebSocket connection established
T+31001ms: ❌ Process is already dead - nothing to capture

Actual Logs:
{"level":"info","time":"2025-11-25T11:42:53Z","message":"🔧 Starting PER-CONTAINER eCapture for PID=275721"}
{"level":"info","time":"2025-11-25T11:42:53Z","message":"✅ eCapture started for Container PID=275721"}
{"level":"info","time":"2025-11-25T11:42:54Z","message":"✅ WebSocket connected for openssl:...:275721"}
{"level":"debug","time":"2025-11-25T11:42:54Z","message":"📋 Process log: {\"target PID\":275721}"}
{"level":"error","time":"2025-11-25T11:42:55Z","message":"❌ WebSocket read error: EOF"}

Result: eCapture successfully attaches to PID 275721, but the process exited 30 seconds ago. The WebSocket immediately receives EOF because there's no process to monitor.
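The timeline above reduces to simple arithmetic. The sketch below is only an illustration of that reasoning: `captureWindowMs` is a hypothetical helper, all values are in milliseconds, and the inputs are the figures quoted in this issue.

```go
package main

// captureWindowMs returns how many milliseconds the hooks are in place while
// the target process is still alive.
//   lifespanMs   - how long the process lives
//   detectedAtMs - when the scanner notices it, measured from process start
//   attachMs     - time from detection until the eBPF hooks are attached
// A result of 0 means nothing can be captured.
func captureWindowMs(lifespanMs, detectedAtMs, attachMs int) int {
	if detectedAtMs >= lifespanMs {
		return 0 // the process exited before the scanner looked
	}
	window := lifespanMs - detectedAtMs - attachMs
	if window < 0 {
		return 0 // the process exited while eCapture was still starting
	}
	return window
}
```

With the numbers reported here (lifespan of roughly 500-800ms, attach time of roughly 800-1000ms), `captureWindowMs(800, 0, 800)` is 0 even if detection is instantaneous. Under these assumptions, shortening the scan interval alone cannot help the per-PID approach.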
The Fundamental Conflict
| Requirement | --pid=0 | --pid=SPECIFIC_PID |
|---|---|---|
| Capture short-lived processes | ✅ Works | ❌ Fails (process dies before attach) |
| Multi-container support | ❌ Fails (namespace isolation) | ✅ Works |
| Capture ongoing processes | ✅ Works | ✅ Works |
Test Environment
Kubernetes: 3-node cluster (EKS)
Kernel: 6.8.0-1031-azure (eBPF supported)
eCapture: v1.4.3
Test workload: Debian container running:
while true; do
  curl -H "Authorization: Bearer token" https://httpbin.org/get
  sleep 10
done
Process lifespan: ~500-800ms per curl execution
Scanner interval: 30 seconds (to avoid overloading /proc)
Attempted Solutions
- ✅ Per-PID unique ports (fixed port collision)
Changed from:
sessionKey := fmt.Sprintf("%s:%s", lib.LibraryType, lib.LibraryPath)
To:
sessionKey := fmt.Sprintf("%s:%s:%d", lib.LibraryType, lib.LibraryPath, lib.PID)
Result: Port collisions eliminated, but short-lived processes still missed.
- ❌ Faster scanning (tried 5-second intervals)
Result: High CPU usage, still couldn't catch processes that live <1 second.
- ❌ Pre-launching eCapture with --pid=0
Problem: Can't use container-specific paths like /proc/275721/root/usr/lib/libssl.so.1.1 with --pid=0 because different containers need different library paths.
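To make the "different containers need different library paths" problem concrete, the sketch below groups detected processes by mount namespace and keeps one representative per namespace and library path. Everything in it is hypothetical (`proc`, `onePerNamespace`, `rootPath`). It assumes the caller fills `MntNS` from `os.Readlink("/proc/<pid>/ns/mnt")`, and picking the lowest PID as the representative is only a heuristic for "likely the longest-lived process in that container". Whether eCapture can usefully attach to such paths with --pid=0 is exactly what this issue is asking.

```go
package main

import (
	"fmt"
	"sort"
)

// proc describes one detected process. MntNS is the mount namespace link
// target, e.g. "mnt:[4026532508]". Hypothetical type for illustration.
type proc struct {
	PID     int
	MntNS   string
	LibPath string // library path as seen inside the container
}

// onePerNamespace keeps a single representative (the lowest PID) for each
// distinct (mount namespace, library path) pair, sorted by PID.
func onePerNamespace(procs []proc) []proc {
	best := make(map[string]proc)
	for _, p := range procs {
		key := p.MntNS + "|" + p.LibPath
		if cur, ok := best[key]; !ok || p.PID < cur.PID {
			best[key] = p
		}
	}
	out := make([]proc, 0, len(best))
	for _, p := range best {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].PID < out[j].PID })
	return out
}

// rootPath builds the host-visible path to the representative's library.
// The path only resolves while that PID is alive.
func rootPath(p proc) string {
	return fmt.Sprintf("/proc/%d/root%s", p.PID, p.LibPath)
}
```

This yields one candidate library path per container rather than one per process, so the number of paths stays bounded by the number of containers on the node.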
Questions
Is it possible to capture short-lived processes (<1s) in multi-container environments?
Can eCapture use --pid=0 with namespace-aware library paths? For example:
/ecapture tls --libssl=/proc/*/root/usr/lib/libssl.so.1.1 --pid=0
Does eBPF support "pre-hooking"? Can we attach hooks to a library path before any process loads it, so hooks are already in place when processes spawn?
Alternative approach? Should I:
Accept that short-lived processes can't be captured in multi-container setups?
Use --pid=0 per container namespace (how?)?
Use a different capture strategy entirely?