A comprehensive framework for reverse engineering closed-source codebases and implementing continuous architecture governance. This toolkit enables teams to:
- Map architecture (code + processes) for proprietary repositories
- Quantify risk via churn, complexity, coverage gaps, criticality, and security analysis
- Detect architecture drift through dependency graph analysis
- Identify knowledge concentration risks across teams
- Generate SBOMs and integrate security findings
- Produce consolidated risk registers for governance
- Feed metrics to Backstage and CI/CD governance loops
- Enable safe AI summarization of codebases
- Provide remediation backlogs with weighted risk formulas
- Support compliance & IP awareness
# Install required tools
pip install radon pipdeptree cyclonedx-bom pyyaml
npm install -g @cyclonedx/cyclonedx-npm
# 1. Generate SBOM
./scripts/gen_sbom.sh --out artifacts/sbom --ref "$(git rev-parse HEAD)"
# 2. Analyze code complexity and churn
radon cc -j -s src/ > artifacts/complexity.json
git log --since=90.days --name-only --pretty=format: | sort | grep -v '^$' | uniq -c > artifacts/churn.txt
# 3. Merge into hotspot analysis
python scripts/hotspot_merge.py \
--churn artifacts/churn.txt \
--complexity artifacts/complexity.json \
--out artifacts/hotspots.json

| Script | Purpose |
|---|---|
| gen_sbom.sh | Generate CycloneDX/SPDX SBOMs across ecosystems |
| scan_drift.py | Compare dependency graphs to detect architecture drift |
| hotspot_merge.py | Merge churn + complexity + coverage into risk scores |
| ownership_diff.py | Detect knowledge concentration per directory |
| risk_update.py | Aggregate analyses into consolidated risk register |
| parse_trivy.py | Normalize Trivy security findings |
| parse_semgrep.py | Normalize Semgrep security findings |
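
The churn file written by the `git log ... | uniq -c` step holds one `count path` pair per line. As a rough illustration of how that file could be read before merging, here is a minimal sketch. The function name `parse_churn` is hypothetical, and the internals of `hotspot_merge.py` are not shown in this document, so treat this as an assumption about the input format only.

```python
from typing import Dict, Iterable


def parse_churn(lines: Iterable[str]) -> Dict[str, int]:
    """Parse `uniq -c` style lines ("   12 src/a.py") into {path: count}."""
    counts: Dict[str, int] = {}
    for line in lines:
        # Split on the first run of whitespace only, so paths with spaces survive.
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        counts[parts[1]] = int(parts[0])
    return counts


if __name__ == "__main__":
    sample = ["     12 src/a.py", "      3 src/b/c.py", ""]
    print(parse_churn(sample))
```

In practice the lines would come from `open("artifacts/churn.txt")` rather than an in-memory list.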
- Executive summary template
- Remediation backlog scaffold (YAML)
- AI-safe summarization prompt templates
React components for visualizing:
- Risk overview widgets
- Hotspot tables
- Knowledge concentration gauges
- Dependency graph visualization
Node/Express API serving:
- Service metrics
- Hotspot data
- Dependency graphs
- Cycle detection
- Time-series data
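
Cycle detection over a dependency graph is commonly done with a depth-first search that tracks which nodes are on the current path. The sketch below shows that general technique in Python. It is not the API's actual implementation (which is not shown here), and the function name `find_cycle` and the adjacency-dict input shape are assumptions made for illustration.

```python
from typing import Dict, List

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, fully explored


def find_cycle(graph: Dict[str, List[str]]) -> List[str]:
    """Return one cycle as a node list (start node repeated at the end), or []."""
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> List[str]:
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: the path from dep back to dep is a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return []

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return []


if __name__ == "__main__":
    deps = {"api": ["billing"], "billing": ["core"], "core": ["api"]}
    print(find_cycle(deps))
```

The recursive form is fine for module-level graphs of modest size; a very deep graph would need an iterative traversal to stay under Python's recursion limit.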
Risk is calculated using multiple dimensions:
- Churn (40%): Code volatility over 90 days
- Complexity (40%): Cyclomatic complexity metrics
- Coverage Gap (10%): Missing test coverage
- Criticality (10%): Business impact weighting
Formula:
risk = (normalized_churn * 0.4) + (normalized_complexity * 0.4) +
(coverage_penalty * 0.1) + (criticality_factor * 0.1)
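
The formula above can be sketched in Python as follows. The weights are taken directly from the formula; the assumption that every input is already scaled to the range 0 to 1 is mine, as is the function name `risk_score`.

```python
def risk_score(
    normalized_churn: float,
    normalized_complexity: float,
    coverage_penalty: float,
    criticality_factor: float,
) -> float:
    """Weighted risk using the 40/40/10/10 split from the formula.

    Assumes each input is already normalized to [0, 1].
    """
    return (
        normalized_churn * 0.4
        + normalized_complexity * 0.4
        + coverage_penalty * 0.1
        + criticality_factor * 0.1
    )


if __name__ == "__main__":
    # A file with moderate churn, maximum complexity, small coverage gap, high criticality.
    score = risk_score(0.5, 1.0, 0.2, 1.0)
    print(round(score, 2))
```

Because the weights sum to 1.0, the score stays in the same 0 to 1 range as its inputs, which makes scores comparable across files.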
Tracks changes in dependency graphs:
- Added/removed nodes (modules, services)
- Added/removed edges (dependencies)
- Edge churn ratio
- Core boundary violations
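
As an illustration of the first three items, the sketch below compares two dependency graphs given as edge lists. The definition of edge churn ratio used here, (added + removed edges) divided by the size of the union of both edge sets, is an assumption, since this document does not define it. The function name `diff_graphs` is hypothetical, and core boundary violations are not covered.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source module, target module)


def diff_graphs(old_edges: Iterable[Edge], new_edges: Iterable[Edge]) -> Dict[str, object]:
    """Report added/removed nodes and edges plus an edge churn ratio."""
    old: Set[Edge] = set(old_edges)
    new: Set[Edge] = set(new_edges)

    # Nodes are derived from edge endpoints; isolated nodes are not represented.
    old_nodes = {n for edge in old for n in edge}
    new_nodes = {n for edge in new for n in edge}

    added_edges = new - old
    removed_edges = old - new
    union = old | new

    # Assumed definition: share of all known edges that changed between snapshots.
    churn = (len(added_edges) + len(removed_edges)) / len(union) if union else 0.0

    return {
        "added_nodes": sorted(new_nodes - old_nodes),
        "removed_nodes": sorted(old_nodes - new_nodes),
        "added_edges": sorted(added_edges),
        "removed_edges": sorted(removed_edges),
        "edge_churn_ratio": churn,
    }


if __name__ == "__main__":
    before = [("api", "core"), ("api", "billing")]
    after = [("api", "core"), ("billing", "core"), ("api", "ledger")]
    print(diff_graphs(before, after))
```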
Identifies single points of failure:
- Top contributor percentage per module
- Single contributor warnings
- Criticality-weighted risk
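
The first two signals can be derived from a list of commit authors for a module, for example the output of `git log --format=%an -- <path>`. The sketch below is one way to compute them. The function name `top_contributor_share` and the returned field names are hypothetical, and criticality weighting is left out because its formula is not given here.

```python
from collections import Counter
from typing import Dict, List, Optional


def top_contributor_share(authors: List[str]) -> Optional[Dict[str, object]]:
    """Summarize how concentrated a module's commits are in one author.

    `authors` holds one entry per commit touching the module.
    Returns None when there are no commits.
    """
    counts = Counter(authors)
    if not counts:
        return None
    name, commits = counts.most_common(1)[0]
    return {
        "top_contributor": name,
        "share": commits / len(authors),          # fraction of commits by the top author
        "single_contributor": len(counts) == 1,   # warning flag: only one author ever
    }


if __name__ == "__main__":
    print(top_contributor_share(["alice", "alice", "alice", "bob"]))
```

Counting commits is only one proxy for ownership; lines changed or review activity would give a different picture.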
Focus on changed files in PRs:
scripts/diff_changed_files.sh origin/main HEAD > artifacts/changed_files.txt
python scripts/diff_risk.py --hotspots artifacts/hotspots.json \
--changed artifacts/changed_files.txt --out artifacts/diff_hotspots.json
Track metrics over time:
python scripts/record_timeseries.py \
--services config/service_paths.yaml \
--out-dir artifacts/timeseries
Ensure architectural decisions are documented:
python scripts/adr_enforce_boundary.py \
--drift artifacts/drift_report.json \
--adr-index docs/adr/0000-record-architecture-decisions.md \
--config config/adr_enforcement.yaml
Find circular dependencies:
curl http://localhost:8085/api/graph-cycles | jq '.'
See .github/workflows/arch-governance.yml for a complete pipeline example.
Key steps:
- Build & test (generate coverage)
- Static analysis (complexity, churn)
- SBOM generation
- Security scanning
- Hotspot analysis
- Drift detection
- Risk aggregation
- Artifact publishing
- docs/summary_compiled.md - Complete system overview
- docs/diff-aware-risk.md - Focusing on PR changes
- docs/time-series-metrics.md - Historical tracking
- docs/adr-enforcement.md - Decision record enforcement
- docs/graph-cycles.md - Circular dependency detection
- docs/security-ingestion.md - Security tool integration
services:
  payments-service:
    paths:
      - "src/payments/"
      - "src/core/payment"

weights:
  churn: 0.30
  complexity: 0.35
  coverage_gap: 0.15
  criticality: 0.10
  security_hotspot: 0.10

- No raw source code in SBOM outputs
- Classify artifacts as Internal/Confidential
- Sanitize AI prompts (no secrets, structural metadata only)
- Redact sensitive data before distribution
Internal use - see organization policies.
This toolkit was created through an AI-assisted design process to provide comprehensive architecture governance capabilities. Contributions should follow established patterns and maintain security/privacy standards.