[EXPERIMENTAL] Tiered CI Testing - Improve Reliability & Speed #4865

e-gineer · 2025-11-12T11:24:18Z

🎯 Purpose

This PR introduces experimental tiered CI workflows to address current CI reliability issues. This is a SAFE EXPERIMENT that doesn't affect production CI.

⚠️ Problem Statement

Current CI has critical issues:

50% failure rate (10 failures in last 20 runs)
7-10 minute feedback time on every commit
Heavy infrastructure tests in unit test suite causing random failures
No test categorization - all tests treated equally
Lost productivity: ~2.5 developer-hours/day debugging failures

💡 Solution: Tiered Testing

Tier 1: Quick Checks (< 2 minutes)

Workflow: 10-test-quick-EXPERIMENTAL.yaml

Triggers: Every push to test/** branches
What: Fast unit tests only (go test -short)
Goal: Immediate feedback
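
The trigger and command described for this tier could look roughly like the fragment below. It is a minimal sketch, not the contents of 10-test-quick-EXPERIMENTAL.yaml: the job id, runner, action versions, timeout cap, and the `./...` package pattern are all assumptions made for illustration.

```yaml
# Hypothetical sketch of a quick-check workflow; not the actual file from this PR.
on:
  push:
    branches:
      - 'test/**'          # only branches under test/, as described above

jobs:
  quick-tests:              # assumed job id
    runs-on: ubuntu-latest  # assumed runner
    timeout-minutes: 5      # assumed safety cap, above the < 2 minute goal
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod   # assumes go.mod sits at the repo root
      # -short lets tests that check testing.Short() skip their slow paths
      - run: go test -short ./...
```

Note that `-short` only speeds things up if the slow tests actually call `testing.Short()` and skip themselves; tests that ignore the flag run as usual.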

Tier 2: Standard Suite (< 10 minutes)

Workflow: 11-test-standard-EXPERIMENTAL.yaml

Triggers: Manual (workflow_dispatch)
What: All unit tests + 4 core acceptance tests
Goal: Comprehensive validation

Tier 3: Full Suite (< 20 minutes)

Workflow: 11-test-full-EXPERIMENTAL.yaml

Triggers: Manual (workflow_dispatch)
What: All 21 acceptance tests
Goal: Complete coverage

🔒 Safety

These workflows are EXPERIMENTAL and SAFE:

✅ Only trigger on test/** branches
✅ Don't interfere with production CI
✅ Can be tested in isolation
✅ Easy to roll back (just don't merge)

📊 Expected Outcomes

| Metric | Current | Target | Improvement |
|---|---|---|---|
| Success Rate | 50% | 95%+ | ~2x better |
| Time to Feedback | 7-10 min | < 2 min | 3.5-5x faster |
| Wasted CI Time | 100 min/day | < 10 min/day | 90% reduction |
| Dev Hours Lost | 2.5/day | 0.3/day | ~8x better |

🧪 Testing Plan

1. This PR tests Quick Checks
   - Quick tests will run automatically
   - Should complete in < 2 minutes
   - We'll push several commits to validate
2. Manual Standard Test
   - Trigger via Actions tab: "Run workflow"
   - Should complete in < 10 minutes
   - Core tests should catch 90%+ of issues
3. Manual Full Test
   - Trigger via Actions tab: "Run workflow"
   - Should complete in < 20 minutes
   - All 21 acceptance tests
4. Metrics Collection
   - Run 5-10 iterations
   - Track: duration, success rate, which tier caught bugs
   - Compare with production CI

📝 Files Changed

New Workflows

.github/workflows/10-test-quick-EXPERIMENTAL.yaml - Quick tests
.github/workflows/11-test-standard-EXPERIMENTAL.yaml - Standard suite
.github/workflows/11-test-full-EXPERIMENTAL.yaml - Full suite

Documentation

See .ai/wip/ci-optimization/ for:

Comprehensive analysis
Implementation plan
Success metrics

🎬 Next Steps

If this experiment succeeds:

1. Gather feedback from team
2. Create production branch (remove EXPERIMENTAL)
3. Modify existing workflows with conditional triggers
4. Add documentation
5. Merge to develop

If it doesn't work:

Don't merge this PR
No impact to production
Learn and iterate

🔍 How to Test

Test Quick Workflow (Automatic)

Just push commits to this PR - quick tests run automatically!

Test Standard Workflow (Manual)

1. Go to Actions tab
2. Select "11 - Test: Standard Suite (EXPERIMENTAL)"
3. Click "Run workflow"
4. Select branch: test/ci-tiered-testing
5. Click green "Run workflow" button

Test Full Workflow (Manual)

1. Go to Actions tab
2. Select "11 - Test: Full Suite (EXPERIMENTAL)"
3. Click "Run workflow"
4. Select branch: test/ci-tiered-testing
5. Click green "Run workflow" button

📚 Background

Original analysis: Issue #XXXX (to be created)
Current CI failure rate: 50% (10/20 recent runs)
Most common failure: Infrastructure tests in unit suite
Recent fix: Removed TestEnsureDBInstalled_Concurrent in #4864 ("Add comprehensive passing tests from bug hunting initiative")

✅ Success Criteria

Quick tests consistently complete in < 2 minutes
Quick tests have 95%+ success rate
Standard tests catch 90%+ of bugs
Full tests maintain 100% coverage
At least 5 successful test runs
Team approves approach

🤝 Feedback Welcome!

This is an experiment to improve our CI reliability and developer experience. Please:

Try running the workflows
Report any issues or suggestions
Share your experience with feedback times

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

Add three experimental workflows to test tiered CI approach:

1. Quick Tests (< 2 min)
   - Runs on every push to test/** branches
   - Uses -short flag for fast feedback
   - Workflow: 10-test-quick-EXPERIMENTAL.yaml
2. Standard Suite (< 10 min)
   - Manual trigger only
   - All unit tests + 4 core acceptance tests
   - Workflow: 11-test-standard-EXPERIMENTAL.yaml
3. Full Suite (< 20 min)
   - Manual trigger only
   - All 21 acceptance tests
   - Workflow: 11-test-full-EXPERIMENTAL.yaml

These experimental workflows:
- Only trigger on test/** branches
- Don't interfere with production CI
- Allow us to test the tiered approach safely

Goal: Improve CI success rate from 50% to 95%+ and reduce feedback time from 7-10 min to < 2 min.

Related to issue: CI reliability improvements
See: .ai/wip/ci-optimization/ for detailed analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Fixed triggers to implement proper tiered testing:

**Tier 1 - Quick Tests** (< 2 min):
- Trigger: Every PR push + push to develop/main
- Goal: Fast feedback on every commit

**Tier 2 - Standard Suite** (< 10 min):
- Trigger: PR ready for review + push to develop/main
- Goal: Comprehensive validation before merge

**Tier 3 - Full Suite** (< 20 min):
- Trigger: Push to develop (post-merge) + nightly + label "test:full"
- Goal: Complete coverage

This matches the original plan: test actual development code with progressive levels of thoroughness as code moves toward release.

Previous version incorrectly limited triggers to test/** branches only.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

e-gineer · 2025-11-12T11:33:22Z

🔧 Fix Applied: Proper Tiered Triggers

Just pushed a fix - the original commit had overly conservative triggers (only test/** branches). Updated to implement the proper tiered strategy:

✅ Corrected Trigger Strategy

Tier 1: Quick Checks - `10-test-quick-EXPERIMENTAL.yaml`

Triggers:

✅ Every pull_request push (all PRs, not just test branches)
✅ Push to develop and main branches
✅ Manual via workflow_dispatch

What it does:

Runs go test -short (< 30s timeout)
Skips expensive tests
Fast feedback: < 2 minutes
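
As a rough illustration, the three Tier 1 triggers listed above map onto a GitHub Actions `on:` block like the one below. This is a sketch under assumptions, not the PR's actual file: the job id, runner, and action versions are hypothetical, and reading "< 30s timeout" as `go test -timeout 30s` is a guess.

```yaml
# Hypothetical trigger block for the quick tier; not copied from the PR.
on:
  pull_request:              # default activity types: opened, synchronize, reopened
  push:
    branches: [develop, main]
  workflow_dispatch:         # enables the manual "Run workflow" button

jobs:
  quick-tests:               # assumed job id
    runs-on: ubuntu-latest   # assumed runner
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod   # assumes go.mod sits at the repo root
      # Assumed interpretation of "< 30s timeout": abort the test binary after 30s
      - run: go test -short -timeout 30s ./...
```

Leaving `pull_request` without a `types` list means each new commit pushed to an open PR (the `synchronize` activity) starts a run, which is what gives per-commit feedback.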

Tier 2: Standard Suite - `11-test-standard-EXPERIMENTAL.yaml`

Triggers:

✅ pull_request type ready_for_review (when you mark PR ready)
✅ Push to develop and main branches
✅ Manual via workflow_dispatch

What it does:

All unit tests (no -short flag)
4 core acceptance tests (installation, plugin, connection_config, service)
Comprehensive: < 10 minutes
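
A possible shape for this tier is sketched below. Only the workflow name, the triggers, and the four suite names come from this thread; the job ids, the matrix layout, and the placeholder acceptance-test step are assumptions, since the real test runner command is not shown here.

```yaml
# Hypothetical sketch of the standard tier; not the actual workflow file.
name: "11 - Test: Standard Suite (EXPERIMENTAL)"

on:
  pull_request:
    types: [ready_for_review]    # fires when a draft PR is marked ready
  push:
    branches: [develop, main]
  workflow_dispatch:

jobs:
  unit-tests:                     # assumed job id
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod # assumes go.mod sits at the repo root
      - run: go test ./...        # no -short, so slow tests run too

  acceptance-core:                # assumed job id
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # let every suite report, even if one fails
      matrix:
        suite: [installation, plugin, connection_config, service]
    steps:
      - uses: actions/checkout@v4
      # Placeholder: the real acceptance-test command is not given in this PR text
      - run: echo "run acceptance suite ${{ matrix.suite }}"
```

One consequence of listing only `ready_for_review`: a PR opened directly as non-draft, or later pushes to an already-ready PR, would not start this tier. Adding `opened` and `synchronize` to `types` would change that, at the cost of more runs.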

Tier 3: Full Suite - `11-test-full-EXPERIMENTAL.yaml`

Triggers:

✅ Push to develop (after merge)
✅ Nightly at 2 AM UTC (cron: '0 2 * * *')
✅ pull_request with label test:full
✅ Manual via workflow_dispatch

What it does:

All unit tests
All 21 acceptance tests
Complete coverage: < 20 minutes
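
The four Tier 3 triggers could be combined as sketched below. The workflow name, cron expression, and label come from this thread; the `pull_request` activity types, the job id, the `if:` condition, and the placeholder test step are assumptions, one plausible way to gate PR runs on the `test:full` label.

```yaml
# Hypothetical sketch of the full tier; not the actual workflow file.
name: "11 - Test: Full Suite (EXPERIMENTAL)"

on:
  push:
    branches: [develop]
  schedule:
    - cron: '0 2 * * *'             # daily at 02:00; GitHub evaluates cron in UTC
  pull_request:
    types: [labeled, synchronize]   # assumed: react to labelling and later pushes
  workflow_dispatch:

jobs:
  full-suite:                       # assumed job id
    # Always run for push, schedule, and manual events;
    # for pull requests, run only when the PR carries the test:full label.
    if: >-
      github.event_name != 'pull_request' ||
      contains(github.event.pull_request.labels.*.name, 'test:full')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder: the real unit and acceptance test commands are not shown here
      - run: echo "run all unit tests and all 21 acceptance tests"
```

Scheduled workflows only run from the workflow file on the repository's default branch, so a nightly trigger defined on a feature branch would not fire until merged.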

🎯 The Progressive Strategy

PR Draft → Quick tests (2 min) → Fast feedback
    ↓
PR Ready → Standard tests (10 min) → Pre-merge validation  
    ↓
Merge → Full tests (20 min) → Post-merge verification
    ↓
Nightly → Full tests → Regression detection

This matches the original goal: test actual development code with increasing thoroughness as code progresses toward release.

The -EXPERIMENTAL suffix is just so these workflows can coexist with the current production workflow while we validate the approach. Once proven, we'll:

Remove -EXPERIMENTAL suffix
Modify the existing 11-test-acceptance.yaml to be less aggressive
Enjoy faster, more reliable CI! 🎉

Nightly runs are redundant when running full tests on every merge to develop.

Benefits of removing nightly:
- Reduces CI resource usage
- Simplifies maintenance
- Full tests still run on:
  - Every merge to develop (frequent)
  - Manual trigger (when needed)
  - PR with "test:full" label (on-demand)

Nightly only makes sense if:
- Merges are infrequent (days/weeks apart)
- Tests have time-dependent edge cases
- External dependencies need regular validation

For active development with frequent merges, running full tests on every develop merge provides sufficient coverage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

e-gineer and others added 2 commits November 12, 2025 19:23

This was referenced Nov 12, 2025:

Snapshot row streaming timeout too short (5s) causes flaky acceptance tests #4866 (Closed)
Fix #4866: Increase snapshot row streaming timeout from 5s to 30s #4867 (Merged)
