Magpie is a high-performance blocklist aggregator that fetches, validates, and combines domain blocklists from multiple sources. Built in pure Go with minimal dependencies (color output and progress bars), it's optimized for speed and reliability.

Magpie

High-performance blocklist aggregator with smart filtering and DNS validation



Overview

Magpie is a high-performance blocklist aggregator that fetches, validates, and combines domain blocklists from multiple sources. Built in pure Go with a beautiful terminal UI powered by Bubble Tea, it's optimized for speed, reliability, and user experience.


Key Features:

  • 🎨 Beautiful TUI: colorful terminal interface with real-time progress
  • 🚀 Parallel fetching with 6 DNS resolvers (bypasses Pi-hole)
  • 🎯 Smart filtering auto-blacklists failing URLs after 3 attempts
  • 📊 Stats tracking: persistent health monitoring in data/stats.json
  • High performance: 100 workers, DNS caching, connection pooling
  • 🔧 Format support: hosts files, plain lists, AdBlock, URLs, wildcards
  • 🤖 Cronjob ready: silent mode for automated scheduled runs

Installation

# From source
git clone https://github.com/pigeonsec/magpie.git
cd magpie
go build -o magpie ./cmd/magpie

# Quick install
go install github.com/pigeonsec/magpie/cmd/magpie@latest

Quick Start

# Create source file with blocklist URLs
cat > sources.txt << EOF
https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
https://v.firebog.net/hosts/static/w3kbl.txt
EOF

# Aggregate with DNS validation (long form)
./magpie -source sources.txt -output blocklist.txt

# Or use short flags
./magpie -s sources.txt -o blocklist.txt

# Fast mode (no validation)
./magpie -s sources.txt -o blocklist.txt -dns=false

# View statistics
./magpie --stats

CLI Options

Input/Output

| Option | Short | Default | Description |
|---|---|---|---|
| -source | -s | required | Source file containing URLs to fetch (one per line) |
| -output | -o | aggregated.txt | Output file for aggregated domains |

Validation

| Option | Short | Default | Description |
|---|---|---|---|
| -dns | -d | true | Enable DNS validation (A, AAAA, CNAME) |
| -http | -H | false | Enable HTTP validation (in addition to DNS) |
| -workers | -w | 100 | Number of concurrent validation workers |
| -resolvers | -r | 1.1.1.1:53,... | Comma-separated DNS resolvers (Cloudflare, Google, Quad9) |

Performance

| Option | Short | Default | Description |
|---|---|---|---|
| -fetch-workers | -f | 5 | Number of concurrent URL fetchers |
| -cache | -c | true | Enable DNS result caching (5min TTL) |

Stats & Filtering

| Option | Short | Default | Description |
|---|---|---|---|
| --data-dir | - | ./data | Directory for stats.json and persistent data |
| --no-tracking | - | false | Disable URL health tracking and auto-filtering |

General Options

| Option | Short | Default | Description |
|---|---|---|---|
| -quiet | -q | false | Quiet mode: minimal output |
| --silent | - | false | Silent mode: no output (perfect for cronjobs) |
| -version | -v | false | Show version information |
| --stats | - | false | Display stats table and exit |
| --help | -h | false | Show help message |

Performance

Real-world benchmarks on an M1 MacBook Pro with a 100 Mbit/s connection, using a test dataset of 5 URLs and 117k domains:

Benchmark Results

| Validation Method | Workers | Time | Speed | Valid Domains | Notes |
|---|---|---|---|---|---|
| No validation | N/A | ~5 sec | N/A | 117,160 (100%) | Fastest, no filtering |
| DNS only | 50 | 13 min | 151/sec | 15,584 (13.3%) | Recommended |
| HTTP only | 10 | ~140 min | 14/sec | ~45,000 (38.3%) | ⚠️ Very slow, impractical |
| DNS + HTTP | 50 + 10 | ~32 min | Varies | ~6,000 (5%) | Most aggressive filtering |

Performance Analysis

DNS Validation (Recommended):

  • ✅ Fast and efficient (~151 domains/sec with 50 workers)
  • ✅ Filters out 86.7% of domains as unresolvable
  • ✅ Low resource usage
  • ⏱️ 13 minutes for 117k domains

HTTP Validation (Not Recommended):

  • ⚠️ 10x slower than DNS (~14 domains/sec)
  • ⚠️ High resource usage (HTTP/2 connections, TLS handshakes)
  • ⚠️ Many protocol errors from malformed responses (tracking pixels, broken servers)
  • ⏱️ 2+ hours for 117k domains - impractical for large blocklists

DNS + HTTP (Maximum Filtering):

  • First pass with DNS (fast)
  • Second pass with HTTP on DNS-valid domains only
  • Achieves highest filtering rate but takes longer
  • ⏱️ ~32 minutes for 117k domains
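
The two-pass idea can be sketched as a pair of filters in which the expensive check only sees the survivors of the cheap one. This is a minimal illustration with stand-in checks, not Magpie's actual code; `check` and `filter` are hypothetical names.

```go
package main

import "fmt"

// check reports whether a domain passes one validation stage.
type check func(domain string) bool

// filter keeps the domains accepted by ok, preserving their order.
func filter(domains []string, ok check) []string {
	var kept []string
	for _, d := range domains {
		if ok(d) {
			kept = append(kept, d)
		}
	}
	return kept
}

func main() {
	// Stand-in checks; a real run would query DNS and then issue HTTP requests.
	resolves := func(d string) bool { return d != "dead.example" }
	responds := func(d string) bool { return d == "live.example" }

	all := []string{"live.example", "parked.example", "dead.example"}
	afterDNS := filter(all, resolves)       // cheap pass over every domain
	afterHTTP := filter(afterDNS, responds) // expensive pass over survivors only
	fmt.Println(len(all), len(afterDNS), len(afterHTTP))
}
```

Running the slow stage on the reduced set is what keeps the combined run far shorter than HTTP validation alone.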

Test Configuration

  • Hardware: M1 MacBook Pro, 100 Mbit/s internet connection
  • Dataset: 5 URLs, 117,160 unique domains
  • DNS Resolvers: 6 public resolvers (Cloudflare, Google, Quad9)
  • Parallel DNS lookups: A, AAAA, CNAME checked simultaneously (500ms timeout)
  • DNS caching: Enabled (5min TTL)
  • Fetch time: ~5 seconds for all URLs

Why is DNS validation so fast?

  1. Parallel lookups: Checks A, AAAA, and CNAME records simultaneously, not sequentially
  2. Round-robin DNS: Distributes load across 6 DNS resolvers
  3. Early exit: Stops checking once any record type validates
  4. Result caching: 5-minute TTL reduces redundant queries
  5. High concurrency: 50 workers process domains in parallel

Recommendations

| Use Case | Recommended Settings |
|---|---|
| Daily aggregation | -dns (default) with 50+ workers |
| Maximum speed | -dns=false (no validation, 5 sec) |
| Maximum filtering | -dns -http with 50+ workers (~30-60 min) |
| CI/CD pipelines | -dns with -workers 100 for faster builds |
| Resource-constrained | -dns with -workers 20-30 |
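
To show what the worker count controls, here is a minimal worker-pool sketch: a fixed number of goroutines drain a shared channel of domains, so at most that many checks run at once. It is an assumption about the general pattern rather than Magpie's source, and `validateAll` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// validateAll runs check over domains with at most `workers` goroutines and
// returns the domains that passed, sorted for stable output.
func validateAll(domains []string, workers int, check func(string) bool) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var (
		mu    sync.Mutex
		valid []string
		wg    sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for d := range jobs { // each worker pulls until the channel closes
				if check(d) {
					mu.Lock()
					valid = append(valid, d)
					mu.Unlock()
				}
			}
		}()
	}
	for _, d := range domains {
		jobs <- d
	}
	close(jobs)
	wg.Wait()
	sort.Strings(valid)
	return valid
}

func main() {
	domains := []string{"b.example", "a.example", "dead.example"}
	alive := func(d string) bool { return d != "dead.example" } // stand-in for a DNS check
	fmt.Println(validateAll(domains, 2, alive))
}
```

A smaller pool lowers the number of simultaneous sockets and DNS queries, which is the trade-off behind the resource-constrained setting.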

⚡ Pro tip: DNS validation filters out 86.7% of the test dataset's domains in just 13 minutes. HTTP validation is rarely worth the roughly 10x slowdown.

Supported Formats

Magpie automatically parses various blocklist formats:

# Plain domains
example.com

# Hosts file (IPv4/IPv6)
0.0.0.0 ads.example.com
127.0.0.1 tracker.com
::1 blocked.net

# AdBlock/uBlock
||domain.com^
||ads.example.com^$third-party

# URLs
https://example.com/path

# Wildcards
*.ads.example.com

# Comments (ignored)
# This is a comment
! AdBlock comment
; Hosts comment
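
A line normalizer covering the formats above could look like the following sketch. It is one possible approach under simple assumptions (for example, names without a dot are rejected), not Magpie's actual parser; `extractDomain` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// extractDomain normalizes one blocklist line to a bare lowercase domain.
// The second result is false for comments, blanks, and unparseable lines.
func extractDomain(line string) (string, bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.ContainsAny(line[:1], "#!;") {
		return "", false // blank line or comment
	}
	candidate := line
	switch fields := strings.Fields(line); {
	case strings.HasPrefix(line, "||"): // AdBlock/uBlock: ||domain^options
		candidate = strings.TrimPrefix(line, "||")
		if i := strings.IndexAny(candidate, "^$/"); i >= 0 {
			candidate = candidate[:i]
		}
	case strings.Contains(line, "://"): // URL: keep only the host
		u, err := url.Parse(line)
		if err != nil {
			return "", false
		}
		candidate = u.Hostname()
	case len(fields) >= 2 && net.ParseIP(fields[0]) != nil: // hosts file: IP, then domain
		candidate = fields[1]
	}
	candidate = strings.ToLower(strings.TrimPrefix(candidate, "*.")) // drop wildcard prefix
	if !strings.Contains(candidate, ".") ||
		strings.ContainsAny(candidate, " \t") ||
		net.ParseIP(candidate) != nil {
		return "", false // not a plausible domain name
	}
	return candidate, true
}

func main() {
	for _, line := range []string{"0.0.0.0 ads.example.com", "||domain.com^", "! comment"} {
		d, ok := extractDomain(line)
		fmt.Printf("%q -> %q %v\n", line, d, ok)
	}
}
```

Rejecting dot-less names also drops entries such as `127.0.0.1 localhost`, which hosts files commonly include.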

Smart URL Filtering

Magpie automatically tracks URL health and filters broken sources:

# View statistics
./magpie --stats

How it works:

  • Every fetch is tracked in data/stats.json
  • URLs failing 3+ times are automatically blacklisted
  • Blacklisted URLs are skipped on future runs
  • Auto-recovery when URLs come back online

Stats include:

  • Success/failure counts
  • Last fetch time
  • Total domains retrieved
  • Error messages
  • Blacklist status

DNS Validation

Magpie uses 6 public DNS resolvers in round-robin to bypass Pi-hole and ensure accurate validation:

Resolvers:

  • Cloudflare: 1.1.1.1:53, 1.0.0.1:53
  • Google: 8.8.8.8:53, 8.8.4.4:53
  • Quad9: 9.9.9.9:53, 149.112.112.112:53
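
One way to send queries to these resolvers instead of the system DNS is a `net.Resolver` with a custom `Dial` function. The sketch below rotates through the addresses listed above; it is an assumption about the technique, not Magpie's code, and `roundRobin` and `newResolver` are hypothetical names. It needs Go 1.19 or later for `atomic.Uint64`.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync/atomic"
	"time"
)

// roundRobin hands out resolver addresses in rotation; safe for concurrent use.
type roundRobin struct {
	addrs []string
	next  atomic.Uint64
}

// pick returns the next address in the rotation.
func (r *roundRobin) pick() string {
	n := r.next.Add(1) - 1
	return r.addrs[n%uint64(len(r.addrs))]
}

// newResolver returns a resolver that ignores the system DNS server address
// and dials the next address in the rotation for each connection.
func newResolver(rr *roundRobin) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in resolver, which honors Dial
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 500 * time.Millisecond}
			return d.DialContext(ctx, network, rr.pick())
		},
	}
}

func main() {
	rr := &roundRobin{addrs: []string{
		"1.1.1.1:53", "1.0.0.1:53", // Cloudflare
		"8.8.8.8:53", "8.8.4.4:53", // Google
		"9.9.9.9:53", "149.112.112.112:53", // Quad9
	}}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	addrs, err := newResolver(rr).LookupHost(ctx, "example.com")
	fmt.Println(addrs, err)
}
```

Because the dialer never contacts the locally configured DNS server, a Pi-hole on the same network does not influence the answers.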

Validation logic:

  1. Check A record (IPv4) - most common, checked first
  2. If no A → check AAAA record (IPv6)
  3. If no AAAA → check CNAME record
  4. Cache result for 5 minutes

Examples

Basic Aggregation

# Long form
./magpie -source sources.txt -output blocklist.txt

# Short form
./magpie -s sources.txt -o blocklist.txt

Maximum Performance

# Using short flags for brevity
./magpie -s sources.txt -o blocklist.txt -w 150 -f 10

No Validation (Fastest)

./magpie -s sources.txt -o blocklist.txt -dns=false

Full HTTP Validation

./magpie -s sources.txt -o blocklist.txt -H -w 50

Custom DNS Resolvers

./magpie -s sources.txt -o blocklist.txt -r "1.1.1.1:53,8.8.8.8:53"

Quiet Mode (for Scripts)

./magpie -s sources.txt -o blocklist.txt -q

Silent Mode (for Cronjobs)

# Perfect for automated runs - zero output
./magpie -s sources.txt -o blocklist.txt --silent

Automated Cronjob Setup

Magpie is perfect for running on a schedule to keep your blocklists up to date. The --silent flag ensures zero output, making it ideal for cronjobs.

Daily Blocklist Updates

# Add to crontab (crontab -e)
# Run every day at 3 AM
0 3 * * * /usr/local/bin/magpie -s /path/to/sources.txt -o /path/to/blocklist.txt --silent

# Run every 12 hours
0 */12 * * * /usr/local/bin/magpie -s /path/to/sources.txt -o /path/to/blocklist.txt --silent

# Run weekly on Sunday at 2 AM
0 2 * * 0 /usr/local/bin/magpie -s /path/to/sources.txt -o /path/to/blocklist.txt --silent

Publishing to GitHub

Automatically commit and push updated blocklists to a GitHub repository:

#!/bin/bash
# save as: update-blocklists.sh

REPO_DIR="/path/to/your/blocklist-repo"
SOURCES="/path/to/sources.txt"

cd "$REPO_DIR" || exit 1

# Run Magpie in silent mode; stop here if aggregation fails
/usr/local/bin/magpie -s "$SOURCES" -o blocklist.txt --silent || exit 1

# Commit and push only when blocklist.txt itself has changed
if [[ -n $(git status --porcelain -- blocklist.txt) ]]; then
  git add blocklist.txt
  git commit -m "🤖 Auto-update blocklist $(date +%Y-%m-%d)"
  git push origin main
fi

Cronjob:

# Run daily at 3 AM and push to GitHub
0 3 * * * /path/to/update-blocklists.sh >> /var/log/magpie.log 2>&1

Example: Public Blocklist Repository

# Directory structure
~/blocklist-repo/
├── sources.txt          # Your source URLs
├── blocklist.txt        # Generated blocklist
├── README.md           # Description
└── .github/
    └── workflows/
        └── update.yml  # Optional: GitHub Actions

# sources.txt
https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
https://v.firebog.net/hosts/static/w3kbl.txt
https://raw.githubusercontent.com/PolishFiltersTeam/KADhosts/master/KADhosts.txt

# Cronjob (a bare % is special in crontab, so each one is escaped as \%)
0 3 * * * cd ~/blocklist-repo && /usr/local/bin/magpie -s sources.txt -o blocklist.txt --silent && git add . && git commit -m "Update $(date +\%Y-\%m-\%d)" && git push

Integration with Kestrel

Use Magpie output with Kestrel threat intelligence server:

# Aggregate blocklists with short flags
./magpie -s sources.txt -o aggregated.txt

# Ingest into Kestrel
while IFS= read -r domain; do
  [ -z "$domain" ] && continue  # skip blank lines
  curl -X POST http://localhost:8080/api/ioc \
    -H "X-API-Key: kestrel_admin_key" \
    -H "Content-Type: application/json" \
    -d "{\"domain\":\"$domain\",\"category\":\"Aggregated\",\"feed\":\"community\"}"
done < aggregated.txt

Output Format

Plain text, one domain per line:

example.com
malicious-site.net
ads.tracking.com

Compatible with:

Contributing

Contributions are welcome! Please submit a Pull Request.


📊 Blocklist Statistics

| URL | Domains | Status | Success | Failures | Last Checked |
|---|---|---|---|---|---|
| sources | 0 | ✅ Active | 0 | 0 | never |
| global | 0 | ✅ Active | 0 | 0 | never |

Summary: 2 total URLs | 2 active | 0 filtered | ~0 total domains

Last updated: 2025-11-12 10:39 UTC

---

License

This project is licensed under the same license as the Kestrel project.


Made with ❤️ by [PigeonSec](https://github.com/pigeonsec)
