[NeurIPS 2025] Official resources of "HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation".


tao-hpu/DynHyperRAG


Repository files navigation

HyperGraphRAG - Enhanced Fork & DynHyperRAG Research


🔬 Research Project: DynHyperRAG | Original Repository

This is an enhanced fork of the official HyperGraphRAG implementation, serving as the foundation for DynHyperRAG - a novel quality-aware dynamic hypergraph RAG system for doctoral research.

Original Paper: "HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation" Haoran Luo, Haihong E, Guanting Chen, et al. NeurIPS 2025

Research Project: DynHyperRAG: Quality-Aware Dynamic Hypergraph for Efficient Retrieval-Augmented Generation (PhD Thesis, Expected 2025-06)


🚀 DynHyperRAG Research Project

Core Innovation

DynHyperRAG extends static HyperGraphRAG with three major innovations:

  1. Graph-Structure-Based Quality Assessment - Automatically evaluate hyperedge quality using 5 structural features (degree centrality, betweenness, clustering coefficient, hyperedge coherence, text quality)

  2. Quality-Aware Dynamic Weight Update - Dynamically adjust hyperedge weights based on retrieval feedback and quality scores using EMA/additive/multiplicative strategies

  3. Efficient Retrieval with Entity Type Filtering - Target a 30%+ reduction in retrieval time through entity type filtering and quality-aware ranking
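
The three update strategies named in item 2 (EMA, additive, multiplicative) could look roughly like the sketch below. The function name `update_weight`, the `[0, 1]` ranges, the 0.5 midpoint, and the way feedback is scaled by the quality score are all illustrative assumptions, not the project's actual API or formulas.

```python
def update_weight(weight, feedback, quality, alpha=0.1, strategy="ema"):
    """Return an updated hyperedge weight, clamped to [0, 1].

    weight:   current hyperedge weight in [0, 1]
    feedback: retrieval feedback in [0, 1] (1.0 = the hyperedge was useful)
    quality:  quality score in [0, 1]
    alpha:    step size (hypothetical default)
    """
    # Assumption: feedback is scaled by quality to form a single signal.
    signal = feedback * quality

    if strategy == "ema":
        # Exponential moving average toward the signal.
        new_weight = (1 - alpha) * weight + alpha * signal
    elif strategy == "additive":
        # Shift up when the signal is above 0.5, down when below.
        new_weight = weight + alpha * (signal - 0.5)
    elif strategy == "multiplicative":
        # Scale by a factor slightly above or below 1.
        new_weight = weight * (1 + alpha * (signal - 0.5))
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    return min(1.0, max(0.0, new_weight))


if __name__ == "__main__":
    for name in ("ema", "additive", "multiplicative"):
        print(name, round(update_weight(0.5, 1.0, 1.0, strategy=name), 4))
```

EMA pulls the weight toward the signal and forgets old feedback gradually, while the additive and multiplicative variants apply a fixed-size or proportional nudge per feedback event.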

Research Questions

  1. How to automatically assess hyperedge quality using graph structural properties?
  2. How to dynamically adjust hyperedge weights based on retrieval feedback and quality scores?
  3. How to optimize retrieval efficiency while maintaining or improving accuracy?

Expected Contributions

Academic:

  • Novel graph-structure-based quality assessment algorithm
  • New hyperedge coherence metric
  • Quality-aware dynamic update mechanism
  • Feature importance analysis with SHAP

Experimental:

  • Validation on CAIL2019 (legal) and PubMed/AMiner (academic) datasets
  • Retrieval accuracy improvement (MRR +X%)
  • Retrieval time reduction (30%+)
  • Statistical significance testing

Engineering:

  • Complete open-source implementation
  • Reproducible experiment pipeline
  • Lightweight variant (Dyn-Hyper-RAG-Lite) for production

📖 Full thesis overview: docs/THESIS_OVERVIEW.md

📋 Project specs: .kiro/specs/dynhyperrag-quality-aware/

Research Timeline (18 Weeks)

| Phase | Weeks | Deliverables |
| --- | --- | --- |
| Phase 1: Quality Assessment | 1-4 | Graph feature extraction, coherence metric, quality scorer, SHAP analysis |
| Phase 2: Dynamic Update | 5-7 | Weight updater (EMA/additive/multiplicative), feedback extractor, hyperedge refiner |
| Phase 3: Efficient Retrieval | 8-10 | Entity type filter, quality-aware ranker, Dyn-Hyper-RAG-Lite |
| Phase 4: Data Preparation | 11-12 | CAIL2019 loader, PubMed/AMiner loader, annotation interface, gold standard datasets |
| Phase 5: Evaluation | 13-14 | Metrics implementation, baseline methods, statistical tests |
| Phase 6: Experiments | 15-16 | Full pipeline, ablation studies, result generation |
| Phase 7: Documentation | 17-18 | API docs, user guide, thesis writing |

🎯 What's New in This Fork

✨ Key Enhancements

  1. 🔧 Flexible Configuration System

    • Environment-based configuration via .env files
    • Support for multiple OpenAI-compatible API providers
    • Easy deployment without code modification
    • See: config.py | Setup Guide
  2. 🐛 Critical Bug Fixes

    • Fixed AttributeError: 'function' object has no attribute 'embedding_dim'
    • Proper handling of EmbeddingFunc with functools.partial
    • Improved error handling and retry mechanisms
  3. 📚 Comprehensive Documentation

  4. 💡 Enhanced Example Scripts

    • Improved script_construct.py with progress indicators
    • Enhanced script_query.py with configurable query parameters
    • Better error messages and user feedback
  5. 🎨 Interactive Web Visualization

    • Modern React-based web UI for graph exploration
    • Real-time search and filtering capabilities
    • Query path visualization with animation
    • Export functionality (PNG, SVG, JSON)
    • See: Web UI Documentation
  6. 🧪 Comprehensive Testing Suite

    • 91+ unit and integration tests (88.3% pass rate)
    • Component tests for all UI elements
    • Service and store integration tests
    • E2E test framework with Playwright
    • See: Testing Guide
  7. 🔬 Production-Ready Features

    • Comprehensive logging
    • Automatic retry with exponential backoff
    • Configurable query modes (local/global/hybrid/naive)
    • Validated on real-world medical datasets
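
The `embedding_dim` error listed under the bug fixes typically appears when a plain callable (for example a `functools.partial`) is passed where the code expects a wrapper object that carries the embedding dimension. The sketch below shows one plausible shape of the fix. The `EmbeddingFunc` class here is a self-contained stand-in written for illustration, and `fake_embed`, the dimension 8, and the token limit are hypothetical placeholders rather than the fork's real code.

```python
import asyncio
from dataclasses import dataclass
from functools import partial
from typing import Awaitable, Callable, List


@dataclass
class EmbeddingFunc:
    """Illustrative stand-in: an async callable plus the metadata callers read."""
    embedding_dim: int
    max_token_size: int
    func: Callable[..., Awaitable[List[List[float]]]]

    async def __call__(self, *args, **kwargs):
        return await self.func(*args, **kwargs)


async def fake_embed(texts, model, dim):
    """Placeholder embedder: returns zero vectors instead of calling an API."""
    return [[0.0] * dim for _ in texts]


# A bare partial binds the model arguments but has no embedding_dim attribute.
bare = partial(fake_embed, model="text-embedding-3-small", dim=8)

# Wrapping it exposes the dimension that downstream storage code looks up.
embedding_func = EmbeddingFunc(embedding_dim=8, max_token_size=8192, func=bare)

vectors = asyncio.run(embedding_func(["alpha", "beta"]))
print(hasattr(bare, "embedding_dim"), embedding_func.embedding_dim, len(vectors))
```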

📖 Overview

HyperGraphRAG models n-ary relationships between entities as hyperedges, going beyond the binary relationships of traditional GraphRAG.

🏆 Key Advantages (vs Traditional RAG/GraphRAG)

| Metric | vs Standard RAG | vs GraphRAG |
| --- | --- | --- |
| Accuracy | +15.4% | +8.2% |
| Hallucination Rate | -27% | -18% |
| Retrieval Time | Faster | -28% (9.5s vs 13.3s) |
| Cost | Similar | -35% ($0.0032 vs $0.0049) |

Why Hypergraphs?

  • Traditional GraphRAG: One edge connects only 2 entities (binary relation)
  • HyperGraphRAG: One hyperedge connects N entities (n-ary relation)
  • Better models real-world complex relationships (medical, legal, scientific domains)
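
To make the contrast above concrete, here is a minimal sketch of one n-ary fact stored as a single hyperedge versus the same entities split into binary edges. The entity names and the helper `to_binary_edges` are invented for illustration and say nothing about how the library stores its graph.

```python
from itertools import combinations

# One n-ary fact linking four entities (placeholder names).
hyperedge = frozenset({"drug_A", "condition_B", "dose_C", "population_D"})


def to_binary_edges(entities):
    """Decompose an n-ary relation into all unordered entity pairs."""
    return {frozenset(pair) for pair in combinations(sorted(entities), 2)}


binary_edges = to_binary_edges(hyperedge)

print(len(hyperedge), "entities held together by 1 hyperedge")
print(len(binary_edges), "binary edges needed to link the same entities")
```

With binary edges, nothing records that the pairs came from the same fact, so a retriever has to reassemble it from separate edges. The hyperedge keeps the whole relation as one retrievable unit.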

📊 Detailed analysis: Performance Analysis Report


🚀 Quick Start

Prerequisites

  • Python 3.11+
  • OpenAI API key (or compatible API endpoint)

Installation

# Clone this fork
git clone https://github.com/tao-hpu/HyperGraphRAG.git
cd HyperGraphRAG

# Create environment
conda create -n hypergraphrag python=3.11
conda activate hypergraphrag

# Install dependencies
pip install -r requirements.txt

Configuration

  1. Copy the example configuration:
cp .env.example .env
  2. Edit .env with your API credentials:
# Required
OPENAI_API_KEY=your_api_key_here

# Optional (use OpenAI-compatible endpoints)
OPENAI_BASE_URL=https://api.openai.com/v1

# Models
EMBEDDING_MODEL=text-embedding-3-small
LLM_MODEL=gpt-4o-mini
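
For readers wiring these variables into their own scripts, the sketch below reads the same four keys from the process environment using only the standard library. It is not the fork's `config.py`: the `Settings` class and `load_settings` function are hypothetical, using the example values above as defaults is an assumption, and the `.env` file still has to be loaded into the environment first (by `config.py` or a dotenv loader).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    embedding_model: str
    llm_model: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # The only variable marked as required in the example .env.
        raise RuntimeError("OPENAI_API_KEY is required")

    return Settings(
        api_key=api_key,
        # Assumed defaults, copied from the example values above.
        base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),
    )


if __name__ == "__main__":
    settings = load_settings({"OPENAI_API_KEY": "your_api_key_here"})
    print(settings.base_url, settings.llm_model)
```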

Usage

1. Build Knowledge Hypergraph

python script_construct.py

Output: Constructs a knowledge hypergraph from example_contexts.json

  • Extracts entities and hyperedges
  • Generates embeddings for entities, hyperedges, and text chunks
  • Saves to expr/example/

2. Query the Knowledge Base

python script_query.py

Output: Runs a sample complex query using hybrid mode (local + global retrieval)

3. Launch Web Visualization (Optional)

# Start backend API
python3 -m uvicorn api.main:app --host 0.0.0.0 --port 3401

# In another terminal, start frontend
cd web_ui
pnpm install
pnpm dev

Access: Open http://localhost:3400 in your browser

Features:

  • Interactive graph visualization with Cytoscape.js
  • Real-time node/edge search and filtering
  • Query execution with path visualization
  • Export graphs as PNG, SVG, or JSON
  • Responsive design for mobile/tablet/desktop

📖 Full guide: Web UI Documentation


📂 Project Structure

HyperGraphRAG/
├── .env.example              # Configuration template
├── config.py                 # Configuration management system
├── script_construct.py       # Enhanced construction script
├── script_query.py           # Enhanced query script
│
├── hypergraphrag/            # Core library
│   ├── base.py               # Base classes and interfaces
│   ├── hypergraphrag.py      # Main HyperGraphRAG implementation
│   ├── operate.py            # Core operations (extract, query)
│   ├── llm.py                # LLM interface
│   ├── storage.py            # Storage implementations
│   ├── prompt.py             # Prompt templates
│   ├── utils.py              # Utility functions
│   │
│   ├── quality/              # 🆕 Quality assessment module (DynHyperRAG)
│   │   ├── __init__.py       # Module exports
│   │   ├── scorer.py         # Quality scoring algorithm
│   │   ├── features.py       # Graph structure feature extraction
│   │   ├── coherence.py      # Hyperedge coherence metric
│   │   └── analyzer.py       # Feature importance analysis (SHAP)
│   │
│   ├── dynamic/              # 🆕 Dynamic update module (DynHyperRAG)
│   │   ├── __init__.py       # Module exports
│   │   ├── weight_updater.py     # Weight update mechanism
│   │   ├── feedback_extractor.py # Feedback signal extraction
│   │   └── refiner.py        # Hyperedge refinement
│   │
│   ├── retrieval/            # 🆕 Efficient retrieval module (DynHyperRAG)
│   │   ├── __init__.py       # Module exports
│   │   ├── entity_filter.py  # Entity type filtering
│   │   ├── quality_ranker.py # Quality-aware ranking
│   │   └── lite_retriever.py # Lightweight retriever
│   │
│   ├── evaluation/           # 🆕 Evaluation framework (DynHyperRAG)
│   │   ├── __init__.py       # Module exports
│   │   ├── metrics.py        # Evaluation metrics
│   │   ├── baselines.py      # Baseline methods
│   │   └── pipeline.py       # Experiment pipeline
│   │
│   └── data/                 # 🆕 Data processing (DynHyperRAG)
│       ├── __init__.py       # Module exports
│       ├── cail2019_loader.py    # CAIL2019 legal dataset
│       ├── academic_loader.py    # PubMed/AMiner academic dataset
│       └── annotator.py      # Annotation interface
│
├── api/                      # FastAPI backend for visualization
│   ├── main.py               # API entry point
│   ├── routes/               # API endpoints
│   ├── services/             # Business logic
│   └── models/               # Data models
│
├── web_ui/                   # React-based web visualization
│   ├── src/                  # Frontend source code
│   ├── e2e/                  # E2E tests (Playwright)
│   ├── TESTING_GUIDE.md      # Testing documentation
│   └── TEST_SUMMARY.md       # Test results
│
├── docs/                     # Comprehensive documentation
│   ├── THESIS_OVERVIEW.md    # 🆕 DynHyperRAG thesis overview
│   ├── QUICKSTART.md         # Quick start guide (Chinese)
│   ├── SETUP.md              # Setup guide (Chinese)
│   ├── performance-analysis.md   # Performance & advantage analysis
│   ├── troubleshooting.md    # Troubleshooting guide
│   ├── architecture.md       # Architecture & design
│   └── visualization/        # Visualization documentation
│
├── .kiro/specs/              # 🆕 Research specifications
│   └── dynhyperrag-quality-aware/
│       ├── requirements.md   # Research requirements
│       ├── design.md         # System design
│       └── tasks.md          # Implementation tasks
│
├── expr/                     # Experiment data
│   ├── example/              # Original medical dataset
│   ├── cail2019/             # 🆕 Legal dataset (planned)
│   └── pubmed/               # 🆕 Academic dataset (planned)
│
└── evaluation/               # Evaluation scripts

🔬 Research & Experimentation

DynHyperRAG vs Static HyperGraphRAG

| Feature | Static HyperGraphRAG | DynHyperRAG (This Research) |
| --- | --- | --- |
| Relationship Type | n-ary | n-ary |
| Graph Structure | Static | Dynamic |
| Quality Assessment | None | Graph structure features |
| Weight Update | None | Feedback-driven |
| Retrieval Optimization | Graph traversal | Type filtering + quality ranking |
| Efficiency | Slow | Optimized (-30%) |
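
The "type filtering + quality ranking" row can be pictured as a two-step pass over candidate hyperedges: drop candidates whose entity types are irrelevant to the query, then rank the rest by a blend of similarity and quality. The sketch below is an assumption-laden illustration. The `Hyperedge` fields, the `retrieve` function, and the 0.3 blend weight are hypothetical, and the type names are borrowed from the legal taxonomy in the planned configuration.

```python
from dataclasses import dataclass


@dataclass
class Hyperedge:
    text: str
    entity_types: frozenset  # types of the entities this hyperedge connects
    similarity: float        # query similarity, e.g. from vector search
    quality: float           # quality score in [0, 1]


def retrieve(candidates, query_types, top_k=2, quality_weight=0.3):
    """Filter by entity type, then rank by blended similarity and quality."""
    # Step 1: keep hyperedges that touch at least one query-relevant type.
    kept = [h for h in candidates if h.entity_types & query_types]

    # Step 2: quality-aware ranking (linear blend is an assumption).
    def score(h):
        return (1 - quality_weight) * h.similarity + quality_weight * h.quality

    return sorted(kept, key=score, reverse=True)[:top_k]


candidates = [
    Hyperedge("A", frozenset({"crime", "penalty"}), similarity=0.80, quality=0.40),
    Hyperedge("B", frozenset({"court"}), similarity=0.90, quality=0.90),
    Hyperedge("C", frozenset({"crime", "law"}), similarity=0.70, quality=0.95),
]

results = retrieve(candidates, query_types=frozenset({"crime", "penalty"}))
print([h.text for h in results])
```

Filtering first shrinks the candidate set before any scoring happens, which is where the time saving would come from. The blend then lets a high-quality hyperedge outrank a slightly more similar but lower-quality one.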

Supported Query Modes

  • local: Entity-centric retrieval (focused)
  • global: Graph-wide retrieval (comprehensive)
  • hybrid: Combines local + global (recommended)
  • naive: Simple text retrieval (baseline)

Configurable Parameters

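from hypergraphrag import QueryParam

# Assumes `rag` is an initialized HyperGraphRAG instance
# and `query_text` is the question string.
result = rag.query(
    query_text,
    param=QueryParam(
        mode="hybrid",                      # local, global, hybrid, or naive
        top_k=60,
        max_token_for_text_unit=4000,
        max_token_for_local_context=4000,
        max_token_for_global_context=4000,
    ),
)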

DynHyperRAG Configuration (Planned)

from hypergraphrag.quality import QualityScorer
from hypergraphrag.dynamic import WeightUpdater
from hypergraphrag.retrieval import EntityTypeFilter

# Quality assessment
scorer = QualityScorer(graph, config={
    'feature_weights': {
        'degree_centrality': 0.2,
        'betweenness': 0.15,
        'clustering': 0.15,
        'coherence': 0.3,
        'text_quality': 0.2
    }
})

# Dynamic weight update
updater = WeightUpdater(graph, config={
    'update_alpha': 0.1,
    'decay_factor': 0.99,
    'strategy': 'ema'  # ema, additive, multiplicative
})

# Entity type filtering
# (named entity_filter so it does not shadow Python's built-in filter)
entity_filter = EntityTypeFilter(graph, config={
    'domain': 'legal',  # legal, academic
    'entity_taxonomy': {
        'legal': ['law', 'article', 'court', 'party', 'crime', 'penalty']
    }
})
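
The `feature_weights` mapping above suggests that the five features are combined into a single score. The README does not state the combination rule, so the sketch below assumes a normalized weighted sum over features already scaled to `[0, 1]`. `quality_score` is a hypothetical helper, not the planned `QualityScorer` API.

```python
# Weights copied from the planned configuration above.
FEATURE_WEIGHTS = {
    "degree_centrality": 0.2,
    "betweenness": 0.15,
    "clustering": 0.15,
    "coherence": 0.3,
    "text_quality": 0.2,
}


def quality_score(features, weights=FEATURE_WEIGHTS):
    """Weighted average of per-hyperedge features, each assumed in [0, 1]."""
    missing = weights.keys() - features.keys()
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")

    total = sum(weights.values())
    # Dividing by the total keeps the score in [0, 1] even if the
    # weights are later edited and no longer sum to exactly 1.
    return sum(weights[name] * features[name] for name in weights) / total


if __name__ == "__main__":
    example = {
        "degree_centrality": 0.5,
        "betweenness": 0.5,
        "clustering": 0.5,
        "coherence": 0.5,
        "text_quality": 0.5,
    }
    print(round(quality_score(example), 4))
```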

📊 Validated Performance

Tested on a 3-document medical dataset:

  • Entities extracted: 129
  • Hyperedges extracted: 84
  • Graph size: 213 nodes, 145 edges
  • Construction time: ~85 seconds
  • Query time: ~3 seconds (hybrid mode)
  • API success rate: >95%

Full analysis: docs/performance-analysis.md


🀝 Contributing

This is a research fork. Contributions are welcome!

Current Research Focus (DynHyperRAG)

  • Quality assessment module (Week 1-4) ✅
  • Dynamic weight update module (Week 5-7) ✅
  • Efficient retrieval module (Week 8-10) ✅
  • CAIL2019 legal dataset preparation (Week 11-12) ✅
  • PubMed/AMiner academic dataset preparation (Week 11-12) ✅
  • Evaluation framework (metrics, baselines, pipeline) (Week 13-14) ✅
  • Ablation studies framework (Week 15-16) ✅
  • Full experiments and result generation (Week 15-16)
  • Thesis writing (Week 17-18)

Completed Features

  • Visualization tools for hypergraph exploration ✅
  • Comprehensive testing infrastructure ✅
  • Production-ready configuration system ✅
  • Web UI with interactive graph exploration ✅
  • CAIL2019 legal dataset loader and processor ✅
  • Academic dataset (PubMed/AMiner) loader ✅
  • Evaluation metrics (MRR, Recall@K, Precision@K, F1@K, NDCG@K) ✅
  • Baseline methods (BM25, TF-IDF, Dense Retrieval, GraphRAG) ✅
  • Experiment pipeline with statistical testing ✅
  • Ablation studies framework ✅
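
For reference, the ranking metrics named in the list above have standard textbook definitions, sketched below with binary relevance. These functions are independent illustrations, not the contents of `hypergraphrag/evaluation/metrics.py`, and the function names are assumptions.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return len(set(ranked[:k]) & relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


ranked = ["d3", "d1", "d2"]
relevant = {"d1", "d2"}
print(reciprocal_rank(ranked, relevant), recall_at_k(ranked, relevant, 2))
```

F1@K follows from the two functions above as the harmonic mean of Precision@K and Recall@K.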

Future Work

  • Optimization for large-scale documents
  • Advanced hyperedge extraction algorithms
  • Multi-language support
  • Integration with other vector databases
  • Performance optimization for web UI

📝 Citation

If you use this fork or DynHyperRAG in your research, please cite:

Original HyperGraphRAG Paper

@misc{luo2025hypergraphrag,
      title={HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation},
      author={Haoran Luo and Haihong E and Guanting Chen and Yandan Zheng and Xiaobao Wu and Yikai Guo and Qika Lin and Yu Feng and Zemin Kuang and Meina Song and Yifan Zhu and Luu Anh Tuan},
      year={2025},
      eprint={2503.21322},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2503.21322}
}

DynHyperRAG (This Research)

@phdthesis{liu2025dynhyperrag,
      title={DynHyperRAG: Quality-Aware Dynamic Hypergraph for Efficient Retrieval-Augmented Generation},
      author={Hao Liu and Tao An},
      year={2025},
      school={[University Name]},
      note={PhD Thesis, Expected June 2025. Core algorithm by Hao Liu, programming and code review by Tao An},
      url={https://github.com/HaoLiu923/DynHyperRAG}
}

This Enhanced Fork

@misc{liu2025hypergraphrag-enhanced,
      title={HyperGraphRAG: Enhanced Implementation with Production Features},
      author={Hao Liu and Tao An},
      year={2025},
      url={https://github.com/HaoLiu923/DynHyperRAG},
      note={Enhanced fork with configuration system, visualization, and DynHyperRAG research implementation. Core algorithm by Hao Liu, programming and code review by Tao An}
}

📧 Contact

Core Algorithm Developer: Hao Liu (HaoLiu923)

Programming & Code Review: Tao An

Location: Beijing

For questions about this fork: Open an issue on GitHub

For questions about the original implementation: Contact the original authors at [email protected]


🙏 Acknowledgements

Original Authors

This fork is based on the excellent work by Haoran Luo, Haihong E, and the LHRLAB team. Their groundbreaking research on hypergraph-structured knowledge representation (NeurIPS 2025) laid the foundation for this enhanced implementation.

Special thanks to:

  • Haoran Luo and the research team for their innovative HyperGraphRAG paper
  • LHRLAB for the original implementation and continuous maintenance
  • The open-source community for valuable feedback and contributions

Related Research

This fork also benefits from related work:

Dependencies

This project builds upon excellent open-source projects:

  • LightRAG - RAG framework foundation
  • Text2NKG - Knowledge graph extraction
  • HAHE - Hypergraph algorithms

Development Tools

  • Claude Code - Assisted in documentation, bug fixes, and analysis

📜 License

This fork maintains the same license as the original repository. Please refer to the original repository for license details.


🔗 Links

Original Work

This Fork & DynHyperRAG Research

Related Projects

Datasets

  • CAIL2019: Chinese AI and Law Challenge 2019 (Legal domain)
  • PubMed Knowledge Graph: Academic papers and citations
  • AMiner: Academic social network dataset

⭐ If you find this fork useful, please consider giving it a star!

Made with ❤️ in Beijing
