jn2707 commented Oct 22, 2025

STAR series models are highly capable language models specialized in function calling, achieving excellent performance on the Berkeley Function Calling Leaderboard (BFCL) for models in their size classes.

These models are the result of fine-tuning the Qwen/Qwen3-0.6B, Qwen/Qwen3-1.7B, and Qwen/Qwen3-4B base models using the novel STAR (Similarity-guided Teacher-Assisted Refinement) framework. STAR is a holistic training curriculum designed to effectively transfer the advanced capabilities of large language models (LLMs) into "super-tiny" models, making them powerful, accessible, and efficient for real-world agentic applications.

The key innovations of the STAR framework include:

  • Similarity-guided RL (Sim-RL): A reinforcement learning mechanism that uses a fine-grained, similarity-based reward signal. This provides a more robust and continuous signal for policy optimization compared to simple binary rewards, which is crucial for complex, multi-solution tasks like function calling.
  • Constrained Knowledge Distillation (CKD): An advanced training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions. This ensures training stability while preserving the model's exploration capacity, creating a strong foundation for the subsequent RL phase.

Notably, our STAR-0b6 model significantly outperforms other open models under 1B parameters and even surpasses several larger models, demonstrating the effectiveness of the STAR methodology.
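To illustrate why a continuous, similarity-based reward helps for multi-solution tasks like function calling, here is a minimal sketch of the idea behind Sim-RL. This is not STAR's actual reward function (which is not specified here); the call representation, the Jaccard argument overlap, and the `name_weight` constant are all illustrative assumptions.

```python
def similarity_reward(pred, gold, name_weight=0.5):
    """Hypothetical continuous reward in [0, 1] for a function call,
    represented as (name, {arg: value}). Combines function-name match
    with Jaccard overlap of the argument (key, value) pairs."""
    pred_name, pred_args = pred
    gold_name, gold_args = gold
    name_score = 1.0 if pred_name == gold_name else 0.0
    pred_items = set(pred_args.items())
    gold_items = set(gold_args.items())
    union = pred_items | gold_items
    arg_score = len(pred_items & gold_items) / len(union) if union else 1.0
    return name_weight * name_score + (1.0 - name_weight) * arg_score

def binary_reward(pred, gold):
    """Baseline binary reward: 1 only on an exact match."""
    return 1.0 if pred == gold else 0.0

gold = ("get_weather", {"city": "Paris", "unit": "celsius"})
near_miss = ("get_weather", {"city": "Paris", "unit": "fahrenheit"})
# The binary reward gives the near-miss 0.0 (no learning signal),
# while the similarity reward still credits the partial match.
print(binary_reward(near_miss, gold))
print(similarity_reward(near_miss, gold))
```

A near-miss call that gets the function name and most arguments right receives a graded reward instead of zero, which gives the policy a gradient toward the correct call even when it never produces an exact match during exploration.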
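The CKD objective can likewise be sketched in miniature: a forward KL restricted to the teacher's top-k tokens, plus a term that penalizes tokens the student predicts confidently outside that set. This is an assumed toy formulation over plain probability lists, not STAR's actual training objective; the constants `k`, `conf`, and `lam` are illustrative.

```python
import math

def ckd_loss(teacher_probs, student_probs, k=2, conf=0.3, lam=1.0):
    """Hypothetical sketch of a constrained distillation objective:
    forward KL over the teacher's top-k support, augmented with a
    penalty suppressing confidently incorrect student predictions
    (probability > conf outside the teacher's top-k)."""
    topk = sorted(range(len(teacher_probs)),
                  key=lambda i: teacher_probs[i], reverse=True)[:k]
    # Forward KL computed only over the teacher's top-k tokens.
    kl = sum(teacher_probs[i] * math.log(teacher_probs[i] / student_probs[i])
             for i in topk)
    # Suppress student mass placed confidently outside the top-k set.
    penalty = sum(p for i, p in enumerate(student_probs)
                  if i not in topk and p > conf)
    return kl + lam * penalty
```

Restricting the KL to the top-k support keeps the student from being forced to match the teacher's long tail (preserving exploration capacity), while the penalty term specifically targets the confidently incorrect predictions the description mentions.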

STAR-0b6, STAR-1b7, and STAR-4b achieve outstanding performance for models of their sizes on BFCLv4 (excluding the Web Search metric):

| Metric | STAR-4B (FC) | STAR-1.7B (FC) | STAR-0.6B (FC) |
|---|---|---|---|
| Overall Acc | 36.91% | 30.00% | 26.09% |
| Non-Live AST Acc | 89.42% | 84.94% | 79.48% |
| Non-Live Simple AST | 75.17% | 74.25% | 71.42% |
| Non-Live Multiple AST | 96.50% | 92.00% | 89.50% |
| Non-Live Parallel AST | 93.00% | 87.00% | 80.50% |
| Non-Live Parallel Multiple AST | 93.00% | 86.50% | 76.50% |
| Live Acc | 78.98% | 68.91% | 59.36% |
| Live Simple AST | 84.50% | 79.46% | 65.50% |
| Live Multiple AST | 77.78% | 66.67% | 58.02% |
| Live Parallel AST | 75.00% | 50.00% | 43.75% |
| Live Parallel Multiple AST | 75.00% | 66.67% | 62.50% |
| Multi Turn Acc | 25.88% | 10.25% | 6.75% |
| Multi Turn Base | 32.00% | 15.00% | 8.50% |
| Multi Turn Miss Func | 27.00% | 9.50% | 6.50% |
| Multi Turn Miss Param | 24.50% | 11.00% | 7.00% |
| Multi Turn Long Context | 20.00% | 5.50% | 5.00% |
| Web Search Acc | N/A | N/A | N/A |
| Web Search Base | N/A | N/A | N/A |
| Web Search No Snippet | N/A | N/A | N/A |
| Memory Acc | 18.92% | 17.20% | 9.03% |
| Memory KV | 1.94% | 5.81% | 1.29% |
| Memory Vector | 13.55% | 13.55% | 5.81% |
| Memory Recursive Summarization | 41.29% | 32.26% | 20.00% |
| Relevance Detection | 81.25% | 75.00% | 81.25% |
| Irrelevance Detection | 85.23% | 80.96% | 83.73% |
| Format Sensitivity Max Delta | N/A | N/A | N/A |
| Format Sensitivity Standard Deviation | N/A | N/A | N/A |

jn2707 changed the title from [BFCL] Add model "star-lab/STAR-0b6", "star-lab/STAR-1b7" and "star-lab/STAR-4b" to [BFCL] Add model star-lab/STAR-0b6, star-lab/STAR-1b7 and star-lab/STAR-4b on Oct 22, 2025.
jn2707 (Author) commented Oct 28, 2025

@HuanzhiMao Hi! Can you have a look when you get a chance? Appreciate it!
