Description
Name of failing test
pip install git+https://github.com/TIGER-AI-Lab/Mantis.git && pytest -v -s models/multimodal/processing
Basic information
- Flaky test
- Can reproduce locally
- Caused by external libraries (e.g. bug in transformers)
🧪 Describe the failing test
This test validates multimodal input processing correctness for vision-language models in vLLM, comparing cached vs non-cached processing paths.
Purpose:
Ensures that the MultiModalProcessorOnlyCache produces identical results to baseline (uncached) processing across different multimodal inputs (images, videos, audio).
Test Flow:
- Parameterized testing across multiple models, hit rates, and batch configurations
- Generates random multimodal data with controlled cache hit rates (0.3, 0.5, 1.0)
- Creates two processors: a baseline (no cache) and a cached version
- Compares outputs for both text and token prompts
- Validates equivalence of processed inputs using _assert_inputs_equal
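The flow above can be sketched with standard-library stand-ins. This is only an illustration of the cached-versus-baseline comparison, not vLLM's code: `baseline_process`, `CachedProcessor`, `make_items`, and `run_equivalence_check` are hypothetical names, and the "processing" is a simple hash rather than real multimodal preprocessing.

```python
import hashlib
import random


def baseline_process(item: bytes) -> dict:
    # Hypothetical stand-in for an uncached multimodal processor.
    return {"digest": hashlib.sha256(item).hexdigest(), "length": len(item)}


class CachedProcessor:
    """Hypothetical cached wrapper (not vLLM's MultiModalProcessorOnlyCache)."""

    def __init__(self) -> None:
        self._cache: dict[str, dict] = {}
        self.hits = 0

    def __call__(self, item: bytes) -> dict:
        key = hashlib.sha256(item).hexdigest()
        if key in self._cache:
            self.hits += 1
        else:
            self._cache[key] = baseline_process(item)
        return self._cache[key]


def make_items(rng: random.Random, num_batches: int, hit_rate: float) -> list[bytes]:
    # With probability hit_rate, reuse an earlier item so the cached path
    # sees a hit; otherwise generate fresh random data.
    seen: list[bytes] = []
    items: list[bytes] = []
    for _ in range(num_batches):
        if seen and rng.random() < hit_rate:
            item = rng.choice(seen)
        else:
            item = rng.randbytes(16)
            seen.append(item)
        items.append(item)
    return items


def run_equivalence_check(hit_rate: float, num_batches: int = 32, seed: int = 0) -> int:
    rng = random.Random(seed)
    cached = CachedProcessor()
    for item in make_items(rng, num_batches, hit_rate):
        # The cached output must match the baseline output for every input.
        assert baseline_process(item) == cached(item)
    return cached.hits


for rate in (0.3, 0.5, 1.0):
    print(rate, run_equivalence_check(rate))
```

In this sketch a hit rate of 1.0 still produces one miss, because the very first item has nothing earlier to reuse.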
Key Test Parameters:
- model_id: Various multimodal models (filtered to vLLM-only architectures)
- hit_rate: Cache hit probability (30%, 50%, 100%)
- num_batches: 32 batches per test run
- simplify_rate: Probability of converting multi-item inputs to single items (100%)
Special Handling:
- Model-specific patches for GLM4.1V and Qwen3-VL (video metadata requirements)
- Skipped models: google/gemma-3n-E2B-it, OpenGVLab/InternVL2-2B, jinaai/jina-reranker-m0 (marked "Fix later")
- Ignores specific keys for certain models (e.g., Ultravox audio_features due to padding differences)
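A minimal sketch of how a comparison might skip model-specific keys such as audio_features. The helper `assert_inputs_equal_ignoring` and its `ignore_keys` parameter are hypothetical and only illustrate the idea; the actual `_assert_inputs_equal` helper in the test suite may differ.

```python
def assert_inputs_equal_ignoring(baseline: dict, cached: dict, ignore_keys=()) -> None:
    # Hypothetical helper: drop the ignored keys from both sides, then
    # require the remaining entries to match exactly.
    ignored = set(ignore_keys)
    lhs = {k: v for k, v in baseline.items() if k not in ignored}
    rhs = {k: v for k, v in cached.items() if k not in ignored}
    mismatched = sorted(k for k in lhs.keys() | rhs.keys() if lhs.get(k) != rhs.get(k))
    assert lhs == rhs, f"mismatched keys: {mismatched}"


# Illustrative data only: the padded audio features differ, the tokens do not.
baseline_out = {"prompt_token_ids": [1, 2, 3], "audio_features": [0.1, 0.2]}
cached_out = {"prompt_token_ids": [1, 2, 3], "audio_features": [0.1, 0.2, 0.0]}

assert_inputs_equal_ignoring(baseline_out, cached_out, ignore_keys=("audio_features",))
```

Without the `ignore_keys` argument, the same two dictionaries would fail the comparison on audio_features.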
Assertions:
Verifies that the baseline and cached processors produce byte-for-byte identical outputs for the same inputs, ensuring that caching does not introduce processing errors.
📝 History of failing test
Test failure history:
AMD-CI build Buildkite references:
- 1041
- 1077
- 1088
- 1109
- 1111
Resolution in progress:
150 failed, 384 passed, 285 skipped, 18 warnings in 2672.68s (0:44:32)