[CI Failure]: mi325_1: Multi-Modal Processor Test #29446

@AndreasKaratzas

Name of failing test

pip install git+https://github.com/TIGER-AI-Lab/Mantis.git && pytest -v -s models/multimodal/processing

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

This test validates multimodal input processing correctness for vision-language models in vLLM, comparing cached vs non-cached processing paths.

Purpose:
Ensures that the MultiModalProcessorOnlyCache produces identical results to baseline (uncached) processing across different multimodal inputs (images, videos, audio).

Test Flow:

  1. Parameterized testing across multiple models, hit rates, and batch configurations
  2. Generates random multimodal data with controlled cache hit rates (0.3, 0.5, 1.0)
  3. Creates two processors: baseline (no cache) and cached versions
  4. Compares outputs for both text and token prompts
  5. Validates equivalence of processed inputs using _assert_inputs_equal
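The flow above can be illustrated with a minimal, self-contained sketch. It does not use vLLM: `toy_process`, `ToyCachedProcessor`, and `run_comparison` are hypothetical stand-ins, and the way `hit_rate` is applied here (reusing a previously seen item with that probability) is an assumption about how a controlled hit rate might be produced.

```python
import hashlib
import random


def toy_process(item: bytes) -> dict:
    """Hypothetical stand-in for an expensive multimodal processing step."""
    return {"digest": hashlib.sha256(item).hexdigest(), "length": len(item)}


class ToyCachedProcessor:
    """Hypothetical cached processor: stores outputs keyed by a content hash."""

    def __init__(self) -> None:
        self._cache = {}
        self.hits = 0

    def __call__(self, item: bytes) -> dict:
        key = hashlib.sha256(item).hexdigest()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        out = toy_process(item)
        self._cache[key] = out
        return out


def run_comparison(num_batches: int, hit_rate: float, seed: int = 0) -> int:
    """Process random items through both paths and assert equal outputs.

    Returns the number of cache hits observed.
    """
    rng = random.Random(seed)
    cached = ToyCachedProcessor()
    seen = []
    for _ in range(num_batches):
        if seen and rng.random() < hit_rate:
            # Reuse an earlier item so the cached path can hit.
            item = rng.choice(seen)
        else:
            # Generate a fresh 16-byte item, which should miss the cache.
            item = bytes(rng.getrandbits(8) for _ in range(16))
            seen.append(item)
        # The baseline path always recomputes; the cached path may reuse.
        assert toy_process(item) == cached(item)
    return cached.hits
```

With `hit_rate=1.0` every batch after the first reuses an earlier item, so the cached path is exercised on almost every call while the baseline still recomputes from scratch.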

Key Test Parameters:

  • model_id: Various multimodal models (filtered to vLLM-only architectures)
  • hit_rate: Cache hit probability (30%, 50%, 100%)
  • num_batches: 32 batches per test run
  • simplify_rate: Probability of converting multi-item inputs to single items (100%)
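As a rough sketch of how these parameters might combine, the snippet below enumerates a parameter grid and applies a simplification step. The model IDs are placeholders, and `maybe_simplify` is a hypothetical helper: it assumes "simplifying" means unwrapping a one-element list into a bare item, which may not match the real test's behavior exactly.

```python
import itertools
import random

MODEL_IDS = ["example/model-a", "example/model-b"]  # placeholders, not real models
HIT_RATES = [0.3, 0.5, 1.0]
NUM_BATCHES = [32]
SIMPLIFY_RATES = [1.0]


def parameter_grid() -> list:
    """Return every (model_id, hit_rate, num_batches, simplify_rate) combination."""
    return list(
        itertools.product(MODEL_IDS, HIT_RATES, NUM_BATCHES, SIMPLIFY_RATES)
    )


def maybe_simplify(mm_data: dict, simplify_rate: float, rng: random.Random) -> dict:
    """With probability simplify_rate, unwrap one-element lists (assumed behavior)."""
    if rng.random() >= simplify_rate:
        return mm_data
    return {
        key: (value[0] if isinstance(value, list) and len(value) == 1 else value)
        for key, value in mm_data.items()
    }
```

With a `simplify_rate` of 1.0, as listed above, the simplification branch is taken on every batch in this sketch.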

Special Handling:

  • Model-specific patches for GLM4.1V and Qwen3-VL (video metadata requirements)
  • Skipped models: google/gemma-3n-E2B-it, OpenGVLab/InternVL2-2B, jinaai/jina-reranker-m0 (marked "Fix later")
  • Ignores specific keys for certain models (e.g., Ultravox audio_features due to padding differences)
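The ignored-keys handling could look roughly like the sketch below. `toy_assert_inputs_equal` is a hypothetical helper written for illustration; it is not the `_assert_inputs_equal` function from the test suite, and the plain `==` comparison assumes values that support it (real tensors would need a dedicated equality check).

```python
def toy_assert_inputs_equal(a: dict, b: dict, ignore_keys: frozenset = frozenset()) -> None:
    """Assert two processed-input dicts match, skipping any ignored keys."""
    keys_a = set(a) - ignore_keys
    keys_b = set(b) - ignore_keys
    assert keys_a == keys_b, f"key sets differ: {sorted(keys_a ^ keys_b)}"
    for key in sorted(keys_a):
        assert a[key] == b[key], f"mismatch at {key!r}"


# Example: outputs that differ only in a padding-sensitive key.
baseline = {"input_ids": [1, 2, 3], "audio_features": [0.1, 0.2]}
cached = {"input_ids": [1, 2, 3], "audio_features": [0.1, 0.2, 0.0]}
toy_assert_inputs_equal(baseline, cached, ignore_keys=frozenset({"audio_features"}))
```

Ignoring a key hides every difference in it, not just padding, so such exemptions are best kept as narrow as possible.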

Assertions:
Verifies that the baseline and cached processors produce byte-for-byte identical outputs for the same inputs (apart from the explicitly ignored keys noted above), so that caching does not change the processed result.

📝 History of failing test

Test failure history:

AMD-CI build Buildkite references:

  • 1041
  • 1077
  • 1088
  • 1109
  • 1111

Resolution in progress:
150 failed, 384 passed, 285 skipped, 18 warnings in 2672.68s (0:44:32)

Metadata

Assignees

No one assigned

    Labels

    ci-failure: Issue about an unexpected test failure in CI

    Type

    No type

    Projects

    Status

    No status

    Status

    In progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
