Add support for image and audio transcription for Gemini, Anthropic, Mistral, and Ollama. #1828
Description
Add support for image and audio transcription for Gemini, Anthropic, Mistral, and Ollama.
Type of Change
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Implementation Questions for Review
I need guidance on the following architectural decisions:
Video Transcription: What is the intended approach for transcribing video input?
Standardizing Return Types for Auto Transcription: The implementation currently involves model-specific methods (e.g., a distinct method for OpenAI and different approaches for Mistral). How should we establish a consistent, unified return type (e.g., a common `TranscriptionResult` object) across all models to ensure a standardized user interface?
Future Scope
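One possible shape for the unified return type is a small dataclass plus thin per-provider adapters. This is only a sketch of the idea raised above: the `TranscriptionResult` name comes from the PR description, but the field names and the provider response shapes shown here are assumptions for illustration, not the actual APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscriptionResult:
    """Provider-agnostic transcription output (sketch)."""
    text: str                        # full transcript
    provider: str                    # e.g. "openai", "mistral"
    language: Optional[str] = None   # detected language, if the provider reports one
    raw: Optional[dict] = None       # untouched provider payload, kept for debugging


def from_openai_response(response: dict) -> TranscriptionResult:
    # Assumed OpenAI-style shape: transcript under a top-level "text" key.
    return TranscriptionResult(text=response["text"], provider="openai", raw=response)


def from_mistral_response(response: dict) -> TranscriptionResult:
    # Hypothetical Mistral-style shape: transcript nested in a chat-style message.
    content = response["choices"][0]["message"]["content"]
    return TranscriptionResult(text=content, provider="mistral", raw=response)
```

Callers would then only ever see `TranscriptionResult`, and each model-specific method becomes responsible for mapping its own response into it.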
Testing Notes
I have tested the core components of the new features using a separate validation script. However, running the full test suite with `python cognee/tests/test_library.py` fails with `LLMAPIKeyNotSetError: LLM API key is not set. (Status code: 422)`. I need help identifying the correct environment variable or configuration file location for setting the LLM API key so I can run the full test suite.
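For reference, a common convention in projects like this is to supply LLM credentials via environment variables (often loaded from a `.env` file at the project root). The variable names below are assumptions and should be checked against the project's configuration docs:

```shell
# Assumed variable names -- verify against the project's configuration docs.
export LLM_API_KEY="<your-api-key>"   # hypothetical key variable
export LLM_PROVIDER="openai"          # hypothetical provider selector

python cognee/tests/test_library.py
```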