bclavie commented Aug 12, 2024

Hey @Muennighoff!

Just the indexing code for now (will add the rest tomorrow), but opening the draft PR in case you wanted to take a look at this before the rest comes in!

Goal of the PR

Add support for ColBERT models, starting with Answer.AI's ColBERT-small, served through an API that Answer.AI will host (discussed with @okhat, who's also okay with this being the first ColBERT representative). The aim is to see how multi-vector models of various sizes fare on this benchmark. The querying mechanism within the API is very simple and lives at AnswerDotAI/mteb_arena_colbert_api.

Changes

  • The PR relies on an external API, which hosts and queries the index and simply returns documents. It doesn't change the logic of any existing mechanisms.
  • It adds the ColBERT indexing code for full reproducibility.
  • TODO: Add the querying mechanism, which uses API calls to fetch the highest-scoring document for a given query.
  • TODO: Add utilities to download the pre-built Wikipedia indexes so they can be queried locally.
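
For anyone less familiar with multi-vector retrieval, here is a rough sketch of how a ColBERT-style model picks the "highest-scoring document": each query token vector is matched to its most similar document token vector, and those maxima are summed (late interaction, often called MaxSim). This is only an illustration and not the code in this PR or in the hosted API; the function names (`normalize`, `maxsim_score`, `best_document`) and the toy 2-dimensional vectors are made up for the example, and a real index would use approximate search rather than scoring every document.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take the best-matching
    document token, then sum those maxima over all query tokens.
    Assumes doc_vecs is non-empty."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def best_document(query_vecs, docs):
    """Return the id of the document with the highest MaxSim score.
    `docs` maps a (hypothetical) document id to its list of token vectors."""
    return max(docs, key=lambda doc_id: maxsim_score(query_vecs, docs[doc_id]))


# Toy data: two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # has an exact match per query token
    "doc_b": [[1.0, 1.0]],                          # one token, partial match to both
}

top = best_document(query, docs)
```

With the toy data above, `doc_a` wins because each query token finds an exact match among its token vectors, whereas `doc_b` only offers a partial match to both.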

bclavie marked this pull request as draft August 12, 2024 19:41