[evals] pairwise evaluator #3738

@mikeldking

Description

Implement a pairwise evaluator that uses an LLM as a judge to compare two generations against each other. In the case of experiments, this would be assumed to perform the judgement against the expected>

https://docs.llamaindex.ai/en/stable/examples/evaluation/pairwise_eval/

Note that there should be a parameter for consensus, e.g. force the LLM to judge again with the two answers in flipped order and see whether it gives the same verdict.
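
A rough sketch of what the consensus parameter could look like, to make the flipped-order idea concrete. Everything here is an assumption for illustration: `pairwise_evaluate`, `PairwiseResult`, the prompt wording, and the `judge` callable (any function that takes a prompt string and returns the model's raw text) are hypothetical names, not an existing API in this repo or in LlamaIndex.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical judge interface: prompt in, raw LLM text out.
Judge = Callable[[str], str]

# Assumed prompt wording; a real template would need tuning.
PROMPT_TEMPLATE = (
    "You are comparing two answers to the same question.\n"
    "Question: {question}\n"
    "Answer A: {a}\n"
    "Answer B: {b}\n"
    "Reply with exactly one of: A, B, TIE."
)


@dataclass
class PairwiseResult:
    winner: Optional[str]          # "first", "second", "tie", or None if the passes disagree
    consistent: bool               # True when both passes agree (or consensus is off)
    raw_verdicts: Tuple[str, ...]  # parsed verdicts, in the order they were requested


def _parse(verdict: str) -> str:
    """Normalize the judge output; anything unrecognized is treated as a tie."""
    v = verdict.strip().upper()
    return v if v in {"A", "B", "TIE"} else "TIE"


def pairwise_evaluate(
    judge: Judge, question: str, first: str, second: str, consensus: bool = True
) -> PairwiseResult:
    # Pass 1: `first` is shown in slot A, `second` in slot B.
    forward = _parse(judge(PROMPT_TEMPLATE.format(question=question, a=first, b=second)))
    forward_label = {"A": "first", "B": "second", "TIE": "tie"}[forward]
    if not consensus:
        return PairwiseResult(forward_label, True, (forward,))

    # Pass 2: same pair with the slots flipped, so the slot-to-answer mapping inverts.
    flipped = _parse(judge(PROMPT_TEMPLATE.format(question=question, a=second, b=first)))
    flipped_label = {"A": "second", "B": "first", "TIE": "tie"}[flipped]

    if forward_label == flipped_label:
        return PairwiseResult(forward_label, True, (forward, flipped))
    # The verdict followed the position rather than the content: report no winner.
    return PairwiseResult(None, False, (forward, flipped))
```

With `consensus=True`, a judge that always prefers whichever answer is shown first would produce `consistent=False` and no winner, which is the position-bias case the flip is meant to catch. Whether a disagreement should be reported as a tie, as inconclusive, or retried is a design choice left open here.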

    Projects

    Status

    ✅ Done
