Collaborator

@mikeldking mikeldking commented Sep 25, 2025

This is the feature branch for the upcoming version 13.

@mikeldking mikeldking requested review from a team as code owners September 25, 2025 20:54
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix Sep 25, 2025
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Sep 25, 2025
@mikeldking mikeldking changed the base branch from main to feat/version-12 September 25, 2025 20:56
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Sep 25, 2025
@mikeldking mikeldking changed the title version 13 feat!: version 13 - dataset evaluators Sep 25, 2025

pkg-pr-new bot commented Sep 25, 2025

npm i https://pkg.pr.new/Arize-ai/phoenix/@arizeai/phoenix-client@9642
npm i https://pkg.pr.new/Arize-ai/phoenix/@arizeai/phoenix-mcp@9642

commit: 7a0c5f7

@mikeldking mikeldking added the feature branch a feature branch that consolidates multiple features into a single commit on main label Sep 25, 2025
@mikeldking mikeldking marked this pull request as draft September 25, 2025 21:22
Contributor

@RogerHYang RogerHYang left a comment

blocking feature branch

@github-project-automation github-project-automation bot moved this from 📘 Todo to 🔍. Needs Review in phoenix Sep 25, 2025
Base automatically changed from feat/version-12 to main September 29, 2025 18:13
An error occurred while trying to automatically change base from feat/version-12 to main September 29, 2025 18:13
@RogerHYang RogerHYang removed this from phoenix Oct 6, 2025
@RogerHYang RogerHYang force-pushed the version-13 branch 4 times, most recently from fc49ed1 to 2b19c56 Compare October 24, 2025 15:47
@RogerHYang RogerHYang force-pushed the version-13 branch 3 times, most recently from f4ab1f0 to 2ef94fd Compare October 29, 2025 15:31
@mikeldking mikeldking closed this Nov 4, 2025
@mikeldking mikeldking reopened this Nov 4, 2025
cephalization and others added 17 commits November 30, 2025 14:37
…s useful (#10187)

* feat(evaluators): provide a useful correctness pre-built evaluator

* feat(evaluators): provide a useful correctness pre-built evaluator

* simplify
* evaluator prompt validation

* cursor tests

* clean

* condense

* test

* clean

* clean

* test

* parse pydantic errors

* clean

* validate mutations

* fix tests

* validate choices

* test with form

* test

* type check

* clean
* include only dataset-specific evaluators in playground eval selector

* fix dataset page tab selection

* add aria label to dialog

* add annotation names to playground select

* handle long annotation names

* separate components for DatasetEvaluatorSelect and PlaygroundEvaluatorSelect

* remove extra opacity css var

* updates to Menu

* updates to evaluator menus

* fix menu item flicker

* wip: enable mapping evaluator from playground

* formatting
* add eval outputs to playground output cell

* add evaluation details popover & trace link

* include evals in output for non-streaming playground runs

* fix unnecessary truncation of eval name

* handle evaluations on error

* fix evaluation name

* rerun CI

* prevent losing example data when handling tool chunk

---------

Co-authored-by: Alexander Song <[email protected]>
* feat: Create distinct slideovers for evaluator use cases

* fix: manually update updated_at when creating llm_evaluator

* fix: global change to combobox, opens submenu on enter

---------

Co-authored-by: Rick Steele <[email protected]>
* Spike out builtin evaluator interfaces

* Get builtin evaluator if it exists

* Refine data model

* Simplify models

* Implement literal/path mapping logic

* Wire up builtin evaluators

* Persist single-run evaluations as SpanAnnotations

* Update gql schema and run relay compiler

* Fix evaluation over playground dataset run

* ruff

* Fix queries w.r.t BuiltInEvaluator

* Add built in evaluators to dataset evaluators query

* Add xfail to dataset evaluator test

* Ignore missing type stubs

* fix evaluators over single chat

* fix ts ci

---------

Co-authored-by: Tony Powell <[email protected]>
Co-authored-by: Alexander Song <[email protected]>
* wip: enable unassigning a dataset evaluator

* update cached evaluator data upon assignment/unassignment

* add confirmation dialog

* wire up evaluator unlink with optional delete

* remove row selectability

* add comment

* use alert banner instead of toast for errors

* explicitly close dialog on successful delete/unlink
* fix evaluator config dialog header overflow

* fix dataset select overflow

* styling

* dataset select styling
* feat: Add builtin evaluator support to crosswalk table

* Fix migration and update gql schema

* Fix relationship definition

* feat: Add prebuilt evaluators to template submenu

* Tweak language

* feat: Support input mapping code evaluators

* Improve dataset messaging in evaluator form

* update default evaluator template

* Add DatasetExampleSelect component

Also makes combobox and dataset select more responsive

* Allow users to edit evaluator input preview

* Fix db constraints for input mapping

* Wire up input mapping end to end

* Fix ruff

* use fastapi instead of starlette import

* Remove xfail and clean up input_config handling

* Ruff

* Verify evaluator id existence for type checker

* Build gql schema and run relay compiler

* Pull output from input-mapped inputs

* Ensure input config is stored as JSON

* Add minWidth prop to Select

* Fix evaluator config dialog header truncation

* Use both unique constraint and partial index

* Add builtin evaluators to dataloader

* Call lower() after str conversion

* Rename evaluator for simplicity

* Remove explicit constraint name

* Update variable name

* Address PR feedback

* Change column name from input_config to input_mapping

* Update tests and other input_config references

* Make mypy happy

---------

Co-authored-by: Dustin Ngo <[email protected]>
RogerHYang and others added 12 commits December 1, 2025 10:13
* remove global evaluator create flow

* remove type dependency
* refactor table cell helpers to be shared between experiments table & playground

* remove unused code

* refactor annotations list to shared component

* add data test id
* add wrapper for GridList

* update playground eval select to use GridList

* add edit evaluator button to menu

* add menu section header

* fix selection & hover/focus states

* update stories

* remove unused code

* update GridList section styles

* pairing with rick

* restore dataset id arg

* remove comment

* address cursor PR comments

* update dataset eval menu to use MenuItem

* address feedback
* feat: prompt template apply query

* feat: apply prompt template

* refactor for re-use

* add more validation

* fix tests

* remove tool call parts

* remove tool result application

* Update src/phoenix/server/api/queries.py

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

* cleanup

---------

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
* add preview query

* cleanup

* cleanup

* cleanup
Labels

feature branch: a feature branch that consolidates multiple features into a single commit on main
size:XS: This PR changes 0-9 lines, ignoring generated files.

7 participants