Skip to content

Conversation

@moonbox3
Copy link
Contributor

Motivation and Context

Putting some thoughts and a POC together around workflow compatibility - being able to connect together two workflows, edges + high-level orchestration APIs, etc.

Description

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@moonbox3 moonbox3 self-assigned this Nov 21, 2025
@markwallace-microsoft markwallace-microsoft added documentation Improvements or additions to documentation python labels Nov 21, 2025
@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Nov 21, 2025

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/core/agent_framework/_workflows
   _concurrent.py1432880%50, 59–60, 68–69, 88–89, 94, 99, 124, 129, 134–135, 141, 163, 173, 180, 253–257, 259, 287, 291, 301–302, 332
   _group_chat.py4146484%169–170, 301–311, 360, 374, 381, 385–387, 453, 462, 608, 619, 623, 692, 709, 714–716, 723, 889, 894, 1011, 1078–1079, 1081–1082, 1084, 1086, 1088–1089, 1157–1158, 1231, 1326, 1352, 1354, 1358–1361, 1363, 1366–1367, 1382, 1415–1418, 1428, 1439, 1448–1450
   _handoff.py48213871%55, 68–70, 77–78, 80, 82, 156, 164–169, 172–173, 187, 196, 215–218, 227–229, 240, 243, 253–264, 266, 272, 278, 308, 353, 364–366, 422, 431–436, 438–439, 450, 467, 485, 509, 537, 549–551, 557, 581–583, 586–589, 591–593, 827, 833, 837, 843, 847, 859, 907, 910, 998, 1003, 1013, 1019–1022, 1030–1031, 1035–1037, 1039–1049, 1051–1052, 1054, 1056, 1071–1072, 1075–1076, 1079, 1099–1105, 1107, 1113, 1147–1148, 1201–1202, 1312–1313, 1394, 1402, 1469, 1480, 1484, 1489–1490
   _magentic.py90325871%49, 54, 76–85, 90, 94–105, 329, 334, 351, 353, 368, 376–385, 463, 467, 481, 487, 502, 582, 595, 612, 621–622, 624–626, 628, 639, 781–784, 787–791, 793–795, 802, 841, 888, 924–926, 928, 936–939, 943–946, 1010, 1068–1069, 1086, 1088–1089, 1097, 1133, 1142–1144, 1164, 1217, 1237, 1240, 1269, 1272, 1280–1284, 1290, 1318–1320, 1322, 1324, 1332–1335, 1337, 1341–1342, 1345–1348, 1350–1351, 1357–1359, 1362–1363, 1368–1369, 1377, 1385, 1400, 1412, 1424–1427, 1456–1457, 1462–1464, 1495, 1521, 1536, 1552, 1569, 1636, 1643, 1646, 1648–1649, 1652–1653, 1657, 1660, 1681, 1716–1717, 1719, 1723–1725, 1740, 1751, 1761, 1805, 1810–1811, 2137–2138, 2142, 2157, 2166–2167, 2172, 2175, 2229, 2240, 2251–2253, 2266–2267, 2272, 2283–2285, 2296–2298, 2310–2317, 2319–2320, 2328, 2336–2337, 2339–2341, 2343–2346, 2350–2358, 2362–2363, 2366–2370, 2372–2373, 2375–2377, 2379–2382, 2384–2386, 2388–2389, 2391–2393, 2408–2411, 2422–2425, 2437–2440, 2444
   _sequential.py84989%71, 80–81, 91, 143, 149, 162–163, 182
   _workflow.py2571893%98, 271–273, 275–276, 294, 318, 320, 413, 662, 696, 701, 704, 723–725, 790
   _workflow_builder.py3383290%96, 138, 182, 185–189, 201–202, 224–225, 242, 324, 407, 409, 416–418, 421–422, 431, 434, 441, 446, 520, 853–857, 859
TOTAL15697238784% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
2224 130 💤 0 ❌ 0 🔥 55.339s ⏱️

Copy link
Member

@eavanvalkenburg eavanvalkenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some very interesting things in here, I agree that option 2 makes the most sense, but I think the port idea might make some other scenario's easier and should be considered.

analysis = (
ConcurrentBuilder()
.participants([operation, compliance])
.as_connection()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this even needed? couldn't the next step just accept a unbuild workflowbuilder of any type? and then call as_connection itself, the function doesn't have params, so should be easy. Alternatively the prefix could be set in as_connection and then you can just connect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even have prefix default to the name of the workflow, making it a bit easier to keep track of it.

)

builder = WorkflowBuilder()
analysis_handle = builder.connect(analysis, prefix="analysis") # merge + handle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so while I understand the connect verbiage, why not keep it a bit simpler and just add_workflow, which allows either a WorkflowConnection or a WorkflowBuilder (and that probably means also making sure that the workflowbuilders are all based on a single baseclass with the as_connection logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now got a shared mixin-style surface (as_connection, add_workflow as the single verb) rather than forcing all high-level builders into a single inheritance base because:

  • High-level builders already have distinct ctor shapes and internal state; pushing them under one base would introduce a breaking constructor contract or a leaky abstract layer we’d need to retrofit across Concurrent/Sequential/GroupChat/Magentic/Handoff.
  • The behavior we need is tiny (expose .as_connection(prefix=...) and route through WorkflowBuilder), which we can provide uniformly via a mixin or thin delegations without reshaping the class hierarchy.
  • add_workflow accepts WorkflowBuilder, Workflow, or WorkflowConnection; so callers get the simple API, while we keep existing builder implementations intact.
  • This keeps public surface additive and avoids a sweeping inheritance refactor now; if we later find more shared behavior, we can introduce a common base with backward-compatible constructors.

- Composition helpers use contracts to fail fast or select the right canned adapter.
- Pros: predictable type-safe bridges and better error messages. Cons: adds small surface area but aligns with existing adapter executors already used inside SequentialBuilder.

## Option 4: Port-Based Interfaces and Extension Points
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a interesting feature as well, making it very explicit which input and outputs there are, also allowing hosted workflows to be instantiated with different endpoints for the different input ports.

- Keep checkpointing, request/response handling, and observability semantics intact across composed graphs.

## Current State
- High-level builders (ConcurrentBuilder, SequentialBuilder, group chat variants) emit a finished Workflow; the graph is immutable and cannot be extended directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make sure that the responsibilities of our abstractions are clearly defined:

  • A Workflow instance: built from a workflow builder and immutable
  • A WorkflowBuilder instance: created by the users and mutable

This makes the composition story more streamlined and with less ambiguity. This also potentially aligns better with the declarative story: the declarative language is essentially a builder language.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a terminology section.

- Type compatibility is enforced by WorkflowBuilder during validation, but only within a single builder instance; cross-workflow composition relies on developers hand-wiring compatible adapters.

## Requirements
- Compose multiple workflows (built from any builder) as first-class nodes inside a parent graph.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we not achieved this via subworkflows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you mean the workflow must not be wrapped in an executor?


## Option 1: WorkflowExecutor-Centric Composition Helpers
- Add a fluent creation path for WorkflowExecutor to remove manual boilerplate:
- `Workflow.as_executor(id: str | None = None, *, allow_direct_output: bool = False, adapter: Callable[[Any], Any] | None = None) -> WorkflowExecutor`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just syntactic sugar for what we already have:

workflow_executor = WorkflowExecutor(workflow, id="abc", ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - this is syntactic sugar. The doc is exploring different approaches; we'll select what makes sense based on the full discussion.

builder = WorkflowBuilder()
normalize_handle = builder.add_connection(normalize_connection, prefix="prep")
summarize_handle = builder.add_connection(summarize_connection, prefix="summary")
builder.connect(normalize_handle.output_points[0], summarize_handle.start_id)
Copy link
Contributor

@TaoChenOSU TaoChenOSU Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an implicitly assumption we are making here, which is that users must know the internals of the workflow builders to compose them. For example, they need to know the executors and the connections between them the builder is going to create in order to make the correct connections between the output points and starting points.

This makes the whole connection object another layer of complexity and it's probably unnecessary.

Say we have two builders and we want to merge them. And we know what's inside those two builders. We can simply do the following:

workflow_builder_a = ...
workflow_builder_b = ...
new_builder = (
    workflow_builder_a
        .merge(workflow_builder_b)
        .add_edge(workflow_builder_a.get_executor("..."), workflow_builder_b.get_executor("..."))
)

All the checkpoint and start executor configuration are preserved after merging but overrides are allowed because the workflow builder comes with all the APIs.


## Option 3: Typed Adapters and Contracts
- Provide first-class adapter executors to bridge mismatched but intentionally compatible types instead of ad-hoc callbacks:
- `builder.add_adapter(source, target, adapter_fn)` sugar that injects a small Executor running adapter_fn; validated via is_type_compatible on adapter outputs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can achieve this by adding a new executor that transforms the data in the merged workflow builder.

Comment on lines +144 to +146
- Executors expose `ports: dict[str, PortSpec]` where PortSpec includes direction (in/out), type set, and optional semantics tag (`conversation`, `aggregate`, `request`, `control`).
- Builders produce a `WorkflowPorts` manifest identifying exposed ports (entry, exit, extension points) instead of only start/terminal nodes.
- New APIs: `builder.connect(source=(node_id, "out:conversation"), target=(node_id, "in:conversation"))` with validation on port types/semantics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to introduce this level of complexity to the lower-level APIs?

The PortSpec feels a bit opinionated. And how does an executor get the port specs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. This option is intentionally more complex to explore the opposite end of the design spectrum. It helps us understand the trade-offs between implicit (current state) and explicit (port-based) approaches.

self._checkpoint_storage = checkpoint_storage
return self

def as_connection(self) -> WorkflowConnection:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have all the high-level workflow builders inherit from WorkflowBuilder instead?


return WorkflowAgent(workflow=self, name=name)

def as_connection(self, prefix: str | None = None) -> "WorkflowConnection":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes back to my comments in the ADR.

This does feel like just a copy of the workflow builder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to let an immutable Workflow be reused in composition without mutating it. At runtime the only safe way to inline a built workflow is to recreate a WorkflowBuilder copy of its topology, because all wiring APIs live on the builder. So Workflow.as_connection deep-clones the existing graph into a builder and returns a WorkflowConnection handle: there’s no new surface beyond that copy step. It looks like “copying the builder” because that’s exactly what we need to re-enter the composition APIs while keeping the original workflow immutable; the alternative would be to expose mutation on Workflow, which we're avoiding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason that we cannot reuse the builder but need to use the workflow instance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants