fix(client/stdio): allow FIFO cleanup of multiple transports on asyncio (#577) by sakenuGOD · Pull Request #2484 · modelcontextprotocol/python-sdk

sakenuGOD · 2026-04-20T17:07:25Z

Fixes #577.

The bug

stdio_client wraps its reader / writer in anyio.create_task_group() inside the async context manager body (src/mcp/client/stdio.py L180 on main). A task group binds its cancel scope to whichever task entered the generator, and anyio enforces a strict LIFO stack on those scopes within a single task.

If a caller opens two transports from the same task and then closes them in the order they were opened — a pattern that shows up in multi-client orchestrators, DI containers and pytest fixtures — teardown crashes with:

RuntimeError: Attempted to exit a cancel scope that isn't the current task's current cancel scope

(The traceback in #577 matches this exactly. I reproduced it locally against a freshly cloned main with a minimal repro that spawns two python -c ... subprocesses and closes them FIFO.)

This has been biting users for a year — hits asyncio callers using AsyncExitStack, pytest fixtures, custom multi-client managers (see the comment thread on #577 for the list). The recommended workaround until now was "always clean up LIFO", which pushes anyio's cancel-scope discipline onto every consumer of the SDK and breaks naturally with many real code-organisation patterns.

The fix

On asyncio we don't need a task group to spawn the reader / writer — asyncio.create_task is enough. The tasks then live outside any caller-bound cancel scope, so each transport is self-contained and the cleanup order of two transports no longer matters.

Backend is detected via sniffio.current_async_library() (already a transitive dependency of anyio).
The documented shutdown sequence (close stdin → wait for process with PROCESS_TERMINATION_TIMEOUT → escalate to _terminate_process_tree → close the memory streams) is lifted into a shared _cleanup_process_and_streams helper so the two branches can't drift.
Reader / writer cancellation happens after the process-level cleanup so late I/O errors surfaced during teardown are expected and swallowed, matching the pre-existing anyio.ClosedResourceError handling in those functions.
On structured-concurrency backends (trio) the anyio task group is kept — trio forbids orphan tasks by design, and cross-task cleanup is fundamentally incompatible with that model. Trio callers still have to clean up LIFO. This is acknowledged in the inline comment on the backend branch.

Tests

Three new regression tests in tests/client/test_stdio.py, all asyncio-only (parametrize(\"anyio_backend\", [\"asyncio\"])):

Test	What it pins
`test_stdio_client_supports_fifo_cleanup_on_asyncio`	The exact scenario from #577 — two transports opened, closed FIFO. Verified to fail on the unmodified `main` branch with the original `RuntimeError`, so it is a real regression guard, not a tautology.
`test_stdio_client_supports_lifo_cleanup_on_asyncio`	Sanity: historical LIFO path must still work unchanged after the fix.
`test_stdio_client_shared_exit_stack_fifo_on_asyncio`	A single `AsyncExitStack` around two transports (ExitStack already runs callbacks LIFO, so this case worked before — but worth pinning so a future refactor of the asyncio branch can't silently break it).

pytest tests/client/test_stdio.py --timeout=30 → 13 passed, 1 skipped (the Unix-only SIGINT test is the pre-existing skip; 10 other pre-existing tests + 3 new, none regressed).

Negative-path validation: stashed only src/mcp/client/stdio.py, reran the FIFO test — it fails with the original RuntimeError: Attempted to exit a cancel scope that isn't the current task's current cancel scope. Restored the fix, all three pass.

Scope

Kept strictly to stdio_client. sse.py, streamable_http.py, websocket.py use a similar task-group-in-generator pattern and may need the same treatment — they can be handled in follow-ups if this approach is acceptable.

independently on asyncio (modelcontextprotocol#577) `stdio_client` wrapped its reader / writer in `anyio.create_task_group()` inside the async context manager body. The task group binds its cancel scope to whichever task entered the generator, and anyio enforces a strict LIFO stack on those scopes within a single task. If a caller opened two transports from the same task and then closed them in the order they were opened (FIFO) — which is what happens in multi-client orchestrators, dependency-injection containers and pytest fixtures — teardown crashed with: RuntimeError: Attempted to exit a cancel scope that isn't the current task's current cancel scope On the asyncio backend we can spawn the reader / writer with `asyncio.create_task` instead. The background tasks then live outside any caller-bound cancel scope, so each transport becomes self-contained and the cleanup order of two transports no longer matters. Teardown keeps the documented shutdown sequence (close stdin, wait for the process with `PROCESS_TERMINATION_TIMEOUT`, escalate to `_terminate_process_tree`, then close the memory streams) in a shared `_cleanup_process_and_streams` helper so the two branches can't drift. On structured-concurrency backends (trio) an anyio task group is still required — trio intentionally forbids orphan tasks, and cross-task cleanup is fundamentally incompatible with its model. Those callers still have to clean up LIFO. The backend dispatch goes through `sniffio.current_async_library()`. Regression tests (all asyncio-only): - `test_stdio_client_supports_fifo_cleanup_on_asyncio` — the exact scenario from modelcontextprotocol#577. Verified that it FAILS on the pre-fix code with the original RuntimeError, and passes on the fixed code. - `test_stdio_client_supports_lifo_cleanup_on_asyncio` — historical LIFO path must still work unchanged. - `test_stdio_client_shared_exit_stack_fifo_on_asyncio` — a single AsyncExitStack around two transports (already LIFO under the hood, but worth pinning so future refactors cannot silently break it). All 13 stdio tests pass locally (10 pre-existing + 3 new, 1 Unix-only sigint test skipped on Windows). Fixes modelcontextprotocol#577

… background reader/writer and close streams on crash Self-review of the first commit caught a regression: the asyncio branch spawned stdout_reader / stdin_writer via asyncio.create_task and did `try: await task except BaseException: pass` during cleanup, which silently swallowed any exception the reader or writer raised. An anyio task group would have re-raised that through the `async with` block — the asyncio path has to reproduce that contract to keep the fix observably equivalent. Two changes: 1. `add_done_callback(_on_background_task_done)` — if a background task crashes while the caller is still inside the `yield`, close the memory streams immediately so any in-flight user read wakes up with `ClosedResourceError` instead of hanging forever. Mirrors the scope-cancellation side effect of a task group. The callback logs the exception at debug so operators can trace crashes that we only surface on the way out. 2. On `__aexit__` we now collect task results and re-raise the first non-cancellation, non-closed-resource exception. CancelledError and anyio.ClosedResourceError are expected during the teardown we just forced in `_cleanup_process_and_streams`, so they are swallowed; anything else (a real crash) propagates. New regression test: `test_stdio_client_reader_crash_propagates_on_asyncio` — injects a TextReceiveStream that raises on the first `__anext__`, asserts the exception is visible at the async-with exit. Confirmed it FAILS on the first-commit version ("Failed: DID NOT RAISE"), so it is real regression coverage, not a tautology. All 14 stdio tests pass (10 pre-existing + 4 new, 1 Unix-only skip). Full `tests/client/` pass (203 passed, 1 pre-existing xfail).

sakenuGOD · 2026-04-20T17:13:54Z

Self-review follow-up in 1d56c4a.

While double-checking the fix I caught a real regression in the first commit: the asyncio branch was spawning stdout_reader / stdin_writer via asyncio.create_task and doing try: await task except BaseException: pass during cleanup, which silently swallowed any exception the reader or writer raised. An anyio task group would have surfaced that through the async with block, so the fix was observably weaker than the code it replaced.

Fix (stacked on top of the original commit):

add_done_callback(_on_background_task_done) — if a background task crashes while the caller is still inside the yield, close the memory streams immediately so any in-flight user read wakes up with ClosedResourceError instead of hanging forever. Mirrors the scope-cancellation side effect of a task group. Logs the exception at debug on the way through.
On __aexit__ we now collect task results and re-raise the first non-cancellation, non-closed-resource exception. CancelledError and anyio.ClosedResourceError are expected during the teardown we just forced in _cleanup_process_and_streams, so they're swallowed; anything else propagates.

New regression test: test_stdio_client_reader_crash_propagates_on_asyncio injects a TextReceiveStream that raises on the first __anext__ and asserts the exception surfaces at the async-with exit. Confirmed it fails on the first-commit version with Failed: DID NOT RAISE, so it's real regression coverage, not a tautology.

Results: 14 stdio tests pass (10 pre-existing + 4 new, 1 Unix-only SIGINT skip). Full tests/client/ suite: 203 passed, 1 pre-existing xfail, no new failures.

…nd pyright CI on modelcontextprotocol#2484 surfaced three categories of failures on every Python version: 1. ruff D212 — multi-line docstring summaries were split from the opening ``"""``. Collapsed the first line of each docstring touched by this PR to sit on the opening-quote line. 2. ruff C901 — the asyncio-vs-task-group dispatch pushed ``stdio_client``'s cyclomatic complexity from below the project's max of 24 up to 29. Extracted the two branches into dedicated async context managers: _asyncio_background_tasks(...) — the asyncio path with add_done_callback to close streams on crash, and the ``__aexit__`` that re-raises non-cancellation exceptions. _anyio_task_group_background(...) — the trio / other-backend path that keeps the original task group. ``stdio_client`` now just picks one based on ``sniffio.current_async_library()`` and uses it as an ``async with`` alongside ``process``. The backend-dispatch behaviour is identical to the first commit — only the shape of the code changed. 3. pyright reportArgumentType — the callable types for the helpers needed to be ``Callable[[], Coroutine[Any, Any, None]]`` because ``asyncio.create_task`` requires a Coroutine, not an Awaitable. Imported ``Coroutine`` and ``Any`` accordingly. No behavioural change: all four regression tests still pass, the pre-existing 10 stdio tests are unchanged, and the full ``tests/client/`` suite goes through clean (203 passed, 1 pre-existing xfail, 1 Unix-only SIGINT skip).

…overage Second CI round caught two more things the local run did not: 1. pyright on the CI runs in a stricter mode than the default, and flagged every new test: - `anyio_backend` parametrize fixture needs an explicit `str` type - `monkeypatch` needs `pytest.MonkeyPatch` - `*args, **kwargs` on the stub stream class need `Any` - `__aiter__` / `__anext__` need return types All annotated; pyright now reports 0 errors on both files. 2. The project's coverage floor is `fail_under = 100`. Extracting the trio branch into its own helper function left lines 184-187 (the body of `_anyio_task_group_background`) and the `else:` on line 304 uncovered, because every existing test runs on asyncio. Added `test_stdio_client_supports_lifo_cleanup_on_trio` — parametrised with `anyio_backend=["trio"]`, it exercises the trio branch end-to-end with the LIFO cleanup pattern that is valid on trio. Removed the `# pragma: lax no cover` markers on the helper and the else branch since coverage is now genuine, not silenced. 15/16 stdio tests pass locally (15 real pass + 1 Unix-only SIGINT skip). pyright: 0 errors. ruff: clean.

… branches Second CI round got coverage up to 99.97% but the last uncovered lines are defensive cleanup paths that normal teardown does not exercise and that do not warrant a dedicated test each: - `_on_done` callback's "task cancelled" early-return (normal teardown exits the reader/writer cleanly through stream-close, not via explicit cancellation). - `task.cancel()` guard when the task is somehow still pending by the time we reach the finally block (same reason — tasks should be done already). - CancelledError and ClosedResourceError catches in the result collection loop (symmetric with the case above; those arise only if the task was still active when we began teardown). Each line is marked with a terse ``# pragma: no cover`` pointing at the reason, matching the pattern already used elsewhere in this file (e.g. ``except Exception: # pragma: no cover`` on line 143 inside ``_on_done``).

…says are reached Third CI round: coverage reached 100.00% on the previous commit, but the project also runs ``strict-no-cover`` which complains when a line is marked ``# pragma: no cover`` yet actually gets hit by the test suite (it prevents coverage rot from creeping in under pragma cover). It flagged two of my pragmas: src/mcp/client/stdio.py:161 — `except asyncio.CancelledError:` src/mcp/client/stdio.py:163 — `except anyio.ClosedResourceError:` Both are reached on Linux during the normal teardown of the new stdio tests, so the pragma was wrong. Removed them. (The Windows runs did not hit these branches — which explains why Windows matrix passed last round but Linux did not; strict-no-cover runs on the Linux jobs.) Kept the pragmas on lines 131, 155, 166 — those are the defensive branches that still are not reached by any existing test on either platform and still warrant the marker. 15 stdio tests pass locally; ruff + pyright clean.

…dependent exception catches The two catches I touched in the previous commit (``except asyncio.CancelledError`` and ``except anyio.ClosedResourceError`` in the teardown result-collection loop) turn out to be timing-dependent: - On Windows the teardown of the background reader/writer finishes before the loop reaches ``await task``, so ``await task`` never raises CancelledError / ClosedResourceError and the branches are unreached — needs pragma. - On some Linux runs the reader/writer is still running when we cancel and await, so the branches *are* reached — strict-no-cover complained that a plain ``pragma: no cover`` was wrong there. Switched both to ``pragma: lax no cover``, which the project's coverage config already defines as the opt-out marker for exactly this case (exists on other lines in the same file, e.g. 247). It is excluded from coverage reporting but does not trip strict-no-cover when the line happens to be hit.

sakenuGOD added 2 commits April 20, 2026 20:06

sakenuGOD added 5 commits April 20, 2026 20:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(client/stdio): allow FIFO cleanup of multiple transports on asyncio (#577)#2484

fix(client/stdio): allow FIFO cleanup of multiple transports on asyncio (#577)#2484
sakenuGOD wants to merge 7 commits intomodelcontextprotocol:mainfrom
sakenuGOD:fix/577-stdio-fifo-cleanup

sakenuGOD commented Apr 20, 2026

Uh oh!

sakenuGOD commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sakenuGOD commented Apr 20, 2026

The bug

The fix

Tests

Scope

Uh oh!

sakenuGOD commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant