
Add the todos-server reference example #3048

Open
maxisbey wants to merge 1 commit into main from todos-server-example


Conversation

@maxisbey maxisbey commented Jul 1, 2026

Contributor

Adds examples/servers/todos-server/, a Python port of the TypeScript SDK's reference server (examples/todos-server). It is a small project todo board, built on the high-level MCPServer, in which every server-side MCP feature has a real job.

Motivation and Context

The TypeScript SDK ships a reference host/server pair (cli-client + todos-server) that exercises tools, resources, prompts, sampling, elicitation, multi-round input_required flows, progress, logging, and subscriptions in one small, readable app, serving both protocol revisions (2026-07-28 and 2025-11-25) over stdio and Streamable HTTP. This PR brings the server half to the Python SDK so the two SDKs have a directly comparable reference workload — the TS cli-client connects to this server out of the box over HTTP.

Notable porting decisions (details in the example README's "Fidelity" section):

  • The three interactive tools are written once as state machines over InputRequiredResult rounds. On 2026-07-28 connections the rounds ride the wire; on pre-2026 connections a ~30-line run_interactive driver in the example fulfils the same rounds as push-style elicitation/sampling requests (the job the TS SDK's built-in legacy shim does), so no handler branches on the era.
  • resources/subscribe/unsubscribe, logging/setLevel, and a dynamic resources/list (one entry per task, like the TS ResourceTemplate list callback) are registered on the low-level server via mcp._lowlevel_server.add_request_handler — the same pattern (and TODO(felix) gap) as the everything-server.
  • Logging honours logging/setLevel on 2025 sessions and the per-request io.modelcontextprotocol/logLevel _meta opt-in on 2026-07-28 sessions, matching the TS server's semantics.
  • request_state for the multi-round brainstorm flow is plaintext JSON in the handler; the SDK's default sealing protects it on the wire. REQUEST_STATE_SECRET maps to RequestStateSecurity(keys=[...]) like the TS example's HMAC env key.
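The "write once as a state machine, fulfil rounds with a legacy driver" idea can be illustrated with a minimal, self-contained sketch. It assumes simplified stand-in types: `FinalResult`, `InputRequired`, `confirm_flow`, and `run_legacy` are hypothetical names, not the SDK's CallToolResult/InputRequiredResult or the example's actual run_interactive. The flow returns either a final result or a round of input requests plus opaque state. On a 2026-07-28 connection those rounds would travel back to the client; on a pre-2026 connection the driver answers each request with a push-style call and re-invokes the flow.

```python
from __future__ import annotations

import asyncio
import json
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Union


@dataclass
class FinalResult:
    """Hypothetical stand-in for a finished tool result (CallToolResult in the SDK)."""

    text: str


@dataclass
class InputRequired:
    """Hypothetical stand-in for one input_required round (InputRequiredResult in the SDK)."""

    requests: dict[str, str]  # request key -> prompt to show the user
    state: str  # opaque request_state echoed back on the next round


Responses = dict[str, str]
Flow = Callable[[Optional[Responses], Optional[str]], Awaitable[Union[FinalResult, InputRequired]]]


async def confirm_flow(responses: Responses | None, state: str | None) -> FinalResult | InputRequired:
    """Toy two-round flow: ask for confirmation, then act on the answer."""
    if responses is None:
        # Round 1: nothing answered yet, so ask and remember what was asked about.
        return InputRequired(
            requests={"confirm": "Delete 2 completed tasks?"},
            state=json.dumps({"ids": ["t1", "t2"]}),
        )
    # Round 2: the state carries the set the user actually saw.
    ids = json.loads(state or "{}").get("ids", [])
    if responses.get("confirm") == "yes":
        return FinalResult(f"Deleted {len(ids)} tasks")
    return FinalResult("Nothing deleted")


async def run_legacy(flow: Flow, ask: Callable[[str], Awaitable[str]], max_rounds: int = 10) -> FinalResult:
    """Drive a flow to completion by fulfilling each round with push-style requests."""
    responses: Responses | None = None
    state: str | None = None
    for _ in range(max_rounds):
        result = await flow(responses, state)
        if isinstance(result, FinalResult):
            return result
        # Answer every request in this round, then re-enter the flow with the answers.
        responses = {key: await ask(prompt) for key, prompt in result.requests.items()}
        state = result.state
    return FinalResult("Too many input rounds")


if __name__ == "__main__":

    async def always_yes(prompt: str) -> str:
        print(prompt)
        return "yes"

    print(asyncio.run(run_legacy(confirm_flow, always_yes)).text)
```

Because the handler only ever returns rounds, the era-specific part is confined to the driver, which is the property the bullet above describes.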

How Has This Been Tested?

Drove this server and the TS reference server through an identical scripted scenario (~38 steps: all eight tools including the three-round brainstorm flow and its decline/cancel branches, resources, templates, prompts, completions, progress, logging thresholds, subscriptions) with the same client and scripted elicitation/sampling callbacks:

  • stdio × both eras: every tool result text, structured output, elicitation form schema, sampling request, progress sequence, and log line matches the TS server. After normalizing pydantic-vs-zod JSON Schema cosmetics, the only remaining differences are SDK-level (Python includes the prompt description in prompts/get results and advertises experimental: {} on legacy initialize).
  • Streamable HTTP × both eras: modern-era results identical, including subscriptions/listen streams (ack + per-mutation board updates on both servers). On the legacy HTTP leg the Python server's default stateful sessions serve push-style elicitation/sampling, whereas the TS server's stateless posture refuses them (its documented caveat).
  • Adversarial probes (schema-violating arguments, out-of-range elicited counts, whitespace-only sampling replies, path-traversal template ids) produce clean error results matching the TS semantics.
  • uv run --frozen ruff format --check, ruff check, pyright (strict, root config), and the full pre-commit suite pass.
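The path-traversal probe can be illustrated with a sketch of a template lookup that accepts only ids already on the board. `TASK_URI_PREFIX`, `resolve_task_uri`, and the id-to-title `tasks` dict are assumptions for illustration; the example's actual resource handler is not shown in this PR page.

```python
from __future__ import annotations

TASK_URI_PREFIX = "todos://tasks/"  # hypothetical; mirrors the todos://tasks/{id} template


def resolve_task_uri(uri: str, tasks: dict[str, str]) -> str | None:
    """Return the task title for a todos://tasks/{id} URI, or None if it is not a known task."""
    if not uri.startswith(TASK_URI_PREFIX):
        return None
    task_id = uri[len(TASK_URI_PREFIX):]
    # Exact dictionary lookup: ids such as "../secrets" or "t1/../t2" simply miss,
    # and no filesystem path is ever built from untrusted input.
    return tasks.get(task_id)
```

A caller would turn `None` into a clean "unknown task" error result rather than an exception.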

Breaking Changes

None — example only; no src/ changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Two SDK gaps surfaced while verifying parity, documented in the example README rather than worked around: subscriptions/listen is not served over stdio (2026-era stdio clients get METHOD_NOT_FOUND, so board-change notifications over stdio reach 2025-era subscribers only), and the client does not send a courtesy notifications/cancelled when a 2026-era stdio call is abandoned (so mid-flight cancellation of work_through_tasks is only observable on 2025-era sessions). Happy to file issues for both separately.

AI Disclaimer

A Python port of the TypeScript SDK's examples/todos-server: a small
project todo board where every server-side MCP feature has a real job —
CRUD tools, structured output, resources and a task template, prompts
with completions, sampling- and elicitation-backed interactive tools
written once as input_required state machines with a legacy fulfilment
driver, progress, request-tied logging, and per-resource subscriptions —
served to both protocol revisions over stdio and Streamable HTTP.
@maxisbey maxisbey marked this pull request as ready for review July 1, 2026 19:42

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 8 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="examples/servers/todos-server/mcp_todos_server/todos.py">

<violation number="1" location="examples/servers/todos-server/mcp_todos_server/todos.py:121">
P2: A 2026-era `clear_done` confirmation can delete tasks that were completed after the user saw the prompt, because the retry recomputes `done` from the current global board instead of carrying the confirmed task IDs in `request_state`. Preserving the IDs/count in the `InputRequiredResult` state and deleting only that confirmed set would keep the elicitation semantics accurate under concurrent HTTP sessions or board changes between rounds.</violation>

<violation number="2" location="examples/servers/todos-server/mcp_todos_server/todos.py:643">
P2: `complete_task` can complete the wrong task when called with an empty `task` argument, because empty-string substring matching succeeds on the first title. Validating non-empty input before substring lookup would prevent unintended board mutations.</violation>
</file>

    task: Annotated[str, Field(description="Task id, or part of its title")],
    ctx: Context,
) -> CallToolResult:
    needle = task.lower()

@cubic-dev-ai cubic-dev-ai Bot Jul 1, 2026

P2: complete_task can complete the wrong task when called with an empty task argument, because empty-string substring matching succeeds on the first title. Validating non-empty input before substring lookup would prevent unintended board mutations.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At examples/servers/todos-server/mcp_todos_server/todos.py, line 643:

<comment>`complete_task` can complete the wrong task when called with an empty `task` argument, because empty-string substring matching succeeds on the first title. Validating non-empty input before substring lookup would prevent unintended board mutations.</comment>

<file context>
@@ -0,0 +1,824 @@
+    task: Annotated[str, Field(description="Task id, or part of its title")],
+    ctx: Context,
+) -> CallToolResult:
+    needle = task.lower()
+    found = tasks.get(task) or next(
+        (candidate for candidate in tasks.values() if needle in candidate.title.lower()), None
</file context>
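A sketch of the suggested fix, assuming a simplified `Task` dataclass and an id-keyed dict like the module-level `tasks` in the excerpt; `find_task` is a hypothetical helper, not code from the PR. Stripping and rejecting an empty needle before the substring scan means `""` can no longer match the first title.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    id: str
    title: str
    status: str = "open"


def find_task(tasks: dict[str, Task], query: str) -> Task | None:
    """Look up a task by exact id, else by case-insensitive title substring."""
    needle = query.strip().lower()
    if not needle:
        # "" is a substring of every title, so an empty query must never reach the scan.
        return None
    return tasks.get(query.strip()) or next(
        (candidate for candidate in tasks.values() if needle in candidate.title.lower()), None
    )
```

The tool would then report "no matching task" as an error result instead of mutating the board.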



def render_board() -> str:
    done = [task for task in tasks.values() if task.status == "done"]

@cubic-dev-ai cubic-dev-ai Bot Jul 1, 2026

P2: A 2026-era clear_done confirmation can delete tasks that were completed after the user saw the prompt, because the retry recomputes done from the current global board instead of carrying the confirmed task IDs in request_state. Preserving the IDs/count in the InputRequiredResult state and deleting only that confirmed set would keep the elicitation semantics accurate under concurrent HTTP sessions or board changes between rounds.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At examples/servers/todos-server/mcp_todos_server/todos.py, line 121:

<comment>A 2026-era `clear_done` confirmation can delete tasks that were completed after the user saw the prompt, because the retry recomputes `done` from the current global board instead of carrying the confirmed task IDs in `request_state`. Preserving the IDs/count in the `InputRequiredResult` state and deleting only that confirmed set would keep the elicitation semantics accurate under concurrent HTTP sessions or board changes between rounds.</comment>

<file context>
@@ -0,0 +1,824 @@
+
+
+def render_board() -> str:
+    done = [task for task in tasks.values() if task.status == "done"]
+    lines = [
+        "# Todo board",
</file context>
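A sketch of the reviewer's suggestion, using plain dicts (task id to status) and hypothetical helpers `clear_done_round1`/`clear_done_round2` in place of the SDK's InputRequiredResult plumbing. The first round snapshots the completed ids into a JSON request_state (per the PR description, the SDK seals that state on the wire); the confirmation round deletes only ids from that snapshot that are still done.

```python
from __future__ import annotations

import json


def clear_done_round1(tasks: dict[str, str]) -> str:
    """Snapshot the ids the user is about to confirm; returns request_state JSON."""
    done_ids = sorted(task_id for task_id, status in tasks.items() if status == "done")
    return json.dumps({"confirmed_ids": done_ids})


def clear_done_round2(tasks: dict[str, str], request_state: str, accepted: bool) -> list[str]:
    """Delete only the confirmed snapshot, skipping tasks that changed since round 1."""
    if not accepted:
        return []
    confirmed = json.loads(request_state)["confirmed_ids"]
    deleted = [task_id for task_id in confirmed if tasks.get(task_id) == "done"]
    for task_id in deleted:
        del tasks[task_id]
    return deleted
```

Tasks completed between the prompt and the confirmation survive, so the deletion matches exactly what the user was asked about.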

Comment on lines +246 to +270
    responses: InputResponses | None = None
    state: str | None = None
    for _ in range(10):
        result = await flow(responses, state)
        if isinstance(result, CallToolResult):
            return result
        responses = {}
        for key, request in (result.input_requests or {}).items():
            if isinstance(request, ElicitRequest) and isinstance(request.params, ElicitRequestFormParams):
                responses[key] = await ctx.session.elicit_form(
                    request.params.message, request.params.requested_schema, related_request_id=ctx.request_id
                )
            elif isinstance(request, CreateMessageRequest):
                # Push-style sampling is deprecated at 2026-07-28, but it is exactly what a
                # pre-2026 session speaks — the deprecation warning is expected, so silence it.
                with warnings.catch_warnings():
                    warnings.simplefilter("ignore", MCPDeprecationWarning)
                    responses[key] = await ctx.session.create_message(  # pyright: ignore[reportDeprecated]
                        request.params.messages,
                        max_tokens=request.params.max_tokens,
                        system_prompt=request.params.system_prompt,
                        include_context=request.params.include_context,
                        temperature=request.params.temperature,
                        stop_sequences=request.params.stop_sequences,
                        metadata=request.params.metadata,

Contributor


🟡 On pre-2026 connections, run_interactive unconditionally pushes elicitation/create and sampling/createMessage to the client without checking ctx.client_capabilities and without catching MCPError, so a legacy host that never declared those capabilities gets a raw "Method not found" protocol error for the whole tools/call instead of a graceful in-band result. Consider gating on ctx.client_capabilities (or wrapping the pushes in try/except MCPError) and returning a friendly text_result, the way the everything-server example does.

Extended reasoning...

What happens. run_interactive is the pre-2026 fallback for the three interactive tools (clear_done, brainstorm_tasks, prioritize). Whenever is_modern(ctx) is false it fulfils each InputRequiredResult round by directly calling ctx.session.elicit_form(...) or ctx.session.create_message(...) — i.e. it pushes elicitation/create / sampling/createMessage requests to the client — without ever consulting ctx.client_capabilities (or ctx.session.check_client_capability, which exists at src/mcp/server/session.py:98) and with no try/except around the calls.

The failure path. ServerSession.send_request raises MCPError when the peer answers with a JSON-RPC error (documented at src/mcp/server/session.py:67-68), and elicit_form/create_message go through it. MCPServer._handle_call_tool (src/mcp/server/mcpserver/server.py:407-412) converts generic exceptions into an isError CallToolResult but explicitly re-raises MCPError. So when a 2025-era client that never declared the elicitation or sampling capability receives the pushed request and answers with METHOD_NOT_FOUND, that error propagates out of the tool handler and turns the entire tools/call into a raw JSON-RPC protocol error rather than the clean "Nothing deleted (user answered: ...)" / friendly error text these flows are written to produce.

Why it's reachable. The README explicitly invites "any mcpServers-style host" to spawn the server over stdio, and many pre-2026 hosts declare neither elicitation nor sampling. A user of such a host asking to "clear completed tasks" or "brainstorm tasks" hits this path immediately. The push itself is also non-conformant: the spec requires servers to only send elicitation/sampling requests to clients that declared the corresponding capability. Note this also applies on 2026-era connections in a milder form — the InputRequiredResult input_requests are emitted without checking ctx.client_capabilities either.

Concrete walkthrough.

  1. A 2025-era host (no elicitation/sampling in its client capabilities) spawns the server over stdio and calls tools/call for clear_done with completed tasks on the board.
  2. is_modern(ctx) is false, so run_interactive runs the flow locally; the first round returns an ElicitRequest, and the driver calls ctx.session.elicit_form(...), pushing elicitation/create to the client.
  3. The client doesn't implement that method and responds with a JSON-RPC METHOD_NOT_FOUND error; send_request raises MCPError.
  4. Nothing catches it in run_interactive or clear_done; _handle_call_tool re-raises MCPError, so the host sees an opaque "Method not found" error for its tools/call instead of an in-band tool result.

Why existing code doesn't prevent it. The SDK deliberately re-raises MCPError from tool handlers, and neither elicit_form nor create_message gates on client capabilities — the responsibility sits with the handler. The sibling reference example demonstrates the intended pattern: the everything-server wraps create_message/elicit in try/except and returns "Sampling not supported or error: ..." text, and its test_input_required_result_capabilities tool gates input_requests on ctx.client_capabilities. The todos-server hand-rolls its own legacy shim, so it should carry the same robustness. The README's Fidelity section documents other legacy-interactivity caveats but not this one, so it doesn't appear intentional.

Fix. In run_interactive (and/or before emitting input_requests), check ctx.client_capabilities for elicitation/sampling and return a clean text_result("the connected client does not support elicitation/sampling", is_error=True) when missing — or wrap the elicit_form/create_message calls in try/except MCPError and return a friendly error result. Since this is example-only code and capability-supporting hosts (including the intended TS cli-client companion) work fine, this is a nit, but it's worth fixing in a reference example people will copy.
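A sketch of the gate-then-catch pattern the reviewer describes. `ClientCaps`, `ProtocolError`, `ToolText`, `fulfil_elicitation`, and the `push` callable are hypothetical stand-ins for ctx.client_capabilities, MCPError, a text tool result, and ctx.session.elicit_form; the real SDK types and signatures differ.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


class ProtocolError(Exception):
    """Hypothetical stand-in for the SDK's MCPError (a JSON-RPC error from the peer)."""


@dataclass
class ClientCaps:
    elicitation: bool = False
    sampling: bool = False


@dataclass
class ToolText:
    text: str
    is_error: bool = False


async def fulfil_elicitation(
    caps: ClientCaps, push: Callable[[str], Awaitable[str]], message: str
) -> str | ToolText:
    # Gate first: servers should only send elicitation requests to clients that declared it.
    if not caps.elicitation:
        return ToolText("The connected client does not support elicitation", is_error=True)
    try:
        return await push(message)
    except ProtocolError as exc:
        # Turn a peer-side error into an in-band tool result instead of
        # letting it fail the whole tools/call.
        return ToolText(f"Elicitation failed: {exc}", is_error=True)


if __name__ == "__main__":

    async def legacy_host_push(message: str) -> str:
        raise ProtocolError("Method not found")

    print(asyncio.run(fulfil_elicitation(ClientCaps(elicitation=True), legacy_host_push, "Clear done tasks?")))
```

The same shape applies to sampling with `caps.sampling`; either check alone would avoid the raw protocol error, and doing both also covers hosts that declare a capability but still fail the request.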

Comment on lines +444 to +456
async def handle_completion(
    ref: PromptReference | ResourceTemplateReference,
    argument: CompletionArgument,
    context: CompletionContext | None,
) -> Completion | None:
    if isinstance(ref, PromptReference):
        if ref.name == "seed-board" and argument.name == "theme":
            return Completion(values=[theme for theme in THEME_SUGGESTIONS if theme.startswith(argument.value)])
        if ref.name == "plan-my-day" and argument.name == "focus":
            return Completion(values=[project for project in projects() if project.startswith(argument.value)])
    if isinstance(ref, ResourceTemplateReference) and ref.uri == TASK_URI_TEMPLATE and argument.name == "id":
        return Completion(values=[task_id for task_id in tasks if task_id.startswith(argument.value)])
    return None

Contributor


🟡 The task-id completion for todos://tasks/{id} returns every matching task id with no cap, but completion values are limited to 100 items by the spec (and enforced by the 2026-07-28 wire model's max_length=100), so once the board exceeds 100 tasks a short/empty prefix produces an oversized result — spec-violating on 2025-era connections and a failed request on 2026-era ones. Slicing the list to 100 (optionally with total/has_more) matches what the TypeScript reference does.

Extended reasoning...

What the bug is. In handle_completion (todos.py:444-456), the ResourceTemplateReference branch for the id argument returns Completion(values=[task_id for task_id in tasks if task_id.startswith(argument.value)]) — every matching task id, uncapped. The MCP spec limits completion.values to at most 100 entries, and the SDK's own Completion docstring in mcp_types._types says "Must not exceed 100 items". The other two completion branches (theme suggestions, project names) stay small naturally, but the task-id one scales with the board.

How the board exceeds 100 tasks. brainstorm_tasks accepts counts up to 100 per call, add_tasks is unbounded, and the module-level tasks dict accumulates across calls — and, over Streamable HTTP, across sessions in the same process. All ids share the t prefix (t1, t2, …), so a completion request for the id argument with an empty prefix or just t matches the whole board.

Why nothing else prevents it. The @mcp.completion() plumbing (src/mcp/server/mcpserver/server.py:698-705) wraps the returned Completion verbatim into a CompleteResult with no truncation, and the SDK-level Completion model has no max_length constraint despite its docstring. However, the 2026-07-28 wire surface does enforce it: mcp_types/v2026_07_28/__init__.py:92 declares values with Field(max_length=100), and server results are serialized against the versioned surface (_methods.serialize_server_result, called from server/runner.py).

Impact. With >100 tasks on the board:

  • On a 2025-era connection, the server emits a spec-violating completion result with more than 100 values and no total/hasMore pagination hints.
  • On a 2026-07-28 connection, serialization against the versioned wire model raises a ValidationError, so the completion/complete request fails with an internal error instead of returning any values at all.

This also diverges from the TypeScript todos-server this PR ports: the TS SDK's completion path truncates to 100 values and sets total/hasMore, so the TS server never emits an oversized completion.

Step-by-step proof.

  1. Client calls brainstorm_tasks, answers the count elicitation with custom → 100, and the sampling round returns 100 lines; the board now holds 100 tasks (t1 to t100).
  2. Client calls add_task once more → 101 tasks.
  3. Client sends completion/complete with ref = {type: "ref/resource", uri: "todos://tasks/{id}"}, argument = {name: "id", value: "t"}.
  4. Every id starts with t, so the list comprehension yields 101 values.
  5. On a 2026-07-28 connection, CompleteResult serialization against the v2026_07_28 model trips max_length=100 and the request errors out; on a 2025-era connection, a 101-value result goes on the wire, exceeding the spec limit.

How to fix. Slice the matches to 100, e.g. matches = [task_id for task_id in tasks if task_id.startswith(argument.value)] then return Completion(values=matches[:100], total=len(matches), has_more=len(matches) > 100) — mirroring the TypeScript reference's behaviour.
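The one-line fix can be sketched in isolation. `CompletionPage` is a hypothetical stand-in that mirrors the SDK Completion fields named above (values, total, has_more), and `complete_task_id` is an illustrative helper rather than the example's handler.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_COMPLETION_VALUES = 100  # spec limit on completion.values


@dataclass
class CompletionPage:
    """Hypothetical stand-in for the SDK's Completion (values, total, has_more)."""

    values: list[str]
    total: int | None = None
    has_more: bool | None = None


def complete_task_id(task_ids: list[str], prefix: str) -> CompletionPage:
    matches = [task_id for task_id in task_ids if task_id.startswith(prefix)]
    # Truncate to the spec limit, but report the full count so clients know more exist.
    return CompletionPage(
        values=matches[:MAX_COMPLETION_VALUES],
        total=len(matches),
        has_more=len(matches) > MAX_COMPLETION_VALUES,
    )
```

With 101 ids and the prefix "t", this returns 100 values, total=101, and has_more=True, so the result stays within the limit on both eras.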

Severity. This is example-only code, requires accumulating more than 100 tasks plus a short completion prefix, and the fix is a one-liner — nice to fix for parity and spec compliance, but not merge-blocking.
