feat(observability): add mothership tracing by Sg312 · Pull Request #4253 · simstudioai/sim

Sg312 · 2026-04-22T02:38:20Z

Summary

Brief description of what this PR does and why.

Fixes #(issue)

Type of Change

Testing

How has this been tested? What should reviewers focus on?

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)lpful -->

vercel · 2026-04-22T02:38:26Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Apr 22, 2026 3:18am

greptile-apps · 2026-04-22T02:45:14Z

Greptile Summary

This PR introduces comprehensive end-to-end OTel tracing for the mothership copilot pipeline, linking Sim (Node.js) and Go spans into a single trace via W3C trace-context propagation. Key additions include a new otel.ts module with root/child span helpers, a custom business-span sampler, a MothershipOriginSpanProcessor for span renaming and tagging, fetchGo for instrumented outbound HTTP, and explicit OTel-context re-entry inside ReadableStream.start to survive the Next.js handler→stream ALS boundary.

Confidence Score: 5/5

Safe to merge — all findings are P2 suggestions with no functional impact on production tracing correctness.

The implementation is well-structured with careful attention to ALS boundary issues, span lifecycle correctness (no leaked spans), and explicit context propagation. All three findings are P2: the serviceName override is a documentation/configurability concern for forks, the tracer name inconsistency is cosmetic, and the non-null assertions don't change runtime behavior.

apps/sim/instrumentation-node.ts — the serviceName: 'mothership' override and its interaction with the documented user-configurable telemetry.config.ts field.

Important Files Changed

Filename	Overview
apps/sim/instrumentation-node.ts	OTel bootstrap rewritten: introduces custom sampler, MothershipOriginSpanProcessor, and OTLP exporter; hardcodes serviceName: 'mothership' overriding the user-configurable telemetry.config.ts field.
apps/sim/lib/copilot/request/otel.ts	New module providing all OTel root/child span helpers for copilot lifecycle including startCopilotOtelRoot, withCopilotSpan, withCopilotToolSpan, and span-status policy helpers.
apps/sim/lib/copilot/request/go/fetch.ts	New fetchGo wrapper that instruments all Sim→Go HTTP calls in an OTel child span and propagates W3C traceparent headers.
apps/sim/lib/copilot/request/lifecycle/start.ts	SSE stream lifecycle updated: re-enters root OTel context inside ReadableStream.start to survive the Next.js handler→stream ALS boundary.
apps/sim/app/api/copilot/chat/stream/route.ts	Stream resume route creates an explicit root span parented via traceparent header echo; span is correctly ended in all early-return paths and stream finally block.
apps/sim/lib/copilot/request/tools/executor.ts	Tool execution wrapped in withCopilotToolSpan via new executeToolAndReportInner inner function; adds attachment-shape extraction for image metrics.
apps/sim/lib/copilot/request/lifecycle/finalize.ts	Finalization wrapped in its own OTel span with outcome and publisher state; uses a separate tracer name instead of the shared getCopilotTracer() helper.
apps/sim/lib/copilot/request/go/propagation.ts	New module: W3C trace context injection and extraction via W3CTraceContextPropagator.
apps/sim/lib/copilot/request/session/abort-reason.ts	New zero-dependency module defining AbortReason vocab and isExplicitStopReason predicate to avoid circular imports between OTel and abort layers.
apps/sim/lib/copilot/request/tools/files.ts	File output write wrapped in withCopilotSpan; non-null assertions added for context.workspaceId! and context.userId! which are typed as optional.

Sequence Diagram

sequenceDiagram
    participant Browser
    participant Next as Next.js (sim)
    participant OTel as OTel SDK
    participant Go as Go Orchestrator
    participant Tools as Tool Executor

    Browser->>Next: POST /api/copilot/chat
    Next->>OTel: startCopilotOtelRoot() → gen_ai.agent.execute (ROOT_CONTEXT)
    Note over Next,OTel: requestId = traceId

    Next->>Go: fetchGo() + W3C traceparent injected
    Go->>Go: child spans under Sim trace
    Go-->>Next: SSE stream

    loop Stream events
        Next->>Next: withCopilotSpan(copilot.*)
        Next->>Tools: executeToolAndReport()
        Tools->>OTel: withCopilotToolSpan() → tool.execute span
        Tools-->>Go: tool result
    end

    Next->>OTel: finalizeStream() → copilot.finalize.stream
    Next->>OTel: otelRoot.finish() → ends gen_ai.agent.execute

    Browser->>Next: GET /api/copilot/chat/stream (reconnect)
    Next->>OTel: startSpan(copilot.resume.request) parented via traceparent echo
    Next->>Next: ReadableStream SSE poll loop
    Next->>OTel: rootSpan.end() in finally block

_{Reviews (1): Last reviewed commit: "Fix lint" | Re-trigger Greptile}

greptile-apps · 2026-04-22T02:45:17Z

+      serviceName: 'mothership',
+    }


serviceName override silently bypasses user-configurable field

telemetry.config.ts explicitly documents serviceName as configurable for forks of the repo ("You can change this for your fork"), but the code always overwrites it with 'mothership' on this line. Any fork that has customized the service name in telemetry.config.ts will silently get 'mothership' exported in their spans, breaking their own trace backends. The config file's documentation and the override should be reconciled — either drop the config field or conditionally apply this default.

greptile-apps · 2026-04-22T02:45:18Z

 import type { OrchestratorResult } from '@/lib/copilot/request/types'

 const logger = createLogger('CopilotStreamFinalize')
+const getTracer = () => trace.getTracer('sim-copilot-finalize', '1.0.0')


Inconsistent tracer name

getTracer() here returns trace.getTracer('sim-copilot-finalize', '1.0.0'), while every other mothership span uses getCopilotTracer() (tracer 'sim-ai-platform'). The tracer name is metadata and doesn't affect routing, but this creates an inconsistent otel.library.name dimension in the OTLP backend. The span name copilot.finalize.stream is still covered by ALLOWED_SPAN_PREFIXES so sampling is unaffected, but aligning with the shared helper would improve maintainability.

greptile-apps · 2026-04-22T02:45:19Z

+          [TraceAttr.CopilotOutputFileOutcome]: CopilotOutputFileOutcome.Uploaded,
+        })


Non-null assertion on optional field

context.workspaceId is typed as string | undefined in ToolExecutionContext. The added ! suppresses the TypeScript error at compile time but provides no runtime guard — if workspaceId is actually undefined, uploadWorkspaceFile receives undefined silently. An explicit guard or a fallback default would be safer than a blanket assertion.

cursor · 2026-04-22T03:18:28Z

PR Summary

Medium Risk
Touches multiple copilot/billing API routes and chat streaming/stop/abort flows to propagate W3C traceparent and add span/attribute logic; mistakes could affect request correlation or edge-case behavior in streaming termination and tool confirmations.

Overview
Adds end-to-end OTel tracing for mothership by introducing a canonical trace/attribute vocabulary (trace-attribute-values-v1) and wiring server routes to parent under Go spans via withIncomingGoSpan, while outbound Sim→Go calls now go through fetchGo to create child spans and propagate traceparent.

Updates chat lifecycle to return traceparent on chat POST and echo it on side-channel requests (stop/abort/confirm/stream resume), adds richer span outcomes/metrics across routes, and instruments async-run DB operations with per-operation spans (excluding high-frequency poll paths).

Fixes a few behavior edges while instrumenting: /copilot/confirm now returns 500 (not 400) when the durable update fails, /chat/stop can persist an optional requestId, stream resume replay stamps terminal synthetic events with the latest request id, and the client stop/finalize paths now dispatch queued follow-ups reliably and treat tool events as partial when status=generating.

Reworks Node OTel bootstrap to export only allowlisted spans, add mothership.origin tagging/name prefixing, support standard OTLP endpoint/headers env vars, and implement ratio-based sampling with improved export error visibility.

^{Reviewed by Cursor Bugbot for commit fad203d. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit fad203d. Configure here.}

cursor · 2026-04-22T03:27:15Z

+    }
+  }
+  return 1.0
+}


Default sampling ratio increased from 10% to 100%

Medium Severity

resolveSamplingRatio defaults to 1.0 (100%) when no env var is set, whereas the previous code hardcoded TraceIdRatioBasedSampler(0.1) (10%). This is a 10× increase in default trace export volume. Combined with the removal of ParentBasedSampler, business-root spans that were previously sampled at 10% are now always exported, and spans whose Go-side parent was unsampled are now independently re-sampled at 100% — potentially creating broken/orphan traces in backends. Deployments that relied on the old 10% default will see a significant cost and throughput increase without any configuration change.

Additional Locations (1)

apps/sim/instrumentation-node.ts#L275-L278

^{Reviewed by Cursor Bugbot for commit fad203d. Configure here.}

Sg312 and others added 26 commits April 16, 2026 12:35

Tracing v1

5ff1f28

Tracing v2

8aa203c

Fix

62a351a

Otel

e5d0a49

update otel

a54470e

Otel

c2d9dd9

Otel fixes

9b4ca6f

Update

792459c

Otel v3

0bb67c9

Otel

4ab6706

Update

c72fa4e

Add sse tracking

8240130

Schema

746157c

update

7f5c449

Fixes

2f2230b

thinking

96cafab

Hide thinking

0b612b9

Fix orphaned spans

8c3e8cf

Fix

d9973cf

Fix

712801a

fix(run-workflow): race condition with upsert async tool

1745793

Merge branch 'staging' into dev

7a18b96

fix lint

a762795

fix merge conflict mistake

38a5d82

fix type error

a4949ff

Updates

79d3e5b

Fix lint

0a31c3c

vercel Bot temporarily deployed to Preview April 22, 2026 02:44 Inactive

greptile-apps Bot reviewed Apr 22, 2026

View reviewed changes

fix tests

fad203d

vercel Bot temporarily deployed to Preview April 22, 2026 03:18 Inactive

cursor Bot reviewed Apr 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(observability): add mothership tracing#4253

feat(observability): add mothership tracing#4253
Sg312 wants to merge 28 commits intostagingfrom
dev

Sg312 commented Apr 22, 2026

Uh oh!

vercel Bot commented Apr 22, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Apr 22, 2026

Uh oh!

greptile-apps Bot Apr 22, 2026

Uh oh!

greptile-apps Bot Apr 22, 2026

Uh oh!

greptile-apps Bot Apr 22, 2026

Uh oh!

cursor Bot commented Apr 22, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		[TraceAttr.CopilotOutputFileOutcome]: CopilotOutputFileOutcome.Uploaded,
		})

Conversation

Sg312 commented Apr 22, 2026

Summary

Type of Change

Testing

Checklist

Uh oh!

vercel Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Apr 22, 2026

Choose a reason for hiding this comment

Default sampling ratio increased from 10% to 100%

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel Bot commented Apr 22, 2026 •

edited

Loading

cursor Bot commented Apr 22, 2026 •

edited

Loading