This guide provides a step-by-step path to migrate workflows from GitHub Copilot to Anthropic Claude.
- Prerequisites
- Configuration Changes
- Model Selection & Mapping
- Behavioral Differences
- Testing Strategy
- Common Pitfalls
- Rollback Procedures
- Anthropic API key: Get one from console.anthropic.com
- SDK installed:

  ```bash
  uv add 'anthropic>=0.77.0,<1.0.0'
  ```

- Backup workflows: Save copies of working Copilot workflows
- Test environment: Non-production workspace for testing
- Access to your workflow YAML files
- Understanding of your workflow behavior/output expectations
- Time to test and validate (plan 30-60 minutes per workflow)
Change the provider field from copilot to claude:
```yaml
# Before
workflow:
  runtime:
    provider: copilot

# After
workflow:
  runtime:
    provider: claude
```

Claude requires an API key (Copilot uses GitHub auth):

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```

Add it to your shell profile for persistence:

```bash
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.zshrc
```

Update the `default_model` and per-agent `model` fields:
```yaml
# Before (Copilot)
workflow:
  runtime:
    default_model: gpt-5.2
  agents:
    - name: analyzer
      model: gpt-5.2-turbo

# After (Claude)
workflow:
  runtime:
    default_model: claude-sonnet-4.5
  agents:
    - name: analyzer
      model: claude-sonnet-4.5  # See model mapping table below
```

Claude has different configuration parameters:
```yaml
# Before (Copilot)
workflow:
  runtime:
    provider: copilot
    default_model: gpt-5.2
    temperature: 0.7
    max_tokens: 4096

# After (Claude)
workflow:
  runtime:
    provider: claude
    default_model: claude-sonnet-4.5
    temperature: 0.7   # Keep this (Claude also uses 0.0-1.0)
    max_tokens: 4096   # Controls output length (Claude-specific meaning)
```

Key change: `max_tokens` now controls output length (different from Copilot's context trimming).
MCP Servers (tools) are not supported in Claude Phase 1:
```yaml
# Before (Copilot)
workflow:
  runtime:
    mcp_servers:
      web-search:
        command: npx
        args: ["-y", "open-websearch@latest"]
        tools: ["*"]

# After (Claude) - remove this section
workflow:
  runtime:
    # mcp_servers not supported in Phase 1
```

Agent `tools` must also be removed:
```yaml
# Before (Copilot)
agents:
  - name: researcher
    tools: [web_search, code_exec]

# After (Claude)
agents:
  - name: researcher
    # Remove tools field
```

Before (Copilot):
```yaml
workflow:
  name: research-workflow
  runtime:
    provider: copilot
    default_model: gpt-5.2
    temperature: 0.7
    max_tokens: 4096
    mcp_servers:
      web-search:
        command: npx
        args: ["-y", "open-websearch@latest"]
        tools: ["*"]
  agents:
    - name: researcher
      model: gpt-5.2-turbo
      tools: [web_search]
      prompt: "Research {{ topic }}"
```

After (Claude):
```yaml
workflow:
  name: research-workflow
  runtime:
    provider: claude
    default_model: claude-sonnet-4.5
    temperature: 0.7
    max_tokens: 4096
    # Remove mcp_servers
  agents:
    - name: researcher
      model: claude-sonnet-4.5
      # Remove tools
      prompt: "Research {{ topic }}"
```

Map your Copilot models to Claude equivalents based on use case:
| Copilot Model | Claude Equivalent | Reasoning | Cost Impact |
|---|---|---|---|
| `gpt-5.2` | `claude-sonnet-4.5` | Balanced performance, most workflows | Similar |
| `gpt-5.2-turbo` | `claude-sonnet-4.5` | General purpose, large context | Similar |
| `gpt-5.2-mini` | `claude-sonnet-4.5` | Standard model, widely used | Cheaper (Claude) |
| `gpt-3.5-turbo` | `claude-haiku-4.5` | Fast, cheap, simple tasks | Cheaper (Claude) |
| `o1-preview` | `claude-opus-4.5` | Advanced reasoning, complex tasks | More expensive |
For most workflows: use `claude-sonnet-4.5`
- Direct replacement for GPT-5.2
- Excellent performance/cost balance
- 200K context (vs GPT-5.2 Turbo's 128K)

For simple, high-volume tasks: use `claude-haiku-4.5`
- Replacement for GPT-3.5 Turbo
- 3-5x faster, 3x cheaper
- Classification, routing, simple Q&A

For complex reasoning: use `claude-opus-4.5`
- Replacement for o1-preview
- Superior multi-step reasoning
- Worth the cost for critical workflows
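The configuration and model changes described above (provider swap, model mapping, temperature clamp, required `max_tokens`, tool removal) can be sketched as a single pass over a parsed workflow config. This is an illustrative helper, not part of any official tooling; it assumes the YAML has already been loaded into a plain dict:

```python
# Illustrative migration helper (not an official tool). Applies this
# guide's steps to a workflow config already parsed into a dict.
MODEL_MAP = {
    "gpt-5.2": "claude-sonnet-4.5",
    "gpt-5.2-turbo": "claude-sonnet-4.5",
    "gpt-5.2-mini": "claude-sonnet-4.5",
    "gpt-3.5-turbo": "claude-haiku-4.5",
    "o1-preview": "claude-opus-4.5",
}

def to_claude(model: str) -> str:
    # Default unknown models to Sonnet, the general-purpose choice.
    return MODEL_MAP.get(model, "claude-sonnet-4.5")

def migrate(config: dict) -> dict:
    wf = config["workflow"]
    runtime = wf["runtime"]
    runtime["provider"] = "claude"
    if "default_model" in runtime:
        runtime["default_model"] = to_claude(runtime["default_model"])
    # Claude's temperature range is 0.0-1.0; clamp higher values.
    if runtime.get("temperature", 0) > 1.0:
        runtime["temperature"] = 1.0
    # max_tokens is required by the Claude API (controls output length).
    runtime.setdefault("max_tokens", 8192)
    # MCP servers and agent tools are not supported in Phase 1.
    runtime.pop("mcp_servers", None)
    for agent in wf.get("agents", []):
        if "model" in agent:
            agent["model"] = to_claude(agent["model"])
        agent.pop("tools", None)
    return config
```

Pair it with a YAML loader of your choice to rewrite files in place, then diff the result against your backup before committing.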
| Copilot Model | Context | Claude Model | Context | Advantage |
|---|---|---|---|---|
| GPT-4 | 8K | Haiku/Sonnet/Opus | 200K | Claude (+192K) |
| GPT-4 Turbo | 128K | Haiku/Sonnet/Opus | 200K | Claude (+72K) |
| GPT-4o | 128K | Haiku/Sonnet/Opus | 200K | Claude (+72K) |
Benefit: Claude provides more context across all model tiers.
- Copilot (OpenAI): 0.0 - 2.0
- Claude: 0.0 - 1.0 (enforced by SDK)
Migration:
```yaml
# If you used temperature > 1.0

# Before (Copilot)
runtime:
  temperature: 1.5

# After (Claude) - clamp to 1.0
runtime:
  temperature: 1.0  # Maximum allowed
```

IMPORTANT: The `max_tokens` field in RuntimeConfig has DIFFERENT meanings for Claude vs other providers:
- Copilot/OpenAI: Context window trimming (optional, handled by workflow engine)
- Claude: Maximum OUTPUT tokens per response (required by Claude API)
Migration:
```yaml
# Before (Copilot) - max_tokens for context trimming
runtime:
  provider: copilot
  max_tokens: 4096  # Optional: trim context to fit window

# After (Claude) - max_tokens for output generation
runtime:
  provider: claude
  max_tokens: 8192  # Required: max response length
```

Recommendation:
- Always specify `max_tokens` for Claude (default: 8192)
- Understand it controls OUTPUT length, not the context window (Claude has 200K context)
- Use lower values (1024-2048) for concise responses, higher (4096-8192) for detailed output
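One lightweight way to follow that recommendation is to derive `max_tokens` from a named verbosity level instead of hard-coding a number per workflow. The level names and mapping below are an illustrative convention, not part of the engine:

```python
# Illustrative helper: pick max_tokens from a verbosity level, using
# the ranges recommended above. Level names are an assumption.
LIMITS = {"concise": 1024, "standard": 4096, "detailed": 8192}

def max_tokens_for(level: str = "detailed") -> int:
    # Unknown levels fall back to the Claude default of 8192.
    return LIMITS.get(level, 8192)
```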
Claude tends to be more verbose and explanatory than GPT-4:
- More detailed reasoning
- More explicit step-by-step thinking
- Longer responses for the same prompt
Mitigation:
- Reduce `max_tokens` to enforce conciseness
- Update prompts: "Answer concisely" or "Be brief"
- Use Haiku for simple tasks (naturally more concise)
Example:
```yaml
agents:
  - name: analyzer
    prompt: |
      Answer the following question CONCISELY (2-3 sentences max):
      {{ question }}

workflow:
  runtime:
    max_tokens: 512  # Enforce brevity
```

Claude is more sensitive to system prompts than GPT-4:
- Follows system instructions more strictly
- May refuse or question problematic requests more often
- Better at maintaining persona/role
Best practice: Use clear, well-defined system prompts:
```yaml
agents:
  - name: analyst
    system_prompt: |
      You are a financial analyst. Provide objective, data-driven analysis.
      Do not make investment recommendations.
```

- Copilot: real-time streaming supported
- Claude: Phase 1 does NOT support streaming
Impact:
- No partial responses during execution
- Longer wait for first output
- Cannot cancel mid-generation
Workarounds:
- Reduce `max_tokens` for faster responses (less to generate)
- Use Haiku models (3-5x faster)
- Break workflows into smaller agents
- Copilot: full MCP tool support
- Claude: Phase 1 does NOT support tools/MCP
Impact:
- No web search, code execution, file operations
- Cannot use external APIs via tools
- Agents are isolated (no external data)
Workarounds:
- Pre-fetch data and pass as workflow input
- Split tool-dependent workflows into separate steps
- Wait for Phase 2 (tools support planned)
Verify workflows run without errors:
```bash
# 1. Validate YAML syntax
conductor validate workflow.yaml

# 2. Dry-run to check execution plan
conductor run workflow.yaml --dry-run --provider claude

# 3. Test with minimal input
conductor run workflow.yaml --provider claude --input test="Hello"
```

Compare outputs side-by-side:
```bash
# 1. Run with Copilot (baseline)
conductor run workflow.yaml --provider copilot --input question="What is Python?" > copilot-output.json

# 2. Run with Claude (comparison)
conductor run workflow.yaml --provider claude --input question="What is Python?" > claude-output.json

# 3. Compare outputs
diff copilot-output.json claude-output.json
```

What to check:
- ✅ Both outputs contain required fields
- ✅ Outputs are semantically equivalent (content may differ)
- ✅ Claude output meets quality expectations
- ⚠️ Claude may be more verbose (expected)
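The first check can be automated. Here is a minimal sketch, assuming your workflow output parses as a JSON object; the field names `answer` and `confidence` are illustrative and should match your own schema:

```python
# Minimal sketch: verify a parsed workflow output contains the fields
# you require. Field names are illustrative, not a fixed schema.
def missing_fields(output: dict, required=("answer", "confidence")):
    """Return the required keys absent from a parsed output object."""
    return [key for key in required if key not in output]
```

Run it on both files, e.g. `missing_fields(json.load(open("claude-output.json")))`, and fail the comparison if either returns a non-empty list.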
Define acceptance criteria and validate:
```yaml
# acceptance-criteria.yaml
test_cases:
  - input:
      question: "What is Python?"
    expected_output:
      answer: # Contains "programming language"
      confidence: # One of: high, medium, low
  - input:
      question: "Explain quantum computing"
    expected_output:
      answer: # Contains "quantum mechanics" or "qubits"
      confidence: # high or medium
```

Validation script (pseudocode):
```python
for test_case in test_cases:
    result = run_workflow(test_case.input, provider="claude")
    assert all(check(result, expected) for expected in test_case.expected_output)
```

Ensure existing tests still pass:
```bash
# If you have existing tests
pytest tests/test_workflows.py --provider claude

# Or manual regression checklist:
# - Test all documented workflows
# - Test edge cases (empty input, max tokens, etc.)
# - Test error handling (invalid input, API failures)
```

Compare latency and throughput:
```bash
# Copilot baseline
time conductor run workflow.yaml --provider copilot --input question="Test"

# Claude comparison
time conductor run workflow.yaml --provider claude --input question="Test"
```

Metrics to track:
- Response time (end-to-end)
- Token usage (input/output)
- Cost per request
- Error rate
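A small record type makes these metrics easy to collect per run and summarize per provider. The shape and field names below are illustrative, not part of any tool:

```python
# Sketch: per-run metrics record and a summary over many runs,
# covering the metrics listed above. Field names are assumptions.
from dataclasses import dataclass

@dataclass
class RunMetrics:
    latency_s: float
    input_tokens: int
    output_tokens: int
    error: bool = False

def summarize(runs: list) -> dict:
    ok = [r for r in runs if not r.error]
    return {
        "error_rate": 1 - len(ok) / len(runs),
        "avg_latency_s": sum(r.latency_s for r in ok) / len(ok),
        "avg_output_tokens": sum(r.output_tokens for r in ok) / len(ok),
    }
```

Collect one summary per provider and compare the two dicts side by side before deciding the migration holds up.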
Error: `AuthenticationError: Invalid API key`

Solution:

```bash
export ANTHROPIC_API_KEY=sk-ant-...

# Verify it's set
echo $ANTHROPIC_API_KEY
```

Error: `NotFoundError: model 'gpt-5.2' not found`
Solution: Update all model references:

```yaml
# Bad
model: gpt-5.2

# Good
model: claude-sonnet-4.5
```

Error: `ValidationError: temperature must be between 0.0 and 1.0`
Solution: Clamp to 1.0:

```yaml
# Bad
temperature: 1.5

# Good
temperature: 1.0
```

Error: `BadRequestError: max_tokens is required`
Solution: Always specify:

```yaml
runtime:
  max_tokens: 8192
```

Error: Workflow doesn't error but produces wrong results (no tool calls)

Solution: Remove tools from Phase 1 workflows:

```yaml
# Remove mcp_servers and agent tools fields
```

Behavior: Long wait with no partial output
Solution:
- Accept non-streaming in Phase 1
- Reduce `max_tokens` for faster responses
- Use Haiku models
Issue: Higher costs than expected with subscription
Solution: Monitor token usage and optimize:
- Use Haiku for simple tasks
- Reduce `max_tokens` to limit response length
- Use `context: mode: explicit` to reduce input tokens
Option 1: Quick Rollback
Revert YAML changes and switch back:
```bash
# Restore original workflow
cp workflow.yaml.backup workflow.yaml

# Run with Copilot
conductor run workflow.yaml --provider copilot
```

Option 2: Keep Both Versions
Maintain separate workflow files:
```
workflow-copilot.yaml   # Original
workflow-claude.yaml    # Migrated
```

```bash
# Use as needed
conductor run workflow-copilot.yaml --provider copilot
conductor run workflow-claude.yaml --provider claude
```

Option 3: Gradual Migration
Migrate one agent at a time:
```yaml
agents:
  # Keep working Copilot agents
  - name: agent1
    model: gpt-5.2

  # Test Claude on one agent
  - name: agent2
    model: claude-sonnet-4.5
```

Track these metrics for 1-2 weeks:
- Error rate: Should stay similar or improve
- Output quality: Validate with spot-checks
- Cost: Monitor token usage and costs
- Latency: Track response times
Consider rolling back if:
- ❌ Error rate increases >20%
- ❌ Output quality degrades significantly
- ❌ Costs exceed budget by >50%
- ❌ Latency increases >2x
- ❌ Critical workflows break
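The quantitative criteria above can be encoded as a simple guard over baseline and current metrics. This is an illustrative sketch; the dict keys are assumptions, and the quality criterion is left as a manual spot-check since it resists a simple threshold:

```python
# Illustrative rollback guard implementing the thresholds above.
# Metric dicts and key names are assumptions for this sketch.
def should_rollback(baseline: dict, current: dict) -> bool:
    return (
        current["error_rate"] > baseline["error_rate"] * 1.20  # >20% increase
        or current["cost"] > baseline["budget"] * 1.50         # >50% over budget
        or current["latency_s"] > baseline["latency_s"] * 2.0  # >2x latency
        or current.get("critical_breakage", False)             # broken workflows
    )
```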
Use this checklist for each workflow:
- Change `provider: copilot` → `provider: claude`
- Set `ANTHROPIC_API_KEY` environment variable
- Map model names (GPT → Claude)
- Understand `max_tokens` meaning change (context → output length)
- Remove `mcp_servers` section
- Remove agent `tools` fields
- Validate YAML syntax
- Run dry-run mode
- Test with sample input
- Compare output with Copilot baseline
- Run acceptance tests
- Check performance (latency, tokens, cost)
- Update workflow documentation
- Document any prompt changes
- Note behavioral differences observed
- Record cost comparison
- Test in staging/dev environment
- Monitor error rate
- Monitor output quality
- Monitor costs
- Have rollback plan ready
- Monitor for 1-2 weeks
- Collect user feedback
- Optimize prompts/configuration
- Document lessons learned
Migrating from Copilot to Claude is straightforward:
- Configuration: Change provider, set API key, map models
- Limitations: Remove tools (Phase 1), accept non-streaming
- Testing: Validate, compare outputs, acceptance test
- Monitoring: Track errors, quality, cost, latency
- Rollback: Keep backups, have rollback plan
Time estimate: 30-60 minutes per workflow
Risk level: Low (easy rollback, config-only changes)
Recommended approach: Gradual migration, one workflow at a time, with monitoring