
[Workflow Health Dashboard] 2026-04-20 — Score 73/100 | P0: Codex auth | P1: node not found, rate limits, MCP gateway #27339

@github-actions

Overview

Workflow health assessment for 197 agentic workflows in this repository. Run: §24665804498

| Metric | Value |
|---|---|
| Total workflows | 197 |
| Lock files present | 197/197 ✅ |
| Stale lock files | 0 ✅ |
| Today's confirmed failures | 5 workflows |
| Estimated schedule success rate | ~85% |
| Overall health score | 73/100 |

Critical Issues 🚨

P0: Codex Engine 401 Auth (Ongoing since Apr 18)

Tracked in #27127 (OPEN, assigned to @pelikhan + Copilot).

All Codex-engine workflows continue to fail with 401 Unauthorized from OpenAI. Confirmed new failures today: Duplicate Code Detector (#27328) and Schema Feature Coverage Checker (#27286).

Both show the identical error:

unexpected status 401 Unauthorized: Missing bearer or basic authentication in header
url: (api.openai.com/redacted)

Impact: All workflows using engine: codex (AI Moderator, Duplicate Code Detector, Schema Feature Coverage, Daily Observability Report, etc.) are completely blocked.
Action needed: rotate or restore the OPENAI_API_KEY repository secret.

High Priority Issues ⚠️

P1: Recurring node: command not found on GPU Runner

Tracked in #27337 (new issue, OPEN).

Copilot-engine workflows on aw-gpu-runner-T4 are failing with /bin/bash: line 1: node: command not found. The failure has recurred across 2+ days, in Daily News (#27295) and Daily Issues Report Generator (#27301).

Impact: at least 2 GPU-runner workflows are blocked, and other aw-gpu-runner-T4 workflows are likely affected too.
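One way to make this failure mode explicit is a pre-flight step that checks the runner's PATH before the agent starts, so the job fails early with a clear message instead of deep inside a bash step. A minimal sketch, assuming a Python step is available on the runner; `missing_tools` and `preflight` are hypothetical helpers, not part of gh-aw:

```python
import shutil
from typing import List, Optional


def missing_tools(required: List[str]) -> List[str]:
    """Return the subset of `required` that shutil.which() cannot find on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


def preflight(required: List[str]) -> Optional[str]:
    """Return an error message if any required tool is missing, else None."""
    missing = missing_tools(required)
    if missing:
        # Explicit message instead of a later "node: command not found".
        return "missing required tools on PATH: " + ", ".join(missing)
    return None
```

A step could call `preflight(["node"])` and exit non-zero if it returns a message. This only detects the symptom; the underlying fix is still restoring Node.js on aw-gpu-runner-T4.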

P1: MCP Gateway Startup Failure

Daily Fact About gh-aw failed at the "Start MCP Gateway" step today (#27317).

Impact: on this run, the failure is isolated to workflows using custom MCP CLI servers (mempalace). It may be transient.
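To help separate a transient startup failure from a systemic one, the startup readiness check can be retried with exponential backoff. If a retry succeeds, the failure looks transient; if every attempt fails, a systemic cause is more likely. A minimal sketch: `retry_with_backoff` is hypothetical, and `probe` stands in for whatever health check the gateway exposes (not shown in this report):

```python
import time
from typing import Callable, Optional


def retry_with_backoff(
    probe: Callable[[], bool],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call `probe` until it returns True.

    Returns the 1-based attempt number that succeeded, or None if all
    attempts failed. Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            # No sleep after the final failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    return None
```

The `sleep` parameter is injectable so the backoff schedule can be checked without waiting in real time.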

P1: GitHub App Rate Limit Exhaustion (Co-scheduled Workflows)

Tracked in #27251 (OPEN, assigned to @pelikhan + Copilot).

Co-scheduled workflows at 23:44 UTC exhaust the GitHub App installation rate limit. First observed Apr 19.

Impact: multiple workflows fail at the guard/firewall policy fetch step. Staggering cron schedules is the recommended fix.

Resolved Since Last Run ✅

Today's Auto-Generated Failure Issues
| Issue | Workflow | Error | Status |
|---|---|---|---|
| #27328 | Duplicate Code Detector | Codex 401 auth | P0 (tracked in #27127) |
| #27317 | Daily Fact About gh-aw | MCP Gateway startup failure | P1 (new) |
| #27301 | Daily Issues Report Generator | node: command not found | P1 (tracked in #27337) |
| #27295 | Daily News | node: command not found | P1 (tracked in #27337) |
| #27286 | Schema Feature Coverage Checker | Codex 401 auth | P0 (tracked in #27127) |
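The mapping from error text to priority and tracking issue can be expressed as a small signature table, which makes grouping new failure issues under existing trackers repeatable. A minimal sketch: `SIGNATURES` and `classify_failure` are hypothetical, the substrings and issue numbers are copied from this report, and plain substring matching is a simplification of real log triage:

```python
from typing import List, Optional, Tuple

# (substring to look for, priority, tracking issue or None if not yet tracked)
SIGNATURES: List[Tuple[str, str, Optional[str]]] = [
    ("401 Unauthorized", "P0", "#27127"),
    ("node: command not found", "P1", "#27337"),
    ("rate limit", "P1", "#27251"),
    ("MCP Gateway", "P1", None),  # new in this report, no tracker yet
]


def classify_failure(log_text: str) -> Tuple[str, Optional[str]]:
    """Return (priority, tracker) for the first matching signature."""
    lowered = log_text.lower()
    for needle, priority, tracker in SIGNATURES:
        if needle.lower() in lowered:
            return priority, tracker
    return "untriaged", None
```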
Compilation Status Details
  • Total MD workflows: 197 (excluding shared/ subdirectory)
  • Lock files: 197/197 present ✅
  • Stale lock files: 0 (all lock files up-to-date) ✅
  • Shared imports (excluded): Files in .github/workflows/shared/ are not compiled standalone
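The lock-file count above can be reproduced with a short directory scan. A minimal sketch, assuming each compiled workflow `<name>.md` in .github/workflows/ has a sibling `<name>.lock.yml`, and that files under shared/ are skipped; `lock_file_coverage` is a hypothetical helper, and it checks presence only, not staleness:

```python
from pathlib import Path
from typing import List, Tuple


def lock_file_coverage(workflows_dir: Path) -> Tuple[int, List[str]]:
    """Count top-level workflow .md files and list those lacking a .lock.yml."""
    total = 0
    missing: List[str] = []
    # glob("*.md") is not recursive, so files under shared/ are excluded.
    for md in sorted(workflows_dir.glob("*.md")):
        total += 1
        lock = md.with_name(md.stem + ".lock.yml")
        if not lock.exists():
            missing.append(md.name)
    return total, missing
```

Running it as `lock_file_coverage(Path(".github/workflows"))` would, under the naming assumption, give the "197/197 present" figure as `(197, [])`.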

Systemic Issues

Rate Limit Clustering

Multiple workflows share identical or near-identical cron schedules, and each run's guard/firewall policy check consumes the GitHub App installation's API rate limit. When 3+ workflows start simultaneously, they can exhaust that limit.

Recommendation: audit cron schedules and stagger co-scheduled workflows by at least 3-5 minutes.
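The audit-and-stagger step can be sketched as two functions: one groups workflows by identical cron string and flags groups of 3 or more, the other spreads a group across offsets from the original start time. A minimal sketch under stated assumptions: the schedules dict, `find_clusters`, and `stagger` are hypothetical; only exact string matches are detected (near-identical schedules need a looser comparison); and only daily `M H * * *` schedules are generated:

```python
from collections import defaultdict
from typing import Dict, List


def find_clusters(schedules: Dict[str, str], threshold: int = 3) -> List[List[str]]:
    """Group workflows by identical cron string; keep groups of >= threshold."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, cron in schedules.items():
        groups[cron].append(name)
    return [sorted(names) for names in groups.values() if len(names) >= threshold]


def stagger(names: List[str], hour: int, minute: int, step: int = 4) -> Dict[str, str]:
    """Give each workflow a daily cron offset by `step` minutes from hour:minute."""
    result: Dict[str, str] = {}
    for i, name in enumerate(names):
        # Minutes since midnight, wrapping past 23:59 into the next day.
        total = (hour * 60 + minute + i * step) % (24 * 60)
        result[name] = f"{total % 60} {total // 60} * * *"
    return result
```

For the 23:44 UTC cluster, `stagger(cluster, 23, 44)` yields 23:44, 23:48, 23:52, and so on; a fifth workflow wraps to 00:00 the next day, which may or may not be acceptable for a given workflow.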

Codex Engine Credential Dependency

All Codex-engine workflows depend on OPENAI_API_KEY as a single point of failure. When the secret expires or is misconfigured, every such workflow fails at once, with no graceful degradation.

Recommendation: add credential validation as a pre-flight check in the activation job, with a clear error message and an early exit.
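The reported error ("Missing bearer or basic authentication in header") is consistent with the key being absent or empty at request time, which a simple presence check would catch before any agent work starts. A minimal sketch: `check_openai_key` is hypothetical, and it verifies presence only; a revoked but non-empty key would still pass and would need a real authenticated request to detect:

```python
import os
from typing import Mapping, Optional


def check_openai_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return an error message if OPENAI_API_KEY is unset or blank, else None."""
    key = env.get("OPENAI_API_KEY", "")
    if not key.strip():
        return ("OPENAI_API_KEY is missing or empty; rotate or restore the "
                "repository secret before running Codex-engine workflows.")
    return None
```

An activation step could call `check_openai_key()` and exit non-zero with the returned message, so all Codex workflows fail fast with one clear cause instead of a 401 deep in the agent run.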

Recommendations

Immediate (P0)

  1. Restore Codex auth — Rotate OPENAI_API_KEY secret in repository settings ([aw-failures] Codex engine 401 auth failure — OPENAI_API_KEY credential missing or invalid #27127)

High Priority (P1)

  1. Fix Node.js PATH on GPU runner — Investigate aw-gpu-runner-T4 Node.js availability ([P1] Recurring node: command not found on aw-gpu-runner-T4 (Daily News, Daily Issues Report) #27337)
  2. Stagger cron schedules — Offset co-scheduled workflows by 3-5 minutes ([aw-failures] GitHub App installation rate limit exhaustion from co-scheduled workflows at 23:44 UTC #27251)
  3. Investigate MCP Gateway failure — Determine if Daily Fact MCP Gateway startup issue is transient or systemic ([aw] Daily Fact About gh-aw failed #27317)

Medium Priority (P2)

  1. Safe Outputs conformance — 4 handler files need sanitization ([Safe Outputs Conformance] SEC-004: Multiple handlers have body fields without content sanitization #27235)
  2. Performance regressions — CompileComplexWorkflow +29%, CompileSimpleWorkflow +39%, Validation +96% ([performance] Regression in CompileComplexWorkflow: 29.2% slower #27280, [performance] Regression in CompileSimpleWorkflow: 39.3% slower #27279, [performance] Regression in Validation: 95.6% slower #27278)
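The regression percentages read most naturally as the increase in run time relative to the baseline. A minimal sketch, assuming that definition (the benchmark tooling's exact formula is not stated in this report); `slowdown_pct` is a hypothetical helper:

```python
def slowdown_pct(baseline: float, current: float) -> float:
    """Percent increase in run time relative to the baseline (negative = faster)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline * 100.0
```

Under this definition, 29.2% slower means a 100 ms baseline now takes about 129.2 ms, and the 95.6% Validation regression means it takes nearly twice as long.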

Trends

  • Overall health score: 73/100 (down 2 from 75 last run; roughly stable)
  • P0 issues: 1 (Codex 401 auth — unresolved since Apr 18, day 3)
  • P1 issues: 3 (rate limit, node not found, MCP gateway)
  • New failures today: 5 workflows
  • Fixed since last run: CLI updates, stale lock files
  • Workflows with stale locks: 0 (↓ from 17 last run)

Actions Taken This Run


Last updated: 2026-04-20T12:14Z
Next check: 2026-04-21T12:00Z (daily schedule)

Note

🔒 Integrity filter blocked 5 items

The following items were blocked because they don't meet the GitHub integrity level.

  • #19099 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #21784 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #27282 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #27260 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #27259 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in the workflow's GitHub tool frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none
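The filter behaves like a threshold over ordered levels: an item is readable only if its integrity is at or above min-integrity. A minimal sketch, assuming the levels in the comment above are ordered from strongest (merged) to weakest (none); `LEVELS` and `is_readable` are hypothetical and gh-aw's actual implementation may differ:

```python
# Assumed ordering, weakest to strongest.
LEVELS = ["none", "unapproved", "approved", "merged"]


def is_readable(item_level: str, min_integrity: str) -> bool:
    """True if an item at `item_level` meets the `min_integrity` threshold."""
    return LEVELS.index(item_level) >= LEVELS.index(min_integrity)
```

Under this model, the five blocked items sit below "approved", so they become readable only if min-integrity is lowered to a level at or below their own.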

Generated by Workflow Health Manager - Meta-Orchestrator · ● 3.6M ·

  • expires on Apr 21, 2026, 12:25 PM UTC
