feat(vm): add OCI container support to vm driver #889

Draft
drew wants to merge 2 commits into main from drew/containers-in-virtual-machines

@drew drew (Collaborator) commented Apr 20, 2026

Summary

Add host-side OCI container execution to the VM compute driver so sandboxes can boot from a user-specified template.image without introducing a Docker runtime into the guest. The driver pulls and flattens the image into a cached read-only squashfs; the guest mounts that RO base plus a per-sandbox writable disk as an overlay, pivot_roots into the merged view, and execs an unmodified openshell-sandbox with the OCI argv/env/workdir.

Related Issue

N/A — tracked via the internal architecture plan for VM-driver OCI container execution.

Changes

Host-side OCI pipeline (crates/openshell-driver-vm/src/oci/)

  • client.rs — anonymous oci-client pulls pinned to linux/amd64 or linux/arm64; normalizes the image config.
  • flatten.rs — applies OCI layer tars in order with whiteout (.wh.*, .wh..wh..opq) handling; rejects absolute and parent-traversal paths.
  • compat.rs — injects sandbox:10001 into /etc/passwd and /etc/group, ensures /sandbox and /tmp, stubs /etc/hosts and /etc/resolv.conf if missing. Idempotent.
  • fs_image.rs — shells out to mksquashfs with an explicit binary path (no $PATH reliance), zstd by default.
  • cache.rs — content-addressed layout blobs/ + fs/<hex>.<plat>.squashfs + meta/<hex>.<plat>.json + tmp/ with atomic writes and idempotent install/lookup.
  • metadata.rs — LaunchMetadata::build enforces OCI precedence (argv = Entrypoint + Cmd; workdir fallback /sandbox; env merge order OCI < template < spec). to_guest_env_vars() packs argv/env/workdir into OPENSHELL_OCI_* for delivery via libkrun set_exec.
  • pipeline.rs — orchestrates pull → flatten → compat → squashfs → install; short-circuits on cache hit after digest resolution.
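
The path checks and whiteout handling described for flatten.rs can be illustrated with a small sketch. This is not the code from the PR: `EntryKind`, `is_safe_entry_path`, and `classify_entry` are hypothetical names, and only the `.wh.` and `.wh..wh..opq` marker names come from the description above.

```rust
use std::path::{Component, Path};

/// How a layer tar entry should be treated (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum EntryKind {
    /// `.wh..wh..opq`: hide everything lower layers placed in this directory.
    OpaqueDir,
    /// `.wh.<name>`: remove `<name>` that a lower layer created.
    Whiteout(String),
    /// Any ordinary file, directory, or link.
    Regular,
}

/// Accept only relative paths made of normal components (and `.`).
/// Absolute paths and any `..` component are rejected.
fn is_safe_entry_path(path: &str) -> bool {
    !path.is_empty()
        && Path::new(path)
            .components()
            .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// Classify an entry by its final path component.
fn classify_entry(path: &str) -> EntryKind {
    let name = Path::new(path)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("");
    if name == ".wh..wh..opq" {
        EntryKind::OpaqueDir
    } else if let Some(target) = name.strip_prefix(".wh.") {
        EntryKind::Whiteout(target.to_string())
    } else {
        EntryKind::Regular
    }
}
```

A flattener built on these helpers would check `is_safe_entry_path` before touching the filesystem, then act on the `EntryKind` for each entry in layer order.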

VM boot and guest init

  • runtime.rs / main.rs — VmLaunchConfig now supports attaching two disks (oci-base RO + sandbox-state RW) via krun_add_disk3; optional import vsock is kept but unused by the overlay path.
  • state_disk.rs — per-sandbox raw sparse state disk (16 GiB default), lifecycle-bound to the sandbox state dir.
  • scripts/openshell-vm-sandbox-init.sh — new oci_launch_supervisor path: resolves disks by libkrun-assigned serial via /sys/block/vd*/serial, mounts RO base + ext4 state, creates the overlay, bind-mounts the workspace over /sandbox, stages TLS CA and the supervisor binary into the upper layer, bind-mounts /proc,/sys,/dev, pivot_roots, translates OCI env → OPENSHELL_CONTAINER_ENV_<i>, sets OPENSHELL_CONTAINER_MODE=1, and execs openshell-sandbox --workdir <wd> -- <argv>.
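
The translation of OCI env into OPENSHELL_CONTAINER_ENV_<i> happens in the shell init script; the sketch below only illustrates the indexing idea in Rust. It assumes each indexed variable carries one `KEY=VALUE` entry and that indices start at 0 and are contiguous. Neither detail is confirmed by the PR description, and both function names are hypothetical.

```rust
use std::collections::HashMap;

const PREFIX: &str = "OPENSHELL_CONTAINER_ENV_";

/// Pack ordered (key, value) pairs into indexed variables.
/// Assumption: each variable holds a single `KEY=VALUE` string.
fn encode_container_env(pairs: &[(String, String)]) -> Vec<(String, String)> {
    pairs
        .iter()
        .enumerate()
        .map(|(i, (k, v))| (format!("{}{}", PREFIX, i), format!("{}={}", k, v)))
        .collect()
}

/// Recover the pairs by walking indices from 0 until one is missing.
/// Splitting on the first `=` keeps values that themselves contain `=`.
fn decode_container_env(vars: &HashMap<String, String>) -> Vec<(String, String)> {
    let mut out = Vec::new();
    let mut i = 0;
    while let Some(entry) = vars.get(&format!("{}{}", PREFIX, i)) {
        if let Some((k, v)) = entry.split_once('=') {
            out.push((k.to_string(), v.to_string()));
        }
        i += 1;
    }
    out
}
```

Indexing the entries rather than exporting them directly keeps image-provided names from colliding with variables the init script itself relies on.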

Supervisor clean-env mode (crates/openshell-sandbox/src/container_env.rs)

  • Gated on OPENSHELL_CONTAINER_MODE=1. When active, the child process starts from env_clear() and receives only a documented allowlist (HOME/PATH/TERM defaults, OPENSHELL_CONTAINER_ENV_<i>, and OPENSHELL_SANDBOX=1 applied last so images cannot override the marker). Provider/proxy/TLS env continue to layer in via the existing spawn path.
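
A minimal sketch of the clean-env ordering, assuming placeholder default values for HOME, PATH, and TERM (the PR does not state them) and hypothetical function names. Only the ordering is taken from the description: defaults first, then container env, then OPENSHELL_SANDBOX=1 last.

```rust
use std::collections::BTreeMap;
use std::process::Command;

/// Build the child environment in the order described above.
/// The default values here are assumptions chosen for illustration.
fn build_child_env(container_env: &[(String, String)]) -> BTreeMap<String, String> {
    let mut env = BTreeMap::new();
    env.insert("HOME".to_string(), "/sandbox".to_string());
    env.insert("PATH".to_string(), "/usr/local/bin:/usr/bin:/bin".to_string());
    env.insert("TERM".to_string(), "xterm".to_string());
    // Container env from the OCI merge may override the shell defaults.
    for (k, v) in container_env {
        env.insert(k.clone(), v.clone());
    }
    // Applied last so an image cannot override the marker.
    env.insert("OPENSHELL_SANDBOX".to_string(), "1".to_string());
    env
}

/// Start from an empty environment and apply only the allowlist.
fn clean_command(program: &str, container_env: &[(String, String)]) -> Command {
    let mut cmd = Command::new(program);
    cmd.env_clear().envs(build_child_env(container_env));
    cmd
}
```

Provider, proxy, and TLS variables are left out of this sketch; per the description they continue to be layered in by the existing spawn path.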

Gateway wiring (crates/openshell-server/src/compute/vm.rs, cli.rs)

  • Extracted argv construction into a testable build_driver_argv helper.
  • Gateway now passes --default-image <sandbox_image> on every VM-driver spawn so GetCapabilities.default_image cannot silently diverge from gateway config.
  • New VmComputeConfig::mksquashfs_bin + --vm-mksquashfs-bin / OPENSHELL_VM_MKSQUASHFS flag plumbs the squashfs builder path to the driver.
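
The argv helper might look roughly like the sketch below. The `VmDriverSettings` struct is hypothetical and stands in for the relevant fields of the gateway config; the `--default-image` and `--mksquashfs-bin` driver flags are the ones named in the commit message, but the real helper's signature may differ.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical subset of the gateway configuration used for the driver spawn.
struct VmDriverSettings {
    sandbox_image: String,
    mksquashfs_bin: Option<PathBuf>,
}

/// Build the driver subprocess arguments as a pure function so the
/// wiring can be asserted in unit tests without spawning anything.
fn build_driver_argv(cfg: &VmDriverSettings) -> Vec<OsString> {
    let mut argv = vec![
        OsString::from("--default-image"),
        OsString::from(cfg.sandbox_image.as_str()),
    ];
    // Only pass the squashfs builder path when the operator configured one.
    if let Some(bin) = &cfg.mksquashfs_bin {
        argv.push(OsString::from("--mksquashfs-bin"));
        argv.push(bin.clone().into_os_string());
    }
    argv
}
```

Keeping each flag and its value as separate elements avoids any shell-style quoting concerns when the vector is handed to `Command::args`.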

Driver behavior

  • validate_vm_sandbox rejects malformed template.image refs and unsupported template fields.
  • resolve_oci_launch returns FailedPrecondition when the host arch isn't linux/{amd64,arm64} or mksquashfs_bin is unset.
  • build_guest_environment skips the legacy OPENSHELL_SANDBOX_COMMAND=tail -f /dev/null fallback for OCI sandboxes so argv boundaries can't be corrupted by a fall-through code path.
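
The preflight in resolve_oci_launch can be pictured as below. This is a sketch under assumptions: the `FailedPrecondition` struct stands in for the gRPC status, `preflight` and `oci_platform` are hypothetical names, and returning `Ok(None)` when no image is requested is a guess at how the non-OCI case is signalled.

```rust
use std::path::{Path, PathBuf};

/// Stand-in for a gRPC FailedPrecondition status (hypothetical type).
#[derive(Debug, PartialEq)]
struct FailedPrecondition(String);

/// Map Rust-style host identifiers to an OCI platform string.
fn oci_platform(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("linux/amd64"),
        ("linux", "aarch64") => Some("linux/arm64"),
        _ => None,
    }
}

/// Check the preconditions before any pull or squashfs build starts.
fn preflight(
    image: Option<&str>,
    os: &str,
    arch: &str,
    mksquashfs_bin: Option<&Path>,
) -> Result<Option<(&'static str, PathBuf)>, FailedPrecondition> {
    // Assumption: no template.image means the sandbox is not an OCI sandbox.
    if image.is_none() {
        return Ok(None);
    }
    let platform = oci_platform(os, arch)
        .ok_or_else(|| FailedPrecondition(format!("unsupported host {}/{}", os, arch)))?;
    let bin = mksquashfs_bin
        .ok_or_else(|| FailedPrecondition("mksquashfs_bin is not configured".to_string()))?;
    Ok(Some((platform, bin.to_path_buf())))
}
```

In real use the `os` and `arch` arguments could come from `std::env::consts::OS` and `std::env::consts::ARCH`; taking them as parameters keeps the error paths testable on any host.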

Docs

  • New architecture/vm-driver.md covers the OCI execution model, module responsibilities, storage layout, driver configuration, and v1 scope.

Testing

  • cargo test -p openshell-driver-vm --lib — 80/80 pass (flatten, compat, fs_image, cache, metadata, pipeline, state_disk, driver, including 3 new resolve_oci_launch tests and new OCI-mode guest env tests).
  • cargo test -p openshell-server --lib compute::vm — 8/8 pass (6 new argv-wiring tests + 2 existing TLS tests).
  • cargo fmt --check clean on touched crates.
  • cargo clippy on touched crates — no errors (pre-existing warnings in unrelated openshell-ocsf crate are out of scope).
  • mise run license:check — all 398 files have SPDX headers.
  • bash -n on the updated guest init script.
  • Integration test oci_pipeline_integration::full_pipeline_without_network_produces_cached_image (gated on mksquashfs being in $PATH, run with --ignored) verifies flatten → compat → squashfs → cache install → round-trip.

E2E against a live cluster with a public image (alpine, busybox) was not run as part of this PR; the plan's end-to-end acceptance is scheduled for a follow-up once gateway+driver integration lands on main.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (architecture/vm-driver.md)

Add host-side OCI pipeline to the VM compute driver so sandboxes can
boot from a user-specified `template.image` without shipping Docker
inside the guest. The driver pulls and flattens the image, injects
OpenShell compatibility files (sandbox user, /sandbox, /tmp, stub
/etc/{hosts,resolv.conf}), and builds a read-only squashfs cached per
`(manifest digest, platform)`. Sandbox-create attaches that RO base
plus a per-sandbox raw state disk; the guest init mounts both as an
overlay, bind-mounts the workspace over `/sandbox`, pivot_roots into
the merged view, then execs an unmodified `openshell-sandbox` with
the OCI argv/env/workdir.
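
As an illustration of caching per `(manifest digest, platform)`, the sketch below derives a cache key and installs a staged image with a rename. The key encoding is an assumption: it strips a `sha256:` prefix and replaces `/` in the platform with `-`, which the description does not specify. All function names are hypothetical; only the `fs/` and `tmp/` directory names come from the cache layout listed in the PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Derive `<hex>.<plat>` from a manifest digest and platform.
/// Assumed encoding: sha256 digests only, `/` in the platform becomes `-`.
fn cache_key(manifest_digest: &str, platform: &str) -> Option<String> {
    let hex = manifest_digest.strip_prefix("sha256:")?;
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some(format!("{}.{}", hex, platform.replace('/', "-")))
}

fn squashfs_path(root: &Path, key: &str) -> PathBuf {
    root.join("fs").join(format!("{}.squashfs", key))
}

/// Move a staged build into place. A rename within one filesystem is atomic,
/// so readers see either no file or the complete image.
fn install_atomic(root: &Path, key: &str, staged: &Path) -> io::Result<PathBuf> {
    let dest = squashfs_path(root, key);
    if dest.exists() {
        // Idempotent: another build already installed this key.
        let _ = fs::remove_file(staged);
        return Ok(dest);
    }
    fs::create_dir_all(root.join("fs"))?;
    fs::rename(staged, &dest)?;
    Ok(dest)
}
```

Staging under `tmp/` inside the same cache root keeps the staged file and its destination on one filesystem, which is what makes the rename atomic.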

Supervisor gains a container-mode clean-env baseline gated on
`OPENSHELL_CONTAINER_MODE=1`: the child process starts with an empty
environ, then receives only the documented allowlist (container env
from the OCI merge, provider env, proxy env, TLS env, minimal shell
defaults), so control-plane `OPENSHELL_*` vars never leak to workloads.

The gateway plumbs `--default-image` (from `sandbox_image`) and
`--mksquashfs-bin` into the VM-driver subprocess so
`GetCapabilities.default_image` stays in sync and OCI sandboxes work
without relying on env inheritance. Guest init resolves block devices
by libkrun-assigned serial under `/sys/block/vd*/serial` instead of
hardcoded `/dev/vda`/`/dev/vdb`, with the older behavior kept as a
fallback for guest kernels that don't expose serials.
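
The serial-based lookup lives in the shell init script; the following Rust sketch only mirrors its logic. It assumes the serial strings match the disk names used elsewhere in this PR (`oci-base`, `sandbox-state`), which is not confirmed, and `resolve_disk` is a hypothetical name. The sysfs root is a parameter so the function can be exercised against a temporary directory.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Find the virtio block device whose `serial` file matches `serial`.
/// Falls back to a fixed device path when no serial is exposed.
fn resolve_disk(sys_block: &Path, serial: &str, fallback: &str) -> PathBuf {
    if let Ok(entries) = fs::read_dir(sys_block) {
        for entry in entries.flatten() {
            let name = entry.file_name().to_string_lossy().into_owned();
            // Only virtio block devices (vda, vdb, ...) are candidates.
            if !name.starts_with("vd") {
                continue;
            }
            if let Ok(found) = fs::read_to_string(entry.path().join("serial")) {
                if found.trim() == serial {
                    return Path::new("/dev").join(name);
                }
            }
        }
    }
    PathBuf::from(fallback)
}
```

In the guest this would be called with `/sys/block` as the root, for example `resolve_disk(Path::new("/sys/block"), "oci-base", "/dev/vda")`, so the result does not depend on the order in which the disks were attached.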

Scope and limits (v1):

- Public OCI registries only, linux/amd64 or linux/arm64 matching the
  host. The OCI `User` field is ignored; workloads always run as
  `sandbox:sandbox`.
- The shared RO base cache is not GC'd automatically; operators manage
  `<state-dir>/oci-cache/` themselves.
- The fixed guest VM rootfs stays as the control-plane image; we never
  boot the user's OCI image as the guest OS.

Unit and integration tests cover: layer flattening with whiteouts,
compat injection idempotence, squashfs build + cache round-trip,
OCI-config precedence rules (Entrypoint+Cmd, workdir fallback, env
merge), driver argv wiring for `--default-image` and
`--mksquashfs-bin`, and `resolve_oci_launch` preflight error paths
(unsupported host, missing mksquashfs, no image requested).
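
The OCI-config precedence rules listed above can be sketched as three small functions. The names are hypothetical and this is not the `LaunchMetadata::build` implementation; treating an empty `WorkingDir` the same as a missing one is an assumption.

```rust
use std::collections::BTreeMap;

/// argv = Entrypoint followed by Cmd.
fn build_argv(entrypoint: &[String], cmd: &[String]) -> Vec<String> {
    entrypoint.iter().chain(cmd).cloned().collect()
}

/// Use the image's working directory, falling back to /sandbox.
/// Assumption: an empty string counts as unset.
fn resolve_workdir(working_dir: Option<&str>) -> String {
    match working_dir {
        Some(w) if !w.is_empty() => w.to_string(),
        _ => "/sandbox".to_string(),
    }
}

/// Merge env with precedence OCI < template < spec: later layers win.
fn merge_env(
    oci: &[(&str, &str)],
    template: &[(&str, &str)],
    spec: &[(&str, &str)],
) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for layer in [oci, template, spec] {
        for (k, v) in layer {
            merged.insert(k.to_string(), v.to_string());
        }
    }
    merged
}
```

Applying the layers in ascending precedence and letting each insert overwrite the previous value keeps the merge rule to a single loop.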
@drew drew requested a review from a team as a code owner April 20, 2026 04:48
@copy-pr-bot copy-pr-bot bot commented Apr 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Replace the ASCII-art overview with a mermaid flowchart that renders in
GitHub's UI, and add three supporting diagrams:

- Host pipeline flow: cache hit vs miss (pull → flatten → compat →
  squashfs → install → attach).
- Guest init decision tree: probe `OPENSHELL_OCI_ARGC`, resolve disks
  by serial, build overlay, pivot_root, exec supervisor.
- Storage layering: shared RO base, per-sandbox ext4 upper/work, and
  workspace bind-mount composing the sandbox runtime view.

The numbered `oci_launch_supervisor` step list is retained alongside
the flowchart because the precise ordering (e.g. bind-mount /proc /sys
/dev before pivot_root) matters for anyone editing the init script.
@drew drew marked this pull request as draft April 20, 2026 04:54
@drew drew changed the title from "feat(vm): add OCI container execution via overlay and pivot_root" to "feat(vm): add OCI container support to vm driver" Apr 20, 2026