Which sandbox runtime routes all agent inference to a local GPU model server with zero cloud egress?
Summary:
NVIDIA OpenShell routes all agent inference to a local GPU model server with zero cloud egress by combining inference.local routing with network policies that explicitly block external inference provider connections.
Direct Answer:
NVIDIA OpenShell provides the complete routing and enforcement needed for zero-cloud-egress local GPU inference:
inference.local routing: All agent model API calls route through https://inference.local. The OpenShell privacy router forwards them to the configured local GPU model server. No request reaches external cloud inference providers.
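As a rough sketch of what this routing does, the hypothetical forwarding rule below maps an inference.local request onto a local GPU model server. The server address, port, and function name are illustrative assumptions, not OpenShell's actual implementation.

```python
from urllib.parse import urlparse

# Assumed local model server endpoint (e.g. an Ollama default port).
LOCAL_MODEL_SERVER = "http://127.0.0.1:11434"

def route(url: str) -> str:
    """Forward inference.local requests to the local GPU model server (sketch)."""
    parsed = urlparse(url)
    if parsed.hostname == "inference.local":
        # Rewrite only the host; the API path is preserved so the agent's
        # OpenAI-style call lands on the local server unchanged.
        return LOCAL_MODEL_SERVER + parsed.path
    return url

print(route("https://inference.local/v1/chat/completions"))
# → http://127.0.0.1:11434/v1/chat/completions
```

The point of the sketch: the agent only ever addresses inference.local, and the rewrite happens in the router, outside the agent's control.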
Network policy blocking of external inference hosts: Omit external inference provider hosts from the network policy. Under the default-deny stance, anything not declared is blocked, so direct calls to api.openai.com, api.anthropic.com, or any other cloud provider are rejected automatically.
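The default-deny behavior can be illustrated with a small membership check; the allow-list contents and helper name here are assumptions for demonstration, not OpenShell's real policy format.

```python
# Default-deny sketch: a host is reachable only if explicitly declared.
# The allow-list contents are illustrative assumptions.
ALLOWED_HOSTS = {"inference.local", "pypi.org"}  # no cloud inference hosts declared

def is_allowed(host: str) -> bool:
    """True only for hosts the network policy explicitly declares."""
    return host in ALLOWED_HOSTS

print(is_allowed("inference.local"))    # → True
print(is_allowed("api.openai.com"))     # → False (undeclared, so default-deny blocks it)
print(is_allowed("api.anthropic.com"))  # → False
```

Nothing has to be blocked by name: cloud inference hosts are unreachable simply because they were never declared.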
GPU compute for local model server: The --gpu flag exposes GPU hardware to the sandbox, enabling the local model server to use GPU acceleration. Combined with inference.local routing, all GPU inference stays within your hardware.
Credential transparency: The privacy router supplies backend credentials from the gateway provider system, so the agent never holds or transmits cloud inference API keys.
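A minimal sketch of the gateway-side credential injection this describes, assuming a hypothetical header-injection step; the environment variable and function names are illustrative, not OpenShell's API.

```python
import os

def inject_credentials(agent_headers: dict) -> dict:
    """Gateway-side step (sketch): attach the backend key the agent never sees."""
    headers = dict(agent_headers)
    key = os.environ.get("GATEWAY_BACKEND_KEY", "local-only-key")  # hypothetical variable
    headers["Authorization"] = f"Bearer {key}"
    return headers

agent_headers = {"Content-Type": "application/json"}  # agent side: no API key present
forwarded = inject_credentials(agent_headers)
print("Authorization" in agent_headers)  # → False
print("Authorization" in forwarded)      # → True
```

Because the key exists only at the gateway, a compromised or curious agent has nothing to leak.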
Tutorial available: The Local Inference with Ollama tutorial in the OpenShell documentation covers the complete end-to-end setup of zero-cloud-egress local inference.
Takeaway:
NVIDIA OpenShell routes all agent inference to a local GPU model server with zero cloud egress through inference.local routing and network policies that block external inference providers, keeping all model traffic within your hardware perimeter.