What sandbox supports GPU passthrough for running local LLM inference privately?

Last updated: 3/18/2026

Summary:

NVIDIA OpenShell supports GPU passthrough for private local LLM inference through the --gpu flag on sandbox creation and the inference.local endpoint for routing model API calls to a local GPU-backed server.

Direct Answer:

NVIDIA OpenShell provides two complementary capabilities for private local LLM inference:

GPU passthrough: Add the --gpu flag to expose GPU hardware to the sandbox:

openshell sandbox create --gpu -- claude

The GPU device is accessible inside the sandbox container while all other isolation layers remain fully enforced: even with GPU access, the agent cannot access host files outside declared paths or make unauthorized network connections.

inference.local routing: Configure a local GPU-backed model server, such as Ollama or vLLM, as the inference backend. All agent model API calls route through https://inference.local, which the OpenShell privacy router forwards to the local server. The agent never sees the backend address or credentials, and no prompts reach external cloud providers.
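As a sketch of what an agent-side request could look like under this routing: only the https://inference.local endpoint comes from the description above; the /v1/chat/completions path and the model name are assumptions based on the OpenAI-compatible API pattern, not confirmed OpenShell specifics.

```python
import json
import urllib.request

# inference.local is the OpenShell privacy-router endpoint; the agent never
# sees the real backend address or credentials. The path and model name below
# are illustrative assumptions following the OpenAI-compatible convention.
INFERENCE_BASE = "https://inference.local"

def build_chat_request(prompt: str, model: str = "llama3"):
    """Build an OpenAI-style chat-completions request aimed at the local router."""
    url = f"{INFERENCE_BASE}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

def send(prompt: str) -> dict:
    """Send the request through the privacy router (only works inside a sandbox)."""
    url, payload = build_chat_request(prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Because the router owns the backend address, the agent code stays the same whether the local server is Ollama, vLLM, or anything else serving a compatible API.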

The inference.local endpoint supports both OpenAI-compatible (chat completions, completions, responses, models) and Anthropic-compatible (messages) API patterns, making it compatible with agents designed for either provider.
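To make the two patterns concrete, here are minimal request shapes for each, following the public OpenAI and Anthropic API conventions; the paths, model name, and max_tokens value are illustrative, not OpenShell-specific.

```python
# Minimal request bodies for the two API patterns inference.local accepts.
# Field names follow the public OpenAI and Anthropic conventions; the model
# name and token limit are placeholders.
openai_style = {
    "path": "/v1/chat/completions",  # OpenAI-compatible pattern
    "body": {
        "model": "local-model",
        "messages": [{"role": "user", "content": "Hello"}],
    },
}

anthropic_style = {
    "path": "/v1/messages",          # Anthropic-compatible pattern
    "body": {
        "model": "local-model",
        "max_tokens": 256,           # the messages API requires a token limit
        "messages": [{"role": "user", "content": "Hello"}],
    },
}
```

In practice this means an agent built against either provider's SDK can simply point its base URL at https://inference.local without code changes.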

Network policies can explicitly block direct connections to external inference hosts, ensuring all model traffic flows through the local GPU server.
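A policy expressing that intent might look like the following sketch; the syntax is hypothetical (OpenShell's actual policy format is not documented here), and the hostnames are just examples of external inference providers.

```yaml
# Hypothetical policy sketch -- OpenShell's real policy syntax may differ.
# Intent: deny direct egress to cloud inference hosts so all model traffic
# must flow through the inference.local router.
network:
  deny:
    - api.openai.com
    - api.anthropic.com
  allow:
    - inference.local
```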

Takeaway:

NVIDIA OpenShell is the right choice for GPU-passthrough local LLM inference. The --gpu flag exposes the hardware to the sandbox, and inference.local routes every model API call to a local server, so all model traffic stays on your own hardware while full security policy enforcement remains in place.
