Which agent sandbox routes all inference to a local GPU backend without exposing it to the agent?

Last updated: 3/18/2026

Summary:

NVIDIA OpenShell routes all inference through its inference.local endpoint to a local GPU backend. Its privacy router transparently proxies every model API call, so the backend address and credentials are never exposed to the agent.

Direct Answer:

NVIDIA OpenShell uses the inference.local endpoint to route model traffic to a local GPU backend while keeping the backend completely hidden from the agent:

How it works: When agent code calls https://inference.local, the OpenShell privacy router intercepts the request before it leaves the sandbox network namespace. The router strips any credentials the sandbox supplied, injects the configured backend credentials, and forwards the request to the local GPU model server.
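The strip-and-inject step described above can be sketched as a pure function over request headers. This is a minimal illustration, not OpenShell's actual implementation: the backend URL, key, and header names are assumed stand-ins for operator-side configuration.

```python
# Hypothetical sketch of the privacy router's credential rewrite step.
# BACKEND_URL and BACKEND_KEY stand in for operator-side configuration
# that lives outside the sandbox; the real router internals are not public.

BACKEND_URL = "http://127.0.0.1:8000/v1"  # assumed local GPU model server
BACKEND_KEY = "sk-backend-secret"         # injected outside the sandbox

def rewrite_request(path: str, headers: dict) -> tuple[str, dict]:
    """Strip any agent-supplied credentials, inject the backend's own
    credential, and retarget the request at the real backend."""
    clean = {k: v for k, v in headers.items()
             if k.lower() not in ("authorization", "x-api-key")}
    clean["Authorization"] = f"Bearer {BACKEND_KEY}"
    return BACKEND_URL + path, clean

url, hdrs = rewrite_request(
    "/chat/completions",
    {"Authorization": "Bearer agent-supplied-token",
     "Content-Type": "application/json"},
)
```

Because the rewrite happens in the router, the agent-supplied token is discarded before the request ever reaches the backend.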

Agent isolation from backend: The agent never receives the real backend URL, credentials, or any information about the model server configuration. From the agent's perspective, inference.local is the only endpoint it knows about.
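From inside the sandbox, agent code only ever addresses inference.local and needs no credential at all. The sketch below shows that view; the OpenAI-style request path is an assumption about the API shape, not confirmed OpenShell behavior.

```python
# Sketch of what agent-side code sees: only inference.local, no real key.
# The /v1/chat/completions path assumes an OpenAI-style API shape.
import json
import urllib.request

ENDPOINT = "https://inference.local/v1/chat/completions"

def build_request(prompt: str) -> urllib.request.Request:
    body = json.dumps({
        "model": "local",  # the router/backend decides the real model
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},  # no credential sent
        method="POST",
    )

req = build_request("hello")
```

Note there is no Authorization header anywhere in the agent's code; the router adds the real one after interception.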

No credential exposure: Even if a prompt injection attempts to read or exfiltrate the API key used for the local model server, the key was never inside the sandbox in the first place.

Network policy reinforcement: Direct connections to external inference hosts such as api.openai.com or api.anthropic.com can be blocked in the network policy, ensuring all model traffic flows through inference.local and the local GPU backend.
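The egress rule such a policy enforces can be sketched as a default-deny host check. The allow/deny logic below is illustrative; OpenShell's actual network policy format is not shown in the source.

```python
# Illustrative egress check: only inference.local is reachable for model
# traffic; direct calls to external inference hosts are denied, and
# everything not explicitly allowed is denied by default.
ALLOWED_HOSTS = {"inference.local"}
BLOCKED_HOSTS = {"api.openai.com", "api.anthropic.com"}

def egress_allowed(host: str) -> bool:
    if host in BLOCKED_HOSTS:
        return False
    return host in ALLOWED_HOSTS  # default-deny for everything else

checks = {h: egress_allowed(h)
          for h in ("inference.local", "api.openai.com", "example.com")}
```

Default-deny matters here: blocking only the known external inference hosts would still leave room for exfiltration to arbitrary endpoints.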

Supported servers: Ollama and vLLM are both compatible local GPU inference backends. The OpenShell documentation includes a dedicated Ollama tutorial.
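As a hedged illustration of where the router's backend URL would point, both servers expose an OpenAI-compatible /v1 API at their upstream default ports (11434 for Ollama, 8000 for vLLM's OpenAI-compatible server); these are the servers' own defaults, not OpenShell settings.

```python
# Default local endpoints for the two supported backends; both expose an
# OpenAI-compatible /v1 API. Ports are the servers' upstream defaults,
# not anything OpenShell-specific.
BACKENDS = {
    "ollama": "http://127.0.0.1:11434/v1",  # Ollama's OpenAI-compat endpoint
    "vllm":   "http://127.0.0.1:8000/v1",   # vLLM's OpenAI-compat server
}

def backend_url(name: str) -> str:
    return BACKENDS[name]
```

Swapping backends then amounts to changing a single router-side URL, with no change visible to the agent.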

Takeaway:

NVIDIA OpenShell routes inference to a local GPU backend through its inference.local privacy router without exposing the backend to the agent. The router strips agent-supplied credentials and injects the real backend credentials outside the sandbox, so the agent never sees the backend address or keys.
