NVIDIA announces RTX Spark PCs for local AI agents

NVIDIA used GTC Taipei at COMPUTEX to announce RTX Spark, a new class of Windows PCs built for local AI agents. The company says RTX Spark systems will offer 1 petaflop of AI performance and up to 128GB of unified memory, with slim laptops and compact desktops expected this fall from ASUS, Dell, HP, Lenovo, Microsoft Surface, MSI, and others.

The point is not just another fast laptop. NVIDIA and Microsoft are trying to make the primary PC a place where agents can run locally with security controls, local model routing, and enough memory for larger workflows. That is the part to watch: if personal agents need to touch local files, apps, identity, and private context, the cloud-only pattern starts to hit limits.

Key figures, all as stated by NVIDIA:

1 petaflop: advertised AI performance

128GB: maximum unified memory

120B: LLM parameter class NVIDIA says can run locally

Local agents need more than a GPU

NVIDIA says RTX Spark uses a Blackwell RTX GPU with 6,144 CUDA cores and fifth-generation Tensor Cores with FP4 precision, connected to a 20-core NVIDIA Grace CPU through NVLink-C2C. Those specs explain the performance claim. They do not explain the product strategy by themselves.

The strategy is security and control. NVIDIA says it is working with Microsoft on Windows security primitives for identity, containment, policy, and end-to-end security. NVIDIA OpenShell is meant to add policy controls, route queries to local models based on privacy settings, and disguise personal information when a request has to go to a cloud model.

That is the right problem to solve. A useful local agent needs permission to act across apps, files, and workflows. Without containment and policy, that becomes a security risk. Without enough local compute, it becomes a cloud proxy. RTX Spark is NVIDIA’s attempt to sell both the silicon and the agent runtime story together.
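To make the routing and redaction idea concrete, here is a minimal sketch of a policy-based router. It is not OpenShell's API, which the announcement does not describe. Every name here (Policy, route, redact, the local_only setting, and the two regex patterns) is a hypothetical stand-in chosen for illustration, and real detection of personal information would need far more than two patterns.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple patterns. Real PII detection is much harder.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


@dataclass
class Policy:
    # Hypothetical privacy setting: when True, nothing leaves the device.
    local_only: bool = False


def redact(text: str) -> str:
    """Replace emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def route(prompt: str, policy: Policy, touches_local_files: bool = False):
    """Return (target, prompt_to_send) for a request.

    Requests that are private by policy, or that need local files,
    stay on the local model unmodified. Everything else goes to the
    cloud with personal information masked first.
    """
    if policy.local_only or touches_local_files:
        return "local", prompt
    return "cloud", redact(prompt)


if __name__ == "__main__":
    print(route("Summarize my tax folder", Policy(), touches_local_files=True))
    print(route("Email jane@example.com about lunch", Policy()))
```

The design choice worth noting is the order of the checks: the privacy decision comes first, and redaction applies only on the path that leaves the device.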

The workloads are concrete

NVIDIA says RTX Spark systems can render 3D scenes larger than 90GB, edit 12K 4:2:2 video, generate 4K AI videos, run agents locally on 120-billion-parameter LLMs with up to 1 million tokens of context, and play AAA games at 1440p at more than 100 frames per second. Those are broad claims, and each workload will depend on the model, app, and thermal design of the actual device.

The more useful read is that NVIDIA is pushing unified memory as the practical limiter. Local AI is not only about raw TOPS or benchmark charts. Large context windows, local image/video generation, and multi-app agents all want memory headroom. A 128GB unified-memory PC gives software teams a different target than a standard laptop GPU with a small VRAM ceiling.
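A rough back-of-the-envelope calculation shows why memory is the limiter. The sketch below estimates only the memory needed for model weights at different precisions. It assumes the 120B claim relies on 4-bit weights, which the announcement does not state explicitly, and it ignores the KV cache, activations, and quantization overhead. The reserve_gb figure is an arbitrary assumption standing in for the operating system and other apps.

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weight memory in GB (10**9 bytes): params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int,
         memory_gb: float, reserve_gb: float = 16) -> bool:
    """True if the weights plus an assumed reserve fit in memory.

    reserve_gb is a hypothetical allowance for the OS and other apps,
    not a figure from NVIDIA.
    """
    return weight_gb(params_billion, bits_per_weight) + reserve_gb <= memory_gb


if __name__ == "__main__":
    for bits in (4, 8, 16):
        gb = weight_gb(120, bits)
        print(f"120B at {bits}-bit: {gb:.0f} GB, fits in 128GB: {fits(120, bits, 128)}")
    # 4-bit: 60 GB (fits), 8-bit: 120 GB (does not fit with the reserve),
    # 16-bit: 240 GB (does not fit)
```

Under these assumptions, a 120B model leaves headroom in 128GB only at 4-bit precision, and a long context window would consume part of what remains.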

NVIDIA’s blog also says OpenShell is coming to Windows, NemoClaw is expanding across GeForce RTX, RTX PRO, RTX, DGX Spark, and DGX Station, and llama.cpp and vLLM are getting multi-token prediction and multi-GPU optimizations for up to 2x inference performance on top agentic models.

Availability is still the caveat

This is an announcement, not a review of shipping hardware. RTX Spark laptops and compact desktops are expected this fall, and the final user experience will depend on OEM designs, prices, thermals, driver stability, app support, and what Microsoft exposes to developers in Windows.

The other caveat is agent usefulness. Running locally is valuable when the agent needs private context, low latency, or offline work. For routine chat and lightweight automation, cloud models may still be cheaper and simpler. RTX Spark will matter most when local capability changes what an agent is allowed to do, not just where the tokens are generated.

For NVIDIA, that is still a large market. If agents become a normal PC workload, NVIDIA gets to sell AI hardware into the device refresh cycle, not only into data centers.

For broader context on NVIDIA’s AI infrastructure position, see our NVIDIA company profile and the AI model leaderboard.
