NVIDIA's official COMPUTEX article card for RTX Spark and local AI agents

NVIDIA announces RTX Spark PCs for local AI agents

RTX Spark puts 1 petaflop of AI performance and up to 128GB of unified memory into Windows PCs designed for local agents.


NVIDIA used GTC Taipei at COMPUTEX to announce RTX Spark, a new class of Windows PCs built for local AI agents. The company says RTX Spark systems will offer 1 petaflop of AI performance and up to 128GB of unified memory, with slim laptops and compact desktops expected this fall from ASUS, Dell, HP, Lenovo, Microsoft Surface, MSI, and others.

The point is not just another fast laptop. NVIDIA and Microsoft are trying to make the primary PC a place where agents can run locally with security controls, local model routing, and enough memory for larger workflows. That is the part to watch: if personal agents need to touch local files, apps, identity, and private context, the cloud-only pattern starts to hit limits.

Key figures, as stated by NVIDIA:
1 petaflop: advertised AI performance
128GB: maximum unified memory
120B: LLM parameter class NVIDIA says can run locally

Local agents need more than a GPU

NVIDIA says RTX Spark uses a Blackwell RTX GPU with 6,144 CUDA cores and fifth-generation Tensor Cores with FP4 precision, connected to a 20-core NVIDIA Grace CPU through NVLink-C2C. Those specs explain the performance claim. They do not explain the product strategy by themselves.

The strategy is security and control. NVIDIA says it is working with Microsoft on Windows security primitives for identity, containment, policy, and end-to-end security. NVIDIA OpenShell is meant to add policy controls, route queries to local models based on privacy settings, and disguise personal information when a request has to go to a cloud model.
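To make the routing idea concrete, here is a minimal sketch of privacy-based routing with redaction before a cloud call. It is not the OpenShell API, which NVIDIA has not detailed in this announcement. The `Policy` fields, the `route` function, and the two regex patterns are all hypothetical names chosen for illustration, and real personal-information detection needs far more than two regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; not a complete PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


@dataclass
class Policy:
    """Hypothetical user privacy settings."""
    local_only: bool = False        # never send anything off the device
    redact_for_cloud: bool = True   # mask personal details before a cloud call


def redact(text: str) -> str:
    """Replace matched emails and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def route(query: str, policy: Policy, needs_cloud_model: bool = False):
    """Return (target, payload) for a query under the given policy."""
    # Stay local whenever the policy demands it or a local model is enough.
    if policy.local_only or not needs_cloud_model:
        return ("local", query)
    # The request has to leave the device: mask personal details first.
    payload = redact(query) if policy.redact_for_cloud else query
    return ("cloud", payload)


if __name__ == "__main__":
    print(route("Summarize my notes", Policy()))
    print(route("Reply to a@b.com", Policy(), needs_cloud_model=True))
    print(route("Reply to a@b.com", Policy(local_only=True), needs_cloud_model=True))
```

The design choice worth noting is the order of checks: the privacy setting is evaluated before capability, so a local-only policy wins even when a cloud model would give a better answer.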

That is the right problem to solve. A useful local agent needs permission to act across apps, files, and workflows. Without containment and policy, that becomes a security risk. Without enough local compute, it becomes a cloud proxy. RTX Spark is NVIDIA’s attempt to sell both the silicon and the agent runtime story together.

The workloads are concrete

NVIDIA says RTX Spark systems can render 3D scenes larger than 90GB, edit 12K 4:2:2 video, generate 4K AI video, run 120-billion-parameter LLMs locally for agents with up to 1 million tokens of context, and play AAA games at 1440p at more than 100 frames per second. Those are broad claims, and real results for each workload will depend on the model, the app, and the thermal design of the actual device.

The more useful read is that NVIDIA is pushing unified memory as the practical limiter. Local AI is not only about raw TOPS or benchmark charts. Large context windows, local image/video generation, and multi-app agents all want memory headroom. A 128GB unified-memory PC gives software teams a different target than a standard laptop GPU with a small VRAM ceiling.
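A quick back-of-the-envelope calculation shows why memory, and FP4 in particular, matters for the 120B claim. The sketch below counts weight storage only, using decimal gigabytes. The 16GB reserve for the operating system, apps, and KV cache is an assumed placeholder rather than an NVIDIA figure, and real quantized formats carry some extra overhead for scaling factors that is ignored here.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight footprint in decimal GB (weights only)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9


def fits(params_billion: float, precision: str,
         memory_gb: float = 128.0, reserve_gb: float = 16.0) -> bool:
    """True if the weights plus an assumed reserve fit in unified memory.

    reserve_gb is a hypothetical allowance for the OS, apps, and KV cache.
    """
    return weight_gb(params_billion, precision) + reserve_gb <= memory_gb


if __name__ == "__main__":
    for precision in ("fp16", "fp8", "fp4"):
        gb = weight_gb(120, precision)
        print(f"120B @ {precision}: {gb:.0f} GB weights, fits in 128GB: {fits(120, precision)}")
```

Under these assumptions a 120B model needs about 240GB of weights at FP16 and 120GB at FP8, but about 60GB at FP4, which is the only one of the three that leaves meaningful headroom in 128GB. Long contexts add KV cache on top, and that size depends on model architecture details the announcement does not give.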

NVIDIA’s blog also says OpenShell is coming to Windows, NemoClaw is expanding across GeForce RTX, RTX PRO, RTX, DGX Spark, and DGX Station, and llama.cpp and vLLM are getting multi-token prediction and multi-GPU optimizations for up to 2x inference performance on top agentic models.
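For intuition on how multi-token prediction could approach a 2x gain, here is a simplified model, not a description of how llama.cpp or vLLM implement it. It assumes the extra predicted tokens are verified in order, that each is accepted independently with the same probability, and that producing the drafts costs nothing. All three are simplifying assumptions, and the function name is hypothetical.

```python
def expected_tokens_per_step(accept_prob: float, draft_tokens: int) -> float:
    """Expected tokens emitted per verification pass.

    One token is always produced by the pass itself. Drafted token i only
    counts if every earlier drafted token was accepted, which happens with
    probability accept_prob ** i under the independence assumption.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be between 0 and 1")
    if draft_tokens < 0:
        raise ValueError("draft_tokens must be non-negative")
    return sum(accept_prob ** i for i in range(draft_tokens + 1))


if __name__ == "__main__":
    for p in (0.4, 0.6, 0.8):
        tokens = expected_tokens_per_step(p, draft_tokens=2)
        print(f"accept_prob={p}: {tokens:.2f} tokens per pass")
```

In this model, two extra predicted tokens with a 60 percent acceptance rate yield about 1.96 tokens per pass, close to the "up to 2x" figure. Lower acceptance rates or non-trivial drafting cost would pull the real gain below that.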

Availability is still the caveat

The announcement is not a shipping review. RTX Spark laptops and compact desktops are expected this fall, and final user experience will depend on OEM designs, prices, thermals, driver stability, app support, and what Microsoft exposes to developers in Windows.

The other caveat is agent usefulness. Running locally is valuable when the agent needs private context, low latency, or offline work. For routine chat and lightweight automation, cloud models may still be cheaper and simpler. RTX Spark will matter most when local capability changes what an agent is allowed to do, not just where the tokens are generated.

For NVIDIA, that is still a large market. If agents become a normal PC workload, NVIDIA gets to sell AI hardware into the device refresh cycle, not only into data centers.


The AI Feed Desk

Editorial desk

The AI Feed Desk tracks AI provider updates, model releases, agent tooling, and enterprise adoption, turning fast-moving announcements into source-linked context for builders and operators.
