The HP ZGX Nano is a compact AI workstation powered by the NVIDIA® GB10 Grace Blackwell Superchip, with 128 GB of unified memory and up to 1,000 TOPS of AI compute. It runs models up to 200B parameters on a single unit (405B with two units connected), supports prototyping, fine-tuning, and inference locally, and deploys at the edge, all without cloud dependency. It runs NVIDIA DGX™ OS, not Windows.
Machine learning teams often need local compute infrastructure for model development, testing, and edge deployment, free of cloud dependency. Sending sensitive data to the public cloud can violate security policies or legal regulations, and local infrastructure offers lower costs, stronger data privacy, and the latency control that edge AI demands.
This guide covers how developers use local AI infrastructure in daily workflows, when local compute makes sense versus cloud GPUs, and how the HP ZGX Nano fits into ML development and edge deployment patterns.
What Is Local AI for Developers?
Local AI means running AI models on on-premises hardware rather than in the cloud. This includes model training, fine-tuning, inference, prototyping, and meeting offline operation requirements—all without external API calls or internet connectivity.
Key advantages of local AI infrastructure:
• Data privacy: Sensitive data never leaves the premises
• Low latency: No network round-trips for inference
• Cost predictability: Fixed hardware investment vs. variable cloud bills
• Offline development: No internet dependency for core workflows
• Security: Data processed on-site, not on third-party servers
HP ZGX Nano: Technical Specifications for ML Workloads
| Component | Specification | What it means for ML |
| --- | --- | --- |
| Processor | NVIDIA® GB10 Grace Blackwell Superchip (20-core ARM CPU + Blackwell GPU) | Purpose-built for AI; enables prototyping, fine-tuning, and inference from a single chip |
| AI compute | Up to 1,000 TOPS (FP4) | Accelerates training and inference workloads |
| Memory | 128 GB unified LPDDR5X | Run models up to 200B parameters; eliminates data copying between CPU and GPU memory |
| Model support | 200B parameters (single unit); 405B (two units via QSFP) | Handle enterprise-scale LLMs locally |
| Storage | 1 TB or 4 TB NVMe SSD (self-encrypted) | Fast model loading, checkpoint storage, and data security |
| OS | NVIDIA DGX™ OS (Linux-based; does not support Windows) | Purpose-built AI development environment |
| AI software | NVIDIA AI software stack, HP ZGX Toolkit | Pre-configured for immediate development; MLflow, Ollama included |
| Connectivity | 2x 200G QSFP112, USB-C, 10GbE, Wi-Fi 7, BT 5.4, HDMI 2.1 | Multi-unit scaling, network access, and display output |
| Form factor | 150 × 150 × 51 mm; ~1.25 kg | Fits on a desk or deploys at the edge |
Understanding Local LLM Deployment
A local LLM is a language model running entirely on on-premises hardware, providing text generation, analysis, and reasoning without internet connectivity or external API calls. Developers deploy LLMs locally for data privacy, near-zero latency, stable environments, offline operation, and predictable costs.
Model sizes and use cases:
• 7B–13B parameters: Fast inference for quick tasks, chat, and lightweight applications
• 70B parameters: Enterprise-grade reasoning and complex analysis
• 200B+ parameters: Advanced analytical capabilities, broad knowledge, and sophisticated generation
Inference optimization techniques (quantization, context length management) and tools like llama.cpp, Ollama, vLLM, and TensorRT-LLM enable these models to run locally with consistent low-latency responses.
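For a concrete picture, here is a minimal sketch of querying a locally served model through Ollama's Python client, one of the tools named above. It assumes the `ollama` package is installed and a model has already been pulled; the model tag is illustrative, not a ZGX default:

```python
# Minimal local-inference sketch using the Ollama Python client.
# Assumes `pip install ollama` and a previously pulled model,
# e.g. `ollama pull llama3.1:8b` (model tag is illustrative).
import ollama

response = ollama.chat(
    model="llama3.1:8b",  # any locally pulled model tag works
    messages=[{"role": "user", "content": "Summarize the benefits of local inference."}],
)
print(response["message"]["content"])
```

Because the model runs on-device, the round-trip is local memory and compute, with no network hop in the loop.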
ML Workflow Support: Training, Fine-Tuning, and Inference
The HP ZGX Toolkit delivers up to 45% faster time to results compared to manual DIY setup, according to HP’s benchmarks.
Training and fine-tuning:
LoRA and QLoRA enable developers to adapt large models to domain-specific tasks locally, without retraining from scratch. Framework compatibility with PyTorch, TensorFlow, JAX, and ONNX Runtime means existing codebases transfer from cloud to local without rewrites.
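As a minimal sketch of what this looks like in practice, Hugging Face PEFT wraps a base model with LoRA adapters so only a small fraction of weights trains. The model id and hyperparameters below are illustrative assumptions, not ZGX defaults:

```python
# LoRA fine-tuning setup sketch with Hugging Face PEFT.
# Model id and hyperparameters are illustrative; gated models need access.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```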
Development tools:
The ZGX Nano comes with a pre-configured environment: Jupyter, VS Code, Docker, and MLflow work out of the box, minimizing setup time. Developers can also clean and organize datasets locally, keeping data secure on-disk.
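Since MLflow is part of that pre-configured environment, experiment tracking can stay local as well. A minimal sketch, with illustrative experiment and parameter names:

```python
# Local experiment-tracking sketch with MLflow (ships with the ZGX Toolkit).
# Experiment name, parameters, and artifact path are illustrative.
import mlflow

mlflow.set_experiment("local-finetune")

with mlflow.start_run():
    mlflow.log_param("lora_rank", 16)
    mlflow.log_metric("eval_loss", 0.42)
    mlflow.log_artifact("adapter_config.json")  # assumes this file exists
```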
Inference optimization:
Once models are trained or fine-tuned, the ZGX Nano serves them for low-latency inference at the edge or on the desk—ready for integration into applications.
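As one hedged example of local serving, vLLM (mentioned above) offers an offline engine for batched, low-latency generation. The model id is illustrative, and availability depends on your platform's vLLM build:

```python
# Local serving sketch with vLLM's offline engine.
# Model id is illustrative; any locally cached model path also works.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["Classify this support ticket: printer offline."], params)
print(outputs[0].outputs[0].text)
```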
Edge AI Development and Deployment
Edge AI means deploying models directly where data is generated (in a factory, on a camera system, at a retail point of sale) rather than sending data to a centralized cloud. Edge deployment reduces latency for real-time applications, decreases bandwidth costs, enables offline operation, and keeps data local.
The ZGX Nano’s compact form factor (150 × 150 × 51 mm) makes it suitable for both desk development and edge deployment. A developer can write and test code on the device, then deploy it at a production site—for example, in warehouse automation, real-time vision systems, or IoT applications.
Edge optimization techniques:
• Quantization: Reduces model footprint for faster edge inference (see the sketch after this list)
• Pruning: Removes redundant network connections to improve efficiency
• Distillation: Trains a smaller model from a larger one for edge-appropriate performance
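The first of these, quantization, can be illustrated with PyTorch's post-training dynamic quantization, which converts linear-layer weights to int8. This is a minimal, generic sketch on a toy model, not a ZGX-specific recipe:

```python
# Post-training dynamic quantization sketch in PyTorch.
# Linear-layer weights drop to int8, shrinking the model for edge inference.
# Toy model for illustration only; dynamic quantization targets CPU inference.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

quantized = torch.ao.quantization.quantize_dynamic(
    model,
    {nn.Linear},        # layer types to quantize
    dtype=torch.qint8,  # 8-bit integer weights
)
print(quantized)  # Linear layers are now dynamically quantized
```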
The device also supports remote management and monitoring, so developers can check model performance and push updates even when the device is deployed at a remote site.
When to Use Local AI vs. Cloud GPUs
The choice depends on four factors: workload type, data sensitivity, team size, and long-term budget.
| Scenario | Use local AI (ZGX Nano) | Use cloud GPUs |
| --- | --- | --- |
| Data sensitivity | Strict privacy requirements; data must stay on-premises | Data can reside on provider servers |
| Workload pattern | Frequent inference, predictable workloads | Large distributed training, elastic scaling |
| Internet dependency | Need offline capability | Constant connectivity available |
| Budget structure | Prefer fixed upfront cost | Prefer variable, usage-based billing |
| Edge deployment | Models run at the point of data generation | Centralized cloud serving acceptable |
| Experimentation | Prototype locally, iterate quickly | Early-stage exploration with disposable instances |
Hybrid approach: Many teams train in the cloud and deploy at the edge, or prototype locally and scale in the cloud. The ZGX Nano fits naturally into the local/edge side of a hybrid workflow.
Local AI vs. Cloud GPU Development: At a Glance
| Dimension | HP ZGX Nano (local) | Cloud GPU instances |
| --- | --- | --- |
| Model size support | Up to 200B (single); 405B (dual unit) | Unlimited (scale to cluster) |
| Typical cost | [VERIFY — see editor's note on pricing] | Variable: $10–50K+/year depending on usage |
| Setup time | Hours to days (one-time) | Minutes (but repeated configuration) |
| Data privacy | Complete; data never leaves premises | Depends on provider policies |
| Inference latency | <10 ms local processing | 50–500+ ms depending on network |
| Internet dependency | None for development and inference | Complete dependency: no connectivity, no compute |
| Best use cases | Edge AI, prototyping, sensitive data, frequent inference | Large distributed training, elastic scaling, collaboration |
Developer Setup and Getting Started
Setting up the HP ZGX Nano is straightforward:
• Initial setup: Choose your OS configuration; driver installation and framework setup are largely automated via the ZGX Toolkit
• Essential tools: CUDA toolkit, cuDNN, and ML frameworks (PyTorch, TensorFlow, JAX) come pre-configured
• Development environment: Jupyter, Docker, VS Code, and MLflow are ready out of the box
• Model management: Download and sync models via Hugging Face or model repositories
• Validation: Run test and validation workflows to confirm your setup (a minimal check script follows this list)
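A minimal validation sketch, assuming PyTorch and `huggingface_hub` are installed; the repo id is an illustrative small model:

```python
# Environment-validation sketch: confirm the GPU is visible,
# then pull a small model snapshot from Hugging Face.
# Repo id is illustrative; any accessible model works.
import torch
from huggingface_hub import snapshot_download

assert torch.cuda.is_available(), "CUDA device not visible"
print("GPU:", torch.cuda.get_device_name(0))

path = snapshot_download("Qwen/Qwen2.5-0.5B-Instruct")  # small test model
print("Model cached at:", path)
```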
Note: The ZGX Nano runs NVIDIA DGX OS (Linux-based) and does not support Microsoft Windows. Client devices connecting to it remotely can run Windows 11 or Ubuntu 24.04+.
FAQ: HP ZGX Nano for AI Development
What size AI models can I run locally on the HP ZGX Nano?
Up to 200B parameters on a single unit, and up to 405B parameters by connecting two units via QSFP. The 128 GB unified memory eliminates data copying between CPU and GPU, allowing large models to load without out-of-memory errors; at 4-bit precision, 200B parameters occupy roughly 100 GB of weights, which fits in the unified pool.
Does the HP ZGX Nano support PyTorch, TensorFlow, and other ML frameworks?
Yes. It runs NVIDIA DGX OS with the NVIDIA AI software stack, pre-configured for PyTorch, TensorFlow, JAX, and ONNX Runtime. The HP ZGX Toolkit adds MLflow, Ollama, and development environment tooling.
Can I use the ZGX Nano for both development and edge deployment?
Yes. Its compact form factor (150 × 150 × 51 mm) allows it to serve as a development workstation on a desk and then deploy at an edge site—factory floor, warehouse, or field location—without additional hardware.
How does local AI compare to renting cloud GPU instances?
Local infrastructure offers stronger data privacy, lower inference latency (<10 ms vs. 50–500+ ms), and predictable costs. Cloud GPUs offer unlimited scaling and are better for large distributed training or infrequent, elastic workloads. Many teams use a hybrid approach.
Can I connect multiple ZGX Nano units?
Yes. Two HP ZGX Nano units can be connected via QSFP for near-zero-latency scaling, enabling models up to 405B parameters. A compatible QSFP cable is required (sold separately).
Does the HP ZGX Nano run Windows?
No. The ZGX Nano runs NVIDIA DGX OS, a Linux-based operating system purpose-built for AI development. Client devices that connect to it remotely can run Windows 11 or Ubuntu 24.04+.
Next Steps
The HP ZGX Nano enables edge-first AI development with full data privacy and cost predictability. Whether you’re prototyping locally, fine-tuning domain-specific models, or deploying inference at the edge, it provides a complete local AI workflow without cloud dependency.
To get started: evaluate your model sizes and hardware requirements, consider your edge deployment needs, and explore HP Z AI workstation resources for more information.
About the Author
Beata Perzanowska is a technology writer covering AI, IT infrastructure, and business technology topics.