Intel Xeon 6 and SambaNova RDU Power a New Heterogeneous Architecture for Scalable Agentic AI Inference

Apr 09, 2026

SambaNova announced a deeper phase of collaboration with Intel, introducing a heterogeneous hardware architecture designed to deliver outstanding inference performance for demanding agentic AI workloads. The design combines GPUs for the prefilling stage, the Intel Xeon 6 processor as both the host and execution CPU, and the SambaNova RDU for decoding. The solution is expected to be available in the second half of 2026 for enterprises, cloud service providers, and autonomous AI initiatives that need to scale coding agents and other agent-based workloads.

According to SambaNova’s leadership, successful agentic AI deployments are increasingly following a pattern: GPUs initiate tasks, Xeon processors manage and run them, and RDUs complete them efficiently. By working with Intel, SambaNova provides customers with a practical blueprint that can be deployed in existing air-cooled data centers and remains fully compatible with today’s x86 software ecosystem, tools, and coding agents.

Intel emphasized that most data center software stacks are built on x86 and run on Xeon processors, offering a mature and reliable foundation for developers and enterprises operating at scale. As workloads evolve, heterogeneous computing becomes essential. This joint solution delivers a cost-effective, high-performance inference architecture powered by Xeon 6, purpose-built for large-scale deployment.

Agentic AI moves from concept to production

Agentic AI has rapidly progressed from experimental demonstrations to real-world deployment. Coding agents can now compile and execute code, call tools and APIs, access databases, and orchestrate workflows while relying on fast, low-latency large-model inference. In practice, this evolution exposes the limitations of GPU-only systems. While GPUs remain effective for the prefilling phase, overall performance and scalability increasingly depend on CPUs and dedicated inference accelerators.

Industry observers note that AI agents are generating exponentially more code, which must be compiled and executed in secure sandbox environments typically powered by server CPUs such as Xeon. This shift highlights the need for balanced architectures where different hardware types specialize in different stages of the workflow.

Analysts also point out that no single chip type can optimally handle every phase of agent workflows. The Intel–SambaNova approach stands out because it pairs RDUs for rapid decoding with Xeon CPUs for tool execution and system orchestration, delivering strong performance with fewer chips while remaining fully aligned with enterprise software environments.

Why Xeon 6 and RDU?

The architecture centers on Xeon 6 processors and SambaNova RDUs. The RDU is optimized to transform the economics of token generation by providing high-throughput, low-latency decoding for large language models. Meanwhile, Xeon 6 offers the memory bandwidth, PCIe connectivity, and on-chip acceleration needed for orchestration and execution tasks.

SambaNova’s internal testing shows that Xeon 6 can improve LLVM compilation speed by more than 50% compared with Arm-based server CPUs, and deliver up to 70% better vector database performance than comparable x86 solutions. These gains significantly accelerate end-to-end coding-agent workflows, helping developers move from prototype to production more quickly.
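To see what a compile-speed gain of this size could mean for a whole workflow, here is a minimal Python sketch of the arithmetic. It assumes that "50% faster" means 1.5x throughput, and it applies Amdahl's law to a made-up share of workflow time spent compiling. The function names and the 30% share are illustrative assumptions, not figures from SambaNova or Intel.

```python
def time_after_speedup(baseline_seconds: float, speed_gain: float) -> float:
    """Time for a step once its throughput rises by speed_gain (0.5 = 50% faster)."""
    return baseline_seconds / (1.0 + speed_gain)


def end_to_end_speedup(fraction_accelerated: float, speed_gain: float) -> float:
    """Amdahl's law: overall speedup when only part of the workflow gets faster."""
    remaining = (1.0 - fraction_accelerated) + fraction_accelerated / (1.0 + speed_gain)
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical inputs: a 60 s compile step, and compilation taking 30% of the workflow.
    print(time_after_speedup(60.0, 0.5))
    print(round(end_to_end_speedup(0.30, 0.5), 3))
```

Under these assumptions, a 50% speed gain cuts the compile step's time by about a third, and the end-to-end gain is smaller because the rest of the workflow is unchanged.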

How the workload is divided

In this new design:

• GPUs handle the highly parallel prefilling stage, converting long prompts into key-value caches efficiently.

• RDUs work alongside Xeon 6 as dedicated inference engines for fast, efficient decoding once the CPU has completed task setup.

• Xeon 6 serves as both the host CPU and the system control plane, coordinating agent tasks, distributing workloads, executing tools and APIs, compiling and running code, and validating outputs.
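The division of work in the list above can be sketched as a simple stage-to-device routing table. This is an illustrative Python sketch only. The stage names, the `AgentTask` type, and the stage ordering are assumptions made for the example and do not represent a SambaNova or Intel API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Device(Enum):
    GPU = "gpu"
    XEON6 = "xeon6"
    RDU = "rdu"


# Mapping follows the roles listed above; the stage names are hypothetical labels.
STAGE_DEVICE = {
    "orchestrate": Device.XEON6,  # task setup and coordination
    "prefill": Device.GPU,        # long prompt -> key-value cache
    "decode": Device.RDU,         # token generation
    "tool_exec": Device.XEON6,    # tools, APIs, compiling and running code
    "validate": Device.XEON6,     # output checks
}

# Assumed ordering for one agent step; real schedulers may interleave stages.
STEP_ORDER = ("orchestrate", "prefill", "decode", "tool_exec", "validate")


@dataclass
class AgentTask:
    prompt: str
    trace: list = field(default_factory=list)


def run_agent_step(task: AgentTask) -> list:
    """Record which device each stage would be dispatched to."""
    for stage in STEP_ORDER:
        task.trace.append((stage, STAGE_DEVICE[stage]))
    return task.trace


if __name__ == "__main__":
    for stage, device in run_agent_step(AgentTask(prompt="fix the failing test")):
        print(f"{stage:12s} -> {device.value}")
```

The sketch only records dispatch decisions. It shows that the CPU appears at several points in each step, while the GPU and the RDU each own a single stage.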

This separation of responsibilities reflects the broader evolution of agentic AI infrastructure. Rather than relying on a single chip to perform all functions, enterprises benefit from architectures that balance control, execution, and inference across specialized hardware.

Preparing for the next phase of AI deployment

This announcement marks a transition from partnership to large-scale commercial readiness. It signals confidence in a heterogeneous approach that enables enterprises, service providers, and global cloud platforms to deploy agentic AI with higher performance, better efficiency, and strong compatibility with existing infrastructure.
