Mirantis is introducing its AI Factory Reference Architecture, which provides blueprints for building and managing AI factories.

Built on the company’s k0rdent AI platform, which provides a templated, declarative model for rapid provisioning, the AI Factory Reference Architecture is designed to enable AI workloads to be deployed within days of hardware installation.
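To illustrate what a templated, declarative request can look like, here is a minimal sketch modeled on k0rdent's open source ClusterDeployment API. The template name, credential reference, and config values are hypothetical placeholders, and the exact API group and fields may differ from the shipping product; this is not taken from Mirantis materials.

```yaml
# Hypothetical k0rdent-style declarative cluster request.
# API group, template, and credential names are illustrative
# placeholders, not actual Mirantis artifacts.
apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ClusterDeployment
metadata:
  name: ai-factory-gpu-cluster
spec:
  template: gpu-baremetal-cluster-template   # reusable, versioned template
  credential: datacenter-credentials          # placeholder credential reference
  config:
    workersNumber: 4                          # example sizing parameter
```

The appeal of this model is that the desired cluster is declared once and reconciled automatically, rather than provisioned through imperative, step-by-step scripts.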

According to Mirantis, cloud-native workloads are usually designed for scale-out, multi-core operation, while AI workloads typically require turning multiple GPU-based servers into a single computer with aggregated memory.

Other challenges of AI workloads include the need for fine-tuning and configuration, multi-tenancy, data sovereignty, managing scale and sprawl, and skills availability.

The reference architecture attempts to address these challenges by providing reusable templates across application, platform, compute, storage and network, and security and compliance layers, which can be used to assemble infrastructure.

The application layer supports the launch of AI services. The platform layer enables automation of bare metal, VM, and Kubernetes clusters in data centers, clouds, and at the edge. The compute and GPU layer enables fractional provisioning, GPU sharing, and support for the latest features from major vendors. The storage and network layer provides high-throughput NVMe-oF, multi-tiered storage, and AI-optimized networking, including RDMA, InfiniBand, RoCEv2, and SmartNICs/DPUs. And finally, the security and compliance layer offers zero trust, hard multi-tenancy, confidential computing, and data sovereignty.
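Fractional GPU provisioning of the kind described for the compute layer is commonly expressed in Kubernetes as a resource request for a GPU partition. A minimal sketch using an NVIDIA MIG profile follows; the pod name and image are placeholders, and nothing here is drawn from the Mirantis architecture itself.

```yaml
# Example pod requesting a fractional GPU via an NVIDIA MIG profile.
# mig-1g.5gb is one standard MIG resource name exposed by the NVIDIA
# device plugin; which profiles exist depends on cluster configuration.
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker           # placeholder name
spec:
  containers:
    - name: model-server
      image: example.com/model-server:latest   # placeholder image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1             # one GPU slice, not a full GPU
```

Requesting a slice rather than a whole device lets several tenants share one physical GPU with hardware-level isolation, which is the usual motivation for fractional provisioning.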

“The Mirantis AI Factory Reference Architecture is designed to be composable so that users can assemble infrastructure from reusable templates across compute, storage, GPU, and networking layers tailored to their specific AI workload needs,” Mirantis wrote in an announcement. 

Mirantis provides integrations with a number of vendors, including NVIDIA, AMD, and Intel, and the reference architecture also includes a catalog of validated third-party and open source integrations, with tools such as Slurm, Kubeflow, AIBrix, llm-d, Gcore, and ClearML.