Colocation Hosting for AI Workloads: What You Need to Know
AI workloads have broken the assumptions that traditional colocation was built on. The power density, cooling requirements, network bandwidth, and reliability expectations of GPU infrastructure are fundamentally different from conventional enterprise IT. A colocation facility that works perfectly for web servers and databases may be entirely inadequate for a rack of NVIDIA H100s.
This guide covers the specific technical requirements that AI workloads place on colocation hosting: power density, cooling methods, interconnects, security and compliance, SLA considerations, and a practical checklist to evaluate any provider you are considering.
How AI Changes the Colocation Equation
Traditional colocation was designed for a world where a server rack drew 5-8kW and the biggest concern was keeping air temperature below 27C. AI infrastructure changes every variable:
- Power draw increases 5-10x: A single rack of GPU servers can draw 30-50kW, with next-generation systems pushing toward 100kW per rack.
- Heat density overwhelms air cooling: The thermal output per square metre of an AI deployment can exceed what traditional HVAC systems are designed to handle.
- Network requirements shift from bandwidth to latency: AI training clusters need low-latency, high-bandwidth GPU-to-GPU interconnects (InfiniBand), not just internet connectivity.
- Uptime has direct revenue impact: GPU hardware is expensive, and idle time represents significant lost value. Downtime during a multi-day training run can waste days of compute.
- Workloads are sustained, not bursty: Training runs can sustain peak power draw for days or weeks continuously, unlike enterprise workloads that average well below peak.
These differences mean that evaluating colocation for AI workloads requires a different framework than traditional IT hosting. The sections below cover each critical dimension.
Power Density Explained
Power density -- the amount of electrical power consumed per rack -- is the single most important specification for AI colocation. Here is how it breaks down:
Standard IT: 5-10kW Per Rack
A typical enterprise server rack running web servers, application servers, or storage arrays draws 5-10kW. Most colocation facilities built before 2020 were designed for this range. The power distribution infrastructure (busways, PDUs, circuit breakers) and cooling systems are sized accordingly. You cannot simply plug a 40kW GPU rack into a facility designed for 10kW -- the electrical infrastructure will not support it.
AI and GPU Workloads: 30-50kW Per Rack
An 8-GPU NVIDIA H100 SXM server draws approximately 10kW under full load. Add networking, storage, and management infrastructure, and a fully populated GPU rack reaches 35-45kW. NVIDIA's DGX H100 system (a pre-integrated 8-GPU node) draws approximately 10.2kW per node. A standard rack can hold 4 such nodes, putting the total at approximately 40kW plus networking overhead.
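As a rough sanity check, the arithmetic is simple enough to sketch. The node power, node count, and networking overhead below are illustrative assumptions based on the approximate figures above, not a vendor specification:

```python
# Illustrative rack power estimate for a GPU deployment.
# Node power, node count, and overhead are assumptions drawn from the
# approximate figures above (e.g. ~10.2kW per 8-GPU node, 4 nodes per rack).

def rack_power_kw(nodes_per_rack: int, node_kw: float, overhead_kw: float) -> float:
    """Estimate total rack draw: GPU nodes plus networking/management overhead."""
    return nodes_per_rack * node_kw + overhead_kw

estimate = rack_power_kw(nodes_per_rack=4, node_kw=10.2, overhead_kw=3.0)
print(f"Estimated rack draw: {estimate:.1f} kW")  # ~43.8 kW
```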
Next Generation: 50-100kW+ Per Rack
NVIDIA's GB200 NVL72 architecture and similar next-generation GPU systems push power requirements even higher. A single NVL72 rack can draw upward of 120kW. The industry is actively developing infrastructure to support these densities, but very few facilities can deliver it today. If you are planning deployments beyond 50kW per rack, your provider options are extremely limited.
When evaluating a provider, ask: What is the maximum power delivery per rack with N+1 redundancy? The answer needs to be at or above your peak load, not your average load. GPU training workloads sustain near-peak power draw for extended periods, unlike traditional IT loads that fluctuate.
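A quick way to apply that rule is to compare the provider's quoted per-rack capacity against your sustained peak draw rather than an average. The figures below are hypothetical:

```python
# Illustrative capacity check: size against sustained peak draw, not average.
# The kW figures and the 40% average-utilisation assumption are hypothetical.

def fits_capacity(provider_kw_per_rack: float, required_kw: float) -> bool:
    """A rack only fits if the provider's redundant capacity covers the required draw."""
    return provider_kw_per_rack >= required_kw

peak_kw = 44.0              # sustained draw during a training run, near peak
average_kw = 0.4 * peak_kw  # what a bursty enterprise load might average

print(fits_capacity(provider_kw_per_rack=40.0, required_kw=peak_kw))     # False: undersized
print(fits_capacity(provider_kw_per_rack=40.0, required_kw=average_kw))  # True, but misleading for training
```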
Cooling: Air, Liquid, and Everything Between
Where Air Cooling Hits Its Limits
Air cooling works by blowing cold air through server chassis. It is effective for power densities up to about 15-20kW per rack under ideal conditions, and can be stretched to 25-30kW with aggressive airflow management (hot/cold aisle containment, high-velocity fans, in-row cooling units). Beyond 30kW, the volume of air required and the temperature differential needed become impractical. The air simply cannot remove heat fast enough from the concentrated heat sources inside GPU accelerators.
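The limit follows from the sensible-heat relation for air. The sketch below estimates the airflow needed to carry away a rack's heat; the air properties and the assumed 12C temperature rise across the servers are illustrative, not facility measurements:

```python
# Rough airflow needed to remove a rack's heat with air alone.
# Uses the sensible-heat relation: power = density * specific_heat * flow * delta_T.
# Air properties and the 12C rise are illustrative assumptions.

AIR_DENSITY = 1.2         # kg/m^3, approximate at room conditions
AIR_SPECIFIC_HEAT = 1005  # J/(kg*K)

def required_airflow_m3s(power_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (m^3/s) needed to carry away power_w at a given temperature rise."""
    return power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_k)

for rack_kw in (10, 25, 40):
    flow = required_airflow_m3s(rack_kw * 1000, delta_t_k=12)
    print(f"{rack_kw} kW rack: ~{flow:.1f} m^3/s (~{flow * 2119:.0f} CFM)")
```

At 40kW the required airflow is several times what a 10kW rack needs, which is why containment and fan speed alone stop being enough.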
Direct Liquid Cooling (DLC)
Direct liquid cooling circulates fluid (typically water or a water-glycol mixture) through cold plates mounted directly on GPU chips, CPUs, and other high-heat components. The liquid absorbs heat at the chip and carries it to a heat exchanger or cooling distribution unit (CDU) where it is rejected. DLC is the dominant cooling approach for modern GPU deployments because it can handle very high heat loads (50kW+ per rack) efficiently and quietly. NVIDIA's reference designs for H100 and GB200 systems include DLC-ready configurations.
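The same sensible-heat arithmetic shows why liquid is so much more effective: water carries far more heat per unit volume than air. The coolant properties and 10C loop temperature rise below are illustrative assumptions, not a specific CDU specification:

```python
# Rough coolant flow needed to remove a rack's heat with water.
# Same sensible-heat relation as the air example; water properties
# and the 10C loop rise are illustrative assumptions.

WATER_DENSITY = 997         # kg/m^3
WATER_SPECIFIC_HEAT = 4186  # J/(kg*K)

def required_water_flow_lpm(power_w: float, delta_t_k: float) -> float:
    """Water flow (litres/minute) needed to carry away power_w at a given loop temperature rise."""
    m3_per_s = power_w / (WATER_DENSITY * WATER_SPECIFIC_HEAT * delta_t_k)
    return m3_per_s * 1000 * 60

print(f"50 kW rack: ~{required_water_flow_lpm(50_000, delta_t_k=10):.0f} L/min")  # ~72 L/min
```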
Rear-Door Heat Exchangers
A rear-door heat exchanger (RDHx) replaces the standard rear door of a server rack with a liquid-cooled coil. Hot exhaust air from the servers passes through the coil, which absorbs heat before the air enters the room. RDHx solutions can supplement air cooling by removing 50-80% of the rack's heat load, effectively extending the power density that air-cooled facilities can support. They are a pragmatic retrofit option for facilities that cannot implement full DLC.
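A rough way to reason about an RDHx retrofit is to look at the residual heat the room's air handling still has to absorb. The capture fractions below simply follow the 50-80% range above:

```python
# How a rear-door heat exchanger extends an air-cooled room:
# the room's HVAC only needs to absorb the heat the RDHx coil misses.
# Capture fractions follow the rough 50-80% range above.

def residual_air_load_kw(rack_kw: float, rdhx_capture: float) -> float:
    """Heat (kW) left for room air handling after the rear-door coil."""
    return rack_kw * (1 - rdhx_capture)

for capture in (0.5, 0.65, 0.8):
    print(f"40 kW rack, {capture:.0%} capture: {residual_air_load_kw(40, capture):.0f} kW to room air")
```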
Immersion Cooling
Immersion cooling submerges entire servers in a non-conductive fluid (typically a synthetic dielectric fluid). The fluid absorbs heat directly from all components. Single-phase immersion uses fluid that remains liquid; two-phase immersion uses fluid that boils at a controlled temperature, with the vapour condensing and dripping back. Immersion can handle extreme densities and is inherently silent. Adoption for GPU workloads is growing but remains a smaller fraction of the market compared to DLC.
When evaluating a provider's cooling capability, ask: What cooling technology is installed and operational? Planned or "available upon request" cooling infrastructure means construction time and risk. Operational cooling infrastructure means you can deploy now.
Network and Interconnects
InfiniBand: The Training Standard
For multi-node GPU training -- large language models, foundation models, distributed training of any kind -- InfiniBand is the interconnect of choice. InfiniBand provides high bandwidth (400Gbps for NDR, 800Gbps for XDR) with extremely low latency and native support for Remote Direct Memory Access (RDMA), which allows GPUs to communicate directly without CPU involvement.
NVIDIA's DGX SuperPOD reference architecture uses InfiniBand fabric to connect multiple DGX nodes. If you are deploying a training cluster, your colocation provider must support InfiniBand cabling between racks, which requires specific cable routing (InfiniBand cables are thick and have length limitations) and switch placement.
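To get a feel for why fabric bandwidth matters, the sketch below applies the idealised ring all-reduce formula to a single gradient synchronisation. The model size, node count, and effective link bandwidth are hypothetical, and real training overlaps communication with compute, so treat it as an order-of-magnitude illustration only:

```python
# Idealised ring all-reduce: each node transfers ~2*(N-1)/N * S bytes over its
# link, so time ~= 2*(N-1)/N * S / bandwidth. The parameter count, node count,
# and effective link bandwidth below are hypothetical, and the formula ignores
# latency, congestion, and overlap with compute.

def allreduce_seconds(buffer_bytes: float, nodes: int, link_bytes_per_s: float) -> float:
    """Lower-bound time for a ring all-reduce of buffer_bytes across nodes."""
    return 2 * (nodes - 1) / nodes * buffer_bytes / link_bytes_per_s

gradients = 70e9 * 2   # e.g. 70B parameters in fp16 -> ~140 GB
ndr_link = 400e9 / 8   # 400Gbps NDR InfiniBand -> 50 GB/s

print(f"{allreduce_seconds(gradients, nodes=32, link_bytes_per_s=ndr_link):.1f} s per full gradient sync")
```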
RDMA over Converged Ethernet (RoCE)
RoCE provides RDMA capability over standard Ethernet infrastructure. It uses Ethernet switches rather than InfiniBand switches, which can reduce cost and complexity. RoCE v2 (routable RoCE) works at Layer 3, making it compatible with standard network architectures. For some inference workloads and smaller training clusters, RoCE can be a practical alternative to InfiniBand, particularly when the performance difference is marginal for your specific workload.
External Connectivity
Beyond GPU-to-GPU interconnects, your colocation deployment needs robust external connectivity for data ingestion, model serving, and management. This means diverse fibre paths to multiple carriers, cloud provider on-ramps (direct connects to AWS, Azure, GCP), and internet exchange access for broad, low-latency internet connectivity. Carrier-neutral facilities where you choose your own providers offer the most flexibility and the best pricing leverage.
Security and Compliance
ISO 27001
ISO 27001 is the international standard for information security management systems (ISMS). Most reputable colocation providers hold this certification, which demonstrates that they have established and maintained a systematic approach to managing sensitive data. For AI companies processing personal data or operating in regulated sectors, your provider's ISO 27001 certification is typically a baseline requirement.
SOC 2 Type II
SOC 2 Type II reports assess a provider's controls over security, availability, processing integrity, confidentiality, and privacy over a sustained period (typically 6-12 months). Unlike SOC 2 Type I (a point-in-time assessment), Type II provides evidence that controls are operating consistently. If your customers or investors require SOC 2 compliance from you, your colocation provider's SOC 2 Type II report supports your own compliance posture.
Additional Certifications
Depending on your sector, you may also need: PCI DSS (for processing payment data), NHS Data Security and Protection Toolkit (for UK health data), Cyber Essentials Plus (a UK government-backed scheme), or specific government security classifications. Confirm that your provider holds or can support the certifications your compliance framework requires before signing a contract.
SLAs for AI Workloads
Standard colocation SLAs were designed for traditional IT. AI workloads have different risk profiles and failure costs:
- Power uptime: 99.99% (the "four nines" standard) allows approximately 52 minutes of downtime per year; the conversion is sketched after this list. For AI training workloads, even brief power interruptions can waste days of compute. Evaluate whether the provider's UPS and generator failover systems provide genuinely seamless power transitions.
- Cooling SLAs: Standard SLAs guarantee ambient temperatures within a range (typically 18-27C). For liquid-cooled GPU workloads, you need SLAs on coolant temperature and flow rate. A cooling failure with 40kW racks can cause thermal shutdowns within minutes.
- Network uptime: If your inference serving relies on the colocation provider's connectivity, the network SLA directly affects your service availability. Diverse paths and multiple carriers provide resilience beyond what a single-carrier SLA guarantees.
- Response time for incidents: When hardware needs physical intervention, the provider's response time matters. Ask for SLAs on remote hands response (target: under 15 minutes during business hours, under 30 minutes 24/7) and emergency response for power or cooling events.
- Penalty structures: What happens when an SLA is breached? Service credits are standard, but they rarely compensate for the actual cost of a GPU training run lost to a multi-hour outage. Understand the penalty structure and whether it materially aligns the provider's incentives with your needs.
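The downtime figures quoted above follow directly from the uptime percentage; a minimal conversion looks like this:

```python
# Converting an uptime percentage into an annual downtime budget,
# as referenced in the power uptime point above.

MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Minutes of allowable downtime per year for a given uptime SLA."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)

for sla in (99.9, 99.99, 99.999):
    print(f"{sla}% uptime -> {downtime_minutes_per_year(sla):.1f} minutes/year")
```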
Checklist: 10 Questions to Ask a Colocation Provider
Before committing to any colocation provider for AI workloads, get clear answers to these ten questions:
- What is the maximum power delivery per rack with N+1 redundancy? You need a number, not a range or a promise. If the answer is below your peak requirement, the facility cannot serve you.
- What liquid cooling technology is installed and operational? "Planned" or "available" is not the same as "deployed." Ask to see it running.
- What is the PUE (power usage effectiveness) of the facility? Lower is better. Below 1.3 is good. Below 1.2 is excellent. Above 1.5 means significant energy waste and likely air-cooling limitations. The ratio itself is sketched after this checklist.
- Can you support InfiniBand cabling between racks? If you need multi-node training, this is non-negotiable. Verify cable routing paths and length constraints.
- Which network carriers are present on-site? Carrier diversity provides redundancy and pricing leverage. At minimum, look for two independent carrier paths.
- What certifications do you hold? ISO 27001, SOC 2 Type II, and any sector-specific certifications you require. Ask for copies of current certificates.
- What are your SLAs for power, cooling, and network? Get specific numbers: uptime percentages, response times, and penalty structures. Compare across providers.
- What is your scalability path for my deployment? Can you add racks? Is adjacent space reserved? What is the lead time for expansion? A provider who can grow with you saves the cost and disruption of relocation.
- What remote hands services are included, and what is charged extra? Understand the cost of physical support before you need it. Budget for realistic levels of remote hands usage based on your deployment size.
- Can I visit the facility before committing? Any reputable provider welcomes site visits. If they refuse or delay, that is a significant red flag. Walk the data hall, inspect the cooling infrastructure, and verify that reality matches the sales deck.
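For reference, the PUE question from the checklist is just a ratio of total facility power to IT load. The kW figures below are hypothetical:

```python
# PUE (power usage effectiveness) = total facility power / IT equipment power,
# as referenced in the PUE question above. The kW figures are hypothetical.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Ratio of everything the site draws (IT + cooling + losses) to the IT load alone."""
    return total_facility_kw / it_load_kw

print(f"PUE: {pue(total_facility_kw=1200, it_load_kw=1000):.2f}")  # 1.20 -> efficient
print(f"PUE: {pue(total_facility_kw=1600, it_load_kw=1000):.2f}")  # 1.60 -> significant overhead
```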
This checklist covers the technical fundamentals. For commercial and pricing considerations, see our guides on GPU colocation and UK-specific colocation.