Cloud infrastructure marketing focuses on elasticity, global reach, and managed services. The performance comparison between cloud VMs and bare metal hardware rarely appears in that marketing material, because the comparison does not favor cloud VMs for sustained, predictable workloads. This article covers the specific mechanisms by which cloud VMs underperform bare metal, how to measure those…
CPU Steal Time: The Hidden Performance Tax
What CPU Steal Time Is
CPU steal time measures the percentage of time a virtual machine’s vCPU is waiting for the hypervisor to schedule it on a physical core. When multiple VMs share a physical server, their vCPUs compete for physical CPU time. When your VM wants to execute but the hypervisor is serving another VM, that wait time accumulates as steal time.
Steal time is visible in Linux via the ‘st’ column in top or mpstat output. On a healthy, lightly-loaded cloud VM, steal time might run 0-2%. On a heavily-loaded cloud host during peak hours, steal time of 10-30% is not unusual, meaning your application is receiving only 70-90% of the CPU capacity it believes it has.
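The same figure mpstat reports can be computed from /proc/stat, which is where those tools read it. A minimal sketch (the field layout follows proc(5); real monitoring tools compute the percentage from the delta between two samples, not a single cumulative reading):

```python
def steal_percent(stat_cpu_line: str) -> float:
    """Steal time as a percentage of total CPU time, from a /proc/stat
    'cpu' line. Fields per proc(5): user, nice, system, idle, iowait,
    irq, softirq, steal, guest, guest_nice (cumulative ticks)."""
    fields = [int(x) for x in stat_cpu_line.split()[1:]]
    steal = fields[7]           # 8th field is steal
    total = sum(fields[:8])     # guest fields excluded: already counted in user/nice
    return 100.0 * steal / total

# Hypothetical cpu line where 100 of 1000 ticks were stolen
line = "cpu 400 0 300 200 0 0 0 100 0 0"
print(round(steal_percent(line), 1))  # 10.0
```

In practice you would read /proc/stat twice a few seconds apart and apply the same arithmetic to the differences.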
How Steal Time Affects Applications
The impact of steal time is not uniform across workload types:
Latency-sensitive applications (APIs, databases, real-time processing): Steal time directly adds to response time. A 10ms database query with 15% steal time stretches to roughly 11.8ms (10ms ÷ 0.85). Under sustained load, p99 latency (the worst 1% of requests) spikes disproportionately because steal time is not evenly distributed.
Batch processing (ETL, backups, report generation): Steal time extends total job duration proportionally. A 2-hour ETL job on a VM with 20% steal time takes 2.5 hours.
Throughput-based workloads (file processing, transcoding): Throughput drops proportionally to steal percentage.
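All three cases above follow from the same arithmetic: with steal fraction s, a CPU-bound task gets only (1 − s) of each wall-clock second, so its duration stretches by 1/(1 − s). A quick sketch:

```python
def effective_duration(cpu_seconds: float, steal_fraction: float) -> float:
    """Wall-clock duration of a CPU-bound job when the hypervisor
    steals a fraction of CPU time: t / (1 - s)."""
    return cpu_seconds / (1.0 - steal_fraction)

# 2-hour ETL job at 20% steal -> 9000 s, i.e. 2.5 hours
print(effective_duration(7200, 0.20))       # 9000.0
# 10 ms query at 15% steal -> ~11.8 ms
print(round(effective_duration(10.0, 0.15), 1))  # 11.8
```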
On bare metal, steal time is zero by definition. The processor is not shared. Application code runs when the OS schedules it, not when a hypervisor grants permission.
The Noisy Neighbor Effect
How Shared Infrastructure Creates Variability
Noisy neighbor describes the situation where another tenant’s workload on the same physical server degrades your application’s performance. This affects more than just CPU:
Memory pressure: Hypervisors use memory balloon drivers to reclaim RAM from VMs when physical host memory is constrained. Your VM may have its allocated memory reduced without warning, triggering OS swapping.
Network I/O: Physical NICs are shared. A VM pushing large file transfers can saturate shared NIC bandwidth, degrading network throughput for all VMs on the same host.
Storage I/O: Cloud block storage (EBS, Persistent Disk) traverses a shared network fabric. Heavy I/O from adjacent tenants degrades IOPS for all tenants sharing that storage cluster.
Cloud providers implement controls (I/O credits, bandwidth limits, CPU credit systems) that limit the blast radius of noisy neighbors. These controls also limit your own peak performance. t3 instances on AWS use CPU credits: excellent average performance with burst capability, but sustained CPU-intensive workloads exhaust credits and throttle to baseline.
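The credit mechanics can be modeled in a few lines. In AWS's scheme, one CPU credit equals one vCPU running at 100% for one minute; the sketch below uses illustrative figures loosely based on a t3.medium-class instance (2 vCPUs, 24 credits earned per hour, an assumed starting balance of 288 credits), not exact published numbers:

```python
def minutes_until_throttle(initial_credits: float, earn_per_hour: float,
                           vcpus: int, utilization: float) -> float:
    """Minutes a burstable instance can sustain `utilization` (0..1 per vCPU)
    before exhausting CPU credits. One credit = one vCPU-minute at 100%.
    Returns inf when usage is at or below the earn rate (the baseline)."""
    spend_per_min = vcpus * utilization
    earn_per_min = earn_per_hour / 60.0
    net_burn = spend_per_min - earn_per_min
    if net_burn <= 0:
        return float("inf")
    return initial_credits / net_burn

# Pinned at 100% on both vCPUs: 288 credits / (2.0 - 0.4) per minute
print(round(minutes_until_throttle(288, 24, 2, 1.0)))  # 180 (three hours of burst)
```

Note the implied baseline: earning 24 credits/hour on 2 vCPUs sustains exactly 20% utilization indefinitely, which is why a sustained CPU-heavy workload throttles to that floor once credits run out.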
Memory bandwidth is frequently the bottleneck for database and analytics workloads, but cloud VM specifications typically do not list memory bandwidth. The reason: cloud VMs share the physical server’s memory channels with other VMs, so the available bandwidth per VM is a fraction of the physical hardware’s total.
A physical server with DDR5-4800 in 4-channel configuration has roughly 153 GB/s theoretical peak bandwidth. On a physical host running 4 VMs, each VM’s effective memory bandwidth approaches 38 GB/s under ideal conditions. Under contention, it drops further.
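The bandwidth figures above come from straightforward arithmetic: transfer rate × 8 bytes per transfer per channel × channel count. A sketch:

```python
def ddr_peak_bandwidth_gbs(mt_per_s: int, channels: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: each channel moves
    8 bytes (64 bits) per transfer."""
    bus_bytes = 8
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

host_total = ddr_peak_bandwidth_gbs(4800, 4)  # DDR5-4800, 4 channels
print(host_total)        # 153.6 GB/s for the whole host
print(host_total / 4)    # 38.4 GB/s per VM with 4 equal tenants
```

Real measured bandwidth (e.g. via a STREAM-style benchmark) lands below this theoretical peak, but the per-tenant division is the point: the fraction is fixed by how many VMs share the channels.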
On InMotion’s Extreme Dedicated Server, the full 153 GB/s DDR5 bandwidth is dedicated to your workload. For analytics jobs scanning large datasets, this difference is the primary driver of performance improvement when migrating from cloud to bare metal.
Storage I/O: Network-Attached vs. Direct NVMe
Cloud Block Storage Architecture
AWS EBS, Google Persistent Disk, and Azure Managed Disks are network-attached storage systems. Your VM sends block I/O requests across the data center’s internal network to a storage cluster. This adds roughly 0.5-2ms of latency per I/O operation compared to local storage, and limits maximum IOPS and throughput based on the volume’s provisioned tier.
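That per-operation latency gap is easy to observe directly. A rough probe, assuming only the Python standard library: each write+fsync on network-attached block storage pays the round trip to the storage cluster, while local NVMe does not. (For rigorous numbers, use a purpose-built tool such as fio; this is a quick sanity check.)

```python
import os
import statistics
import tempfile
import time

def fsync_latency_ms(samples: int = 200, size: int = 4096) -> dict:
    """Measure write+fsync latency on the filesystem backing a temp
    file in the current directory. Returns p50/p99 in milliseconds."""
    buf = os.urandom(size)
    latencies = []
    with tempfile.NamedTemporaryFile(dir=".") as f:
        for _ in range(samples):
            start = time.perf_counter()
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())           # force the block I/O round trip
            latencies.append((time.perf_counter() - start) * 1000)
    return {
        "p50": statistics.median(latencies),
        "p99": statistics.quantiles(latencies, n=100)[98],
    }

print(fsync_latency_ms())  # expect sub-millisecond p50 on local NVMe
```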
The cost comparison is significant. Provisioning 3.84TB of AWS EBS gp3 storage costs roughly $307 per month for the volume alone, before IOPS provisioning. The same 3.84TB of NVMe storage is included in InMotion Hosting’s Extreme Dedicated Server at a lower cost. Cloud-attached storage is not priced to compete with local NVMe.
Network Performance Differences
Latency to End Users
Both cloud and dedicated servers have latency characteristics determined primarily by physical distance to end users and network routing quality. Cloud providers have a global distribution advantage: AWS, Google, and Azure operate regions on every continent, while InMotion Hosting offers data centers in Los Angeles and Amsterdam.
For applications serving users concentrated in North America and Western Europe, InMotion’s data center locations cover the primary user bases. Los Angeles reaches North American users effectively; Amsterdam serves Western European users with low latency and satisfies EU data residency requirements. Applications requiring presence in Southeast Asia, Australia, or South America may need a CDN layer or a geographically distributed cloud deployment.
Predictability vs. Peak Performance
Cloud network bandwidth is typically subject to instance-level burst limits and shared NIC capacity. A c5.2xlarge on AWS is specified as ‘Up to 10 Gbps’ of network bandwidth, which means burst access to 10Gbps; actual sustained throughput is lower and subject to traffic management.
InMotion’s dedicated servers include a 1Gbps port with the option to upgrade to a guaranteed 10Gbps unmetered port. Guaranteed 10Gbps is a different specification from ‘Up to 10Gbps burst.’ For applications that need sustained high-bandwidth transfer (video streaming, large file distribution, data ingestion), guaranteed bandwidth has operational value.
Benchmark: Database Query Latency
A practical comparison of p50 and p99 database query latency on cloud VMs vs. bare metal for a mid-size PostgreSQL deployment (50GB working set, standard OLTP query mix):
p99 latency is where the difference is most pronounced. The worst 1% of requests on cloud infrastructure suffer from steal time spikes and storage network variability. On bare metal, p99 performance stays close to the median because neither of those variability sources is present.
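The mechanism behind the p99 divergence can be illustrated with a toy model (illustrative parameters, not measured data): give every query the same base service time, then add an occasional hypervisor scheduling delay with small probability. The median barely moves, but the 99th percentile absorbs the spikes.

```python
import random
import statistics

def simulate_latency(n=10_000, base_ms=2.0, steal_prob=0.0, spike_ms=20.0):
    """Toy model: each query takes base_ms, plus a scheduling delay
    of spike_ms with probability steal_prob."""
    samples = sorted(
        base_ms + (spike_ms if random.random() < steal_prob else 0.0)
        for _ in range(n)
    )
    return statistics.median(samples), statistics.quantiles(samples, n=100)[98]

random.seed(1)
print(simulate_latency(steal_prob=0.0))   # bare-metal-like: p50 == p99
print(simulate_latency(steal_prob=0.02))  # 2% of queries spike: p99 jumps 10x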
Where Cloud VMs Win
An honest comparison acknowledges the categories where cloud infrastructure genuinely outperforms bare metal dedicated servers:
Auto-scaling: Cloud infrastructure scales horizontally in minutes. Adding a bare metal server takes hours to days for provisioning.
Global distribution: 15-30 cloud regions vs. 2 InMotion data center locations. Applications requiring presence in multiple continents benefit from cloud’s global footprint.
Managed services: RDS, ElastiCache, Lambda, and similar managed services eliminate operational burden for teams without dedicated infrastructure staff.
Intermittent workloads: A batch job running 2 hours per week costs pennies on cloud spot instances. A dedicated server costs the same whether it runs 1 hour or 720 hours per month.
Making the Decision
If your workload runs continuously and requires predictable performance: bare metal dedicated wins on cost and performance
If your workload scales dramatically and unpredictably: cloud flexibility may justify the cost premium
If you are spending more than $300 per month on cloud compute for a stable workload: run the bare metal comparison
If p99 latency variability is affecting your application SLAs: bare metal’s zero steal time addresses the root cause
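For the cost comparison in the third point, the break-even arithmetic is simple enough to sketch. The figures below are hypothetical placeholders; substitute your own cloud bill, dedicated server price, and estimated migration effort:

```python
def breakeven_months(cloud_monthly: float, dedicated_monthly: float,
                     migration_cost: float) -> float:
    """Months until a one-time migration cost is recovered by the
    monthly saving of moving a steady workload to a dedicated server.
    Returns inf when the dedicated option is not cheaper."""
    saving = cloud_monthly - dedicated_monthly
    if saving <= 0:
        return float("inf")
    return migration_cost / saving

# Hypothetical: $900/mo cloud bill, $400/mo dedicated server,
# $2,000 of engineering time to migrate
print(breakeven_months(900, 400, 2000))  # 4.0 months to break even
```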
