Cloud infrastructure marketing focuses on elasticity, global reach, and managed services. The performance comparison between cloud VMs and bare metal hardware rarely appears in that marketing material, because the comparison does not favor cloud VMs for sustained, predictable workloads. This article covers the specific mechanisms by which cloud VMs underperform bare metal, how to measure those…
CPU Steal Time: The Hidden Performance Tax
What CPU Steal Time Is
CPU steal time measures the percentage of time a virtual machine’s vCPU is waiting for the hypervisor to schedule it on a physical core. When multiple VMs share a physical server, their vCPUs compete for physical CPU time. When your VM wants to execute but the hypervisor is serving another VM, that wait time accumulates as steal time.
Steal time is visible in Linux via the ‘st’ column in top or mpstat output. On a healthy, lightly-loaded cloud VM, steal time might run 0-2%. On a heavily-loaded cloud host during peak hours, steal time of 10-30% is not unusual, meaning your application is receiving only 70-90% of the CPU capacity it believes it has.
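The same figure mpstat reports can be computed from /proc/stat, which is where those tools read it. A minimal sketch (the field layout follows proc(5); real monitoring tools compute the percentage from the delta between two samples, not a single cumulative reading):

```python
def steal_percent(stat_cpu_line: str) -> float:
    """Steal time as a percentage of total CPU time, from a /proc/stat
    'cpu' line. Fields per proc(5): user, nice, system, idle, iowait,
    irq, softirq, steal, guest, guest_nice (cumulative ticks)."""
    fields = [int(x) for x in stat_cpu_line.split()[1:]]
    steal = fields[7]           # 8th field is steal
    total = sum(fields[:8])     # guest fields excluded: already counted in user/nice
    return 100.0 * steal / total

# Hypothetical cpu line where 100 of 1000 ticks were stolen
line = "cpu 400 0 300 200 0 0 0 100 0 0"
print(round(steal_percent(line), 1))  # 10.0
```

In practice you would read /proc/stat twice a few seconds apart and apply the same arithmetic to the differences.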
How Steal Time Affects Applications
The impact of steal time is not uniform across workload types:
Latency-sensitive applications (APIs, databases, real-time processing): Steal time directly adds to response time. A 10ms database query with 15% steal time stretches to roughly 11.8ms (10ms ÷ 0.85). Under sustained load, p99 latency (the worst 1% of requests) spikes disproportionately because steal time is not evenly distributed.
Batch processing (ETL, backups, report generation): Steal time extends total job duration proportionally. A 2-hour ETL job on a VM with 20% steal time takes 2.5 hours.
Throughput-based workloads (file processing, transcoding): Throughput drops proportionally to steal percentage.
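All three cases above follow from the same arithmetic: with steal fraction s, a CPU-bound task gets only (1 − s) of each wall-clock second, so its duration stretches by 1/(1 − s). A quick sketch:

```python
def effective_duration(cpu_seconds: float, steal_fraction: float) -> float:
    """Wall-clock duration of a CPU-bound job when the hypervisor
    steals a fraction of CPU time: t / (1 - s)."""
    return cpu_seconds / (1.0 - steal_fraction)

# 2-hour ETL job at 20% steal -> 9000 s, i.e. 2.5 hours
print(effective_duration(7200, 0.20))       # 9000.0
# 10 ms query at 15% steal -> ~11.8 ms
print(round(effective_duration(10.0, 0.15), 1))  # 11.8
```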
On bare metal, steal time is zero by definition. The processor is not shared. Application code runs when the OS schedules it, not when a hypervisor grants permission.
The Noisy Neighbor Effect
How Shared Infrastructure Creates Variability
Noisy neighbor describes the situation where another tenant’s workload on the same physical server degrades your application’s performance. This affects more than just CPU:
Memory pressure: Hypervisors use memory balloon drivers to reclaim RAM from VMs when physical host memory is constrained. Your VM may have its allocated memory reduced without warning, triggering OS swapping.
Network I/O: Physical NICs are shared. A VM pushing large file transfers can saturate shared NIC bandwidth, degrading network throughput for all VMs on the same host.
Storage I/O: Cloud block storage (EBS, Persistent Disk) traverses a shared network fabric. Heavy I/O from adjacent tenants degrades IOPS for all tenants sharing that storage cluster.
Cloud providers implement controls (I/O credits, bandwidth limits, CPU credit systems) that limit the blast radius of noisy neighbors. These controls also limit your own peak performance. t3 instances on AWS use CPU credits: excellent average performance with burst capability, but sustained CPU-intensive workloads exhaust credits and throttle to baseline.
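The credit mechanics can be modeled in a few lines. In AWS's scheme, one CPU credit equals one vCPU running at 100% for one minute; the sketch below uses illustrative figures loosely based on a t3.medium-class instance (2 vCPUs, 24 credits earned per hour, an assumed starting balance of 288 credits), not exact published numbers:

```python
def minutes_until_throttle(initial_credits: float, earn_per_hour: float,
                           vcpus: int, utilization: float) -> float:
    """Minutes a burstable instance can sustain `utilization` (0..1 per vCPU)
    before exhausting CPU credits. One credit = one vCPU-minute at 100%.
    Returns inf when usage is at or below the earn rate (the baseline)."""
    spend_per_min = vcpus * utilization
    earn_per_min = earn_per_hour / 60.0
    net_burn = spend_per_min - earn_per_min
    if net_burn <= 0:
        return float("inf")
    return initial_credits / net_burn

# Pinned at 100% on both vCPUs: 288 credits / (2.0 - 0.4) per minute
print(round(minutes_until_throttle(288, 24, 2, 1.0)))  # 180 (three hours of burst)
```

Note the implied baseline: earning 24 credits/hour on 2 vCPUs sustains exactly 20% utilization indefinitely, which is why a sustained CPU-heavy workload throttles to that floor once credits run out.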
Memory bandwidth is frequently the bottleneck for database and analytics workloads, but cloud VM specifications typically do not list memory bandwidth. The reason: cloud VMs share the physical server’s memory channels with other VMs, so the available bandwidth per VM is a fraction of the physical hardware’s total.
A physical server with DDR5-4800 in 4-channel configuration has roughly 153 GB/s theoretical peak bandwidth. On a physical host running 4 VMs, each VM’s effective memory bandwidth approaches 38 GB/s under ideal conditions. Under contention, it drops further.
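The bandwidth figures above come from straightforward arithmetic: transfer rate × 8 bytes per transfer per channel × channel count. A sketch:

```python
def ddr_peak_bandwidth_gbs(mt_per_s: int, channels: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: each channel moves
    8 bytes (64 bits) per transfer."""
    bus_bytes = 8
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

host_total = ddr_peak_bandwidth_gbs(4800, 4)  # DDR5-4800, 4 channels
print(host_total)        # 153.6 GB/s for the whole host
print(host_total / 4)    # 38.4 GB/s per VM with 4 equal tenants
```

Real measured bandwidth (e.g. via a STREAM-style benchmark) lands below this theoretical peak, but the per-tenant division is the point: the fraction is fixed by how many VMs share the channels.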
On InMotion’s Extreme Dedicated Server, the full 153 GB/s DDR5 bandwidth is dedicated to your workload. For analytics jobs scanning large datasets, this difference is the primary driver of performance improvement when migrating from cloud to bare metal.
Storage I/O: Network-Attached vs. Direct NVMe
Cloud Block Storage Architecture
AWS EBS, Google Persistent Disk, and Azure Managed Disks are network-attached storage systems. Your VM sends block I/O requests across the data center’s internal network to a storage cluster. This adds roughly 0.5-2ms of latency per I/O operation compared to local storage, and limits maximum IOPS and throughput based on the volume’s provisioned tier.
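That per-operation latency gap is easy to observe directly. A rough probe, assuming only the Python standard library: each write+fsync on network-attached block storage pays the round trip to the storage cluster, while local NVMe does not. (For rigorous numbers, use a purpose-built tool such as fio; this is a quick sanity check.)

```python
import os
import statistics
import tempfile
import time

def fsync_latency_ms(samples: int = 200, size: int = 4096) -> dict:
    """Measure write+fsync latency on the filesystem backing a temp
    file in the current directory. Returns p50/p99 in milliseconds."""
    buf = os.urandom(size)
    latencies = []
    with tempfile.NamedTemporaryFile(dir=".") as f:
        for _ in range(samples):
            start = time.perf_counter()
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())           # force the block I/O round trip
            latencies.append((time.perf_counter() - start) * 1000)
    return {
        "p50": statistics.median(latencies),
        "p99": statistics.quantiles(latencies, n=100)[98],
    }

print(fsync_latency_ms())  # expect sub-millisecond p50 on local NVMe
```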
The cost comparison is significant. Provisioning 3.84TB of AWS EBS gp3 storage costs roughly $307 per month for the volume alone, before IOPS provisioning. The same 3.84TB of NVMe storage is included in InMotion Hosting’s Extreme Dedicated Server at a lower cost. Cloud-attached storage is not priced to compete with local NVMe.
Network Performance Differences
Latency to End Users
Both cloud and dedicated servers have latency characteristics determined primarily by physical distance to end users and network routing quality. Cloud providers have a global distribution advantage: AWS, Google, and Azure operate regions on every continent, while InMotion Hosting offers data centers in Los Angeles and Amsterdam.
For applications serving users concentrated in North America and Western Europe, InMotion’s data center locations cover the primary user bases. Los Angeles reaches North American users effectively; Amsterdam serves Western European users with low latency and satisfies EU data residency requirements. Applications requiring presence in Southeast Asia, Australia, or South America may need a CDN layer or a geographically distributed cloud deployment.
Predictability vs. Peak Performance
Cloud network bandwidth is typically subject to instance-level burst limits and shared NIC capacity. A c5.2xlarge on AWS is specified as ‘Up to 10 Gbps’ of network bandwidth, which means burst access to 10Gbps; actual sustained throughput is lower and subject to traffic management.
InMotion’s dedicated servers include a 1Gbps port with the option to upgrade to a guaranteed 10Gbps unmetered port. Guaranteed 10Gbps is a different specification from ‘Up to 10Gbps burst.’ For applications that need sustained high-bandwidth transfer (video streaming, large file distribution, data ingestion), guaranteed bandwidth has operational value.
Benchmark: Database Query Latency
A practical comparison of p50 and p99 database query latency on cloud VMs vs. bare metal for a mid-size PostgreSQL deployment (50GB working set, standard OLTP query mix):
p99 latency is where the difference is most pronounced. The worst 1% of requests on cloud infrastructure suffer from steal time spikes and storage network variability. On bare metal, p99 performance stays close to the median because neither of those variability sources is present.
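The mechanism behind the p99 divergence can be illustrated with a toy model (illustrative parameters, not measured data): give every query the same base service time, then add an occasional hypervisor scheduling delay with small probability. The median barely moves, but the 99th percentile absorbs the spikes.

```python
import random
import statistics

def simulate_latency(n=10_000, base_ms=2.0, steal_prob=0.0, spike_ms=20.0):
    """Toy model: each query takes base_ms, plus a scheduling delay
    of spike_ms with probability steal_prob."""
    samples = sorted(
        base_ms + (spike_ms if random.random() < steal_prob else 0.0)
        for _ in range(n)
    )
    return statistics.median(samples), statistics.quantiles(samples, n=100)[98]

random.seed(1)
print(simulate_latency(steal_prob=0.0))   # bare-metal-like: p50 == p99
print(simulate_latency(steal_prob=0.02))  # 2% of queries spike: p99 jumps 10x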
Where Cloud VMs Win
An honest comparison acknowledges the categories where cloud infrastructure genuinely outperforms bare metal dedicated servers:
Auto-scaling: Cloud infrastructure scales horizontally in minutes. Adding a bare metal server takes hours to days for provisioning.
Global distribution: 15-30 cloud regions vs. 2 InMotion data center locations. Applications requiring presence in multiple continents benefit from cloud’s global footprint.
Managed services: RDS, ElastiCache, Lambda, and similar managed services eliminate operational burden for teams without dedicated infrastructure staff.
Intermittent workloads: A batch job running 2 hours per week costs pennies on cloud spot instances. A dedicated server costs the same whether it runs 1 hour or 720 hours per month.
Making the Decision
If your workload runs continuously and requires predictable performance: bare metal dedicated wins on cost and performance
If your workload scales dramatically and unpredictably: cloud flexibility may justify the cost premium
If you are spending more than $300 per month on cloud compute for a stable workload: run the bare metal comparison
If p99 latency variability is affecting your application SLAs: bare metal’s zero steal time addresses the root cause
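For the cost comparison in the third point, the break-even arithmetic is simple enough to sketch. The figures below are hypothetical placeholders; substitute your own cloud bill, dedicated server price, and estimated migration effort:

```python
def breakeven_months(cloud_monthly: float, dedicated_monthly: float,
                     migration_cost: float) -> float:
    """Months until a one-time migration cost is recovered by the
    monthly saving of moving a steady workload to a dedicated server.
    Returns inf when the dedicated option is not cheaper."""
    saving = cloud_monthly - dedicated_monthly
    if saving <= 0:
        return float("inf")
    return migration_cost / saving

# Hypothetical: $900/mo cloud bill, $400/mo dedicated server,
# $2,000 of engineering time to migrate
print(breakeven_months(900, 400, 2000))  # 4.0 months to break even
```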
