
Data dive: VM sizes in the real world

The Tintri VMstore sees VMs and virtual disks, not just LUN-wide averages and correlations. This lets you see the activity in your data center in a way that was not possible before.

  • Tintri’s autosupports give us a wealth of information about virtual machine behavior in the real world.
  • Typical provisioned VM sizes among Tintri customers cluster between 40 GiB and 80 GiB. A few specific VM sizes are quite common: 40 GiB, 80 GiB and 64 GiB.
  • Thin provisioning provides Tintri customers with about a 2x space savings compared with thick provisioning. The effect is smaller for VMs on the extremes of size.

Every night, Tintri VMstores upload a summary of their health status and activity. This helps Tintri identify issues with systems in the field and allows our support team to proactively reach out to customers whose storage devices may need attention. It also gives Tintri engineering a better sense of how our product is actually used, and what areas might need attention. Real-world data from VMstore autosupports moves engineering decisions from “opinion-based” to “evidence-based.”

Virtual machine sizes

A simple question we can ask is “how big is a typical virtual machine?” 

I collected anonymized size information for approximately 400,000 virtual machines from Tintri customers' most recent nightly autosupports. The two most relevant capacity statistics are:

  • Provisioned size: how large a virtual disk did the user create? Tintri VMstore collects this information from the file size of the .vmdk files attached to a VM.
  • Used size: how much logical data was written to the virtual disks and stored on Tintri?

In some cases, used size can exceed provisioned size. A VMware snapshot creates a -delta.vmdk file which does not add to the provisioned size, but does add to the used size. Tintri also reports a zero provisioned size for synthetic virtual machines that represent VM snapshots or replicas. In some environments, the provisioned size of a linked clone may also be underreported. I excluded from my analysis a significant number of VMs that had zero for both the provisioned and used size; these may be VMs that were deleted from storage but not from vCenter, or other corner cases.
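The exclusion rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual analysis code: the `VmSize` record and its field names are hypothetical, and the sample values are made up. Note that only VMs with zero for both sizes are dropped; a VM with zero provisioned size but nonzero used size (such as a synthetic snapshot VM) stays in the sample.

```python
from dataclasses import dataclass

GIB = 2**30  # bytes per GiB


@dataclass(frozen=True)
class VmSize:
    """One VM's capacity statistics in bytes (hypothetical record layout)."""
    provisioned_bytes: int
    used_bytes: int


def keep_for_analysis(vm: VmSize) -> bool:
    """Drop only VMs that report zero for BOTH provisioned and used size."""
    return not (vm.provisioned_bytes == 0 and vm.used_bytes == 0)


# Made-up sample covering the cases mentioned in the text.
sample = [
    VmSize(40 * GIB, 18 * GIB),  # ordinary thin-provisioned VM
    VmSize(0, 0),                # excluded: zero for both sizes
    VmSize(0, 5 * GIB),          # kept: zero provisioned, nonzero used
    VmSize(20 * GIB, 23 * GIB),  # kept: used exceeds provisioned
]

kept = [vm for vm in sample if keep_for_analysis(vm)]
```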

VM sizes span several orders of magnitude. The smallest provisioned size in my data set was just a few hundred kilobytes, and the largest was more than 30 TiB. The distribution of VM sizes is graphed below:

Provisioned GiB chart

From the graph, we can see that most VMs are between 20 GiB and 160 GiB in size.
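Because sizes span several orders of magnitude, a distribution like this is easiest to build with buckets that double in width. The sketch below assumes doubling buckets anchored at 10 GiB (so 20-40, 40-80, 80-160 GiB, and so on), which matches the ranges quoted in this post but is my assumption about the chart's bucketing; the function name and the sample sizes are illustrative.

```python
import math
from collections import Counter


def size_bucket(size_gib: float, base_gib: float = 10.0) -> tuple:
    """Return the [low, high) doubling bucket that contains size_gib.

    Buckets are base_gib * 2**k wide at the low edge, for any integer k,
    so sizes below base_gib land in 5-10, 2.5-5, ... GiB buckets.
    """
    if size_gib <= 0:
        raise ValueError("size must be positive")
    k = math.floor(math.log2(size_gib / base_gib))
    low = base_gib * 2.0 ** k
    # Guard against floating-point rounding right at a bucket edge.
    if size_gib < low:
        low /= 2
    elif size_gib >= low * 2:
        low *= 2
    return (low, low * 2)


# Made-up provisioned sizes in GiB, for illustration only.
sizes_gib = [0.5, 10, 20, 40, 42, 64, 80, 100, 104, 2048]
histogram = Counter(size_bucket(s) for s in sizes_gib)

for (low, high), count in sorted(histogram.items()):
    print(f"{low:g}-{high:g} GiB: {count}")
```

With these buckets, a 64 GiB VM is counted together with 40 GiB VMs, while an 80 GiB VM starts the next bucket.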

The most common sizes are 40 GiB (3.81 percent of the sample) and 80 GiB (3.78 percent). Other common sizes are 10 GiB, 30 GiB, 42 GiB, 44 GiB, 50 GiB, 60 GiB, 64 GiB, 100 GiB, and 104 GiB. Together these sizes constitute 30 percent of the customer VMs.
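A ranking like this comes from counting exact provisioned sizes and dividing by the sample size. Here is a minimal sketch; it assumes sizes have already been rounded to whole GiB, and the function name and sample data are hypothetical rather than taken from the autosupport data set.

```python
from collections import Counter


def common_sizes(provisioned_gib, top_n):
    """Return [(size_gib, percent_of_sample), ...] for the top_n most frequent sizes."""
    counts = Counter(provisioned_gib)
    total = len(provisioned_gib)
    return [(size, 100.0 * n / total) for size, n in counts.most_common(top_n)]


# Made-up sample of provisioned sizes in whole GiB.
sizes = [40, 40, 40, 80, 80, 64, 100, 20, 42, 500]
top = common_sizes(sizes, 2)

for size, pct in top:
    print(f"{size} GiB: {pct:.2f} percent of the sample")
```

Summing the percentages of the listed sizes gives their combined share of the sample.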

It’s a little surprising to me that 20 GiB VMs are not more common; it is the 18th-most popular size, with 1.29 percent of the sample. But small VMs are relatively uncommon anyway. The following chart groups the smallest and largest VMs together for easier comparison:

Provisioned sizes chart

The distribution of used size is similar but skewed somewhat more towards smaller sizes. The used size is measured before any compression or deduplication, so it represents how much data the VM has written. For this measurement, a Tintri native clone or VAAI clone’s used size is only the amount of data written after the clone was created, so any OS image or existing data is not counted.

Distribution of used GiBs chart

While the peak of provisioned size is between 40 and 80 GiB, the most common used size is about half that, 20 to 40 GiB. About 10 percent of VMs have written less than a GiB of data.

On NFS-based datastores such as those Tintri provides, thin provisioning is the default. Very few VMs actually write to the entirety of their virtual disks. We can examine how much benefit thin provisioning provides for each VM size. Overall, the percentage of a VM’s provisioned size that is actually written is about 40 to 50 percent, but there is some variation:

Written percentage of VMs' provisioned size chart

Very small VMs, of course, tend to fill up more of their disks. But it also appears that larger VMs are less thin-provisioned, on average, than normal-sized VMs. This might be due to more careful capacity planning for “monster” VMs, or it may be that these VMs are expanded as needed with additional disks in order to meet rising capacity demands.
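The written percentage per size group can be sketched as follows. Two things here are assumptions rather than the method behind the chart: the percentage is computed as total used divided by total provisioned within each group (rather than an average of per-VM ratios), and the three group boundaries are picked only for illustration. The sample data is made up.

```python
from collections import defaultdict


def written_percent_by_bucket(vms, bucket_of):
    """Percent of provisioned capacity actually written, per bucket.

    vms: iterable of (provisioned_gib, used_gib) pairs.
    bucket_of: function mapping a provisioned size to a bucket label.
    """
    provisioned = defaultdict(float)
    used = defaultdict(float)
    for prov_gib, used_gib in vms:
        if prov_gib <= 0:
            continue  # ratio is undefined when provisioned size is zero
        label = bucket_of(prov_gib)
        provisioned[label] += prov_gib
        used[label] += used_gib
    return {label: 100.0 * used[label] / provisioned[label] for label in provisioned}


def coarse_bucket(prov_gib):
    """Hypothetical three-way split; the thresholds are illustrative only."""
    if prov_gib < 20:
        return "small"
    if prov_gib < 160:
        return "typical"
    return "large"


# Made-up (provisioned GiB, used GiB) pairs.
vms = [(10, 8), (40, 16), (80, 40), (500, 350)]
percent = written_percent_by_bucket(vms, coarse_bucket)

for label, pct in percent.items():
    # Space savings versus thick provisioning is provisioned / used.
    print(f"{label}: {pct:.0f} percent written, {100.0 / pct:.1f}x savings")
```

A group that has written 50 percent of its provisioned capacity corresponds to a 2x savings over thick provisioning; groups that fill more of their disks save less.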

This study shows just a single point-in-time view of Tintri’s customer base. It would be interesting to examine how VM sizes change over time, in order to predict what the demands on future storage products will be. I also did not examine the effects of Tintri’s compression and deduplication, which further reduce the amount of physical storage required.

As virtualized environments scale, there is a need to go beyond the real-time per-VM analytics which Tintri already provides, to understand the behavior of the entire population of VMs. Aggregate statistics such as the ones described here help Tintri make decisions about engineering priorities and likely workloads. But they can also help our customers understand their own environments better, by understanding what is driving storage usage and looking for outliers. Tintri is actively working on many exciting projects to bring this sort of “big data” analytics to our customers.

See for yourself.

Did this data dive get you interested in how you can get VM-level statistics yourself? We've got an interactive demo for you—no registration required. See storage differently, starting now.


Mark Gritter / May 02, 2016

Mark is a co-founder and architect at Tintri. Mark was in the Ph.D. program in Computer Science at Stanford University when he decided to join Kealia as one of the first employees.