We all know the story: in the data center model of years past, physical servers spent an enormous amount of time idle, often 80-90% of their operating hours. That reality drove the adoption of virtualization in the data center, which brings us to today.
We now live in a time when multiple operating systems can coexist on a single piece of hardware and share common physical resources, courtesy of the trusty hypervisor. Many hypervisors are on the market; aside from some higher-level functionality and integrations, they all perform the same basic task: contention management.
The hypervisor eats, sleeps and breathes contention. Without contention, we would be back to sitting on idle servers. The question that needs answering is whether contention is a good thing or a bad thing. Let's take a closer look.
A virtualization environment experiences contention in several ways, chiefly for CPU, memory, storage and network resources.
Contention management truly is the basis of virtualization and the hypervisor. Contention for shared resources is what lets a single piece of server hardware escape the 80-90% idle mark and make the most of the server investment: sharing CPU, memory, storage and network is the core operational function of virtualization. At this level, contention is a good thing.
However, too much of a good thing can be disastrous, and contention in a virtualization environment is a prime example. Too much contention for resources degrades performance and functionality to the point that it can bring even the most basic environment to its knees.
Virtual machines are configured with a defined number of virtual CPUs (vCPUs). Physical servers have a defined number of processors, cores and hardware threads. The hypervisor makes virtual machines take turns accessing the physical CPU resources available on the server. Contention arises when a large number of vCPU operations need to be processed on a server at once: the hypervisor must queue them until enough physical processors are free. As a result, a virtual machine may wait a significant amount of time before it can be scheduled, a state known as CPU Ready, and the wait tends to be longer for VMs with many vCPUs. It is not uncommon for a highly contended environment to see waits of hundreds or even thousands of milliseconds before a CPU request can be processed! These scheduling delays can dramatically increase the I/O latency seen by the guest OS. It is a good idea not to give VMs more vCPUs than they actually need.
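To put numbers on CPU Ready, the sketch below converts a ready-time counter into a per-vCPU percentage. It assumes the hypervisor reports ready time as milliseconds summed across all of a VM's vCPUs over one sampling interval (vSphere's real-time performance charts, for example, use 20-second samples). The VM names, sample values and 5% warning threshold are hypothetical; 5% per vCPU is a commonly quoted rule of thumb, not a hard limit.

```python
# Minimal sketch: turn a "CPU ready" counter into a per-vCPU percentage.
# Assumption: ready_ms is summed across all of the VM's vCPUs over one
# sampling interval of interval_s seconds (20 s by default here).

def cpu_ready_percent(ready_ms: float, vcpus: int, interval_s: float = 20.0) -> float:
    """Average percentage of the interval each vCPU spent ready but not scheduled."""
    if vcpus <= 0 or interval_s <= 0:
        raise ValueError("vcpus and interval_s must be positive")
    interval_ms = interval_s * 1000.0
    return ready_ms * 100.0 / (interval_ms * vcpus)


# Hypothetical warning threshold (rule of thumb, not a vendor limit).
READY_WARN_PERCENT = 5.0

# Hypothetical samples: VM name -> (ready ms summed over the sample, configured vCPUs)
samples = {
    "web01": (800.0, 2),
    "db01": (12000.0, 8),
}

for name, (ready_ms, vcpus) in samples.items():
    pct = cpu_ready_percent(ready_ms, vcpus)
    status = "contended" if pct > READY_WARN_PERCENT else "ok"
    print(f"{name}: {pct:.1f}% ready per vCPU ({status})")
# web01: 2.0% ready per vCPU (ok)
# db01: 7.5% ready per vCPU (contended)
```

Normalizing per vCPU lets you compare VMs of different sizes. Here db01 crosses the warning threshold, which is a cue to check whether it really needs all eight vCPUs.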
Virtual machine memory management is one of the most magical functions a hypervisor performs. Even with the maturity of the virtualization ecosystem, many IT departments still misunderstand it, which can lead to a world of hurt.
Memory management starts with understanding how memory is used in an environment. Some operating systems consume ALL memory, while others consume only what they need at the time. Some applications consume ALL memory, while others give and take as necessary. Still others have memory leaks that take, take and take some more until all resources are exhausted.
The hypervisor understands much of how the guest virtual machines on the system use memory and can shuffle memory around as needed. The guest OS believes it has everything it was configured with when, in reality, the hypervisor has given it only what it needs. Plus, with techniques like transparent page sharing, identical memory pages can be shared among virtual machines. Magic, I tell you!
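Transparent page sharing can be illustrated with a toy model: hash every guest page, treat pages with matching hashes as sharing candidates, confirm with a full byte comparison, and back each distinct page with a single physical copy. This is a simplified sketch assuming 4 KiB pages and SHA-256 hashing; the function name `share_pages` and the sample page contents are hypothetical. A real hypervisor scans pages incrementally and breaks sharing with copy-on-write when a guest writes to a shared page.

```python
import hashlib
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def share_pages(vm_pages: dict[str, list[bytes]]) -> tuple[int, int]:
    """Return (guest pages, physical pages needed) if identical pages are shared."""
    # hash -> distinct page contents seen with that hash
    buckets: dict[str, list[bytes]] = defaultdict(list)
    total = 0
    for pages in vm_pages.values():
        for page in pages:
            total += 1
            bucket = buckets[hashlib.sha256(page).hexdigest()]
            # A hash match is only a hint: compare full contents before sharing.
            if not any(existing == page for existing in bucket):
                bucket.append(page)
    unique = sum(len(bucket) for bucket in buckets.values())
    return total, unique


# Hypothetical guests: zero-filled pages and an identical "kernel" page recur.
zero = bytes(PAGE_SIZE)
kernel = b"K" * PAGE_SIZE
vms = {
    "vm1": [zero, zero, kernel, b"a" * PAGE_SIZE],
    "vm2": [zero, kernel, b"b" * PAGE_SIZE],
    "vm3": [zero, kernel, kernel],
}

total, unique = share_pages(vms)
saved_kib = (total - unique) * PAGE_SIZE // 1024
print(f"{total} guest pages backed by {unique} physical pages; {saved_kib} KiB saved")
# 10 guest pages backed by 4 physical pages; 24 KiB saved
```

Zero-filled pages and identical OS pages are the classic sharing candidates, which is why many VMs running the same guest OS benefit most. Some hypervisors now restrict sharing between VMs by default for security reasons, so check your platform's settings before counting on these savings.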
Memory management goes out the window when the guest virtual machine actually needs the resources it is asking for. In a highly contended environment, the result is a significant amount of host-level swapping, if not worse. Recall that swapping occurs when memory resources have been exhausted, so the hypervisor or operating system pushes memory pages out to additional space on traditional storage media (such as local disk or SAN/NAS locations).
One of the most common sources of high memory contention is a boot storm: a period in which a significant number of virtual machines on a host boot at the same time. Windows operating systems consume ALL configured memory while booting. When hosts come back after a power failure, or in a VDI environment where many desktops boot together, it is quite possible for ALL physical memory to be consumed at the same time. The result is that the performance of ALL workloads on the server can suffer until the demand for memory subsides, which may take a significant amount of time. This is especially true if the CxO or application owner is standing over your shoulder.
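A quick back-of-the-envelope check shows why a boot storm hurts. The sketch assumes, per the Windows boot behavior described above, that a booting guest touches all of its configured memory while a running guest uses only its active working set, and it ignores hypervisor overhead and page sharing. The host size, desktop count, per-desktop figures and the helper `memory_shortfall_gb` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    configured_gb: float    # memory the guest is configured with
    steady_state_gb: float  # memory it actively uses once booted (hypothetical)


def memory_shortfall_gb(vms: list[VM], host_gb: float, booting: set[str]) -> float:
    """GB of guest demand that does not fit in host RAM and must be reclaimed or swapped.

    Assumes a booting guest touches all of its configured memory; ignores
    hypervisor overhead and page sharing.
    """
    demand = sum(
        vm.configured_gb if vm.name in booting else vm.steady_state_gb
        for vm in vms
    )
    return max(0.0, demand - host_gb)


# Hypothetical VDI host: 256 GB of RAM, 40 desktops with 8 GB configured, ~3 GB active.
desktops = [VM(f"vdi{i:02d}", 8.0, 3.0) for i in range(40)]
HOST_GB = 256.0

all_names = {d.name for d in desktops}
half_names = {d.name for d in desktops[:20]}

print(memory_shortfall_gb(desktops, HOST_GB, booting=set()))       # steady state: 0.0
print(memory_shortfall_gb(desktops, HOST_GB, booting=all_names))   # boot storm: 64.0
print(memory_shortfall_gb(desktops, HOST_GB, booting=half_names))  # half booting: 0.0
```

In steady state the 40 desktops need 120 GB and fit comfortably, but if they all boot at once they demand 320 GB, leaving 64 GB that must be reclaimed or swapped. In this model, booting half the desktops at a time keeps demand (220 GB) under the host's 256 GB.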
Be sure to check back next week for part 2 of this post, which explores how contention affects the storage and network sides.