Testing performed at VMware labs in fall of 2012 by
Tristan Todd, EUC Reference Architect, firstname.lastname@example.org
Rex Walters, VP Technology, Strategic Alliances, email@example.com
VMware and Tintri are both actively engaged in large enterprise “virtual desktop infrastructure” (VDI) projects at a number of shared customer accounts. Both companies wanted to conduct formal performance tests of VMware View with Tintri storage in a controlled lab environment.
We wanted to determine the maximum number of desktops that can be simultaneously supported on a single T540 without running into performance issues. A secondary goal was to validate a minimal server hardware configuration (hosts and networking as well as the storage itself). Lastly, we wished to characterize the IO workloads presented by large numbers of desktop users, and validate the benefits Tintri technology provides to VMware View environments and VDI in general.
Tintri believes quite strongly that a successful VDI project ultimately depends on three key characteristics:
Simplicity: Administrators responsible for initially configuring and managing an environment are the most obvious beneficiaries of management simplicity.
At the risk of seeming hard-hearted, however, financially motivated executives care surprisingly little how hard someone has to work to perform their job. What they do care about is the success or failure of new projects, how quickly they can be rolled out, and how costly they will be to manage.
Keeping things simple is the surest way to ensure quick and continued success.
Low cost: While it’s generally a mistake to think that virtual desktops can ever be less costly than physical desktops, VDI projects are unlikely to reach even the pilot stage unless costs are kept within reason. Storage costs alone can often undermine a new VDI project (because of the surprising IO demands).
Just as an enterprise storage array is more costly than the collection of individual hard disk drives it contains, the enterprise-grade hardware infrastructure required to host virtual desktops will always be more costly than an equivalent number of physical desktops. The former provides additional security, availability, and data management benefits that simply don’t come for free. Nonetheless, the financial justification for these benefits isn’t possible if the basic hardware infrastructure costs aren’t kept within reason.
Performance: Put simply, users won’t be happy unless their virtual desktops provide at least the performance and overall user experience that they received from a physical desktop. The bar is continually being raised: these days users expect their desktops to perform like an SSD-enabled ultrabook, not the hard-disk based PCs of a few years ago.
Tintri believes that hard-disk-drive (HDD) based storage arrays are incapable of balancing these requirements for a VDI project—the random IO performance demands from virtual desktops completely overwhelm the cost and complexity of the design. With an HDD-based array it becomes: “Simplicity, low cost, or performance: pick any one.”
Used correctly, however, flash storage easily satisfies the performance demands of virtual desktops. The trick is to keep it simple and reasonably priced. Simply throwing a flash cache in front of an HDD-based array does not suffice. Invariably, a simple cache actually increases complexity and costs. Even sizing the cache correctly is a difficult and complex topic.
As the rest of this document will show, the Tintri VMstore T540 storage system provides an extremely simple and low-cost storage system with sufficient performance for any VDI project that needs more than 200 simultaneously active desktops.
Both VMware and Tintri agree that the only meaningful way to validate that a storage system is capable of meeting the demands of a VDI project is to actually simulate real user behavior.
A common but flawed approach to sizing is to create a brutally simple model of an “average desktop user”. A typical model would state that such a user generates, say, 20 IOPS on average during the course of their work. The belief, then, is that any storage system that can support 20,000 IOPS would suffice to support 1,000 virtual desktop users.
Some vendors attempt to use (very) slightly more sophisticated models with factors for read/write ratios, block sizes, and perhaps two or three different user types rather than a single “average desktop.” They fail to address the most important point, however, which is that IO demands change over the course of a working day.
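To make the flaw concrete, the following minimal sketch compares the flat-average model with a time-varying one. The 20 IOPS average and the 1,000 users come from the text above; the hourly multipliers are invented purely for illustration (chosen so that their mean is 1.0) and are not measurements from this testing.

```python
# Figures quoted in the text for the "average desktop user" model.
AVG_IOPS_PER_USER = 20
USERS = 1_000

# HYPOTHETICAL load multipliers relative to the daily average.
# Illustrative only; these are not measured values.
HOURLY_MULTIPLIER = {
    "08:00 login storm": 2.5,
    "10:00 steady state": 0.8,
    "12:00 lunch": 0.4,
    "14:00 steady state": 0.8,
    "17:00 logoff": 0.5,
}


def naive_requirement(users, avg_iops):
    """Flat model: every user generates the average load all day."""
    return users * avg_iops


def peak_requirement(users, avg_iops, profile):
    """Time-varying model: size for the busiest period, not the mean."""
    return max(users * avg_iops * m for m in profile.values())


naive = naive_requirement(USERS, AVG_IOPS_PER_USER)
peak = peak_requirement(USERS, AVG_IOPS_PER_USER, HOURLY_MULTIPLIER)

print(f"flat-average model: {naive:,.0f} IOPS")
print(f"busiest period:     {peak:,.0f} IOPS")
```

Under these assumed multipliers, a system sized for the flat average would fall well short during the busiest period, even though the daily average is unchanged.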
We believe a more sophisticated approach than a simple model based on IOPS is to deploy some large number of virtual desktops, then simulate user activity within each one of the guest VMs. This ensures that the overall environment (hosts, network, as well as storage) provides adequate performance to meet the IO demands.
The basic methodology we used during each of the reference architecture tests was to provision a large pool of desktop VMs (typically 1,000 desktops, which is the maximum number of VMs formally supported on a VMstore T540), then execute some number of iterations of the user activity simulator (View Planner) and evaluate the results.
We chose to use VMware View Planner v2.1 to simulate user activity during our reference architecture testing. View Planner is a tool maintained by VMware expressly for VMware and VMware partners to test VDI performance.
VMware’s View Planner tool simulates application workloads for various user types (task workers, knowledge workers, and power users) by actually running applications typically used in a Windows desktop environment. During the execution of a workload, applications are randomly called to perform common desktop user operations, including open, save, close, minimize and maximize windows; view an HTML page, insert text, insert words and numbers, conduct a slideshow, view a video, send and receive email, and compress files.
View Planner then uses a patent-pending watermark technique to quantify the user experience and measure application latency on a user client/remote machine.
The standardized View Planner workload consists of nine applications performing a combined total of 44 user operations. These user operations are separated into three groups, shown in the table below. The operations in Group A are used to determine the View Planner score, while the operations in Groups B and C are used to generate additional load.
Figure 1: View Planner Operations
The View Planner score is the 95th percentile value for application response time for Group A operations. This value represents a quality of service score based on operational latency experienced by users. A passing score is when the View Planner score is less than 1.5 seconds, which indicates an acceptable level of user experience at scale.
Tintri feels that while the 1.5 second threshold represents an adequate user experience, it is more indicative of the experience from a hard-disk based desktop. Ultrabook quality user experience would be indicated by View Planner scores well under the 1.5 second threshold.
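The scoring rule described above can be sketched as follows. This is an illustration only: the text does not say which percentile method View Planner uses, so the nearest-rank method is assumed here, and the sample latencies are hypothetical rather than taken from any test run.

```python
import math

PASS_THRESHOLD_S = 1.5  # passing threshold stated in the text


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile (an ASSUMED method, not confirmed for View Planner)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def score_group_a(latencies_s):
    """Return (score, passed) for a list of Group A response times in seconds."""
    score = percentile_nearest_rank(latencies_s, 95)
    return score, score < PASS_THRESHOLD_S


# HYPOTHETICAL samples: mostly fast operations with a slow tail.
good_run = [0.4] * 18 + [1.2, 2.8]
bad_run = [0.4] * 17 + [1.6, 2.0, 2.8]

print(score_group_a(good_run))
print(score_group_a(bad_run))
```

Because the score is a 95th percentile rather than a mean, a handful of slow operations is enough to fail a run even when the typical operation is fast.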
Figure 2: Test hardware.
The testing environment comprised a fairly modest amount of hardware: just four servers for desktop VMs, a small server for server VMs, a 10 gigabit network switch, a 1 gigabit network switch, and a Tintri VMstore T540. All hardware fit in approximately one half of a standard sized equipment rack.
Note that the VMstore contained all server VM images as well as the desktops themselves. We also performed our first tests (successfully) with just four hosts (without the dedicated host for the server VMs). View Planner run rules, however, require that View Planner and other server VMs run on separate server hardware.
All hosts were loaded with VMware vSphere ESXi 5.0 build 623860.
Table 1: Hardware configuration
Each host was configured with a traditional vSwitch that was used for the host management interface. This vSwitch was backed by dual 1GbE uplinks in a teamed configuration.
A dVSwitch was also deployed that was uplinked with redundant 10GbE adapters on each host. Trunk ports and VLAN tagging were used to segregate NFS, VM, and vMotion traffic for security and performance reasons.
Figure 3: Virtual network configuration.
Figure 4: VMstore Contents
The VMstore contained all VM images, both servers and desktops (refer to Figure 4).
Note that this document and the vSphere client display the capacity of the T540 as “12 TB.” This indicates 12 × 2^40 bytes (or roughly 13.5 trillion bytes).
The virtual server VM configurations were as indicated in Table 2.
We used a View Connection/Manager Server (v5.1, Build 704644).
A single vCenter server was deployed and the View Composer service was installed locally. The basic configuration of View Composer was as shown in Table 3.
Table 2: Server VM configuration.
Table 3: vCenter / View Composer configuration
Tables 4-6: View Manager configuration
The operating system image deployed for the desktop images used Microsoft Windows 7 SP1 with all Important MS security updates installed. These images are considered to be the Gold images. Upon configuration and customization (according to View Planner specifications), these images were converted to virtual machine templates.
The virtual hardware configuration of the Master desktop virtual machine varies based on the computing resources required by the end user. The following tables define the default virtual hardware specifications. Exceptions to this default specification are handled on a pool-specific basis.
Table 7: Windows 7 master image configuration
Microsoft SQL Server 2008 was installed on the same Windows 2008 virtual machine as vCenter and View Composer.
vCenter Operations Manager for View 1.1 was deployed in a standard configuration with appliances configured with a sizing assumption of 100 hosts and 2000 virtual machines. VCOps was used for detailed metrics gathering and to produce the performance graphs included in this document.
View Planner testing was conducted using “local” mode, where no remote simulated launchers were used to simulate PCoIP sessions. This is a standard configuration for conducting storage integration testing.
We carefully followed all relevant best practices published by VMware and Tintri.
All network traffic was carefully separated onto separate vmkernel networks:
The Tintri VMstore was configured as a single datastore with a single logical data IP address on the 10 gigabit IP storage network, and was used by all ESXi hosts. A separate IP address was used to configure and manage the T540.
The VMstore T540 was configured with redundant physical 10 gigabit ports to the data network to provide a failover path. Jumbo frames were not configured.
All network configuration was in accordance with the Tintri NFS and vSphere Best Practices guide. This includes disabling Storage I/O Control (SIOC) on every vSphere host.
All hosts used PCIe x8 or greater slots for their 10 gigabit network adapters.
Note: Though the published VMware tested maximum number of virtual machine objects per NFS volume in vSphere 5 is 250, it is possible to support 1000 virtual machines in a single VMstore T540.
We adjusted the concurrent power and provisioning settings in View Manager (max concurrent power ops, max concurrent view composer ops, and max concurrent view provisioning ops) for optimal performance during each test run.
Figure 5: Logical overview.
During the course of the testing we validated the following:
1000 desktops (full clone or linked clone) easily fit and perform well on a single Tintri VMstore T540.
The five-host, one-T540 architecture is a scalable unit for any number of virtual desktops (a half rack of equipment for each 1,000 desktops).
All management, monitoring, and infrastructure servers can be housed on a single Tintri VMstore T540.
The tested configuration achieved View Planner benchmark certification for delivering low application latency (and thus good user experience).
This section documents the details of each test performed.
The Tintri VMstore is, as both authors agreed, one of the easiest shared storage solutions to deploy and integrate into a vSphere environment. Because the Tintri VMstore is presented as a single NFS mount of 13.5 TB, the administrator only has to manage a single large datastore. The tested architecture relied on 10 GbE networking, which lends itself well to simple architecture. Cables were minimized and bandwidth was efficiently allocated using distributed virtual switches and VLAN tagging.
No complicated workload or sizing estimation was required prior to deploying the Tintri array. The VMstore is designed to handle data tiering and SSD placement automatically on the back end. Therefore, little effort is required to tune or optimize the storage environment through storage placement or tiering.
Both authors believe that Tintri deployment documentation is better than that provided by most storage vendors. The VMware lab teams were able to easily rack and configure the VMstore equipment using the deployment prerequisite lists and deployment checklists provided by Tintri.
Bringing the Tintri storage online was fast and painless. With the Tintri array powered on, approximately 8 minutes elapsed between logging into vCenter and completing the migration of the vCenter VM to the NFS datastore. Minimal vSphere setting changes were required to adhere to standard VMware NFS and Tintri best practices.
During testing, desktop pool deployments were fast and consistent, in fact some of the fastest seen in the technical marketing labs. The deployment of linked clones and full clones was tested at scale to 1,000 seats with very few stability issues. Some key performance milestones were:
500 linked clones deployed per hour.
1,000 linked clones deployed in 2 hours and 30 minutes.
1,000 full clones deployed in 2 hours and 22 minutes (with the Tintri VAAI plugin installed on the ESXi hosts).
1,000 desktops booted (cold start) in 16 minutes.
Individual desktop recompose in 1 minute, 15 seconds.
1,000 linked clone pool recompose in 2 hours and 45 minutes.
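As a rough aid to reading these milestones, the sketch below converts the reported totals into average rates. The counts and durations are the ones listed above; the derived rates are simple averages over each whole operation, so they do not re-derive the separately reported figure of 500 linked clones per hour.

```python
from datetime import timedelta

# Milestones as reported above: (number of desktops, elapsed time).
MILESTONES = {
    "linked clone deploy": (1_000, timedelta(hours=2, minutes=30)),
    "full clone deploy (VAAI)": (1_000, timedelta(hours=2, minutes=22)),
    "cold boot": (1_000, timedelta(minutes=16)),
    "linked clone pool recompose": (1_000, timedelta(hours=2, minutes=45)),
}


def per_hour(count, elapsed):
    """Average number of desktops completed per hour."""
    return count / (elapsed.total_seconds() / 3600)


def seconds_each(count, elapsed):
    """Average wall-clock seconds per desktop."""
    return elapsed.total_seconds() / count


for name, (count, elapsed) in MILESTONES.items():
    rate = per_hour(count, elapsed)
    each = seconds_each(count, elapsed)
    print(f"{name:30s} {rate:7.0f} per hour  {each:5.2f} s each")
```

Averaged this way, the cold boot works out to a little under one second per desktop, while the deploy and recompose operations average between roughly eight and ten seconds per desktop.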
All View Planner testing was carried out in strict accordance with the View Planner benchmark rules as defined in the View Planner Workload Usage Rules (revised 21 October 2011). All View Planner test results were reviewed and approved by the VMware Performance Benchmark Team in early October 2012.
Table 8: View Planner results
Figure 6: Desktop vDisk latency using linked clones.
Figure 7: Desktop vDisk Latency using full clones (with VAAI).
Desktop Virtual Disk latency (vDisk Latency) is considered to be one of the key metrics associated with application response time and thus VDI user experience. This metric directly shows user perceived, end-to-end latency inclusive of host, network, and storage delays. During benchmark testing, this is a key metric to watch as density or workload intensity is increased.
vDisk performance during all Tintri testing was quite good and exhibited latency characteristics similar to those seen on high performance ultrabooks. During testing, both read and write latency levels were easily better than physical PCs and laptops with 7200-rpm or 8400-rpm hard drives. Read and write latency levels of 5 ms or lower are considered quite good in the world of hosted desktops.
Figure 8: Host CPU utilization with 4 ESXi hosts.
Figure 9: Host RAM utilization with 4 ESXi hosts.
With 250 desktops on each host, CPU and RAM resources were pushed well beyond the limits considered by many to be best-practice maximums.
It is strongly advised not to run at this level of utilization in production. It is always a good idea to operate at no more than 80% host CPU and RAM utilization.
The tested architecture, while it can run 1000 task-worker desktops in a stable manner, is ideally suited for no more than 80% concurrency, where only 800 of the desktops are active with a logged-on user at one time. To scale beyond 800 active sessions would require additional host and network resources, beyond what was deployed for the initial testing.
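The concurrency guidance above can be turned into a back-of-the-envelope sizing sketch. The figures of 1,000 desktops per T540, five hosts per unit, and 80% concurrency come from this document; the assumption that units scale linearly, and the user counts in the example, are illustrative only.

```python
import math

DESKTOPS_PER_POD = 1_000        # provisioned desktops per T540 (from the text)
HOSTS_PER_POD = 5               # four desktop hosts plus one server-VM host
RECOMMENDED_CONCURRENCY = 0.80  # share of desktops active at once (from the text)


def active_sessions_per_pod():
    """Recommended ceiling on simultaneously active sessions in one pod."""
    return round(DESKTOPS_PER_POD * RECOMMENDED_CONCURRENCY)


def pods_needed(peak_active_users):
    """Pods required, ASSUMING pods scale linearly and independently."""
    return math.ceil(peak_active_users / active_sessions_per_pod())


# HYPOTHETICAL planning targets.
for users in (800, 2_400, 2_401):
    pods = pods_needed(users)
    print(f"{users:5d} active users -> {pods} pod(s), {pods * HOSTS_PER_POD} hosts")
```

The 80% ceiling is a planning margin, not a hard limit: the tested configuration did run 1,000 desktops, but at host utilization levels the text advises against in production.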
Figure 10: Datastore latency as measured by VCOPS
Figure 11: Datastore CPU as measured by VCOPS
Storage performance measured at cluster level (front end) in vCenter Operations Manager was quite good. Overall the front-end latency was low and the VMstore was able to maintain low latency even during periodic IO bursts.
Average read latency of 1.7 ms
Peak read latency of 7.2 ms
Average write latency of 6.29 ms
Peak write latency of 30.5 ms (host bottleneck during data collection)
Peak throughput (during deployment) of 400 MB/s
Peak IOPS (during deployment) of 26K IOPS
In typical VDI environments it is ill-advised to perform storage-intensive operations such as pool deployments when shared infrastructure (hosts and especially storage) is in use with production workloads. During testing, a 500-seat linked clone pool was deployed on the VMstore. During a View Planner run, another 500-seat pool was composed and deployed in the background on the same storage array (and same hosts). The first graph below shows the IO-burst activity associated with the pool deployment (at approximately 0030 on 10/5). The second graph shows that datastore latency levels remained relatively consistent despite the background operations.
Figure 12: IOPS with deploy operation during View Planner run
Figure 13: Latency with deploy operation during View Planner run.
NOTE: Write latency levels increase slightly at the tail end of the test period. This increase is associated with View Planner log files being written out to the View Planner test harness.
During a boot storm the Tintri VMstore demonstrated the ability to serve up over 30K IOPS during a startup of 1000 desktops. Total duration to bring all desktops online was approximately 16 minutes (measured at View Manager).
Figure 14: 1,000 user boot storm.
During a login storm the Tintri VMstore demonstrated the ability to burst to nearly 30K IOPS during a 2-hour, 1000 user login window.
Figure 15: Monday morning login storm (1,000 users in 2 hours)
The results of this testing are significant for a number of reasons. Together, VMware and Tintri were able to define a compact and easy-to-deploy VDI infrastructure pod that goes beyond the conventional thinking for throughput, client support, performance and cost per desktop. Beyond that, through Tintri’s unique VM-aware architecture and per-VM QoS, workloads that would otherwise be incompatible with one another were proven to coexist successfully in the same datastore, thereby further driving down the cost of the infrastructure. And finally, because of Tintri’s ease of setup and administration, new benchmarks for initial deployment and desktop provisioning were established that were previously unseen in the VMware labs.