Choosing Which Applications to Run in Flash

Edward Lee

Architect

As flash storage has become cheaper, it has become increasingly practical to move application storage from hard-disk drives (HDDs) to solid-state drives (SSDs). Where are we today, and how far have we come?

In my previous blog post, “When is it cheaper to use SSD vs. HDD?” I described how the IO density of an application (the IOPS it generates per GB of data) can be useful for determining when to run applications from SSD vs. HDD. Flash is most cost-effective for applications with high IO density, that is, applications that generate many IOPS per GB of data. In this post, I’ll discuss how to characterize applications and determine whether it’s cost-effective to run them on SSD.
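The IO density rule of thumb is simple enough to express in a few lines of code. The sketch below is illustrative only; the function names and the example workload are hypothetical, and the default threshold of 1 IOPS per GB is the present-day break-even figure discussed later in this post.

```python
def io_density(iops: float, capacity_gb: float) -> float:
    """Random-read IOPS generated per GB of data."""
    return iops / capacity_gb

def cheaper_tier(iops: float, capacity_gb: float, threshold: float = 1.0) -> str:
    """Return the cheaper storage medium for a workload, given a
    break-even IO density threshold in IOPS per GB."""
    return "SSD" if io_density(iops, capacity_gb) >= threshold else "HDD"

# A hypothetical 200 GB database pushing 2,000 IOPS has an
# IO density of 10 IOPS/GB.
print(cheaper_tier(2000, 200))                # SSD at today's ~1 IOPS/GB threshold
print(cheaper_tier(2000, 200, threshold=50))  # HDD at 2006's ~50 IOPS/GB threshold
```

The same workload lands on opposite sides of the line depending on the era's threshold, which is exactly the shift Figures 1 and 2 illustrate.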


The cost of flash in 2006 made it too expensive for any typical application (see Figure 1, below). However, the cost of flash has declined faster than the cost of disk, making it feasible to run an increasing number of applications in flash (see Figure 2, below).

Figure 1 graphs data from the publication “Migrating Server Storage to SSDs: Analysis of Tradeoffs.” Each data point is a server application workload, with the y-axis showing random read IOPS and the x-axis showing the capacity needed to run the application at peak performance. In all, there are 49 workloads, representing applications that include corporate Exchange servers, commercial Web services, file servers, database servers, and Web caches/proxies. The applications vary greatly in both IOPS and capacity.

Figure 1: SSD IO Density Threshold, 2006

The diagonal line in Figure 1 corresponds to an IO density of 50 IOPS per GB. Workloads above this line are best run from SSD, while those below it are best run from HDD. As the figure shows, in 2006 it was not economical to run any of these applications from SSD.

Today, given the large drop in SSD prices as well as the advent of MLC SSDs, the break-even IO density threshold is about 1 IOPS per GB, as illustrated in Figure 2. At this threshold, approximately half of the workloads can be run economically from SSD. What a difference a few years makes!
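Where does a break-even threshold like 50 or 1 IOPS per GB come from? One simplified model, sketched below with hypothetical round-number prices (not actual market figures), assumes SSD capacity is bought by the GB while HDD capacity for hot workloads is effectively bought by the spindle, i.e., by IOPS:

```python
def breakeven_io_density(ssd_cost_per_gb: float,
                         hdd_cost_per_iops: float) -> float:
    """IO density (IOPS/GB) above which SSD becomes the cheaper medium.

    Simplifying assumptions: an SSD deployment is capacity-bound
    (cost = GB * $/GB), while an HDD deployment serving a hot workload
    is spindle-bound (cost = IOPS * $/IOPS). At break-even:
        GB * ssd_cost_per_gb == IOPS * hdd_cost_per_iops
    which rearranges to IOPS/GB = ssd_cost_per_gb / hdd_cost_per_iops.
    """
    return ssd_cost_per_gb / hdd_cost_per_iops

# Hypothetical 2006-era prices: $10/GB flash, $0.20 per HDD IOPS.
print(breakeven_io_density(10.0, 0.20))  # 50.0 IOPS/GB

# Hypothetical present-day prices: $1/GB flash, $1.00 per HDD IOPS.
print(breakeven_io_density(1.0, 1.00))   # 1.0 IOPS/GB
```

Under this model, the threshold falls as flash $/GB drops, even though HDD $/IOPS has improved comparatively little, which is consistent with the trend in the figures.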

However, raw SSD capacity remains 10 times more expensive than disk. To make SSD cost-effective, you want to avoid storing entire applications on SSD. Only relatively “hot” components should live in flash. By applying optimization techniques such as deduplication, compression, and working set analysis, the threshold IO density can be significantly lowered. I’ll discuss some of these techniques in a future post.

Figure 2: SSD IO Density Threshold, Today

In Figure 2, we can roughly group the workloads into three classes. The first contains hot, small applications that generate high IOPS and use little capacity, such as small, active databases that easily fit in SSD today. The second contains cold, midsize applications such as home directories and static web pages, which can grow large but are easily cached. The last consists of warm, large applications such as email and large databases, which are actively accessed and difficult to cache. In this sample, the third class represents the largest share of total IOPS and capacity, and it represents the biggest challenge and opportunity for SSD storage vendors today.
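The three-class grouping above can be sketched as a simple rule over IO density and capacity. The cutoff values below are hypothetical, chosen only to illustrate the idea; Figure 2 does not give exact boundaries.

```python
def classify_workload(iops: float, capacity_gb: float) -> str:
    """Roughly bucket a workload into the three classes described
    in the post. Thresholds (1 IOPS/GB density, 100 GB capacity)
    are illustrative assumptions, not values from the figure."""
    density = iops / capacity_gb
    if density < 1:
        return "cold/midsize"   # home dirs, static web pages; easily cached
    if capacity_gb < 100:
        return "hot/small"      # small active databases; fit in SSD today
    return "warm/large"         # email, large databases; hard to cache

print(classify_workload(5000, 50))      # hot/small
print(classify_workload(100, 1000))     # cold/midsize
print(classify_workload(10000, 2000))   # warm/large
```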

The affordability of SSD for common applications has come a long way in just the past six years. In fact, we are only now entering the phase where larger mainstream applications can be run cost-effectively from SSD. Given the continued drop in SSD prices, it seems only a matter of years before the majority of applications can benefit from SSD. And given that technologies such as MLC flash, inline dedupe and compression, and working set analysis can greatly accelerate this trend, the future may be closer than many people believe.