Aug 04 2015
 

Previously, I covered the basics of storage subsystem metrics and testing in my article Analyzing I/O Subsystem Performance for SQL Server, including an introduction to CrystalDiskMark 4.0. CrystalDiskMark was recently rewritten to use Microsoft DiskSpd for its testing, which makes it an even more valuable tool for your initial storage subsystem testing efforts. DiskSpd can generate a wide variety of disk request patterns, which is very helpful when diagnosing and analyzing I/O performance issues, and it is much more flexible than older benchmark tools like SQLIO. It is extremely useful for synthetic storage subsystem testing when you want a greater level of control than CrystalDiskMark offers.

Now, we are going to dive a little deeper into how to actually use Microsoft DiskSpd to test your storage subsystem without using CrystalDiskMark 4.0. In order to do this, you’ll need to download and unzip DiskSpd. To make things easier, I always copy the desired diskspd.exe executable file from the appropriate executable folder (amd64fre, armfre or x86fre) to a short, simple path like C:\DiskSpd. In most cases you will want the 64-bit version of DiskSpd from the amd64fre folder.

Once you have the diskspd.exe executable file available, you will need to open a command prompt with administrative rights (by choosing “Run as Administrator”), and then navigate to the directory where you copied the diskspd.exe file.

Here are some of the command line parameters that you will want to start out with:

Parameter  Description
-b         Block size of the I/O, specified as (K/M/G). For example, -b8K means an 8KB block size, which is relevant for SQL Server
-d         Test duration in seconds. Tests of 30-60 seconds are usually long enough to get valid results
-o         Outstanding I/Os (meaning queue depth) per target, per worker thread
-t         Worker threads per test file target
-h         Disable software caching at the operating system level and hardware write caching, which is a good idea for testing SQL Server
-r         Random or sequential flag. If -r is used, random tests are done; otherwise sequential tests are done
-w         Write percentage. For example, -w25 means 25% writes, 75% reads
-Z         Workload test write source buffer size, specified as (K/M/G). Used to supply random data for writes, which is a good idea for SQL Server testing
-L         Capture latency information during the test, which is a very good idea for testing SQL Server
-c         Creates workload file(s) of the specified size, specified as (K/M/G)

Table 1: Basic command line parameters for DiskSpd
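If you script your tests, it can help to assemble the command line from named settings rather than typing it by hand. Here is a minimal Python sketch of that idea. The function name and its keyword arguments are my own invention for illustration; only the switches themselves come from Table 1, and the sketch builds the argument list without running anything.

```python
def build_diskspd_args(target, block="8K", duration_s=30, outstanding=4,
                       threads=8, write_pct=25, random_io=True,
                       disable_caching=True, latency=True,
                       entropy="1G", create_size="20G"):
    """Return a DiskSpd argument list using only the switches from Table 1.

    The helper and its parameter names are hypothetical; they are not part
    of DiskSpd itself.
    """
    args = ["diskspd", f"-b{block}", f"-d{duration_s}",
            f"-o{outstanding}", f"-t{threads}"]
    if disable_caching:
        args.append("-h")          # disable software and hardware write caching
    if random_io:
        args.append("-r")          # random I/O; omit for sequential
    args.append(f"-w{write_pct}")  # write percentage
    if latency:
        args.append("-L")          # capture latency information
    if entropy:
        args.append(f"-Z{entropy}")      # write source buffer size
    if create_size:
        args.append(f"-c{create_size}")  # create a test file of this size
    args.append(target)            # the test file path goes last
    return args


args = build_diskspd_args(r"T:\iotest.dat")
print(" ".join(args))

# Queue depth is per thread, so total outstanding I/Os = threads * outstanding.
print("Total outstanding I/Os:", 8 * 4)
```

Because every switch is written with a plain ASCII hyphen in the source code, a command built this way avoids the character problems that can creep in when a command line is copied from a web page.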

You will also want to specify the test file location and the file name for the results at the end of the line. Here is an example command line:

diskspd -b8K -d30 -o4 -t8 -h -r -w25 -L -Z1G -c20G T:\iotest.dat > DiskSpeedResults.txt

This example command line will run a 30-second random I/O test using a 20GB test file located on the T: drive, with a 25% write and 75% read ratio and an 8K block size. It will use eight worker threads, each with four outstanding I/Os (32 outstanding I/Os in total), and a 1GB buffer of random data as the source for writes. It will save the results of the test to a text file called DiskSpeedResults.txt. This is a pretty good set of parameters for a SQL Server OLTP workload.
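One thing to watch for: a command line copied from a web page or a word processor can carry typographic dashes in place of plain hyphens, and DiskSpd will then try to open the mangled switch as a file. Here is a minimal Python sketch that checks a pasted command line for this and repairs the switches. The function names are hypothetical, and the set of dash characters it covers is my assumption about what typically gets substituted.

```python
import re

# Dash-like characters (U+2010 through U+2015, plus U+2212 minus sign) that
# often replace "-" when text passes through a web page or word processor.
# The lookbehind restricts the match to the start of a token, so dashes
# inside a file name are left alone.
_DASH_AT_SWITCH_START = re.compile(r"(?<!\S)[\u2010-\u2015\u2212]")


def normalize_switch_dashes(command_line: str) -> str:
    """Replace a typographic dash at the start of each switch with '-'."""
    return _DASH_AT_SWITCH_START.sub("-", command_line)


def has_non_ascii(command_line: str) -> bool:
    """Report whether any non-ASCII character is present at all."""
    return not command_line.isascii()


# A pasted command whose switches start with en dashes (U+2013).
pasted = "diskspd \u2013b8K \u2013d30 \u2013o4 T:\\iotest.dat"
print(has_non_ascii(pasted))            # True
fixed = normalize_switch_dashes(pasted)
print(fixed)                            # diskspd -b8K -d30 -o4 T:\iotest.dat
print(has_non_ascii(fixed))             # False
```

Retyping the command by hand works just as well; the check is only useful if you keep your test commands in documents or wiki pages and paste them often.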

Figure 1: Example command line for DiskSpd

Running the test starts with a default five second warm up time (before any measurements actually start), and then the actual test will run for the specified duration in seconds with a default cool down time of zero seconds. When the test finishes, DiskSpd will provide a description of the test and the detailed results. By default this is a simple text summary. With the redirection shown in the example, it is written to a text file with the name that you specified, in the current directory (the same directory as the diskspd executable, if you followed the steps above).

Here is what the results look like for this particular test run on my workstation.

Figure 2: Example DiskSpd test results

The first section of the results gives you the exact command line that was used for the test, then specifies all of the input parameters that were used for the test run (which include the default values that may not have been specified in the actual command line). Next, the test results are shown starting with the actual test time, thread count, and logical processor count. The CPU section shows the CPU utilization for each logical processor, including user and kernel time, for the test interval.

The more interesting part of the test results comes next. You get the total bytes, total I/Os, MB/second, I/O per second (IOPS), and your average latency in milliseconds. These results are broken out for each thread (four in the run shown in Figure 2), with separate sections in the results for Total IO, Read IO, and Write IO. The results for each thread should be very similar in most cases. Rather than initially focusing on the absolute values for each measurement, I like to run the same test on different logical drives (after changing the location of the test file in the command line) and compare the values, which shows how each logical drive performs relative to the others.
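Two bits of arithmetic are handy when reading this section: throughput is simply IOPS multiplied by the block size, and a drive-to-drive comparison is easiest to read as a percent change. Here is a minimal Python sketch of both. The drive figures are made up for illustration and are not taken from my test run, the helper names are hypothetical, and the sketch assumes that the MB/second column counts 1 MB as 1,048,576 bytes.

```python
def mib_per_second(iops: float, block_size_bytes: int) -> float:
    """Throughput implied by an IOPS figure at a fixed block size.

    Assumes 1 MB = 1024 * 1024 bytes.
    """
    return iops * block_size_bytes / (1024 * 1024)


def percent_change(baseline: float, candidate: float) -> float:
    """Percent change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


BLOCK_SIZE = 8 * 1024  # -b8K

# Hypothetical results from running the same test against two drives.
drive_t = {"iops": 8000.0, "avg_latency_ms": 4.0}
drive_s = {"iops": 16000.0, "avg_latency_ms": 2.0}

print(mib_per_second(drive_t["iops"], BLOCK_SIZE))   # 62.5
print(mib_per_second(drive_s["iops"], BLOCK_SIZE))   # 125.0
print(percent_change(drive_t["iops"], drive_s["iops"]))                      # 100.0
print(percent_change(drive_t["avg_latency_ms"], drive_s["avg_latency_ms"]))  # -50.0
```

Keep every other parameter identical between runs, so that the percent change reflects the drive and not the test.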

The last section of the test results is even more interesting. It shows a percentile analysis of the distribution of the latency test results starting from the minimum value in milliseconds going up to the maximum value in milliseconds, broken out for reads, writes, and total latency. The “nines” in the %-ile column refer to the number of nines, where 3-nines means 99.9, 4-nines means 99.99, etc. The values for the higher percentile rows are the same because this test had a relatively low number of total operations, so those rows all resolve to the same few slowest I/Os. If you want to accurately characterize the higher percentiles, you will have to run a longer duration test that generates a higher number of separate I/O operations.

What you want to look for in these results is the point where the values make a large jump. For example, in this test we can see that 99% of the reads had a latency of 1.832 milliseconds or less.
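To see why the highest percentile rows collapse to the same value when there are few operations, here is a minimal Python sketch using the nearest-rank percentile method on made-up latency samples. The nearest-rank method is my assumption for illustration (I am not claiming it is how DiskSpd computes its table), and the function name and sample values are hypothetical.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(sorted_latencies_ms, percentile: str) -> float:
    """Nearest-rank percentile of an ascending list of latencies.

    The percentile is passed as a string such as "99.9" so that it can be
    converted to an exact fraction, which avoids floating-point rounding
    when computing the rank.
    """
    fraction = Fraction(percentile) / 100
    rank = max(1, math.ceil(fraction * len(sorted_latencies_ms)))
    return sorted_latencies_ms[rank - 1]


# 1,000 hypothetical samples: most are fast, a few are slow, one is very slow.
latencies_ms = sorted([1.0] * 990 + [5.0] * 9 + [50.0])

for label in ("50", "99", "99.9", "99.99", "99.999"):
    print(f"{label:>7}  {nearest_rank_percentile(latencies_ms, label)}")
print(f"{'max':>7}  {latencies_ms[-1]}")
```

With only 1,000 samples, the 4-nines and 5-nines rows both land on the single slowest I/O, so they match the maximum. The jump from 1.0 to 5.0 between the 99 and 3-nines rows is the kind of large step to look for in the real results.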

Figure 3: Latency results distribution

As you can see, running DiskSpd is actually pretty simple once you understand what the basic parameters mean and how they are used. Not only can you run DiskSpd from an old-fashioned command line, you can also run it using PowerShell. DiskSpd also gives you a lot more detailed information than you get from SQLIO. The more complicated part of using DiskSpd is analyzing and interpreting the results, which is something I will cover in a future article.

  24 Responses to “Using Microsoft DiskSpd to Test Your Storage Subsystem”

  1. Is DiskSpd a rebranding of SQLIO?

  2. Nice post Glenn. Nice tool.

  3. Ismail,
    No, DiskSpd is not a rebranding of SQLIO. It is a completely different testing tool.

    Mike,
    Thanks! Glad you liked it.

  4. Nice post. I like to use Diskspd for testing basic storage performance on slow SQL server systems. It is fast, easy to use, and the results are usually pretty clear: Very high latency, low number of IOPS, etc.

  5. Should the SQL Server be idle when this is used? I am looking to test new storage and want to benchmark our existing SAN in production and then compare to the same tests performed on evaluation storage that my organisation will be looking to purchase. Should I have the production SQL instances shut down and no other work being performed on them whilst testing?

  6. Nice 1! Been having issues with our SAN. SAN Team will be reconfiguring for better results. Will be nice to do some before\after testing.

  7. MonkeyButler,

    That depends on what sort of test you decide to run. An intense DiskSpd test that runs for more than a few seconds can definitely affect your storage subsystem in a negative way (by putting a heavy load on it). The I/O activity from your normal SQL Server workload will also affect your benchmark results. Ideally, you would want to run DiskSpd testing when there is no production workload from SQL Server, to eliminate both possibilities. It would be a more valid test to compare your current system to the evaluation system.

  8. Whit,

    That is a good use case for DiskSpd. You can also run CrystalDiskMark before and after, just to have some more data points.

  9. Is it possible to run this on a server with more than 64 processors? I am receiving the following error.

    C:\…\amd64fre>diskspd -b8K -d120 -o4 -t8 -h -r -w25 -L -Z1G -c20G R:\io.dat > DiskSpeedResults.txt
    WARNING: Complete CPU utilization cannot currently be gathered within DISKSPD for this system.
        Use alternate mechanisms to gather this data such as perfmon/logman.
        Active KGroups 2 > 1 and/or processor count 80 > 64.
    Error opening file: ûd120 [2]

    There has been an error during threads execution
    Error generating I/O requests
    • @Maxwell I had the same problem and it was due to cut and paste from the web, try to manually retype the whole command line

  10. This is just what I needed, when I needed it! I've used SQLIO before, but the output of this is so much better. Thank you!

  11. Maxwell,

    From that error message, it appears that not working with more than 64 cores is a known issue or is by design. I have reached out to some people at Microsoft to try to find out more information. Thanks!

    • Hi Glenn,
      it could be that it will not work with more than 64bits, but the specific "error accessing file" happened to me trying to cut and paste the commands from internet.
      The reference to "ûd120" (with a strange character near the d120 that is supposed to be -d120 parameter) helped me understanding that something was wrong in the parameters' handling

  12. Norberto,

    I think that Maxwell is having a different issue than you had. It looks like his system has 80 cores, and DiskSpd has a problem with that. It does not seem to have anything to do with 32-bit vs. 64-bit

    • After further testing, you are correct Glenn. While I did remedy the strange character issue by manually keying the command, the issue with > 64 CPU cores is real. Keep in mind the total core count is due to HyperThreading. The server has 40 physical cores and 80 logical processors. The tool works perfectly on a 48 logical core server. Thanks!

  13. diskspd is not only a freely available tool released by Microsoft, but the source is also freely available via GitHub:
    https://github.com/microsoft/diskspd
    That means that DiskSpd can only use publicly available APIs to obtain its information, and unfortunately, the API that is called to efficiently get CPU stats doesn't yet support > 64 CPUs.
    I'm told that this is being worked on in Windows Server 2016, but can't say for sure what form that will take.
    So, for now, systems beyond 64 cores will need to use Perfmon to gather CPU statistics during a run, but the other information should be accurate.

  14. Nice post Glenn. Would it be possible to configure a different block size for writes and reads? I have a SQL box subsystem to test with 25KB for writes and 46KB for reads, and then measure throughput in MB to compare it with the old disk subsystem.
    I have thought to test it in two stages and then combine just want to know if there is already better way.

  15. The other comment I got from the guys that wrote diskspd is that you can probably get by with a lot smaller entropy seed. You're using -Z1G. Is there a particular reason for allocating a gig for the entropy buffer? SQLIO used a buffer of about 20MB.

  16. Just wanted to thank you for this post; I will be testing and incorporating it myself into my toolbag. Thanks.

  17. Can Diskspd generate all unique data or unique blocks of data with the diskspd test file? Is this the -Z parameter? Seems like FIO is the only tool capable of doing this.
    IOmeter can't generate unique data either.

  18. Thx for the post. Your screenshot shows -T4, which I think means 4 threads, but you mention 8 in the text preceding the screenshot. The example does show -T8 however :)

  19. Thx! Very helpful post. Would you be able to provide a resource for ballpark metrics required for a healthy SQL environment? For example, what is the min IOPS or min MB/s for an OLTP SQL instance? Or guidance on 3/4/5 9s latency?

    Again, many thanks
