Copyright: Copyright © 2008 Jon Nelson
Date: Jul 2008
This is an expansion of a previous post ( http://pycurious.blogspot.com/2007/12/some-raid10-performance-numbers.html ).
Since that time, I have redeployed using RAID10,f2. The redeployment went very well, but I'm not quite getting the performance I desired. More on that in another post. In the meantime, I slightly enhanced one of my benchmark scripts and decided to give it a go again.
1 Hardware and Setup
- the kernel is 2.6.25.5, the openSUSE 11.0 "default" kernel for x86-64
- the CPU is an AMD Athlon 64 X2 3600+ (x86-64) in power-saving mode (1000 MHz)
- the motherboard is an EPoX MF570SLI which uses the nVidia MCP55 SATA controller (PCIe).
- in contrast to an earlier test, this time there are 4 drives - 4 different makes of SATA II, 7200 RPM drives.
- each drive is capable of not much more than 80 MB/s (at best - the outermost tracks) and, on average, more like 70 MB/s for the portions of the disk involved in these tests
- the raid arrays are built from 4x 4GB partitions, one per drive, all within the first 8GB of each disk (a sketch of the array creation appears after this list).
- the system was largely idle but is a live system
- the system has 1 GB of RAM
- in contrast to the earlier test, the 'cfq' scheduler was used. I forgot to change it.
- the stripe cache sizes, queue sizes, and flusher parameters were left at their defaults
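For reference, here is a minimal sketch of how one such array can be built with mdadm. This is not the actual test script; the md device name and partition names are only illustrative:

    # Illustrative only: create a 4-disk raid10,f2 array with a 256K chunk.
    # The partitions stand in for the four 4GB test partitions described above.
    mdadm --create /dev/md0 --level=10 --layout=f2 --chunk=256 \
          --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
    # raid5/raid6 arrays are created the same way, e.g. --level=5 --layout=left-symmetric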
2 Important Notes
the caches were dropped before each invocation of 'dd':
    echo 3 > /proc/sys/vm/drop_caches
the 'write' portion of the test used conv=fdatasync
I did not test filesystem performance. This is just about the edge capabilities of linux RAID in various configurations.
I did not use iflag=direct (which sets O_DIRECT)
I ran each test 5 times, taking the mean. (A sketch of one full timed run follows these notes.)
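Putting those notes together, each timed run looked roughly like this. This is a sketch only; the device name and transfer size are illustrative, not the script's actual values:

    # Streaming read: drop the page cache, then read from the raw md device.
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/dev/md0 of=/dev/null bs=1M count=4096
    # Streaming write: drop caches again, then write; conv=fdatasync forces
    # the data out to the array before dd reports its timing.
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/dev/zero of=/dev/md0 bs=1M count=4096 conv=fdatasync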
3 Questions
Initially, I just wanted to run a bunch of tests and eyeball the results. It's easy to do that and draw conclusions from the data. However, it is perhaps more useful to ask yourself, "What questions can be answered?" Here are a few questions I came up with, along with my answers:
What did you really test?
Basically I tested streaming read and write performance to a series of raid levels and formats, using different chunk sizes for each.
I did not want to use any filesystem, which would only get in the way for this kind of test - I wasn't testing the filesystem; I was testing how different raid formats, layouts, and chunk sizes make a difference.
A future installment may include filesystem testing as well, which I find just as important, if not more so; however, it's so much more variable that I'm not really sure much sense can be found in the noise.
Why didn't you include my-favorite-raid-level?
I only wanted to include raid levels for which there is some redundancy. I could have included raid 1+0 but my test script is not sufficiently smart for that. Perhaps I'll include that in a future installment.
Can I have the source to the test program?
Sure. I'll try to make it available if somebody asks, but it's really nothing special. Furthermore, it's my intent to refine it a bit to support filesystem testing (preferably via bonnie++ or iozone) and so on.
When using raid5, does the format matter?
If you squint your eyes a bit, the write numbers, regardless of format, were all pretty close. The read performance was more variable, but still did not vary all that much. Chunk size seemed to matter more. Left-symmetric did the best overall, however.
How does the graphed performance compare with the predicted performance?
Left to the reader to comment!
Did you tune the readahead settings for the test?
No. I left them at their defaults.
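For anyone who wants to check or adjust them, readahead is visible through blockdev. A sketch follows; /dev/md0 is just an example device name, and values are in 512-byte sectors:

    blockdev --getra /dev/sdb        # per-disk readahead
    blockdev --getra /dev/md0        # array readahead
    # blockdev --setra 4096 /dev/md0 # example of raising it; not done for these tests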
What are you using to generate this?
I am using reStructuredText, combined with Pygments.
Which tool do you use to make graphs?
Google Charts (by way of pygooglechart), plus a bunch of shell and Python. I used flot previously.
How do the individual drives perform?
The drives are:
<6>ata3.00: ATA-7: Hitachi HDT725032VLA360, V54OA52A, max UDMA/133
<6>ata4.00: ATA-8: SAMSUNG HD321KJ, CP100-10, max UDMA7
<6>ata5.00: ATA-7: ST3320620AS, 3.AAK, max UDMA/133
<6>ata6.00: ATA-8: WDC WD3200AAKS-75VYA0, 12.01B02, max UDMA/133
And their performance:
/dev/sdb: Timing buffered disk reads: 218 MB in 3.01 seconds = 72.44 MB/sec
/dev/sdc: Timing buffered disk reads: 234 MB in 3.00 seconds = 77.92 MB/sec
/dev/sdd: Timing buffered disk reads: 228 MB in 3.02 seconds = 75.60 MB/sec
/dev/sde: Timing buffered disk reads: 234 MB in 3.02 seconds = 77.57 MB/sec
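These figures look like the output of hdparm's buffered-read test; they were presumably gathered with something like the following:

    # Hypothetical reproduction of the per-drive numbers above:
    for d in /dev/sd{b,c,d,e}; do
        hdparm -t "$d"
    done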
What difference does the scheduler make?
As can clearly be seen on the RAID5 graphs, the IO scheduler can make a big difference. Using cfq or noop, reads start out almost a full point faster than the others, and writes are 1/2 point faster.
On the other hand, for RAID6, the scheduler doesn't seem to make much difference at all. At least for streaming reads/writes, which is all I'm testing here.
For RAID10,n2 and RAID10,o2 the story is the same as for RAID6, but there is some impact (up to 1.0 points!) for RAID10,f2.
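For the record, the scheduler is selected per block device through sysfs; presumably something along these lines was done between runs (sdb through sde are the four drives in this box):

    # Example: switch all four member disks to the deadline scheduler.
    for d in sdb sdc sdd sde; do
        echo deadline > /sys/block/$d/queue/scheduler
    done
    cat /sys/block/sdb/queue/scheduler   # the active scheduler is shown in brackets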
What revisions have you made to this document?
I re-ran the tests to include 2048K chunk sizes, and removed 128K as it wasn't very interesting and it cluttered up the graphs.
I also re-ran the entire set of tests for the other three schedulers, noop, anticipatory, and deadline.
I re-did the graphs using the Google Charts API (by way of pygooglechart) instead of flot. There was nothing wrong with flot; in fact, I found it really nice to use, but some people found the Google charts "prettier" and it's somewhat easier for me to use.
4 Unanswered Questions
While I don't have the data in this article, I did originally perform these tests on 2.6.22.18. The results were rather noisier, and in most cases a bit worse.
Why aren't raid10,f2 reads getting closer to 4.0x?
What's with the strange drop in performance at 512K chunk sizes for RAID10,f2 for the deadline and noop schedulers, only to rise again at 1024K (and then drop at 2048K)?
Why are raid10,o2 reads so AWFUL?
Neil Brown was kind enough to suggest re-running with a larger chunk size, which I did.
Read performance did, indeed, improve - up to the 3.0 mark, in fact.
Why do raid6 reads behave the way they do? I would have expected a more linear graph - the raid6 write graph is very smooth.
From 64 to 256k chunk size, there is little change (in either direction, for reads or writes) but at 512K the reads really improve and continue to do so as the chunk size increases.
What should the theoretical performance of the various raid levels and formats look like?
For raid10,f2 I would suspect that 4.0 would be perfect (for reads), and for sustained writes something like 1.5.
I get 1.5 like this:
The average speed of writing a given chunk of data is the average of writing to an outer track and writing to an inner track: (70 + 35) / 2.0 (assuming the inner tracks are half the speed of the outer tracks). Theoretically we can write to 2 devices at a time, so ((70 + 35) / 2.0) * 2.0 / 70.0 = 1.5x.
In reality, we do a bit better than that, probably due to the fact that I'm not using the whole disk and therefore the speed of the inner tracks of the region I'm actually using is greater than would otherwise be true.
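To make that arithmetic explicit (same assumptions as above: 70 MB/s on the outer tracks, 35 MB/s on the inner tracks, and two devices written at once):

    # Expected raid10,f2 streaming-write multiple, relative to one drive.
    outer=70; inner=35
    echo "scale=2; (($outer + $inner) / 2) * 2 / $outer" | bc
    # -> 1.50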
5 Tables, Charts n Graphs
The following results are expressed relative to a single drive (the baseline), with 1.0 being the speed of a single drive (about 70 MB/s).
scheduler | level | layout | chunk (KB) | writing | reading |
---|---|---|---|---|---|
cfq | raid10 | f2 | 64 | 1.48 | 3.01 |
cfq | raid10 | f2 | 128 | 1.49 | 3.88 |
cfq | raid10 | f2 | 256 | 1.50 | 3.68 |
cfq | raid10 | f2 | 512 | 1.52 | 3.65 |
cfq | raid10 | f2 | 1024 | 1.47 | 3.76 |
cfq | raid10 | f2 | 2048 | 1.52 | 3.73 |
cfq | raid10 | n2 | 64 | 1.78 | 1.89 |
cfq | raid10 | n2 | 128 | 1.85 | 1.87 |
cfq | raid10 | n2 | 256 | 1.82 | 2.00 |
cfq | raid10 | n2 | 512 | 1.84 | 2.15 |
cfq | raid10 | n2 | 1024 | 1.83 | 2.42 |
cfq | raid10 | n2 | 2048 | 1.83 | 2.70 |
cfq | raid10 | o2 | 64 | 1.83 | 1.96 |
cfq | raid10 | o2 | 128 | 1.80 | 1.96 |
cfq | raid10 | o2 | 256 | 1.84 | 1.98 |
cfq | raid10 | o2 | 512 | 1.80 | 1.98 |
cfq | raid10 | o2 | 1024 | 1.83 | 2.49 |
cfq | raid10 | o2 | 2048 | 1.80 | 3.13 |
cfq | raid5 | left-asymmetric | 64 | 1.72 | 2.51 |
cfq | raid5 | left-asymmetric | 128 | 1.67 | 2.79 |
cfq | raid5 | left-asymmetric | 256 | 1.52 | 2.92 |
cfq | raid5 | left-asymmetric | 512 | 1.31 | 2.76 |
cfq | raid5 | left-asymmetric | 1024 | 1.06 | 3.44 |
cfq | raid5 | left-asymmetric | 2048 | 0.56 | 3.25 |
cfq | raid5 | left-symmetric | 64 | 1.74 | 2.71 |
cfq | raid5 | left-symmetric | 128 | 1.73 | 2.76 |
cfq | raid5 | left-symmetric | 256 | 1.55 | 2.97 |
cfq | raid5 | left-symmetric | 512 | 1.34 | 2.88 |
cfq | raid5 | left-symmetric | 1024 | 1.08 | 3.44 |
cfq | raid5 | left-symmetric | 2048 | 0.58 | 3.50 |
cfq | raid5 | right-asymmetric | 64 | 1.75 | 2.70 |
cfq | raid5 | right-asymmetric | 128 | 1.61 | 2.88 |
cfq | raid5 | right-asymmetric | 256 | 1.58 | 2.88 |
cfq | raid5 | right-asymmetric | 512 | 1.28 | 2.88 |
cfq | raid5 | right-asymmetric | 1024 | 1.04 | 3.25 |
cfq | raid5 | right-asymmetric | 2048 | 0.54 | 3.31 |
cfq | raid5 | right-symmetric | 64 | 1.75 | 2.79 |
cfq | raid5 | right-symmetric | 128 | 1.69 | 2.81 |
cfq | raid5 | right-symmetric | 256 | 1.56 | 2.88 |
cfq | raid5 | right-symmetric | 512 | 1.30 | 2.75 |
cfq | raid5 | right-symmetric | 1024 | 1.01 | 3.02 |
cfq | raid5 | right-symmetric | 2048 | 0.49 | 3.24 |
cfq | raid6 | - | 64 | 1.30 | 1.76 |
cfq | raid6 | - | 128 | 1.24 | 1.96 |
cfq | raid6 | - | 256 | 1.17 | 1.91 |
cfq | raid6 | - | 512 | 1.04 | 2.70 |
cfq | raid6 | - | 1024 | 0.87 | 2.92 |
cfq | raid6 | - | 2048 | 0.60 | 3.31 |
deadline | raid10 | f2 | 64 | 1.78 | 2.63 |
deadline | raid10 | f2 | 256 | 1.82 | 3.80 |
deadline | raid10 | f2 | 512 | 1.72 | 3.32 |
deadline | raid10 | f2 | 1024 | 1.75 | 3.61 |
deadline | raid10 | f2 | 2048 | 1.47 | 3.40 |
deadline | raid10 | n2 | 64 | 1.96 | 1.21 |
deadline | raid10 | n2 | 256 | 1.88 | 1.85 |
deadline | raid10 | n2 | 512 | 1.84 | 2.10 |
deadline | raid10 | n2 | 1024 | 1.89 | 2.41 |
deadline | raid10 | n2 | 2048 | 1.84 | 2.59 |
deadline | raid10 | o2 | 64 | 1.80 | 1.94 |
deadline | raid10 | o2 | 256 | 1.82 | 1.96 |
deadline | raid10 | o2 | 512 | 1.73 | 1.94 |
deadline | raid10 | o2 | 1024 | 1.87 | 2.63 |
deadline | raid10 | o2 | 2048 | 1.82 | 3.13 |
deadline | raid5 | left-asymmetric | 64 | 1.67 | 2.55 |
deadline | raid5 | left-asymmetric | 256 | 1.43 | 2.84 |
deadline | raid5 | left-asymmetric | 512 | 1.22 | 2.76 |
deadline | raid5 | left-asymmetric | 1024 | 1.04 | 3.27 |
deadline | raid5 | left-asymmetric | 2048 | 0.52 | 3.31 |
deadline | raid5 | left-symmetric | 64 | 1.61 | 2.32 |
deadline | raid5 | left-symmetric | 256 | 1.42 | 2.89 |
deadline | raid5 | left-symmetric | 512 | 1.26 | 2.89 |
deadline | raid5 | left-symmetric | 1024 | 1.08 | 3.14 |
deadline | raid5 | left-symmetric | 2048 | 0.55 | 3.31 |
deadline | raid5 | right-asymmetric | 64 | 1.68 | 2.15 |
deadline | raid5 | right-asymmetric | 256 | 1.50 | 2.88 |
deadline | raid5 | right-asymmetric | 512 | 1.23 | 2.83 |
deadline | raid5 | right-asymmetric | 1024 | 0.97 | 3.44 |
deadline | raid5 | right-asymmetric | 2048 | 0.47 | 3.24 |
deadline | raid5 | right-symmetric | 64 | 1.64 | 2.11 |
deadline | raid5 | right-symmetric | 256 | 1.50 | 2.84 |
deadline | raid5 | right-symmetric | 512 | 1.22 | 2.83 |
deadline | raid5 | right-symmetric | 1024 | 1.00 | 3.02 |
deadline | raid5 | right-symmetric | 2048 | 0.43 | 3.19 |
deadline | raid6 | - | 64 | 1.22 | 1.73 |
deadline | raid6 | - | 256 | 1.20 | 1.75 |
deadline | raid6 | - | 512 | 1.04 | 2.45 |
deadline | raid6 | - | 1024 | 0.89 | 3.19 |
deadline | raid6 | - | 2048 | 0.57 | 3.32 |
anticipatory | raid10 | f2 | 64 | 1.62 | 2.59 |
anticipatory | raid10 | f2 | 128 | 1.59 | 3.50 |
anticipatory | raid10 | f2 | 256 | 1.61 | 3.46 |
anticipatory | raid10 | f2 | 512 | 1.65 | 3.73 |
anticipatory | raid10 | f2 | 1024 | 1.61 | 3.58 |
anticipatory | raid10 | f2 | 2048 | 1.47 | 3.80 |
anticipatory | raid10 | n2 | 64 | 1.87 | 1.21 |
anticipatory | raid10 | n2 | 128 | 1.83 | 1.45 |
anticipatory | raid10 | n2 | 256 | 1.83 | 1.90 |
anticipatory | raid10 | n2 | 512 | 1.83 | 2.20 |
anticipatory | raid10 | n2 | 1024 | 1.82 | 2.45 |
anticipatory | raid10 | n2 | 2048 | 1.82 | 2.70 |
anticipatory | raid10 | o2 | 64 | 1.82 | 1.91 |
anticipatory | raid10 | o2 | 128 | 1.85 | 1.94 |
anticipatory | raid10 | o2 | 256 | 1.86 | 2.05 |
anticipatory | raid10 | o2 | 512 | 1.80 | 1.96 |
anticipatory | raid10 | o2 | 1024 | 1.83 | 2.63 |
anticipatory | raid10 | o2 | 2048 | 1.78 | 3.19 |
anticipatory | raid5 | left-asymmetric | 64 | 1.62 | 2.42 |
anticipatory | raid5 | left-asymmetric | 128 | 1.59 | 2.63 |
anticipatory | raid5 | left-asymmetric | 256 | 1.48 | 2.79 |
anticipatory | raid5 | left-asymmetric | 512 | 1.32 | 2.88 |
anticipatory | raid5 | left-asymmetric | 1024 | 1.10 | 3.37 |
anticipatory | raid5 | left-asymmetric | 2048 | 0.54 | 3.25 |
anticipatory | raid5 | left-symmetric | 64 | 1.67 | 2.49 |
anticipatory | raid5 | left-symmetric | 128 | 1.62 | 2.76 |
anticipatory | raid5 | left-symmetric | 256 | 1.52 | 2.83 |
anticipatory | raid5 | left-symmetric | 512 | 1.32 | 2.76 |
anticipatory | raid5 | left-symmetric | 1024 | 1.10 | 3.32 |
anticipatory | raid5 | left-symmetric | 2048 | 0.58 | 3.25 |
anticipatory | raid5 | right-asymmetric | 64 | 1.67 | 2.17 |
anticipatory | raid5 | right-asymmetric | 128 | 1.55 | 2.63 |
anticipatory | raid5 | right-asymmetric | 256 | 1.48 | 2.76 |
anticipatory | raid5 | right-asymmetric | 512 | 1.30 | 2.92 |
anticipatory | raid5 | right-asymmetric | 1024 | 1.09 | 3.37 |
anticipatory | raid5 | right-asymmetric | 2048 | 0.52 | 3.37 |
anticipatory | raid5 | right-symmetric | 64 | 1.72 | 2.19 |
anticipatory | raid5 | right-symmetric | 128 | 1.67 | 2.63 |
anticipatory | raid5 | right-symmetric | 256 | 1.47 | 2.88 |
anticipatory | raid5 | right-symmetric | 512 | 1.32 | 2.88 |
anticipatory | raid5 | right-symmetric | 1024 | 1.07 | 3.02 |
anticipatory | raid5 | right-symmetric | 2048 | 0.47 | 3.20 |
anticipatory | raid6 | - | 64 | 1.26 | 1.75 |
anticipatory | raid6 | - | 128 | 1.22 | 1.67 |
anticipatory | raid6 | - | 256 | 1.19 | 1.77 |
anticipatory | raid6 | - | 512 | 1.03 | 2.59 |
anticipatory | raid6 | - | 1024 | 0.91 | 3.08 |
anticipatory | raid6 | - | 2048 | 0.58 | 3.24 |
noop | raid10 | f2 | 64 | 1.40 | 2.71 |
noop | raid10 | f2 | 256 | 1.42 | 3.80 |
noop | raid10 | f2 | 512 | 1.42 | 3.38 |
noop | raid10 | f2 | 1024 | 1.42 | 3.65 |
noop | raid10 | f2 | 2048 | 1.46 | 3.38 |
noop | raid10 | n2 | 64 | 1.84 | 1.21 |
noop | raid10 | n2 | 256 | 1.83 | 1.90 |
noop | raid10 | n2 | 512 | 1.85 | 2.18 |
noop | raid10 | n2 | 1024 | 1.85 | 2.45 |
noop | raid10 | n2 | 2048 | 1.83 | 2.55 |
noop | raid10 | o2 | 64 | 1.82 | 1.90 |
noop | raid10 | o2 | 256 | 1.85 | 1.92 |
noop | raid10 | o2 | 512 | 1.80 | 1.97 |
noop | raid10 | o2 | 1024 | 1.62 | 2.63 |
noop | raid10 | o2 | 2048 | 1.78 | 3.13 |
noop | raid5 | left-asymmetric | 64 | 1.75 | 2.63 |
noop | raid5 | left-asymmetric | 256 | 1.62 | 2.92 |
noop | raid5 | left-asymmetric | 512 | 1.37 | 2.92 |
noop | raid5 | left-asymmetric | 1024 | 1.09 | 3.32 |
noop | raid5 | left-asymmetric | 2048 | 0.54 | 3.50 |
noop | raid5 | left-symmetric | 64 | 1.78 | 2.20 |
noop | raid5 | left-symmetric | 256 | 1.62 | 2.88 |
noop | raid5 | left-symmetric | 512 | 1.37 | 2.88 |
noop | raid5 | left-symmetric | 1024 | 1.12 | 3.25 |
noop | raid5 | left-symmetric | 2048 | 0.58 | 3.37 |
noop | raid5 | right-asymmetric | 64 | 1.78 | 2.23 |
noop | raid5 | right-asymmetric | 256 | 1.61 | 2.97 |
noop | raid5 | right-asymmetric | 512 | 1.38 | 2.89 |
noop | raid5 | right-asymmetric | 1024 | 1.04 | 3.30 |
noop | raid5 | right-asymmetric | 2048 | 0.52 | 3.25 |
noop | raid5 | right-symmetric | 64 | 1.78 | 2.29 |
noop | raid5 | right-symmetric | 256 | 1.65 | 2.84 |
noop | raid5 | right-symmetric | 512 | 1.38 | 2.92 |
noop | raid5 | right-symmetric | 1024 | 1.09 | 3.03 |
noop | raid5 | right-symmetric | 2048 | 0.47 | 3.19 |
noop | raid6 | - | 64 | 1.29 | 1.72 |
noop | raid6 | - | 256 | 1.21 | 1.84 |
noop | raid6 | - | 512 | 1.05 | 2.56 |
noop | raid6 | - | 1024 | 0.88 | 3.08 |
noop | raid6 | - | 2048 | 0.61 | 3.31 |
7 comments:
You can also try:
RAID5 via Google Charts,
RAID6, and
RAID10.
Why are raid10,o2 reads so AWFUL?
To get a read boost with o2, you need to have chunks large enough that they totally cover one or more cylinders. That way the drive can seek over skipped chunks rather than read over them.
If your chunks are exactly cylinder sized and perfectly aligned you would get close to a factor of 2. But perfect alignment is impossible with today's drives. So the best you can get is having the chunk size a little over twice the cylinder size. Then you will also skip one cylinder and sometimes two. If your chunksize is between 1 and 2 cylinders you will sometimes skip one cylinder, and so get a partial speedup. I think you are seeing that with the chunksize of 1024. Try 2048!
neilbrown: I adjusted the configuration to include 2048 (and exclude 128) and re-ran the tests. I'll be putting the results up in the next day or so!
I'd be interested in a graph where the best performing raid5, raid6, raid10 configs were stacked up against each other.
Thanks,
Leif
I was also thinking of another graph: all of the raid levels and layouts on the x-axis for just one chunk size. Probably 64, 512, and 2048K.
broken images!
the html page is ../2008/07/.. and the img files are referred to as being in a subdirectory. But actually the image files are in a subdirectory of ../2007/08/.. so they don't appear!
Broken images fixed.