[Migrated from savannah BTS] - bug 26402 [https://savannah.nongnu.org/bugs/?26402]

Wed 29 Apr 2009 10:23:09 PM GMT, original submission by Erick Tryzelaar <erickt>:

I was debugging some poor performance issues with stripe across multiple machines, and I found a particularly bad edge case. I added a stripe volume, but with only one subvolume, and this resulted in terrible performance. If I connect directly to the server, I can pull off about 50-70MB/s, but when I add the null-stripe, write performance drops to 3-5MB/s. Reads, however, reach 112MB/s, which is the same as we get without striping.

--------------------------------------------------------------------------------

Thu 30 Apr 2009 04:45:21 AM GMT, comment #1 by Raghavendra <raghavendra>:

What stripe size is being used, and what are the chunk sizes of the writes? With (stripe-size < write-chunk-size) and only one subvolume, performance can degrade because a single write is split into multiple chunks that are handled serially on the server. Also, since the optimal stripe size should let each write be split into equal chunks stored across the subvolumes, it makes sense to set the stripe size equal to the write chunk size.

--------------------------------------------------------------------------------

Fri 01 May 2009 11:19:00 PM GMT, comment #2 by Erick Tryzelaar <erickt>:

I believe it was with just the default arguments, which I believe means a stripe-size of 128KB. How can I check what the write-chunk-size is? Here's my server.vol file:

volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume io-threads
  type performance/io-threads
  option thread-count 16
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.io-threads.allow *
  subvolumes io-threads
end-volume

and my client.vol file:

volume machine01
  type protocol/client
  option transport-type tcp
  option remote-host machine01
  option remote-subvolume io-threads
end-volume

volume stripe
  type cluster/stripe
  subvolumes machine01
end-volume

If I add a write-behind, writes go up to 113MB/s:

volume write-behind
  type performance/write-behind
  option cache-size 1MB
# subvolumes stripe
  subvolumes read-ahead
end-volume

I've also compared different block sizes and their read/write performance, without any of the client performance translators:

| block size | write  | read    |
| 1GB        | 62MB/s | 113MB/s |
| 128MB      | 62MB/s | 113MB/s |
| 64MB       | 64MB/s | 113MB/s |
| 32MB       | 60MB/s | 113MB/s |
| 16MB       | 55MB/s | 113MB/s |
| 8MB        | 50MB/s | 113MB/s |
| 4MB        | 45MB/s | 113MB/s |
| 1MB        | 23MB/s | 113MB/s |
| 512KB      | 12MB/s | 113MB/s |
| 256KB      | 8MB/s  | 113MB/s |
| 128KB      | 4MB/s  | 113MB/s |
| 64KB       | 2MB/s  | 113MB/s |
| default    | 4MB/s  | 113MB/s |

So it appears that these performance issues really start around a block size of 16-32MB. I'm guessing that's still much larger than the write-chunk-size, which would correlate with some other performance problems I've seen with a stripe of 4 machines. There, the overhead of striping limits the maximum rate a client can read unless the stripe-size is greater than 16MB. I'll run some numbers and file another bug about that.
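For reference, the stripe chunk size can also be set explicitly rather than relying on the 128KB default. A minimal sketch against the client.vol above, assuming the cluster/stripe translator accepts a block-size option in this form (the option name and syntax are an assumption here; verify them against the documentation for the GlusterFS version in use):

volume stripe
  type cluster/stripe
  # assumed option; sets the stripe chunk size (default 128KB)
  option block-size 1MB
  subvolumes machine01
end-volume

Setting this equal to the application's write chunk size matches the advice in comment #1.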
Were the block sizes given with the MB suffix during benchmarking with dd (I am assuming dd was used here)? If so, the B suffix corresponds to 1000 bytes, not 1024, which causes writes at non-standard offsets. Writes at offsets that are multiples of 128K are more efficient in the case of stripe, hence the performance degradation. I checked dd with and without the B suffix in the block size; for block sizes specified in multiples of 1024 bytes, there was no performance degradation between striped and non-striped glusterfs. As you mentioned earlier, with write-behind there is a performance improvement. Write-behind is used to overcome the performance degradation caused by writes at non-standard offsets, so we suggest using write-behind along with stripe. Can this bug be closed with the resolution that stripe should be used with write-behind?
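To illustrate the suffix difference with dd itself (the mount point and count below are hypothetical, for illustration only): in GNU dd the plain suffixes K/M mean powers of 1024, while kB/MB mean powers of 1000, so an MB block size never stays aligned to a 128K stripe boundary.

# hypothetical mount point; only the bs= suffix differs between the two runs
dd if=/dev/zero of=/mnt/glusterfs/testfile bs=1MB count=1024   # 1,000,000-byte blocks: write offsets drift off the 128K stripe boundaries
dd if=/dev/zero of=/mnt/glusterfs/testfile bs=1M  count=1024   # 1,048,576-byte blocks: every write covers exactly eight 128K stripe chunks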
Sending a mail to Erick Tryzelaar <idadesub.net> with the same question as in the previous comment.
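A minimal sketch of the suggested client-side stack, reusing the volume names from the client.vol above, with write-behind loaded directly on top of stripe so that small or unaligned writes are aggregated before they reach the stripe translator:

volume machine01
  type protocol/client
  option transport-type tcp
  option remote-host machine01
  option remote-subvolume io-threads
end-volume

volume stripe
  type cluster/stripe
  subvolumes machine01
end-volume

volume write-behind
  type performance/write-behind
  # same cache-size the reporter used; tune as needed
  option cache-size 1MB
  # loaded over stripe, unlike the reporter's read-ahead stacking
  subvolumes stripe
end-volume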
Resolving the bug, hoping that the user is OK with the suggested solution.