Bug 761799 (GLUSTER-67) - poor stripe write performance with one subvolume
Summary: poor stripe write performance with one subvolume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-67
Product: GlusterFS
Classification: Community
Component: stripe
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-06-25 06:54 UTC by Basavanagowda Kanur
Modified: 2009-08-04 05:12 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Basavanagowda Kanur 2009-06-25 06:54:57 UTC
[Migrated from savannah BTS] - bug 26402 [https://savannah.nongnu.org/bugs/?26402]

Wed 29 Apr 2009 10:23:09 PM GMT, original submission by Erick Tryzelaar <erickt>:

I was debugging some poor performance issues with stripe across multiple machines and found a particularly bad edge case. I added a stripe volume with only one subvolume, and this resulted in terrible performance: if I connect directly to the server I can pull off about 50-70MB/s, but when I add the null-stripe, write performance drops to 3-5MB/s. Reads, however, reach 112MB/s, the same as we get without striping.

--------------------------------------------------------------------------------
Thu 30 Apr 2009 04:45:21 AM GMT, comment #1 by Raghavendra <raghavendra>:

What stripe size is being used, and what is the chunk size of the writes? With stripe-size < write-chunk-size and only one subvolume, performance can degrade, since a single write is split into multiple chunks that are handled serially on the server.

Also, since the optimal stripe size lets each write be split into equal chunks across the subvolumes, it makes sense to set the stripe size equal to the write chunk size.
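
For illustration, a minimal volfile sketch of that suggestion, assuming the application writes in 1MB chunks. The block-size option spelling for cluster/stripe varies between releases and the subvolume name is a placeholder, so treat both as assumptions and check the documentation for the version in use:

volume stripe
type cluster/stripe
# assumed syntax: pin the stripe block size to the 1MB application write size
option block-size 1MB
subvolumes client1
end-volume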

--------------------------------------------------------------------------------
Fri 01 May 2009 11:19:00 PM GMT, comment #2 by Erick Tryzelaar <erickt>:

I believe it was with just the default arguments, which should mean a stripe-size of 128KB. How can I check what the write-chunk-size is? Here's my server.vol file:

volume posix
type storage/posix
option directory /tmp/gluster
end-volume

volume locks
type features/locks
subvolumes posix
end-volume

volume io-threads
type performance/io-threads
option thread-count 16
subvolumes locks
end-volume

volume server
type protocol/server
option transport-type tcp
option auth.addr.io-threads.allow *
subvolumes io-threads
end-volume

and my client.vol file:

volume machine01
type protocol/client
option transport-type tcp
option remote-host machine01
option remote-subvolume io-threads
end-volume

volume stripe
type cluster/stripe
subvolumes machine01
end-volume

If I add a write-behind, the writes go up to 113MB/s:

volume write-behind
type performance/write-behind
option cache-size 1MB
# subvolumes stripe
subvolumes read-ahead
end-volume
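
For reference, a minimal sketch of the same idea with write-behind layered directly over the stripe volume (i.e. with no read-ahead in between). The volume names follow the client.vol above, and the only point being illustrated is the layering order:

volume stripe
type cluster/stripe
subvolumes machine01
end-volume

volume write-behind
type performance/write-behind
option cache-size 1MB
# write-behind sits on top of stripe, so small writes are aggregated before being striped
subvolumes stripe
end-volume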

I've also compared different block sizes and their read-write performance, without any of the client performance translators:

| block size | write  | read    |
| 1GB        | 62MB/s | 113MB/s |
| 128MB      | 62MB/s | 113MB/s |
| 64MB       | 64MB/s | 113MB/s |
| 32MB       | 60MB/s | 113MB/s |
| 16MB       | 55MB/s | 113MB/s |
| 8MB        | 50MB/s | 113MB/s |
| 4MB        | 45MB/s | 113MB/s |
| 1MB        | 23MB/s | 113MB/s |
| 512KB      | 12MB/s | 113MB/s |
| 256KB      | 8MB/s  | 113MB/s |
| 128KB      | 4MB/s  | 113MB/s |
| 64KB       | 2MB/s  | 113MB/s |
| default    | 4MB/s  | 113MB/s |

So it appears that these performance issues really start around a block size of 16-32MB. I'm guessing that's still much larger than the write-chunk-size, and it would correlate with some other performance problems I've seen with a stripe of 4 machines: there, the overhead of striping limits how fast a client can read unless the stripe-size is greater than 16MB. I'll run some numbers and file another bug about that.
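
As a hedged sketch, a sweep like the one in the table above could be reproduced with dd against the mounted volume; the mount point and file name below are assumptions:

# 1GiB written in 128K chunks, then in 4M chunks; dd reports the throughput
dd if=/dev/zero of=/mnt/glusterfs/bench bs=128K count=8192
dd if=/dev/zero of=/mnt/glusterfs/bench bs=4M count=256
# drop the page cache before the read pass so the number reflects the network/server path
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/glusterfs/bench of=/dev/null bs=4M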

Comment 1 Raghavendra G 2009-07-28 03:20:12 UTC
Were the block sizes given as MB when benchmarking with dd (I am assuming dd was used here)? If so, the B suffix means powers of 1000 rather than 1024, so 1MB is 1,000,000 bytes, which causes writes to land at non-aligned offsets. Writes at offsets that are multiples of 128K are more efficient in the stripe case, hence the performance degradation. I checked dd both with and without the B suffix in the block size; for block sizes specified in terms of 1024 bytes there was no performance degradation between striped and non-striped glusterfs.

As you've mentioned earlier, there is a performance improvement with write-behind. Write-behind helps overcome the degradation caused by writes at non-aligned offsets, so we suggest using write-behind along with stripe.

Can this bug be closed with the resolution that stripe should be used together with write-behind?
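
To make the dd suffix point concrete, a hedged example (the mount path is an assumption): bs=1MB writes 1,000,000-byte chunks, which are not multiples of 128K, so successive writes straddle stripe boundaries, while bs=1M writes 1,048,576-byte chunks, exactly 8 x 128K, so every write stays aligned:

# 1MB = 1,000,000 bytes: offsets drift off the 128K stripe boundaries
dd if=/dev/zero of=/mnt/glusterfs/testfile bs=1MB count=1024
# 1M = 1,048,576 bytes (8 x 128K): offsets stay aligned to the stripe block size
dd if=/dev/zero of=/mnt/glusterfs/testfile bs=1M count=1024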

Comment 2 Raghavendra G 2009-07-28 03:23:50 UTC
Sending a mail to Erick Tryzelaar <idadesub.net> with the same question and suggestion as in comment 1 above.


Comment 3 Raghavendra G 2009-08-04 02:12:43 UTC
Resolving the bug, hoping that the user is OK with the suggested solution.

