Bug 761801 (GLUSTER-69)

Summary: poor stripe read/write performance with a stripe of 4 machines
Product: [Community] GlusterFS
Reporter: Basavanagowda Kanur <gowda>
Component: stripe
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: low
Version: mainline
CC: gluster-bugs, shehjart, vijay, vraman
Hardware: All   
OS: Linux   

Description Basavanagowda Kanur 2009-06-25 07:02:53 UTC
[Migrated from savannah BTS] - bug 26418 [https://savannah.nongnu.org/bugs/?26418]

Sat 02 May 2009 12:12:23 AM GMT, original submission by Erick Tryzelaar <erickt>:

This appears to be related to bug #26402. I'm writing a 1GB file of random data to a stripe of 4 idle machines, reading it back, and seeing how they perform. I've found that, without any of the performance translators, I really have to pump up the block size to get the performance I'd expect:

server.vol:

volume posix
  type storage/posix
  option directory /tmp/gluster
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume io-threads
  type performance/io-threads
  option thread-count 16
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.io-threads.allow *
  subvolumes io-threads
end-volume

client.vol:

volume machine01
  type protocol/client
  option transport-type tcp
  option remote-host machine01
  option remote-subvolume io-threads
end-volume

volume machine02
  type protocol/client
  option transport-type tcp
  option remote-host machine02
  option remote-subvolume io-threads
end-volume

volume machine03
  type protocol/client
  option transport-type tcp
  option remote-host machine03
  option remote-subvolume io-threads
end-volume

volume machine04
  type protocol/client
  option transport-type tcp
  option remote-host machine04
  option remote-subvolume io-threads
end-volume

volume stripe
  type cluster/stripe
  option block-size *:512KB
  subvolumes machine01 machine02 machine03 machine04
end-volume
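
For reference, the bricks and the client mount were presumably started along these lines (a sketch only; the volfile paths here are placeholders, the exact flags depend on the GlusterFS 2.x installation, and /mnt/glusterfs is the mount point used in the test below):

# on each of machine01..machine04: start the brick daemon with server.vol
glusterfsd -f /etc/glusterfs/server.vol

# on the test client: mount the stripe described by client.vol
glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs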

My test is just to stream the 1GB file to and from gluster with this:

rm /mnt/glusterfs/giant-1gb
cat /tmp/gluster-data/giant-1gb | pv > /mnt/glusterfs/giant-1gb
sleep 5
cat /mnt/glusterfs/giant-1gb | pv > /dev/null

Here are the results I had without the performance translators:

| block-size | write | read |
| 64MB | 64MB/s | 113MB/s |
| 32MB | 64MB/s | 113MB/s |
| 24MB | 65MB/s | 113MB/s |
| 20MB | 64MB/s | 113MB/s |
| 16MB | 64MB/s | 105MB/s |
| 8MB | 62MB/s | 73MB/s |
| 4MB | 62MB/s | 55MB/s |
| 1MB | 49MB/s | 32MB/s |
| 512KB | 54MB/s | 38MB/s |
| 256KB | 51MB/s | 45MB/s |
| 128KB | 48MB/s | 40MB/s |

While there's some noise in the results, I was able to get into this range of bandwidth repeatably. Read performance starts to drop somewhere around an 8-16MB block size; write performance drops off later (at smaller block sizes), since the writes are held up by the slow disks.

With read-ahead and write-behind (default options) in the client.vol file:

volume read-ahead
  type performance/read-ahead
  subvolumes stripe
end-volume

volume write-behind
  type performance/write-behind
  subvolumes read-ahead
end-volume

I'm getting:

| block-size | write | read |
| 64MB | 108MB/s | 113MB/s |
| 32MB | 106MB/s | 105MB/s |
| 16MB | 77MB/s | 87MB/s |
| 8MB | 89MB/s | 88MB/s |
| 4MB | 74MB/s | 105MB/s |
| 1MB | 63MB/s | 68MB/s |
| 512KB | 67MB/s | 55MB/s |
| 256KB | 61MB/s | 42MB/s |
| 128KB | 48MB/s | 56MB/s |

The overall performance for the smaller block sizes is better, but it appears to be a bit more noisy. The test results weren't as repeatable as before; I'm guessing the caches are not being used consistently.
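
One way to make runs like this more repeatable is to flush the kernel page cache on both the client and the servers between copies (a generic Linux step run as root, not something from this report):

sync
echo 3 > /proc/sys/vm/drop_caches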

Is this expected behavior?

--------------------------------------------------------------------------------
Sat 02 May 2009 12:57:04 AM GMT, comment #1 by Erick Tryzelaar <erickt>:

I did find something interesting with the read-ahead page-count setting. If I set it to 1, then my performance stabilizes a great deal, with reads at almost exactly 90MB/s. Also, I only need a write-behind cache of 4MB to saturate my link in most cases (a volfile sketch with these settings is given after the tables below):

| block-size | write | read |
| 64MB | 113MB/s | 90MB/s |
| 32MB | 113MB/s | 90MB/s |
| 4MB | 113MB/s | 90MB/s |
| 256KB | 113MB/s | 90MB/s |
| 128KB | 113MB/s | 90MB/s |

This is very repeatable; I'm within ±2MB/s of this on every copy. With two pages:

| block-size | write | read | variance |
| 256MB | 113MB/s | 113MB/s | |
| 64MB | 113MB/s | 102MB/s | |
| 32MB | 105MB/s | 105MB/s | |
| 4MB | 113MB/s | 80MB/s | ±10MB/s |
| 256KB | 113MB/s | 50MB/s | ±5MB/s |
| 128KB | 113MB/s | 35MB/s | ±5MB/s |

It's much noisier, and the page count really affects performance quite dramatically.
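
For reference, the tuned translators described above would look roughly like this in client.vol (a sketch only; page-count and cache-size are assumed to be the relevant read-ahead and write-behind options in this GlusterFS version):

volume read-ahead
  type performance/read-ahead
  option page-count 1        # single read-ahead page, per the stable ~90MB/s reads above
  subvolumes stripe
end-volume

volume write-behind
  type performance/write-behind
  option cache-size 4MB      # a 4MB write-behind cache was enough to saturate the link
  subvolumes read-ahead
end-volume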

Comment 1 Amar Tumballi 2009-07-13 21:03:18 UTC
I suspect this was caused by the bug we had in write-behind (O_RDONLY being 0 means a bitwise flag check against it can never match), which used to disable the write-behind optimization for opened files. We need to rerun the benchmarks with a newer version to see whether this is still the case.

Comment 2 Amar Tumballi 2009-12-05 17:43:52 UTC
This bug needs benchmarking over the latest stripe translator, which can be done after the 3.0.0 release. Currently the focus of testing is on the stability of the new codebase. Removing this bug from the 'dependency' list of bug 762118.

I will work on this after the release.