Bug 1406547

Summary: Poor write performance with sharding enabled
Product: [Community] GlusterFS
Component: sharding
Version: 3.8
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Reporter: David Galloway <dgallowa>
Assignee: Krutika Dhananjay <kdhananj>
QA Contact: bugs <bugs>
CC: bturner, bugs, kdhananj, sarumuga, sasundar
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-11-07 10:37:11 UTC
Type: Bug
Attachments: replica 2 and replica 3 arbiter 1 tests before and after sharding

Description David Galloway 2016-12-20 21:42:29 UTC
Created attachment 1234084 [details]
replica 2 and replica 3 arbiter 1 tests before and after sharding

Description of problem:
Enabling sharding on gluster volumes seems to at least halve write performance.

Version-Release number of selected component (if applicable):
3.8.6

How reproducible:
Very

Steps to Reproduce:
1. Create a replica 2 (or replica 3 arbiter 1) volume
2. Fuse mount the volume and perform some write tests
3. Enable sharding
4. Repeat the write tests (command sketch below)
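
For reference, a minimal command sketch of the above steps (the volume name,
brick paths, and mount point below are illustrative placeholders, not the
reporter's actual layout):

  # 1. Create the volume (replica 2 shown; arbiter variant commented out)
  gluster volume create testvol replica 2 \
      store01:/bricks/b1/testvol store02:/bricks/b1/testvol
  # gluster volume create testvol replica 3 arbiter 1 \
  #     store01:/bricks/b1/testvol store02:/bricks/b1/testvol \
  #     store03:/bricks/b1/testvol
  gluster volume start testvol

  # 2. Fuse mount from the client and run a write test
  mount -t glusterfs store01:/testvol /mnt/testvol
  dd if=/dev/zero of=/mnt/testvol/before.img bs=1M count=4096 conv=fdatasync

  # 3. Enable sharding (features.shard-block-size defaults to 64MB)
  gluster volume set testvol features.shard on

  # 4. Repeat the write test on a NEW file -- sharding only applies to
  #    files created after the option is enabled
  dd if=/dev/zero of=/mnt/testvol/after.img bs=1M count=4096 conv=fdatasync

Shards for the second file should then show up under the hidden .shard
directory on each brick.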

Actual results:
Observe slower write performance.

Expected results:
Little or no decrease in write performance.

Additional info:
See attachment for write speed tests, as requested by Ravi on the internal mailing list.

Bear in mind that the hosts used for this test are actively serving as backend VM storage for my RHEV cluster, so the results vary a bit.  What is clear, whether using replica 2 or arbiter, is that sharding dramatically reduces write speeds.

For reference, I am able to achieve write speeds of around 300MB/s when writing directly to disk (no Gluster involved).
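
One way such a raw-disk baseline can be measured (assuming GNU dd; the path is
an example location on the brick filesystem, and oflag=direct bypasses the
page cache so the figure reflects the disk rather than RAM):

  dd if=/dev/zero of=/bricks/b1/ddtest bs=1M count=4096 oflag=direct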

Comment 1 David Galloway 2016-12-21 22:20:59 UTC
I should add some additional details...

store01 and store02 have 7x 4TB disks in a hardware RAID5 setup.  Each node has 2x10Gb NICs in bonded mode.

store03 has 2x 1TB disks in software RAID1 with a 1Gb link.  Its sole purpose is to serve as the arbiter node.

hv01 (used to test fuse mount) has a 10Gb link.
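
Given the bonded 10Gb links, one way to rule out the network as the bottleneck
is a raw throughput test between the fuse client and a storage node, e.g. with
iperf3 (assuming it is installed on both ends; hostnames as above):

  # On the storage node:
  iperf3 -s
  # On the client (hv01), run a 30-second test:
  iperf3 -c store01 -t 30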

Comment 2 David Galloway 2017-01-17 17:19:45 UTC
Are there some additional tests I can run to provide better debug info?

Comment 3 Krutika Dhananjay 2017-01-18 16:19:40 UTC
(In reply to David Galloway from comment #2)
> Are there some additional tests I can run to provide better debug info?

Hi,

So I wrote a quick patch - http://review.gluster.org/16399 - just last Friday to add some optimisations to sharding for better performance. The patch still needs a fair amount of testing. I didn't associate it with this bug ID because I still need to ensure that it works, and I didn't want to raise the bug reporter's hopes (yet) that a fix is available.

If/when the idea is found to work through testing, would you be willing to try out the patch and provide feedback on any perf improvement from the fix? :)

-Krutika

Comment 4 David Galloway 2017-01-18 19:08:43 UTC
(In reply to Krutika Dhananjay from comment #3)
> [...]
> If/when the idea is found to work through testing, would you be willing to
> try out the patch and provide feedback on any perf improvement from the fix?
> :)

With a little hand-holding, sure, I could probably try the patch out and provide some feedback.

Comment 7 David Galloway 2017-03-16 16:05:12 UTC
Bump.

We're hoping to set up a new cluster on SSD storage and would really like to see this bug resolved before we move VMs over to it.

Comment 10 Krutika Dhananjay 2017-03-20 10:42:44 UTC
Ben,

I tested the latest patch set (https://review.gluster.org/#/c/16399/3) after some fixes, and it passed most tests (except for the ones involving multiple clients, where real-time stats are not available). I also tested it with VMs: I launched 3 VMs, installed the OS, and performed some I/O inside them, and it all ran fine without any glitches. I'm waiting to hear from you on comment #9. Based on your response, I will send you the build link.

-Krutika

Comment 12 Niels de Vos 2017-11-07 10:37:11 UTC
This bug is being closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. If you are still facing this issue in a more current release, please open a new bug against a version that still receives bugfixes.