Bug 1406547 - Poor write performance with sharding enabled
Summary: Poor write performance with sharding enabled
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-20 21:42 UTC by David Galloway
Modified: 2017-11-07 10:37 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-07 10:37:11 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
replica 2 and replica 3 arbiter 1 tests before and after sharding (5.78 KB, text/plain)
2016-12-20 21:42 UTC, David Galloway

Description David Galloway 2016-12-20 21:42:29 UTC
Created attachment 1234084 [details]
replica 2 and replica 3 arbiter 1 tests before and after sharding

Description of problem:
Enabling sharding on gluster volumes seems to at least halve write performance.

Version-Release number of selected component (if applicable):
3.8.6

How reproducible:
Very

Steps to Reproduce:
1. Create replica 2 (or replica 3 arbiter 1) volume
2. Fuse mount the volume and perform some write tests
3. Enable sharding (see the command sketch below)
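
A minimal command sketch of these steps (volume name, brick paths, and mount point are hypothetical; host names as in comment 1):

  # 1. Create the volume; for the arbiter variant use "replica 3 arbiter 1" and add a third brick
  gluster volume create testvol replica 2 store01:/bricks/brick1 store02:/bricks/brick1
  gluster volume start testvol

  # 2. FUSE-mount the volume on a client and run write tests (see the dd example further down)
  mount -t glusterfs store01:/testvol /mnt/testvol

  # 3. Enable sharding; the shard size can be tuned separately via features.shard-block-size
  gluster volume set testvol features.shard on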

Actual results:
Observe slower write performance.

Expected results:
No or minimal decrease in write performance?

Additional info:
See attachment for write speed tests as requested by Ravi on internal mailing list.

Bear in mind that the hosts used for this test are actively used as backend VM storage for my RHEV cluster, so the results vary a bit.  What is clear, whether using replica 2 or arbiter, is that sharding dramatically reduces write speeds.

For reference, I am able to achieve write speeds of around 300MB/s when writing directly to disk (no Gluster involved).
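
For comparison, a dd-based test along these lines (file path and size are hypothetical) is one way to measure sequential write throughput both on the FUSE mount and directly on the brick:

  # 4 GiB sequential write through the FUSE mount; fdatasync at the end so the
  # result reflects data actually flushed to the volume
  dd if=/dev/zero of=/mnt/testvol/ddtest.img bs=1M count=4096 conv=fdatasync

  # Same write directly on the brick filesystem, for the no-Gluster baseline
  dd if=/dev/zero of=/bricks/brick1/ddtest.img bs=1M count=4096 conv=fdatasync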

Comment 1 David Galloway 2016-12-21 22:20:59 UTC
I should add some additional details...

store01 and store02 have 7x 4TB disks in a hardware RAID5 setup.  Each node has 2x10Gb NICs in bonded mode.

store03 has 2x 1TB disks in software RAID1 with a 1Gb link.  Its sole purpose is to serve as the arbiter node.

hv01 (used to test fuse mount) has a 10Gb link.

Comment 2 David Galloway 2017-01-17 17:19:45 UTC
Are there some additional tests I can run to provide better debug info?
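
One common way to capture more performance data on a Gluster volume is the built-in profiler; a sketch, assuming the volume is named testvol:

  gluster volume profile testvol start
  # ... repeat the write tests on the FUSE mount ...
  gluster volume profile testvol info    # per-brick FOP latency and throughput stats
  gluster volume profile testvol stop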

Comment 3 Krutika Dhananjay 2017-01-18 16:19:40 UTC
(In reply to David Galloway from comment #2)
> Are there some additional tests I can run to provide better debug info?

Hi,

So I wrote a quick patch - http://review.gluster.org/16399 - just last Friday to add some optimisations in sharding for better perf. The patch needs some amount of testing. I didn't associate that patch with this bug id because I still need to ensure that it works, so I didn't want to raise the bug reporter's hope (yet) that there is a fix available.

If/when the idea is found to work through testing, would you be willing to try out the patch and provide feedback on any perf improvement from the fix? :)

-Krutika

Comment 4 David Galloway 2017-01-18 19:08:43 UTC
(In reply to Krutika Dhananjay from comment #3)
> (In reply to David Galloway from comment #2)
> > Are there some additional tests I can run to provide better debug info?
> 
> Hi,
> 
> So I wrote a quick patch - http://review.gluster.org/16399 - just last
> Friday to add some optimisations in sharding for better perf. The patch
> needs some amount of testing. I didn't associate that patch with this bug id
> because I still need to ensure that it works, so I didn't want to raise the
> bug reporter's hope (yet) that there is a fix available.
> 
> If/when the idea is found to work through testing, would you be willing to
> try out the patch and provide feedback on any perf improvement from the fix?
> :)
> 

With a little hand-holding, sure, I could probably try the patch out and provide some feedback.
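
A rough sketch of trying a Gerrit change locally (the repository URL and patch-set number are assumptions; comment #10 later refers to patch set 3 of change 16399):

  # from within a glusterfs source checkout
  git fetch https://review.gluster.org/glusterfs refs/changes/99/16399/3
  git checkout -b shard-perf-16399 FETCH_HEAD
  # build and install as usual, e.g. ./autogen.sh && ./configure && make && sudo make install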

Comment 7 David Galloway 2017-03-16 16:05:12 UTC
Bump.

We're hoping to set up a new cluster on SSD storage and would really like to see this bug resolved before we move VMs over to it.

Comment 10 Krutika Dhananjay 2017-03-20 10:42:44 UTC
Ben,

I tested the latest patch set (https://review.gluster.org/#/c/16399/3) after some fixes and it passed most tests (except for the ones involving multiple clients, where real-time stats are not available). I also tested it with VMs: I launched 3 VMs, installed the OS, and performed some I/O inside them, and it all ran fine without any glitch. I'm waiting to hear from you on comment #9. Based on your response, I will send you the build link.

-Krutika

Comment 12 Niels de Vos 2017-11-07 10:37:11 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

