Created attachment 1234084
replica 2 and replica 3 arbiter 1 tests before and after sharding

Description of problem:
Enabling sharding on gluster volumes seems to at least halve write performance.

Version-Release number of selected component (if applicable):
3.8.6

How reproducible:
Very

Steps to Reproduce:
1. Create a replica 2 (or replica 3 arbiter 1) volume
2. Fuse mount the volume and perform some write tests
3. Enable sharding and repeat the write tests

Actual results:
Observe slower write performance.

Expected results:
No or minimal decrease in write performance?

Additional info:
See attachment for write speed tests as requested by Ravi on internal mailing list. Bear in mind the hosts used for this test are actively used as backend VM storage for my RHEV cluster, so the results vary a bit. What is clear, whether using replica 2 or arbiter, is that sharding dramatically reduces write speeds.

For reference, I am able to achieve write speeds of around 300MB/s when writing directly to disk (no Gluster involved).
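For anyone wanting to repeat the before/after comparison, here is a minimal sketch of a sequential write test. It is not the tool used for the attached results; the function name `timed_write`, the 1 MiB block size, and the mount path in the usage comment are assumptions made for illustration. It writes a fixed amount of data, calls fsync so the page cache does not hide the real cost, and reports throughput in MB/s.

```python
import os
import time


def timed_write(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes to path in block_size chunks, fsync, return MB/s.

    Hypothetical helper for illustration; not part of any Gluster tooling.
    """
    # Random data avoids best-case behaviour on storage that treats zeros specially.
    block = os.urandom(block_size)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the time needed to reach stable storage
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6


# Example usage (the mount point is an assumption):
# rate = timed_write("/mnt/glustervol/writetest.bin", 1024 ** 3)
# print(f"{rate:.1f} MB/s")
```

Running it once on the fuse mount, enabling sharding with `gluster volume set <VOLNAME> features.shard on`, and running it again against a new file should give comparable numbers. Sharding only applies to files created after it is enabled, so the second run needs a fresh file.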
I should add some additional details... store01 and store02 each have 7x 4TB disks in a hardware RAID5 setup. Each node has 2x 10Gb NICs in bonded mode. store03 has 2x 1TB disks in software RAID1 with a 1Gb link. Its sole purpose is to serve as the arbiter node. hv01 (used to test the fuse mount) has a 10Gb link.
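As a rough sanity check on these numbers, the sketch below estimates the network ceiling for writes from the fuse client. It assumes the client sends every write to each data brick itself (replication in Gluster is done on the client side for fuse mounts), that the arbiter receives only metadata, and it ignores protocol overhead. The function name is hypothetical.

```python
def client_write_ceiling_mb_s(link_gbit, data_replicas):
    """Upper bound on client write throughput in MB/s.

    Assumes the client link is shared evenly across the data replicas
    and ignores protocol overhead. Hypothetical helper for illustration.
    """
    link_mb_s = link_gbit * 1000 / 8  # Gbit/s -> MB/s
    return link_mb_s / data_replicas


# hv01 has a 10Gb link; replica 2 and replica 3 arbiter 1 both have two data bricks.
print(client_write_ceiling_mb_s(10, 2))  # 625.0
```

Under these assumptions the client link allows roughly 625 MB/s, which is well above the ~300MB/s measured directly on disk, so the network alone would not explain the drop seen with sharding.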
Are there some additional tests I can run to provide better debug info?
(In reply to David Galloway from comment #2)
> Are there some additional tests I can run to provide better debug info?

Hi,

So I wrote a quick patch - http://review.gluster.org/16399 - just last Friday to add some optimisations in sharding for better perf. The patch needs some amount of testing. I didn't associate that patch with this bug id because I still need to ensure that it works, so I didn't want to raise the bug reporter's hope (yet) that there is a fix available.

If/when the idea is found to work through testing, would you be willing to try out the patch and provide feedback on any perf improvement from the fix? :)

-Krutika
(In reply to Krutika Dhananjay from comment #3)
> (In reply to David Galloway from comment #2)
> > Are there some additional tests I can run to provide better debug info?
>
> Hi,
>
> So I wrote a quick patch - http://review.gluster.org/16399 - just last
> Friday to add some optimisations in sharding for better perf. The patch
> needs some amount of testing. I didn't associate that patch with this bug id
> because I still need to ensure that it works, so I didn't want to raise the
> bug reporter's hope (yet) that there is a fix available.
>
> If/when the idea is found to work through testing, would you be willing to
> try out the patch and provide feedback on any perf improvement from the fix?
> :)
>

With a little hand-holding, sure, I could probably try the patch out and provide some feedback.
Bump. We're hoping to set up a new cluster on SSD storage and would really like to see this bug resolved before we move VMs over to it.
Ben,

I tested the latest patch set (https://review.gluster.org/#/c/16399/3) after some fixes and it passed most tests (except for the ones involving multiple clients, where real-time stats are not available). I also tested it with VMs: I launched 3 VMs, installed an OS on each, and performed some IO inside them, and it all ran fine without any glitch.

I'm waiting to hear from you on comment #9. Based on your response, I will send you the build link.

-Krutika
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.