Description of problem:
-------------------------
On a 3-way replicated volume mounted on a Windows client, one of the
bricks was brought down. The write speed observed while copying a .iso
file was substantially reduced compared to the speed observed with all
bricks in the replica set up.

See the results of the copy operations below -

With all 3 bricks up
--------------------
PS Z:\> robocopy C:\Users\shruti\Downloads\test Z:\ RHS.iso

------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
------------------------------------------------------------------------

  Started : Wed Jun 03 16:36:45 2015

   Source : C:\Users\shruti\Downloads\test\
     Dest : Z:\

    Files : RHS.iso

  Options : /COPY:DAT /R:1000000 /W:30

------------------------------------------------------------------------

                    1    C:\Users\shruti\Downloads\test\
  100%    New File        1.5 g    RHS.iso

------------------------------------------------------------------------

             Total    Copied   Skipped  Mismatch    FAILED    Extras
  Dirs :         1         0         1         0         0         0
 Files :         1         1         0         0         0         0
 Bytes :   1.566 g   1.566 g         0         0         0         0
 Times :   0:01:21   0:01:21   0:00:00   0:00:00

 Speed :            20750024 Bytes/sec.
 Speed :            1187.325 MegaBytes/min.

  Ended : Wed Jun 03 16:38:06 2015

With one brick in the replica set brought down
----------------------------------------------
PS Z:\> robocopy C:\Users\shruti\Downloads\test Z:\ RHS.iso

------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
------------------------------------------------------------------------

  Started : Wed Jun 03 16:39:38 2015

   Source : C:\Users\shruti\Downloads\test\
     Dest : Z:\

    Files : RHS.iso

  Options : /COPY:DAT /R:1000000 /W:30

------------------------------------------------------------------------

                    1    C:\Users\shruti\Downloads\test\
  100%    New File        1.5 g    RHS.iso

------------------------------------------------------------------------

             Total    Copied   Skipped  Mismatch    FAILED    Extras
  Dirs :         1         0         1         0         0         0
 Files :         1         1         0         0         0         0
 Bytes :   1.566 g   1.566 g         0         0         0         0
 Times :   0:03:26   0:03:26   0:00:00   0:00:00

 Speed :             8158524 Bytes/sec.
 Speed :             466.834 MegaBytes/min.

  Ended : Wed Jun 03 16:43:05 2015

Such a drop in performance is not observed with a 2-way replicated
volume on the same setup.

Version-Release number of selected component (if applicable):
----------------------------------------------------------------
glusterfs-3.7.0-3.el6rhs.x86_64
samba-4.1.17-7.el6rhs.x86_64

How reproducible:
------------------
100%

Steps to Reproduce:
--------------------
1. Created a 6x3 volume and mounted it on a Windows 7 client machine.
2. Copied a .iso file from the Windows local drive to the gluster share.
3. Brought down one brick of the replica set (see the sketch after this
   report) and ran the copy operation again.

Actual results:
-----------------
There is a significant drop in write speed when the results from step 3
are compared with those from step 2. With one brick down, the write
speed drops to roughly 40% of the speed observed with all bricks up.

Expected results:
------------------
Write speed should not be affected when one brick in a replica set is
down.

Additional info:
-----------------
See the attached output of the `gluster volume profile info' command,
captured at various points during the copy operation.
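For step 3 above, a single brick can be taken offline by killing its
glusterfsd process. A minimal sketch of one way to do this, assuming a
placeholder volume name 'testvol' and a placeholder PID (neither is
taken from this report):

  # Show the brick processes and their PIDs
  # ('testvol' is a placeholder volume name)
  gluster volume status testvol

  # Kill the glusterfsd process serving the chosen brick
  # (1234 stands in for the PID printed above)
  kill 1234

  # Confirm the brick is now reported as offline
  gluster volume status testvol

  # Bring the brick back up afterwards; 'force' restarts only
  # the bricks that are not running
  gluster volume start testvol force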
Created attachment 1034293 [details] profile info with all bricks up
Created attachment 1034295 [details] profile info with one brick down - 1
Created attachment 1034297 [details] profile info with one brick down - 2
Created attachment 1034298 [details] profile info with one brick down - 3
Created attachment 1034300 [details] profile info with one brick down - 4
I seem to remember that AFR started doing FSYNC on every write if a subvolume went down. In the past, most AFR volumes had only 2 subvolumes, so if you lost one of them, you were in danger of losing data if the 2nd one also went down. With 3-way replication, however, that may be too pessimistic: you still have 2 out of 3 subvolumes, so why start doing FSYNC at that point?

Suggested actions:
- Run gluster volume profile and see if the FSYNC FOP is more frequent with 1 AFR subvolume down (see the sketch below).
- Check whether the FSYNC-per-WRITE code is still present and enabled in AFR.
- If so, fix the code to do FSYNC-per-WRITE only when we are down to a single AFR subvolume, regardless of whether we are doing 2- or 3-way replication.
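To make the first check concrete, profiling can be enabled on the
volume and the FSYNC counters compared before and after a brick is
taken down. A minimal sketch, again assuming the placeholder volume
name 'testvol':

  # Enable io-stats counters on the volume
  gluster volume profile testvol start

  # ...run the copy workload with all bricks up, then again with
  # one brick down...

  # Dump per-brick FOP statistics; a jump in the number of FSYNC
  # calls relative to WRITE calls after the brick goes down would
  # support the theory above
  gluster volume profile testvol info | grep -E 'Brick|FSYNC|WRITE'

  # Disable profiling when done
  gluster volume profile testvol stop

The grep is only a convenience filter; the full `profile info' output
also shows min/max/avg latency per FOP.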
Is this really a blocker? I would suggest that this can slip out to an async update.
Doc text is edited. Please sign off so it can be included in Known Issues.
Hi Monti, I have made a slight modification to the doc text. Please update to this text.
Verified on - glusterfs-3.7.1-15.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html