Bug 1255698 - Write performance from a Windows client on 3-way replicated volume decreases substantially when one brick in the replica set is brought down
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.3
Assigned To: Ravishankar N
Keywords: Triaged
Depends On: 1227759 1250170
Blocks: 1216951 1223636 glusterfs-3.7.4
Reported: 2015-08-21 07:06 EDT by Ravishankar N
Modified: 2015-09-09 05:40 EDT
Fixed In Version: glusterfs-3.7.4
Doc Type: Bug Fix
Clone Of: 1250170
Last Closed: 2015-09-09 05:40:22 EDT
Type: Bug
Description Ravishankar N 2015-08-21 07:06:14 EDT
Description of problem:
-------------------------

On a 3-way replicated volume mounted on a Windows client, one of the bricks was brought down. The write speed observed while copying a .iso file was substantially lower than the speed observed with all bricks in the replica set up.

See results of the copy operations below -

With all 3 bricks up
--------------------

PS Z:\> robocopy C:\Users\shruti\Downloads\test Z:\ RHS.iso

------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
------------------------------------------------------------------------

  Started : Wed Jun 03 16:36:45 2015

   Source : C:\Users\shruti\Downloads\test\
     Dest : Z:\

    Files : RHS.iso

  Options : /COPY:DAT /R:1000000 /W:30

------------------------------------------------------------------------

                           1    C:\Users\shruti\Downloads\test\
100%        New File               1.5 g        RHS.iso

------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         0         1         0         0         0
   Files :         1         1         0         0         0         0
   Bytes :   1.566 g   1.566 g         0         0         0         0
   Times :   0:01:21   0:01:21                       0:00:00   0:00:00


   Speed :            20750024 Bytes/sec.
   Speed :            1187.325 MegaBytes/min.

   Ended : Wed Jun 03 16:38:06 2015


With one brick in replica set brought down
-------------------------------------------

PS Z:\> robocopy C:\Users\shruti\Downloads\test Z:\ RHS.iso

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
-------------------------------------------------------------------------------

  Started : Wed Jun 03 16:39:38 2015

   Source : C:\Users\shruti\Downloads\test\
     Dest : Z:\

    Files : RHS.iso

  Options : /COPY:DAT /R:1000000 /W:30

------------------------------------------------------------------------------

                           1    C:\Users\shruti\Downloads\test\
100%        New File               1.5 g        RHS.iso

------------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         0         1         0         0         0
   Files :         1         1         0         0         0         0
   Bytes :   1.566 g   1.566 g         0         0         0         0
   Times :   0:03:26   0:03:26                       0:00:00   0:00:00


   Speed :             8158524 Bytes/sec.
   Speed :             466.834 MegaBytes/min.

   Ended : Wed Jun 03 16:43:05 2015

Such a drop in performance is not observed with a 2-way replicated volume in the same setup.

Version-Release number of selected component (if applicable):
----------------------------------------------------------------
glusterfs-3.7.0-3.el6rhs.x86_64
samba-4.1.17-7.el6rhs.x86_64

How reproducible:
------------------
100%

Steps to Reproduce:
--------------------
1. Created a 6x3 volume and mounted on a Windows 7 client machine.
2. Copied a .iso file from the Windows local drive to the gluster share.
3. Brought down one brick of the replica set and ran the copy operation again.

Actual results:
-----------------

There is a significant drop in write speed when the results from step 3 are compared to the results from step 2. With one brick down, write speed drops to roughly 40% of the speed observed with all bricks up.
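
The ratio can be checked directly from the two robocopy "Bytes/sec." figures reported above. The following is only a small sketch of that arithmetic; the function name percent_of_baseline is an illustrative assumption and is not part of robocopy or GlusterFS.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (name chosen for illustration only): expresses a
 * degraded throughput as a percentage of a baseline throughput.
 * Both arguments are in bytes per second, as printed by robocopy.
 */
double percent_of_baseline(uint64_t degraded_bps, uint64_t baseline_bps)
{
    /* A zero baseline would make the ratio meaningless. */
    assert(baseline_bps > 0);
    return 100.0 * (double)degraded_bps / (double)baseline_bps;
}
```

Calling percent_of_baseline(8158524, 20750024) with the figures from the two runs gives a value a little above 39, consistent with the "roughly 40%" stated above.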

Expected results:
------------------

Expected behaviour is that write speed should not be affected if one brick in a replica set is down.

--- Additional comment from Ravishankar N on 2015-08-04 12:09:36 EDT ---

Patch: http://review.gluster.org/#/c/11827/

--- Additional comment from Anand Avati on 2015-08-05 13:00:35 EDT ---

REVIEW: http://review.gluster.org/11827 (afr: modify afr_txn_nothing_failed()) posted (#2) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Ben England on 2015-08-09 21:33:50 EDT ---

adding perf team members to cc list
Comment 1 Anand Avati 2015-08-21 07:08:53 EDT
REVIEW: http://review.gluster.org/11985 (afr: modify afr_txn_nothing_failed()) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar@redhat.com)
Comment 2 Anand Avati 2015-08-27 02:09:15 EDT
REVIEW: http://review.gluster.org/11985 (afr: modify afr_txn_nothing_failed()) posted (#2) for review on release-3.7 by Ravishankar N (ravishankar@redhat.com)
Comment 3 Anand Avati 2015-08-31 02:34:23 EDT
REVIEW: http://review.gluster.org/11985 (afr: modify afr_txn_nothing_failed()) posted (#4) for review on release-3.7 by Vijay Bellur (vbellur@redhat.com)
Comment 4 Anand Avati 2015-08-31 11:03:13 EDT
COMMIT: http://review.gluster.org/11985 committed in release-3.7 by Kaushal M (kaushal@redhat.com) 
------
commit 7924eb1a11fe0b1443903a69b7e93e4767061064
Author: Ravishankar N <ravishankar@redhat.com>
Date:   Tue Aug 4 18:37:47 2015 +0530

    afr: modify afr_txn_nothing_failed()
    
    Backport of http://review.gluster.org/#/c/11827/
    
    In an AFR transaction, we need to consider something as failed only if the
    failure (either in the pre-op or the FOP phase) occurs on the bricks on which a
    transaction lock was obtained.
    
    Without this, we would end up considering the transaction as failure even on the
    bricks on which the lock was not obtained, resulting in unnecessary fsyncs
    during the post-op phase of every write transaction for non-appending writes.
    
    Change-Id: Iee79e5d85dc7b4c41459d8bdd04a8454bdaf9a9d
    BUG: 1255698
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/11985
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
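
The rule described in the commit message above can be illustrated with a short sketch. This is not the actual GlusterFS code: the struct brick_txn_state, its fields, and the function txn_nothing_failed are hypothetical names invented here, and the real afr_txn_nothing_failed() operates on AFR's own internal structures. The sketch only shows the idea that a failure counts against the transaction only if it occurred on a brick where the transaction lock was obtained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-brick state; NOT the real AFR data structures. */
struct brick_txn_state {
    bool locked;        /* transaction lock was obtained on this brick */
    bool pre_op_failed; /* failure seen in the pre-op phase */
    bool fop_failed;    /* failure seen in the FOP phase */
};

/*
 * Returns true when no failure occurred on any brick that holds the
 * transaction lock. Bricks on which the lock was not obtained (for
 * example, a brick that is down) are skipped, so their failures do
 * not mark the transaction as failed.
 */
bool txn_nothing_failed(const struct brick_txn_state *bricks, size_t count)
{
    assert(bricks != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (!bricks[i].locked)
            continue; /* not part of this transaction */
        if (bricks[i].pre_op_failed || bricks[i].fop_failed)
            return false;
    }
    return true;
}
```

Under these assumptions, a 3-way replica with one brick down and the other two bricks succeeding yields true, which per the commit message is the case where the extra fsyncs in the post-op phase are unnecessary.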
Comment 5 Kaushal 2015-09-09 05:40:22 EDT
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
