Bug 1227759
Summary: Write performance from a Windows client on a 3-way replicated volume decreases substantially when one brick in the replica set is brought down

| Field | Value |
|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | replicate |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | rhgs-3.1 |
| Target Release | RHGS 3.1.1 |
| Keywords | ZStream |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Shruti Sampat <ssampat> |
| Assignee | Ravishankar N <ravishankar> |
| QA Contact | Ben Turner <bturner> |
| CC | asriram, asrivast, bengland, bturner, divya, mlawrenc, nsathyan, ravishankar, rhs-bugs, rwheeler, storage-qa-internal, vagarwal |
| Fixed In Version | glusterfs-3.7.1-13 |
| Doc Type | Bug Fix |
| Doc Text | Previously, if a brick in a replica set went down, there was a chance of a drastic reduction in write speed because of the extra fsyncs that were issued. With this fix, the issue is resolved. |
| Clones | 1250170 (view as bug list) |
| Bug Blocks | 1216951, 1223636, 1250170, 1251815, 1255698 |
| Last Closed | 2015-10-05 07:10:12 UTC |
| Type | Bug |
Description (Shruti Sampat, 2015-06-03 12:42:09 UTC)
- Created attachment 1034293 [details]: profile info with all bricks up
- Created attachment 1034295 [details]: profile info with one brick down - 1
- Created attachment 1034297 [details]: profile info with one brick down - 2
- Created attachment 1034298 [details]: profile info with one brick down - 3
- Created attachment 1034300 [details]: profile info with one brick down - 4
I seem to remember that AFR started doing FSYNC on every write if a subvolume went down. In the past, most AFR volumes had only 2 subvolumes, so if you lost one of them, you were in danger of losing data if the second one also went down. With 3-way replication, however, this may be too pessimistic: you still have 2 out of 3 subvolumes, so why start doing FSYNC at that point?

Suggested actions: run `gluster volume profile` and see whether the FSYNC FOP is more frequent with one AFR subvolume down (see the profiling commands at the end of this report). Check whether the FSYNC-per-WRITE code is still present and enabled in AFR. If so, we can fix the code to only do FSYNC-per-WRITE when we are down to a single AFR subvolume, regardless of whether we are doing 2- or 3-way replication (a sketch of this condition change also follows below).

Is this really a blocker? I would suggest that this can slip to an asynchronous update.

Doc text is edited. Please sign off to be included in Known Issues.

Hi Monti, I have made a slight modification to the doc text. Please update to this text.

Verified on glusterfs-3.7.1-15.el6rhs.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html
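The profiling check suggested above can be run with the standard gluster CLI (`VOLNAME` is a placeholder for the affected volume). Compare the FSYNC call counts and latencies between a run with all bricks up and a run with one brick down:

```sh
# Enable per-FOP statistics on the volume.
gluster volume profile VOLNAME start

# Run the write workload from the client, then dump the counters.
# Look at the FSYNC row: its call count should not balloon when a
# single brick of a 3-way replica is down.
gluster volume profile VOLNAME info

# Turn profiling back off when done.
gluster volume profile VOLNAME stop
```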
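For illustration only, here is a minimal sketch in C of the condition change suggested in the comments. The names (`replica_state`, `needs_fsync_per_write_*`) are hypothetical stand-ins, not the actual AFR source; the point is that the per-write FSYNC should trigger only when a single copy remains, not whenever any subvolume is down:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for AFR's view of a replica set. */
struct replica_state {
    int child_count;  /* subvolumes in the replica set (2 or 3) */
    int up_count;     /* subvolumes currently reachable */
};

/* Old behaviour (as described in the report): fsync per write as soon
 * as any child is down, even with 2 of 3 copies still up. */
static bool needs_fsync_per_write_old(const struct replica_state *rs)
{
    return rs->up_count < rs->child_count;
}

/* Suggested behaviour: fsync per write only when one copy is left. */
static bool needs_fsync_per_write_new(const struct replica_state *rs)
{
    return rs->up_count <= 1;
}

int main(void)
{
    /* 3-way replica with one brick down: old logic fsyncs every
     * write, the suggested logic does not. */
    struct replica_state rs = { .child_count = 3, .up_count = 2 };
    printf("old: %d, new: %d\n",
           needs_fsync_per_write_old(&rs),
           needs_fsync_per_write_new(&rs));
    return 0;
}
```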