Description of problem:
-------------------------
On a 3-way replicated volume mounted on a Windows client, one of the
bricks was brought down. The write speed observed while copying a .iso
file was substantially reduced compared to the speed observed with all
bricks in the replica set up.

See the results of the copy operations below -

With all 3 bricks up
--------------------
PS Z:\> robocopy C:\Users\shruti\Downloads\test Z:\ RHS.iso

------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
------------------------------------------------------------------------

  Started : Wed Jun 03 16:36:45 2015

   Source : C:\Users\shruti\Downloads\test\
     Dest : Z:\

    Files : RHS.iso

  Options : /COPY:DAT /R:1000000 /W:30

------------------------------------------------------------------------

                    1    C:\Users\shruti\Downloads\test\
  100%    New File        1.5 g    RHS.iso

------------------------------------------------------------------------

             Total    Copied   Skipped  Mismatch    FAILED    Extras
  Dirs :         1         0         1         0         0         0
 Files :         1         1         0         0         0         0
 Bytes :   1.566 g   1.566 g         0         0         0         0
 Times :   0:01:21   0:01:21   0:00:00   0:00:00

 Speed :            20750024 Bytes/sec.
 Speed :            1187.325 MegaBytes/min.

  Ended : Wed Jun 03 16:38:06 2015

With one brick in the replica set brought down
----------------------------------------------
PS Z:\> robocopy C:\Users\shruti\Downloads\test Z:\ RHS.iso

------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
------------------------------------------------------------------------

  Started : Wed Jun 03 16:39:38 2015

   Source : C:\Users\shruti\Downloads\test\
     Dest : Z:\

    Files : RHS.iso

  Options : /COPY:DAT /R:1000000 /W:30

------------------------------------------------------------------------

                    1    C:\Users\shruti\Downloads\test\
  100%    New File        1.5 g    RHS.iso

------------------------------------------------------------------------

             Total    Copied   Skipped  Mismatch    FAILED    Extras
  Dirs :         1         0         1         0         0         0
 Files :         1         1         0         0         0         0
 Bytes :   1.566 g   1.566 g         0         0         0         0
 Times :   0:03:26   0:03:26   0:00:00   0:00:00

 Speed :             8158524 Bytes/sec.
 Speed :             466.834 MegaBytes/min.

  Ended : Wed Jun 03 16:43:05 2015

Such a drop in performance is not observed with a 2-way replicated
volume on the same setup.

Version-Release number of selected component (if applicable):
----------------------------------------------------------------
glusterfs-3.7.0-3.el6rhs.x86_64
samba-4.1.17-7.el6rhs.x86_64

How reproducible:
------------------
100%

Steps to Reproduce:
--------------------
1. Created a 6x3 volume and mounted it on a Windows 7 client machine.
2. Copied a .iso file from the Windows local drive to the gluster share.
3. Brought down one brick of the replica set (see the sketch after this
   report) and ran the copy operation again.

Actual results:
-----------------
There is a significant drop in write speed when the results from step 3
are compared with those from step 2. With one brick down, the write
speed drops to roughly 40% of the speed observed with all bricks up.

Expected results:
------------------
Write speed should not be affected when one brick in a replica set is
down.

Additional info:
-----------------
See the attached output of the `gluster volume profile info' command,
captured at various points during the copy operation.
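For step 3 above, a single brick can be taken offline by killing its
glusterfsd process. A minimal sketch of one way to do this, assuming a
placeholder volume name 'testvol' and a placeholder PID (neither is
taken from this report):

  # Show the brick processes and their PIDs
  # ('testvol' is a placeholder volume name)
  gluster volume status testvol

  # Kill the glusterfsd process serving the chosen brick
  # (1234 stands in for the PID printed above)
  kill 1234

  # Confirm the brick is now reported as offline
  gluster volume status testvol

  # Bring the brick back up afterwards; 'force' restarts only
  # the bricks that are not running
  gluster volume start testvol force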
Created attachment 1034293 [details] profile info with all bricks up
Created attachment 1034295 [details] profile info with one brick down - 1
Created attachment 1034297 [details] profile info with one brick down - 2
Created attachment 1034298 [details] profile info with one brick down - 3
Created attachment 1034300 [details] profile info with one brick down - 4
I seem to remember that AFR started doing FSYNC on every write if a subvolume went down. In the past, most AFR volumes had only 2 subvolumes, so if you lost one of them, you were in danger of losing data if the 2nd one also went down. With 3-way replication, however, that may be too pessimistic: you still have 2 out of 3 subvolumes, so why start doing FSYNC at that point?

Suggested actions:
- Run gluster volume profile and see if the FSYNC FOP is more frequent with 1 AFR subvolume down (see the sketch below).
- Check whether the FSYNC-per-WRITE code is still present and enabled in AFR.
- If so, fix the code to do FSYNC-per-WRITE only when we are down to a single AFR subvolume, regardless of whether we are doing 2- or 3-way replication.
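To make the first check concrete, profiling can be enabled on the
volume and the FSYNC counters compared before and after a brick is
taken down. A minimal sketch, again assuming the placeholder volume
name 'testvol':

  # Enable io-stats counters on the volume
  gluster volume profile testvol start

  # ...run the copy workload with all bricks up, then again with
  # one brick down...

  # Dump per-brick FOP statistics; a jump in the number of FSYNC
  # calls relative to WRITE calls after the brick goes down would
  # support the theory above
  gluster volume profile testvol info | grep -E 'Brick|FSYNC|WRITE'

  # Disable profiling when done
  gluster volume profile testvol stop

The grep is only a convenience filter; the full `profile info' output
also shows min/max/avg latency per FOP.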
Is this really a blocker? I would suggest that this can slip out to an async update.
Doc text is edited. Please sign off so it can be included in Known Issues.
Hi Monti, I have made a slight modification to the doc text. Please update to this text.
Verified on - glusterfs-3.7.1-15.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html