Bug 994375 - When a brick is down with an I/O error, volume restart fails, making the cluster unusable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 992959
Reported: 2013-08-07 07:03 UTC by Kaushal
Modified: 2014-04-17 11:45 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 992959
Environment:
Last Closed: 2014-04-17 11:45:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kaushal 2013-08-07 07:03:11 UTC
+++ This bug was initially created as a clone of Bug #992959 +++

Description of problem:

In an Nx2 volume, if a brick is down with an I/O error, volume restart should still be allowed.

When one of the bricks on a node goes down and we then try to restart the volume, we see:

[root@rc-2 ~]# gluster volume start bugz
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error

This defeats the entire high-availability argument: a single brick going down ends up bringing down the whole cluster.
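To make the failure mode concrete: the error text suggests that staging checks each brick directory and rejects the whole start as soon as one check fails. The C sketch below models such a check under that assumption; the function name stage_brick_dir is hypothetical and this is not glusterd's actual code. On a brick filesystem that has been shut down (for example with xfstests' godown), a stat() on the brick path is expected to fail with EIO, and glibc's strerror(EIO) is "Input/output error", which matches the "Reason" in the output above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical staging check, modelled on the CLI error text in this report.
 * Returns 0 if brick_path is an accessible directory; otherwise returns -1
 * and writes a message in the same format as the CLI output into errbuf. */
static int stage_brick_dir(const char *brick_path, const char *volname,
                           char *errbuf, size_t errlen)
{
    struct stat st;

    if (stat(brick_path, &st) != 0) {
        /* On a shut-down filesystem this is assumed to be EIO, which
         * strerror() renders as "Input/output error" on glibc. */
        snprintf(errbuf, errlen,
                 "Failed to find brick directory %s for volume %s. Reason : %s",
                 brick_path, volname, strerror(errno));
        return -1;
    }
    if (!S_ISDIR(st.st_mode)) {
        /* Assumption: a non-directory brick path is reported the same way. */
        snprintf(errbuf, errlen,
                 "Failed to find brick directory %s for volume %s. Reason : %s",
                 brick_path, volname, strerror(ENOTDIR));
        return -1;
    }
    return 0;
}
```

For example, calling stage_brick_dir("/rhs/brick1/rc-1", "bugz", buf, sizeof buf) against a shut-down brick would return -1 and leave the same message as the CLI output in buf. If a plain start treats any such failure as fatal for the whole volume, one bad brick is enough to block the restart.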

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.15rhs

How reproducible:
Always

Steps to Reproduce:
1. Create an Nx2 volume
2. Bring down one of the bricks (xfstests/src/godown <mount-point>)
3. Try to stop and start the volume

Actual results:
Volume restart is not allowed.

Expected results:
Volume restart should be allowed.

--- Additional comment from Sachidananda Urs on 2013-08-05 15:13:31 IST ---

[root@rc-2 ~]# gluster volume start bugz force
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error

Comment 1 Anand Avati 2013-08-07 07:04:38 UTC
REVIEW: http://review.gluster.org/5510 (glusterd: Try to start all bricks on 'start force') posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-08-07 07:20:31 UTC
REVIEW: http://review.gluster.org/5510 (glusterd: Try to start all bricks on 'start force') posted (#2) for review on master by Kaushal M (kaushal)

Comment 3 Anand Avati 2013-08-18 14:24:42 UTC
COMMIT: http://review.gluster.org/5510 committed in master by Vijay Bellur (vbellur) 
------
commit e79be3d1655edb2b9f64a13e1fabae601c7d19e4
Author: Kaushal M <kaushal>
Date:   Wed Aug 7 12:25:07 2013 +0530

    glusterd: Try to start all bricks on 'start force'
    
    A volume would fail to start if any one of the bricks fails staging or
    fails to start, even with the 'force' option. With this patch, when the
    'force' option is given for a volume start, glusterd will continue and
    start other bricks even if one fails staging or starting.
    
    Also did a small fix in changelog, to prevent it crashing when it fails
    to init.
    
    Change-Id: I7efbd9ab13d12d69b0335ae54143fa17586f8f98
    BUG: 994375
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/5510
    Reviewed-by: Venky Shankar <vshankar>
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
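The behaviour change described in the commit message can be sketched as follows. This is a simplified model in C, not the patch itself: struct brick, its stage_ok/start_ok flags and start_volume() are hypothetical stand-ins for glusterd's real brick records and its stage and commit phases. Without 'force', any staging or start failure aborts the whole operation; with 'force', failing bricks are skipped and the remaining bricks are still started.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical brick record; glusterd's real structures differ. */
struct brick {
    const char *path;
    bool stage_ok;  /* stand-in for "brick passes staging" */
    bool start_ok;  /* stand-in for "brick process starts" */
};

/* Returns the number of bricks started, or -1 if the start is rejected. */
static int start_volume(struct brick *bricks, size_t n, bool force)
{
    /* Stage phase: validate every brick before starting anything. */
    for (size_t i = 0; i < n; i++) {
        if (!bricks[i].stage_ok && !force)
            return -1;  /* plain 'start' rejects the whole volume */
    }

    /* Commit phase: start bricks; with force, failures are skipped. */
    int started = 0;
    for (size_t i = 0; i < n; i++) {
        if (!bricks[i].stage_ok)
            continue;  /* only reachable with force */
        if (!bricks[i].start_ok) {
            if (!force)
                return -1;
            continue;
        }
        started++;
    }
    return started;
}
```

With a hypothetical two-brick volume in which the first brick fails staging, start_volume(bricks, 2, false) returns -1 while start_volume(bricks, 2, true) returns 1, so the healthy brick is still brought up. Keeping plain 'start' strict while letting 'start force' proceed matches the split described in the commit message.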

Comment 4 Niels de Vos 2014-04-17 11:45:24 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

