Bug 992959 - When a brick is down with I/O error volume restart fails causing cluster unusable
Summary: When a brick is down with I/O error volume restart fails causing cluster unusable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Kaushal
QA Contact: Sachidananda Urs
URL:
Whiteboard:
Depends On: 994375
Blocks:
 
Reported: 2013-08-05 09:32 UTC by Sachidananda Urs
Modified: 2013-09-23 22:35 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.4.0.18rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 994375
Environment:
Last Closed: 2013-09-23 22:35:59 UTC
Embargoed:


Attachments:

Description Sachidananda Urs 2013-08-05 09:32:25 UTC
Description of problem:

In an Nx2 volume, if a brick is down with an I/O error, volume restart should still be allowed.

When one of the bricks on a node goes down and we try to restart the volume, we see:

[root@rc-2 ~]# gluster volume start bugz
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error

This defeats the entire high-availability argument: one brick going down ends up bringing down the entire cluster.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.15rhs

How reproducible:
Always

Steps to Reproduce:
1. Create a Nx2 volume
2. Bring down one of the bricks (xfstests/src/godown <mount-point>)
3. Try to stop/start the volume (see the reproduction sketch below)
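
For reference, a minimal reproduction sketch, assuming the two nodes are reachable as node1 and node2 (placeholder hostnames), the bricks live under /rhs/brick1 as in the report, and godown is built from an xfstests checkout (path is a placeholder):

# On any node: create and start a 1x2 replicate volume
gluster volume create bugz replica 2 node1:/rhs/brick1/bugz node2:/rhs/brick1/bugz
gluster volume start bugz

# On node1: shut down the brick filesystem so it starts returning I/O errors
<xfstests-checkout>/src/godown /rhs/brick1

# On any node: attempt a stop/start cycle; on 3.4.0.15rhs the start fails
gluster volume stop bugz        # answer 'y' at the confirmation prompt
gluster volume start bugz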

Actual results:
Volume restart not allowed.

Expected results:
Should allow volume restart.

Comment 1 Sachidananda Urs 2013-08-05 09:43:31 UTC
[root@rc-2 ~]# gluster volume start bugz force
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error

Comment 5 Sachidananda Urs 2013-08-14 10:01:16 UTC
I can still see that the bug is not fixed:


[root@boggs ~]# gluster --version
glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

===================================

[root@boggs ~]# gluster volume info
 
Volume Name: foo
Type: Replicate
Volume ID: b47b4690-1594-4f44-ae3e-e1e86ceacd53
Status: Stopped
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.72:/rhs/brick1/foo
Brick2: 10.70.37.97:/rhs/brick1/foo

======================================

[root@boggs ~]# gluster volume start foo
volume start: foo: failed: Failed to find brick directory /rhs/brick1/foo for volume foo. Reason : Input/output error
[root@boggs ~]#

Comment 6 Sachidananda Urs 2013-08-14 10:02:31 UTC
[root@boggs ~]# gluster volume start foo force                                                                  
volume start: foo: success

Comment 7 Sachidananda Urs 2013-08-14 10:03:54 UTC
Ignore my earlier comments; after discussion we agreed that `force` should be used.
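
For completeness, a hedged sketch of the agreed workaround (consistent with comment 6; the volume name foo is taken from the report):

gluster volume start foo force    # proceeds even though the brick-directory check fails on the downed brick
gluster volume status foo         # verify that the surviving brick processes are online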

Comment 8 Scott Haines 2013-09-23 22:35:59 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

