Bug 992959 - When a brick is down with I/O error volume restart fails causing cluster unusable
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: x86_64 Linux
Priority: high Severity: high
Assigned To: Kaushal
Sachidananda Urs
Depends On: 994375
Blocks:
 
Reported: 2013-08-05 05:32 EDT by Sachidananda Urs
Modified: 2013-09-23 18:35 EDT
5 users

See Also:
Fixed In Version: glusterfs-3.4.0.18rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 994375 (view as bug list)
Environment:
Last Closed: 2013-09-23 18:35:59 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Sachidananda Urs 2013-08-05 05:32:25 EDT
Description of problem:

In an Nx2 volume, when a brick is down with an I/O error, volume restart should be allowed.

When one of the bricks on a node goes down and the volume is restarted, the start fails:

[root@rc-2 ~]# gluster volume start bugz
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error

This defeats the purpose of high availability: a single brick going down renders the entire cluster unusable.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.15rhs

How reproducible:
Always

Steps to Reproduce:
1. Create an Nx2 volume.
2. Bring down one of the bricks (xfstests/src/godown <mount-point>).
3. Stop and then try to start the volume.

Actual results:
Volume restart is not allowed.

Expected results:
Volume restart should be allowed.
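The failure above comes from a pre-start check that glusterd applies to each brick directory. A minimal sketch of that kind of check, and of how a `force` flag can bypass it, assuming a plain accessibility test via stat (function name, flag handling, and messages are illustrative, not actual glusterd internals):

```shell
#!/bin/sh
# Hypothetical sketch of a pre-start brick-directory check.
# check_brick_dir DIR [force]: refuse to "start" unless DIR is
# accessible, mirroring the "Failed to find brick directory ...
# Reason : Input/output error" behavior reported in this bug.
check_brick_dir() {
    brick_dir="$1"
    force="$2"
    # "force" skips the accessibility check entirely, like
    # `gluster volume start <vol> force` in the fixed build.
    if [ "$force" = "force" ]; then
        echo "start: success (forced)"
        return 0
    fi
    # stat fails (e.g. with EIO) when the underlying filesystem
    # has been shut down, as godown does in the repro steps.
    if ! stat "$brick_dir" >/dev/null 2>&1; then
        echo "start: failed: cannot access brick directory $brick_dir"
        return 1
    fi
    echo "start: success"
}

# Demo against a throwaway directory.
d=$(mktemp -d)
check_brick_dir "$d"         # directory reachable: start succeeds
rmdir "$d"
check_brick_dir "$d"         # directory gone (simulated I/O error): start fails
check_brick_dir "$d" force   # force bypasses the check
```

With this shape, an inaccessible brick blocks a plain start, while `force` (the resolution agreed on in comment 7) lets the volume start anyway.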
Comment 1 Sachidananda Urs 2013-08-05 05:43:31 EDT
[root@rc-2 ~]# gluster volume start bugz force
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error
Comment 5 Sachidananda Urs 2013-08-14 06:01:16 EDT
I can still see that the bug is not fixed:


[root@boggs ~]# gluster --version
glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

===================================

[root@boggs ~]# gluster volume info
 
Volume Name: foo
Type: Replicate
Volume ID: b47b4690-1594-4f44-ae3e-e1e86ceacd53
Status: Stopped
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.72:/rhs/brick1/foo
Brick2: 10.70.37.97:/rhs/brick1/foo

======================================

[root@boggs ~]# gluster volume start foo
volume start: foo: failed: Failed to find brick directory /rhs/brick1/foo for volume foo. Reason : Input/output error
[root@boggs ~]#
Comment 6 Sachidananda Urs 2013-08-14 06:02:31 EDT
[root@boggs ~]# gluster volume start foo force                                                                  
volume start: foo: success
Comment 7 Sachidananda Urs 2013-08-14 06:03:54 EDT
Ignore my earlier comments; after discussion we agreed that `force` should be used.
Comment 8 Scott Haines 2013-09-23 18:35:59 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
