Bug 992959 - When a brick is down with I/O error volume restart fails causing cluster unusable
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: x86_64 Linux
Priority: high Severity: high
Assigned To: Kaushal
Sachidananda Urs
Depends On: 994375
Blocks:
 
Reported: 2013-08-05 05:32 EDT by Sachidananda Urs
Modified: 2013-09-23 18:35 EDT
5 users

See Also:
Fixed In Version: glusterfs-3.4.0.18rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 994375 (view as bug list)
Environment:
Last Closed: 2013-09-23 18:35:59 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Sachidananda Urs 2013-08-05 05:32:25 EDT
Description of problem:

In an Nx2 volume, when a brick is down with an I/O error, volume restart should be allowed.

When one of the bricks on a node goes down and the volume is restarted, the start fails:

[root@rc-2 ~]# gluster volume start bugz
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error

This defeats the purpose of high availability: a single brick going down renders the entire cluster unusable.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.15rhs

How reproducible:
Always

Steps to Reproduce:
1. Create an Nx2 volume.
2. Bring down one of the bricks (xfstests/src/godown <mount-point>).
3. Stop and then try to start the volume.

Actual results:
Volume restart is not allowed.

Expected results:
Volume restart should be allowed.
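The failure above comes from a pre-start check that glusterd applies to each brick directory. A minimal sketch of that kind of check, and of how a `force` flag can bypass it, assuming a plain accessibility test via stat (function name, flag handling, and messages are illustrative, not actual glusterd internals):

```shell
#!/bin/sh
# Hypothetical sketch of a pre-start brick-directory check.
# check_brick_dir DIR [force]: refuse to "start" unless DIR is
# accessible, mirroring the "Failed to find brick directory ...
# Reason : Input/output error" behavior reported in this bug.
check_brick_dir() {
    brick_dir="$1"
    force="$2"
    # "force" skips the accessibility check entirely, like
    # `gluster volume start <vol> force` in the fixed build.
    if [ "$force" = "force" ]; then
        echo "start: success (forced)"
        return 0
    fi
    # stat fails (e.g. with EIO) when the underlying filesystem
    # has been shut down, as godown does in the repro steps.
    if ! stat "$brick_dir" >/dev/null 2>&1; then
        echo "start: failed: cannot access brick directory $brick_dir"
        return 1
    fi
    echo "start: success"
}

# Demo against a throwaway directory.
d=$(mktemp -d)
check_brick_dir "$d"         # directory reachable: start succeeds
rmdir "$d"
check_brick_dir "$d"         # directory gone (simulated I/O error): start fails
check_brick_dir "$d" force   # force bypasses the check
```

With this shape, an inaccessible brick blocks a plain start, while `force` (the resolution agreed on in comment 7) lets the volume start anyway.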
Comment 1 Sachidananda Urs 2013-08-05 05:43:31 EDT
[root@rc-2 ~]# gluster volume start bugz force
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error
Comment 5 Sachidananda Urs 2013-08-14 06:01:16 EDT
I can still see that the bug is not fixed:


[root@boggs ~]# gluster --version
glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

===================================

[root@boggs ~]# gluster volume info
 
Volume Name: foo
Type: Replicate
Volume ID: b47b4690-1594-4f44-ae3e-e1e86ceacd53
Status: Stopped
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.72:/rhs/brick1/foo
Brick2: 10.70.37.97:/rhs/brick1/foo

======================================

[root@boggs ~]# gluster volume start foo
volume start: foo: failed: Failed to find brick directory /rhs/brick1/foo for volume foo. Reason : Input/output error
[root@boggs ~]#
Comment 6 Sachidananda Urs 2013-08-14 06:02:31 EDT
[root@boggs ~]# gluster volume start foo force                                                                  
volume start: foo: success
Comment 7 Sachidananda Urs 2013-08-14 06:03:54 EDT
Ignore my earlier comments; after discussion we agreed that `force` should be used.
Comment 8 Scott Haines 2013-09-23 18:35:59 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
