Description of problem:
In an Nx2 volume, if a brick is down with an I/O error, volume restart should still be allowed. When one of the bricks on a node goes down and we try to restart the volume, we see:

[root@rc-2 ~]# gluster volume start bugz
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error

This defeats the entire high-availability argument: one brick going down ends up bringing down the entire cluster.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.15rhs

How reproducible:
Always

Steps to Reproduce:
1. Create an Nx2 volume
2. Bring down one of the bricks (xfstests/src/godown <mount-point>)
3. Try to stop/start the volume

Actual results:
Volume restart is not allowed.

Expected results:
Volume restart should be allowed.
[root@rc-2 ~]# gluster volume start bugz force
volume start: bugz: failed: Failed to find brick directory /rhs/brick1/rc-1 for volume bugz. Reason : Input/output error
https://code.engineering.redhat.com/gerrit/#/c/11182/
I can still see that the bug is not fixed:

[root@boggs ~]# gluster --version
glusterfs 3.4.0.19rhs built on Aug 14 2013 00:11:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
===================================
[root@boggs ~]# gluster volume info

Volume Name: foo
Type: Replicate
Volume ID: b47b4690-1594-4f44-ae3e-e1e86ceacd53
Status: Stopped
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.72:/rhs/brick1/foo
Brick2: 10.70.37.97:/rhs/brick1/foo
======================================
[root@boggs ~]# gluster volume start foo
volume start: foo: failed: Failed to find brick directory /rhs/brick1/foo for volume foo. Reason : Input/output error
[root@boggs ~]#
[root@boggs ~]# gluster volume start foo force
volume start: foo: success
Ignore my earlier comments; after discussion we agreed that `force' should be used to start the volume in this situation.
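For anyone scripting around this, the agreed behaviour (try a plain start, then fall back to `force`) could be sketched roughly as below. This is only an illustration, not part of the fix: the function name start_volume and the GLUSTER_CMD variable are hypothetical and exist only so the CLI can be swapped out when trying the script.

```shell
#!/bin/sh
# Hypothetical helper: start a volume, retrying with `force` if the plain start fails.
# GLUSTER_CMD is an assumption of this sketch (defaults to the real `gluster` CLI).
start_volume() {
    vol="$1"
    cli="${GLUSTER_CMD:-gluster}"

    # First attempt: a normal start, which fails if a brick directory is unreachable.
    if "$cli" volume start "$vol"; then
        return 0
    fi

    # Fallback: start the volume with the remaining healthy bricks.
    echo "plain start of $vol failed; retrying with force" >&2
    "$cli" volume start "$vol" force
}
```

Usage would be `start_volume foo`. Whether an automatic fallback to `force` is appropriate depends on the setup, since it brings the volume up while a brick is still unhealthy.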
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html