Created attachment 697283 [details]
engine logs

Description of problem:
---------------------------------------
In a cluster of two nodes, one of the nodes is removed using "peer detach
force". For a volume that had a brick on each of the two servers, the
Console now shows only one brick (the one on the detached server is no
longer seen on the Console). On trying to start the volume, the start fails.

On the storage node, running the command "gluster volume info <vol-name>"
still shows the brick that was residing on the detached server. Volume start
on the storage node fails with the following seen in the gluster logs -
---------------------------------------

[2013-02-14 16:22:20.460327] E
[glusterd-volume-ops.c:903:glusterd_op_stage_start_volume] 0-: Unable to
resolve brick 10.70.35.71:/opt/gluster/volume1/b1

Version-Release number of selected component (if applicable):
Red Hat Storage Console Version: 2.1.0-0.qa5.el6rhs

How reproducible:
Always

Steps to Reproduce:
1. For a two-node cluster (say peer1 and peer2), create a volume with one
brick on each node (say brick1 on peer1 and brick2 on peer2).
2. Run "gluster peer detach <IP-of-peer1>" on peer2.

Actual results:
peer1 now disappears from the Servers tab on the Console, and brick1 also
disappears. Volume start fails with the following message in the Events log -

"Could not start Gluster Volume volume1."

Expected results:
The Console should display the cause of the failure to start the volume.

Additional info:
Find engine logs attached.
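The reproduction steps above can be sketched as a CLI session. This is a transcript for illustration only, not output captured from the reporter's setup; 10.70.35.71 stands in for peer1 (as in the log line above) and the brick paths are placeholders:

```shell
# On peer2: form the two-node cluster and create a volume with one
# brick on each peer.
gluster peer probe 10.70.35.71
gluster volume create volume1 10.70.35.71:/opt/gluster/volume1/b1 \
                              <IP-of-peer2>:/opt/gluster/volume1/b2

# Forcibly detach peer1 while volume1 still has a brick on it.
gluster peer detach 10.70.35.71 force

# The volume still lists both bricks on the CLI...
gluster volume info volume1

# ...but starting it fails, and glusterd logs "Unable to resolve brick".
gluster volume start volume1
```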
(In reply to comment #0)
> Created attachment 697283 [details]
> engine logs
>
> Description of problem:
> ---------------------------------------
> In a cluster of two nodes, one of the nodes is removed using "peer detach
> force". For a volume that had a brick on each of the two servers, the
> Console now shows only one brick (the one on the detached server is no
> longer seen on the Console). On trying to start the volume, start fails.
>
> On the storage node, running the command "gluster volume info <vol-name>"
> shows the brick that was residing on the detached server too. Volume start
> on the storage node fails with the following seen in the gluster logs -
> ---------------------------------------
>
> [2013-02-14 16:22:20.460327] E
> [glusterd-volume-ops.c:903:glusterd_op_stage_start_volume] 0-: Unable to
> resolve brick 10.70.35.71:/opt/gluster/volume1/b1

What do you see in the gluster CLI output when you try to start the volume?

> Actual results:
> peer1 now disappears from the Servers tab on Console and brick1 also
> disappears.

I think this behavior is fine, because the brick is not a valid one anymore
- its server is not part of the cluster.

> Volume start fails with the following message in the Events log -
>
> "Could not start Gluster Volume volume1."
>
> Expected results:
> The Console should display the cause for failure to start the volume.

Can be done, if the gluster CLI provides the cause of the failure.
(In reply to comment #2)
> (In reply to comment #0)
> > [2013-02-14 16:22:20.460327] E
> > [glusterd-volume-ops.c:903:glusterd_op_stage_start_volume] 0-: Unable to
> > resolve brick 10.70.35.71:/opt/gluster/volume1/b1
>
> What do you see in the gluster CLI output when you try to start the volume?

It says "volume start: <vol-name>: failed".

> > Actual results:
> > peer1 now disappears from the Servers tab on Console and brick1 also
> > disappears.
>
> I think this behavior is fine, because the brick is not a valid one anymore
> - its server is not part of the cluster.

But running "gluster volume info <vol-name>" on the gluster CLI lists the
bricks that reside on the detached server too.
(In reply to comment #3)
> (In reply to comment #2)
> > What do you see in the gluster CLI output when you try to start the
> > volume?
>
> It says "volume start: <vol-name>: failed".

I think this is the problem. GlusterFS should provide a more meaningful
message explaining why it failed. I suggest you raise a bug against
glusterfs for this.

> But running "gluster volume info <vol-name>" on the gluster CLI lists the
> bricks that reside on the detached server too.

I agree that the behavior in the UI is a little different from the gluster
CLI in this case, but showing the brick in the UI would also be misleading,
as the server itself is not part of the cluster. I think it should be
enough if the start-volume error in the UI shows the exact error given by
the CLI. If you don't see it in the UI, you can raise a separate bug for
that.
Per the Feb 20 bug triage meeting, targeting 2.1.
As discussed with Shruti, this behavior is fine. We should probably detect
such a bad configuration (a volume with bricks on servers that are no
longer peers) and generate an alert for it. She is going to raise an RFE
for this.
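The proposed check (flag bricks whose host is no longer a peer) could be sketched in shell. This is a hypothetical illustration, not the Console's implementation; the `pool_list` and `volume_info` strings below are sample text standing in for "gluster pool list" and "gluster volume info volume1" output after the forced detach, and on a real node you would capture them from the gluster CLI instead:

```shell
#!/bin/sh
# Sample output (assumed, for illustration): 10.70.35.71 has been
# detached, so only 10.70.35.72 remains in the pool.
pool_list='UUID					Hostname	State
d5a437e0-0000-0000-0000-000000000000	10.70.35.72	Connected'
volume_info='Volume Name: volume1
Brick1: 10.70.35.71:/opt/gluster/volume1/b1
Brick2: 10.70.35.72:/opt/gluster/volume1/b2'

# Hostnames of current peers (skip the header line).
peers=$(printf '%s\n' "$pool_list" | awk 'NR>1 {print $2}')

# For each brick, alert if its host is not among the peers.
printf '%s\n' "$volume_info" | awk '/^Brick[0-9]+:/ {print $2}' |
while read -r brick; do
    host=${brick%%:*}
    if ! printf '%s\n' "$peers" | grep -qx "$host"; then
        echo "ALERT: brick $brick is on non-peer host $host"
    fi
done
```

Run against the sample data, this prints an alert only for the brick on the detached host, 10.70.35.71.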