Bug 1024183 - [RHS-C] Host detach fails but does not show an error in UI
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 2.1.2
Assigned To: Ramesh N
QA Contact: Shruti Sampat
Keywords: ZStream
Depends On:
Blocks:
Reported: 2013-10-29 01:24 EDT by Shubhendu Tripathi
Modified: 2015-05-13 12:27 EDT (History)
11 users

See Also:
Fixed In Version: cb11
Doc Type: Bug Fix
Doc Text:
Cause: Error handling on the engine side was faulty when removing a host from a cluster. Consequence: When the remove-host operation failed on the Gluster side, the host was still removed from the engine database and the RHSC UI. Fix: Error handling on the engine side was corrected to handle the errors returned from VDSM/Gluster during remove host (gluster peer detach). Result: If remove host fails on the VDSM/Gluster side because the host is no longer part of the cluster (for example, it was already detached from gluster using the CLI), the host is removed from the engine database. If remove host fails on the VDSM/Gluster side for any other reason, the host is not removed from the engine database and an appropriate event message is shown in the Events tab.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-25 02:56:56 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
RHSC and RHS logs (7.35 MB, application/x-gzip)
2013-12-04 15:21 EST, Matt Mahoney


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 20655 None None None Never
oVirt gerrit 22240 None None None Never
oVirt gerrit 22284 None None None Never

Description Shubhendu Tripathi 2013-10-29 01:24:16 EDT
Description of problem:
Removing a host from the cluster (host detach) fails in the backend and logs an exception in vdsm.log, but the error is not shown in the UI and the host appears to be detached successfully.

In fact, the host remains in the cluster and is still listed in the output of the command "gluster peer status".

Version-Release number of selected component (if applicable):


How reproducible:
Occasionally

Steps to Reproduce:
1. Create a cluster and add two hosts h1 and h2 using FQDN
2. Remove the host h1 from the UI
3. Verify the vdsm.log which shows an exception

Actual results:
The host h1 gets deleted from the UI without any error popup

Expected results:
The host should not be deleted from the UI and there should be an error popup

Additional info:
Comment 2 Shubhendu Tripathi 2013-12-03 23:24:14 EST
Matt,

Earlier, if a host was detached via the CLI using "gluster peer detach" and the change had not yet been synchronized with the UI, a subsequent attempt to remove the host from the UI would appear to succeed.

Ideally it should have shown an error. After the patch, if a user tries to remove such a host, a message is shown in the Events tab indicating that the host is already detached.
Comment 3 Matt Mahoney 2013-12-04 15:21:03 EST
Created attachment 832857 [details]
RHSC and RHS logs
Comment 5 Shubhendu Tripathi 2013-12-04 22:43:13 EST
As remove host does not wait for the response, throwing a popup is not possible. All that is possible is to show an event message saying "Failed to remove gluster server from cluster".
Comment 6 Kanagaraj 2013-12-04 23:08:01 EST
Actions like Stop/Start/Remove Volume and Remove/Maintenance Host are asynchronous in nature. If there are validation errors (for example, the host still has bricks), an error popup is thrown. The UI only waits for the validation to succeed; it does not wait for the execution to complete, because that might take a long time.

So it is not possible to show an error popup when the respective operation fails in gluster. This can be conveyed only through event logs.
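The split described here, a synchronous validation phase that can still raise a popup, followed by an asynchronous execution phase that can only report through the event log, can be sketched as follows. This is an illustrative sketch, not the actual engine code; all names are hypothetical.

```python
import threading

def run_action(validate, execute, show_popup, log_event):
    """Illustrative sketch of the async UI action flow described above.

    Validation runs synchronously and may trigger an error popup; the
    actual execution runs in the background, so failures there can only
    be reported through the event log.
    """
    error = validate()
    if error:
        show_popup(error)      # e.g. "host has bricks"
        return None

    def worker():
        try:
            execute()          # may take a long time (gluster call)
        except Exception as e:
            log_event(f"Operation failed: {e}")

    t = threading.Thread(target=worker)
    t.start()
    return t                   # UI returns immediately; a popup is no longer possible
```

Once `run_action` has returned, the UI thread has moved on, which is why a late gluster failure cannot be surfaced as a popup.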
Comment 7 Shruti Sampat 2013-12-09 12:09:10 EST
Performed the following steps to verify the fix - 

1. Create a cluster of two nodes via the Console ( say n1 and n2 )
2. Using the gluster CLI on n1, peer probe n3
3. Create volume v1, with bricks on n1 and n2 ( which are managed by the Console )
4. Create volume v2 with bricks on n2 and n3 ( n2 is managed by the Console, n3 is *not* managed by the Console )
5. Remove v1 from the UI ( v1 is listed on the Console because engine knows both n1 and n2 ).
According to the UI, now there aren't any volumes in the cluster. But the volume v2 is still present with bricks on n2 and n3.
6. Move n2 to maintenance and try to remove it.

Gluster will not allow detaching a peer that has bricks on it ( n2 still has bricks that belong to volume v2 ). So peer detach fails, and an event log message is seen in the UI saying "Failure to remove gluster server n2 from cluster test". This works as expected.
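The failure exercised in step 6 can also be reproduced directly from the gluster CLI. The following is a rough transcript using the host names from the steps above; brick paths are illustrative and the exact error wording may vary between gluster versions.

```shell
# On n1: probe n3 outside the Console (step 2)
gluster peer probe n3

# Create and start v2 with bricks on n2 and n3 (step 4)
gluster volume create v2 n2:/rhs/brick1/v2 n3:/rhs/brick1/v2
gluster volume start v2

# Attempting to detach n2 fails because it still holds bricks of v2;
# gluster refuses with an error along the lines of:
#   peer detach: failed: Brick(s) with the peer n2 exist in cluster
gluster peer detach n2
```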

Consider the following scenario - 

A host that is not participating in any volume is moved to maintenance via the Console. The same host is then detached from the cluster using the 'gluster peer detach' command, run via the gluster CLI on one of the other nodes in the cluster. Now an attempt to remove the host via the Console fails, because the host is not really a peer. The only action available to a Console user is to activate the host, which causes it to be peer probed, added back to the cluster, and returned to the UP state.

The engine should ideally remove a host that was found not to be a peer, when an attempt is made to remove it from the UI. Moving back to ASSIGNED.
Comment 8 Ramesh N 2013-12-12 00:12:57 EST
With the fix, the engine will remove the host from its database if the host is found to be already absent from the peer list.
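The corrected decision, as described here and in the Doc Text above, can be sketched as follows. This is a hypothetical sketch of the behavior, not the actual engine code; `detach_peer`, `GlusterError`, and `HOST_NOT_IN_PEER_LIST` are illustrative names.

```python
# Illustrative sketch of the fixed remove-host flow; names are hypothetical.
HOST_NOT_IN_PEER_LIST = "host-not-in-peer-list"

class GlusterError(Exception):
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason

def remove_host(host, detach_peer, remove_from_db, log_event):
    """Return True if the host was removed from the engine DB."""
    try:
        detach_peer(host)          # gluster peer detach via VDSM
    except GlusterError as e:
        if e.reason == HOST_NOT_IN_PEER_LIST:
            # Host was already detached via the CLI: clean up the engine DB.
            remove_from_db(host)
            return True
        # Any other failure: keep the host and surface an event instead.
        log_event(f"Failed to remove gluster server {host} from cluster")
        return False
    remove_from_db(host)
    return True
```

The key point is the single special case: "already not a peer" is treated as success for database cleanup, while every other detach failure leaves the host in place and emits an event.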
Comment 9 Shruti Sampat 2013-12-18 05:56:55 EST
Verified as fixed in Red Hat Storage Console Version: 2.1.2-0.27.beta.el6_5.
Comment 10 Dusmant 2014-01-06 04:28:43 EST
It's an internal bug seen during the Corbett timeline. Considering it's now addressed, no need to document it. 

Changed the flag appropriately...
Comment 12 errata-xmlrpc 2014-02-25 02:56:56 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html
