Description of problem:
Removing a host from the cluster (host detach) fails in the backend, and vdsm.log shows an exception. However, no error is shown in the UI, and the host appears to be detached successfully. In reality the host remains in the cluster and is still listed by the command "gluster peer status".

Version-Release number of selected component (if applicable):

How reproducible:
Occasionally

Steps to Reproduce:
1. Create a cluster and add two hosts, h1 and h2, using FQDNs
2. Remove host h1 from the UI
3. Check vdsm.log, which shows an exception

Actual results:
Host h1 is removed from the UI without any error popup.

Expected results:
The host should not be removed from the UI, and an error popup should be shown.

Additional info:
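For step 3 above, the quickest way to confirm from the CLI that h1 was not actually detached is to check `gluster peer status` on the remaining node. A minimal sketch that parses the peer count from captured output (the sample output format and hostnames are illustrative, not taken from the attached logs):

```shell
# Sample `gluster peer status` output as seen on h2 after the failed detach
# (assumed gluster 3.x output format; h1.example.com is a placeholder FQDN)
sample='Number of Peers: 1

Hostname: h1.example.com
Uuid: 00000000-0000-0000-0000-000000000000
State: Peer in Cluster (Connected)'

# Extract the peer count; a non-zero value means h1 is still in the cluster
count=$(printf '%s\n' "$sample" | awk -F': ' '/^Number of Peers/{print $2}')
echo "$count"
```

If the detach had really succeeded, the peer count on h2 would be 0 and h1 would no longer be listed.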
Matt, earlier, if a host was detached from the CLI using the "peer detach" option and that change had not yet been synchronized with the UI, removing the same host from the UI would still succeed. Ideally it should show an error. After the patch, if a user tries to remove such a host, a message is shown in the Events tab saying that the host is already detached.
Created attachment 832857 [details] RHSC and RHS logs
Since the remove-host operation does not wait for the response, showing a popup is not possible. All that can be done is to show an event message saying "Failed to remove gluster server from cluster".
Actions like Stop/Start/Remove Volume and Remove/Maintenance Host are asynchronous in nature. If there are any validation errors, such as the host still having bricks, an error popup is shown. The UI waits only for the validation to succeed; it does not wait for the execution to complete, because that might take a long time. So it is not possible to show an error popup if the operation later fails in gluster. This can be conveyed only through event logs.
Performed the following steps to verify the fix:

1. Create a cluster of two nodes via the Console (say n1 and n2)
2. Using the gluster CLI on n1, peer probe n3
3. Create volume v1, with bricks on n1 and n2 (which are managed by the Console)
4. Create volume v2, with bricks on n2 and n3 (n2 is managed by the Console, n3 is *not* managed by the Console)
5. Remove v1 from the UI (v1 is listed on the Console because the engine knows both n1 and n2). According to the UI, there are now no volumes in the cluster. But volume v2 is still present, with bricks on n2 and n3.
6. Move n2 to maintenance and try to remove it. gluster will not allow detaching a peer with bricks on it (n2 still has bricks that belong to volume v2). So the peer detach fails, and an event log message is seen in the UI saying "Failure to remove gluster server n2 from cluster test". This works as expected.

Consider the following scenario: a host that is not participating in any volume is moved to maintenance via the Console. The same host is then detached from the cluster using the 'gluster peer detach' command via the gluster CLI on one of the other nodes in the cluster. Now an attempt to remove the host via the Console fails, because the host is not really a peer. The only possible action for a Console user is to activate the host, which causes it to be peer probed, added back to the cluster, and returned to the UP state.

The engine should ideally remove a host that is found not to be a peer when an attempt is made to remove it from the UI. Moving back to ASSIGNED.
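The failure in step 6 hinges on gluster refusing to detach a peer that still hosts bricks. A quick way to check for this precondition from the CLI is to look for the host's bricks in `gluster volume info` output. A minimal sketch against captured sample output (the output format, volume name, and hostnames are illustrative):

```shell
# Sample `gluster volume info` output fragment for v2
# (assumed gluster 3.x format; n2/n3 and brick paths are placeholders)
sample='Volume Name: v2
Type: Distribute
Status: Started
Number of Bricks: 2
Bricks:
Brick1: n2:/bricks/v2
Brick2: n3:/bricks/v2'

# Count bricks still hosted on n2; a non-zero count means a
# `gluster peer detach n2` is expected to fail
bricks_on_n2=$(printf '%s\n' "$sample" | grep -c '^Brick[0-9]*: n2:')
echo "$bricks_on_n2"
```

A count of 0 for the target host would mean the detach path in step 6 cannot be exercised, which is why v2 must be created with a brick on n2 first.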
With the fix, the engine will remove the host if it is found to no longer be in the peer list.
Verified as fixed in Red Hat Storage Console Version: 2.1.2-0.27.beta.el6_5.
It's an internal bug seen during the Corbett timeline. Considering it's now addressed, no need to document it. Changed the flag appropriately...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html