Bug 1102885 - Storage node undeploy process should work with down nodes
Summary: Storage node undeploy process should work with down nodes
Keywords:
Status: NEW
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server, Plugins, Storage Node
Version: 4.10,4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nobody
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1102887
Reported: 2014-05-29 18:40 UTC by John Sanda
Modified: 2022-03-31 04:28 UTC

Fixed In Version:
Clone Of:
: 1102887
Environment:
Last Closed:
Embargoed:



Description John Sanda 2014-05-29 18:40:08 UTC
Description of problem:
The undeploy process does several things:

* Decommissions the node from the cluster such that
  * Its data is redistributed to other nodes 
  * Cluster nodes will no longer communicate with said node
* The node is shut down
* The rhq-storage and rhq-data directories are purged from disk
* The storage node resource is removed from inventory
* The storage node entity, i.e., row in rhq_storage_node table, is deleted

If the node is down, the decommission operation fails and the undeploy process cannot complete. Yet the node being down may be the very reason for wanting to undeploy it in the first place.

I have also seen users do the following when they want to get rid of a storage node:

1. Stop the storage node
2. Remove the storage node resource from inventory
3. (Optionally) Delete the rhq-storage directory

In this case too, we cannot decommission the node, so the undeploy process cannot complete. The decommission operation has to be run on the node being removed. If we cannot do that, then we should fall back to calling StorageServiceMBean.removeNode(String hostIdString) from a running node; it can be called on any node in the cluster. If the removeNode() operation fails, then we should fall back to StorageServiceMBean.forceRemoveCompletion() as a last resort.
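
The fallback order proposed above could be sketched roughly as follows. This is only an illustration, not the RHQ implementation: the class UndeployFallback, the ClusterOps interface, the Outcome enum, and the targetIsUp flag are hypothetical names introduced here. The three ClusterOps methods merely mirror the StorageServiceMBean operations named in this description. The sketch also assumes that a decommission that fails on a running node should fall through to removeNode(), which this report does not state explicitly.

```java
// Hypothetical sketch of the proposed fallback chain; not actual RHQ code.
class UndeployFallback {

    /** Hypothetical facade mirroring the three operations named in this report. */
    public interface ClusterOps {
        void decommission() throws Exception;            // must run on the node being removed
        void removeNode(String hostId) throws Exception; // may run on any live node in the cluster
        void forceRemoveCompletion() throws Exception;   // last resort
    }

    /** Which step finally took the node out of the cluster. */
    public enum Outcome { DECOMMISSIONED, REMOVED, FORCE_REMOVED }

    /**
     * Tries decommission only when the target node is reachable, then
     * removeNode from a live node, then forceRemoveCompletion.
     * An exception from the last step propagates to the caller.
     */
    public static Outcome removeFromCluster(ClusterOps ops, boolean targetIsUp, String hostId)
            throws Exception {
        if (targetIsUp) {
            try {
                ops.decommission();
                return Outcome.DECOMMISSIONED;
            } catch (Exception e) {
                // Assumption: a failed decommission falls through to removeNode.
            }
        }
        try {
            ops.removeNode(hostId);
            return Outcome.REMOVED;
        } catch (Exception e) {
            ops.forceRemoveCompletion();
            return Outcome.FORCE_REMOVED;
        }
    }
}
```

With a down node, a caller would pass targetIsUp = false, so that decommission is skipped entirely and removeNode is attempted first from a running node.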

