Bug 1002174

Summary: uninventory storage node for stopped agent destabilizes system and leads to Administration-> Topology -> Storage Nodes page dysfunction
Product: [Other] RHQ Project
Reporter: Armine Hovsepyan <ahovsepy>
Component: Core UI
Assignee: Jirka Kremser <jkremser>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.9
CC: hrupp, jkremser, loleary, mfoley, myarboro
Target Milestone: ---   
Target Release: RHQ 4.9   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-24 19:08:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 951619    
Attachments:
Description                     Flags
cass-server-uninventory.png     none
platform-uninventory.png        none
storage-node-uninventory.png    none

Description Armine Hovsepyan 2013-08-28 14:48:35 UTC
clone of bug #976741
 Armine 2013-06-21 06:48:38 EDT

Description of problem:
RHQ storage node data model: the record is not removed from the DB and UI after uninventory

Version-Release number of selected component (if applicable):
rhq 4.8 build e59e69d

How reproducible:
always

Steps to Reproduce:
1. install rhq 4.8 with storage  (ip1)
2. install agent and storage on another vm  (ip2)
3. stop agent on ip2 and uninventory platform from inventory list

Actual results:
After step 2, an exception is visible at the top of the page -> http://d.pr/i/4z6L
After step 3, the removed storage node is still visible in Administration -> Topology -> Storage Nodes without a resource id; clicking on it leads to an exception
The removed storage node's details are still visible in the rhq_storage_nodes table without a resource id --> http://d.pr/i/gNr0

Expected results:
After step 2, storage node data is visible without exceptions under Administration -> Topology -> Storage Nodes -- http://d.pr/i/6hgR
After step 3, the removed storage node is removed from Administration -> Topology -> Storage Nodes (no entry left without a resource id)
The removed storage node's details are removed from the rhq_storage_nodes table
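
The expected state of the rhq_storage_nodes table can be checked with a query for rows that no longer reference a resource. The following is only a sketch: it uses an in-memory SQLite database as a stand-in for the RHQ database, and the column names (id, address, resource_id) are assumptions for illustration, not taken from the actual schema.

```python
import sqlite3

# Stand-in for the RHQ database. The table name comes from the report;
# the column names (id, address, resource_id) are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rhq_storage_nodes ("
    "id INTEGER PRIMARY KEY, address TEXT, resource_id INTEGER)"
)
conn.executemany(
    "INSERT INTO rhq_storage_nodes (address, resource_id) VALUES (?, ?)",
    [("ip1", 10001), ("ip2", None)],  # ip2 lost its resource after uninventory
)


def orphaned_storage_nodes(connection):
    """Return addresses of storage node rows that have no resource id."""
    rows = connection.execute(
        "SELECT address FROM rhq_storage_nodes "
        "WHERE resource_id IS NULL ORDER BY address"
    )
    return [address for (address,) in rows]


orphans = orphaned_storage_nodes(conn)
print(orphans)
```

With the fix in place, the list of orphaned rows would be expected to be empty after step 3.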

Additional info:

Armine 2013-06-21 09:16:21 EDT
Blocks: 951619
Comment 1 John Sanda 2013-06-21 09:41:15 EDT

This is not fully implemented yet. Removing a storage node from inventory will have to do several things, including:

* Remove the resource hierarchy from the database
* Remove the storage node entity from the database
* Remove the node from the Cassandra cluster

The last one is the tricky part. There are JMX operations we want to invoke to let other nodes in the cluster know that we are permanently removing the node. Then, depending on the cluster size, we may have to change the replication factor for the cluster and perform maintenance to make sure data is where it belongs in the cluster.
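
The steps above can be sketched as a small planning function. This is not the RHQ implementation: the function names, the rule of capping the replication factor at the number of remaining nodes, and the ordering of the steps are all assumptions made for illustration.

```python
def replication_factor_after_removal(current_rf, nodes_before):
    """Cap the replication factor at the node count left after removing one node.

    Assumed rule for illustration; the real policy may differ.
    """
    remaining = nodes_before - 1
    if remaining < 1:
        raise ValueError("cannot remove the last storage node")
    return min(current_rf, remaining)


def removal_plan(current_rf, nodes_before):
    """Return the removal steps as text, in an assumed order."""
    new_rf = replication_factor_after_removal(current_rf, nodes_before)
    steps = ["notify the cluster that the node is leaving permanently"]
    if new_rf != current_rf:
        # Only needed when the cluster becomes too small for the current factor.
        steps.append(f"change replication factor from {current_rf} to {new_rf}")
        steps.append("run maintenance so data is where it belongs")
    steps.append("remove the storage node entity from the database")
    steps.append("remove the resource hierarchy from the database")
    return steps


plan = removal_plan(current_rf=2, nodes_before=2)
for step in plan:
    print(step)
```

In the two-node setup from the report, removing one node leaves a single node, so under the assumed rule the plan includes the replication factor change and the maintenance step.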

Comment 2 Charles Crouch 2013-07-01 15:58:59 EDT

Per 7/1 BZ triage: target jon32

This needs to be implemented as part of the JON3.2 Beta/RHQ4.9

Priority: unspecified → high
Target Release: --- → JON 3.2.0
Comment 3 John Sanda 2013-08-22 21:37:22 EDT

Support for undeployment is available with master builds. https://docs.jboss.org/author/display/RHQ/Deploying+Storage+Nodes provides the details of what is involved in the process. Moving to ON_QA.

Status: NEW → ON_QA
Assignee: rhq-maint → jsanda
Target Milestone: --- → GA

Comment 1 Armine Hovsepyan 2013-08-28 14:48:57 UTC

Stopping the agent on IP2 and uninventorying the platform from the server GUI breaks the Administration -> Topology -> Storage Nodes page and destabilizes the system.

Comment 3 Larry O'Leary 2013-08-29 13:48:43 UTC
As reported in 1002236 this issue also occurs without removing the platform. In the case of 1002236 it may still be related to the storage node not being in inventory. The agent startup and discovery was delayed.

Comment 4 Jirka Kremser 2013-09-02 11:04:33 UTC
branch:  master
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=56e34b687
time:    2013-09-02 13:00:07 +0200
commit:  56e34b687112afad9b6e972dab37ed774736adb3
author:  Jirka Kremser - jkremser
message: [BZ 1002174] - uninventory storage node for stopped agent destabilizes
         system and leads to Administration-> Topology -> Storage Nodes
         page dysfunction - Adding yet another confirmation box when
         uninventorying the platform or storage node.

The notification informs the user that they should run the undeploy operation on the storage node first.
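
The decision behind the extra confirmation box can be sketched as follows. This is not the code from the commit: the category and type names, the message text, and the function name are hypothetical and only illustrate the check described above.

```python
# Hypothetical values for illustration; real RHQ category and type names may differ.
PLATFORM_CATEGORY = "PLATFORM"
STORAGE_NODE_TYPE = "RHQ Storage Node"

UNDEPLOY_HINT = (
    "The selection contains a platform or storage node. "
    "Run the undeploy operation on the storage node before uninventorying."
)


def extra_confirmation(selection):
    """Return a warning if any (category, type_name) pair needs one, else None."""
    for category, type_name in selection:
        if category == PLATFORM_CATEGORY or type_name == STORAGE_NODE_TYPE:
            return UNDEPLOY_HINT
    return None


message = extra_confirmation([("SERVER", "RHQ Storage Node")])
print(message)
```

Selections without a platform or storage node return None, so no extra dialog would be shown for them.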

Comment 5 Armine Hovsepyan 2013-09-02 16:15:01 UTC
verified, thank you.

screen-shots attached

Comment 6 Armine Hovsepyan 2013-09-02 16:15:37 UTC
Created attachment 792925 [details]
cass-server-uninventory.png

Comment 7 Armine Hovsepyan 2013-09-02 16:15:52 UTC
Created attachment 792926 [details]
platform-uninventory.png

Comment 8 Armine Hovsepyan 2013-09-02 16:16:58 UTC
Created attachment 792927 [details]
storage-node-uninventory.png

Comment 9 Heiko W. Rupp 2013-09-24 19:08:50 UTC
Bulk closing of RHQ 4.9 verified items