Bug 1002174 - uninventory storage node for stopped agent destabilizes system and leads to Administration-> Topology -> Storage Nodes page dysfunction
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Core UI
Version: 4.9
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHQ 4.9
Assignee: Jirka Kremser
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 951619
Reported: 2013-08-28 14:48 UTC by Armine Hovsepyan
Modified: 2015-09-03 00:01 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-09-24 19:08:50 UTC
Embargoed:


Attachments
cass-server-uninventory.png (172.97 KB, image/png)
2013-09-02 16:15 UTC, Armine Hovsepyan
platform-uninventory.png (111.15 KB, image/png)
2013-09-02 16:15 UTC, Armine Hovsepyan
storage-node-uninventory.png (173.25 KB, image/png)
2013-09-02 16:16 UTC, Armine Hovsepyan

Description Armine Hovsepyan 2013-08-28 14:48:35 UTC
clone of bug #976741
 Armine 2013-06-21 06:48:38 EDT

Description of problem:
RHQ storage node data model: the record is not removed from the DB and the UI after uninventory.

Version-Release number of selected component (if applicable):
RHQ 4.8 build e59e69d

How reproducible:
always

Steps to Reproduce:
1. Install RHQ 4.8 with a storage node (ip1).
2. Install an agent and a storage node on another VM (ip2).
3. Stop the agent on ip2 and uninventory the platform from the inventory list.

Actual results:
After step 2, an exception is visible at the top of the page -> http://d.pr/i/4z6L
After step 3, the removed storage node is still visible in Administration -> Topology -> Storage Nodes without a resource ID; clicking it leads to an exception.
The removed storage node's details remain in the rhq_storage_nodes table without a resource ID --> http://d.pr/i/gNr0

Expected results:
After step 2, the storage node data is visible under Administration -> Topology -> Storage Nodes without exceptions -- http://d.pr/i/6hgR
After step 3, the removed storage node no longer appears in Administration -> Topology -> Storage Nodes.
The removed storage node's details are deleted from the rhq_storage_nodes table.

Additional info:

Armine 2013-06-21 09:16:21 EDT
Blocks: 951619
Comment 1 John Sanda 2013-06-21 09:41:15 EDT

This is not fully implemented yet. Removing a storage node from inventory will have to do several things, including:

* Remove the resource hierarchy from the database
* Remove the storage node entity from the database
* Remove the node from the Cassandra cluster

The last one is the tricky part. There are JMX operations we want to invoke to let the other nodes in the cluster know that we are permanently removing the node. Then, depending on the cluster size, we may have to change the replication factor for the cluster and perform maintenance to make sure data is where it belongs in the cluster.
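The removal sequence described above can be sketched as a small planning function. This is a hypothetical Python sketch, not RHQ's implementation: the replication-factor policy min(cluster size, 3), the step ordering, and every name in it are assumptions made for illustration. The only external fact it leans on is that Cassandra exposes node decommissioning as a JMX operation (the decommission operation on its StorageService MBean).

```python
# Hypothetical sketch of permanently removing a storage node, following the
# three tasks listed above. Step order, step wording and the replication-factor
# policy are assumptions for illustration, not RHQ's actual code.
from dataclasses import dataclass, field
from typing import List

MAX_REPLICATION_FACTOR = 3  # assumed cap, not an RHQ constant


def replication_factor(cluster_size: int) -> int:
    """Assumed policy: one replica per node, capped at MAX_REPLICATION_FACTOR."""
    if cluster_size < 1:
        raise ValueError("the cluster must keep at least one node")
    return min(cluster_size, MAX_REPLICATION_FACTOR)


@dataclass
class RemovalPlan:
    node: str
    steps: List[str] = field(default_factory=list)


def plan_removal(node: str, cluster: List[str]) -> RemovalPlan:
    """Return the ordered steps for permanently removing `node` from `cluster`."""
    if node not in cluster:
        raise ValueError(f"{node} is not a member of the cluster")
    old_rf = replication_factor(len(cluster))
    new_rf = replication_factor(len(cluster) - 1)  # raises if node is the last one
    plan = RemovalPlan(node)
    # Assumed ordering: shrink the replication factor first so the remaining
    # nodes can still hold every replica once this node is gone.
    if new_rf < old_rf:
        plan.steps.append(f"change replication factor {old_rf} -> {new_rf}")
    # Tell the rest of the cluster the node is leaving for good
    # (a JMX operation in Cassandra).
    plan.steps.append(f"decommission {node} via JMX")
    # Maintenance so that data ends up where it belongs.
    plan.steps.append("run maintenance on remaining nodes")
    # Only once the cluster no longer depends on the node, clean up the DB.
    plan.steps.append(f"remove resource hierarchy for {node}")
    plan.steps.append(f"remove storage node entity for {node}")
    return plan


if __name__ == "__main__":
    for step in plan_removal("ip2", ["ip1", "ip2"]).steps:
        print(step)
```

For the two-node setup in this report (removing ip2), the sketch lowers the replication factor from 2 to 1 before decommissioning, since a single remaining node cannot hold two replicas; removing one node from a four-node cluster leaves the factor at 3 and skips that step. Deleting the database rows last mirrors the fix below, which asks the user to undeploy the storage node before uninventorying it.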

Comment 2 Charles Crouch 2013-07-01 15:58:59 EDT

Per 7/1 BZ triage: target jon32

This needs to be implemented as part of the JON 3.2 Beta / RHQ 4.9.

Priority: unspecified → high
Target Release: --- → JON 3.2.0
Comment 3 John Sanda 2013-08-22 21:37:22 EDT

Support for undeployment is available with master builds. https://docs.jboss.org/author/display/RHQ/Deploying+Storage+Nodes provides the details of what is involved in the process. Moving to ON_QA.

Status: NEW → ON_QA
Assignee: rhq-maint → jsanda
Target Milestone: --- → GA

Comment 1 Armine Hovsepyan 2013-08-28 14:48:57 UTC

Stopping the agent on IP2 and uninventorying the platform from the server GUI leads to Administration -> Topology -> Storage Nodes page dysfunction as well as destabilization of the system.

Comment 3 Larry O'Leary 2013-08-29 13:48:43 UTC
As reported in bug 1002236, this issue also occurs without removing the platform. In the case of bug 1002236, it may still be related to the storage node not being in inventory: the agent startup and discovery were delayed.

Comment 4 Jirka Kremser 2013-09-02 11:04:33 UTC
branch:  master
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=56e34b687
time:    2013-09-02 13:00:07 +0200
commit:  56e34b687112afad9b6e972dab37ed774736adb3
author:  Jirka Kremser - jkremser
message: [BZ 1002174] - uninventory storage node for stopped agent destabilizes
         system and leads to Administration-> Topology -> Storage Nodes
         page dysfunction - Adding yet another confirmation box when
         uninventorying the platform or storage node.

The notification tells the user to run the undeploy operation on the storage node first.
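The extra confirmation described in the commit can be sketched as a guard that runs before uninventory. This is a hypothetical Python sketch, not the actual Core UI change: the resource type name "RHQ Storage Node", the warning text, and the names Resource, contains_storage_node and confirm_uninventory are all assumptions. The `ask` callback stands in for the confirmation dialog.

```python
# Hypothetical sketch of the extra confirmation added by the fix: warn before
# uninventorying a platform or storage node that is or hosts a storage node.
# The resource type name and all identifiers are assumptions, not RHQ's UI code.
from dataclasses import dataclass, field
from typing import Callable, List

STORAGE_NODE_TYPE = "RHQ Storage Node"  # assumed resource type name

UNDEPLOY_WARNING = (
    "This resource is, or contains, an RHQ storage node. Run the undeploy "
    "operation on the storage node before uninventorying it. Continue anyway?"
)


@dataclass
class Resource:
    name: str
    type_name: str
    children: List["Resource"] = field(default_factory=list)


def contains_storage_node(resource: Resource) -> bool:
    """True if the resource itself or any descendant is a storage node."""
    if resource.type_name == STORAGE_NODE_TYPE:
        return True
    return any(contains_storage_node(child) for child in resource.children)


def confirm_uninventory(resource: Resource, ask: Callable[[str], bool]) -> bool:
    """Return True if the uninventory may proceed.

    `ask` shows a message to the user and returns their answer.
    """
    if contains_storage_node(resource):
        return ask(UNDEPLOY_WARNING)
    return True
```

The check walks the whole subtree because, as in the original steps to reproduce, uninventorying a platform also removes the storage node running on it. For example, `confirm_uninventory(Resource("ip2", "Linux", [Resource("sn", STORAGE_NODE_TYPE)]), ask=lambda msg: False)` returns False, so the uninventory is cancelled, while a platform without a storage node proceeds without an extra prompt.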

Comment 5 Armine Hovsepyan 2013-09-02 16:15:01 UTC
Verified, thank you.

Screenshots attached.

Comment 6 Armine Hovsepyan 2013-09-02 16:15:37 UTC
Created attachment 792925
cass-server-uninventory.png

Comment 7 Armine Hovsepyan 2013-09-02 16:15:52 UTC
Created attachment 792926
platform-uninventory.png

Comment 8 Armine Hovsepyan 2013-09-02 16:16:58 UTC
Created attachment 792927
storage-node-uninventory.png

Comment 9 Heiko W. Rupp 2013-09-24 19:08:50 UTC
Bulk closing of RHQ 4.9 verified items

