Bug 1093822

Summary: RFE (offshoot of JON3-42): Allow automatic uninventory of dead / missing / removed resources
Product: [JBoss] JBoss Operations Network
Reporter: Jay Shaughnessy <jshaughn>
Component: Core Server
Assignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE
QA Contact: Armine Hovsepyan <ahovsepy>
Severity: high
Docs Contact:
Priority: high
Version: JON 3.2
CC: ahovsepy, genman, hrupp, loleary, mfoley
Target Milestone: DR01
Target Release: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-12-11 14:03:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1070326
Bug Blocks:
Attachments (Description: Flags):
old-version: none
new-version: none

Description Jay Shaughnessy 2014-05-02 18:31:45 UTC
This is similar to the RFE reported in Bug 871974, which asks for a new AvailType, named REMOVED, indicating a resource that is physically missing, as opposed to being DOWN, which indicates that a resource exists but is rejecting a connection.

There are two reasons why this RFE will see action prior to any work on 871974:

1) There is general reluctance to add another full-blown availability type.  It has many touch-points, requiring a decent amount of end-to-end work and actually impacts performance in certain ways.  It also is one more thing the end user needs to understand and deal with.

2) It still requires that REMOVED resources be handled in some manual fashion.

What we've heard from several users is that they dislike the way removed (i.e. dead) resources remain in inventory and can cause resource clutter and/or confusion until manually removed. They just want them to go away.

The premise of this work is similar to 871974: it calls for plugins' getAvailability() implementations to be able to distinguish between DOWN and DEAD (aka REMOVED in the other BZ).

But here, the DEAD availability type will never be stored or tracked.  It acts as a flag that can trigger automatic uninventory for resources of enabled types.  Initially no types are enabled, to avoid automatic removal of resources a user wants to keep.

For resources reported as DEAD, but not of a type enabled for auto-uninventory, the availType is downgraded to DOWN for storage, history, alerting, etc.
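
The handling described above could be sketched roughly as follows. This is only an illustration in plain Java, not the RHQ implementation: the class DeadAvailabilitySketch, the Reported and Outcome holders, and the process() method are all hypothetical names, and the "enabled types" are modeled as a simple set of type names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DeadAvailabilitySketch {

    // DEAD exists only as a transient flag in the report; it is never stored.
    enum AvailType { UP, DOWN, DEAD }

    /** One entry of a hypothetical availability report. */
    static final class Reported {
        final int resourceId;
        final String typeName;
        final AvailType avail;

        Reported(int resourceId, String typeName, AvailType avail) {
            this.resourceId = resourceId;
            this.typeName = typeName;
            this.avail = avail;
        }
    }

    /** Hypothetical outcome: what would be stored and what would be uninventoried. */
    static final class Outcome {
        final List<String> stored = new ArrayList<>();
        final List<Integer> uninventoried = new ArrayList<>();
    }

    /**
     * Processes a report. DEAD resources of an enabled type are uninventoried;
     * DEAD resources of any other type are downgraded to DOWN before storage.
     */
    static Outcome process(List<Reported> report, Set<String> enabledTypes) {
        Outcome out = new Outcome();
        for (Reported r : report) {
            AvailType avail = r.avail;
            if (avail == AvailType.DEAD) {
                if (enabledTypes.contains(r.typeName)) {
                    out.uninventoried.add(r.resourceId);
                    continue; // nothing is stored for an uninventoried resource
                }
                avail = AvailType.DOWN; // downgrade for storage, history, alerting
            }
            out.stored.add(r.resourceId + "=" + avail);
        }
        return out;
    }
}
```

With only the hypothetical type "Deployment" enabled, a DEAD "Deployment" resource would land in uninventoried, while a DEAD resource of any other type would be stored as DOWN.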

Comment 1 Jay Shaughnessy 2014-05-02 18:38:24 UTC
This RFE was the result of discussion in Bug 1070326.

Comment 2 Jay Shaughnessy 2014-05-02 21:05:53 UTC
*** Bug 871974 has been marked as a duplicate of this bug. ***

Comment 3 Jay Shaughnessy 2014-05-03 03:30:35 UTC
For review, Pull Request 27: https://github.com/rhq-project/rhq/pull/27

Comment 4 Elias Ross 2014-05-03 04:19:51 UTC
Just as a note, I did implement something similar to DEAD/REMOVED using measurement traits. The removed resource would mark itself as 'disabled'. Automatic removal took place using a CLI script fired from a cron tab. But something similar could also be written as a server plugin. Actually I hope that the removal process is written as a server plugin as it makes it easier to develop and change, if needed.

Comment 5 Jay Shaughnessy 2014-05-05 13:39:21 UTC
Actually, it's not a server plugin.  It's built into availability report handling.  If enabled on the type, the DEAD resources are immediately uninventoried using the standard uninventory code; otherwise the avail is downgraded to DOWN.  The benefit here is that we never have to maintain a server-side resource with any flag or special state, and we don't have any special "reaper" code.  The downside is that it precludes any ability to inspect or specially process the dead resources.  But that wasn't really the point.  The mechanism is dedicated to removing nuisance resources.

Comment 6 Jay Shaughnessy 2014-05-06 18:33:05 UTC
Merged into Master:

commit c52f35cea19db3b0b5cb3754b91464380fbdde1d
Author: Jay Shaughnessy <jshaughn>
Date:   Fri May 2 15:13:28 2014 -0400
    Initial commit for consideration.  Plugin components can now report DEAD
    availability for resources that seem physically removed from the system. DEAD
    resources can then be configured for automatic uninventory via a new Admin
    GUI view.

commit 78df7de3618ac64f5e7595d54fa1ad30121e4f36
Author: Jay Shaughnessy <jshaughn>
Date:   Tue May 6 10:57:03 2014 -0400
    Changing "DEAD" to "MISSING" in response to terminology feedback.

commit d039b7ea8a84fa800523fd602e076d0ef0e49561
Author: Jay Shaughnessy <jshaughn>
Date:   Tue May 6 13:19:32 2014 -0400
    Fix a few merge issues.

Comment 7 Mike Foley 2014-05-15 19:36:51 UTC
I am going to set this back to assigned.  The exit criterion for JON RFEs is a developer demo (with QE, GSS, and PM invited).

Comment 10 Jay Shaughnessy 2014-05-23 19:03:49 UTC
After the feature review there were a variety of requested improvements; this commit satisfies those requests.


commit 4790e9f07e82fd56e976758fb91e58752d3ef318
Author: Jay Shaughnessy <jshaughn>
Date:   Fri May 23 14:58:31 2014 -0400

    Incorporate review feedback:
    - add plugin-descriptor flag for types supporting the MISSING avail type
    - use that flag to disable non-supporting types in the admin view for
      setting the policy.
    - add ability to automatically ignore missing resources, in addition to
      the initial commit which provided automatic uninventory.
    - lots of end-to-end changes to support the new MissingPolicy setting.
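
To make the review changes concrete, here is a minimal sketch of how a per-type missing policy might resolve a reported MISSING availability. It is an assumption-laden illustration, not the RHQ code: the policy values DOWN, IGNORE and UNINVENTORY, the supportsMissing field, the TypeSettings class and the resolve() method are hypothetical names chosen for this example, and rejecting a policy with an exception stands in for the admin view disabling non-supporting types.

```java
public class MissingPolicySketch {

    enum AvailType { UP, DOWN, MISSING }

    // Assumed policy values: keep as DOWN (default), ignore, or uninventory.
    enum MissingPolicy { DOWN, IGNORE, UNINVENTORY }

    enum Action { STORE_AS_REPORTED, STORE_AS_DOWN, IGNORE_RESOURCE, UNINVENTORY_RESOURCE }

    /** Hypothetical per-type settings, combining the descriptor flag and the policy. */
    static final class TypeSettings {
        final boolean supportsMissing; // stands in for the plugin-descriptor flag
        private MissingPolicy policy = MissingPolicy.DOWN; // nothing is removed by default

        TypeSettings(boolean supportsMissing) {
            this.supportsMissing = supportsMissing;
        }

        void setPolicy(MissingPolicy policy) {
            // Types that never report MISSING cannot be given a non-default policy.
            if (!supportsMissing && policy != MissingPolicy.DOWN) {
                throw new IllegalArgumentException("type does not support MISSING");
            }
            this.policy = policy;
        }

        MissingPolicy getPolicy() {
            return policy;
        }
    }

    /** Decides what to do with one reported availability for a resource of the given type. */
    static Action resolve(TypeSettings type, AvailType reported) {
        if (reported != AvailType.MISSING) {
            return Action.STORE_AS_REPORTED;
        }
        switch (type.getPolicy()) {
            case UNINVENTORY:
                return Action.UNINVENTORY_RESOURCE;
            case IGNORE:
                return Action.IGNORE_RESOURCE;
            default:
                return Action.STORE_AS_DOWN; // MISSING itself is never stored
        }
    }
}
```

In this sketch a type keeps the DOWN policy until someone changes it, which mirrors the original intent that no resource is removed automatically unless its type has been explicitly configured.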

Comment 12 Simeon Pinder 2014-07-31 15:51:37 UTC
Moving to ON_QA as available to test with brew build of DR01: https://brewweb.devel.redhat.com//buildinfo?buildID=373993

Comment 13 Armine Hovsepyan 2014-08-05 16:10:58 UTC
Created attachment 924254 [details]
old-version

Comment 14 Armine Hovsepyan 2014-08-05 16:11:16 UTC
Created attachment 924255 [details]
new-version

Comment 15 Armine Hovsepyan 2014-08-05 16:14:05 UTC
The missing resource is uninventoried or ignored. Keeping the issue as the main BZ for #1070326 and its dependent bugs.

Comment 16 Mike Foley 2014-09-09 19:10:32 UTC
TCMS testcase run:

https://tcms.engineering.redhat.com/run/152183/?from_plan=14082

This testcase run shows 4 failed tests.  BZs were logged for those 4 failed tests, and those BZs were fixed/closed.

Ergo, this umbrella issue is "test complete".