Bug 884598

Summary: Alert definitions are missing after JBoss is imported in JON
Product: [Other] RHQ Project
Reporter: bkramer <bkramer>
Component: Core Server
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED NOTABUG
QA Contact: Mike Foley <mfoley>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4
CC: bkramer, hrupp, jshaughn, loleary
Target Milestone: ---   
Target Release: RHQ 4.13   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 884593
Environment:
Last Closed: 2014-09-03 19:28:56 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 884593    

Description bkramer 2012-12-06 11:15:57 UTC
Description of problem:
A JBoss instance can be imported into JON (using the CLI) and still have no alert definitions listed under JON UI -> JBoss instance -> Alerts -> Definitions.

Version-Release number of selected component (if applicable):
JON 3.1.1

How reproducible:
Always

Steps to Reproduce:

1. Start JON Server;
2. From JON UI navigate to Administration -> Alert Definition Templates -> Servers -> press Edit for JBossAS5 plugin -> add Alert templates;
3. Start JON Agent;
4. Start JBoss instance;
5. Let JON Agent discover JBoss instance (so it's visible in the Discovery Queue);
6. Shut down JON Agent;
7. Execute addJBoss.js script [1] from the JON CLI to inventory all JBoss instances that are in the discovery queue;
8. Confirm that the JON UI now has the JBoss instance in its inventory with availability UNKNOWN (as its Agent is DOWN);
9. From JON UI navigate to the JBoss instance that should have alert definitions defined -> Alerts -> Definitions and confirm that they are not there.
10. Start JON Agent with --cleanconfig option;
11. Once the Agent is up and running, check the availability of the JBoss instance and confirm that it is GREEN.
12. From JON UI navigate to JBoss instance -> Alerts -> Definitions and confirm that this page is still empty.
  
Actual results:
Alert definitions page for imported JBoss instance is empty.

Expected results:
Alert Definitions page for imported JBoss instance contains alert definitions defined by the Alert definitions Template for JBossAS5.

Additional info:
If the JON agent is restarted without the --cleanconfig option, the JBoss instance that was discovered and imported becomes GREEN and also has the alert definitions added.

[1] https://access.redhat.com/knowledge/solutions/68975

Comment 1 Jay Shaughnessy 2013-09-11 21:07:57 UTC
This sounds like it may be a duplicate of Bug 969535.  I recommend testing this again with RHQ 4.8 or later.

If the test still fails please let me know.

Comment 2 bkramer 2013-09-13 09:23:15 UTC
(In reply to Jay Shaughnessy from comment #1)
> This sounds like it may be a duplicate of Bug 969535.  I recommend testing
> this again with RHQ 4.8 or later.
> 
> If the test still fails please let me know.

I ran the test using RHQ 4.9.0 and the same issue still exists: the alert definition from the alert template is not added to the JBoss resource when the agent is restarted with --cleanconfig. If --cleanconfig is not used (just a regular restart), then the alert definition is added.

Comment 3 Jay Shaughnessy 2014-02-07 21:38:05 UTC
Starting the agent with --cleanconfig implicitly performs --purgedata, which deletes the agent's data directory.  By doing so the agent now has to get its inventory from the server.  That means that the resources are all considered previously UNKNOWN by the agent.  When bringing up the agent normally the resources would already be known to the agent, and be in the NEW state.  In the latter case we can see that the resources have moved from NEW to COMMITTED when syncing with the server at startup.  As such we perform newly-committed resource actions, including alert template application.

For UNKNOWN resources we don't know the previous state and basically assume that a COMMITTED resource was previously committed and should not have the newly-committed actions applied.
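
The decision described above can be sketched roughly as follows. This is only an illustration of the reasoning in this comment, not RHQ's actual code: the `AgentView` enum and the `applies_newly_committed_actions` function are hypothetical names invented for the example, and the real agent/server sync involves more than this single check.

```python
from enum import Enum


class AgentView(Enum):
    """Inventory status the agent last recorded for a resource (hypothetical names)."""
    UNKNOWN = "UNKNOWN"      # no local record, e.g. the data directory was purged
    NEW = "NEW"              # discovered, still waiting in the discovery queue
    COMMITTED = "COMMITTED"  # already imported into inventory


def applies_newly_committed_actions(previous: AgentView, server_status: AgentView) -> bool:
    """True only when the agent itself observed the NEW -> COMMITTED move."""
    return previous is AgentView.NEW and server_status is AgentView.COMMITTED


# Regular restart: the agent still remembers the resource as NEW.
normal_restart = applies_newly_committed_actions(AgentView.NEW, AgentView.COMMITTED)

# Restart with --cleanconfig: local data is gone, so the previous state is UNKNOWN.
clean_restart = applies_newly_committed_actions(AgentView.UNKNOWN, AgentView.COMMITTED)

print(normal_restart, clean_restart)
```

Under these assumptions the script prints `True False`: alert templates would be applied after a regular restart, but not after a restart with --cleanconfig, which matches the behavior reported in this bug.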

This is a fair assumption, as it holds in any scenario other than the current one, where not only was the agent down at import time but it was also brought up with --cleanconfig (or --purgedata).

The problem with changing this behavior is that if we apply the templates we could be undoing work performed by the users, who may have altered the previously applied templates and now would get them again just due to an agent restart.

The workaround here would just be to uninventory the problematic resource(s) and re-import them with the agent running, or with the agent later brought up normally.

Unless there is some common use-case for the scenario causing this problem I think this should be closed as won't fix, and the workaround used as necessary.

Comment 4 Heiko W. Rupp 2014-05-08 14:42:47 UTC
Bump the target version now that 4.11 is out.

Comment 5 Jay Shaughnessy 2014-07-07 17:44:08 UTC
I still think this should be closed/wontfix but for now bumping to 4.13 for further triage.

Comment 6 Jay Shaughnessy 2014-09-03 19:28:56 UTC
I'm closing this as Not A Bug.  It's a deep corner case, the behavior is as expected, and it has reasonable workarounds:
- uninventory and re-import
- manually update the template and have it re-applied to the existing inventory.