Bug 1488179

Summary: Heap Used metric disabled after automatic storage node re-inventory
Product: [JBoss] JBoss Operations Network
Reporter: Filip Brychta <fbrychta>
Component: Storage Node, Monitoring -- Other
Assignee: Michael Burman <miburman>
Status: CLOSED ERRATA
QA Contact: Filip Brychta <fbrychta>
Severity: medium
Docs Contact:
Priority: medium
Version: JON 3.3.8
CC: mfoley, pyadav, spinder
Target Milestone: CR01
Keywords: Triaged
Target Release: JON 3.3.10
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-16 03:16:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
re-invertoried platform none

Description Filip Brychta 2017-09-04 14:55:48 UTC
Description of problem:
This bz was discovered as part of bz#1421186. The Heap Used metric is disabled after automatic storage node re-inventory. 



Version-Release number of selected component (if applicable):
JON 3.3.9.DR01

How reproducible:
Always

Steps to Reproduce:
1. install JON 3.3.8
2. go to Administration->Storage Nodes->Your storage node
3. uninventory platform which runs storage node
4. wait until the storage node resource is auto inventoried again
5. navigate to Administration->Storage Nodes->Your storage node and check the metrics

Actual results:
After step 2. all metrics are available
After step 5 the Heap Used metric shows NaN because the metric is disabled (visible on Monitoring->Schedules tab of storage node -> JVM -> Memory Subsystem resource)

Expected results:
All metrics enabled before uninventory should be enabled after re-inventory.

Additional info:
Note that the metric is not enabled in the metric collection template for the JVM memory system of the RHQ storage node.
Why is it enabled on the resource after installation when it's disabled in the template?

Comment 1 Filip Brychta 2017-09-08 10:50:50 UTC
Checked that StorageNodeMaintenanceOperationsFailure alert definition is created correctly after automatic re-inventory

Comment 2 Michael Burman 2017-11-02 12:24:35 UTC
It's enabled by the AlertDefinitionServerPlugin, which creates the alerts. When it creates the alerts, it also enables the metrics they need.

Since the templates are already in the RHQ, there's no need to rerun this code and thus nothing enables them.

Comment 3 Michael Burman 2017-11-02 15:08:40 UTC
Another note: this is correctly enabled in the template when using master (RHQ).

Comment 4 Michael Burman 2017-11-14 13:00:45 UTC
On standard installation (from the QE machines), the following RHQ CLI script makes everything work (this is what the plugin does):

// Find the storage node's "VM Memory System" resource type, including its metric definitions
var rt = new ResourceTypeCriteria()
rt.addFilterPluginName("RHQStorage")
rt.addFilterName("VM Memory System")
rt.fetchMetricDefinitions(true)
var defs = ResourceTypeManager.findResourceTypesByCriteria(rt)
var rtVM = defs.get(0)

// Collect the id of the Heap Used metric definition
var rtIds = []
var ite = rtVM.getMetricDefinitions().iterator()
while (ite.hasNext()) {
    var a = ite.next()
    if (a.name == "{HeapMemoryUsage.used}") { rtIds.push(a.id) }
}
// Enable that metric for the resource type
MeasurementScheduleManager.enableSchedulesForResourceType(rtIds, false)

So it appears there's nothing wrong in the functions themselves in JON 3.3.9.

Comment 5 Michael Burman 2017-11-14 14:23:14 UTC
I guess I should've checked this before spending all this time tracking things down. I can't reproduce this bug on my own machine.

If I unpack JON 3.3.0.GA and then use update 09 on top of that and install, it will work correctly. Maybe there's something in the QE VM scripts, but clearly not in the default installation.

Comment 6 Michael Burman 2017-11-22 14:26:18 UTC
Back to this one. The main cause is the plugin update. When the plugin is updated, it restores the original settings (from the plugin). Our server plugin (alertdef-rhq) then has a setting that, by default, keeps it from resetting properties if the alerts were already defined.

Fixing this requires either overriding any user-modified settings or making the storage node override the JMX plugin for these properties.

Comment 7 Michael Burman 2017-11-22 15:35:14 UTC
Fixed in the master:

commit 0607f52e542de726cee5955b8c5a5e70b30a225e (HEAD -> master)
Author: Michael Burman <miburman>
Date:   Wed Nov 22 17:34:32 2017 +0200

    [BZ 1488179] Force collection intervals to be updated even if alert definitions are not replaced
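
As a rough illustration of the sequence described in comments 2 and 6 and of the commit message above, the behaviour can be modelled in a few lines of plain JavaScript. This is only a toy model, not the RHQ or alertdef-rhq code: every function and field name is hypothetical, and a simple enabled flag stands in for the collection settings the real plugin manages.

```javascript
// Toy model only; all names are hypothetical, none of this is RHQ code.

// Settings as shipped by the plugin: Heap Used is disabled.
function pluginDefaults() {
  return { "{HeapMemoryUsage.used}": { enabled: false } };
}

// A plugin update restores the shipped settings.
function applyPluginUpdate(server) {
  server.template = pluginDefaults();
}

// Model of the server plugin run. forceSchedules stands for the change in
// the fix: update the metric settings even when the alert definitions
// already exist and are therefore not replaced.
function runAlertDefPlugin(server, forceSchedules) {
  var created = !server.alertDefsInstalled;
  if (created) {
    server.alertDefsInstalled = true;
  }
  if (created || forceSchedules) {
    server.template["{HeapMemoryUsage.used}"].enabled = true;
  }
}

// A (re-)inventoried resource takes its schedules from the template.
function inventoryResource(server) {
  return JSON.parse(JSON.stringify(server.template));
}

function scenario(forceSchedules) {
  var server = { template: pluginDefaults(), alertDefsInstalled: false };
  runAlertDefPlugin(server, forceSchedules); // fresh install: metric gets enabled
  applyPluginUpdate(server);                 // update resets the template
  runAlertDefPlugin(server, forceSchedules); // alert definitions already exist
  return inventoryResource(server)["{HeapMemoryUsage.used}"].enabled;
}

console.log("without the fix:", scenario(false)); // false
console.log("with the fix:   ", scenario(true));  // true
```

In this model the re-inventoried resource ends up with Heap Used disabled only when a plugin update happens after the alert definitions were first created, which would be consistent with a plain installation without that update sequence not showing the problem.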

Comment 9 Simeon Pinder 2017-12-29 12:11:59 UTC
Moving to ON_QA as available for test with the latest build:

JON 3.3.10 DR01 artifacts are available for test from here:
http://download.eng.bos.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/164/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip
 *Note: jon-server-patch-3.3.0.GA.zip maps to DR01 build of
 jon-server-3.3.0.GA-update-10.zip.

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=635136

Comment 10 Prachi 2018-01-04 10:20:02 UTC
Verified

All metrics enabled after re-inventory.

Comment 11 Prachi 2018-01-04 10:20:59 UTC
Created attachment 1376766
re-invertoried platform

Comment 12 Filip Brychta 2018-01-08 20:22:48 UTC
I still see it in JON 3.3.10 -> moving back to assigned.

Prachi, could you please retest? In the screenshot attached in comment 11 I can see NaN values for the Heap Used metric. Is it the correct screenshot?

Comment 13 Filip Brychta 2018-01-09 11:48:10 UTC
The fix should be in modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-serverplugins/serverplugin-alertdef-rhq-*.jar

but this file is not part of the update10:
find jon-server-3.3.0.GA-update-10/ -name '*serverplugin-alertdef-rhq*'
jon-server-3.3.0.GA-update-10/jon-server-updates/docs/jon-licenses/org.rhq,serverplugin-alertdef-rhq,4.12.0.JON330GA,GNU Lesser General Public License v2 (or 2.1) or later.txt
jon-server-3.3.0.GA-update-10/jon-server-updates/docs/jon-licenses/org.rhq,serverplugin-alertdef-rhq,4.12.0.JON330GA,GNU General Public License v2.0.txt

Simeon, could you please check why the fix did not make it to DR01?

Comment 15 Filip Brychta 2018-01-09 15:14:01 UTC
I also noticed that the serverplugin-alertdef-rhq-4.12.0.JON330GA.jar file is in the .old dir:
jon-server-3.3.0.GA/.patched/3.3.0.GA-update-10_01-09-18_09-48-35/.old/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-serverplugins/serverplugin-alertdef-rhq-4.12.0.JON330GA.jar

Simeon, are you aware of the reason why it's on the remove list?

Comment 18 Simeon Pinder 2018-01-30 15:37:13 UTC
Moving to ON_QA.

JON 3.3.10 CR01 artifacts are available for test from here:
http://download.eng.bos.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/166/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip
 *Note: jon-server-patch-3.3.0.GA.zip maps to CR01 build of
 jon-server-3.3.0.GA-update-10.zip.

Comment 19 Filip Brychta 2018-01-30 16:28:06 UTC
Verified in:
Version: 3.3.0.GA Update 10
Build Number: 800d329:2f3e0db

Comment 22 errata-xmlrpc 2018-02-16 03:16:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0325