Description of problem:
This bz was discovered as part of bz#1421186. The Heap Used metric is disabled after automatic storage node re-inventory.

Version-Release number of selected component (if applicable):
JON 3.3.9.DR01

How reproducible:
Always

Steps to Reproduce:
1. Install JON 3.3.8.
2. Go to Administration -> Storage Nodes -> your storage node.
3. Uninventory the platform that runs the storage node.
4. Wait until the storage node resource is automatically inventoried again.
5. Navigate to Administration -> Storage Nodes -> your storage node.

Actual results:
After step 2, all metrics are available.
After step 5, the Heap Used metric shows NaN because the metric is disabled (visible on the Monitoring -> Schedules tab of the storage node -> JVM -> Memory Subsystem resource).

Expected results:
All metrics that were enabled before uninventory should be enabled after re-inventory.

Additional info:
Note that the metric is not enabled in the metric collection template for the JVM memory system of the RHQ storage node. Why is it enabled on the resource after installation when it is disabled in the template?
Checked that the StorageNodeMaintenanceOperationsFailure alert definition is created correctly after automatic re-inventory.
It is enabled by the AlertDefinitionServerPlugin, which creates the alert definitions. When it creates them, it also enables the metrics they need. Since the templates already exist in RHQ, this code is not rerun, and thus nothing enables the metrics.
Another note: this is correctly enabled in the template when using RHQ master.
On a standard installation (from the QE machines), the following RHQ CLI script makes everything work (this is what the plugin does):

    // Look up the "VM Memory System" resource type of the RHQStorage plugin
    var rt = new ResourceTypeCriteria();
    rt.addFilterPluginName("RHQStorage");
    rt.addFilterName("VM Memory System");
    rt.fetchMetricDefinitions(true);
    var defs = ResourceTypeManager.findResourceTypesByCriteria(rt);
    var rtVM = defs.get(0);

    // Collect the metric definition id of Heap Used
    var rtIds = [];
    var ite = rtVM.getMetricDefinitions().iterator();
    while (ite.hasNext()) {
        var a = ite.next();
        if (a.name == "{HeapMemoryUsage.used}") {
            rtIds[0] = a.id;
        }
    }

    // Enable the Heap Used schedule for the resource type
    MeasurementScheduleManager.enableSchedulesForResourceType(rtIds, false);

So it appears there is nothing wrong with the functions themselves in JON 3.3.9.
I should have checked this before spending so much time tracking things down: I can't reproduce this bug on my own machine. If I unpack JON 3.3.0.GA, apply update 09 on top of it and install, it works correctly. Maybe the cause is in the QE VM scripts, but it is clearly not in the default installation.
Back to this one. The main cause is the plugin update: when the plugin is updated, it restores the original settings (from the plugin). Our server plugin (alertdef-rhq) has a setting that, by default, prevents it from resetting properties if the alert definitions already exist. Fixing this requires either overriding any user-modified settings or making the storage node override the JMX plugin for these properties.
Fixed in master:

commit 0607f52e542de726cee5955b8c5a5e70b30a225e (HEAD -> master)
Author: Michael Burman <miburman>
Date:   Wed Nov 22 17:34:32 2017 +0200

    [BZ 1488179] Force collection intervals to be updated even if alert definitions are not replaced
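To make the root-cause sequence and the effect of the fix concrete, here is a small plain JavaScript model (runnable in Node, not an RHQ CLI script). It is only a sketch under stated assumptions: the names pluginUpdate, alertDefPluginStart, inventoryResource, forceSchedules and heapUsedEnabled are hypothetical, and forceSchedules merely stands in for the behaviour described by commit 0607f52e, not its actual code.

```javascript
// Hypothetical model of the alertdef-rhq behaviour discussed in this bug.
// None of these names are real RHQ APIs.

// A plugin update restores the plugin's original setting: Heap Used disabled.
function pluginUpdate(template) {
  template.heapUsedEnabled = false;
}

// Server plugin run at server start. forceSchedules models the fix
// ("force collection intervals to be updated even if alert definitions
// are not replaced").
function alertDefPluginStart(state, forceSchedules) {
  const alreadyDefined = state.alertDefinitionsExist;
  if (!alreadyDefined) {
    state.alertDefinitionsExist = true; // create the alert definitions
  }
  // Before the fix, metrics were only enabled together with alert creation.
  if (!alreadyDefined || forceSchedules) {
    state.template.heapUsedEnabled = true;
  }
}

// A newly (re-)inventoried resource takes its schedules from the template.
function inventoryResource(state) {
  return { heapUsedEnabled: state.template.heapUsedEnabled };
}

function simulate(forceSchedules) {
  const state = {
    alertDefinitionsExist: false,
    template: { heapUsedEnabled: false },
  };
  alertDefPluginStart(state, forceSchedules); // fresh install
  pluginUpdate(state.template);               // update re-deploys the plugin
  alertDefPluginStart(state, forceSchedules); // server restart after update
  return inventoryResource(state);            // storage node re-inventoried
}

console.log(simulate(false).heapUsedEnabled); // false: Heap Used disabled (the bug)
console.log(simulate(true).heapUsedEnabled);  // true: enabled with the fix
```

Without forced updates, the second server plugin run is a no-op because the alert definitions already exist, so the re-inventoried resource inherits the template that the plugin update reset.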
Moving to ON_QA as available for testing with the latest build:
JON 3.3.10 DR01 artifacts are available for testing here:
http://download.eng.bos.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/164/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip
*Note: jon-server-patch-3.3.0.GA.zip maps to the DR01 build of jon-server-3.3.0.GA-update-10.zip.
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=635136
Verified: all metrics are enabled after re-inventory.
Created attachment 1376766: re-inventoried platform
I still see it in JON 3.3.10, so moving back to ASSIGNED. Prachi, could you please retest? In the screenshot attached in comment 11, I can see NaN values for the Heap Used metric. Is that the correct screenshot?
The fix should be in modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-serverplugins/serverplugin-alertdef-rhq-*.jar, but this file is not part of update 10 (only the license files match):

    find jon-server-3.3.0.GA-update-10/ -name '*serverplugin-alertdef-rhq*'
    jon-server-3.3.0.GA-update-10/jon-server-updates/docs/jon-licenses/org.rhq,serverplugin-alertdef-rhq,4.12.0.JON330GA,GNU Lesser General Public License v2 (or 2.1) or later.txt
    jon-server-3.3.0.GA-update-10/jon-server-updates/docs/jon-licenses/org.rhq,serverplugin-alertdef-rhq,4.12.0.JON330GA,GNU General Public License v2.0.txt

Simeon, could you please check why the fix did not make it into DR01?
I also noticed that the serverplugin-alertdef-rhq-4.12.0.JON330GA.jar file is in the .old directory:

    jon-server-3.3.0.GA/.patched/3.3.0.GA-update-10_01-09-18_09-48-35/.old/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-serverplugins/serverplugin-alertdef-rhq-4.12.0.JON330GA.jar

Simeon, do you know why it is on the remove list?
Moving to ON_QA.
JON 3.3.10 CR01 artifacts are available for testing here:
http://download.eng.bos.redhat.com/brewroot/packages/org.jboss.on-jboss-on-parent/3.3.0.GA/166/maven/org/jboss/on/jon-server-patch/3.3.0.GA/jon-server-patch-3.3.0.GA.zip
*Note: jon-server-patch-3.3.0.GA.zip maps to the CR01 build of jon-server-3.3.0.GA-update-10.zip.
Verified in:
Version: 3.3.0.GA Update 10
Build Number: 800d329:2f3e0db
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0325