It was discovered that the "Anti Entropy Sessions" Internal Server Metric statistic was being monitored when it should logically have been excluded. The Anti Entropy Sessions resource was discovered only after a Repair operation was initiated, and after a JON server restart it was marked as DOWN until another Repair operation was run. The fix adds a missing policy to the "Anti Entropy Sessions" type (a single bean), which allows users to configure the behaviour of a missing resource after a Cassandra or Storage Node server restart. With this policy, users can change the default (DOWN) conversion of MISSING to something more suited to their use case. This bean is particularly useful for long-running repair jobs, because it provides important telemetry on the progress of the repair job. Even if the bean disappears after a C* restart, it will be visible again to JON as soon as the repair job is invoked. Users can now collect Anti Entropy metrics correctly between server restarts.
Description of problem:
When JBoss ON is first installed and started, the *Anti Entropy Sessions* resource is not discovered and is not listed under the Storage Node's Internal Server Metrics subfolder. However, executing the *Repair* operation creates *Anti Entropy Sessions*, which is then discovered by the JBoss ON Agent's next full discovery. The status of this resource is UP.
If the storage node is stopped and started again, the previously discovered *Anti Entropy Sessions* resource no longer exists, so it is marked as DOWN in the JBoss ON UI until the next "Repair" operation...
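The lifecycle described above can be illustrated with a small toy model. This is only a sketch for reasoning about the symptom, not RHQ or JBoss ON code: the `StorageNode` and `Agent` classes and all of their methods are hypothetical, and the assumption that discovery only adds resources is a simplification made for this model.

```python
# Toy model of the behaviour described above; not RHQ/JBoss ON code.
# All class and method names here are hypothetical.


class StorageNode:
    """Stands in for the storage node JVM and its registered beans."""

    def __init__(self):
        self.mbeans = set()

    def repair(self):
        # The Anti Entropy Sessions bean only appears once a repair has run.
        self.mbeans.add("Anti Entropy Sessions")

    def restart(self):
        # In this model a restart begins with an empty bean registry.
        self.mbeans.clear()


class Agent:
    """Stands in for the agent's inventory and availability check."""

    def __init__(self, node):
        self.node = node
        self.inventory = set()

    def discover(self):
        # Simplification: discovery adds resources but never removes them.
        self.inventory |= self.node.mbeans

    def availability(self):
        # A resource in inventory whose bean is absent is reported DOWN.
        return {
            name: "UP" if name in self.node.mbeans else "DOWN"
            for name in self.inventory
        }


node = StorageNode()
agent = Agent(node)

agent.discover()
fresh = agent.availability()          # nothing discovered on a fresh install

node.repair()
agent.discover()
after_repair = agent.availability()   # bean discovered and UP

node.restart()
after_restart = agent.availability()  # bean gone, resource reported DOWN

print(fresh, after_repair, after_restart)
```

Running the sketch prints `{} {'Anti Entropy Sessions': 'UP'} {'Anti Entropy Sessions': 'DOWN'}`, which mirrors the three states seen in the UI: not listed, UP after Repair, and DOWN after the restart.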
Version-Release number of selected component (if applicable):
JBoss ON 3.2
Steps to Reproduce:
1. Install JBoss ON 3.2;
2. In JBoss ON UI navigate to Inventory -> Servers -> RHQ Storage Node -> Internal Server Metrics and confirm that *Anti Entropy Sessions* are not listed;
3. For this storage node, execute "Repair" operation;
4. Execute "discovery -f" on the JBoss ON Agent command line;
5. Execute "avail --force" on the JBoss ON Agent command line;
6. Navigate as before to Internal Server Metrics for this storage node and confirm that *Anti Entropy Sessions* are discovered and UP;
7. Stop storage node: ./rhqctl stop --storage;
8. Start storage node: ./rhqctl start --storage;
9. Navigate to the Internal Server Metrics for this storage node and confirm that *Anti Entropy Sessions* resource is down.
Actual results:
The Anti Entropy Sessions resource is discovered and monitored.
Expected results:
The Anti Entropy Sessions resource should not be monitored, as it will go down after every storage node restart.
*** Bug 1084055 has been marked as a duplicate of this bug. ***
Release/jon3.2.x commit 93a3cfa3650dfe52b9f022973244404cff96d325
Author: Stefan Negrea <firstname.lastname@example.org>
Date: Mon Sep 22 12:21:47 2014 -0500
(cherry picked from commit 2dbe6c57daaae87b5ce6890ba0da22c5976c6a7d)
Signed-off-by: Jay Shaughnessy <email@example.com>
Added missing policy to the "Anti Entropy Sessions" type (a single bean) to allow users to configure the behaviour of a missing resource after a Cassandra or Storage Node server restart. With the missing policy, users now have the option to change the default (DOWN) conversion of MISSING to something more suited to their use case.
There is a lot of value in having this bean in inventory. For long-running repair jobs, it provides important telemetry on the progress of the repair job. Even if the bean disappears after a C* restart, it will be visible again to JON as soon as the repair job is invoked.
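To make the idea of a missing policy concrete, here is a minimal sketch of how a MISSING availability result might be converted according to a per-type policy. It is not the RHQ implementation: only the default conversion of MISSING to DOWN comes from the commit message above, while the `IGNORE` and `UNINVENTORY` policy names, the `resolve` function, and its return shape are assumptions made for illustration.

```python
from enum import Enum


class Availability(Enum):
    UP = "UP"
    DOWN = "DOWN"
    MISSING = "MISSING"


class MissingPolicy(Enum):
    # DOWN is the default conversion named in the commit message.
    # IGNORE and UNINVENTORY are hypothetical alternatives for this sketch.
    DOWN = "DOWN"
    IGNORE = "IGNORE"
    UNINVENTORY = "UNINVENTORY"


def resolve(raw, policy=MissingPolicy.DOWN):
    """Return (reported_availability, keep_in_inventory) for one check.

    reported_availability is None when nothing should be reported.
    """
    if raw is not Availability.MISSING:
        # UP and DOWN pass through unchanged.
        return raw, True
    if policy is MissingPolicy.DOWN:
        # Default: a missing resource is reported as DOWN.
        return Availability.DOWN, True
    if policy is MissingPolicy.IGNORE:
        # Keep the resource, but do not report it as DOWN.
        return None, True
    # UNINVENTORY: drop the resource until it is discovered again.
    return None, False


print(resolve(Availability.MISSING))
print(resolve(Availability.MISSING, MissingPolicy.IGNORE))
```

Under the default policy the bean is still reported DOWN after a restart, as before the fix. The point of the change is that the conversion becomes a per-type setting the user can alter rather than fixed behaviour.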
Moving to ON_QA as available for test with build: