Bug 1139765 - metrics_index and Anti Entropy Sessions resources are down after upgrade to jon3.3.er2
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Upgrade
Version: JON 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ER04
Target Release: JON 3.3.0
Assignee: Stefan Negrea
QA Contact: Filip Brychta
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2014-09-09 15:13 UTC by Filip Brychta
Modified: 2014-12-11 14:02 UTC (History)
After the upgrade, four resources (metrics_index, one_hour_metrics, six_hour_metrics, twenty_four_hour_metrics) are marked as down on the dashboard. Some resources are intentionally no longer present after the upgrade and now show as missing. The resources' recorded data is still present and can be viewed through the missing resources page.
Clone Of:
Last Closed: 2014-12-11 14:02:36 UTC


External Trackers
Tracker           ID       Priority  Status  Summary  Last Updated
Red Hat Bugzilla  1084056  None      None    None     Never

Internal Trackers: 1084056

Description Filip Brychta 2014-09-09 15:13:55 UTC
Description of problem:
metrics_index and Anti Entropy Sessions resources are down after upgrade to jon3.3.er2

Version-Release number of selected component (if applicable):
Version: 3.3.0.ER02
Build Number: 4fbb183:7da54e2

How reproducible:
2/2

Steps to Reproduce:
1. install jon3.2.0.GA
2. upgrade to jon3.3.ER2


Actual results:
RHQ Storage node child resources metrics_index and Anti Entropy Sessions are down

Expected results:
All resources are up

Additional info:
Found only the following for metrics_index in agent.log:
2014-09-09 09:58:48,817 WARN  [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10211, uuid=fc8821b1-8943-4f26-8636-f9a20f09c6ef, type={RHQStorage}ColumnFamily, key=metrics_index, name=metrics_index, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [metrics_index].


There is nothing in the logs regarding the Anti Entropy Sessions resource.

Any hint?

Comment 2 Filip Brychta 2014-09-22 07:40:38 UTC
From another upgrade run:
2014-09-19 07:13:28,919 WARN  [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10015, uuid=83bc1c30-4fec-4f3c-a9ac-582b7b7d1cbd, type={RHQStorage}ColumnFamily, key=six_hour_metrics, name=six_hour_metrics, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [six_hour_metrics].
2014-09-19 07:13:28,927 WARN  [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10016, uuid=a37f6446-9502-4212-bd44-f66f4dc04ea3, type={RHQStorage}ColumnFamily, key=twenty_four_hour_metrics, name=twenty_four_hour_metrics, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [twenty_four_hour_metrics].
2014-09-19 07:13:28,929 WARN  [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10017, uuid=209dd23f-1b5f-42af-aafa-2f1c1bfc2776, type={RHQStorage}ColumnFamily, key=metrics_index, name=metrics_index, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [metrics_index].
2014-09-19 07:13:29,005 WARN  [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10019, uuid=9d0125b8-dd3d-45e9-ac3c-1a22cbb9af28, type={RHQStorage}ColumnFamily, key=one_hour_metrics, name=one_hour_metrics, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [one_hour_metrics].
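
For anyone triaging the same warnings, a small script can list which resources are affected. This is only a minimal sketch, assuming the agent.log lines follow the layout of the samples above; the function name `affected_resources` and the regular expression are illustrative and not part of RHQ or JON.

```python
import re

# Pattern inferred from the sample warnings in this report only; it is an
# assumption, not a documented RHQ log format.
WARNING_RE = re.compile(
    r"Failure to collect measurement data for Resource\[id=(?P<id>\d+),.*?"
    r"key=(?P<key>[^,\]]+),.*?EMS bean was null"
)


def affected_resources(lines):
    """Return sorted (resource key, resource id) pairs seen in the warnings."""
    found = set()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            found.add((match.group("key"), int(match.group("id"))))
    return sorted(found)
```

The function accepts any iterable of lines, so an open file handle for the agent log can be passed to it directly.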

Comment 4 Stefan Negrea 2014-09-27 01:13:22 UTC
Comment #3 is correct: Bug 1084056 solves the issue with the Anti Entropy Sessions state once the bean disappears following a storage node or C* restart. However, these warnings might be present in the logs until the resource gets into a MISSING state.

Comment 5 Simeon Pinder 2014-10-01 21:33:09 UTC
Moving to ON_QA as available for test with build:
https://brewweb.devel.redhat.com/buildinfo?buildID=388959

Comment 6 Filip Brychta 2014-10-08 14:46:45 UTC
The metrics_index, one_hour_metrics, six_hour_metrics, and twenty_four_hour_metrics resources are still down after the upgrade to ER04. I can provide logs if it helps.

Comment 7 Stefan Negrea 2014-10-08 16:42:21 UTC
Those column families are no longer part of JON 3.3 due to storage node schema changes. 

The result of your testing is expected. The only fix applied in the context of this BZ was the change in availability reporting for the "Anti Entropy Sessions" resource.

Comment 8 Filip Brychta 2014-10-09 08:27:40 UTC
Ok, thanks Stefan.
Is there going to be any note in upgrade manual about this?
User will see all those resources down on his default dashboard after upgrade and definitely will be wondering why.

Comment 10 Stefan Negrea 2014-10-20 14:12:48 UTC
No need to mention this in any documentation. The default behaviour did not change at all from the previous release. The only difference is that users now have more options to act on these resources. However, there is no reason to include this in any documentation because it follows the general pattern of MISSING resources, and the plugin in question is not mentioned in the documentation as a user tool.

Comment 11 Heiko W. Rupp 2014-10-20 16:01:58 UTC
(In reply to Filip Brychta from comment #8)

> Is there going to be any note in upgrade manual about this?
> User will see all those resources down on his default dashboard after
> upgrade and definitely will be wondering why.

I think we should leave it as is. This late in the game, I don't think we should change the plugin descriptor.

It may make sense to point out in the release notes that some resources may have gone missing (on purpose) during the update and now show as missing, but that we did not automatically remove them because users may still want to have a look at recorded data and that they can / should proceed as described in the "missing resources" section.

Comment 12 Filip Brychta 2014-10-20 16:09:05 UTC
Here is my point:
1 - user has no resources marked as down visible on his dashboard before the upgrade
2 - after the upgrade, user will see 4 resources (metrics_index, one_hour_metrics, six_hour_metrics, twenty_four_hour_metrics) marked as down on his dashboard
3 - many users (me included) will think that something went wrong during the upgrade and they will contact GSS or file a bz etc.

So we should at least let the users know that this is expected to avoid confusion.

Comment 13 Filip Brychta 2014-11-10 14:53:19 UTC
This is verified, but it should be included in the release notes.

Comment 14 Jared MORGAN 2014-11-13 00:07:29 UTC
I need a sanity check on the RN content for this issue, Filip. Can you ack for me?

Comment 15 Filip Brychta 2014-11-13 12:13:25 UTC
Jared, I think it would be useful to add a note that it's possible to safely remove those resources.

