Bug 1139765
Summary: | metrics_index and Anti Entropy Sessions resources are down after upgrade to jon3.3.er2 | ||
---|---|---|---|
Product: | [JBoss] JBoss Operations Network | Reporter: | Filip Brychta <fbrychta> |
Component: | Upgrade | Assignee: | Stefan Negrea <snegrea> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Filip Brychta <fbrychta> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | JON 3.3.0 | CC: | fbrychta, hrupp, jmorgan, snegrea, tsegismo |
Target Milestone: | ER04 | ||
Target Release: | JON 3.3.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
Four resources (metrics_index, one_hour_metrics, six_hour_metrics, twenty_four_hour_metrics) are marked as down on the dashboard after upgrade. Some resources may no longer be present (on purpose) after the update and now show as missing. The resources' recorded data is still present and can be viewed through the missing resources page.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2014-12-11 14:02:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Description
Filip Brychta
2014-09-09 15:13:55 UTC
From another upgrade run:

2014-09-19 07:13:28,919 WARN [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10015, uuid=83bc1c30-4fec-4f3c-a9ac-582b7b7d1cbd, type={RHQStorage}ColumnFamily, key=six_hour_metrics, name=six_hour_metrics, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [six_hour_metrics].
2014-09-19 07:13:28,927 WARN [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10016, uuid=a37f6446-9502-4212-bd44-f66f4dc04ea3, type={RHQStorage}ColumnFamily, key=twenty_four_hour_metrics, name=twenty_four_hour_metrics, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [twenty_four_hour_metrics].
2014-09-19 07:13:28,929 WARN [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10017, uuid=209dd23f-1b5f-42af-aafa-2f1c1bfc2776, type={RHQStorage}ColumnFamily, key=metrics_index, name=metrics_index, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [metrics_index].
2014-09-19 07:13:29,005 WARN [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementCollectorRunner)- Failure to collect measurement data for Resource[id=10019, uuid=9d0125b8-dd3d-45e9-ac3c-1a22cbb9af28, type={RHQStorage}ColumnFamily, key=one_hour_metrics, name=one_hour_metrics, parent=rhq] - cause: java.lang.IllegalStateException:EMS bean was null for Resource with type [ResourceType[id=0, name=ColumnFamily, plugin=RHQStorage, category=Service]] and key [one_hour_metrics].

Comment #3 is correct: Bug 1084056 solves the issues with the Anti Entropy Session state once the bean disappears following a storage node or C* restart. However, these warnings might be present in logs until the resource gets into a MISSING state.

Moving to ON_QA as available for test with build: https://brewweb.devel.redhat.com/buildinfo?buildID=388959

The metrics_index, one_hour_metrics, six_hour_metrics, and twenty_four_hour_metrics resources are still down after upgrade to ER04. I can provide logs if it helps.

Those column families are no longer part of JON 3.3 due to storage node schema changes. The result of your testing is normal. The only fix applied in the context of this BZ was the change in availability reporting for the "Anti Entropy Sessions" resource.

Ok, thanks Stefan. Is there going to be any note in the upgrade manual about this? Users will see all those resources down on their default dashboard after upgrade and will definitely be wondering why.

No need to mention this in any documentation. The default behaviour did not change at all from the previous release. The only difference is that users now have more options to act on these resources. However, there is no reason to include this in any documentation because it follows the general pattern of MISSING resources and the plugin in question is not mentioned in documentation as a user tool.

(In reply to Filip Brychta from comment #8)
> Is there going to be any note in upgrade manual about this?
> User will see all those resources down on his default dashboard after
> upgrade and definitely will be wondering why.

I think we should leave it as is. This late in the game, I don't think we should change the plugin descriptor. It may make sense to point out in the release notes that some resources may have gone missing (on purpose) during the update and now show as missing, but that we did not automatically remove them because users may still want to have a look at recorded data, and that they can / should proceed as described in the "missing resources" section.

Here is my point:
1 - The user has no resources marked as down on the dashboard before the upgrade.
2 - After the upgrade, the user will see 4 resources (metrics_index, one_hour_metrics, six_hour_metrics, twenty_four_hour_metrics) marked as down on the dashboard.
3 - Many users (me included) will think that something went wrong during the upgrade and will contact GSS, file a BZ, etc.
So we should at least let the users know that this is expected, to avoid confusion.

This is verified, but it should be included in the release notes.

I need a sanity check on the RN content for this issue, Filip. Can you ack for me?

Jared, I think it would be useful to add a note that it's possible to safely remove those resources.