Red Hat Bugzilla – Bug 852554
AlertCondition.name is null in database after JON 2.4 to 3.1 upgrade
Last modified: 2016-02-21 19:56:09 EST
Afetr an upgrading from JON 2.4 to JON 3.1, there were AlertCondition entries in the database where the 'name' column was null when it should have been "AVAIL_GOES_DOWN" instead.
Running "select c from org.rhq.core.domain.alert.AlertCondition c where c.name is null" from http://server:7080/admin/test/hibernate.jsp returned several alerts when it should not have returned any.
When viewing the alerts in the UI, this then causes an error reported as:
at Unknown.org_rhq_core_domain_alert_AlertConditionOperator_valueOf__Ljava_lang_String_2Lorg_rhq_core_domain_alert_AlertConditionOperator_2(Unknown Source)
at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertFormatUtility_formatAlertConditionForDisplay__Lorg_rhq_core_domain_alert_AlertCondition_2Ljava_lang_String_2(Unknown Source)
at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertDataSource_convert__Lorg_rhq_core_domain_alert_Alert_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2(Unknown Source)
at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertDataSource_copyValues__Ljava_lang_Object_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2(Unknown Source)
This can be fixed by executing "UPDATE org.rhq.core.domain.alert.AlertCondition c SET c.name='down' WHERE c.name is null" to repair the data if needed.
AlertCondition does not have nullable=false on the name property, however AlertFormatUtility calls equalsIgnoreCase/toUpperCase (depending on version) on it, which fails when null.
The javadoc for Alert.getName() says:
* The name of the condition whose semantics are different based on this condition's category:
* AVAILABILITY: n/a (null)
* THRESHOLD: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
* BASELINE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
* CHANGE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
* OR (for calltime alert conditions only) this will be the optional regular expression condition
* (which may be null)
* TRAIT: the name of the trait (TODO: today its the display name, very bad for i18n purposes)
* CONTROL: the name of the operation (not its display name)
* EVENT: the level of event to compare with (DEBUG, INFO, WARN, ERROR, FATAL)
* RESOURCE_CONFIG: n/a (null)
* DRIFT: the name of the drift definition that triggered the drift detection. This is actually a
* regex that allows the user to match more than one drift definition if they so choose.
* (this value may be null, in which case it doesn't matter which drift definition were the ones
* in which the drift was detected)
* RANGE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
* @return additional information about the condition
I believe that JavaDoc is incorrect for AVAILABILITY, since the value should be AVAIL_GOES_DOWN, AVAIL_GOES_DISABLED, AVAIL_GOES_UNKNOWN, AVAIL_GOES_NOT_UP, or AVAIL_GOES_UP.
Due to RESOURCE_CONFIG not having any value, the field being nullable seems correct.
The major problem is that during the upgrade, AlertCondition.name did not get set when it should have. It mayalso be useful is AlertFormatUtility handled nulls better, but that would only hide the problem since other code may fail with null values too.
1) Install JON 2.4.1
2) Add an agent (embedded is fine)
2) Create a new availability alert on any resource
4) Go to http://localhost:7080/admin/test/browser.jsp?entityClass=org.rhq.core.domain.alert.AlertCondition&key=10001 (with correct key), and see name=null, option=DOWN.
5) Shut downdown
6) Upgrade to 3.1
7) Go back to http://localhost:7080/admin/test/browser.jsp?entityClass=org.rhq.core.domain.alert.AlertCondition&key=10001 (with correct key), and see that it is still name=null, option=DOWN.
8) Go back to the Resource, and then the Alert.
9) See a failure when you view the conditions.
Assigning to Jay to investigate for jon311. Lets understand when this issue occurs, how serious it is, what workarounds there are, and what's the scope of any potential fix. Once we have those we can revisit to see if any change is needed for 311
Commit 73e27be61bea1cb76a5ffa9b76c9410be3783dcd is probably relevant to this. Prior to that commit, the 'option' value when it was of type AVAILABILITY were "DOWN" or "UP" and the 'name' value was unused.
That commit changed it to using a AlertConditionOperator based on the 'name' field instead. When creating a new alert after that change, the 'name' field is set to the AVAILABILITY_ITEMNAME value (despite the JavaDoc say it's unused), which would be the AlertConditionOperator value.
What probably needs to occur is that during an upgrade, AlertConditions for availabilty have their 'name' property set based on the 'option' property (option=DOWN -> name=AVAIL_GOES_DOWN, option=UP -> name=AVAIL_GOES_UP).
I haven't tried to reproduce yet but the analysis seems plausible. I'll look into this.
From what I can see James is right on all counts.
The problem is pretty bad in that after an upgrade to 3.1.0 a user's first login can likely result in the GUI throwing an uncaught exception. This will happen if the Recent Alerts portlet wants to show a previously generated avail change alert. Even if that does not happen, navigating to any alert history view containing avail change alerts may generate the problem.
I have it fixed locally in Master, I also suggest this be fixed in 311.
Release/jon3.1.x commit 5fa64f5e918cda73557bb1dfe7c6c23a648cc9be
From an RHQ perspective this should have affected upgrades to RHQ 4.4. Add
a db upgrade step to convert relevant alert condition rows to the new format.
Cherry-pick of master 35c152e87e280d8a5cd93353892da39067883819
See the steps in https://bugzilla.redhat.com/show_bug.cgi?id=852554#c2.
- you could also use JON3.0.x as the base version.
- i would suggest generating both 'goes down' and 'comes up' alerts for more complete test coverage.
- an easy way to generate these alerts is on a webapp, use the stop/start operations. Use the agent prompt command '> avail -c' to speed up the avail change and alert generation.
- you can upgrade to 3.1 or any version of 3.1.1 prior to the fix to see the issue. The recent alerts portlet should be enough to generate the uncaught exception.
- upgrade to a fixed version to see it work.
Release/jon3.1.x commit 4f97c15f5ecb8d7339ea98a849b8405d4fb32552
fix merge issue
CR2 build is available at https://brewweb.devel.redhat.com/buildinfo?buildID=232185. Moving to ON_QA.
Verified on 3.1.1.CR2 for JON 2.4.1.GA->3.1.1.CR2 and JON 3.0.1.GA->3.1.1.CR2. No errors and both alarms ('goes down' and 'comes up') are being fired correctly.