Bug 852561

Summary: AlertCondition.name is null in database after JON 2.4 to 3.1 upgrade
Product: [Other] RHQ Project Reporter: Larry O'Leary <loleary>
Component: Alerts, DatabaseAssignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.4CC: hrupp, jlivings, jshaughn, loleary
Target Milestone: ---   
Target Release: RHQ 4.5.0   
Hardware: All   
OS: All   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 852554 Environment:
Last Closed: 2013-09-01 06:04:19 EDT Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 852554    

Description Larry O'Leary 2012-08-28 19:40:38 EDT
+++ This bug was initially created as a clone of JON Product Bug #852554 +++

Afetr an upgrading from JON 2.4 to JON 3.1, there were AlertCondition entries in the database where the 'name' column was null when it should have been "AVAIL_GOES_DOWN" instead.

Running "select c from org.rhq.core.domain.alert.AlertCondition c where c.name is null" from http://server:7080/admin/test/hibernate.jsp returned several alerts when it should not have returned any.

When viewing the alerts in the UI, this then causes an error reported as:
 Unknown.java_lang_IllegalArgumentException_IllegalArgumentException__V(Unknown Source)
   at Unknown.java_lang_Enum_valueOf__Lcom_google_gwt_core_client_JavaScriptObject_2Ljava_lang_String_2Ljava_lang_Enum_2(Unknown Source)
   at Unknown.org_rhq_core_domain_alert_AlertConditionOperator_valueOf__Ljava_lang_String_2Lorg_rhq_core_domain_alert_AlertConditionOperator_2(Unknown Source)
   at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertFormatUtility_formatAlertConditionForDisplay__Lorg_rhq_core_domain_alert_AlertCondition_2Ljava_lang_String_2(Unknown Source)
   at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertDataSource_convert__Lorg_rhq_core_domain_alert_Alert_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2(Unknown Source)
   at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertDataSource_copyValues__Ljava_lang_Object_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2(Unknown Source)
   at Unknown.org_rhq_enterprise_gui_coregui_client_util_RPC

This can be fixed by executing "UPDATE org.rhq.core.domain.alert.AlertCondition c SET c.name='down' WHERE c.name is null" to repair the data if needed.

--- Additional comment from jlivings@redhat.com on 2012-08-28 18:49:56 EDT ---

AlertCondition[0] does not have nullable=false on the name property, however AlertFormatUtility[1] calls equalsIgnoreCase/toUpperCase (depending on version) on it, which fails when null.

The javadoc for Alert.getName() says:
     * The name of the condition whose semantics are different based on this condition's category:
     * AVAILABILITY: n/a (null)
     * THRESHOLD: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
     * BASELINE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
     * CHANGE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
     *         OR (for calltime alert conditions only) this will be the optional regular expression condition
     *         (which may be null)
     * TRAIT: the name of the trait (TODO: today its the display name, very bad for i18n purposes)
     * CONTROL: the name of the operation (not its display name)
     * EVENT: the level of event to compare with (DEBUG, INFO, WARN, ERROR, FATAL)
     * RESOURCE_CONFIG: n/a (null)
     * DRIFT: the name of the drift definition that triggered the drift detection. This is actually a
     *        regex that allows the user to match more than one drift definition if they so choose.
     *        (this value may be null, in which case it doesn't matter which drift definition were the ones
     *         in which the drift was detected) 
     * RANGE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
     * @return additional information about the condition

I believe that JavaDoc is incorrect for AVAILABILITY, since the value should be AVAIL_GOES_DOWN, AVAIL_GOES_DISABLED, AVAIL_GOES_UNKNOWN, AVAIL_GOES_NOT_UP, or AVAIL_GOES_UP.

Due to RESOURCE_CONFIG not having any value, the field being nullable seems correct.

The major problem is that during the upgrade, AlertCondition.name did not get set when it should have. It mayalso be useful is AlertFormatUtility handled nulls better, but that would only hide the problem since other code may fail with null values too.

[0] modules/core/domain/src/main/java/org/rhq/core/domain/alert/AlertCondition.java
[1] modules/enterprise/gui/coregui/src/main/java/org/rhq/enterprise/gui/coregui/client/alert/AlertFormatUtility.java
Comment 1 Jay Shaughnessy 2012-08-29 15:55:01 EDT
Master commit 35c152e87e280d8a5cd93353892da39067883819

From an RHQ perspective this should have affected upgrades to RHQ 4.4.  Add
a db upgrade step to convert relevant alert condition rows to the new format.

Test Notes:
This is a db upgrade fix, so ideally this would be tested on postgres and oracle.  The dev work was done on postgres. I did validate the upgrade SQL on Oracle.

To test:
See the steps in https://bugzilla.redhat.com/show_bug.cgi?id=852554#c2.
Test Notes:
- you could also use JON3.0.x as the base version.
- i would suggest generating both 'goes down' and 'comes up' alerts for more complete test coverage.
- an easy way to generate these alerts is on a webapp, use the stop/start operations. Use the agent prompt command '> avail -c' to speed up the avail change and alert generation.
- you can upgrade to 3.1 or any version of 3.1.1 prior to the fix to see the issue.  The recent alerts portlet should be enough to generate the uncaught exception.
- upgrade to a fixed version to see it work.
- for RHQ, start with 4.3, upgrade to 4.4 for the problem. Upgrade to 4.5 for the fix.
Comment 2 Jay Shaughnessy 2012-08-30 11:26:06 EDT
master commit 26941dd6fb00043842ad5d4e6786b1f5ed391382
fix merge issue
Comment 3 Heiko W. Rupp 2013-09-01 06:04:19 EDT
Bulk closing of items that are on_qa and in old RHQ releases, which are out for a long time and where the issue has not been re-opened since.