Bug 852554

Summary: AlertCondition.name is null in database after JON 2.4 to 3.1 upgrade
Product: [JBoss] JBoss Operations Network Reporter: James Livingston <jlivings>
Component: Database, Monitoring - AlertsAssignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: high Docs Contact:
Priority: unspecified    
Version: JON 3.1.0CC: fbrychta, hrupp, joallen, jsanda, loleary, myarboro
Target Milestone: CR02   
Target Release: JON 3.1.1   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 852561 (view as bug list) Environment:
Last Closed: 2013-09-11 10:59:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 852561    
Bug Blocks:    

Description James Livingston 2012-08-28 22:34:37 UTC
Afetr an upgrading from JON 2.4 to JON 3.1, there were AlertCondition entries in the database where the 'name' column was null when it should have been "AVAIL_GOES_DOWN" instead.

Running "select c from org.rhq.core.domain.alert.AlertCondition c where c.name is null" from http://server:7080/admin/test/hibernate.jsp returned several alerts when it should not have returned any.


When viewing the alerts in the UI, this then causes an error reported as:
 Unknown.java_lang_IllegalArgumentException_IllegalArgumentException__V(Unknown Source)
   at Unknown.java_lang_Enum_valueOf__Lcom_google_gwt_core_client_JavaScriptObject_2Ljava_lang_String_2Ljava_lang_Enum_2(Unknown Source)
   at Unknown.org_rhq_core_domain_alert_AlertConditionOperator_valueOf__Ljava_lang_String_2Lorg_rhq_core_domain_alert_AlertConditionOperator_2(Unknown Source)
   at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertFormatUtility_formatAlertConditionForDisplay__Lorg_rhq_core_domain_alert_AlertCondition_2Ljava_lang_String_2(Unknown Source)
   at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertDataSource_convert__Lorg_rhq_core_domain_alert_Alert_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2(Unknown Source)
   at Unknown.org_rhq_enterprise_gui_coregui_client_alert_AlertDataSource_copyValues__Ljava_lang_Object_2Lcom_smartgwt_client_widgets_grid_ListGridRecord_2(Unknown Source)
   at Unknown.org_rhq_enterprise_gui_coregui_client_util_RPC



This can be fixed by executing "UPDATE org.rhq.core.domain.alert.AlertCondition c SET c.name='down' WHERE c.name is null" to repair the data if needed.

Comment 1 James Livingston 2012-08-28 22:49:56 UTC
AlertCondition[0] does not have nullable=false on the name property, however AlertFormatUtility[1] calls equalsIgnoreCase/toUpperCase (depending on version) on it, which fails when null.


The javadoc for Alert.getName() says:
    /**
     * The name of the condition whose semantics are different based on this condition's category:
     * 
     * AVAILABILITY: n/a (null)
     * THRESHOLD: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
     * BASELINE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
     * CHANGE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
     *         OR (for calltime alert conditions only) this will be the optional regular expression condition
     *         (which may be null)
     * TRAIT: the name of the trait (TODO: today its the display name, very bad for i18n purposes)
     * CONTROL: the name of the operation (not its display name)
     * EVENT: the level of event to compare with (DEBUG, INFO, WARN, ERROR, FATAL)
     * RESOURCE_CONFIG: n/a (null)
     * DRIFT: the name of the drift definition that triggered the drift detection. This is actually a
     *        regex that allows the user to match more than one drift definition if they so choose.
     *        (this value may be null, in which case it doesn't matter which drift definition were the ones
     *         in which the drift was detected) 
     * RANGE: the name of the metric (TODO: today its the display name, very bad for i18n purposes)
     * 
     * @return additional information about the condition
     */


I believe that JavaDoc is incorrect for AVAILABILITY, since the value should be AVAIL_GOES_DOWN, AVAIL_GOES_DISABLED, AVAIL_GOES_UNKNOWN, AVAIL_GOES_NOT_UP, or AVAIL_GOES_UP.


Due to RESOURCE_CONFIG not having any value, the field being nullable seems correct.

The major problem is that during the upgrade, AlertCondition.name did not get set when it should have. It mayalso be useful is AlertFormatUtility handled nulls better, but that would only hide the problem since other code may fail with null values too.



[0] modules/core/domain/src/main/java/org/rhq/core/domain/alert/AlertCondition.java
[1] modules/enterprise/gui/coregui/src/main/java/org/rhq/enterprise/gui/coregui/client/alert/AlertFormatUtility.java

Comment 2 James Livingston 2012-08-29 00:21:26 UTC
To reproduce:

1) Install JON 2.4.1
2) Add an agent (embedded is fine)
2) Create a new availability alert on any resource
4) Go to http://localhost:7080/admin/test/browser.jsp?entityClass=org.rhq.core.domain.alert.AlertCondition&key=10001 (with correct key), and see name=null, option=DOWN.
5) Shut downdown
6) Upgrade to 3.1
7) Go back to http://localhost:7080/admin/test/browser.jsp?entityClass=org.rhq.core.domain.alert.AlertCondition&key=10001 (with correct key), and see that it is still name=null, option=DOWN.
8) Go back to the Resource, and then the Alert.
9) See a failure when you view the conditions.

Comment 3 Charles Crouch 2012-08-29 00:22:03 UTC
Assigning to Jay to investigate for jon311. Lets understand when this issue occurs, how serious it is, what workarounds there are, and what's the scope of any potential fix. Once we have those we can revisit to see if any change is needed for 311

Comment 4 James Livingston 2012-08-29 00:41:30 UTC
Commit 73e27be61bea1cb76a5ffa9b76c9410be3783dcd is probably relevant to this. Prior to that commit, the 'option' value when  it was of type AVAILABILITY were "DOWN" or "UP" and the 'name' value was unused.

That commit changed it to using a AlertConditionOperator based on the 'name' field instead. When creating a new alert after that change, the 'name' field is set to the AVAILABILITY_ITEMNAME value (despite the JavaDoc say it's unused), which would be the AlertConditionOperator value.



What probably needs to occur is that during an upgrade, AlertConditions for availabilty have their 'name' property set based on the 'option' property (option=DOWN -> name=AVAIL_GOES_DOWN, option=UP -> name=AVAIL_GOES_UP).

Comment 5 Jay Shaughnessy 2012-08-29 13:03:22 UTC
I haven't tried to reproduce yet but the analysis seems plausible.  I'll look into this.

Comment 6 Jay Shaughnessy 2012-08-29 19:25:02 UTC
From what I can see James is right on all counts.

The problem is pretty bad in that after an upgrade to 3.1.0 a user's first login can likely result in the GUI throwing an uncaught exception.  This will happen if the Recent Alerts portlet wants to show a previously generated avail change alert.  Even if that does not happen, navigating to any alert history view containing avail change alerts may generate the problem.

I have it fixed locally in Master, I also suggest this be fixed in 311.

Comment 7 Jay Shaughnessy 2012-08-29 20:06:58 UTC

Release/jon3.1.x commit 5fa64f5e918cda73557bb1dfe7c6c23a648cc9be

From an RHQ perspective this should have affected upgrades to RHQ 4.4.  Add
a db upgrade step to convert relevant alert condition rows to the new format.

Conflicts:

    modules/core/dbutils/pom.xml
    modules/core/dbutils/src/main/scripts/dbupgrade/db-upgrade.xml

Cherry-pick of master 35c152e87e280d8a5cd93353892da39067883819


To test:
See the steps in https://bugzilla.redhat.com/show_bug.cgi?id=852554#c2.
Test Notes:
- you could also use JON3.0.x as the base version.
- i would suggest generating both 'goes down' and 'comes up' alerts for more complete test coverage.
- an easy way to generate these alerts is on a webapp, use the stop/start operations. Use the agent prompt command '> avail -c' to speed up the avail change and alert generation.
- you can upgrade to 3.1 or any version of 3.1.1 prior to the fix to see the issue.  The recent alerts portlet should be enough to generate the uncaught exception.
- upgrade to a fixed version to see it work.

Comment 8 Jay Shaughnessy 2012-08-30 15:25:12 UTC

Release/jon3.1.x commit 4f97c15f5ecb8d7339ea98a849b8405d4fb32552
fix merge issue

Comment 9 John Sanda 2012-09-06 05:27:17 UTC
CR2 build is available at https://brewweb.devel.redhat.com/buildinfo?buildID=232185. Moving to ON_QA.

Comment 10 Filip Brychta 2012-09-10 14:41:23 UTC
Verified on 3.1.1.CR2 for JON 2.4.1.GA->3.1.1.CR2 and JON 3.0.1.GA->3.1.1.CR2. No errors and both alarms ('goes down' and 'comes up') are being fired correctly.