Bug 607672

Summary: alert notification log upgrade bug
Product: [Other] RHQ Project
Component: Installer
Reporter: Joseph Marques <jmarques>
Assignee: Joseph Marques <jmarques>
QA Contact: Rajan Timaniya <rtimaniy>
CC: rtimaniy
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: urgent
Version: 3.0.0
Hardware: All
OS: Linux
Fixed In Version: 2.4
Doc Type: Bug Fix
Last Closed: 2010-08-12 16:59:30 UTC
Bug Blocks: 593121
Attachments: server log (flags: none)

Description Joseph Marques 2010-06-24 14:39:53 UTC
Description of problem:

When the upgrade (a.k.a. "keep existing data") option is chosen in the installer, the pre-existing alert history is destroyed.  Audit trails should be kept consistent across upgrades, which is what this fix will address.

Steps to Reproduce:
1. Install any previous version
2. Create an alert definition and add alert notifications of all types: subjects, roles, emails, operation invocations (note: SNMP-based notifications don't become part of the audit trail, so they don't need to be tested here)
3. Trigger that alert definition to create at least one alert, and verify that an audit trail exists for subjects, roles, emails, and operations on the created alerts
4. Upgrade to the new version using the installer's upgrade ("keep existing data") option

Actual results:

All audit trail information for alert notifications is lost

Expected results:

Audit trail information should be transformed into the new data structures for the pluggable alert senders.  The data will look different, but it should contain the same information.
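
To illustrate the expected result, here is a minimal sketch of such a transformation. It is not the actual RHQ upgrade code: every type name (`LegacyKind`, `LegacyLogEntry`, `SenderLogEntry`, `NotificationLogUpgrade`) and the sender labels are assumptions invented for this example, and the real schema is not shown in this report. Each typed legacy row maps to a generic sender-plus-message row and no field is dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** The four notification types named in the steps to reproduce. */
enum LegacyKind { SUBJECT, ROLE, EMAIL, OPERATION }

/** Hypothetical pre-upgrade audit record: one typed row per notification. */
class LegacyLogEntry {
    final int alertId;
    final LegacyKind kind;
    final String target; // user name, role name, address, or operation name

    LegacyLogEntry(int alertId, LegacyKind kind, String target) {
        this.alertId = alertId;
        this.kind = kind;
        this.target = target;
    }
}

/** Hypothetical post-upgrade record: a sender label plus a free-form message. */
class SenderLogEntry {
    final int alertId;
    final String sender;
    final String message;

    SenderLogEntry(int alertId, String sender, String message) {
        this.alertId = alertId;
        this.sender = sender;
        this.message = message;
    }
}

class NotificationLogUpgrade {
    /** Maps one legacy row to the generic shape without dropping any field. */
    static SenderLogEntry convert(LegacyLogEntry e) {
        String sender; // labels below are made up for illustration
        switch (e.kind) {
            case SUBJECT:   sender = "subjects";   break;
            case ROLE:      sender = "roles";      break;
            case EMAIL:     sender = "emails";     break;
            case OPERATION: sender = "operations"; break;
            default: throw new IllegalArgumentException("Unknown kind: " + e.kind);
        }
        return new SenderLogEntry(e.alertId, sender, e.target);
    }

    /** Converts every row, preserving order and count. */
    static List<SenderLogEntry> convertAll(List<LegacyLogEntry> legacy) {
        List<SenderLogEntry> out = new ArrayList<SenderLogEntry>(legacy.size());
        for (LegacyLogEntry e : legacy) {
            out.add(convert(e));
        }
        return out;
    }
}
```

A post-upgrade check could then compare row counts and targets before and after conversion to confirm that nothing was lost.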

Comment 1 Joseph Marques 2010-06-24 14:53:21 UTC
commit d9c7b5f73abdf141dce71b4c938e6b4718954e18
Author: Joseph Marques <joseph>
Date:   Thu Jun 24 10:39:21 2010 -0400

    BZ-607672: support upgrading alert notification history

Comment 2 Corey Welton 2010-06-24 15:27:01 UTC
Assigning to rajan as part of standard upgrade testing.  Will ping jweiss to see what, if anything, can be done to augment automation to ensure the audit trail remains after upgrade.

Comment 3 Rajan Timaniya 2010-06-29 11:10:16 UTC
Tested on JON 2.4 GA_QA (tag-jon-release build #43)
http://hudson-qe.rhq.rdu.redhat.com:8080/view/JON/job/tag-jon-release/43/

QE environment:
JON installed with HA
   1)JON server and agent installed on RHEL 5
     Java: Sun JDK1.6
   2)JON server and agent installed on WIN-2003
     Java: Sun JDK1.6
Database: Oracle 10g
Browser: Mozilla Firefox 3.0.19

Steps:
1) Install JON 2.3.1 with Oracle 10g
2) Create alert for Linux platform, alert details:
	Alert Properties 
		Priority:  	 !!! - High 
		Active:  	  YES
	Condition Set 
		If Condition:  	 Total Memory > 0.1B
		OR If Condition: Free Memory > 0.1B
		Dampening Rule:  Each time condition set is true
		Action Filters:  Disable alert until re-enabled manually or by recovery alert : false 
	Notification Actions 
		Resource Operations - View Process List on this resource
		Direct Emails - rtimaniy
		System Roles - Role1, Role2, Role3, User Role, All Resources Role
		System Users - rhqadmin, rajan, rajan1, rajan2, rajan3
3) Wait for alert notification 
4) Upgrade JON 2.3.1 to JON 2.4 GA_QE (tag-jon-release build #43) with "keep existing data"
5) Check the server log

Observation:
The alert notification log (audit trail information) was transformed into the new data structures, but exceptions appeared in the server log during the upgrade.

2010-06-29 14:58:44,846 INFO  [org.jboss.web.tomcat.service.TomcatDeployer] deploy, ctxPath=/content, warUrl=.../deploy/rhq.ear/rhq-content_http.war/
2010-06-29 14:58:45,122 ERROR [org.rhq.enterprise.server.alert.AlertManagerBean] Failed to send all notifications for Alert[id=10735]
java.lang.NullPointerException
	at org.rhq.enterprise.server.alert.AlertManagerBean.getAlertPluginManager(AlertManagerBean.java:748)
	at org.rhq.enterprise.server.alert.AlertManagerBean.getAlertSender(AlertManagerBean.java:755)
	at org.rhq.enterprise.server.alert.AlertManagerBean.sendAlertNotifications(AlertManagerBean.java:669)
	at org.rhq.enterprise.server.alert.AlertManagerBean.fireAlert(AlertManagerBean.java:639)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	...

2010-06-29 15:01:06,951 ERROR [org.rhq.enterprise.server.operation.OperationManagerBean] A scheduled operation has an invalid name - did a plugin change its operation metadata? : ResourceOperationScheduleComposite: operation-job-id=[rhq-resource-10289-735590344-1277802419406_=_rhq-resource-10289], operation-name=[null], operation-next-fire-time=[Tue Jun 29 14:36:59 IST 2010], resource-id=[10289], resource-name=[rajantest.usersys.redhat.com], resource-type-name=[Linux]
java.lang.NullPointerException
	at org.rhq.enterprise.server.operation.OperationManagerBean.getResourceOperationSchedule(OperationManagerBean.java:420)
	at org.rhq.enterprise.server.operation.OperationManagerBean.getResourceOperationSchedule(OperationManagerBean.java:459)
	at org.rhq.enterprise.server.operation.OperationManagerBean.findCurrentlyScheduledResourceOperations(OperationManagerBean.java:1421)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Comment 4 Rajan Timaniya 2010-06-29 11:16:51 UTC
Created attachment 427648
server log

Comment 5 Joseph Marques 2010-06-29 21:12:24 UTC
commit eee5ad2d562b8377c670f44a07fa1a0d59257c95
Author: Joseph Marques <joseph>
Date:   Tue Jun 29 17:08:05 2010 -0400

BZ-607672: add more silience during notfication processing
    
* gracefully handle cases when server-plugin plugin container is in one of various initializing states
* do not bomb during alert notification sender if server-side plugin container hasn't started completely
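
For readers without access to the commit, the following is a minimal sketch of the kind of guard described above. It is not the actual change to `AlertManagerBean`: `PluginContainerGuard`, `AlertPluginManager`, `AlertSender`, and `trySend` are hypothetical names made up for illustration. The assumption is that the plugin manager reference stays null while the server-side plugin container is still initializing, so the caller logs a warning and skips the notification instead of dereferencing null.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a pluggable alert sender; not the real RHQ type. */
interface AlertSender {
    void send(int alertId);
}

/** Hypothetical stand-in for the manager that hands out senders. */
interface AlertPluginManager {
    AlertSender getSender(String senderName);
}

class PluginContainerGuard {
    // Stays null until the (hypothetical) server-side plugin container has started.
    private volatile AlertPluginManager pluginManager;

    // Collected here instead of a logger so the sketch is self-contained.
    final List<String> warnings = new ArrayList<String>();

    void containerStarted(AlertPluginManager manager) {
        this.pluginManager = manager;
    }

    /** Returns true if the notification was sent, false if it was skipped. */
    boolean trySend(int alertId, String senderName) {
        AlertPluginManager manager = pluginManager; // read once; may still be null
        if (manager == null) {
            warnings.add("Plugin container not started; skipping sender '" + senderName
                    + "' for Alert[id=" + alertId + "]");
            return false;
        }
        AlertSender sender = manager.getSender(senderName);
        if (sender == null) {
            warnings.add("No sender named '" + senderName + "' for Alert[id=" + alertId + "]");
            return false;
        }
        sender.send(alertId);
        return true;
    }
}
```

Returning a boolean rather than throwing keeps one unavailable sender from aborting the remaining notifications for the same alert.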

Comment 6 Joseph Marques 2010-06-29 21:19:35 UTC
The fix I just committed should prevent the first null pointer exception you saw while trying to process alerts:

java.lang.NullPointerException
 	at org.rhq.enterprise.server.alert.AlertManagerBean.getAlertPluginManager(AlertManagerBean.java:748)
 	at org.rhq.enterprise.server.alert.AlertManagerBean.getAlertSender(AlertManagerBean.java:755)
 	at org.rhq.enterprise.server.alert.AlertManagerBean.sendAlertNotifications(AlertManagerBean.java:669)
 	at org.rhq.enterprise.server.alert.AlertManagerBean.fireAlert(AlertManagerBean.java:639)
 	at org.rhq.enterprise.server.alert.AlertDampeningManagerBean.processEventType(AlertDampeningManagerBean.java:171)
 	at org.rhq.enterprise.server.alert.AlertConditionLogManagerBean.checkForCompletedAlertConditionSet(AlertConditionLogManagerBean.java:196)
 	at org.rhq.enterprise.server.alert.engine.jms.AlertConditionConsumerBean.onMessage(AlertConditionConsumerBean.java:93) 
 	at org.rhq.enterprise.server.alert.CachedConditionManagerBean.processCachedConditionMessage(CachedConditionManagerBean.java:82)

However, the second null pointer exception seems to indicate that you did not fill in all of the required information when creating the operation-based notification:

ResourceOperationScheduleComposite:
operation-job-id=[rhq-resource-10289-735590344-1277802419406_=_rhq-resource-10289],
operation-name=[null], operation-next-fire-time=[Tue Jun 29 14:36:59 IST 2010],
resource-id=[10289], resource-name=[rajantest.usersys.redhat.com],
resource-type-name=[Linux]

Notice how the "operation-name" parameter is null.  This would indicate to me you probably forgot to press the save button after selecting which operation you wanted to invoke.  I'll work on tidying up the error message to be cleaner, but please confirm you pressed SAVE after selecting the operation prior to upgrade.
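
As a rough idea of what a cleaner error path could look like, here is a minimal sketch that checks for a missing operation name up front and produces a one-line message instead of a stack trace. `ScheduleInfo` and `ScheduleValidator` are hypothetical names, and only the fields visible in the log line above are modeled; this is not the `OperationManagerBean` implementation.

```java
/** Hypothetical holder for fields shown in the log line; not the RHQ class. */
class ScheduleInfo {
    final String jobId;
    final String operationName; // null when no operation was saved
    final int resourceId;

    ScheduleInfo(String jobId, String operationName, int resourceId) {
        this.jobId = jobId;
        this.operationName = operationName;
        this.resourceId = resourceId;
    }
}

class ScheduleValidator {
    /**
     * Returns null if the schedule is usable, otherwise a one-line
     * description of the problem suitable for a warning log entry.
     */
    static String describeProblem(ScheduleInfo s) {
        if (s.operationName == null || s.operationName.trim().isEmpty()) {
            return "Scheduled operation [" + s.jobId + "] on resource ["
                    + s.resourceId + "] has no operation name; skipping it";
        }
        return null;
    }
}
```

A caller would log the returned message at warning level and move on to the next schedule.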

Comment 7 Rajan Timaniya 2010-06-30 09:58:42 UTC
Verified on JON 2.4 GA_QA (tag-jon-release build #44)
Revision: 10751

The alert notification log (audit trail information) was transformed into the new data structures, and there are no exceptions or errors in the server log.

Comment 8 Corey Welton 2010-08-12 16:59:30 UTC
Mass-closure of verified bugs against JON.