Bug 846846

Summary: NPE in AS5 plugin
Product: [Other] RHQ Project
Reporter: Stefan Negrea <snegrea>
Component: Plugins
Assignee: Stefan Negrea <snegrea>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 4.4
CC: hrupp
Target Milestone: ---
Target Release: RHQ 4.5.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 837510
Environment:
Last Closed: 2013-09-01 06:02:37 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On: 837510
Bug Blocks:

Description Stefan Negrea 2012-08-08 17:12:44 EDT
+++ This bug was initially created as a clone of Bug #837510 +++

From agent.log

2012-07-04 09:14:09,225 WARN  [ResourceContainer.invoker.daemon-2] 
(org.rhq.plugins.jbossas5.WebApplicationContextComponent)- Failed to determine 
whether the web app context localhost is clustered or not.
java.lang.RuntimeException: Failed to load [ComponentType{type=MBean, 
subtype=WebApplicationManager}] ManagedComponent 
	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.NullPointerException
	... 10 more

--- Additional comment from hrupp@redhat.com on 2012-07-04 03:24:26 EDT ---

To add: AS5 was down at this time, so it is a bit of a corner case, but the plugin should nevertheless not throw an NPE.

--- Additional comment from snegrea@redhat.com on 2012-08-08 16:57:39 EDT ---

This is clearly an edge case because it happens only if the AS5 server is inventoried but the managing component has not started yet. A possible way to get into this state is to inventory the AS5 server, stop the AS5 server, and then restart the RHQ server.

I inspected the code very carefully. The errors are only printed to the logs to show that something is not right; they do not propagate to any caller. From that perspective the code is good, because it just logs the unusual circumstance and does not prevent the component from getting started.

However, there is one scenario that is not correctly handled by the existing code. If an application is clustered and the scenario presented above happens, then when the AS5 server comes back online the plugin code will not recheck the clustered property, leaving the property at its default value (not clustered). The clustered flag is used only for a trait and is not used by any other external code. Also, the value is never refreshed. The operation to check for the clustered flag is very expensive, as it involves calls to the profile service.
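
To illustrate the two paragraphs above, here is a rough sketch of the current behaviour. It is not the actual plugin code: the class name `StartOnceComponent` and the `Supplier` standing in for the profile service lookup are hypothetical. The lookup runs once inside start, a failure is only logged, and the flag then stays at its default.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the component; not the real RHQ class. */
class StartOnceComponent {
    private static final Logger LOG = Logger.getLogger(StartOnceComponent.class.getName());

    private boolean started;
    private boolean clustered; // default: not clustered

    /** The supplier is an assumed placeholder for the expensive profile service call. */
    void start(Supplier<Boolean> clusteredLookup) {
        try {
            clustered = clusteredLookup.get();
        } catch (RuntimeException e) {
            // Logged only; the error does not propagate and start continues.
            LOG.log(Level.WARNING,
                    "Failed to determine whether the web app context is clustered or not.", e);
        }
        started = true;
    }

    boolean isStarted() {
        return started;
    }

    /** Never rechecked after start, so a failed lookup leaves this at false. */
    boolean isClustered() {
        return clustered;
    }
}
```

With a lookup that throws (for example because AS5 is down), the component still starts, but isClustered() keeps returning false even after the server is back.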

To fix this, the code that checks whether an application is clustered or not will be relocated outside of the start method. It will run on the first metrics collection (done only on started components that are available). The getter method that returns the cluster setting will return false until then. The getter method will be marked as deprecated and left in the code to avoid potential problems with external plugins that rely on the content system. Also, the trait will not be reported by the agent until the property is retrieved successfully from the AS5 server.
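
A minimal sketch of the proposed approach, under stated assumptions: `LazyClusteredFlag`, `collectClusteredTrait` and the `Supplier` standing in for the profile service call are hypothetical names for illustration, not the plugin's real API. The lookup is deferred to metrics collection, retried until it succeeds, and cached afterwards so the expensive call is made only as often as needed.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical stand-in for the component; not the real RHQ class. */
class LazyClusteredFlag {
    // Assumed placeholder for the expensive profile service call; may throw while AS5 is down.
    private final Supplier<Boolean> clusteredLookup;
    // null until the lookup has succeeded once.
    private Boolean clustered;

    LazyClusteredFlag(Supplier<Boolean> clusteredLookup) {
        this.clusteredLookup = clusteredLookup;
    }

    /** Start no longer touches the profile service. */
    void start() {
    }

    /** Kept for external callers; returns false until the value is known. */
    @Deprecated
    boolean isClustered() {
        return clustered != null && clustered;
    }

    /**
     * Called during metrics collection. Returns an empty Optional while the
     * value is unknown, so no trait is reported until the lookup succeeds.
     */
    Optional<Boolean> collectClusteredTrait() {
        if (clustered == null) {
            try {
                clustered = clusteredLookup.get();
            } catch (RuntimeException e) {
                return Optional.empty(); // retry on the next collection
            }
        }
        return Optional.ofNullable(clustered);
    }
}
```

In this sketch a failed lookup costs one call per collection until AS5 answers; after the first success the cached value is returned without further calls.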

Comment 2 Heiko W. Rupp 2013-09-01 06:02:37 EDT
Bulk closing of items that are on_qa and in old RHQ releases, which have been out for a long time and where the issue has not been re-opened since.