From agent.log:

2012-07-04 09:14:09,225 WARN [ResourceContainer.invoker.daemon-2] (org.rhq.plugins.jbossas5.WebApplicationContextComponent)- Failed to determine whether the web app context localhost is clustered or not.
java.lang.RuntimeException: Failed to load [ComponentType{type=MBean, subtype=WebApplicationManager}] ManagedComponent [jboss.web:host=localhost,path=/jbossws,type=Manager].
    at org.rhq.plugins.jbossas5.ManagedComponentComponent.getManagedComponent(ManagedComponentComponent.java:462)
    at org.rhq.plugins.jbossas5.WebApplicationContextComponent.start(WebApplicationContextComponent.java:102)
    at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocationThread.call(ResourceContainer.java:634)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.NullPointerException
    at org.rhq.plugins.jbossas5.ManagedComponentComponent.getManagedComponent(ManagedComponentComponent.java:459)
    ... 10 more
To add: AS5 was down at this time, so it is a bit of a corner case, but the plugin should nevertheless not throw an NPE.
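One way to avoid the NPE is to check for the missing dependency explicitly and fail with a descriptive exception instead of dereferencing null. This is a minimal sketch only. It assumes, without confirmation from the trace, that the null value at ManagedComponentComponent.java:459 is the profile-service view that is unavailable while AS5 is down. `ManagedComponentLookup` and `ProfileView` are hypothetical stand-ins, not RHQ or JBoss classes.

```java
/**
 * Hypothetical sketch: report "profile service unavailable" instead of
 * throwing a NullPointerException when AS5 is not reachable.
 */
public class ManagedComponentLookup {

    /** Hypothetical stand-in for the profile-service view used by the plugin. */
    interface ProfileView {
        Object getComponent(String name);
    }

    // May be null while the AS5 server is down (assumption, see lead-in).
    private final ProfileView view;

    public ManagedComponentLookup(ProfileView view) {
        this.view = view;
    }

    public Object getManagedComponent(String name) {
        if (view == null) {
            // Describe the real condition rather than dereferencing null.
            throw new IllegalStateException(
                "Profile service not available; cannot load ManagedComponent [" + name + "]");
        }
        Object component = view.getComponent(name);
        if (component == null) {
            throw new IllegalStateException("ManagedComponent [" + name + "] not found");
        }
        return component;
    }
}
```

The caller (in the trace, WebApplicationContextComponent.start) can keep catching and logging the exception as it does now; the difference is that the log message then states why the lookup failed.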
This is clearly an edge case, because it happens only if the AS5 server is inventoried but the managing component has not been started yet. One way to get into this state is to inventory the AS5 server, stop the AS5 server, and then restart the RHQ server.

I inspected the code carefully. The errors are only written to the log to signal that something is wrong; they do not propagate any further. From that perspective the code behaves correctly: it logs the unusual circumstance and does not prevent the component from starting. However, there is one scenario the existing code does not handle correctly. If an application is clustered and the scenario above occurs, then when the AS5 server comes back online the plugin does not recheck the clustered property, leaving it at its default value (not clustered). The clustered flag is used only for a trait and is not used by any other external code, and its value is never refreshed. Checking the flag is very expensive, because it involves calls to the profile service.

To fix this, the code that checks whether an application is clustered will be moved out of the start method. It will run on the first metrics collection (which is done only for started components that are available). Until then, the getter method that returns the cluster setting will return false. The getter will be marked as deprecated but left in the code to avoid potential problems with external plugins that rely on the content system. Also, the agent will not report the trait until the property has been retrieved successfully from the AS5 server.
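The approach described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from the actual commit: `ClusteredTraitCollector`, `collectTraits()`, the `clustered` trait key, and the `Supplier<Boolean>` standing in for the expensive profile-service query are all hypothetical. The real plugin would do this inside its metrics-collection callback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: defer the expensive "is this context clustered?" query
 * from start() to the first metrics collection, and report the trait only
 * once the value is known.
 */
public class ClusteredTraitCollector {

    private final Supplier<Boolean> clusteredQuery; // stand-in for the profile-service call
    private Boolean clustered;                       // null = not determined yet

    public ClusteredTraitCollector(Supplier<Boolean> clusteredQuery) {
        this.clusteredQuery = clusteredQuery;
    }

    /** Called on each metrics collection (only for started, available components). */
    public Map<String, String> collectTraits() {
        Map<String, String> traits = new HashMap<>();
        if (clustered == null) {
            try {
                clustered = clusteredQuery.get(); // runs until it succeeds, then is cached
            } catch (RuntimeException e) {
                // AS5 still unreachable: log it and retry on the next collection.
                System.err.println("Could not determine clustered flag yet: " + e.getMessage());
            }
        }
        if (clustered != null) {
            traits.put("clustered", clustered.toString()); // omitted until known
        }
        return traits;
    }

    /** Kept for compatibility; returns false until the flag has been determined. */
    @Deprecated
    public boolean isClustered() {
        return clustered != null && clustered;
    }
}
```

Caching the first successful result matches the note that the value is never refreshed, while a failed query leaves the flag undetermined so that the next collection tries again.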
release/jon3.1.x branch commit: http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.1.x&id=422438fa3b751562769cfe03be70e794376855dc
Moving to ON_QA since the JON 3.1.1 ER2 build is available - https://brewweb.devel.redhat.com/buildinfo?buildID=228250
Verified with the scenario Stefan described.
Bulk closing of old issues in VERIFIED state.