Bug 998842

Summary: RHQ 4.9 AS7 plugin - NullPointerException on ASConnection.newInstanceForServerPluginConfiguration
Product: [Other] RHQ Project Reporter: Stian Lund <stianlund+bugzilla>
Component: Agent, Plugin Container, PluginsAssignee: Thomas Segismont <tsegismo>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: medium Docs Contact:
Priority: high    
Version: 4.8CC: hrupp
Target Milestone: ---Keywords: Reopened
Target Release: RHQ 4.10   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-04-23 12:29:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1012435    
Attachments:
Description Flags
RHQ Agent Logs
none
RHQ Agent Logs2 none

Description Stian Lund 2013-08-20 08:00:32 UTC
Created attachment 788363 [details]
RHQ Agent Logs

Description of problem:

After installing RHQ 4.9 Snapshot from a working install of 4.8, all JBoss AS7 servers are state Unavailable. 

The RHQ Server itself, running EAP 6.1.Alpha is the only AS7 available - this uses the default rhqadmin user for authentication.

The AS7 servers use a management connection with user+password connected to AD. This was working before and CLI works with auth so the connection is not management. Also this happens on 45 AS7 servers on various hosts.

Looking at the RHQ Agent logs I fint the following exception:

2013-08-20 09:20:49,824 ERROR [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Failed to start component for Resource[id=18225, uuid=52993bc2-49ee-4432-abbb-9c0a7b1defe7, type={JBossAS7}JBossAS7 Standalone Server, key=hostConfig: /opt/jboss/server/tms-backend-ver/configuration/standalone.xml, name=TMS-Backend-Ver (10690), parent=d26apvl005.test.local] from synchronized merge.
org.rhq.core.clientapi.agent.PluginContainerException: Failed to start component for resource Resource[id=18225, uuid=52993bc2-49ee-4432-abbb-9c0a7b1defe7, type={JBossAS7}JBossAS7 Standalone Server, key=hostConfig: /opt/jboss/server/tms-backend-ver/configuration/standalone.xml, name=TMS-Backend-Ver (10690), parent=d26apvl005.test.local].
	at org.rhq.core.pc.inventory.InventoryManager.activateResource(InventoryManager.java:1868)
	at org.rhq.core.pc.inventory.InventoryManager.refreshResourceComponentState(InventoryManager.java:3304)
	at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2872)
	at org.rhq.core.pc.inventory.InventoryManager.processSyncInfo(InventoryManager.java:2879)
	at org.rhq.core.pc.inventory.InventoryManager.synchInventory(InventoryManager.java:1169)
	at org.rhq.core.pc.inventory.InventoryManager.synchInventory(InventoryManager.java:1138)
	at org.rhq.core.pc.inventory.InventoryManager.handleReport(InventoryManager.java:1120)
	at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.call(AutoDiscoveryExecutor.java:130)
	at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.run(AutoDiscoveryExecutor.java:92)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
	at org.rhq.modules.plugins.jbossas7.ASConnection.newInstanceForServerPluginConfiguration(ASConnection.java:230)
	at org.rhq.modules.plugins.jbossas7.BaseServerComponent.start(BaseServerComponent.java:98)
	at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocation.call(ResourceContainer.java:654)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	... 3 more

Version-Release number of selected component (if applicable):

RHQ 4.9 snapshot of august 8.
RHEL 6.4 - 2.6.32-358.6.1.el6.x86_64 #1 SMP Fri Mar 29 16:51:51 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Additional info:

I am reporting this even though it's a snapshot to avoid this error coming in 4.9 final. Hopefully more should have seen this by now, although I can't find any references on BZ.

Snapshot was installed to test fixes for bug with new charting, see:
https://bugzilla.redhat.com/show_bug.cgi?id=980765

Comment 1 Thomas Segismont 2013-08-20 20:34:04 UTC
Hi Stian,

This is really strange given the code I read at line 230 of ASConnection.

Can you please run this query against your database and paste the results?

select p.name, p.string_value from rhq_config_property as p, rhq_resource r where p.configuration_id = r.plugin_configuration_id and r.id = 18225

Thanks

Comment 2 Stian Lund 2013-08-21 07:40:08 UTC
Hello Thomas.

I have tried to run the SQL, but can't make it work properly, not very good at SQL syntax I'm afraid...

However, 
select * from rhq_config_property where id = 18225
returns 0 rows.

Also there is no column called plugin_configuration_id?
[SELECT - 0 row(s), 0.000 secs]  [Error Code: 904, SQL State: 42000]  ORA-00904: "PLUGIN_CONFIGURATION_ID": invalid identifier

I just use DBVisualizer to do this, so I am a bit limited as to what I can do - the Oracle guys are the masters of SQL, I might hear with them :)

Comment 3 Thomas Segismont 2013-08-21 07:52:27 UTC
(In reply to Stian Lund from comment #2)
> select * from rhq_config_property where id = 18225
> returns 0 rows.

Yes, because 18225 is the id of your resource, not of a config property.

> Also there is no column called plugin_configuration_id?
> [SELECT - 0 row(s), 0.000 secs]  [Error Code: 904, SQL State: 42000] 
> ORA-00904: "PLUGIN_CONFIGURATION_ID": invalid identifier

plugin_configuration_id is a column of the rhq_resource table, not of rhq_config_property.

Can you try:

select plugin_configuration_id from rhq_resource where id = 18225

And then:

select name, string_value from rhq_config_property where configuration_id = [plugin_configuration_id]

Thanks

Comment 4 Stian Lund 2013-08-21 08:04:19 UTC
Hey, thanks for the quick reply :)

I figured it out, I removed the "as" from your select, then it returns this:

NAME	STRING_VALUE
baseDir	/opt/jboss/server/tms-backend-ver
configDir	/opt/jboss/server/tms-backend-ver/configuration
domainHost	(null)
homeDir	/opt/jboss
hostConfigFile	/opt/jboss/server/tms-backend-ver/configuration/standalone.xml
hostXmlFileName	standalone.xml
hostname	0.0.0.0
logDir	/opt/jboss/server/tms-backend-ver/log
logEventSources	(null)
managementConnectionTimeout	5000
password	xxx
port	10690
productType	AS
realm	ManagementRealm
startScript	/opt/jboss/bin/standalone.sh
startScriptArgs	(null)
startScriptEnv	RUN_CONF=/opt/jboss/server/tms-backend-ver/configuration/standalone.conf
JAVA_HOME=/opt/jboss/java
JBOSS_HOME=/opt/jboss
JBOSS_BASE_DIR=/opt/jboss/server/tms-backend-ver
JBOSS_LOG_DIR=/opt/jboss/server/tms-backend-ver/log
JBOSS_PIDFILE=/opt/jboss/server/tms-backend-ver/log/tms-backend-ver.pid
LAUNCH_JBOSS_IN_BACKGROUND=1
PATH=/bin:/usr/bin
startScriptPrefix	/usr/bin/nohup
user	srvjboss

Comment 5 Thomas Segismont 2013-08-21 08:55:44 UTC
I was thinking of a missing property in your resource plugin config but it looks good.

Can you launch your agent with remote debug options and set a break point at line 430 of ASConnection class?

Comment 6 Stian Lund 2013-08-21 09:01:30 UTC
Remote debug, is this the RHQ_AGENT_DEBUG property?

Not sure what you mean by setting a break point at line 430, what tool do I use for this?

Comment 7 Stian Lund 2013-08-21 13:07:05 UTC
Created attachment 788867 [details]
RHQ Agent Logs2

Comment 8 Stian Lund 2013-08-21 13:07:20 UTC
Hi,
I've tried enabling debugging and setting the additional properties for remote debugging, but our environment is a bit complex so it's not easy to connect to any port on Linux from where I run Eclipse. I keep getting disconnected but it's the first time I'm trying this.

I would have to install Eclipse on Linux to connect to localhost.

But at least the debugging creates a lot of logging so maybe these are useful, I am attaching them in case.

Comment 9 Thomas Segismont 2013-08-21 14:12:40 UTC
There's nothing new in these logs unfortunately. Given that your configuration in database is good, this NPE should not be thrown. Maybe your agent local data was screwed up for some reason.

Can you restart your agent from scratch (start it with '-gLu' see https://docs.jboss.org/author/display/RHQ/RHQ+Agent+Command+Line+Options)

Comment 10 Stian Lund 2013-08-22 08:19:44 UTC
Hi Thomas, an update:

- I tried to do start with -gLu - it actually seems to work and AS7 resources are now stated as UP, so that's good.
- I also tried just doing -u (purgedata) - this also works, so I don't have to do a complete reconfigure.

However - there has got to be an easier way - we have over 40 hosts in this environment, and even more in production, and having to run this command on *every* server when upgrading to 4.9 would be a hell of a job. Kind of pointless to have auto-update of agents when we have to do that :(

Is there a way to do the purgedata through the RHQ UI, on all grouped agents?

Is there a way to figure out what causes this bug as well?

I also notice another problem, in that the Agent will not shut down without a kill, it just logs "waiting for [1] threads to die...". Is this known?

Comment 11 Stian Lund 2013-08-22 08:39:02 UTC
I also get a lot of these in the log now, from all the AS7 resources that have been set UP.

2013-08-22 10:19:21,589 INFO  [ConfigurationManager.threadpool-1] (rhq.core.pc.configuration.ConfigurationCheckExecutor)- New configuration version detected on resource: Resource[id=19602, uuid=cba13e64-2054-47b9-82b8-9506bb6ea64d, type={JBossAS7}JBossAS7 Standalone Server, key=hostConfig: /opt/jboss/server/iaweb-solr/configuration/standalone.xml, name=IAWeb-Solr (10090), parent=d26apvl024.test.local]

Comment 12 Thomas Segismont 2013-08-22 08:57:58 UTC
(In reply to Stian Lund from comment #10)
> Hi Thomas, an update:
> 
> - I tried to do start with -gLu - it actually seems to work and AS7
> resources are now stated as UP, so that's good.
> - I also tried just doing -u (purgedata) - this also works, so I don't have
> to do a complete reconfigure.
> 
> However - there has got to be an easier way - we have over 40 hosts in this
> environment, and even more in production, and having to run this command on
> *every* server when upgrading to 4.9 would be a hell of a job. Kind of
> pointless to have auto-update of agents when we have to do that :(
> 
> Is there a way to do the purgedata through the RHQ UI, on all grouped agents?
> 
> Is there a way to figure out what causes this bug as well?
> 
> I also notice another problem, in that the Agent will not shut down without
> a kill, it just logs "waiting for [1] threads to die...". Is this known?

It was certainly overkill to advice '-gLu' :)

Agent Resource type has this 'Execute Prompt Command' operation. You can schedule a compatible group operation with 'inventory --sync' as parameter.

Comment 13 Stian Lund 2013-08-22 09:16:03 UTC
Ok I will try that.

Are you sure it is advisable to close this as not a bug?

Are you going to inform everyone that runs Jboss AS7 to do this operation on their Agents after installing 4.9?

Comment 14 Thomas Segismont 2013-08-22 09:21:43 UTC
(In reply to Stian Lund from comment #13)
> Are you sure it is advisable to close this as not a bug?

Yes

> Are you going to inform everyone that runs Jboss AS7 to do this operation on
> their Agents after installing 4.9?

No I won't, because agent local data rarely gets screwed up and causes are hard to find in each individual case.

Comment 15 Stian Lund 2013-08-22 09:28:28 UTC
Also, it appears the group command failed on the agents, even though after running, the AS7 Servers are back in inventory with state UP.

The following was in the agent log:

2013-08-22 11:17:03,036 INFO  [ResourceContainer.invoker.nonDaemon-17] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.prompt-command-invoked}Prompt command invoked: [inventory, --sync]
2013-08-22 11:17:03,037 INFO  [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Plugin container is being shutdown...
2013-08-22 11:17:03,037 INFO  [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Plugin container shutdown will wait up to 300 seconds for shut down background threads to terminate.
2013-08-22 11:17:03,043 INFO  [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.content.ContentManager)- Shutting down Content Manager...
2013-08-22 11:17:03,048 ERROR [OperationManager.invoker-1] (rhq.core.pc.inventory.ResourceContainer$ResourceComponentInvocationHandler)- Thread [OperationManager.invoker-1] was interrupted.
2013-08-22 11:17:03,048 WARN  [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Plugin container was interrupted while waiting for an executor service to terminate.
2013-08-22 11:17:03,048 WARN  [OperationManager.invoker-1] (rhq.core.pc.inventory.ResourceContainer$ComponentInvocation)- Invocation has been marked interrupted for method [OperationFacet.invokeOperation] on resource [Resource[id=19608, uuid=67ee411c-95ac-4e78-a42a-d6dbebd37600, type={RHQAgent}RHQ Agent, key=d26apvl025.test.local RHQ Agent, name=RHQ Agent, parent=d26apvl025.test.local, version=4.9.0-SNAPSHOT]]
2013-08-22 11:17:03,053 INFO  [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Shutting down measurement collection...
2013-08-22 11:17:08,059 WARN  [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.inventory.ResourceContainer$ComponentInvocation)- Invocation has been marked interrupted for method [ResourceComponent.stop] on resource [Resource[id=19608, uuid=67ee411c-95ac-4e78-a42a-d6dbebd37600, type={RHQAgent}RHQ Agent, key=d26apvl025.test.local RHQ Agent, name=RHQ Agent, parent=d26apvl025.test.local, version=4.9.0-SNAPSHOT]]
2013-08-22 11:17:08,060 WARN  [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.inventory.InventoryManager)- Plugin Error: Failed to stop component for [Resource[id=19608, uuid=67ee411c-95ac-4e78-a42a-d6dbebd37600, type={RHQAgent}RHQ Agent, key=d26apvl025.test.local RHQ Agent, name=RHQ Agent, parent=d26apvl025.test.local, version=4.9.0-SNAPSHOT]].
org.rhq.core.pc.inventory.TimeoutException: Call to [org.rhq.plugins.agent.AgentServerComponent.stop()] with args [] timed out after 5000 milliseconds - invocation thread will be interrupted.
	at org.rhq.core.pc.inventory.ResourceContainer$ResourceComponentInvocationHandler.invokeInNewThreadWithLock(ResourceContainer.java:583)
	at org.rhq.core.pc.inventory.ResourceContainer$ResourceComponentInvocationHandler.invoke(ResourceContainer.java:549)
	at $Proxy42.stop(Unknown Source)
	at org.rhq.core.pc.inventory.InventoryManager.deactivateResource(InventoryManager.java:2125)
	at org.rhq.core.pc.inventory.InventoryManager.deactivateResource(InventoryManager.java:2119)
	at org.rhq.core.pc.inventory.InventoryManager.persistToDisk(InventoryManager.java:2147)
	at org.rhq.core.pc.inventory.InventoryManager.shutdown(InventoryManager.java:300)
	at org.rhq.core.pc.PluginContainer.shutdown(PluginContainer.java:401)
	at org.rhq.enterprise.agent.AgentMain.shutdownPluginContainer(AgentMain.java:2474)
	at org.rhq.enterprise.agent.promptcmd.PluginContainerPromptCommand.execute(PluginContainerPromptCommand.java:73)
	at org.rhq.enterprise.agent.promptcmd.InventoryPromptCommand.executePCCommand(InventoryPromptCommand.java:283)
	at org.rhq.enterprise.agent.promptcmd.InventoryPromptCommand.syncInventory(InventoryPromptCommand.java:264)
	at org.rhq.enterprise.agent.promptcmd.InventoryPromptCommand.processCommand(InventoryPromptCommand.java:174)
	at org.rhq.enterprise.agent.promptcmd.InventoryPromptCommand.execute(InventoryPromptCommand.java:65)
	at org.rhq.enterprise.agent.AgentMain.executePromptCommand(AgentMain.java:2981)
	at org.rhq.enterprise.agent.AgentMain.executePromptCommand(AgentMain.java:2956)
	at org.rhq.enterprise.agent.AgentManagement.executePromptCommand(AgentManagement.java:358)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
	at org.mc4j.ems.impl.jmx.connection.bean.operation.DOperation.invoke(DOperation.java:110)
	at org.rhq.plugins.agent.AgentServerComponent.invokeOperation(AgentServerComponent.java:180)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocation.call(ResourceContainer.java:654)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
2013-08-22 11:17:08,719 WARN  [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.plugin.PluginClassLoader)- Failed to purge embedded jars directory. Cause: java.io.IOException: Failed to delete file or directory: data/tmp/rhq-ant-bundle-plugin-4.9.0-SNAPSHOT.jar5430676099001543656.classloader
2013-08-22 11:17:08,719 WARN  [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.plugin.PluginClassLoader)- Failed to purge embedded jars directory. Cause: java.io.IOException: Failed to delete file or directory: data/tmp/rhq-jboss-as-7-plugin-4.9.0-SNAPSHOT.jar6876492954842737974.classloader
2013-08-22 11:17:08,721 WARN  [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.plugin.PluginClassLoader)- Failed to purge embedded jars directory. Cause: java.io.IOException: Failed to delete file or directory: data/tmp/rhq-filetemplate-bundle-plugin-4.9.0-SNAPSHOT.jar5754753341609717766.classloader
2013-08-22 11:17:08,723 WARN  [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Timed out after 300 seconds while waiting for shut down background threads to terminate.
2013-08-22 11:17:08,723 INFO  [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Plugin container is now shutdown.
2013-08-22 11:17:11,055 INFO  [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Initializing Plugin Container v4.9.0-SNAPSHOT...

Comment 16 Stian Lund 2013-08-22 09:30:59 UTC
> No I won't, because agent local data rarely gets screwed up and causes are hard to find in each individual case.

Ok fair enough. 

Sorry for sounding a bit harsh, I just am frustrated with all the bugs I'm discovering just because I tried to test a snapshot for fixing another bug.

Hopefully this bug/error/whatever does not make it into 4.9 final for everyone.

cheers,
Stian

Comment 17 Thomas Segismont 2013-09-26 16:41:34 UTC
Fixed in master

commit 8cfdad217914a513f98f2a474e367b1374eb23ea
Author: Thomas Segismont <tsegismo>
Date:   Thu Sep 26 18:37:49 2013 +0200

The bug was introduced with the implementation of BZ974876 "[RFE] ResourceUpgradeFacet should support upgrade of plugin config properties"

In DiscoveryBossBean, upgrade responses were filled with current resource plugin configuration even if the upgrade request had nothing to update in it. But plugin config was loaded lazily and when going through the server services marshalling layer, it was turned into null. As a consequence, the server still had the right configuration, but the agent data was wrong.

Comment 18 Heiko W. Rupp 2014-04-23 12:29:33 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.