Bug 998842
Summary: | RHQ 4.9 AS7 plugin - NullPointerException on ASConnection.newInstanceForServerPluginConfiguration | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Stian Lund <stianlund+bugzilla> | ||||||
Component: | Agent, Plugin Container, Plugins | Assignee: | Thomas Segismont <tsegismo> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 4.8 | CC: | hrupp | ||||||
Target Milestone: | --- | Keywords: | Reopened | ||||||
Target Release: | RHQ 4.10 | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2014-04-23 12:29:33 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1012435 | ||||||||
Attachments: |
|
Description
Stian Lund
2013-08-20 08:00:32 UTC
Hi Stian, This is really strange given the code I read at line 230 of ASConnection. Can you please run this query against your database and paste the results? select p.name, p.string_value from rhq_config_property as p, rhq_resource r where p.configuration_id = r.plugin_configuration_id and r.id = 18225 Thanks Hello Thomas. I have tried to run the SQL, but can't make it work properly, not very good at SQL syntax I'm afraid... However, select * from rhq_config_property where id = 18225 returns 0 rows. Also there is no column called plugin_configuration_id? [SELECT - 0 row(s), 0.000 secs] [Error Code: 904, SQL State: 42000] ORA-00904: "PLUGIN_CONFIGURATION_ID": invalid identifier I just use DBVisualizer to do this, so I am a bit limited as to what I can do - the Oracle guys are the masters of SQL, I might hear with them :) (In reply to Stian Lund from comment #2) > select * from rhq_config_property where id = 18225 > returns 0 rows. Yes, because 18225 is the id of your resource, not of a config property. > Also there is no column called plugin_configuration_id? > [SELECT - 0 row(s), 0.000 secs] [Error Code: 904, SQL State: 42000] > ORA-00904: "PLUGIN_CONFIGURATION_ID": invalid identifier plugin_configuration_id is a column of the rhq_resource table, not of rhq_config_property. Can you try: select plugin_configuration_id from rhq_resource where id = 18225 And then: select name, string_value from rhq_config_property where configuration_id = [plugin_configuration_id] Thanks Hey, thanks for the quick reply :) I figured it out, I removed the "as" from your select, then it returns this: NAME STRING_VALUE baseDir /opt/jboss/server/tms-backend-ver configDir /opt/jboss/server/tms-backend-ver/configuration domainHost (null) homeDir /opt/jboss hostConfigFile /opt/jboss/server/tms-backend-ver/configuration/standalone.xml hostXmlFileName standalone.xml hostname 0.0.0.0 logDir /opt/jboss/server/tms-backend-ver/log logEventSources (null) managementConnectionTimeout 5000 password xxx port 10690 productType AS realm ManagementRealm startScript /opt/jboss/bin/standalone.sh startScriptArgs (null) startScriptEnv RUN_CONF=/opt/jboss/server/tms-backend-ver/configuration/standalone.conf JAVA_HOME=/opt/jboss/java JBOSS_HOME=/opt/jboss JBOSS_BASE_DIR=/opt/jboss/server/tms-backend-ver JBOSS_LOG_DIR=/opt/jboss/server/tms-backend-ver/log JBOSS_PIDFILE=/opt/jboss/server/tms-backend-ver/log/tms-backend-ver.pid LAUNCH_JBOSS_IN_BACKGROUND=1 PATH=/bin:/usr/bin startScriptPrefix /usr/bin/nohup user srvjboss I was thinking of a missing property in your resource plugin config but it looks good. Can you launch your agent with remote debug options and set a break point at line 430 of ASConnection class? Remote debug, is this the RHQ_AGENT_DEBUG property? Not sure what you mean by setting a break point at line 430, what tool do I use for this? Created attachment 788867 [details]
RHQ Agent Logs2
Hi, I've tried enabling debugging and setting the additional properties for remote debugging, but our environment is a bit complex so it's not easy to connect to any port on Linux from where I run Eclipse. I keep getting disconnected but it's the first time I'm trying this. I would have to install Eclipse on Linux to connect to localhost. But at least the debugging creates a lot of logging so maybe these are useful, I am attaching them in case. There's nothing new in these logs unfortunately. Given that your configuration in database is good, this NPE should not be thrown. Maybe your agent local data was screwed up for some reason. Can you restart your agent from scratch (start it with '-gLu' see https://docs.jboss.org/author/display/RHQ/RHQ+Agent+Command+Line+Options) Hi Thomas, an update: - I tried to do start with -gLu - it actually seems to work and AS7 resources are now stated as UP, so that's good. - I also tried just doing -u (purgedata) - this also works, so I don't have to do a complete reconfigure. However - there has got to be an easier way - we have over 40 hosts in this environment, and even more in production, and having to run this command on *every* server when upgrading to 4.9 would be a hell of a job. Kind of pointless to have auto-update of agents when we have to do that :( Is there a way to do the purgedata through the RHQ UI, on all grouped agents? Is there a way to figure out what causes this bug as well? I also notice another problem, in that the Agent will not shut down without a kill, it just logs "waiting for [1] threads to die...". Is this known? I also get a lot of these in the log now, from all the AS7 resources that have been set UP. 2013-08-22 10:19:21,589 INFO [ConfigurationManager.threadpool-1] (rhq.core.pc.configuration.ConfigurationCheckExecutor)- New configuration version detected on resource: Resource[id=19602, uuid=cba13e64-2054-47b9-82b8-9506bb6ea64d, type={JBossAS7}JBossAS7 Standalone Server, key=hostConfig: /opt/jboss/server/iaweb-solr/configuration/standalone.xml, name=IAWeb-Solr (10090), parent=d26apvl024.test.local] (In reply to Stian Lund from comment #10) > Hi Thomas, an update: > > - I tried to do start with -gLu - it actually seems to work and AS7 > resources are now stated as UP, so that's good. > - I also tried just doing -u (purgedata) - this also works, so I don't have > to do a complete reconfigure. > > However - there has got to be an easier way - we have over 40 hosts in this > environment, and even more in production, and having to run this command on > *every* server when upgrading to 4.9 would be a hell of a job. Kind of > pointless to have auto-update of agents when we have to do that :( > > Is there a way to do the purgedata through the RHQ UI, on all grouped agents? > > Is there a way to figure out what causes this bug as well? > > I also notice another problem, in that the Agent will not shut down without > a kill, it just logs "waiting for [1] threads to die...". Is this known? It was certainly overkill to advice '-gLu' :) Agent Resource type has this 'Execute Prompt Command' operation. You can schedule a compatible group operation with 'inventory --sync' as parameter. Ok I will try that. Are you sure it is advisable to close this as not a bug? Are you going to inform everyone that runs Jboss AS7 to do this operation on their Agents after installing 4.9? (In reply to Stian Lund from comment #13) > Are you sure it is advisable to close this as not a bug? Yes > Are you going to inform everyone that runs Jboss AS7 to do this operation on > their Agents after installing 4.9? No I won't, because agent local data rarely gets screwed up and causes are hard to find in each individual case. Also, it appears the group command failed on the agents, even though after running, the AS7 Servers are back in inventory with state UP. The following was in the agent log: 2013-08-22 11:17:03,036 INFO [ResourceContainer.invoker.nonDaemon-17] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.prompt-command-invoked}Prompt command invoked: [inventory, --sync] 2013-08-22 11:17:03,037 INFO [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Plugin container is being shutdown... 2013-08-22 11:17:03,037 INFO [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Plugin container shutdown will wait up to 300 seconds for shut down background threads to terminate. 2013-08-22 11:17:03,043 INFO [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.content.ContentManager)- Shutting down Content Manager... 2013-08-22 11:17:03,048 ERROR [OperationManager.invoker-1] (rhq.core.pc.inventory.ResourceContainer$ResourceComponentInvocationHandler)- Thread [OperationManager.invoker-1] was interrupted. 2013-08-22 11:17:03,048 WARN [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Plugin container was interrupted while waiting for an executor service to terminate. 2013-08-22 11:17:03,048 WARN [OperationManager.invoker-1] (rhq.core.pc.inventory.ResourceContainer$ComponentInvocation)- Invocation has been marked interrupted for method [OperationFacet.invokeOperation] on resource [Resource[id=19608, uuid=67ee411c-95ac-4e78-a42a-d6dbebd37600, type={RHQAgent}RHQ Agent, key=d26apvl025.test.local RHQ Agent, name=RHQ Agent, parent=d26apvl025.test.local, version=4.9.0-SNAPSHOT]] 2013-08-22 11:17:03,053 INFO [MeasurementManager.collector-1] (rhq.core.pc.measurement.MeasurementManager)- Shutting down measurement collection... 2013-08-22 11:17:08,059 WARN [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.inventory.ResourceContainer$ComponentInvocation)- Invocation has been marked interrupted for method [ResourceComponent.stop] on resource [Resource[id=19608, uuid=67ee411c-95ac-4e78-a42a-d6dbebd37600, type={RHQAgent}RHQ Agent, key=d26apvl025.test.local RHQ Agent, name=RHQ Agent, parent=d26apvl025.test.local, version=4.9.0-SNAPSHOT]] 2013-08-22 11:17:08,060 WARN [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.inventory.InventoryManager)- Plugin Error: Failed to stop component for [Resource[id=19608, uuid=67ee411c-95ac-4e78-a42a-d6dbebd37600, type={RHQAgent}RHQ Agent, key=d26apvl025.test.local RHQ Agent, name=RHQ Agent, parent=d26apvl025.test.local, version=4.9.0-SNAPSHOT]]. org.rhq.core.pc.inventory.TimeoutException: Call to [org.rhq.plugins.agent.AgentServerComponent.stop()] with args [] timed out after 5000 milliseconds - invocation thread will be interrupted. at org.rhq.core.pc.inventory.ResourceContainer$ResourceComponentInvocationHandler.invokeInNewThreadWithLock(ResourceContainer.java:583) at org.rhq.core.pc.inventory.ResourceContainer$ResourceComponentInvocationHandler.invoke(ResourceContainer.java:549) at $Proxy42.stop(Unknown Source) at org.rhq.core.pc.inventory.InventoryManager.deactivateResource(InventoryManager.java:2125) at org.rhq.core.pc.inventory.InventoryManager.deactivateResource(InventoryManager.java:2119) at org.rhq.core.pc.inventory.InventoryManager.persistToDisk(InventoryManager.java:2147) at org.rhq.core.pc.inventory.InventoryManager.shutdown(InventoryManager.java:300) at org.rhq.core.pc.PluginContainer.shutdown(PluginContainer.java:401) at org.rhq.enterprise.agent.AgentMain.shutdownPluginContainer(AgentMain.java:2474) at org.rhq.enterprise.agent.promptcmd.PluginContainerPromptCommand.execute(PluginContainerPromptCommand.java:73) at org.rhq.enterprise.agent.promptcmd.InventoryPromptCommand.executePCCommand(InventoryPromptCommand.java:283) at org.rhq.enterprise.agent.promptcmd.InventoryPromptCommand.syncInventory(InventoryPromptCommand.java:264) at org.rhq.enterprise.agent.promptcmd.InventoryPromptCommand.processCommand(InventoryPromptCommand.java:174) at org.rhq.enterprise.agent.promptcmd.InventoryPromptCommand.execute(InventoryPromptCommand.java:65) at org.rhq.enterprise.agent.AgentMain.executePromptCommand(AgentMain.java:2981) at org.rhq.enterprise.agent.AgentMain.executePromptCommand(AgentMain.java:2956) at org.rhq.enterprise.agent.AgentManagement.executePromptCommand(AgentManagement.java:358) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at org.mc4j.ems.impl.jmx.connection.bean.operation.DOperation.invoke(DOperation.java:110) at org.rhq.plugins.agent.AgentServerComponent.invokeOperation(AgentServerComponent.java:180) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocation.call(ResourceContainer.java:654) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-08-22 11:17:08,719 WARN [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.plugin.PluginClassLoader)- Failed to purge embedded jars directory. Cause: java.io.IOException: Failed to delete file or directory: data/tmp/rhq-ant-bundle-plugin-4.9.0-SNAPSHOT.jar5430676099001543656.classloader 2013-08-22 11:17:08,719 WARN [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.plugin.PluginClassLoader)- Failed to purge embedded jars directory. Cause: java.io.IOException: Failed to delete file or directory: data/tmp/rhq-jboss-as-7-plugin-4.9.0-SNAPSHOT.jar6876492954842737974.classloader 2013-08-22 11:17:08,721 WARN [ResourceContainer.invoker.nonDaemon-17] (rhq.core.pc.plugin.PluginClassLoader)- Failed to purge embedded jars directory. Cause: java.io.IOException: Failed to delete file or directory: data/tmp/rhq-filetemplate-bundle-plugin-4.9.0-SNAPSHOT.jar5754753341609717766.classloader 2013-08-22 11:17:08,723 WARN [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Timed out after 300 seconds while waiting for shut down background threads to terminate. 2013-08-22 11:17:08,723 INFO [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Plugin container is now shutdown. 2013-08-22 11:17:11,055 INFO [ResourceContainer.invoker.nonDaemon-17] (org.rhq.core.pc.PluginContainer)- Initializing Plugin Container v4.9.0-SNAPSHOT... > No I won't, because agent local data rarely gets screwed up and causes are hard to find in each individual case.
Ok fair enough.
Sorry for sounding a bit harsh, I just am frustrated with all the bugs I'm discovering just because I tried to test a snapshot for fixing another bug.
Hopefully this bug/error/whatever does not make it into 4.9 final for everyone.
cheers,
Stian
Fixed in master commit 8cfdad217914a513f98f2a474e367b1374eb23ea Author: Thomas Segismont <tsegismo> Date: Thu Sep 26 18:37:49 2013 +0200 The bug was introduced with the implementation of BZ974876 "[RFE] ResourceUpgradeFacet should support upgrade of plugin config properties" In DiscoveryBossBean, upgrade responses were filled with current resource plugin configuration even if the upgrade request had nothing to update in it. But plugin config was loaded lazily and when going through the server services marshalling layer, it was turned into null. As a consequence, the server still had the right configuration, but the agent data was wrong. Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10. |