Bug 535021 (RHQ-1760) - Group config update when agent is down, never completes, blocks further config edits
Keywords:
Status: CLOSED NEXTRELEASE
Alias: RHQ-1760
Product: RHQ Project
Classification: Other
Component: Configuration
Version: 1.2
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ian Springer
QA Contact: Jeff Weiss
URL: http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:
Blocks: RHQ-1386
Reported: 2009-03-10 18:51 UTC by Jeff Weiss
Modified: 2014-11-09 22:49 UTC

Fixed In Version: 1.2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
rev3351
Last Closed:
Embargoed:
jweiss: archived+


Description Jeff Weiss 2009-03-10 18:51:00 UTC
How to repeat:

Go to a group configuration page (I used a datasource compatible group, with all members on the same AS server).  Edit the config and, before submitting, stop the agent managing that AS server.  Then submit the config change.  The update stays "in progress" forever.  It should time out after a short period, perhaps 5-10 minutes.  Meanwhile, the "Current" page shows an error saying the config update is in progress.


From server log:
2009-03-10 13:54:59,662 INFO  [org.rhq.enterprise.server.core.AgentManagerBean] Agent with name [witte.usersys.redhat.com] just went down
2009-03-10 13:55:28,080 ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[updateResourceConfiguration], targetInterfaceName=org.rhq.core.clientapi.agent.configuration.ConfigurationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
2009-03-10 13:55:28,247 ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[updateResourceConfiguration], targetInterfaceName=org.rhq.core.clientapi.agent.configuration.ConfigurationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
2009-03-10 13:55:28,388 ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[updateResourceConfiguration], targetInterfaceName=org.rhq.core.clientapi.agent.configuration.ConfigurationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
2009-03-10 13:55:28,458 ERROR [org.rhq.enterprise.server.configuration.job.AggregateResourceConfigurationUpdateJob] Failed to execute one or more Resource Configuration updates that were part of a group update - details: none available
2009-03-10 13:55:39,182 ERROR [STDERR] java.lang.Exception: Current group Resource configuration for org.rhq.core.domain.resource.group.ResourceGroup@5a629453 cannot be calculated, because a group Resource configuration update is currently in progress. 
2009-03-10 13:55:39,183 ERROR [STDERR]  at org.rhq.enterprise.server.configuration.ConfigurationManagerBean.ensureNoResourceConfigurationUpdatesInProgress(ConfigurationManagerBean.java:558)
2009-03-10 13:55:39,183 ERROR [STDERR]  at org.rhq.enterprise.server.configuration.ConfigurationManagerBean.getResourceConfigurationsForCompatibleGroup(ConfigurationManagerBean.java:508)
2009-03-10 13:55:39,183 ERROR [STDERR]  at sun.reflect.GeneratedMethodAccessor2879.invoke(Unknown Source)
2009-03-10 13:55:39,183 ERROR [STDERR]  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-10 13:55:39,184 ERROR [STDERR]  at java.lang.reflect.Method.invoke(Method.java:597)
2009-03-10 13:55:39,184 ERROR [STDERR]  at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:112)
2009-03-10 13:55:39,184 ERROR [STDERR]  at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:166)
2009-03-10 13:55:39,184 ERROR [STDERR]  at org.rhq.enterprise.server.common.TransactionInterruptInterceptor.addCheckedActionToTransactionManager(TransactionInterruptInterceptor.java:77)
2009-03-10 13:55:39,184 ERROR [STDERR]  at sun.reflect.GeneratedMethodAccessor157.invoke(Unknown Source)
2009-03-10 13:55:39,184 ERROR [STDERR]  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-10 13:55:39,184 ERROR [STDERR]  at java.lang.reflect.Method.invoke(Method.java:597)


Comment 1 Jeff Weiss 2009-03-10 19:04:11 UTC
This also seems to permanently disable any further viewing or updating of the config for that group, even after the agent comes back.  Upping priority.

Comment 2 Jeff Weiss 2009-03-10 19:41:49 UTC
Workaround:  delete the in-progress event history item.  Lowering priority.

Comment 3 Ian Springer 2009-03-16 19:56:35 UTC
The Server has a reaper job that should time out defunct group config updates (i.e. updates where the Agent went down in the middle of the update) after 10-20 minutes. If you see the update still in the history after the Agent has been down for > 20 minutes, then there's an issue. Otherwise, things are working as intended.
 

Comment 4 Jeff Weiss 2009-03-18 18:07:21 UTC
I know I waited at least 30 minutes before.  I just repeated the test: it has been 1 hour 20 minutes now, and the update is still "In Progress".

It's definitely broken.

Comment 5 Ian Springer 2009-03-18 19:19:10 UTC
r3451 fixes this - the CheckForTimedOutConfigUpdatesJob Quartz job will now time out group config update requests that were initiated more than 11 minutes ago.
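
For readers unfamiliar with this kind of reaper job, the timeout rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual r3451 change: the class `GroupUpdateReaper`, its nested `GroupUpdate` type, the `Status` enum, and the error message text are all hypothetical names invented for the example. Only the 11-minute threshold comes from this comment.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "time out stale group updates" pass.
// None of these names are taken from the RHQ code base.
class GroupUpdateReaper {

    enum Status { IN_PROGRESS, SUCCESS, FAILURE }

    // Simplified stand-in for a persisted group config update record.
    static final class GroupUpdate {
        final int id;
        final Instant createdTime;
        Status status = Status.IN_PROGRESS;
        String errorMessage;

        GroupUpdate(int id, Instant createdTime) {
            this.id = id;
            this.createdTime = createdTime;
        }
    }

    // Threshold quoted in this comment: requests initiated more than
    // 11 minutes ago are considered timed out.
    static final Duration TIMEOUT = Duration.ofMinutes(11);

    // Marks every still-in-progress update older than TIMEOUT as failed
    // and returns the updates that were changed.
    static List<GroupUpdate> reap(List<GroupUpdate> updates, Instant now) {
        List<GroupUpdate> timedOut = new ArrayList<>();
        for (GroupUpdate update : updates) {
            boolean tooOld =
                Duration.between(update.createdTime, now).compareTo(TIMEOUT) > 0;
            if (update.status == Status.IN_PROGRESS && tooOld) {
                update.status = Status.FAILURE;
                update.errorMessage =
                    "Timed out after " + TIMEOUT.toMinutes() + " minutes";
                timedOut.add(update);
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<GroupUpdate> updates = List.of(
            new GroupUpdate(1, now.minus(Duration.ofMinutes(80))), // stale
            new GroupUpdate(2, now.minus(Duration.ofMinutes(5)))); // recent
        System.out.println(reap(updates, now).size() + " update(s) timed out");
    }
}
```

Because a pass like this only runs periodically, an update can stay "in progress" for longer than the threshold itself: it is only marked failed on the first run after the threshold has elapsed.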


Comment 6 Jeff Weiss 2009-03-24 19:01:52 UTC
I repeated the procedure on rev3509; it still says in progress after more than 15 minutes.

Comment 7 Jeff Weiss 2009-03-24 19:07:19 UTC
<ips> jweiss: in worst case, it may take as long as 22 mins for the server to time out the group request; if it takes longer than that, something's wrong

It took over 15 minutes but did time out, so this did actually pass.  Closing.

Comment 8 Red Hat Bugzilla 2009-11-10 20:46:05 UTC
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1760
This bug is related to RHQ-1632


