How to repeat:
Go to a group configuration page (I used a datasource compatible group, with all members on the same AS server).
Edit the config and, before submitting, stop the agent managing that AS server.
Submit the config change.

What happens is that the update stays "in progress" forever. It should time out after a short period (5-10 minutes?). The "Current" page shows an error saying the config update is in progress.

From server log:

2009-03-10 13:54:59,662 INFO [org.rhq.enterprise.server.core.AgentManagerBean] Agent with name [witte.usersys.redhat.com] just went down
2009-03-10 13:55:28,080 ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[updateResourceConfiguration], targetInterfaceName=org.rhq.core.clientapi.agent.configuration.ConfigurationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
2009-03-10 13:55:28,247 ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[updateResourceConfiguration], targetInterfaceName=org.rhq.core.clientapi.agent.configuration.ConfigurationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
2009-03-10 13:55:28,388 ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[updateResourceConfiguration], targetInterfaceName=org.rhq.core.clientapi.agent.configuration.ConfigurationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://10.11.231.18:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
2009-03-10 13:55:28,458 ERROR [org.rhq.enterprise.server.configuration.job.AggregateResourceConfigurationUpdateJob] Failed to execute one or more Resource Configuration updates that were part of a group update - details: none available
2009-03-10 13:55:39,182 ERROR [STDERR] java.lang.Exception: Current group Resource configuration for org.rhq.core.domain.resource.group.ResourceGroup@5a629453 cannot be calculated, because a group Resource configuration update is currently in progress.
2009-03-10 13:55:39,183 ERROR [STDERR] at org.rhq.enterprise.server.configuration.ConfigurationManagerBean.ensureNoResourceConfigurationUpdatesInProgress(ConfigurationManagerBean.java:558)
2009-03-10 13:55:39,183 ERROR [STDERR] at org.rhq.enterprise.server.configuration.ConfigurationManagerBean.getResourceConfigurationsForCompatibleGroup(ConfigurationManagerBean.java:508)
2009-03-10 13:55:39,183 ERROR [STDERR] at sun.reflect.GeneratedMethodAccessor2879.invoke(Unknown Source)
2009-03-10 13:55:39,183 ERROR [STDERR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-10 13:55:39,184 ERROR [STDERR] at java.lang.reflect.Method.invoke(Method.java:597)
2009-03-10 13:55:39,184 ERROR [STDERR] at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:112)
2009-03-10 13:55:39,184 ERROR [STDERR] at org.jboss.ejb3.interceptor.InvocationContextImpl.proceed(InvocationContextImpl.java:166)
2009-03-10 13:55:39,184 ERROR [STDERR] at org.rhq.enterprise.server.common.TransactionInterruptInterceptor.addCheckedActionToTransactionManager(TransactionInterruptInterceptor.java:77)
2009-03-10 13:55:39,184 ERROR [STDERR] at sun.reflect.GeneratedMethodAccessor157.invoke(Unknown Source)
2009-03-10 13:55:39,184 ERROR [STDERR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-10 13:55:39,184 ERROR [STDERR] at java.lang.reflect.Method.invoke(Method.java:597)
This also seems to permanently disable any further viewing or updating of the config for that group, even when the agent comes back. Upping priority.
Workaround: delete the in-progress event history item. Lowering priority.
The Server has a reaper job that should time out defunct group config updates (i.e. updates where the Agent went down in the middle of the update) after 10-20 minutes. If you see the update still in the history after the Agent has been down for > 20 minutes, then there's an issue. Otherwise, things are working as intended.
I know I waited at least 30 minutes before. I just repeated the test: it has been 1 hour 20 minutes now and the update is still "In Progress". It's definitely broken.
r3451 fixes this - the CheckForTimedOutConfigUpdatesJob Quartz job will now time out group config update requests that were initiated more than 11 minutes ago.
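For readers unfamiliar with this kind of reaper, here is a minimal sketch of the check described above. It is not the RHQ code: the class, enum, and field names are hypothetical, and only the 11-minute threshold comes from this comment. It marks any update that is still in progress and older than the threshold as failed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the server-side timeout check; not RHQ's actual classes.
public class GroupUpdateReaper {
    enum Status { IN_PROGRESS, SUCCESS, FAILURE }

    static final class GroupUpdate {
        final int id;
        final Instant createdAt;
        Status status;
        String errorMessage;

        GroupUpdate(int id, Instant createdAt, Status status) {
            this.id = id;
            this.createdAt = createdAt;
            this.status = status;
        }
    }

    // Threshold taken from the comment above (11 minutes).
    static final Duration TIMEOUT = Duration.ofMinutes(11);

    // Marks every in-progress update older than TIMEOUT as failed and returns those updates.
    static List<GroupUpdate> timeOutStaleUpdates(List<GroupUpdate> updates, Instant now) {
        List<GroupUpdate> timedOut = new ArrayList<>();
        for (GroupUpdate update : updates) {
            Duration age = Duration.between(update.createdAt, now);
            if (update.status == Status.IN_PROGRESS && age.compareTo(TIMEOUT) > 0) {
                update.status = Status.FAILURE;
                update.errorMessage = "Timed out after " + TIMEOUT.toMinutes() + " minutes";
                timedOut.add(update);
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2009-03-10T14:10:00Z");
        List<GroupUpdate> updates = new ArrayList<>();
        updates.add(new GroupUpdate(1, now.minus(Duration.ofMinutes(15)), Status.IN_PROGRESS));
        updates.add(new GroupUpdate(2, now.minus(Duration.ofMinutes(5)), Status.IN_PROGRESS));
        updates.add(new GroupUpdate(3, now.minus(Duration.ofMinutes(30)), Status.SUCCESS));

        for (GroupUpdate update : timeOutStaleUpdates(updates, now)) {
            System.out.println("Timed out update " + update.id + ": " + update.errorMessage);
        }
    }
}
```

In this sketch the check only fires when something calls it, so the time a user actually waits also depends on how often the job is scheduled to run.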
I repeated the procedure on rev3509; it still says "in progress" after more than 15 minutes.
<ips> jweiss: in worst case, it may take as long as 22 mins for the server to time out the group request; if it takes longer than that, something's wrong

It took over 15 minutes, so this did actually pass. Closing.
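One way to see how an 11-minute threshold can turn into a 22-minute worst case is sketched below. It assumes the job runs on a fixed 11-minute interval, which is a guess that happens to match the 22-minute figure; this thread does not state the actual schedule. The class and method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical model: a periodic job that times out updates older than a threshold.
public class TimeoutLatency {

    // Returns how long after creation the update is first seen as timed out.
    // createdAfterLastRun is how long after the previous job run the update was created;
    // it is expected to be shorter than jobInterval.
    static Duration detectionLatency(Duration timeout, Duration jobInterval,
                                     Duration createdAfterLastRun) {
        if (jobInterval.isZero() || jobInterval.isNegative()) {
            throw new IllegalArgumentException("jobInterval must be positive");
        }
        // Age of the update at the first job run after it was created.
        Duration ageAtRun = jobInterval.minus(createdAfterLastRun);
        // The job only acts once the update is strictly older than the threshold.
        while (ageAtRun.compareTo(timeout) <= 0) {
            ageAtRun = ageAtRun.plus(jobInterval);
        }
        return ageAtRun;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofMinutes(11);  // threshold from the r3451 comment
        Duration interval = Duration.ofMinutes(11); // assumed job interval, not confirmed

        // Update created one second after a job run: close to the worst case.
        System.out.println(detectionLatency(timeout, interval, Duration.ofSeconds(1)));
        // Update created ten minutes after a job run: caught much sooner.
        System.out.println(detectionLatency(timeout, interval, Duration.ofMinutes(10)));
    }
}
```

Under these assumptions an update created just after a job run is too young at the next run and is only caught at the run after that, so a request still showing as in progress after 15 minutes is within the expected range.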
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1760
This bug is related to RHQ-1632