Bug 1015825

Summary: Agent becomes a zombie upon shutdown - continues to run but never connects or quits
Product: [Other] RHQ Project
Reporter: Elias Ross <genman>
Component: Agent
Assignee: John Mazzitelli <mazz>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: unspecified
Priority: unspecified
Version: 4.9
CC: genman, hrupp, mazz
Target Milestone: GA
Target Release: RHQ 4.10
Hardware: Unspecified
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-04-23 12:30:43 UTC
Type: Bug
Attachments:
Description | Flags
Agent stuck stack trace | none
Agent patch | none
Additional fix for the agent; if the agent is really stuck, do kill -9 it | none
More fixes for InterruptedException bad handling | none
Rebased patch against 306c7fc31bf | none

Description Elias Ross 2013-10-05 21:32:19 UTC
Created attachment 808286
Agent stuck stack trace

Some of my agents keep running but never connect. They log the following statements over and over again. Restarting the agent manually fixes the issue, but that is problematic. The agent should exit cleanly rather than get stuck in this state. A stack trace is attached.

2013-10-05 21:11:37,027 ERROR [ClientCommandSenderTask Timer Thread #531] (JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: Initialize callback lock could not be acquired
2013-10-05 21:11:37,028 ERROR [ClientCommandSenderTask Thread #47] (ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=xxx, rhq.retry=3, rhq.security-token=n/V6lvdpgycMgyKwE5n+7kxD+KRNk4v03yP0gus/8iw8UtXgqotne7Z7clyp5cOW0Vk=, rhq.externalizable-strategy=AGENT, rhq.send-throttle=true, rhq.guaranteed-delivery=true}]; params=[{invocation=NameBasedInvocation[mergeMeasurementReport], targetInterfaceName=org.rhq.core.clientapi.server.measurement.MeasurementServerService}]]. Cause: java.util.concurrent.TimeoutException:null. Cause: java.util.concurrent.TimeoutException
2013-10-05 21:11:37,028 WARN  [ClientCommandSenderTask Thread #47] (ClientCommandSenderTask)- {ClientCommandSenderTask.queuing-failed-command}The command that failed has its guaranteed-delivery flag set so it is being queued again
2013-10-05 21:11:51,864 ERROR [ClientCommandSenderTask Timer Thread #534] (JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: Initialize callback lock could not be acquired
2013-10-05 21:11:51,864 ERROR [ClientCommandSenderTask Thread #43] (ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=xxx, rhq.externalizable-strategy=AGENT, rhq.security-token=n/V6lvdpgycMgyKwE5n+7kxD+KRNk4v03yP0gus/8iw8UtXgqotne7Z7clyp5cOW0Vk=, rhq.retry=1, rhq.guaranteed-delivery=true, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[mergeMeasurementReport], targetInterfaceName=org.rhq.core.clientapi.server.measurement.MeasurementServerService}]]. Cause: java.util.concurrent.TimeoutException:null. Cause: java.util.concurrent.TimeoutException
2013-10-05 21:11:51,865 WARN  [ClientCommandSenderTask Thread #43] (ClientCommandSenderTask)- {ClientCommandSenderTask.queuing-failed-command}The command that failed has its guaranteed-delivery flag set so it is being queued again
2013-10-05 21:12:05,868 ERROR [ClientCommandSenderTask Timer Thread #538] (JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: Initialize callback lock could not be acquired
2013-10-05 21:12:05,869 ERROR [ClientCommandSenderTask Thread #45] (ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=xxxx, rhq.retry=3, rhq.security-token=n/V6lvdpgycMgyKwE5n+7kxD+KRNk4v03yP0gus/8iw8UtXgqotne7Z7clyp5cOW0Vk=, rhq.externalizable-strategy=AGENT, rhq.send-throttle=true, rhq.guaranteed-delivery=true}]; params=[{invocation=NameBasedInvocation[mergeMeasurementReport], targetInterfaceName=org.rhq.core.clientapi.server.measurement.MeasurementServerService}]]. Cause: java.util.concurrent.TimeoutException:null. Cause: java.util.concurrent.TimeoutException
...
2013-10-05 21:12:05,869 WARN  [ClientCommandSenderTask Thread #46] (ClientCommandSenderTask)- {ClientCommandSenderTask.queuing-failed-command}The command that failed has its guaranteed-delivery flag set so it is being queued again
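The reporter's point is that shutdown should finish in bounded time even when sender threads are stuck waiting on a server that never answers. The attached patches are not reproduced here, so the following is only a minimal sketch of that idea using the standard `ExecutorService` shutdown sequence (graceful stop, bounded wait, then interrupt). The `BoundedShutdown` class and its grace period are hypothetical and are not taken from the RHQ agent code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not from the RHQ code base): stops a worker pool
 * within a bounded time instead of waiting forever for stuck tasks.
 */
public final class BoundedShutdown {

    private BoundedShutdown() {
    }

    /**
     * Returns true if the pool terminated on its own within the grace period,
     * false if running tasks had to be interrupted with shutdownNow().
     */
    public static boolean shutdown(ExecutorService pool, long graceMillis) {
        pool.shutdown(); // stop accepting new work; let queued tasks finish
        try {
            if (pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow(); // interrupt running tasks and drop queued ones
            pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(() -> {
            try {
                Thread.sleep(60_000); // simulates a task stuck waiting on the server
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop when asked to
            }
        });
        boolean clean = shutdown(pool, 200);
        System.out.println("clean=" + clean + " terminated=" + pool.isTerminated());
    }
}
```

Running `main` should report `clean=false`, because the sleeping task only stops once `shutdownNow()` interrupts it. This only works if the tasks themselves respond to interruption.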

Comment 1 Elias Ross 2013-11-04 22:50:44 UTC
Created attachment 819402
Agent patch

Comment 2 Elias Ross 2013-11-04 22:51:31 UTC
Created attachment 819403
Additional fix for the agent; if the agent is really stuck, do kill -9 it
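The attachment title describes a last resort: if the agent is really stuck, kill it with kill -9. The patch itself is not shown here, and it may well do this from the launcher script rather than from Java. As an assumed in-process analog, a watchdog can call `Runtime.halt`, which, like kill -9, skips shutdown hooks. `ShutdownWatchdog` is a hypothetical name, and the kill action is injectable only so the sketch can be tested without ending the JVM.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical watchdog (not the attached patch): if shutdown does not finish
 * before a deadline, it runs a forced-kill action such as Runtime.halt.
 */
public final class ShutdownWatchdog {

    private final CountDownLatch finished = new CountDownLatch(1);
    private final Runnable forceKill;
    private volatile boolean fired;

    /** forceKill is injectable for testing; production use would halt the JVM. */
    public ShutdownWatchdog(Runnable forceKill) {
        this.forceKill = forceKill;
    }

    /** Watchdog whose forced kill halts the JVM without running shutdown hooks. */
    public static ShutdownWatchdog haltingJvm(int exitCode) {
        return new ShutdownWatchdog(() -> Runtime.getRuntime().halt(exitCode));
    }

    /** Starts a daemon thread that runs forceKill if done() is not called in time. */
    public void arm(long timeoutMillis) {
        Thread t = new Thread(() -> {
            try {
                if (!finished.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    fired = true;
                    forceKill.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "shutdown-watchdog");
        t.setDaemon(true); // the watchdog alone never keeps the JVM alive
        t.start();
    }

    /** Called by the normal shutdown path once it has completed. */
    public void done() {
        finished.countDown();
    }

    /** True if the deadline passed and the forced kill was triggered. */
    public boolean fired() {
        return fired;
    }
}
```

In an agent, the shutdown path would create a watchdog with `haltingJvm(...)`, call `arm(...)` before stopping its services, and call `done()` once they have stopped; the timeout is a deployment choice, not a value from this bug.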

Comment 3 Elias Ross 2013-11-04 22:52:22 UTC
Created attachment 819404
More fixes for InterruptedException bad handling
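The attachment title points at a common cause of threads that refuse to stop: catching `InterruptedException` and carrying on. The actual fixes are in the attached patch and are not reproduced here; the sketch below only illustrates the general rule (restore the interrupt flag and leave the loop), using a hypothetical `InterruptibleWorker` that drains a `BlockingQueue`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical worker (not RHQ code) showing InterruptedException handling
 * that lets a shutdown actually stop the thread.
 */
public final class InterruptibleWorker implements Runnable {

    private final BlockingQueue<String> queue;
    private final AtomicInteger processed = new AtomicInteger();

    public InterruptibleWorker(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        // Bad pattern: catch (InterruptedException e) { log(e); continue; }
        // The flag is cleared when the exception is thrown, so swallowing it
        // keeps the loop running and interrupt() can never stop the thread.
        while (!Thread.currentThread().isInterrupted()) {
            try {
                queue.take(); // blocks; throws InterruptedException if interrupted
                processed.incrementAndGet(); // a real worker would handle the item here
            } catch (InterruptedException e) {
                // Good pattern: restore the flag for callers, then exit the loop.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Number of items taken from the queue so far. */
    public int processed() {
        return processed.get();
    }
}
```

Because the flag is restored, code further up the stack, such as an `ExecutorService` running this worker, can still see that an interrupt was requested.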

Comment 4 John Mazzitelli 2013-11-19 22:09:16 UTC
I can't apply this patch cleanly to master via git am; there are conflicts:

From 1e4da959c236d82ecb9fed789c554fc4cc336056 Mon Sep 17 00:00:00 2001
From: Elias Ross <elias_ross>
Date: Sat, 5 Oct 2013 15:30:47 -0700
Subject: [PATCH 1/3] BZ 1015734 - clean up shutdown of agent

I can apply the other 2. Any chance, Elias, that you can fix the patch for master? I'm looking at this now.

Comment 5 John Mazzitelli 2013-11-20 13:34:51 UTC
Elias, see comment #4. I was going to take a look at this and merge it into master if it passes all tests and looks good, but I can't apply PATCH 1/3 via git am. Can you build another patch that applies cleanly to master?

Comment 6 Elias Ross 2013-11-20 16:33:27 UTC
Created attachment 826739
Rebased patch against 306c7fc31bf

Comment 7 John Mazzitelli 2013-11-20 22:45:22 UTC
Tested and peer-reviewed the patches; committed the following to master:

f92d05fc24bb26d9c7f62e5d17872c1c02262496
b462f1538b5d050c3efeeee8b73a84b111d6d60e
4a3eab4dbc78d89f0148ddac5c007c10b0853c71

Comment 8 Heiko W. Rupp 2014-04-23 12:30:43 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.