Bug 1015825 - Agent upon shutdown becomes a zombie - continues to run but never connects or quits
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.9
Hardware: Unspecified Linux
Priority: unspecified
Severity: unspecified
Target Milestone: GA
Target Release: RHQ 4.10
Assigned To: John Mazzitelli
QA Contact: Mike Foley
Reported: 2013-10-05 17:32 EDT by Elias Ross
Modified: 2014-04-23 08:30 EDT (History)

Doc Type: Bug Fix
Last Closed: 2014-04-23 08:30:43 EDT
Type: Bug

Attachments
Agent stuck stack trace (34.04 KB, text/plain)
2013-10-05 17:32 EDT, Elias Ross
Agent patch (75.30 KB, patch)
2013-11-04 17:50 EST, Elias Ross
Additional fix for the agent; if the agent is really stuck, do kill -9 it (2.21 KB, application/mbox)
2013-11-04 17:51 EST, Elias Ross
More fixes for InterruptedException bad handling (18.57 KB, application/mbox)
2013-11-04 17:52 EST, Elias Ross
Rebased patch against 306c7fc31bf (75.33 KB, application/mbox)
2013-11-20 11:33 EST, Elias Ross

Description Elias Ross 2013-10-05 17:32:19 EDT
Created attachment 808286 [details]
Agent stuck stack trace

Some of my agents are running but never connect. They log the following statements over and over again. Restarting the agent manually fixes the issue but is problematic. The agent should exit cleanly rather than get into this sort of state. Attached is a stack trace.

2013-10-05 21:11:37,027 ERROR [ClientCommandSenderTask Timer Thread #531] (JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: Initialize callback lock could not be acquired
2013-10-05 21:11:37,028 ERROR [ClientCommandSenderTask Thread #47] (ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=xxx, rhq.retry=3, rhq.security-token=n/V6lvdpgycMgyKwE5n+7kxD+KRNk4v03yP0gus/8iw8UtXgqotne7Z7clyp5cOW0Vk=, rhq.externalizable-strategy=AGENT, rhq.send-throttle=true, rhq.guaranteed-delivery=true}]; params=[{invocation=NameBasedInvocation[mergeMeasurementReport], targetInterfaceName=org.rhq.core.clientapi.server.measurement.MeasurementServerService}]]. Cause: java.util.concurrent.TimeoutException:null. Cause: java.util.concurrent.TimeoutException
2013-10-05 21:11:37,028 WARN  [ClientCommandSenderTask Thread #47] (ClientCommandSenderTask)- {ClientCommandSenderTask.queuing-failed-command}The command that failed has its guaranteed-delivery flag set so it is being queued again
2013-10-05 21:11:51,864 ERROR [ClientCommandSenderTask Timer Thread #534] (JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: Initialize callback lock could not be acquired
2013-10-05 21:11:51,864 ERROR [ClientCommandSenderTask Thread #43] (ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=xxx, rhq.externalizable-strategy=AGENT, rhq.security-token=n/V6lvdpgycMgyKwE5n+7kxD+KRNk4v03yP0gus/8iw8UtXgqotne7Z7clyp5cOW0Vk=, rhq.retry=1, rhq.guaranteed-delivery=true, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[mergeMeasurementReport], targetInterfaceName=org.rhq.core.clientapi.server.measurement.MeasurementServerService}]]. Cause: java.util.concurrent.TimeoutException:null. Cause: java.util.concurrent.TimeoutException
2013-10-05 21:11:51,865 WARN  [ClientCommandSenderTask Thread #43] (ClientCommandSenderTask)- {ClientCommandSenderTask.queuing-failed-command}The command that failed has its guaranteed-delivery flag set so it is being queued again
2013-10-05 21:12:05,868 ERROR [ClientCommandSenderTask Timer Thread #538] (JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: Initialize callback lock could not be acquired
2013-10-05 21:12:05,869 ERROR [ClientCommandSenderTask Thread #45] (ClientCommandSenderTask)- {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.agent-name=xxxx, rhq.retry=3, rhq.security-token=n/V6lvdpgycMgyKwE5n+7kxD+KRNk4v03yP0gus/8iw8UtXgqotne7Z7clyp5cOW0Vk=, rhq.externalizable-strategy=AGENT, rhq.send-throttle=true, rhq.guaranteed-delivery=true}]; params=[{invocation=NameBasedInvocation[mergeMeasurementReport], targetInterfaceName=org.rhq.core.clientapi.server.measurement.MeasurementServerService}]]. Cause: java.util.concurrent.TimeoutException:null. Cause: java.util.concurrent.TimeoutException
...
2013-10-05 21:12:05,869 WARN  [ClientCommandSenderTask Thread #46] (ClientCommandSenderTask)- {ClientCommandSenderTask.queuing-failed-command}The command that failed has its guaranteed-delivery flag set so it is being queued again
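
The repeated "Initialize callback lock could not be acquired" errors look like what a timed lock attempt produces when another thread holds the lock and never releases it. The RHQ code involved is not shown in this report, so the Java sketch below is only an illustration under that assumption. InitLockDemo, tryInitialize and startStuckHolder are hypothetical names, not RHQ APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; this is not RHQ code.
public class InitLockDemo {
    private final ReentrantLock initLock = new ReentrantLock();

    /**
     * Runs the callback under the lock. Returns false if the lock could not
     * be acquired within the timeout; a caller would log that and retry later.
     */
    public boolean tryInitialize(long timeoutMillis, Runnable callback)
            throws InterruptedException {
        if (!initLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            callback.run();
            return true;
        } finally {
            initLock.unlock();
        }
    }

    /**
     * Simulates a stuck thread: it takes the lock and holds it until it is
     * interrupted. Returns once the lock is actually held.
     */
    public Thread startStuckHolder() throws InterruptedException {
        CountDownLatch held = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            initLock.lock();
            try {
                held.countDown();
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // Restore the flag and fall through to release the lock.
                Thread.currentThread().interrupt();
            } finally {
                initLock.unlock();
            }
        }, "stuck-holder");
        t.setDaemon(true);
        t.start();
        held.await();
        return t;
    }
}
```

While the holder thread is stuck, every call to tryInitialize times out and returns false, no matter how often it is retried. It succeeds again only once the holder is interrupted and releases the lock.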
Comment 1 Elias Ross 2013-11-04 17:50:44 EST
Created attachment 819402 [details]
Agent patch
Comment 2 Elias Ross 2013-11-04 17:51:31 EST
Created attachment 819403 [details]
Additional fix for the agent; if the agent is really stuck, do kill -9 it
Comment 3 Elias Ross 2013-11-04 17:52:22 EST
Created attachment 819404 [details]
More fixes for InterruptedException bad handling
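
The attached patch is not reproduced here. As a general illustration of what correct InterruptedException handling usually means in Java, the sketch below shows a worker loop that restores the interrupt flag and exits, rather than swallowing the exception and continuing to run. InterruptAwareWorker and its methods are hypothetical names and are not taken from the patch or from RHQ.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not taken from the attached patch.
public class InterruptAwareWorker implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();

    public void submit(String item) {
        queue.add(item);
    }

    public int processedCount() {
        return processed.get();
    }

    @Override
    public void run() {
        // Loop until this thread is asked to stop via interrupt().
        while (!Thread.currentThread().isInterrupted()) {
            try {
                String item = queue.poll(100, TimeUnit.MILLISECONDS);
                if (item != null) {
                    processed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                // poll() cleared the interrupt flag when it threw. Restore it
                // so the loop condition (and any caller) sees the request.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

If the catch block were left empty, the flag would stay cleared and the loop would keep running after an interrupt, so a shutdown that relies on interrupting the thread would never complete.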
Comment 4 John Mazzitelli 2013-11-19 17:09:16 EST
I can't apply

From 1e4da959c236d82ecb9fed789c554fc4cc336056 Mon Sep 17 00:00:00 2001
From: Elias Ross <elias_ross@apple.com>
Date: Sat, 5 Oct 2013 15:30:47 -0700
Subject: [PATCH 1/3] BZ 1015734 - clean up shutdown of agent

cleanly to master via git am; there are conflicts.

I can apply the other 2. Any chance, Elias, you can fix the patch for master? I'm looking at this now.
Comment 5 John Mazzitelli 2013-11-20 08:34:51 EST
Elias - see comment #4 - I was going to take a look at this and merge into master if it passes all tests and looks good. But I can't apply PATCH 1/3 via git am - can you see if you can build another patch that can apply to master?
Comment 6 Elias Ross 2013-11-20 11:33:27 EST
Created attachment 826739 [details]
Rebased patch against 306c7fc31bf
Comment 7 John Mazzitelli 2013-11-20 17:45:22 EST
Tested and peer reviewed the patches; committed the following to master:

f92d05fc24bb26d9c7f62e5d17872c1c02262496
b462f1538b5d050c3efeeee8b73a84b111d6d60e
4a3eab4dbc78d89f0148ddac5c007c10b0853c71
Comment 8 Heiko W. Rupp 2014-04-23 08:30:43 EDT
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.
