Bug 1012289 - Upgraded rhq agent is started before the old agent is stopped when upgrading from JON3.1.2.GA to JON3.2.ER1
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Upgrade
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER04
Target Release: JON 3.2.0
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1010354 1012435
Reported: 2013-09-26 08:28 UTC by Filip Brychta
Modified: 2014-01-02 20:39 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-14 15:38:41 UTC
Type: Bug
Embargoed:



Links
System: Red Hat Bugzilla
ID: 1013674
Private: 0
Priority: unspecified
Status: CLOSED
Summary: Upgraded agent was writing to rhq-agent-OLD/logs/agent.log after upgrade from JON3.1.0.GA to JON3.2.ER1 (only on local a...
Last Updated: 2021-02-22 00:41:40 UTC

Internal Links: 1013674

Description Filip Brychta 2013-09-26 08:28:19 UTC
Description of problem:
After upgrading from JON3.1.2.GA to JON3.2.ER1, two agent processes were running. The old local rhq agent process was still running, and the upgraded agent was complaining about a port collision. This lasted for a few minutes; then the old agent stopped and the new agent started correctly and registered with the server.
It seems that the installer doesn't wait for the old agent process to shut down fully before starting the new agent.

Version-Release number of selected component (if applicable):
JON3.2.ER1

How reproducible:
2/2

Steps to Reproduce:
1. JON3.1.2.GA server and agent are installed and running
2. stop JON3.1.2.GA server (the agent is still running)
3. upgrade to JON3.2.ER1 
./rhqctl upgrade --from-server-dir /home/hudson/jon-server-3.1.2.GA/ --run-data-migrator do-it --storage-data-root-dir /home/hudson/ 


Actual results:
Two agent processes were running for ~ 5 minutes

Expected results:
The upgraded agent is started only after the old agent has fully shut down.
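
One way to picture the expected behaviour is a check that the old agent's listener port is free before the new agent is started. The following is a minimal sketch only, written in Python for brevity (rhqctl itself is Java). The function names, the timeout values, and the port passed by the caller are all assumptions for illustration, not part of rhqctl.

```python
import socket
import time


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0


def wait_for_port_free(port, timeout=60.0, interval=1.0, host="127.0.0.1"):
    """Poll until nothing listens on the port; return False on timeout."""
    deadline = time.monotonic() + timeout
    while port_in_use(port, host):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would start the upgraded agent only when `wait_for_port_free(agent_port)` returns True, where `agent_port` is whatever port the old agent was configured to listen on.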

Comment 1 Jay Shaughnessy 2013-10-07 17:46:30 UTC
As far as I can tell, the stop command is issued for the old agent. The issue here, I think, is that when the server is down the agent has trouble shutting down because it has to wait for some server messages to time out.

I'm not exactly sure how to force a wait here.

But, two things to note:
1) With the latest lifecycle changes to the install and upgrade commands the agent will no longer start automatically after upgrade.  It will only start if the --start option is specified.
2) The documentation does instruct users to shut down agents prior to install or upgrade.

One possibility is to actually exit the upgrade if the agent is running (i.e. if the pid file is present, although on Windows there is no pid file).  I think perhaps since the default behavior should now avoid any issue like this that we may just be able to close this issue.  

Asking Filip to review the above and decide on whether to proceed.
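
The "exit the upgrade if the agent is running" idea above can be sketched as a pid-file check. This is a hedged illustration in Python (rhqctl itself is Java), it is POSIX-only, and it does not cover the Windows case where no pid file exists. The function names and whatever pid file path the caller passes in are hypothetical.

```python
import os
from pathlib import Path


def read_pid(pid_file):
    """Return the pid stored in pid_file, or None if missing or malformed."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def process_alive(pid):
    """POSIX liveness probe: signal 0 tests existence without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def agent_running(pid_file):
    """True only if the pid file names a process that still exists."""
    pid = read_pid(pid_file)
    return pid is not None and process_alive(pid)
```

An upgrade command built on this would refuse to continue, with a clear message, while `agent_running(...)` is True. A stale pid file left behind by a crashed agent does not block the upgrade, because the liveness probe fails for it.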

Comment 2 Filip Brychta 2013-10-08 10:56:53 UTC
I'm not sure what is the best approach here. But at least some kind of warning before installation would be nice, because this issue probably causes bz 1013674 which is quite unpleasant.

Also, looking at https://docs.jboss.org/author/display/RHQ/Upgrading+the+Server#UpgradingtheServer-Stopagentsinstalledwith{{rhqctl}}andwaitforthemtofullyshutdown, the sentence 'Stop agents installed with rhqctl' is a bit confusing, because agents in previous JON versions are not installed via rhqctl. The documentation should clearly state that an agent running on the same machine as the RHQ server must be stopped manually before the installation.

I may have missed something JON-specific because I followed the upgrade manual for RHQ.

Comment 3 Heiko W. Rupp 2013-10-14 10:34:15 UTC
I think this may be related to the issue where one thread does not die, which is fixed in 3.2 but not in 3.1.

Could we wait in the rhqctl upgrade command for a few seconds and then just kill the old agent?
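
The wait-then-kill suggestion can be sketched as follows. This is an illustration in Python under stated assumptions, not the rhqctl implementation (which is Java): it is POSIX-only, and the function name, the grace period, and the `is_alive` hook are hypothetical.

```python
import os
import signal
import time


def _alive(pid):
    """POSIX liveness probe: signal 0 tests existence without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def stop_or_kill(pid, grace=10.0, interval=0.5, is_alive=_alive):
    """Ask the process to stop, wait up to `grace` seconds, then force it.

    Returns "not running", "stopped", or "killed".
    """
    if not is_alive(pid):
        return "not running"
    try:
        os.kill(pid, signal.SIGTERM)  # polite stop request
    except ProcessLookupError:
        return "stopped"  # it exited between the probe and the signal
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "stopped"
        time.sleep(interval)
    if not is_alive(pid):
        return "stopped"
    try:
        os.kill(pid, signal.SIGKILL)  # grace period expired: force it
    except ProcessLookupError:
        return "stopped"
    return "killed"
```

The `is_alive` hook exists because a signal-0 probe still reports an exited child as alive until its parent reaps it. A caller that spawned the process itself should pass a probe that reaps the child; for an unrelated process such as a separately started agent, the default probe is enough.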

Comment 4 John Mazzitelli 2013-10-14 15:38:41 UTC
See bug #1018887, which will make sure this is documented.

Comment 5 Larry O'Leary 2013-10-14 16:04:01 UTC
Re-opening as documentation is not the way to handle product bugs.

It seems that this is a legitimate issue that needs to be handled by the upgrade/installer. If we can't do that for 3.2 then this needs to be done as a post 3.2 task and identified as a KNOWN ISSUE for the 3.2 release.

Comment 6 Jay Shaughnessy 2013-10-14 16:46:07 UTC
I have updated the wiki documentation to hopefully be clearer and to instruct pre-48 upgrades to always specify --from-agent-dir.

Additionally, if the agent is still running (meaning the user did not follow the upgrade documentation), the upgrade now issues an rhq-agent-wrapper kill as opposed to a stop (on Linux). This should avoid the shutdown hang issue in JON 3.1.x.


release/jon3.2.x commit 061879db171f311a5f58d12a14a505a3d1014f99

- perform an agent kill as opposed to a stop when upgrading or reverting a failed install

Comment 7 Jay Shaughnessy 2013-10-14 16:47:02 UTC
I have updated the wiki documentation to hopefully be clearer and to instruct pre-48 upgrades to always specify --from-agent-dir.

Additionally, if the agent is still running (meaning the user did not follow the upgrade documentation), the upgrade now issues an rhq-agent-wrapper kill as opposed to a stop (on Linux). This should avoid the shutdown hang issue in JON 3.1.x.


release/jon3.2.x commit 061879db171f311a5f58d12a14a505a3d1014f99

- perform an agent kill as opposed to a stop when upgrading or reverting a failed install

Comment 8 Simeon Pinder 2013-10-24 04:09:33 UTC
Moving to ON_QA for testing in the next build.

Comment 9 Filip Brychta 2013-10-30 13:07:40 UTC
Verified on
Version: 3.2.0.ER4
Build Number: e413566:057b211

