Bug 1012289
Summary: | Upgraded rhq agent is started before the old agent is stopped when upgrading from JON3.1.2.GA to JON3.2.ER1 | ||
---|---|---|---|
Product: | [JBoss] JBoss Operations Network | Reporter: | Filip Brychta <fbrychta> |
Component: | Upgrade | Assignee: | Jay Shaughnessy <jshaughn> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | JON 3.2 | CC: | fbrychta, hrupp, jshaughn, loleary, mazz, myarboro |
Target Milestone: | ER04 | Keywords: | Reopened |
Target Release: | JON 3.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-10-14 15:38:41 UTC | Type: | Bug |
Bug Blocks: | 1010354, 1012435 |
Description
Filip Brychta
2013-09-26 08:28:19 UTC
As far as I can tell the stop command is issued for the old agent. The issue here, I think, is that when the server is down the agent has trouble shutting down because it has to wait on some server messages timing out. I'm not exactly sure how to force a wait here. But, two things to note:

1) With the latest lifecycle changes to the install and upgrade commands the agent will no longer start automatically after upgrade. It will only start if the --start option is specified.
2) The documentation does instruct users to shut down agents prior to install or upgrade.

One possibility is to actually exit the upgrade if the agent is running (i.e. if the pid file is present, although on Windows there is no pid file). Since the default behavior should now avoid any issue like this, I think we may just be able to close this issue. Asking Filip to review the above and decide on whether to proceed.

I'm not sure what the best approach is here. But at least some kind of warning before installation would be nice, because this issue probably causes bz 1013674, which is quite unpleasant. Plus, looking at https://docs.jboss.org/author/display/RHQ/Upgrading+the+Server#UpgradingtheServer-Stopagentsinstalledwith{{rhqctl}}andwaitforthemtofullyshutdown the sentence 'Stop agents installed with rhqctl' is a bit confusing, because agents in previous JON versions are not installed via rhqctl. The documentation should clearly note that an agent running on the same machine as the RHQ server should be stopped manually before the installation. I may have missed something JON specific because I followed the upgrade manual for RHQ.

I think this may be related to the issue of one thread not dying, which is fixed in 3.2 but not in 3.1. Could we wait in the rhqctl upgrade command for a few seconds and then just kill the old agent?

See bug #1018887, which will make sure this is documented.

Re-opening as documentation is not the way to handle product bugs. It seems that this is a legitimate issue that needs to be handled by the upgrade/installer. If we can't do that for 3.2 then this needs to be done as a post 3.2 task and identified as a KNOWN ISSUE for the 3.2 release.

I have updated the wiki documentation to hopefully be more clear and to instruct pre-48 upgrades to always specify --from-agent-dir. Additionally, if the agent is still running (meaning they did not follow the upgrade documentation), the upgrade now issues an rhq-agent-wrapper kill as opposed to a stop (on Linux). This should avoid the shutdown hang issue in JON 3.1.x.

release/jon3.2.x commit 061879db171f311a5f58d12a14a505a3d1014f99
- perform an agent kill as opposed to a stop when upgrading or reverting a failed install

Moving to ON_QA for testing in the next build.

Verified on
Version: 3.2.0.ER4
Build Number: e413566:057b211
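
For reference, the "wait a few seconds and then kill" idea raised earlier in this thread could look roughly like the sketch below. This is only an illustration in Python, not the rhqctl code and not what the commit above does (per the comment above, the fix issues an rhq-agent-wrapper kill directly). The function names, the grace period and the polling interval are all made up for the example, and it assumes a POSIX system where the agent's pid is already known (for instance read from a pid file, which does not exist on Windows).

```python
import os
import signal
import time


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_then_kill(pid: int, grace_seconds: float = 10.0,
                   poll_interval: float = 0.2) -> str:
    """Ask a process to stop, then force-kill it if it does not exit in time.

    Hypothetical helper for illustration. Returns "not-running", "stopped"
    or "killed". Note that an unreaped child of the calling process (a
    zombie) still counts as running for os.kill(pid, 0).
    """
    if not pid_is_running(pid):
        return "not-running"

    os.kill(pid, signal.SIGTERM)  # polite stop request

    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not pid_is_running(pid):
            return "stopped"
        time.sleep(poll_interval)

    os.kill(pid, signal.SIGKILL)  # grace period expired, force the kill
    return "killed"
```

With something like this, an upgrade step would only proceed once the call returns, so the new agent is not started while the old one is still hanging in shutdown.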