Bug 976882 - Agent upgrade fails on slow server startup
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Agent
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHQ 4.8
Assigned To: RHQ Project Maintainer
QA Contact: Mike Foley
Depends On:
Blocks:
Reported: 2013-06-21 14:26 EDT by Stefan Negrea
Modified: 2013-09-11 05:53 EDT
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-11 05:53:43 EDT
Type: Bug

Attachments
Agent log file (4.72 MB, text/x-log)
2013-06-21 14:26 EDT, Stefan Negrea
storage_node_95 (168.66 KB, image/png)
2013-06-24 10:36 EDT, Armine Hovsepyan
storage_node_106 (156.33 KB, image/png)
2013-06-24 10:36 EDT, Armine Hovsepyan
rhqctl_upgrade-agent_logs (619.81 KB, image/png)
2013-06-24 10:37 EDT, Armine Hovsepyan

Description Stefan Negrea 2013-06-21 14:26:58 EDT
Created attachment 763933
Agent log file

Description of problem:
Upgrading from a pre-4.8 release to 4.8 fails when the server is slow to start up. The agent gives up waiting for the server to become available and installs the old plugins, which causes errors in the agent. The problem is not corrected once the server has started properly.


How reproducible:
Every time, in environments where the server is slow to start up (e.g., a slower processor, disk, or network).

Steps to Reproduce:
1. Install a pre-4.8 RHQ environment
2. Upgrade to RHQ 4.8 
3. Check whether the RHQ Storage Node has been imported

Actual results:
The agent starts, but with errors; please see the attached log file. The newly installed RHQ Storage Node is not discovered or inventoried by the agent.

Expected results:
The upgrade succeeds, the agent starts, and the newly installed RHQ Storage Node is inventoried automatically.


Additional info:
This issue can be fixed by clearing all plugins from the agent during the upgrade process.
Comment 1 John Mazzitelli 2013-06-21 14:30:23 EDT
I think this stems from the fact that when you upgrade an agent, the new agent gets the old agent's plugins. We don't want this. We should leave the plugins directory empty in the new agent and make it download the new plugins from the new server.

So in rhq-agent-update-build.xml, we need to remove these lines:

-      <!-- if there are plugins, keep them -->
-      <echo>Copy existing plugins from the old agent to the new agent</echo>
-      <copy todir="${_update.tmp.dir}/rhq-agent/plugins">
-        <fileset dir="${rhq.agent.update.update-agent-dir}/plugins"/>
-      </copy>

Now, when the new agent starts, it can't start the PC (plugin container) until it downloads the new plugins from the new server.
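
To illustrate the idea of leaving the new agent's plugins directory empty, here is a minimal standalone Ant sketch. It is not taken from rhq-agent-update-build.xml: the project name, the target name, and the default value given to `_update.tmp.dir` are assumptions made only so the snippet runs on its own, and the real build may not need an explicit cleanup step at all once the copy is removed.

```xml
<project name="agent-update-plugins-sketch" default="reset-plugins" basedir=".">
  <!-- Assumed default for illustration only; the real update build defines _update.tmp.dir itself -->
  <property name="_update.tmp.dir" location="${java.io.tmpdir}/rhq-agent-update-sketch"/>

  <!-- Hypothetical target: guarantee the new agent starts with an empty plugins directory -->
  <target name="reset-plugins">
    <!-- Remove any plugins that may already be present; quiet avoids an error if the directory is missing -->
    <delete dir="${_update.tmp.dir}/rhq-agent/plugins" quiet="true"/>
    <!-- Recreate the directory empty so the agent has a place to store downloaded plugins -->
    <mkdir dir="${_update.tmp.dir}/rhq-agent/plugins"/>
    <echo>New agent plugins directory is empty; plugins will be downloaded from the new server</echo>
  </target>
</project>
```

Saved as a build file and run with `ant -f`, this only touches the temporary directory named above. The point of the sketch is that nothing from the old agent's plugins directory is carried over.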
Comment 2 John Mazzitelli 2013-06-21 14:33:28 EDT
pushed to master: a1ae22c
Comment 3 Armine Hovsepyan 2013-06-24 10:32:16 EDT
Verified.

The upgrade from 4.5.1 on 10.16.23.95 and 10.16.23.106 went well: the storage node was discovered and auto-inventoried, and there are no more exceptions in the log.

Please see the attached screenshots.
Comment 4 Armine Hovsepyan 2013-06-24 10:36:12 EDT
Created attachment 764657
storage_node_95
Comment 5 Armine Hovsepyan 2013-06-24 10:36:37 EDT
Created attachment 764658
storage_node_106
Comment 6 Armine Hovsepyan 2013-06-24 10:37:14 EDT
Created attachment 764659
rhqctl_upgrade-agent_logs
Comment 7 Heiko W. Rupp 2013-09-11 05:53:43 EDT
Bulk closing of old issues now that RHQ 4.9 is just around the corner.

If you think the issue has not been solved, then please open a new bug and mention this one in the description.