Bug 976882

Summary: Agent upgrade fails on slow server startup
Product: [Other] RHQ Project
Reporter: Stefan Negrea <snegrea>
Component: Agent
Assignee: RHQ Project Maintainer <rhq-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.7
CC: ahovsepy, hrupp, mazz
Target Milestone: ---   
Target Release: RHQ 4.8   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-11 09:53:43 UTC
Type: Bug
Attachments:
Description                  Flags
Agent log file               none
storage_node_95              none
storage_node_106             none
rhqctl_upgrade-agent_logs    none

Description Stefan Negrea 2013-06-21 18:26:58 UTC
Created attachment 763933
Agent log file

Description of problem:
Upgrade from pre-4.8 to 4.8 fails when the server is slow to start up. The agent gives up waiting for the server to become available and runs with the old plugins, which causes errors (see the attached agent log) and prevents the new RHQ Storage Node from being discovered. The problem is not corrected once the server has started properly.


How reproducible:
Every time, in environments where the server is slow to start up (e.g. slower processor, disk, or network).

Steps to Reproduce:
1. Install a pre-4.8 RHQ environment
2. Upgrade to RHQ 4.8 
3. Check whether the RHQ Storage Node has been imported

Actual results:
The agent starts, but with errors (see the attached log file). The newly installed RHQ Storage Node is not discovered or inventoried by the agent.

Expected results:
The upgrade succeeds, the agent starts and the newly installed RHQ Storage Node is inventoried automatically.


Additional info:
This issue can be fixed by clearing all plugins from the agent during the upgrade process.

Comment 1 John Mazzitelli 2013-06-21 18:30:23 UTC
I think this stems from the fact that when you upgrade an agent, the new agent gets the old agent's plugins. We don't want this. We should leave the plugins directory empty in the new agent and make it download the new plugins from the new server.

So in rhq-agent-update-build.xml, we need to remove these lines:

-      <!-- if there are plugins, keep them -->
-      <echo>Copy existing plugins from the old agent to the new agent</echo>
-      <copy todir="${_update.tmp.dir}/rhq-agent/plugins">
-        <fileset dir="${rhq.agent.update.update-agent-dir}/plugins"/>
-      </copy>

Now, when the new agent starts, it can't start the plugin container (PC) until it downloads the new plugins from the new server.
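
For illustration only, here is a minimal standalone Ant sketch of what the step could look like once the copy is gone. It is an assumption, not the content of the actual commit: the project name, the target name, the default value given to _update.tmp.dir, and the explicit mkdir are all hypothetical. Only the ${_update.tmp.dir}/rhq-agent/plugins path is taken from the lines quoted above.

```xml
<project name="agent-update-sketch" default="prepare-plugins-dir">
  <!-- Hypothetical default so the sketch runs on its own; the real build file defines this property itself -->
  <property name="_update.tmp.dir" location="${java.io.tmpdir}/rhq-agent-update"/>

  <target name="prepare-plugins-dir">
    <!-- Deliberately no <copy> from the old agent's plugins directory -->
    <echo>Leaving the new agent's plugins directory empty; plugins are downloaded from the server</echo>
    <!-- Assumption: create the empty directory explicitly so it exists when the new agent starts -->
    <mkdir dir="${_update.tmp.dir}/rhq-agent/plugins"/>
  </target>
</project>
```

With nothing copied over, the new agent has no stale plugins to fall back on, so a slow server start only delays the plugin download instead of leaving the agent running the old plugin set.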

Comment 2 John Mazzitelli 2013-06-21 18:33:28 UTC
pushed to master: a1ae22c

Comment 3 Armine Hovsepyan 2013-06-24 14:32:16 UTC
Verified.

Upgrade from 4.5.1 on 10.16.23.95 and 10.16.23.106 went well: the storage node was discovered and auto-inventoried, and there are no more exceptions in the log.

Please find screenshots attached.

Comment 4 Armine Hovsepyan 2013-06-24 14:36:12 UTC
Created attachment 764657
storage_node_95

Comment 5 Armine Hovsepyan 2013-06-24 14:36:37 UTC
Created attachment 764658
storage_node_106

Comment 6 Armine Hovsepyan 2013-06-24 14:37:14 UTC
Created attachment 764659
rhqctl_upgrade-agent_logs

Comment 7 Heiko W. Rupp 2013-09-11 09:53:43 UTC
Bulk closing of old issues now that RHQ 4.9 is just around the corner.

If you think the issue has not been solved, then please open a new bug and mention this one in the description.