Bug 1110905

Summary: Agent auto update fails on windows
Product: [Other] RHQ Project Reporter: Jay Shaughnessy <jshaughn>
Component: Agent, InstallerAssignee: Jay Shaughnessy <jshaughn>
Status: CLOSED CURRENTRELEASE QA Contact: Mike Foley <mfoley>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.11CC: hrupp
Target Milestone: GA   
Target Release: RHQ 4.12   
Hardware: Unspecified   
OS: Windows   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-12-15 11:36:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jay Shaughnessy 2014-06-18 18:24:31 UTC
When auto-updating a remote agent on Windows the auto-update seems to fail pretty regularly.  It gets most of the way but never completes, and leaves an unusable agent behind.

Comment 1 Jay Shaughnessy 2014-06-18 18:26:10 UTC
This may be a duplicate of Bug 588381, which is very old and has been closed.  But actually this may be an old problem. Either way, we'll now track the issue here.

Comment 2 Jay Shaughnessy 2014-06-18 18:36:51 UTC
Looking at the logs the issue seems to be file locking.  Locked files and/or directories ruin the update's ability to copy the legacy agent files to proper locations.

This is expected to happen if the agent is running interactively, or if files are being held unexpectedly, typically interactively, in command windows or editors for example.  

But this issue is different.  The agent is running as a service, and no files are held/locked due to interactive access as described above. It seems that the legacy agent has not fully stopped running before the update begins.  Looking at log files I see:

rhq-agent-wrapper.log:

 INFO   | jvm 1    | 2014/06/18 12:38:12 | Now executing agent update - if all
 goes well, this is the last you will hear of this agent: [ProcessExecution:
 executable=[c:\Program Files (x86)\Java\jdk1.7.0_45\bin\java.exe], args=[[-jar,
 C:\app\rhq\rhq-enterprise-agent-4.11.0.jar, --pause=20000, --
 update=C:\app\rhq\rhq-agent]], env-vars=[null], working-dir=[C:\app\rhq], wait=
 [0], capture-output=[false], kill-on-timeout=[false], executable-is-command=
 [true]]
 STATUS | wrapper  | 2014/06/18 12:39:14 | <-- Wrapper Stopped

   *** notice that the wrapper stopped at 12:39:14, a full 62s after agent
   *** update starts.

rhq-agent-update.log:

 Wed Jun 18 12:38:12 EDT 2014: Pausing for [20000] milliseconds...
 Wed Jun 18 12:38:32 EDT 2014: Done pausing. Continuing with the update.
 ======================================
 ANT target [backup-agent]
 Wed Jun 18 12:38:32 EDT 2014
 ======================================

  *** notice that agent update starts at 12:38:12, which is the same time we see
  *** in the first log, so that matches up correctly.  It then pauses 20s before
  *** continuing with the update, presumably to give time for the agent to
  *** finish shutting down.


So, it takes 62s to shut down but we wait only 20s, that's 42s, minimally, where file locking can ruin the upgrade.  And in fact that is exactly what happens:

 [update] [move] 
 jar:file:/C:/app/rhq/rhq-enterprise-agent-4.11.0.jar!/rhq-agent-update-
 build.xml:216: Unable to delete file C:\app\rhq\rhq-agent\lib\rhq-enterprise-
 comm-4.10.0.jar
	at org.apache.tools.ant.taskdefs.Move.moveFile(Move.java:208)


...and so on

Comment 3 Jay Shaughnessy 2014-06-18 18:42:10 UTC
master commit 03d1375a2e294c5de117def7707ef7b929114689
Author: Jay Shaughnessy <jshaughn>
Date:   Wed Jun 18 14:40:31 2014 -0400

    Add a longer pause on windows between stopping the legacy agent and
    starting the update.  On Windows the wrapper must terminate and windows
    must release file locks, this seems to account for some significant
    extra time needed.

Comment 4 Heiko W. Rupp 2014-12-15 11:36:19 UTC
Bulk close of items fixed in RHQ 4.12

If you think this is not solved, then please open a *new* BZ and link to this one.