Bug 1110905 - Agent auto update fails on windows
Summary: Agent auto update fails on windows
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Agent, Installer
Version: 4.11
Hardware: Unspecified
OS: Windows
unspecified
urgent vote
Target Milestone: GA
: RHQ 4.12
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-06-18 18:24 UTC by Jay Shaughnessy
Modified: 2014-12-15 11:36 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-15 11:36:19 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 588381 None None None Never

Internal Links: 588381

Description Jay Shaughnessy 2014-06-18 18:24:31 UTC
When auto-updating a remote agent on Windows the auto-update seems to fail pretty regularly.  It gets most of the way but never completes, and leaves an unusable agent behind.

Comment 1 Jay Shaughnessy 2014-06-18 18:26:10 UTC
This may be a duplicate of Bug 588381, which is very old and has been closed.  But actually this may be an old problem. Either way, we'll now track the issue here.

Comment 2 Jay Shaughnessy 2014-06-18 18:36:51 UTC
Looking at the logs the issue seems to be file locking.  Locked files and/or directories ruin the update's ability to copy the legacy agent files to proper locations.

This is expected to happen if the agent is running interactively, or if files are being held unexpectedly, typically interactively, in command windows or editors for example.  

But this issue is different.  The agent is running as a service, and no files are held/locked due to interactive access as described above. It seems that the legacy agent has not fully stopped running before the update begins.  Looking at log files I see:

rhq-agent-wrapper.log:

 INFO   | jvm 1    | 2014/06/18 12:38:12 | Now executing agent update - if all
 goes well, this is the last you will hear of this agent: [ProcessExecution:
 executable=[c:\Program Files (x86)\Java\jdk1.7.0_45\bin\java.exe], args=[[-jar,
 C:\app\rhq\rhq-enterprise-agent-4.11.0.jar, --pause=20000, --
 update=C:\app\rhq\rhq-agent]], env-vars=[null], working-dir=[C:\app\rhq], wait=
 [0], capture-output=[false], kill-on-timeout=[false], executable-is-command=
 [true]]
 STATUS | wrapper  | 2014/06/18 12:39:14 | <-- Wrapper Stopped

   *** notice that the wrapper stopped at 12:39:14, a full 62s after agent
   *** update starts.

rhq-agent-update.log:

 Wed Jun 18 12:38:12 EDT 2014: Pausing for [20000] milliseconds...
 Wed Jun 18 12:38:32 EDT 2014: Done pausing. Continuing with the update.
 ======================================
 ANT target [backup-agent]
 Wed Jun 18 12:38:32 EDT 2014
 ======================================

  *** notice that agent update starts at 12:38:12, which is the same time we see
  *** in the first log, so that matches up correctly.  It then pauses 20s before
  *** continuing with the update, presumably to give time for the agent to
  *** finish shutting down.


So, it takes 62s to shut down but we wait only 20s, that's 42s, minimally, where file locking can ruin the upgrade.  And in fact that is exactly what happens:

 [update] [move] 
 jar:file:/C:/app/rhq/rhq-enterprise-agent-4.11.0.jar!/rhq-agent-update-
 build.xml:216: Unable to delete file C:\app\rhq\rhq-agent\lib\rhq-enterprise-
 comm-4.10.0.jar
	at org.apache.tools.ant.taskdefs.Move.moveFile(Move.java:208)


...and so on

Comment 3 Jay Shaughnessy 2014-06-18 18:42:10 UTC
master commit 03d1375a2e294c5de117def7707ef7b929114689
Author: Jay Shaughnessy <jshaughn@redhat.com>
Date:   Wed Jun 18 14:40:31 2014 -0400

    Add a longer pause on windows between stopping the legacy agent and
    starting the update.  On Windows the wrapper must terminate and windows
    must release file locks, this seems to account for some significant
    extra time needed.

Comment 4 Heiko W. Rupp 2014-12-15 11:36:19 UTC
Bulk close of items fixed in RHQ 4.12

If you think this is not solved, then please open a *new* BZ and link to this one.


Note You need to log in before you can comment on or make changes to this bug.