When auto-updating a remote agent on Windows the auto-update seems to fail pretty regularly. It gets most of the way but never completes, and leaves an unusable agent behind.
This may be a duplicate of Bug 588381, which is very old and has been closed. But actually this may be an old problem. Either way, we'll now track the issue here.
Looking at the logs the issue seems to be file locking. Locked files and/or directories ruin the update's ability to copy the legacy agent files to proper locations.
This is expected to happen if the agent is running interactively, or if files are being held unexpectedly, typically interactively, in command windows or editors for example.
But this issue is different. The agent is running as a service, and no files are held/locked due to interactive access as described above. It seems that the legacy agent has not fully stopped running before the update begins. Looking at log files I see:
INFO | jvm 1 | 2014/06/18 12:38:12 | Now executing agent update - if all
goes well, this is the last you will hear of this agent: [ProcessExecution:
executable=[c:\Program Files (x86)\Java\jdk1.7.0_45\bin\java.exe], args=[[-jar,
C:\app\rhq\rhq-enterprise-agent-4.11.0.jar, --pause=20000, --
update=C:\app\rhq\rhq-agent]], env-vars=[null], working-dir=[C:\app\rhq], wait=
, capture-output=[false], kill-on-timeout=[false], executable-is-command=
STATUS | wrapper | 2014/06/18 12:39:14 | <-- Wrapper Stopped
*** notice that the wrapper stopped at 12:39:14, a full 62s after agent
*** update starts.
Wed Jun 18 12:38:12 EDT 2014: Pausing for  milliseconds...
Wed Jun 18 12:38:32 EDT 2014: Done pausing. Continuing with the update.
ANT target [backup-agent]
Wed Jun 18 12:38:32 EDT 2014
*** notice that agent update starts at 12:38:12, which is the same time we see
*** in the first log, so that matches up correctly. It then pauses 20s before
*** continuing with the update, presumably to give time for the agent to
*** finish shutting down.
So, it takes 62s to shut down but we wait only 20s, that's 42s, minimally, where file locking can ruin the upgrade. And in fact that is exactly what happens:
build.xml:216: Unable to delete file C:\app\rhq\rhq-agent\lib\rhq-enterprise-
...and so on
master commit 03d1375a2e294c5de117def7707ef7b929114689
Author: Jay Shaughnessy <email@example.com>
Date: Wed Jun 18 14:40:31 2014 -0400
Add a longer pause on windows between stopping the legacy agent and
starting the update. On Windows the wrapper must terminate and windows
must release file locks, this seems to account for some significant
extra time needed.
Bulk close of items fixed in RHQ 4.12
If you think this is not solved, then please open a *new* BZ and link to this one.