Bug 1110905
Summary: | Agent auto update fails on windows | ||
---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Jay Shaughnessy <jshaughn> |
Component: | Agent, Installer | Assignee: | Jay Shaughnessy <jshaughn> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.11 | CC: | hrupp |
Target Milestone: | GA | ||
Target Release: | RHQ 4.12 | ||
Hardware: | Unspecified | ||
OS: | Windows | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-12-15 11:36:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jay Shaughnessy
2014-06-18 18:24:31 UTC
This may be a duplicate of Bug 588381, which is very old and has been closed. But actually this may be an old problem. Either way, we'll now track the issue here. Looking at the logs the issue seems to be file locking. Locked files and/or directories ruin the update's ability to copy the legacy agent files to proper locations. This is expected to happen if the agent is running interactively, or if files are being held unexpectedly, typically interactively, in command windows or editors for example. But this issue is different. The agent is running as a service, and no files are held/locked due to interactive access as described above. It seems that the legacy agent has not fully stopped running before the update begins. Looking at log files I see: rhq-agent-wrapper.log: INFO | jvm 1 | 2014/06/18 12:38:12 | Now executing agent update - if all goes well, this is the last you will hear of this agent: [ProcessExecution: executable=[c:\Program Files (x86)\Java\jdk1.7.0_45\bin\java.exe], args=[[-jar, C:\app\rhq\rhq-enterprise-agent-4.11.0.jar, --pause=20000, -- update=C:\app\rhq\rhq-agent]], env-vars=[null], working-dir=[C:\app\rhq], wait= [0], capture-output=[false], kill-on-timeout=[false], executable-is-command= [true]] STATUS | wrapper | 2014/06/18 12:39:14 | <-- Wrapper Stopped *** notice that the wrapper stopped at 12:39:14, a full 62s after agent *** update starts. rhq-agent-update.log: Wed Jun 18 12:38:12 EDT 2014: Pausing for [20000] milliseconds... Wed Jun 18 12:38:32 EDT 2014: Done pausing. Continuing with the update. ====================================== ANT target [backup-agent] Wed Jun 18 12:38:32 EDT 2014 ====================================== *** notice that agent update starts at 12:38:12, which is the same time we see *** in the first log, so that matches up correctly. It then pauses 20s before *** continuing with the update, presumably to give time for the agent to *** finish shutting down. So, it takes 62s to shut down but we wait only 20s, that's 42s, minimally, where file locking can ruin the upgrade. And in fact that is exactly what happens: [update] [move] jar:file:/C:/app/rhq/rhq-enterprise-agent-4.11.0.jar!/rhq-agent-update- build.xml:216: Unable to delete file C:\app\rhq\rhq-agent\lib\rhq-enterprise- comm-4.10.0.jar at org.apache.tools.ant.taskdefs.Move.moveFile(Move.java:208) ...and so on master commit 03d1375a2e294c5de117def7707ef7b929114689 Author: Jay Shaughnessy <jshaughn> Date: Wed Jun 18 14:40:31 2014 -0400 Add a longer pause on windows between stopping the legacy agent and starting the update. On Windows the wrapper must terminate and windows must release file locks, this seems to account for some significant extra time needed. Bulk close of items fixed in RHQ 4.12 If you think this is not solved, then please open a *new* BZ and link to this one. |