Bug 1118906

Summary: installing in a path that has symlink causes agent upgrade to fail
Product: [Other] RHQ Project
Reporter: John Mazzitelli <mazz>
Component: Installer
Assignee: John Mazzitelli <mazz>
Status: ON_QA
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.12
CC: hrupp
Target Milestone: GA   
Target Release: RHQ 4.13   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug

Description John Mazzitelli 2014-07-11 21:26:02 UTC
If you run "rhqctl upgrade" with --from-server-dir set to a path that contains a symlink, the agent upgrade fails.

When you use the actual (resolved) path instead, the upgrade works.

Comment 1 John Mazzitelli 2014-07-14 20:25:27 UTC
Here's the output of the agent portion of the upgrade. Note that I ran the agent update .jar by itself beforehand as a test and it all worked, so something must be wrong with the way rhqctl starts the upgrade process:

In this upgrade run, the symlink is /home/mazz/Desktop/rhqtest/rhqlink, which points to the actual location, /work2/rhqdeleteme/

16:19:28,550 INFO  [org.rhq.server.control.command.Upgrade] Upgrading RHQ agent located at: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
======================================
ANT target [backup-agent]
Mon Jul 14 16:19:28 EDT 2014
======================================
[backup-agent] [echo] === BACKING UP CURRENT AGENT ===
[backup-agent] [echo] From: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
[backup-agent] [echo] To: /tmp/rhq-agent-update-backup
[backup-agent] [copy] Copying 76 files to /tmp/rhq-agent-update-backup
[backup-agent] [copy] Copied 12 empty directories to 1 empty directory under /tmp/rhq-agent-update-backup
[backup-agent] [echo] === BACKUP OF CURRENT AGENT COMPLETE ===
======================================
ANT target [(default)]
Mon Jul 14 16:19:29 EDT 2014
======================================
[header-for-update] [echo] 
===== RHQ AGENT UPDATE =====
Agent To Be Updated: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
Version: 4.13.0-SNAPSHOT
Build Number: 89a2c43
Jar File: /work2/rhqdeleteme/rhq-server-new/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-agent/rhq-enterprise-agent-4.13.0-SNAPSHOT.jar
[update] [echo] Extract the agent distro zip from the agent update binary, place in temporary update dir
[update] [unjar] Expanding: /work2/rhqdeleteme/rhq-server-new/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-agent/rhq-enterprise-agent-4.13.0-SNAPSHOT.jar into /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43
[update] [echo] Unzip the agent distro into the temporary update directory
[update] [unzip] Expanding: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43/rhq-enterprise-agent-4.13.0-SNAPSHOT.zip into /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43
[update] [echo] Remove the agent distro zip
[update] [delete] Deleting: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43/rhq-enterprise-agent-4.13.0-SNAPSHOT.zip
[update] [echo] 
      At this point, we have the new (but raw) agent extracted to: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43/rhq-agent
      The old, existing agent (the one we are upgrading) is at: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
      
[update] [echo] Copy existing failover list from the old agent to the new agent
[update] [copy] Warning: Could not find file /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/data/failover-list.dat to copy.
[update] [echo] Copy existing keystore and truststore files from the old agent to the new agent
[update] [copy] Warning: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/data not found.
[update] [echo] Copy existing Java Service Wrapper configuration files from the old agent to the new agent
[update] [echo] Copy Windows environment script - keep the old copy in effect and backup the new copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy UNIX environment script - keep the old copy in effect and backup the new copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy Windows wrapper launch script - use the new copy and backup the old copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy UNIX wrapper launch script - use the new copy and backup the old copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy Java Service Wrapper configuration - use the new copy and backup the old copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy custom scripts that may have existed in the agent bin directory
[update] [echo] Ensure any custom scripts that were copied retain execute permissions
[update] [echo] Copy agent configuration file - use the old copy and backup the new copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy log4j configuration file - use the old copy and backup the new copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] 
      Everything is done; _update.tmp.dir/rhq-agent has our new agent.
      The old agent is still intact, albeit with a new update subdirectory with the new agent in it.
      Start moving things around to get the new agent in the old agent's directory.
      If anything fails going forward, bad things will happen because the old agent will be ruined.
      
[update] [echo] Define where we are going to put the old agent - this is where the old agent will be backed up
[update] [echo] Purge any previously backed-up old agent
[update] [echo] Move the old agent to the backup location
[update] [echo] Make sure the location where the new agent is going to be is empty
[update] [echo] Put the new agent in its new location.
[update] [echo] Clean up the temporary location where the new agent was located (but no longer is)
[update] [delete] Deleting directory /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent-OLD/update-89a2c43
[update] [echo] 
      At this point, the new agent should now be at the old agent's location.
      The old agent is backed up to the rhq-agent-OLD directory next to new agent
      
[update] [echo] chmod +x on executables under /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
[update] [echo] The agent is updated and we don't need the backup anymore, so remove the first backup we made
[update] [delete] Deleting directory /tmp/rhq-agent-update-backup
[update] [echo] 
      FINISHED UPDATING THE AGENT SUCCESSFULLY!
      The new agent is now located at: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
      
16:19:31,177 INFO  [org.rhq.server.control.command.Upgrade] The agent installer finished upgrading with exit value 0
16:19:31,205 ERROR [org.rhq.server.control.command.Upgrade] An error occurred while upgrading the agent: Source directory [/home/mazz/Desktop/rhqtest/rhqlink/rhq-agent] does not exist
Trying to stop the RHQ Server...
RHQ Server (pid=16777) is stopping...
RHQ Server has stopped.
Stopping RHQ storage node...
RHQ storage node (pid=16540) is stopping...
RHQ storage node has stopped
16:19:39,306 ERROR [org.rhq.server.control.RHQControl] Source directory [/home/mazz/Desktop/rhqtest/rhqlink/rhq-agent] does not exist
16:19:39,306 WARN  [org.rhq.server.control.command.Upgrade] UNDO: Removing server-installed marker file and management user
16:19:39,307 WARN  [org.rhq.server.control.command.Upgrade] UNDO: Stopping component: --server
16:19:39,308 WARN  [org.rhq.server.control.command.Upgrade] UNDO: Removing new storage node install directory
16:19:39,322 WARN  [org.rhq.server.control.command.Upgrade] UNDO: Stopping component: --storage
16:19:39,323 WARN  [org.rhq.server.control.command.Stop] It appears that the storage node is not installed. The --storage option will be ignored.
16:19:39,323 WARN  [org.rhq.server.control.command.Upgrade] UNDO: Reverting server properties file

Here are the contents of the directory after this was run:

$ ls /home/mazz/Desktop/rhqtest/rhqlink
rhq-agent-OLD  rhq-data  rhq-server-new  rhq-server-old

Comment 2 John Mazzitelli 2014-07-14 20:39:50 UTC
Here's me trying just the agent update binary to do the upgrade. This appears to work, yet the output looks identical to when rhqctl does it:

$ java -jar agent.jar --launch=false --update=/home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
======================================
ANT target [backup-agent]
Mon Jul 14 16:36:53 EDT 2014
======================================
[backup-agent] [echo] === BACKING UP CURRENT AGENT ===
[backup-agent] [echo] From: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
[backup-agent] [echo] To: /tmp/rhq-agent-update-backup
[backup-agent] [copy] Copying 76 files to /tmp/rhq-agent-update-backup
[backup-agent] [copy] Copied 12 empty directories to 1 empty directory under /tmp/rhq-agent-update-backup
[backup-agent] [echo] === BACKUP OF CURRENT AGENT COMPLETE ===
======================================
ANT target [(default)]
Mon Jul 14 16:36:53 EDT 2014
======================================
[header-for-update] [echo] 
===== RHQ AGENT UPDATE =====
Agent To Be Updated: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
Version: 4.13.0-SNAPSHOT
Build Number: 89a2c43
Jar File: /work2/rhqdeleteme/agent.jar
[update] [echo] Extract the agent distro zip from the agent update binary, place in temporary update dir
[update] [unjar] Expanding: /work2/rhqdeleteme/agent.jar into /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43
[update] [echo] Unzip the agent distro into the temporary update directory
[update] [unzip] Expanding: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43/rhq-enterprise-agent-4.13.0-SNAPSHOT.zip into /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43
[update] [echo] Remove the agent distro zip
[update] [delete] Deleting: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43/rhq-enterprise-agent-4.13.0-SNAPSHOT.zip
[update] [echo] 
      At this point, we have the new (but raw) agent extracted to: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/update-89a2c43/rhq-agent
      The old, existing agent (the one we are upgrading) is at: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
      
[update] [echo] Copy existing failover list from the old agent to the new agent
[update] [copy] Warning: Could not find file /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/data/failover-list.dat to copy.
[update] [echo] Copy existing keystore and truststore files from the old agent to the new agent
[update] [copy] Warning: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent/data not found.
[update] [echo] Copy existing Java Service Wrapper configuration files from the old agent to the new agent
[update] [echo] Copy Windows environment script - keep the old copy in effect and backup the new copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy UNIX environment script - keep the old copy in effect and backup the new copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy Windows wrapper launch script - use the new copy and backup the old copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy UNIX wrapper launch script - use the new copy and backup the old copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy Java Service Wrapper configuration - use the new copy and backup the old copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy custom scripts that may have existed in the agent bin directory
[update] [echo] Ensure any custom scripts that were copied retain execute permissions
[update] [echo] Copy agent configuration file - use the old copy and backup the new copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] Copy log4j configuration file - use the old copy and backup the new copy
[update] [copy-with-backup] old file and new file are the same, nothing to do
[update] [echo] 
      Everything is done; _update.tmp.dir/rhq-agent has our new agent.
      The old agent is still intact, albeit with a new update subdirectory with the new agent in it.
      Start moving things around to get the new agent in the old agent's directory.
      If anything fails going forward, bad things will happen because the old agent will be ruined.
      
[update] [echo] Define where we are going to put the old agent - this is where the old agent will be backed up
[update] [echo] Purge any previously backed-up old agent
[update] [echo] Move the old agent to the backup location
[update] [echo] Make sure the location where the new agent is going to be is empty
[update] [echo] Put the new agent in its new location.
[update] [echo] Clean up the temporary location where the new agent was located (but no longer is)
[update] [delete] Deleting directory /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent-OLD/update-89a2c43
[update] [echo] 
      At this point, the new agent should now be at the old agent's location.
      The old agent is backed up to the rhq-agent-OLD directory next to new agent
      
[update] [echo] chmod +x on executables under /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
[update] [echo] The agent is updated and we don't need the backup anymore, so remove the first backup we made
[update] [delete] Deleting directory /tmp/rhq-agent-update-backup
[update] [echo] 
      FINISHED UPDATING THE AGENT SUCCESSFULLY!
      The new agent is now located at: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
      
$ pwd
/home/mazz/Desktop/rhqtest/rhqlink

$ ls
agent.jar  rhq-agent  rhq-agent-OLD  rhq-agent-update.log  rhq-data  rhq-server-old

(agent.jar is the agent update binary)

Comment 3 John Mazzitelli 2014-07-14 20:47:30 UTC
I think the problem occurs at the step in the agent upgrade ant xml script that prints this out:

[update] [echo] Put the new agent in its new location.

The upgrade location (/home/mazz/Desktop/rhqtest/rhqlink/rhq-agent-OLD/update-89a2c43) should be renamed/moved to /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent but that must be failing somehow.

Comment 4 John Mazzitelli 2014-07-14 21:14:13 UTC
I thought maybe the use of the <move> ant task caused the problem, but this seemed to work fine:

<project name="rhq-agent-update" default="start" basedir=".">
   <property name="the.dir" value="/home/mazz/Desktop/rhqtest/rhqlink/OLD/subOLD/rhq-agent" />
   <property name="the.todir" value="/home/mazz/Desktop/rhqtest/rhqlink/NEW" />

   <target name="start" >
      <move file="${the.dir}"
            tofile="${the.todir}"
            failonerror="true"/>
   </target>
</project>

Comment 5 John Mazzitelli 2014-07-14 21:34:15 UTC
Very, very strange. When the ant script for the agent update finishes, the rhq-agent directory is in the expected location. Somehow, it looks like the rhq-agent directory gets deleted after the ant script finishes.

In the agent update ant script, I added an "ls" of the new agent location, expecting it to fail with "file not found", but instead it gives me a dir listing of the new agent:

      <echo>
      FINISHED UPDATING THE AGENT SUCCESSFULLY!
      The new agent is now located at: ${rhq.agent.update.update-agent-dir}
      </echo>
      <!-- TEST TO SEE IF NEW AGENT IS THERE -->
      <exec executable="ls" >
          <arg line="-l ${rhq.agent.update.update-agent-dir}"/>
      </exec>

The output is the following:

[update] [echo] 
      FINISHED UPDATING THE AGENT SUCCESSFULLY!
      The new agent is now located at: /home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
      
[update] [exec] total 16
[update] [exec] drwxrwxr-x 3 mazz mazz 4096 Jul 14 15:24 bin
[update] [exec] drwxrwxr-x 2 mazz mazz 4096 Jul 14 15:24 conf
[update] [exec] drwxrwxr-x 4 mazz mazz 4096 Jul 14 17:31 lib
[update] [exec] drwxrwxr-x 2 mazz mazz 4096 Jul 14 15:24 plugins

But the next two log messages are rhqctl telling me this. Notice it says the agent installer finished OK (exit code=0) but the directory is missing, even though, as you see above, the directory wasn't missing when the agent install finished:

17:31:48,950 INFO  [org.rhq.server.control.command.Upgrade] The agent installer finished upgrading with exit value 0
17:31:48,962 ERROR [org.rhq.server.control.command.Upgrade] An error occurred while upgrading the agent: Source directory [/home/mazz/Desktop/rhqtest/rhqlink/rhq-agent] does not exist

Comment 6 John Mazzitelli 2014-07-14 21:42:59 UTC
The bug has to be in here somewhere. This is the rhqctl Java code in Upgrade.java. It is executed immediately after the agent ant script, and it just so happens to do file manipulation of the agent install location:

            log.info("The agent installer finished upgrading with exit value " + exitValue);

            // We need to now move the new, updated agent over to the new agent location.
            // renameTo() may fail if we are crossing file system boundaries, so try a true copy as a fallback.
            if (!agentBasedir.equals(oldAgentDir)) {
                FileUtil.purge(agentBasedir, true); // clear the way for the new upgraded agent
                if (!oldAgentDir.renameTo(agentBasedir)) {
                    FileUtil.copyDirectory(oldAgentDir, agentBasedir);

                    // we need to retain the execute bits for the executable scripts and libraries
                    FileVisitor visitor = new FileVisitor() {
                        @Override
                        public void visit(File file) {
                            String filename = file.getName();
                            if (filename.contains(".so") || filename.contains(".sl") || filename.contains(".dylib")) {
                                file.setExecutable(true);
                            } else if (filename.endsWith(".sh")) {
                                file.setExecutable(true);
                            }
                        }
                    };

                    FileUtil.forEachFile(new File(agentBasedir, "bin"), visitor);
                    FileUtil.forEachFile(new File(agentBasedir, "lib"), visitor);
                }
            }

Comment 7 John Mazzitelli 2014-07-15 14:39:21 UTC
This comment is just some debugging info I don't want to lose... 

In the Upgrade code, there is this:

  // We need to now move the new, updated agent over to the new agent location.
  // renameTo() may fail if we are crossing file system boundaries, so try a true copy as a fallback.
  if (!agentBasedir.equals(oldAgentDir)) {

Stepping through, the values for those two variables are:

agentBasedir=/work2/rhqdeleteme/rhq-agent
oldAgentDir=/home/mazz/Desktop/rhqtest/rhqlink/rhq-agent
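
For reference, a minimal sketch of how two values shaped like these compare in Java. It assumes a platform that allows creating symlinks (e.g. Linux), and the class and method names (SymlinkPathCompare, sameLocation, compareViaSymlink) are made up for illustration; they are not part of rhqctl. File.equals() compares the path strings only, while getCanonicalFile() follows symlinks before comparing:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SymlinkPathCompare {

    /** True if both paths resolve (symlinks followed) to the same location. */
    static boolean sameLocation(File a, File b) {
        try {
            return a.getCanonicalFile().equals(b.getCanonicalFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Builds real/rhq-agent plus a symlink "link" -> "real" in a temp dir and
     * returns { File.equals result, sameLocation result } for the two paths.
     * The temp files are left behind for inspection.
     */
    static boolean[] compareViaSymlink() {
        try {
            Path tmp = Files.createTempDirectory("symlink-compare");
            Path realAgent = Files.createDirectories(tmp.resolve("real").resolve("rhq-agent"));
            Path link = Files.createSymbolicLink(tmp.resolve("link"), tmp.resolve("real"));

            File viaLink = link.resolve("rhq-agent").toFile(); // like oldAgentDir
            File viaReal = realAgent.toFile();                 // like agentBasedir
            return new boolean[] { viaLink.equals(viaReal), sameLocation(viaLink, viaReal) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = compareViaSymlink();
        System.out.println("File.equals:  " + result[0]);
        System.out.println("sameLocation: " + result[1]);
    }
}
```

On a system where the symlink can be created, the textual comparison should report the paths as different while the canonical comparison reports them as the same location.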

Comment 8 John Mazzitelli 2014-07-15 14:45:24 UTC
OK, I get it. The bug is the code from the previous comment.

The thinking there was that if the directories are different, someone is upgrading an agent that was initially installed in a different directory (this would be the case when upgrading from RHQ 4.8-, when rhqctl didn't exist and the agents were installed in different locations away from the server). In that case, we need to move the agent to the location where rhqctl wants it to be.

However, this code breaks when symlinks are involved, because it appears that "agentBasedir" follows the symlink and holds the actual directory, whereas "oldAgentDir" holds the path through the symlink. They "appear" different even though they refer to the same place, and because of that, things break.
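
A rough sketch of how this could play out, under stated assumptions: it needs a platform that allows symlinks, deleteRecursively is only a stand-in for FileUtil.purge(dir, true), and the names PurgeThroughSymlinkDemo, needsMove and agentSurvives are hypothetical. It is not the actual change in the commit. With the textual comparison the two paths look different, so the "clear the way" purge deletes the very directory that the symlinked path points at, which would be consistent with the "Source directory ... does not exist" error. Comparing canonical paths first skips the purge:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class PurgeThroughSymlinkDemo {

    /** Stand-in for a recursive purge: deletes dir and everything under it. */
    static void deleteRecursively(File dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir.toPath())) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    /** Guard: a move is only needed when the paths resolve to different places. */
    static boolean needsMove(File agentBasedir, File oldAgentDir) throws IOException {
        return !agentBasedir.getCanonicalFile().equals(oldAgentDir.getCanonicalFile());
    }

    /**
     * Runs the "purge the target if it differs from the source" decision in a
     * temp dir. Returns true if the agent directory is still reachable afterwards.
     */
    static boolean agentSurvives(boolean symlinkAware) {
        try {
            Path tmp = Files.createTempDirectory("purge-demo");
            Path real = Files.createDirectories(tmp.resolve("real"));
            Files.createDirectories(real.resolve("rhq-agent").resolve("bin"));
            Path link = Files.createSymbolicLink(tmp.resolve("link"), real);

            File agentBasedir = real.resolve("rhq-agent").toFile(); // resolved path
            File oldAgentDir = link.resolve("rhq-agent").toFile();  // path through the symlink

            boolean different = symlinkAware
                    ? needsMove(agentBasedir, oldAgentDir)
                    : !agentBasedir.equals(oldAgentDir);
            if (different) {
                deleteRecursively(agentBasedir); // also removes what oldAgentDir points at
            }
            return oldAgentDir.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("textual comparison, agent survives:   " + agentSurvives(false));
        System.out.println("canonical comparison, agent survives: " + agentSurvives(true));
    }
}
```

Canonical paths are used here rather than Files.isSameFile() because the target location may not exist yet in the legitimate "agent installed elsewhere" case, and isSameFile() can fail when a path is missing.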

Comment 9 John Mazzitelli 2014-07-15 16:30:30 UTC
git commit to master: 6eef688