Description of problem:
I upgraded JON 3.1.0.GA to JON 3.2.ER1. The local agent (the agent on the same machine as the RHQ server) was upgraded, but rhq-agent-OLD/logs/agent.log was overwritten: the first message in this log is from a point after the upgrade started. The remote agent's old logs were untouched, and their last message was 'Now executing agent update - if all goes well, this is the last you will hear of this agent:'.
There were two agent processes running after the upgrade. One of them disappeared after ~5 minutes. The upgraded agent process was still running, but it was writing to rhq-agent-OLD/logs/agent.log. I know it was the upgraded agent process because ps shows it is using 'lib/rhq-core-client-api-4.9.0.JON320ER1.jar'.
This is most likely caused by bz 1012289.

Version-Release number of selected component (if applicable):
Version: 3.2.0.ER1
Build Number: 54dd29c:464a643

How reproducible:
1/1

Steps to Reproduce:
1. JON 3.1.0.GA server and agent are running
2. unzip jon-server-3.2.0.ER1.zip
3. cd jon-server-3.2.0.ER1/bin/
4. ./rhqctl upgrade --from-server-dir /home/hudson/jon-server-3.1.0.GA/ --run-data-migrator do-it --storage-data-root-dir /home/hudson/
When you are upgrading from an older JON *and* you have an agent co-located on the same machine as your JON Server, you need to tell the installer where your agent is installed. Looking at your command line from this issue's description:

./rhqctl upgrade --from-server-dir /home/hudson/jon-server-3.1.0.GA/ --run-data-migrator do-it --storage-data-root-dir /home/hudson/

I do not see this. You are missing --from-agent-dir. From the --help documentation:

Upgrades RHQ services from an earlier installed version
--from-agent-dir <arg>    Full path to install directory of the RHQ Agent to be upgraded. Required only if an existing agent exists and is not installed in the default location: <from-server-dir>/../rhq-agent

I suspect this is part of the problem.
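For illustration, here is the command from the description with --from-agent-dir added. This is a sketch under assumptions: the agent path /home/hudson/rhq-agent is hypothetical (it is just the default <from-server-dir>/../rhq-agent for this server directory; point it at wherever the 3.1.0 agent really lives), and the script only prints the command so it can be checked before it is run.

```shell
#!/bin/sh
# Sketch only: prints the upgrade command instead of running it.
# FROM_SERVER_DIR and STORAGE_DIR are taken from the bug description.
FROM_SERVER_DIR=/home/hudson/jon-server-3.1.0.GA/
# Hypothetical: replace with the real install dir of the co-located agent.
FROM_AGENT_DIR=/home/hudson/rhq-agent
STORAGE_DIR=/home/hudson/

# Echo the full rhqctl upgrade command, including the agent location.
upgrade_cmd() {
  echo ./rhqctl upgrade \
    --from-server-dir "$FROM_SERVER_DIR" \
    --from-agent-dir "$FROM_AGENT_DIR" \
    --run-data-migrator do-it \
    --storage-data-root-dir "$STORAGE_DIR"
}

upgrade_cmd
```

Once the printed line looks right, run it from jon-server-3.2.0.ER1/bin/ as in step 3 of the description.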
After paying closer attention to the reproduction steps, I see they are missing a very important step. See https://docs.jboss.org/author/display/RHQ/Upgrading+the+Server, where the first step says:

"Stop agents installed with rhqctl and wait for them to fully shutdown"

So you must stop the agent that is co-located with the server before upgrading, and wait for it to shut down. The reason why? Because things like this BZ might happen if you don't :)
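A minimal sketch of that first step in POSIX sh. Several things here are assumptions not stated in this bug: the agent install directory (AGENT_DIR below is hypothetical), that the 3.1.0 agent can be stopped with its bin/rhq-agent-wrapper.sh script, and that you already know the agent's PID (for example from ps, as in the description). The wait loop only uses kill -0 to check whether the PID still exists, so it should be run as the same user that owns the agent process.

```shell
#!/bin/sh
# Assumptions (not from this bug report):
#   - the old agent is installed in $AGENT_DIR (hypothetical default below)
#   - it ships bin/rhq-agent-wrapper.sh with a "stop" action
AGENT_DIR=${AGENT_DIR:-/home/hudson/rhq-agent}

# Ask the old agent to shut down.
stop_agent() {
  "$AGENT_DIR/bin/rhq-agent-wrapper.sh" stop
}

# Poll once per second until process $1 no longer exists.
# Gives up (returns 1) after $2 seconds, default 120.
# kill -0 sends no signal; it only tests whether the PID can be signalled,
# so run this as the agent's user.
wait_for_pid_exit() {
  pid=$1
  limit=${2:-120}
  waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      echo "process $pid still running after ${waited}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example usage (AGENT_PID is hypothetical):
#   stop_agent && wait_for_pid_exit "$AGENT_PID" && ./rhqctl upgrade ...
```

Chaining with && means the upgrade only starts once the old agent process is really gone, which is the situation the documented first step is meant to guarantee.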
(In reply to John Mazzitelli from comment #2)
Please see bug 1012289, comment 2. I was confused by "Stop agents installed with rhqctl". Previous versions of JON are not installed via rhqctl, so I thought this step didn't apply to the older co-located agents. I guess both BZs could be fixed just by updating the documentation.
See bug #1018887, which will make sure this is documented.