Description of problem:
All remote agents are correctly upgraded and running after the upgrade, but the agent services are invalid and the agents can't be stopped or started through them. The workaround is simple: remove the old agent service and install a new agent service.

Version-Release number of selected component (if applicable):
Version : 3.2.0.CR1
Build Number : 6ecd678:d0dc0b6

How reproducible: 2/2

Steps to Reproduce:
1. jon3.1.2.GA is installed with one local agent and several remote agents. The remote agents run under the Administrator user (the env variables RHQ_AGENT_PASSWORD_PROMPT=false, RHQ_AGENT_PASSWORD=****, RHQ_AGENT_RUN_AS_ME=true are all set in rhq-agent/bin/rhq-agent-env.bat). Only JAVA_HOME is set on all machines (RHQ_AGENT_JAVA_EXE_FILE_PATH or anything RHQ Java specific is NOT set).
2. Stop the jon3.1.2.GA server and the local JON agent.
3. Run the upgrade to CR1 (rhqctl upgrade --from-server-dir c:\jon-server-3.1.2.GA --from-agent-dir c:\rhq-agent --run-data-migrator do-it).
4. Start it (rhqctl start).

Actual results:
The JON server and all agents are successfully upgraded and running, but the REMOTE agent services are invalid. The Windows services utility shows 'RHQ Agent [rhqagent-TESTDAY2]' as stopped although the agent is running. When the agent is killed manually, an attempt to start it via the agent service (rhq-agent-wrapper.bat start) fails with the following error, found in rhq-agent/logs/rhq-agent-wrapper.log:

DEBUG | wrapper | 2013/12/04 02:53:41 | Working directory set to: c:\rhq-agent
STATUS | wrapper | 2013/12/04 02:53:41 | Starting the RHQ Agent [rhqagent-TESTDAY2] service...
DEBUG | wrapper | 2013/12/04 02:53:41 | Working directory set to: c:\rhq-agent
STATUS | wrapper | 2013/12/04 02:53:41 | --> Wrapper Started as Service
STATUS | wrapper | 2013/12/04 02:53:41 | Java Service Wrapper Community Edition 3.3.1
STATUS | wrapper | 2013/12/04 02:53:41 | Copyright (C) 1999-2008 Tanuki Software, Inc. All Rights Reserved.
STATUS | wrapper | 2013/12/04 02:53:41 | http://wrapper.tanukisoftware.org
STATUS | wrapper | 2013/12/04 02:53:41 |
DEBUG | wrapper | 2013/12/04 02:53:41 | Using tick timer.
DEBUG | wrapperp | 2013/12/04 02:53:41 | server listening on port 32001.
DEBUG | wrapper | 2013/12/04 02:53:41 | Ping settings: wrapper.ping.interval=30, wrapper.ping.interval.logged=1, wrapper.ping.timeout=45
STATUS | wrapper | 2013/12/04 02:53:41 | Launching a JVM...
DEBUG | wrapper | 2013/12/04 02:53:41 | command: "%RHQ_JAVA_EXE_FILE_PATH%" -Dlog4j.configuration=log4j.xml -Xms64m -Xmx128m -Di18nlog.dump-stack-traces=false -Dsigar.nativeLogging=false "-Djava.endorsed.dirs=c:\rhq-agent/lib/endorsed" "-Djava.io.tmpdir=c:\rhq-agent/temp" -Djava.library.path="c:\rhq-agent/bin/wrapper/windows-x86_32;c:\rhq-agent/lib" -classpath "c:\rhq-agent/conf;c:\rhq-agent/bin/wrapper/windows-x86_32/wrapper.jar;c:\rhq-agent/lib/commons-io-1.4.jar;c:\rhq-agent/lib/commons-logging-1.1.0.jboss.jar;c:\rhq-agent/lib/concurrent-1.3.4-jboss-update1.jar;c:\rhq-agent/lib/getopt-1.0.13.jar;c:\rhq-agent/lib/i18nlog-1.0.10.jar;c:\rhq-agent/lib/jboss-common-core-2.2.17.GA.jar;c:\rhq-agent/lib/jboss-jmx-4.2.3.GA.jar;c:\rhq-agent/lib/jboss-logging-3.1.2.GA-redhat-1.jar;c:\rhq-agent/lib/jboss-remoting-2.5.4.SP5.jar;c:\rhq-agent/lib/jboss-serialization-1.0.3.GA.jar;c:\rhq-agent/lib/jline-0.9.94.jar;c:\rhq-agent/lib/log4j-1.2.16.jar;c:\rhq-agent/lib/persistence-api-1.0.jar;c:\rhq-agent/lib/rhq-common-drift-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-client-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-comm-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-domain-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-native-system-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-plugin-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-plugin-container-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-util-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-enterprise-agent-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-enterprise-comm-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/sigar-1.6.5.132-5.jar" -Dwrapper.key="NmECugGJOq0rg1IT" -Dwrapper.port=32001 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.debug="TRUE" -Dwrapper.pid=2632 -Dwrapper.version="3.3.1" -Dwrapper.native_library="wrapper" -Dwrapper.service="TRUE" -Dwrapper.cpu.timeout="10" -Dwrapper.jvmid=1 org.tanukisoftware.wrapper.WrapperSimpleApp org.rhq.enterprise.agent.AgentMain --daemon
FATAL | wrapper | 2013/12/04 02:53:41 | Unable to execute Java command. The system cannot find the file specified. (0x2)
FATAL | wrapper | 2013/12/04 02:53:41 | "%RHQ_JAVA_EXE_FILE_PATH%" -Dlog4j.configuration=log4j.xml -Xms64m -Xmx128m -Di18nlog.dump-stack-traces=false -Dsigar.nativeLogging=false "-Djava.endorsed.dirs=c:\rhq-agent/lib/endorsed" "-Djava.io.tmpdir=c:\rhq-agent/temp" -Djava.library.path="c:\rhq-agent/bin/wrapper/windows-x86_32;c:\rhq-agent/lib" -classpath "c:\rhq-agent/conf;c:\rhq-agent/bin/wrapper/windows-x86_32/wrapper.jar;c:\rhq-agent/lib/commons-io-1.4.jar;c:\rhq-agent/lib/commons-logging-1.1.0.jboss.jar;c:\rhq-agent/lib/concurrent-1.3.4-jboss-update1.jar;c:\rhq-agent/lib/getopt-1.0.13.jar;c:\rhq-agent/lib/i18nlog-1.0.10.jar;c:\rhq-agent/lib/jboss-common-core-2.2.17.GA.jar;c:\rhq-agent/lib/jboss-jmx-4.2.3.GA.jar;c:\rhq-agent/lib/jboss-logging-3.1.2.GA-redhat-1.jar;c:\rhq-agent/lib/jboss-remoting-2.5.4.SP5.jar;c:\rhq-agent/lib/jboss-serialization-1.0.3.GA.jar;c:\rhq-agent/lib/jline-0.9.94.jar;c:\rhq-agent/lib/log4j-1.2.16.jar;c:\rhq-agent/lib/persistence-api-1.0.jar;c:\rhq-agent/lib/rhq-common-drift-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-client-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-comm-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-domain-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-native-system-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-plugin-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-plugin-container-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-util-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-enterprise-agent-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-enterprise-comm-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/sigar-1.6.5.132-5.jar" -Dwrapper.key="NmECugGJOq0rg1IT" -Dwrapper.port=32001 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.debug="TRUE" -Dwrapper.pid=2632 -Dwrapper.version="3.3.1" -Dwrapper.native_library="wrapper" -Dwrapper.service="TRUE" -Dwrapper.cpu.timeout="10" -Dwrapper.jvmid=1 org.tanukisoftware.wrapper.WrapperSimpleApp org.rhq.enterprise.agent.AgentMain --daemon
FATAL | wrapper | 2013/12/04 02:53:41 | Critical error: wait for JVM process failed
ERROR | wrapper | 2013/12/04 02:53:43 | The RHQ Agent [rhqagent-TESTDAY2] service was launched, but failed to start.

This can be fixed by removing the old agent service and creating a new one.

Old agent service properties:
c:\rhq-agent\bin\wrapper\windows-x86_32\wrapper.exe -s c:\rhq-agent\bin\\wrapper\rhq-agent-wrapper.conf set.RHQ_AGENT_HOME=c:\rhq-agent set.RHQ_AGENT_INSTANCE_NAME=rhqagent-TESTDAY2 set.RHQ_AGENT_JAVA_EXE_FILE_PATH=c:\java32b\bin\java.exe set.RHQ_AGENT_OS_PLATFORM=windows-x86_32 set.RHQ_AGENT_WRAPPER_LOG_DIR_PATH=c:\rhq-agent\logs

New agent service properties:
c:\rhq-agent\bin\wrapper\windows-x86_32\wrapper.exe -s c:\rhq-agent\bin\wrapper\rhq-agent-wrapper.conf set.RHQ_AGENT_HOME=c:\rhq-agent set.RHQ_AGENT_INSTANCE_NAME=rhqagent-TESTDAY2 set.RHQ_JAVA_EXE_FILE_PATH=c:\java32b\bin\java.exe set.RHQ_AGENT_OS_PLATFORM=windows-x86_32 set.RHQ_AGENT_WRAPPER_LOG_DIR_PATH=c:\rhq-agent\logs

Comparing the old and new properties shows the following differences:
-s c:\rhq-agent\bin\\ vs. -s c:\rhq-agent\bin\
set.RHQ_AGENT_JAVA_EXE_FILE_PATH vs. set.RHQ_JAVA_EXE_FILE_PATH
These differences are probably what causes the issue.

Expected results:
Agent service works.
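The old and new service definitions quoted in the description can also be compared mechanically. The following is only a sketch: parse_service_props is a hypothetical helper written for this comparison and is not part of RHQ or the Java Service Wrapper. It extracts the set.NAME=VALUE arguments from each "path to executable" string and reports which property names appear on only one side.

```python
# Sketch only: parse_service_props is a hypothetical helper, not an RHQ API.
# The two strings are the service definitions quoted in the description.
OLD_SERVICE = (
    r"c:\rhq-agent\bin\wrapper\windows-x86_32\wrapper.exe "
    r"-s c:\rhq-agent\bin\\wrapper\rhq-agent-wrapper.conf "
    r"set.RHQ_AGENT_HOME=c:\rhq-agent "
    r"set.RHQ_AGENT_INSTANCE_NAME=rhqagent-TESTDAY2 "
    r"set.RHQ_AGENT_JAVA_EXE_FILE_PATH=c:\java32b\bin\java.exe "
    r"set.RHQ_AGENT_OS_PLATFORM=windows-x86_32 "
    r"set.RHQ_AGENT_WRAPPER_LOG_DIR_PATH=c:\rhq-agent\logs"
)
NEW_SERVICE = (
    r"c:\rhq-agent\bin\wrapper\windows-x86_32\wrapper.exe "
    r"-s c:\rhq-agent\bin\wrapper\rhq-agent-wrapper.conf "
    r"set.RHQ_AGENT_HOME=c:\rhq-agent "
    r"set.RHQ_AGENT_INSTANCE_NAME=rhqagent-TESTDAY2 "
    r"set.RHQ_JAVA_EXE_FILE_PATH=c:\java32b\bin\java.exe "
    r"set.RHQ_AGENT_OS_PLATFORM=windows-x86_32 "
    r"set.RHQ_AGENT_WRAPPER_LOG_DIR_PATH=c:\rhq-agent\logs"
)


def parse_service_props(command_line):
    """Return the set.NAME=VALUE arguments of a command line as a dict."""
    props = {}
    # Splitting on whitespace is enough here because no path contains spaces.
    for token in command_line.split():
        if token.startswith("set.") and "=" in token:
            name, value = token[len("set."):].split("=", 1)
            props[name] = value
    return props


old_props = parse_service_props(OLD_SERVICE)
new_props = parse_service_props(NEW_SERVICE)

# Property names that exist on one side only.
only_in_old = sorted(set(old_props) - set(new_props))
only_in_new = sorted(set(new_props) - set(old_props))

print("only in old service:", only_in_old)
print("only in new service:", only_in_new)
```

Both names carry the same java.exe path, so the only difference is the property name itself. An upgraded rhq-agent-wrapper.conf that reads only the new name therefore finds no value in a service created by the old installer.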
The problem seems to be that "%RHQ_JAVA_EXE_FILE_PATH%" is not getting resolved. I'm not sure, still looking...
release/jon3.2.x commit d19fd2306f60991e839566934c7fbbcf4f692226
Author: Jay Shaughnessy <jshaughn>
Date: Thu Dec 5 09:39:49 2013 -0500

This problem resulted from the work in Bug 1016609, when we introduced the use of RHQ_JAVA_EXE_FILE_PATH and deprecated the use of RHQ_AGENT_JAVA_EXE_FILE_PATH (among several simplifications of our env properties).

The issue with that change was that existing Windows agent services (remote agents, not handled by rhqctl) included set.RHQ_AGENT_JAVA_EXE_FILE_PATH in the service "path to executable", for use by the service wrapper. But the new rhq-agent-wrapper.conf expected RHQ_JAVA_EXE_FILE_PATH to be set when formulating its command string.

This was only an issue for existing agents that would be auto-upgraded. Agent auto-upgrade does not update the existing service; it only restarts it after the agent update. So the "path to executable" remains unchanged and therefore passes only the legacy property.

Note that auto-upgrade does the right thing: it should *not* replace the service in order to update its definition. Doing this could lose the RUN_AS password, which may have been set interactively when the agent was initially installed.

The solution should be good: the rhq-agent-wrapper.conf has been reverted to use RHQ_AGENT_JAVA_EXE_FILE_PATH. We still completely support the new RHQ env properties, like RHQ_JAVA_EXE_FILE_PATH, but will supply the legacy property at service install time. That gives us backward compatibility while keeping the use of RHQ_AGENT_JAVA_EXE_FILE_PATH internal.

Cherry-Pick Master: 719e2c127f24f7467ecc3a371d9797f6598880c0
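The effect of reverting the property name can be illustrated with a small sketch. It is not the Java Service Wrapper's implementation. expand and install_time_props are hypothetical helpers, the %NAME% placeholder form is taken from the wrapper log in the description, and the precedence of the new property over the legacy one at install time is an assumption made for illustration.

```python
import re


def expand(template, props):
    """Replace each %NAME% with props[NAME]; unknown placeholders stay literal.

    Assumption: leaving unknown placeholders untouched mirrors the log, where
    the wrapper tried to execute the literal "%RHQ_JAVA_EXE_FILE_PATH%".
    """
    return re.sub(
        r"%(\w+)%",
        lambda match: props.get(match.group(1), match.group(0)),
        template,
    )


def install_time_props(env):
    """Hypothetical install step: always pass the legacy name to the service.

    Assumed precedence: the new RHQ_JAVA_EXE_FILE_PATH wins if both are set.
    """
    java = env.get("RHQ_JAVA_EXE_FILE_PATH") or env.get(
        "RHQ_AGENT_JAVA_EXE_FILE_PATH"
    )
    return {"RHQ_AGENT_JAVA_EXE_FILE_PATH": java} if java else {}


# Properties carried by a service that was installed before the upgrade.
legacy_service = {"RHQ_AGENT_JAVA_EXE_FILE_PATH": r"c:\java32b\bin\java.exe"}

# A conf that reads the new name cannot resolve it from the old service.
broken = expand('"%RHQ_JAVA_EXE_FILE_PATH%"', legacy_service)

# A conf reverted to the legacy name resolves for the old service.
fixed = expand('"%RHQ_AGENT_JAVA_EXE_FILE_PATH%"', legacy_service)

# A fresh install that only sets the new env property still resolves,
# because the install step supplies the legacy name to the service.
fresh_service = install_time_props(
    {"RHQ_JAVA_EXE_FILE_PATH": r"c:\java32b\bin\java.exe"}
)
fresh = expand('"%RHQ_AGENT_JAVA_EXE_FILE_PATH%"', fresh_service)

print(broken)
print(fixed)
print(fresh)
```

Under these assumptions, both auto-upgraded services and newly installed ones resolve the Java executable, and the existing service definition never has to be replaced.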
The fix has been pre-qualified by both dev and QE (Jay and Filip). QE just needs to requalify it in the final GA bits, which is hopefully a formality.
Flipping this to ON_QA for testing with latest brew build.
Verified on: Version : 3.2.0.GA Build Number : 7b00246:6d13523