Bug 1038256 - Windows 2008 - Invalid agent service after upgrade (on remote agent machine)
Summary: Windows 2008 - Invalid agent service after upgrade (on remote agent machine)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Upgrade
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: CR02
Target Release: JON 3.2.0
Assignee: Jay Shaughnessy
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1010354 1012435
 
Reported: 2013-12-04 17:52 UTC by Filip Brychta
Modified: 2014-01-02 20:37 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-01-02 20:37:55 UTC
Type: Bug
Embargoed:



Description Filip Brychta 2013-12-04 17:52:02 UTC
Description of problem:
After the upgrade, all remote agents are correctly upgraded and running, but their Windows agent services are invalid, so the agents can't be stopped or started through the service.
The workaround is simple: remove the old agent service and install a new agent service.

Version-Release number of selected component (if applicable):
Version: 3.2.0.CR1
Build Number: 6ecd678:d0dc0b6

How reproducible:
2/2

Steps to Reproduce:
1. JON 3.1.2.GA is installed with one local agent and several remote agents. The remote agents run under the Administrator user (RHQ_AGENT_PASSWORD_PROMPT=false, RHQ_AGENT_PASSWORD=**** and RHQ_AGENT_RUN_AS_ME=true are all set in rhq-agent/bin/rhq-agent-env.bat). Only JAVA_HOME is set on all machines (RHQ_AGENT_JAVA_EXE_FILE_PATH and any other RHQ-specific Java variable is NOT set).
2. Stop the JON 3.1.2.GA server and the local JON agent.
3. Run the upgrade to CR1 (rhqctl upgrade --from-server-dir c:\jon-server-3.1.2.GA --from-agent-dir c:\rhq-agent --run-data-migrator do-it).
4. Start it (rhqctl start).

Actual results:
The JON server and all agents are successfully upgraded and running, but the REMOTE agent services are invalid. The Windows services utility shows 'RHQ Agent [rhqagent-TESTDAY2]' as stopped even though the agent is running.
When the agent is killed manually, an attempt to start it via the agent service (rhq-agent-wrapper.bat start) fails with the following error in rhq-agent/logs/rhq-agent-wrapper.log:
DEBUG  | wrapper  | 2013/12/04 02:53:41 | Working directory set to: c:\rhq-agent
STATUS | wrapper  | 2013/12/04 02:53:41 | Starting the RHQ Agent [rhqagent-TESTDAY2] service...
DEBUG  | wrapper  | 2013/12/04 02:53:41 | Working directory set to: c:\rhq-agent
STATUS | wrapper  | 2013/12/04 02:53:41 | --> Wrapper Started as Service
STATUS | wrapper  | 2013/12/04 02:53:41 | Java Service Wrapper Community Edition 3.3.1
STATUS | wrapper  | 2013/12/04 02:53:41 |   Copyright (C) 1999-2008 Tanuki Software, Inc.  All Rights Reserved.
STATUS | wrapper  | 2013/12/04 02:53:41 |     http://wrapper.tanukisoftware.org
STATUS | wrapper  | 2013/12/04 02:53:41 |
DEBUG  | wrapper  | 2013/12/04 02:53:41 | Using tick timer.
DEBUG  | wrapperp | 2013/12/04 02:53:41 | server listening on port 32001.
DEBUG  | wrapper  | 2013/12/04 02:53:41 | Ping settings: wrapper.ping.interval=30, wrapper.ping.interval.logged=1, wrapper.ping.timeout=45
STATUS | wrapper  | 2013/12/04 02:53:41 | Launching a JVM...
DEBUG  | wrapper  | 2013/12/04 02:53:41 | command: "%RHQ_JAVA_EXE_FILE_PATH%" -Dlog4j.configuration=log4j.xml -Xms64m -Xmx128m -Di18nlog.dump-stack-traces=false -Dsigar.nativeLogging=false "-Djava.endorsed.dirs=c:\rhq-agent/lib/endorsed" "-Djava.io.tmpdir=c:\rhq-agent/temp" -Djava.library.path="c:\rhq-agent/bin/wrapper/windows-x86_32;c:\rhq-agent/lib" -classpath "c:\rhq-agent/conf;c:\rhq-agent/bin/wrapper/windows-x86_32/wrapper.jar;c:\rhq-agent/lib/commons-io-1.4.jar;c:\rhq-agent/lib/commons-logging-1.1.0.jboss.jar;c:\rhq-agent/lib/concurrent-1.3.4-jboss-update1.jar;c:\rhq-agent/lib/getopt-1.0.13.jar;c:\rhq-agent/lib/i18nlog-1.0.10.jar;c:\rhq-agent/lib/jboss-common-core-2.2.17.GA.jar;c:\rhq-agent/lib/jboss-jmx-4.2.3.GA.jar;c:\rhq-agent/lib/jboss-logging-3.1.2.GA-redhat-1.jar;c:\rhq-agent/lib/jboss-remoting-2.5.4.SP5.jar;c:\rhq-agent/lib/jboss-serialization-1.0.3.GA.jar;c:\rhq-agent/lib/jline-0.9.94.jar;c:\rhq-agent/lib/log4j-1.2.16.jar;c:\rhq-agent/lib/persistence-api-1.0.jar;c:\rhq-agent/lib/rhq-common-drift-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-client-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-comm-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-domain-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-native-system-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-plugin-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-plugin-container-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-util-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-enterprise-agent-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-enterprise-comm-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/sigar-1.6.5.132-5.jar" -Dwrapper.key="NmECugGJOq0rg1IT" -Dwrapper.port=32001 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.debug="TRUE" -Dwrapper.pid=2632 -Dwrapper.version="3.3.1" -Dwrapper.native_library="wrapper" -Dwrapper.service="TRUE" -Dwrapper.cpu.timeout="10" -Dwrapper.jvmid=1 org.tanukisoftware.wrapper.WrapperSimpleApp org.rhq.enterprise.agent.AgentMain --daemon
FATAL  | wrapper  | 2013/12/04 02:53:41 | Unable to execute Java command.  The system cannot find the file specified. (0x2)
FATAL  | wrapper  | 2013/12/04 02:53:41 |     "%RHQ_JAVA_EXE_FILE_PATH%" -Dlog4j.configuration=log4j.xml -Xms64m -Xmx128m -Di18nlog.dump-stack-traces=false -Dsigar.nativeLogging=false "-Djava.endorsed.dirs=c:\rhq-agent/lib/endorsed" "-Djava.io.tmpdir=c:\rhq-agent/temp" -Djava.library.path="c:\rhq-agent/bin/wrapper/windows-x86_32;c:\rhq-agent/lib" -classpath "c:\rhq-agent/conf;c:\rhq-agent/bin/wrapper/windows-x86_32/wrapper.jar;c:\rhq-agent/lib/commons-io-1.4.jar;c:\rhq-agent/lib/commons-logging-1.1.0.jboss.jar;c:\rhq-agent/lib/concurrent-1.3.4-jboss-update1.jar;c:\rhq-agent/lib/getopt-1.0.13.jar;c:\rhq-agent/lib/i18nlog-1.0.10.jar;c:\rhq-agent/lib/jboss-common-core-2.2.17.GA.jar;c:\rhq-agent/lib/jboss-jmx-4.2.3.GA.jar;c:\rhq-agent/lib/jboss-logging-3.1.2.GA-redhat-1.jar;c:\rhq-agent/lib/jboss-remoting-2.5.4.SP5.jar;c:\rhq-agent/lib/jboss-serialization-1.0.3.GA.jar;c:\rhq-agent/lib/jline-0.9.94.jar;c:\rhq-agent/lib/log4j-1.2.16.jar;c:\rhq-agent/lib/persistence-api-1.0.jar;c:\rhq-agent/lib/rhq-common-drift-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-client-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-comm-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-domain-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-native-system-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-plugin-api-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-plugin-container-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-core-util-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-enterprise-agent-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/rhq-enterprise-comm-4.9.0.JON320CR1.jar;c:\rhq-agent/lib/sigar-1.6.5.132-5.jar" -Dwrapper.key="NmECugGJOq0rg1IT" -Dwrapper.port=32001 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.debug="TRUE" -Dwrapper.pid=2632 -Dwrapper.version="3.3.1" -Dwrapper.native_library="wrapper" -Dwrapper.service="TRUE" -Dwrapper.cpu.timeout="10" -Dwrapper.jvmid=1 org.tanukisoftware.wrapper.WrapperSimpleApp org.rhq.enterprise.agent.AgentMain --daemon
FATAL  | wrapper  | 2013/12/04 02:53:41 | Critical error: wait for JVM process failed
ERROR  | wrapper  | 2013/12/04 02:53:43 | The RHQ Agent [rhqagent-TESTDAY2] service was launched, but failed to start.


This can be fixed by removing the old agent service and creating a new one.
Old agent service properties:
c:\rhq-agent\bin\wrapper\windows-x86_32\wrapper.exe -s c:\rhq-agent\bin\\wrapper\rhq-agent-wrapper.conf set.RHQ_AGENT_HOME=c:\rhq-agent set.RHQ_AGENT_INSTANCE_NAME=rhqagent-TESTDAY2 set.RHQ_AGENT_JAVA_EXE_FILE_PATH=c:\java32b\bin\java.exe set.RHQ_AGENT_OS_PLATFORM=windows-x86_32 set.RHQ_AGENT_WRAPPER_LOG_DIR_PATH=c:\rhq-agent\logs

New agent service properties:
c:\rhq-agent\bin\wrapper\windows-x86_32\wrapper.exe -s c:\rhq-agent\bin\wrapper\rhq-agent-wrapper.conf set.RHQ_AGENT_HOME=c:\rhq-agent set.RHQ_AGENT_INSTANCE_NAME=rhqagent-TESTDAY2 set.RHQ_JAVA_EXE_FILE_PATH=c:\java32b\bin\java.exe set.RHQ_AGENT_OS_PLATFORM=windows-x86_32 set.RHQ_AGENT_WRAPPER_LOG_DIR_PATH=c:\rhq-agent\logs


Comparing the old and new service properties shows the following differences:
-s c:\rhq-agent\bin\\ vs. -s c:\rhq-agent\bin\
and
set.RHQ_AGENT_JAVA_EXE_FILE_PATH vs. set.RHQ_JAVA_EXE_FILE_PATH

So these differences are probably causing the issue.
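
The comparison of the two service definitions can be reproduced with a short script. This is only a sketch: the helper name wrapper_properties is made up for illustration, and it assumes that every property appears as a whitespace-separated set.NAME=value token, which holds for the two command lines quoted in this report (paths containing spaces would need real command-line parsing).

```python
def wrapper_properties(command_line: str) -> dict[str, str]:
    """Collect the set.NAME=value tokens from a service command line.

    Hypothetical helper: splits on whitespace only, so it assumes that
    none of the paths contain spaces.
    """
    props = {}
    for token in command_line.split():
        if token.startswith("set.") and "=" in token:
            name, value = token[len("set."):].split("=", 1)
            props[name] = value
    return props


# "Path to executable" of the old and new services, copied from this report.
OLD = (
    r"c:\rhq-agent\bin\wrapper\windows-x86_32\wrapper.exe "
    r"-s c:\rhq-agent\bin\\wrapper\rhq-agent-wrapper.conf "
    r"set.RHQ_AGENT_HOME=c:\rhq-agent "
    r"set.RHQ_AGENT_INSTANCE_NAME=rhqagent-TESTDAY2 "
    r"set.RHQ_AGENT_JAVA_EXE_FILE_PATH=c:\java32b\bin\java.exe "
    r"set.RHQ_AGENT_OS_PLATFORM=windows-x86_32 "
    r"set.RHQ_AGENT_WRAPPER_LOG_DIR_PATH=c:\rhq-agent\logs"
)
NEW = (
    r"c:\rhq-agent\bin\wrapper\windows-x86_32\wrapper.exe "
    r"-s c:\rhq-agent\bin\wrapper\rhq-agent-wrapper.conf "
    r"set.RHQ_AGENT_HOME=c:\rhq-agent "
    r"set.RHQ_AGENT_INSTANCE_NAME=rhqagent-TESTDAY2 "
    r"set.RHQ_JAVA_EXE_FILE_PATH=c:\java32b\bin\java.exe "
    r"set.RHQ_AGENT_OS_PLATFORM=windows-x86_32 "
    r"set.RHQ_AGENT_WRAPPER_LOG_DIR_PATH=c:\rhq-agent\logs"
)

old_props = wrapper_properties(OLD)
new_props = wrapper_properties(NEW)

# Property names present in only one of the two service definitions.
only_old = sorted(set(old_props) - set(new_props))
only_new = sorted(set(new_props) - set(old_props))

print("only in old service:", only_old)
print("only in new service:", only_new)
```

For the two command lines above, the only property name that differs is the Java executable path: RHQ_AGENT_JAVA_EXE_FILE_PATH in the old service and RHQ_JAVA_EXE_FILE_PATH in the new one. All other properties carry the same names and values.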

Expected results:
Agent service works.

Comment 3 Jay Shaughnessy 2013-12-04 21:37:20 UTC
The problem seems to be that "%RHQ_JAVA_EXE_FILE_PATH%" is not getting resolved. I'm not sure, still looking...

Comment 4 Jay Shaughnessy 2013-12-05 14:43:10 UTC
release/jon3.2.x commit d19fd2306f60991e839566934c7fbbcf4f692226
Author: Jay Shaughnessy <jshaughn>
Date:   Thu Dec 5 09:39:49 2013 -0500

This problem resulted from the work in Bug 1016609, when we introduced
the use of RHQ_JAVA_EXE_FILE_PATH and deprecated the use of
RHQ_AGENT_JAVA_EXE_FILE_PATH (among several simplifications of our
env properties).  The issue with that change was the fact that existing
windows agent services (remote agents, not handled by rhqctl) included
set.RHQ_AGENT_JAVA_EXE_FILE_PATH in the service "path to executable", for
use by the service wrapper.  But the new rhq-agent-wrapper.conf expected
RHQ_JAVA_EXE_FILE_PATH to be set when formulating its command string.

This was only an issue for existing agents that would be auto-upgraded. Agent
auto-upgrade does not update the existing service, it only restarts it after
the agent update. So, the "path to executable" remains unchanged and therefore
passes only the legacy property.  Note that auto-upgrade does the right thing:
it should *not* replace the service in order to update its definition. Doing
this could lose the RUN_AS password, which may have been set interactively
when the agent was initially installed.

The solution should be good: the rhq-agent-wrapper.conf has been reverted to
use RHQ_AGENT_JAVA_EXE_FILE_PATH.  We still completely support the
new RHQ env properties, like RHQ_JAVA_EXE_FILE_PATH, but will supply
the legacy property at service install time.  That gives us backward
compatibility while keeping the use of RHQ_AGENT_JAVA_EXE_FILE_PATH
internal.

  Cherry-Pick Master: 719e2c127f24f7467ecc3a371d9797f6598880c0
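
The failure mode and the fix described in this commit message can be illustrated with a small model of placeholder expansion. This is a sketch under stated assumptions, not the Java Service Wrapper's actual code: the expand function is hypothetical, and leaving an unknown %NAME% untouched is assumed only because the wrapper log in this report shows the literal "%RHQ_JAVA_EXE_FILE_PATH%" in the JVM command.

```python
import re


def expand(template: str, env: dict[str, str]) -> str:
    """Replace each %NAME% with env[NAME]; leave unknown names untouched.

    Hypothetical model of the substitution, based only on the behavior
    visible in the wrapper log quoted in this report.
    """
    return re.sub(
        r"%(\w+)%",
        lambda match: env.get(match.group(1), match.group(0)),
        template,
    )


# Properties passed by a service created before the upgrade:
# only the legacy name is defined.
legacy_service_env = {
    "RHQ_AGENT_JAVA_EXE_FILE_PATH": r"c:\java32b\bin\java.exe",
}

# Java command placeholder as the CR1 conf expected it, and as reverted by the fix.
cr1_java_command = "%RHQ_JAVA_EXE_FILE_PATH%"
fixed_java_command = "%RHQ_AGENT_JAVA_EXE_FILE_PATH%"

# CR1 conf with a pre-upgrade service: the placeholder stays unresolved.
print(expand(cr1_java_command, legacy_service_env))

# Reverted conf with the same service: the Java path is resolved.
print(expand(fixed_java_command, legacy_service_env))
```

With the CR1 conf the placeholder survives unchanged, which matches the "Unable to execute Java command" error in the log. With the reverted conf, the same unmodified service definition resolves to the Java executable, so the existing service (and its RUN_AS password) does not need to be replaced.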

Comment 5 Mike Foley 2013-12-05 14:47:26 UTC
The fix has been pre-qualified by both dev and QE (Jay and Filip). QE just needs to requalify it on the final GA bits, which is hopefully a formality.

Comment 7 Simeon Pinder 2013-12-06 00:38:21 UTC
Flipping this to ON_QA for testing with latest brew build.

Comment 8 Filip Brychta 2013-12-06 07:46:15 UTC
Verified on:
Version: 3.2.0.GA
Build Number: 7b00246:6d13523

