Bug 1035357 - When first attempt to upgrade JON fails, second attempt fails as well even though the original problem is resolved
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Installer
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER04
Target Release: JON 3.3.0
Assignee: John Mazzitelli
QA Contact: Filip Brychta
URL:
Whiteboard:
Depends On:
Blocks: 1010354
Reported: 2013-11-27 15:49 UTC by Filip Brychta
Modified: 2014-12-11 13:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-11 13:59:30 UTC
Type: Bug
Embargoed:


Attachments
vimdiff screen shot (23.80 KB, image/png)
2013-11-27 15:49 UTC, Filip Brychta


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1045496 | 0 | unspecified | CLOSED | JON 3.2.0 install fails if the installer was terminated previously during the installation of the RHQ EAR subsystem | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 1092707 | 0 | unspecified | CLOSED | If re-install is attempted using a different storage node host name, install fails due to JON installer still trying to ... | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 1096923 | 0 | unspecified | CLOSED | UNDO process on rhqctl install failure leads to missing rhq.installed marker file on re-install | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 1139780 | 0 | unspecified | CLOSED | Storage node is not stopped during reverted upgrade and second invocation causes BindException | 2021-02-22 00:41:40 UTC

Internal Links: 1045496 1092707 1096923 1139780

Description Filip Brychta 2013-11-27 15:49:34 UTC
Created attachment 829783
vimdiff screen shot

Description of problem:
I tried to upgrade JON3.1.2.GA to JON3.2.ER7, and the upgrade process correctly failed while the agent was being upgraded. The failure was expected, and the installer rolled back the installation. I then resolved the original problem (the rhq-agent directory could not be removed) and ran the upgrade again. This second run failed while running the data migration.

Version-Release number of selected component (if applicable):
Upgrade to JON3.2.ER7

How reproducible:
2/2

Steps to Reproduce:
1. install JON3.1.2.GA
  a) unzip JON3.1.2.GA
  b) rhq-server.bat install
  c) rhq-server.bat start
  d) finish a server installation in web installer
  e) install agent (java -jar rhq-agent.jar --install)
  f) edit rhq-agent\bin\rhq-agent-env.bat (set RHQ_AGENT_RUN_AS_ME=true, set RHQ_AGENT_PASSWORD_PROMPT=false, set RHQ_AGENT_PASSWORD=<your_password>)
  g) run rhq-agent.bat to set up the agent
  h) exit interactive agent mode
  i) run rhq-agent-wrapper.bat install
  j) run rhq-agent-wrapper.bat start
2. wait until the agent is registered with the server
3. stop the agent
4. stop the server and then remove the service (rhq-server.bat remove)
5. open cmd.exe and cd rhq-agent/bin (this will cause the upgrade process to fail)
6. run the upgrade (rhqctl upgrade --from-server-dir c:\jon-server-3.1.2.GA --from-agent-dir c:\rhq-agent --run-data-migrator do-it)
7. upgrade correctly fails because the rhq-agent directory can't be removed
8. resolve the problem (rm -rf rhq-agent; mv rhq-agent-OLD rhq-agent) and close cmd opened in step 5
9. run the upgrade again (rhqctl upgrade --from-server-dir c:\jon-server-3.1.2.GA --from-agent-dir c:\rhq-agent --run-data-migrator do-it)


Actual results:
The upgrade finishes, but the data migration fails with the following exception:
1000 [main] DEBUG org.rhq.server.metrics.migrator.DataMigratorRunner  - Server configuration file system property detected. Loading the file: c:\jon-server-3.2.0.ER7\bin\rhq-server.properties
java.lang.RuntimeException: de-obfuscating db password failed:
        at org.rhq.core.util.obfuscation.PicketBoxObfuscator.decode(PicketBoxObfuscator.java:75)
        at org.rhq.server.metrics.migrator.DataMigratorRunner.loadConfigurationFromServerPropertiesFile(DataMigratorRunner.java:362)
        at org.rhq.server.metrics.migrator.DataMigratorRunner.configure(DataMigratorRunner.java:287)
        at org.rhq.server.metrics.migrator.DataMigratorRunner.main(DataMigratorRunner.java:170)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.jboss.modules.Module.run(Module.java:270)
        at org.jboss.modules.Main.main(Main.java:411)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.rhq.core.util.obfuscation.PicketBoxObfuscator.decode(PicketBoxObfuscator.java:72)
        ... 9 more
Caused by: java.lang.NumberFormatException: Zero length BigInteger
        at java.math.BigInteger.<init>(BigInteger.java:296)
        at org.picketbox.datasource.security.SecureIdentityLoginModule.decode(SecureIdentityLoginModule.java:170)
        ... 14 more
21:46:24,106 INFO  [org.rhq.server.control.command.Upgrade] The data migrator finished with exit value 0


This exception is caused by unset properties in rhq-server.properties.
See the attached vimdiff screen shot for the difference between rhq-server.properties after step 9 and a correct rhq-server.properties. The snapshot of the correct rhq-server.properties was taken after step 6 but before step 7 (before the upgrade was rolled back).

So for some reason the second upgrade (step 9) didn't update rhq-server.properties correctly.
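
A minimal sketch of a pre-flight check that could catch this state before the data migrator runs. It is not part of rhqctl; the helper names are hypothetical, the key names are assumed for illustration, and the parser is deliberately simplified (it ignores the `:` separator and line continuations that the Java properties format allows). The idea is that an empty obfuscated password is what the trace above suggests reaches the BigInteger constructor.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_empty_properties(text, required_keys):
    """Return the required keys that are missing or have an empty value."""
    props = parse_properties(text)
    return [key for key in required_keys if not props.get(key)]


# Hypothetical file contents: the password key is present but unset,
# similar to what the vimdiff screen shot is described as showing.
SAMPLE = """\
# database settings
rhq.server.database.user-name=rhqadmin
rhq.server.database.password=
"""

# Assumed key names, used here only for illustration.
REQUIRED = ["rhq.server.database.user-name", "rhq.server.database.password"]

empty = find_empty_properties(SAMPLE, REQUIRED)
print(empty)
```

Running a check like this against rhq-server.properties after a rolled-back upgrade would flag the unset values before the migrator attempts to decode them.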


Expected results:
Data migration works

Comment 3 Larry O'Leary 2014-07-07 23:38:33 UTC
This issue is not Windows specific. Windows only demonstrates how easy it is to cause the initial install to fail due to files still being in use. 

The result is a corrupted JBoss ON installation due to inadequate revert/recovery.

Comment 4 Jay Shaughnessy 2014-09-05 18:40:54 UTC
There has been a decent amount of work done in the installer and the recovery stuff in the 3.3 timeframe.  Please re-test this against ER03 and we'll go from there. Thanks.

Comment 5 Filip Brychta 2014-09-10 10:43:55 UTC
Rollback works on Linux but it still fails on Windows.
A simple scenario which fails on Windows:
1- install jon3.2.0.GA
2- try to upgrade to jon3.3.er2

During step 2 you will hit bz1128151.
The second attempt to upgrade ends with:
c:\jon-server-3.3.0.ER02\bin>rhqctl upgrade --from-server-dir c:\jon-server-3.2.0.GA
11:32:36,974 INFO  [org.jboss.modules] JBoss Modules version 1.3.3.Final-redhat-1
11:32:37,317 INFO  [org.rhq.server.control.command.Upgrade] Stopping any running RHQ components...
11:32:37,317 WARN  [org.rhq.server.control.command.Upgrade] RHQ is already installed so upgrade can not be performed.
The RHQ Server [rhqserver-WIN-2008] service was not running.
The RHQ Storage [rhqstorage-WIN-2008] service was not running.
RHQ storage node has stopped

Comment 7 John Mazzitelli 2014-09-18 20:26:04 UTC
I will try to replicate this.

Comment 8 John Mazzitelli 2014-09-25 19:10:16 UTC
I replicated on Windows 8 using the 3.3 ER03 build. I think the problem might be that we try to delete the rhq-storage directory BEFORE we stop the storage node, so Windows file locking prevents the directory from being removed. The next upgrade attempt will see that the rhq-storage directory still exists and think it has already been installed.

Looks like this was already addressed here:

Commit e53a218269a501f22ec491927a15362fe31159b2
[BZ 1139780] UndoTasks are done in reverse order, so add stop command after the delete command to the undoTask list

The stopping of the rhq-storage node should now occur before the attempt to delete the directory. This went in Sept 12, which I think is after the ER03 build.
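
The ordering described in that commit message can be sketched as follows. This is an illustration only, not the RHQ implementation: the `UndoStack` class and the two task functions are hypothetical, and Windows file locking is simulated with a flag.

```python
class UndoStack:
    """Collects undo tasks and runs them in reverse order of registration."""

    def __init__(self):
        self._tasks = []

    def add(self, name, action):
        self._tasks.append((name, action))

    def rollback(self):
        """Run tasks last-registered-first; return the names in run order."""
        ran = []
        while self._tasks:
            name, action = self._tasks.pop()
            action()
            ran.append(name)
        return ran


# Simulated installation state (hypothetical).
state = {"running": True, "dir_exists": True}


def stop_storage():
    state["running"] = False


def delete_storage_dir():
    # Simulated file locking: deletion only succeeds once the node is stopped.
    if not state["running"]:
        state["dir_exists"] = False


undo = UndoStack()
undo.add("delete rhq-storage directory", delete_storage_dir)
undo.add("stop storage node", stop_storage)  # added last, so it runs first
order = undo.rollback()
print(order, state)
```

Registering the two tasks in the opposite order would run the delete while the node is still running, which in this simulation leaves the directory in place, matching the failure seen on Windows.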

WORKAROUND: Manually delete the "rhq-storage" directory. Then re-run the upgrade.

I tried the workaround and it worked. So I would say, wait for the next ER build since that fix should be in it. I think that will address the problem because once the storage node is stopped, Windows won't lock the rhq-storage files and the installer should be able to remove them all. I think this is why it works on Linux: it doesn't have Windows file locking getting in the way.
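
Before re-running the upgrade, the leftover directory can be detected with a small check like the one below. This is a sketch under assumptions: the function name is hypothetical, and the parent directory that would contain "rhq-storage" is passed in by the caller because its location depends on the installation layout.

```python
import tempfile
from pathlib import Path


def leftover_storage_dir(parent_dir, dirname="rhq-storage"):
    """Return the path of a leftover storage directory, or None if absent."""
    candidate = Path(parent_dir) / dirname
    return candidate if candidate.is_dir() else None


# Demonstration against a temporary directory instead of a real install.
with tempfile.TemporaryDirectory() as parent:
    before = leftover_storage_dir(parent)
    (Path(parent) / "rhq-storage").mkdir()
    after = leftover_storage_dir(parent)

print(before, after)
```

If the check returns a path, the directory would need to be removed manually (the workaround above) before the next upgrade attempt.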

Comment 9 John Mazzitelli 2014-09-25 19:12:59 UTC
cherry picked 3.3 commit: e53a218269a501f22ec491927a15362fe31159b2

Comment 10 John Mazzitelli 2014-09-25 19:22:14 UTC
Setting to MODIFIED. It looks like the earlier fix that was cherry picked might also correct this issue. QE will need to retest.

Comment 11 Simeon Pinder 2014-10-01 21:33:04 UTC
Moving to ON_QA as available for test with build:
https://brewweb.devel.redhat.com/buildinfo?buildID=388959

Comment 12 Filip Brychta 2014-10-07 13:41:18 UTC
Verified on
Version: 3.3.0.ER04
Build Number: 99d2107:d7c537e

