Bug 1231199 - Upgrade on windows failed with "Could not verify that the node is up and running"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Installer
Version: JON 3.3.3
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ER01
: JON 3.3.5
Assignee: Michael Burman
QA Contact: Filip Brychta
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-06-12 11:29 UTC by Filip Brychta
Modified: 2019-07-11 09:22 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-03 15:02:22 UTC
Type: Bug
Embargoed:


Attachments
storage log (33.40 KB, text/plain), 2015-06-12 12:45 UTC, Filip Brychta
console log (3.25 KB, text/plain), 2015-06-12 12:45 UTC, Filip Brychta


Links
Red Hat Knowledge Base (Solution) 1471653 (Private: 0, Priority: None, Status: None, Summary: None, Last Updated: Never)
Red Hat Product Errata RHSA-2016:0118 (Private: 0, Priority: normal, Status: SHIPPED_LIVE, Summary: "Critical: Red Hat JBoss Operations Network 3.3.5 update", Last Updated: 2016-02-03 20:00:55 UTC)

Description Filip Brychta 2015-06-12 11:29:04 UTC
Description of problem:
Upgrade from JON 3.2.0 to JON 3.3.3 failed with "Could not verify that the node is up and running".
I believe this is caused by a slow environment:
22:53:18,695 INFO  [org.rhq.storage.installer.StorageInstaller] Starting RHQ Storage Node
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] Could not verify that the node is up and running.
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] Check the log file at ../../logs/rhq-storage.log for errors.

There are 10 seconds between "Starting RHQ Storage Node" and "Could not verify that the node is up and running", and it seems that the storage node does not start within 10 seconds in this slow environment.

There are no relevant errors in the storage log.

Version-Release number of selected component (if applicable):
Version: 3.3.0.GA Update 03
Build Number: 82ad0cc:a25836e

How reproducible:
5/5

Steps to Reproduce:
1. Install and start JON 3.2.0.
2. Unzip JON 3.3.0.
3. Unzip CP3.
4. Apply CP3 to JON 3.3.0.
5. Stop JON 3.2.0.
6. Start the upgrade (c:\jon-server-3.3.0.GA\bin>rhqctl upgrade --from-server-dir c:\jon-server-3.2.0.GA)

Actual results:
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] Could not verify that the node is up and running.
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] Check the log file at ../../logs/rhq-storage.log for errors.
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] The storage installer will now exit
22:53:28,867 INFO  [org.rhq.server.control.command.Upgrade] The storage node upgrade has finished with an exit value of [2]
The RHQ Server [rhqserver-WIN-2008] service was not running.
Stopping the RHQ Storage [rhqstorage-WIN-2008] service...
RHQ Storage [rhqstorage-WIN-2008] service stopped.
RHQ storage node has stopped
22:53:34,883 ERROR [org.rhq.server.control.RHQControl] The storage node upgrade failed with exit code [2]

Expected results:
Upgrade is successful

Additional info:
This issue will most probably occur only in environments where the storage node takes more than 10 seconds to start.
If this assumption is correct, it should also occur during installation, whenever the storage node takes longer than 10 seconds to start.

Comment 1 Filip Brychta 2015-06-12 12:45:15 UTC
Created attachment 1038040
storage log

Comment 2 Filip Brychta 2015-06-12 12:45:33 UTC
Created attachment 1038041
console log

Comment 6 Michael Burman 2015-11-09 13:48:46 UTC
This isn't a timeout issue. Cassandra returns false for NativeTransportRunning, so we don't retry (we only retry when an exception is thrown). I'll fix this by sending us through the retry policy when false is returned as well.
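The check used to retry only when an exception was thrown, so a false result from NativeTransportRunning ended it on the first probe. A minimal Java sketch of that difference, assuming a generic status probe in place of the real check, is below. `NodeUpCheck`, `waitRetryOnExceptionOnly`, `waitUntilRunning`, and `simulatedNode` are hypothetical names for illustration, not the RHQ StorageInstaller code or its retry-policy API.

```java
import java.util.concurrent.Callable;

public class NodeUpCheck {

    // Hypothetical pre-fix behavior: only an exception triggers another attempt;
    // a false result from the probe is returned immediately as "not running".
    public static boolean waitRetryOnExceptionOnly(Callable<Boolean> probe, int maxAttempts,
                                                   long delayMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Boolean.TRUE.equals(probe.call());
            } catch (Exception e) {
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        return false;
    }

    // Hypothetical post-fix behavior: a false result is treated like an exception,
    // so it also goes through the retry loop until the attempts run out.
    public static boolean waitUntilRunning(Callable<Boolean> probe, int maxAttempts,
                                           long delayMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(probe.call())) {
                    return true;
                }
            } catch (Exception e) {
                // Node not reachable yet; fall through to the retry delay.
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return false;
    }

    // Simulated node: reports "native transport not running" for the first
    // notReadyProbes probes, then "running". counter[0] records the probe count.
    public static Callable<Boolean> simulatedNode(int notReadyProbes, int[] counter) {
        return () -> ++counter[0] > notReadyProbes;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] before = {0};
        boolean oldResult = waitRetryOnExceptionOnly(simulatedNode(2, before), 5, 10);
        int[] after = {0};
        boolean newResult = waitUntilRunning(simulatedNode(2, after), 5, 10);
        System.out.println("retry on exception only: up=" + oldResult + " after " + before[0] + " probe(s)");
        System.out.println("retry on false as well:  up=" + newResult + " after " + after[0] + " probe(s)");
    }
}
```

With a node that needs two extra probes before it reports running, the exception-only loop gives up after 1 probe with up=false, while the loop that also retries on false reports up=true after 3 probes. This matches the symptom above: a node that was still starting was reported as not running.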

Comment 7 Michael Burman 2015-11-09 13:52:43 UTC
Fixed in the master:

commit 0cde115e1081f5aa982170e9f3838da4fd79963f
Author: Michael Burman <miburman>
Date:   Mon Nov 9 15:52:07 2015 +0200

    [BZ 1231199] If Cassandra returns NativeTransportRunning is false, force retry policy to try again

Comment 9 Michael Burman 2015-11-18 15:49:24 UTC
Merged to release/jon3.3.x:

commit 9c049818e3caef5321f2b35f45adee9e9b1b8a69
Author: Michael Burman <miburman>
Date:   Mon Nov 9 15:52:07 2015 +0200

    [BZ 1231199] If Cassandra returns NativeTransportRunning is false, force retry policy to try again
    
    (cherry picked from commit 0cde115e1081f5aa982170e9f3838da4fd79963f)

Comment 10 Simeon Pinder 2015-12-09 06:29:23 UTC
Moving to ON_QA as available to test with the following brew build:

JON Cumulative patch build: https://brewweb.devel.redhat.com/buildinfo?buildID=469635
  *Note: jon-server-patch-3.3.0.GA.zip maps to DR01 build of jon-server-3.3.0.GA-update-05.zip.

Comment 13 errata-xmlrpc 2016-02-03 15:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-0118.html

