Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1231199

Summary: Upgrade on windows failed with "Could not verify that the node is up and running"
Product: [JBoss] JBoss Operations Network
Reporter: Filip Brychta <fbrychta>
Component: Installer
Assignee: Michael Burman <miburman>
Status: CLOSED ERRATA
QA Contact: Filip Brychta <fbrychta>
Severity: high
Priority: medium
Version: JON 3.3.3
CC: loleary, miburman
Target Milestone: ER01
Keywords: Triaged
Target Release: JON 3.3.5
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-02-03 15:02:22 UTC
Type: Bug
Attachments:
storage log (flags: none)
console log (flags: none)

Description Filip Brychta 2015-06-12 11:29:04 UTC
Description of problem:
Upgrade from JON 3.2.0 to JON 3.3.3 failed with "Could not verify that the node is up and running".
I believe this is caused by a slow environment:
22:53:18,695 INFO  [org.rhq.storage.installer.StorageInstaller] Starting RHQ Storage Node
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] Could not verify that the node is up and running.
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] Check the log file at ../../logs/rhq-storage.log for errors.

About 10 s pass between "Starting RHQ Storage Node" and "Could not verify that the node is up and running",
so it seems the storage node does not start within 10 s on this slow environment.
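
The interval can be checked directly from the two log timestamps. The following is a minimal sketch; the class and method names are made up for illustration, and it assumes both timestamps fall on the same day:

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of RHQ: measures the gap between two console timestamps.
class LogGap {
    // Timestamp layout used in the console output above: HH:mm:ss,SSS
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Returns the number of milliseconds from start to end (same day assumed).
    static long gapMillis(String start, String end) {
        return Duration.between(LocalTime.parse(start, FMT), LocalTime.parse(end, FMT)).toMillis();
    }

    public static void main(String[] args) {
        // "Starting RHQ Storage Node" vs. "Could not verify that the node is up and running"
        System.out.println(gapMillis("22:53:18,695", "22:53:28,836") + " ms");
    }
}
```

For the two lines quoted above this gives 10141 ms, i.e. just over 10 s.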

There are no relevant errors in the storage log.

Version-Release number of selected component (if applicable):
Version: 3.3.0.GA Update 03
Build Number: 82ad0cc:a25836e

How reproducible:
5/5

Steps to Reproduce:
1. install and start JON 3.2.0
2. unzip JON 3.3.0
3. unzip CP3
4. apply CP3 on JON 3.3.0
5. stop JON 3.2.0
6. start upgrade (c:\jon-server-3.3.0.GA\bin>rhqctl upgrade --from-server-dir c:\jon-server-3.2.0.GA)

Actual results:
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] Could not verify that the node is up and running.
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] Check the log file at ../../logs/rhq-storage.log for errors.
22:53:28,836 WARN  [org.rhq.storage.installer.StorageInstaller] The storage installer will now exit
22:53:28,867 INFO  [org.rhq.server.control.command.Upgrade] The storage node upgrade has finished with an exit value of [2]
The RHQ Server [rhqserver-WIN-2008] service was not running.
Stopping the RHQ Storage [rhqstorage-WIN-2008] service...
RHQ Storage [rhqstorage-WIN-2008] service stopped.
RHQ storage node has stopped
22:53:34,883 ERROR [org.rhq.server.control.RHQControl] The storage node upgrade failed with exit code [2]

Expected results:
Upgrade is successful

Additional info:
This issue will most probably occur only on environments where starting the storage node takes more than 10 s.
If that assumption is correct, it should also occur during installation, whenever the storage node takes longer than 10 s to start.

Comment 1 Filip Brychta 2015-06-12 12:45:15 UTC
Created attachment 1038040 [details]
storage log

Comment 2 Filip Brychta 2015-06-12 12:45:33 UTC
Created attachment 1038041 [details]
console log

Comment 6 Michael Burman 2015-11-09 13:48:46 UTC
This isn't a timeout problem. Cassandra is returning false for NativeTransportRunning, and we don't retry in that case (we only retry if there's an exception). I'll fix this by falling through to the retry policy when false is returned.
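
The behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual RHQ installer code: the class, method, and parameter names are hypothetical, and the check is passed in as a plain Callable rather than a real call to Cassandra. The point is that a false result is handled like an exception, so both lead to another attempt:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; names are illustrative and not taken from the RHQ code base.
class NodeStartupCheck {

    /**
     * Polls the check until it returns true or the attempts run out.
     * A false result and an exception are both treated as "not ready yet".
     */
    static boolean waitUntilRunning(Callable<Boolean> check, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(check.call())) {
                    return true;
                }
                // false: the node answered, but the native transport is not up yet, so retry
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                // the node could not be queried at all, so retry
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated check that only reports "running" on the third call.
        boolean up = waitUntilRunning(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("running=" + up + " after " + calls[0] + " checks");
    }
}
```

With a check that retries only on exceptions, the first false would end the loop immediately, which matches the failure seen in this report.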

Comment 7 Michael Burman 2015-11-09 13:52:43 UTC
Fixed in master:

commit 0cde115e1081f5aa982170e9f3838da4fd79963f
Author: Michael Burman <miburman>
Date:   Mon Nov 9 15:52:07 2015 +0200

    [BZ 1231199] If Cassandra returns NativeTransportRunning is false, force retry policy to try again

Comment 9 Michael Burman 2015-11-18 15:49:24 UTC
Merged to release/jon3.3.x:

commit 9c049818e3caef5321f2b35f45adee9e9b1b8a69
Author: Michael Burman <miburman>
Date:   Mon Nov 9 15:52:07 2015 +0200

    [BZ 1231199] If Cassandra returns NativeTransportRunning is false, force retry policy to try again
    
    (cherry picked from commit 0cde115e1081f5aa982170e9f3838da4fd79963f)

Comment 10 Simeon Pinder 2015-12-09 06:29:23 UTC
Moving to ON_QA, as the fix is available for testing in the following brew build:

JON Cumulative patch build: https://brewweb.devel.redhat.com/buildinfo?buildID=469635
  *Note: jon-server-patch-3.3.0.GA.zip maps to the DR01 build of jon-server-3.3.0.GA-update-05.zip.

Comment 13 errata-xmlrpc 2016-02-03 15:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-0118.html