Bug 1002252 - JON server becomes unreachable with no explanation if storage node is not yet running when JON server is started
Status: CLOSED CURRENTRELEASE
Product: JBoss Operations Network
Classification: JBoss
Component: Core Server
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ER03
Target Release: JON 3.2.0
Assigned To: Jirka Kremser
QA Contact: Mike Foley
Depends On:
Blocks: jon32-Beta-Blockers-1006862
Reported: 2013-08-28 13:50 EDT by Larry O'Leary
Modified: 2014-01-02 15:39 EST

Doc Type: Bug Fix
Type: Bug

Attachments
Log generated by server when storage node is not yet up at startup (228.59 KB, text/x-log)
2013-08-28 13:50 EDT, Larry O'Leary
Screenshot (55.13 KB, image/png)
2013-10-31 09:45 EDT, Sunil Kondkar

Description Larry O'Leary 2013-08-28 13:50:08 EDT
Created attachment 791497
Log generated by server when storage node is not yet up at startup

Description of problem:
If the JON server is started independently from the storage node or the storage node is not fully available or reachable during startup of JON server, the server is left in a bad state with no way of reaching it.

Version-Release number of selected component (if applicable):
3.2.0.ALPHA_QA.58

How reproducible:
Always

Steps to Reproduce:
1. Configure and install JON server and storage node.
2. Start JON system using rhqctl start.
3. Verify server is running and is reachable.
4. Shut down the system using rhqctl stop.
5. Start only the JON server using rhq-server.sh start.

Actual results:
Login page is never displayed and JON server is unreachable.

Expected results:
Server should start normally.

Additional info:
From the administrator's perspective, the expectation should be that the server starts and provides evidence that a storage node is unreachable, along with direction on how to resolve the issue.
Comment 1 Heiko W. Rupp 2013-09-17 03:49:05 EDT
Jirka, can you please while working on Bug 1000065 also consider this case where the server could print a message on the index.html page if startup failed because the storage or relational database was not reachable?

Could we perhaps gather such messages in an in-memory buffer that the index page shows if the buffer is not empty?
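
The buffer idea above could look roughly like the following. This is only a minimal sketch: the class name StartupMessageBuffer and its methods are hypothetical and are not taken from the RHQ code base. It shows a thread-safe list that subsystems append to during startup and that a landing page could read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not an RHQ class): collects startup problems in memory
// so that a landing page can display them.
final class StartupMessageBuffer {
    private final List<String> messages = new ArrayList<>();

    // Record one problem reported by a subsystem during startup.
    synchronized void add(String subsystem, String problem) {
        messages.add(subsystem + ": " + problem);
    }

    // True when no problem was recorded, so the page can render normally.
    synchronized boolean isEmpty() {
        return messages.isEmpty();
    }

    // Read-only copy, safe to iterate while other threads keep adding.
    synchronized List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(messages));
    }
}
```

A page renderer would call isEmpty() first and list the entries of snapshot() only when something was recorded.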
Comment 2 Heiko W. Rupp 2013-09-20 07:19:25 EDT
In theory rhqctl should prevent this, but I can see how this may fail.
rhqctl start --server should test if storage is available before really starting the server.
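
A pre-start availability test of this kind could be sketched as a plain TCP connect with a timeout. The class StorageProbe is hypothetical and not part of rhqctl. The host and port are parameters that the caller is assumed to already know. A successful connect only shows that something is listening on that port, not that the storage node is fully initialized.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of rhqctl): checks whether a TCP endpoint accepts connections.
final class StorageProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unresolved: treat as unavailable.
            return false;
        }
    }
}
```

A start script could call isReachable before launching the server and print a warning that names the unreachable endpoint when the check fails.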
Comment 3 Jirka Kremser 2013-09-25 14:10:50 EDT
^ The storage node doesn't have to be co-located with the RHQ server. If it is not, rhqctl has no way to find out the IP addresses without talking to the RDBMS.

link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=5f95e5c0e
time:    2013-09-25 19:48:22 +0200
commit:  5f95e5c0e9db5d58aef69f53165514670b46b7b9
author:  Jirka Kremser - jkremser@redhat.com
message: [BZ 1002252] - JON server becomes unreachable with no explanation if
         storage node is not yet running when JON server is started -
         Displaying what subsystems have failed during the RHQ server
         startup on the root context webapp (localhost:7080/).


branch:  master
link:    http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=c483b092d
time:    2013-09-25 19:48:21 +0200
commit:  c483b092d7ca520275e3424e8042f391c8fd360d
author:  Jirka Kremser - jkremser@redhat.com
message: [BZ 1002252] - JON server becomes unreachable with no explanation if
         storage node is not yet running when JON server is started -
         Letting RHQ server to start even without connection to storage
         cluster. In this case the quartz job is scheduled that will do
         the necessary work once the storage is up (and cancel itself).
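
The retry behaviour described in the second commit message can be illustrated with a periodic task that cancels itself after the first successful attempt. This is a sketch under stated assumptions, not the committed code: it uses java.util.concurrent.ScheduledExecutorService instead of Quartz so that it stays self-contained. The class StorageInitRetry and the tryInit callback are hypothetical names.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not the RHQ implementation): retries an initialization
// step periodically and stops once it succeeds.
final class StorageInitRetry {
    // Runs tryInit immediately and then every periodMillis until it returns true.
    static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler,
                                       BooleanSupplier tryInit,
                                       long periodMillis) {
        final Object lock = new Object();
        final AtomicReference<ScheduledFuture<?>> self = new AtomicReference<>();
        Runnable job = () -> {
            boolean initialized;
            try {
                initialized = tryInit.getAsBoolean();
            } catch (RuntimeException e) {
                initialized = false; // treat a failure as "storage still down"
            }
            if (initialized) {
                synchronized (lock) { // wait until the handle has been published
                    self.get().cancel(false); // no further runs are scheduled
                }
            }
        };
        synchronized (lock) {
            self.set(scheduler.scheduleWithFixedDelay(job, 0, periodMillis, TimeUnit.MILLISECONDS));
            return self.get();
        }
    }
}
```

The lock only guards against the first run finishing before the handle is stored. Without it, a very fast first success could find no handle to cancel.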
Comment 4 Simeon Pinder 2013-10-08 03:41:52 EDT
Moving to ON_QA for testing.
Comment 5 Sunil Kondkar 2013-10-31 09:45:13 EDT
Verified on Version: 3.2.0.ER4 Build Number: e413566:057b211

If the storage node is not running when the server is started, localhost:7080 displays 'Following subsystems had problems to start: storage'.

Please refer to the screenshot.

After the server has started, the login screen appears.

After startup, the server log shows the following warning:
18:26:31,195 WARN  [org.rhq.enterprise.server.storage.StorageClientManagerBean] (RHQScheduler_Worker-4) Storage client subsystem wasn't initialized because it wasn't possible to connect to the storage cluster. The RHQ server is set to MAINTENANCE mode. Please start the storage cluster as soon as possible.

After starting storage node:
19:00:40,079 INFO  [org.rhq.enterprise.server.storage.StorageClientManagerBean] (RHQScheduler_Worker-2) Storage client subsystem is now initialized
19:01:06,753 INFO  [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] (EJB default - 9) Notified communication layer of server operation mode NORMAL
Comment 6 Sunil Kondkar 2013-10-31 09:45:51 EDT
Created attachment 817922
Screenshot