Created attachment 791497 [details]
Log generated by server when storage node is not yet up at startup

Description of problem:
If the JON server is started independently of the storage node, or the storage node is not fully available or reachable while the JON server is starting, the server is left in a bad state with no way of reaching it.

Version-Release number of selected component (if applicable):
3.2.0.ALPHA_QA.58

How reproducible:
Always

Steps to Reproduce:
1. Configure and install JON server and storage node.
2. Start JON system using rhqctl start.
3. Verify server is running and is reachable.
4. Shut down system using rhqctl stop.
5. Start JON server only using rhq-server.sh start

Actual results:
Login page is never displayed and JON server is unreachable.

Expected results:
Server should start normally.

Additional info:
From the administrator's perspective, the expectation should be that the server starts and provides evidence that a storage node is unreachable, along with direction on how to resolve the issue.
Jirka, while working on Bug 1000065, can you please also consider this case? The server could print a message on the index.html page if startup failed because the storage or relational database was not reachable. Could we perhaps gather such messages in an in-memory buffer that the index page shows if the buffer is not empty?
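To make the in-memory buffer idea concrete, here is a minimal sketch in plain Java. The class name StartupProblems, its methods, and the message wording are hypothetical and are not taken from the RHQ code base. It only shows the shape of the idea: startup code records the name of a subsystem that failed, and the index page asks for a message that is empty when nothing failed.

```java
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch, not the RHQ implementation: a process-wide buffer
// of subsystem names that failed during server startup.
final class StartupProblems {
    // CopyOnWriteArrayList is safe for concurrent startup threads and page requests.
    private static final CopyOnWriteArrayList<String> FAILED = new CopyOnWriteArrayList<>();

    private StartupProblems() {
    }

    // Called by startup code when a subsystem (e.g. "storage") could not start.
    static void report(String subsystem) {
        FAILED.addIfAbsent(subsystem);
    }

    // Called once the subsystem has recovered.
    static void clear(String subsystem) {
        FAILED.remove(subsystem);
    }

    // Text for the index page; empty when the buffer is empty.
    static String render() {
        if (FAILED.isEmpty()) {
            return "";
        }
        return "Subsystems that failed to start: " + String.join(", ", FAILED);
    }
}
```

The index page would print the result of render() only when it is not empty, so a healthy server shows nothing extra.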
In theory rhqctl should prevent this, but I can see how this may fail. rhqctl start --server should test if storage is available before really starting the server.
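A check like the one suggested above could be as simple as a TCP connect with a timeout. The sketch below is an illustration only: the class StorageProbe and its method are hypothetical, it assumes the storage node's host and port are already known to the caller, and it is not how rhqctl is actually implemented. A successful connect only shows that something is listening on the port, not that the storage node is fully initialized.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch, not rhqctl code: probe a host/port before starting the server.
final class StorageProbe {
    private StorageProbe() {
    }

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unresolved: treat as unreachable.
            return false;
        }
    }
}
```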
^ The storage node doesn't have to be co-located with the RHQ server. In that case, rhqctl has no way to find out the IP addresses without talking to the RDBMS.

link: http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=5f95e5c0e
time: 2013-09-25 19:48:22 +0200
commit: 5f95e5c0e9db5d58aef69f53165514670b46b7b9
author: Jirka Kremser - jkremser
message: [BZ 1002252] - JON server becomes unreachable with no explanation if storage node is not yet running when JON server is started - Displaying what subsystems have failed during the RHQ server startup on the root context webapp (localhost:7080/).

branch: master
link: http://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?id=c483b092d
time: 2013-09-25 19:48:21 +0200
commit: c483b092d7ca520275e3424e8042f391c8fd360d
author: Jirka Kremser - jkremser
message: [BZ 1002252] - JON server becomes unreachable with no explanation if storage node is not yet running when JON server is started - Letting RHQ server to start even without connection to storage cluster. In this case the quartz job is scheduled that will do the necessary work once the storage is up (and cancel itself).
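The second commit describes a job that keeps trying until the storage is up and then cancels itself. The sketch below illustrates that pattern with the JDK's ScheduledExecutorService instead of Quartz, so it is a stand-in for the technique and not the committed code. The names StorageInitRetry, retryUntilInitialized, and tryInit are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch using the JDK scheduler rather than Quartz.
final class StorageInitRetry {
    private StorageInitRetry() {
    }

    // Runs tryInit repeatedly until it returns true. The task reschedules itself
    // only after a failed attempt, so it stops on its own once it succeeds.
    static CompletableFuture<Void> retryUntilInitialized(
            ScheduledExecutorService scheduler, BooleanSupplier tryInit, long delayMillis) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        Runnable attempt = new Runnable() {
            @Override
            public void run() {
                boolean initialized;
                try {
                    initialized = tryInit.getAsBoolean();
                } catch (RuntimeException e) {
                    // A failed connection attempt counts as "not yet up".
                    initialized = false;
                }
                if (initialized) {
                    done.complete(null);
                } else {
                    scheduler.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
                }
            }
        };
        scheduler.schedule(attempt, 0, TimeUnit.MILLISECONDS);
        return done;
    }
}
```

The returned future lets the caller react when initialization finally succeeds, for example by switching the server out of maintenance mode.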
Moving to ON_QA for testing.
Verified on
Version: 3.2.0.ER4
Build Number: e413566:057b211

If the storage node is not running when the server is started, localhost:7080 displays 'Following subsystems had problems to start: storage'. Please refer to the screenshot. After the server has started, the login screen appears.

After starting, the server log shows the warning below:

18:26:31,195 WARN [org.rhq.enterprise.server.storage.StorageClientManagerBean] (RHQScheduler_Worker-4) Storage client subsystem wasn't initialized because it wasn't possible to connect to the storage cluster. The RHQ server is set to MAINTENANCE mode. Please start the storage cluster as soon as possible.

After starting the storage node:

19:00:40,079 INFO [org.rhq.enterprise.server.storage.StorageClientManagerBean] (RHQScheduler_Worker-2) Storage client subsystem is now initialized
19:01:06,753 INFO [org.rhq.enterprise.server.cloud.instance.ServerManagerBean] (EJB default - 9) Notified communication layer of server operation mode NORMAL
Created attachment 817922 [details] Screenshot