Bug 1024326

Summary: Unable to create second JON server without storage node on HA setup
Product: [JBoss] JBoss Operations Network
Component: High Availability
Reporter: Jeeva Kandasamy <jkandasa>
Assignee: Stefan Negrea <snegrea>
QA Contact: Mike Foley <mfoley>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: JON 3.2
CC: jkandasa, jsanda
Target Milestone: ER05
Target Release: JON 3.2.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Bug Blocks: 1012435
Attachments:
log file (flags: none)
Storage Node data in "rhq_system_config" (flags: none)

Description Jeeva Kandasamy 2013-10-29 12:11:23 UTC
Created attachment 817061 [details]
log file

Description of problem:
I'm unable to create a second JON server without a storage node on a JON HA setup.
The command I executed:
[jenkins@rhel6-vm bin]$ ./rhqctl install --server --agent

It throws this exception:

17:31:03,111 ERROR [org.rhq.enterprise.server.installer.InstallerServiceImpl] Could not complete storage cluster schema installation: All host(s) tried for query failed (tried: localhost/127.0.0.1 ([localhost/127.0.0.1] Cannot connect)): com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1 ([localhost/127.0.0.1] Cannot connect))

Version-Release number of selected component (if applicable):
JBoss Operations Network 
Version: 3.2.0.ER4
Build Number: e413566:057b211
GWT Version: 2.5.0
SmartGWT Version: 3.0p


How reproducible:
always

Steps to Reproduce:
1. Set up (install) a second JON server on an HA setup without a storage node

Additional info: A detailed log is attached.

Comment 2 John Sanda 2013-10-30 18:59:04 UTC
Installing a server without a co-located storage node is valid. The only two requirements are that each storage node is co-located with an agent and that each server can communicate with (via CQL) each storage node. I do not think that the latter holds in this case. In the log provided by Jeeva I see,


17:31:03,111 ERROR [org.rhq.enterprise.server.installer.InstallerServiceImpl] Could not complete storage cluster schema installation: All host(s) tried for query failed (tried: localhost/127.0.0.1 ([localhost/127.0.0.1] Cannot connect)): com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1 ([localhost/127.0.0.1] Cannot connect))


This indicates that the storage node is bound to localhost; consequently, only the first server, co-located with the storage node, can communicate with it.

Jeeva, take a look at rhq-storage-installer.log on your first server machine. You should see a warning message like,

"This Storage Node is bound to the loopback address <storage_address>. It will not be able to communicate with Storage Nodes on other machines, and it can only receive client requests from this machine."

If/when we confirm that this is the issue, then I think we can close this out.
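
The loopback condition described in this comment can be checked with the standard `java.net.InetAddress` API. The following is only a minimal sketch, not code from the RHQ installer: the class name `StorageAddressCheck` and the sample address `192.168.1.10` are hypothetical and chosen for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of RHQ: reports whether a storage node
// address is a loopback address and therefore unreachable from other hosts.
class StorageAddressCheck {

    // Returns true when the address (IP literal or host name) is loopback.
    // IP literals are parsed without a DNS lookup; host names are resolved.
    static boolean isLoopback(String address) throws UnknownHostException {
        return InetAddress.getByName(address).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // 192.168.1.10 is an assumed example of a routable address.
        for (String address : new String[] {"127.0.0.1", "192.168.1.10"}) {
            System.out.println(address + " loopback? " + isLoopback(address));
        }
    }
}
```

A storage node whose address passes this check would only accept client requests from its own machine, which matches the "Cannot connect" failure seen by the second server.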

Comment 3 John Sanda 2013-10-30 20:13:29 UTC
While I think Jeeva did have an environment issue, Stefan also found the problem in the server installer. The installer fetches the storage cluster ports from the database, but it does not fetch the storage node addresses. Stefan is working on the fix so I am reassigning to him. Note though that even with the fix, the storage node should not be using localhost.

Comment 4 Jeeva Kandasamy 2013-10-31 09:45:47 UTC
(In reply to John Sanda from comment #3)
> While I think Jeeva did have an environment issue, Stefan also found the
> problem in the server installer. The installer fetches the storage cluster
> ports from the database, but it does not fetch the storage node addresses.
> Stefan is working on the fix so I am reassigning to him. Note though that
> even with the fix, the storage node should not be using localhost.

True, the storage node IP is not stored in the "rhq_system_config" table. If I update the storage node IP manually, it resets to the localhost IP.

Comment 5 Jeeva Kandasamy 2013-10-31 09:48:20 UTC
Created attachment 817788 [details]
Storage Node data in "rhq_system_config"

Storage node details in the PostgreSQL database (table: rhq_system_config). The storage node IP is missing. If we have more than one storage node, we have to maintain all the nodes' IPs somehow. A screenshot is attached.

Comment 6 John Sanda 2013-10-31 11:38:00 UTC
Storage node endpoints are stored in the rhq_storage_node table. The endpoint addresses are visible from the storage node admin UI. Jeeva, can you please provide your rhq-storage-installer.log file.

Comment 7 Jeeva Kandasamy 2013-10-31 12:56:35 UTC
(In reply to John Sanda from comment #6)
> Storage node endpoints are stored in the rhq_storage_node table. The
> endpoint addresses are visible from the storage node admin UI. Jeeva, can
> you please provide your rhq-storage-installer.log file.

Yes, it's there; I just checked the rhq_storage_node table. I hit this issue when I don't enter the storage node IP in the rhq-server.properties file. As mentioned in comment #3, the storage node IP address should be taken automatically from the PostgreSQL database. Earlier I thought the storage node IP would also be in the 'rhq_system_config' table. If I provide the storage node IP in rhq-server.properties, it works fine.

Comment 8 Stefan Negrea 2013-10-31 14:29:10 UTC
The storage node information was not retrieved from the database like all the other storage cluster settings. Added code to retrieve the storage node addresses from the respective table as a comma-separated list.


release/jon3.2.x branch commit:
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=c8f4da3b4219226a28b54c8aa28bedec97321c85
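
For readers without access to the commit, the idea of the fix can be illustrated roughly as follows. This is a sketch under assumptions, not the code from the commit above: the class and method names are hypothetical, the addresses are assumed to have already been read from the storage node table, and the rule that an explicitly configured value takes precedence over the database is an assumption rather than confirmed installer behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the RHQ installer code.
class StorageNodeAddresses {

    // Joins the addresses into a comma-separated list,
    // trimming whitespace and skipping null or blank entries.
    static String toCommaSeparated(List<String> addresses) {
        List<String> cleaned = new ArrayList<>();
        for (String address : addresses) {
            if (address != null && !address.trim().isEmpty()) {
                cleaned.add(address.trim());
            }
        }
        return String.join(",", cleaned);
    }

    // Assumed precedence: use the configured value when one is present,
    // otherwise fall back to the addresses read from the database.
    static String resolve(String configured, List<String> fromDatabase) {
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        return toCommaSeparated(fromDatabase);
    }

    public static void main(String[] args) {
        // Example addresses are made up for illustration.
        List<String> fromDatabase = Arrays.asList("10.0.0.1", "10.0.0.2");
        System.out.println(resolve("", fromDatabase));
    }
}
```

With a fallback of this kind, a second server installed without a storage node entry in rhq-server.properties would still receive the addresses of the existing storage nodes instead of defaulting to localhost.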



Please retest ...

Comment 9 Simeon Pinder 2013-11-07 02:18:05 UTC
Moving to ON_QA for test with new brew build.

Comment 10 Jeeva Kandasamy 2013-11-08 13:29:05 UTC
Verified. On an HA setup, the second JON server takes the storage node IP(s) from the PostgreSQL database.

Version : 3.2.0.ER5
Build Number : 2cb2bc9:225c796
GWT Version : 2.5.0
SmartGWT Version : 3.0p