Bug 1024326 - Unable to create second JON server without storage node on HA setup
Status: CLOSED CURRENTRELEASE
Product: JBoss Operations Network
Classification: JBoss
Component: High Availability
Version: JON 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER05
Target Release: JON 3.2.0
Assigned To: Stefan Negrea
QA Contact: Mike Foley
Depends On:
Blocks: 1012435
Reported: 2013-10-29 08:11 EDT by Jeeva Kandasamy
Modified: 2014-01-02 15:39 EST (History)
CC: 2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---


Attachments
log file (10.53 KB, text/x-log)
2013-10-29 08:11 EDT, Jeeva Kandasamy
Storage Node data in "rhq_system_config" (69.23 KB, image/jpeg)
2013-10-31 05:48 EDT, Jeeva Kandasamy

Description Jeeva Kandasamy 2013-10-29 08:11:23 EDT
Created attachment 817061 [details]
log file

Description of problem:
I'm unable to create a second JON server without a storage node in a JON HA setup.
Command executed:
[jenkins@rhel6-vm bin]$ ./rhqctl install --server --agent

It throws this exception:

17:31:03,111 ERROR [org.rhq.enterprise.server.installer.InstallerServiceImpl] Could not complete storage cluster schema installation: All host(s) tried for query failed (tried: localhost/127.0.0.1 ([localhost/127.0.0.1] Cannot connect)): com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1 ([localhost/127.0.0.1] Cannot connect))

Version-Release number of selected component (if applicable):
JBoss Operations Network 
Version: 3.2.0.ER4
Build Number: e413566:057b211
GWT Version: 2.5.0
SmartGWT Version: 3.0p


How reproducible:
always

Steps to Reproduce:
1. Set up (install) a second JON server in an HA setup without a storage node

Additional info: A detailed log is attached.
Comment 2 John Sanda 2013-10-30 14:59:04 EDT
Installing a server without a co-located storage node is valid. The only two requirements are that each storage node is co-located with an agent and that each server can communicate with (via CQL) each storage node. I do not think that the latter holds in this case. In the log provided by Jeeva I see,


17:31:03,111 ERROR [org.rhq.enterprise.server.installer.InstallerServiceImpl] Could not complete storage cluster schema installation: All host(s) tried for query failed (tried: localhost/127.0.0.1 ([localhost/127.0.0.1] Cannot connect)): com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1 ([localhost/127.0.0.1] Cannot connect))


This indicates that the storage node is bound to localhost; consequently, only the first server, co-located with the storage node, can communicate with it.

Jeeva, take a look at rhq-storage-installer.log on your first server machine. You should see a warning message like,

"This Storage Node is bound to the loopback address <storage_address>. It will not be able to communicate with Storage Nodes on other machines, and it can only receive client requests from this machine."

If/when we confirm that this is the issue, then I think we can close this out.
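The loopback-bound configuration described above can be spotted directly in the storage node's cassandra.yaml. A minimal, self-contained sketch (the property values are illustrative and are written to a temp file here so the check runs anywhere; in a real installation you would grep the storage node's actual cassandra.yaml):

```shell
# Illustrative sample of a loopback-bound storage node config; a real
# installation would have these settings in the storage node's cassandra.yaml.
cat > /tmp/sample-cassandra.yaml <<'EOF'
listen_address: 127.0.0.1
rpc_address: 127.0.0.1
EOF

# A storage node bound this way only accepts CQL connections from its own
# machine, which matches the NoHostAvailableException in the installer log.
grep -E '^(listen_address|rpc_address):' /tmp/sample-cassandra.yaml
```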
Comment 3 John Sanda 2013-10-30 16:13:29 EDT
While I think Jeeva did have an environment issue, Stefan also found the problem in the server installer. The installer fetches the storage cluster ports from the database, but it does not fetch the storage node addresses. Stefan is working on the fix so I am reassigning to him. Note though that even with the fix, the storage node should not be using localhost.
Comment 4 Jeeva Kandasamy 2013-10-31 05:45:47 EDT
(In reply to John Sanda from comment #3)
> While I think Jeeva did have an environment issue, Stefan also found the
> problem in the server installer. The installer fetches the storage cluster
> ports from the database, but it does not fetch the storage node addresses.
> Stefan is working on the fix so I am reassigning to him. Note though that
> even with the fix, the storage node should not be using localhost.

True, the storage node IP is not stored in the "rhq_system_config" table. If I update the storage node IP manually, it resets to the localhost IP.
Comment 5 Jeeva Kandasamy 2013-10-31 05:48:20 EDT
Created attachment 817788 [details]
Storage Node data in "rhq_system_config"

Storage node details in the PostgreSQL database (table: rhq_system_config). The storage node IP is missing. If we have more than one storage node, we have to maintain all the nodes' IPs somehow. A screenshot is attached.
Comment 6 John Sanda 2013-10-31 07:38:00 EDT
Storage node endpoints are stored in the rhq_storage_node table. The endpoint addresses are visible from the storage node admin UI. Jeeva, can you please provide your rhq-storage-installer.log file?
Comment 7 Jeeva Kandasamy 2013-10-31 08:56:35 EDT
(In reply to John Sanda from comment #6)
> Storage node endpoints are stored in the rhq_storage_node table. The
> endpoint addresses are visible from the storage node admin UI. Jeeva, can
> you please provide your rhq-storage-installer.log file.

Yes, it's there; I just checked the rhq_storage_node table. I hit this issue when I did not enter the storage node IP in the rhq-server.properties file. As mentioned in comment #3, the storage node IP address should be taken automatically from the PostgreSQL database. I had assumed the storage node IP would also be in the 'rhq_system_config' table. If I provide the storage node IP in rhq-server.properties, it works fine.
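For reference, the workaround described above is to list the storage node endpoint(s) explicitly in rhq-server.properties before running rhqctl. A hedged sketch; the property names follow RHQ's rhq.storage.* settings and the addresses and port are illustrative, so check them against your own rhq-server.properties:

```properties
# Illustrative addresses; property names assumed from RHQ's rhq.storage.*
# settings in rhq-server.properties.
rhq.storage.nodes=10.0.0.5,10.0.0.6
rhq.storage.cql-port=9142
```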
Comment 8 Stefan Negrea 2013-10-31 10:29:10 EDT
The storage node information was not retrieved from the database like all the other storage cluster settings. Added code to retrieve the storage node addresses from the respective table as a comma-separated list.


release/jon3.2.x branch commit:
https://git.fedorahosted.org/cgit/rhq/rhq.git/commit/?h=release/jon3.2.x&id=c8f4da3b4219226a28b54c8aa28bedec97321c85



Please retest.
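The shape of the fix, as described above, is to join the per-node addresses read from the rhq_storage_node table into a single comma-separated list for the installer. A minimal shell sketch of just that joining step (the IPs are illustrative; the actual fix is Java code in the server installer, linked in the commit above):

```shell
# Simulated result set from the rhq_storage_node table (illustrative IPs).
addresses='10.0.0.5
10.0.0.6'

# Join the rows into the comma-separated list the installer now consumes.
echo "$addresses" | paste -sd, -
```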
Comment 9 Simeon Pinder 2013-11-06 21:18:05 EST
Moving to ON_QA for test with new brew build.
Comment 10 Jeeva Kandasamy 2013-11-08 08:29:05 EST
Verified. In the HA setup, the second JON server now takes the storage node IP(s) from the PostgreSQL database.

Version : 3.2.0.ER5
Build Number : 2cb2bc9:225c796
GWT Version : 2.5.0
SmartGWT Version : 3.0p
