Bug 1032308 - Storage node should output a warning from RhqInternodeAuthenticator if node not found
Status: NEW
Product: RHQ Project
Classification: Other
Component: Storage Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: RHQ Project Maintainer
QA Contact: Mike Foley
Depends On:
Blocks:
Reported: 2013-11-19 18:59 EST by Elias Ross
Modified: 2013-11-20 16:31 EST
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---


Attachments: None
Description Elias Ross 2013-11-19 18:59:34 EST
Description of problem:

I have spent a few hours trying to figure out why certain nodes were not connecting and my cluster could not start. It turns out that rhq-storage-auth.conf was missing nodes that needed to be listed there.

This was through a manual process.

The authenticator should output a WARN message when it fails to authenticate a node, perhaps limited to 2 or 3 occurrences so it does not flood the log.
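
A minimal sketch of that idea, assuming the authenticator implements Cassandra's IInternodeAuthenticator interface and already holds the set of addresses parsed from rhq-storage-auth.conf; the field names, logger, and MAX_WARNINGS constant are illustrative, not the actual RHQ code:

import java.net.InetAddress;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.cassandra.auth.IInternodeAuthenticator;
import org.apache.cassandra.exceptions.ConfigurationException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RhqInternodeAuthenticator implements IInternodeAuthenticator {
    private static final Logger log = LoggerFactory.getLogger(RhqInternodeAuthenticator.class);

    // Cap the number of WARN messages so a misconfigured node cannot flood the log.
    private static final int MAX_WARNINGS = 3;
    private final AtomicInteger warningsLogged = new AtomicInteger();

    // Addresses parsed from rhq-storage-auth.conf (loading code omitted in this sketch).
    private final Set<InetAddress> authorizedNodes = new CopyOnWriteArraySet<InetAddress>();

    @Override
    public boolean authenticate(InetAddress remoteAddress, int remotePort) {
        if (authorizedNodes.contains(remoteAddress)) {
            return true;
        }
        // Log only the first few rejections, keeping the hot path cheap.
        if (warningsLogged.getAndIncrement() < MAX_WARNINGS) {
            log.warn("Rejecting internode connection from " + remoteAddress
                + ": address is not listed in rhq-storage-auth.conf");
        }
        return false;
    }

    @Override
    public void validateConfiguration() throws ConfigurationException {
        // Validation of rhq-storage-auth.conf is omitted in this sketch.
    }
}

With something like the above, the log on the rejecting node would show which address was refused and why, instead of the connection failing silently.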


Version-Release number of selected component (if applicable): 4.9


How reproducible: Always


Steps to Reproduce:
1. Remove a host from rhq-storage-auth.conf (a sample of the file's contents is shown after these steps)
2. Attempt to start the cluster
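
For reference, rhq-storage-auth.conf is just a plain list of the storage node addresses allowed to join the cluster, one per line; the addresses below are made up for illustration:

192.168.100.11
192.168.100.12
192.168.100.13

Deleting any one of those lines (step 1 above) causes the corresponding node to be rejected by its peers.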

Actual results: Cluster can't start

Expected results: The cluster still fails to start, but a log message explains why.

Additional info:
Comment 1 John Sanda 2013-11-20 16:31:17 EST
RHQ updates the rhq-storage-auth.conf file when nodes are added to or removed from the cluster. The only time a user directly edits the file is when multiple storage nodes are deployed before the RHQ server is installed. With that said, it is entirely possible for the file to be incorrect.

We could certainly see about adding some logging, but we would want to keep it light and fast since the authenticator executes at the bottom of the C* stack in the messaging layer. More importantly, though, we need a comprehensive solution for when new nodes fail to join the cluster or when existing nodes cannot communicate with the cluster. The cluster status column in the storage node UI already addresses deployment scenarios: if a node's cluster status is DOWN, it can be assumed that the node is not part of the cluster. It does not, however, address post-deployment scenarios.
