Bug 1139228

Summary: [Nagios] Auto-config removes all the configuration if the host used for discovery is detached from the cluster.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: RamaKasturi <knarra>
Component: nagios-server-addons Assignee: Ramesh N <rnachimu>
Status: CLOSED ERRATA QA Contact: RamaKasturi <knarra>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.0 CC: asrivast, dpati, esammons, kmayilsa, knarra, psriniva, rnachimu, sharne, ssampat
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 3.0.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nagios-server-addons-0.1.8-1.el6rhs Doc Type: Bug Fix
Doc Text:
Previously, if the host that was used for discovery was detached from the Red Hat Storage Trusted Storage Pool, then all the hosts would get removed from the Nagios configuration when auto-discovery was performed. With this fix, the auto config service does not remove the configurations and it works as expected.
Story Points: ---
Clone Of: 1107998 Environment:
Last Closed: 2015-01-15 13:49:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1107998    
Bug Blocks: 1087818    

Description RamaKasturi 2014-09-08 12:47:37 UTC
+++ This bug was initially created as a clone of Bug #1107998 +++

Description of problem:
Auto configuration removes all the configurations (hosts, volumes, bricks) from Nagios if the host used for discovery is no longer part of the cluster.

Version-Release number of selected component (if applicable):
nagios-server-addons-0.1.5-1.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster with 3 nodes, and make sure that the hosts do not have unique host names.
2. Create some volumes and start them.
3. Run the discovery script, providing the name of the cluster and the IP of HostA.
4. Make sure all the volumes/hosts show up in the Nagios UI.
5. Detach HostA from the cluster using the "gluster peer detach" command.
6. Re-schedule the auto-config in the Nagios UI.

Actual results:
All hosts/volumes except HostA are removed from the Nagios configuration, and HostA adds itself as a new entity to the cluster.
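
For illustration only, the removal can be pictured as a set difference between what Nagios already has configured and what the sync host reports. This is a minimal sketch, not the discovery.py code: the function name diff_hosts and the IP addresses are hypothetical, and it assumes that a detached host reports only itself as a cluster member.

```python
def diff_hosts(configured, discovered):
    """Return (to_add, to_remove) as sets of host names."""
    configured, discovered = set(configured), set(discovered)
    return discovered - configured, configured - discovered


# Hypothetical addresses: Nagios knows the three nodes by IP.
configured = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}  # HostA, HostB, HostC

# Assumption: after "gluster peer detach", HostA sees only itself.
discovered_from_host_a = {"10.0.0.1"}

to_add, to_remove = diff_hosts(configured, discovered_from_host_a)
print(sorted(to_remove))  # every host except the sync host is marked for removal
```

A sync that trusts this result blindly would drop HostB and HostC even though they are still healthy cluster members.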

Expected results:
Hosts/volumes should not be removed from the Nagios configuration. The user should run the discovery by providing the IP of HostB.

Additional info:

--- Additional comment from Ramesh N on 2014-06-11 06:39:22 EDT ---

Patch sent upstream: http://review.gluster.org/#/c/8024/

--- Additional comment from Shruti Sampat on 2014-06-17 07:50:44 EDT ---

Verified as fixed in nagios-server-addons-0.1.4-1.el6rhs.x86_64

Performed the following steps -

1. Created a cluster of four nodes, host1, host2, host3 and host4.
2. Created a couple of volumes, and started them.
3. Configured this cluster to be monitored via nagios server, which is also one of the RHS nodes.
4. Removed host4 from the cluster using gluster peer detach command.
5. Attempted to run cluster auto-config service using the Nagios UI. Saw the status of the service change to critical with the following status information -

Can't remove all hosts except sync host in 'auto' mode. Run auto discovery manually

6. Ran auto-discovery at the nagios server using command line, manually -

# /usr/lib64/nagios/plugins/gluster/discovery.py -c cluster_auto -H host1

Cluster configurations changed

Changes :
Hostgroup cluster_auto - UPDATE
Host cluster_auto - UPDATE
         Service - Cluster Auto Config -UPDATE 
Host host4 - REMOVE
Are you sure, you want to commit the changes? (Yes, No) [Yes]: 
Cluster configurations synced successfully from host host1
Do you want to restart Nagios to start monitoring newly discovered entities? (Yes, No) [Yes]: 
Nagios re-started successfully

In the Nagios UI, the host host4 was removed, and the status of the cluster auto-config service changed to OK.

Comment 1 RamaKasturi 2014-09-08 12:50:15 UTC
Description of problem:
Auto configuration removes all the configurations (hosts, volumes, bricks) from Nagios if the host used for discovery is no longer part of the cluster.

Version-Release number of selected component (if applicable):
nagios-server-addons-0.1.5-1.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster with 3 nodes, and make sure that the hosts do not have unique host names.
2. Create some volumes and start them.
3. Run the discovery script, providing the name of the cluster and the IP of HostA.
4. Make sure all the volumes/hosts show up in the Nagios UI.
5. Detach HostA from the cluster using the "gluster peer detach" command.
6. Re-schedule the auto-config in the Nagios UI.

Actual results:
All hosts/volumes except HostA are removed from the Nagios configuration, and HostA adds itself as a new entity to the cluster.

Expected results:
Hosts/volumes should not be removed from the Nagios configuration. The user should run the discovery by providing the IP of HostB.

Comment 3 Shalaka 2014-09-20 11:15:59 UTC
Please review and sign-off edited doc text.

Comment 4 Ramesh N 2014-09-22 07:18:04 UTC
As I mentioned in the previous doc text, this issue is applicable only if the host names are not unique and the IP address is used as the host name in Nagios.

Comment 6 RamaKasturi 2014-11-17 13:06:24 UTC
Verified and works fine with build nagios-server-addons-0.1.8-1.el6rhs.noarch.

Performed the steps below:

Hostnames of the hosts in the cluster were not unique.

1) Had a cluster with three nodes A, B and C.

2) Ran the discovery.py script against Host A, using its IP address.

3) Detached Host A from the cluster by running 'gluster peer detach A'.

4) Auto config service status went to 'critical' with status information "Can't remove all hosts except sync host in 'auto' mode. Run auto discovery manually".

5) Ran discovery.py with Host B.

6) None of the hosts/volumes were removed from the cluster except Host A, and the Auto config service became 'OK' with status information 'Cluster configurations are in sync'.
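
The behaviour seen in steps 4 to 6 can be sketched as a guard that runs before any change is committed. This is only one literal reading of the status message, not the upstream patch: the names check_sync and CRITICAL_MSG, and the exact condition, are assumptions, and the real check in the add-on may be broader.

```python
CRITICAL_MSG = ("Can't remove all hosts except sync host in 'auto' mode. "
                "Run auto discovery manually")


def check_sync(configured, discovered, sync_host, mode):
    """Return (allowed, message) for a proposed sync; changes nothing itself."""
    to_remove = set(configured) - set(discovered)
    others = set(configured) - {sync_host}
    # Refuse when an unattended run would drop every host but the sync host.
    if mode == "auto" and others and to_remove >= others:
        return False, CRITICAL_MSG
    return True, ""


nodes = ["A", "B", "C"]
print(check_sync(nodes, ["A"], "A", "auto"))       # blocked: B and C would vanish
print(check_sync(nodes, ["B", "C"], "B", "auto"))  # allowed: only A is removed
```

Under this reading, a manual run is never blocked, which matches the report's advice to rerun discovery by hand against a host that is still in the cluster.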

Comment 7 Pavithra 2014-12-24 09:07:03 UTC
Hi Ramesh,

Can you please review the edited doc text for technical accuracy and sign off?

Comment 8 Ramesh N 2014-12-24 11:22:11 UTC
Instead of generically saying "auto config service works as expected", we could say that auto config won't remove the configurations.

Comment 9 Pavithra 2014-12-24 15:45:11 UTC
Thank you Ramesh. 
I modified the description.

Comment 10 Ramesh N 2014-12-24 19:55:58 UTC
Doc text looks good to me

Comment 12 errata-xmlrpc 2015-01-15 13:49:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0039.html