Description of problem:
I turned off 5 of the 6 nodes in a Gluster pool hosting the volume_beta_arbiter_2_plus_1x2 volume. I see alerts about quorum, sub-volumes, and hosts/peers, but no alerts about the status change of the bricks. If I do the same with only one host, I do see alerts about the brick status change.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-52.el7rhgs.x86_64
glusterfs-api-3.8.4-52.el7rhgs.x86_64
glusterfs-cli-3.8.4-52.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-52.el7rhgs.x86_64
glusterfs-events-3.8.4-52.el7rhgs.x86_64
glusterfs-fuse-3.8.4-52.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-52.el7rhgs.x86_64
glusterfs-libs-3.8.4-52.el7rhgs.x86_64
glusterfs-rdma-3.8.4-52.el7rhgs.x86_64
glusterfs-server-3.8.4-52.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
python-gluster-3.8.4-52.el7rhgs.noarch
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-3.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-3.el7rhgs.noarch
tendrl-node-agent-1.5.4-3.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch

How reproducible:
100%

Steps to Reproduce:
1. Set up a Gluster pool with an arbiter_2_plus_1x2 volume (6 peers).
2. Turn off 5 of the 6 nodes.
3. Check the SNMP, email, and UI alerts.

Actual results:
No alert about brick state change is raised when multiple nodes are off.

Expected results:
Alerts about the status change of all bricks on the nodes that are down are raised, even when alerts about quorum, sub-volume, or peer status change are also raised.
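For cross-checking step 3, brick state can also be read directly from any surviving node with the gluster CLI. The following is a minimal sketch (not part of the original report) that parses `gluster volume status <vol> --xml` and prints bricks reported offline; the XML element names reflect the gluster 3.8 CLI output as I understand it and may differ between versions.

    # Hypothetical helper for verification, not part of the report: list bricks
    # that the gluster CLI reports as offline, to compare against raised alerts.
    import subprocess
    import xml.etree.ElementTree as ET

    def offline_bricks(volume):
        """Return (hostname, path) pairs for bricks whose status is 0 (offline)."""
        xml_out = subprocess.check_output(
            ["gluster", "volume", "status", volume, "--xml"]
        )
        root = ET.fromstring(xml_out)
        down = []
        # Element names follow the gluster 3.8 CLI XML layout; adjust if your
        # version nests them differently.
        for node in root.iter("node"):
            host = node.findtext("hostname")
            path = node.findtext("path")
            status = node.findtext("status")
            if status == "0":
                down.append((host, path))
        return down

    if __name__ == "__main__":
        for host, path in offline_bricks("volume_beta_arbiter_2_plus_1x2"):
            print("brick down: %s:%s" % (host, path))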
Currently, when a node is down we do not raise an alert for brick down (or any other brick-level alert); we only raise peer-level alerts such as peer reject or peer disconnect. We need to add this feature.
I would like to mark this for a future release
This issue is fixed by https://github.com/Tendrl/gluster-integration/pull/549: a brick status alert is now raised when a peer disconnects or a node goes down.
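To give a feel for the approach without reading the PR, here is a minimal sketch in the spirit of the fix; all names (`handle_peer_down`, `raise_alert`, `bricks_by_host`) are hypothetical and do not correspond to the actual Tendrl/gluster-integration code.

    # Hypothetical sketch of the fix's idea: when a peer-disconnect / node-down
    # event is processed, also raise a brick status alert for every brick hosted
    # on that peer, instead of raising only the peer-level alert.
    def handle_peer_down(peer_hostname, bricks_by_host, raise_alert):
        # Peer-level alert (this behaviour existed before the fix).
        raise_alert(
            resource="peer",
            name=peer_hostname,
            message="Peer %s is disconnected" % peer_hostname,
        )
        # Brick-level alerts (the behaviour the fix adds).
        for brick_path in bricks_by_host.get(peer_hostname, []):
            raise_alert(
                resource="brick",
                name="%s:%s" % (peer_hostname, brick_path),
                message="Brick %s:%s is down" % (peer_hostname, brick_path),
            )

    # Example wiring: raise_alert could forward to SNMP, email, or the UI.
    handle_peer_down(
        "node5.example.com",
        {"node5.example.com": ["/gluster/brick1", "/gluster/brick2"]},
        lambda **kw: print(kw),
    )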
Alerting for brick status seems to work OK. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-3.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-4.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-2.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-4.el7rhgs.noarch
tendrl-notifier-1.6.3-2.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-1.el7rhgs.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616