Bug 1665361

Summary: Alerts for offline nodes
Product: [Community] GlusterFS Reporter: Nigel Babu <nigelb>
Component: project-infrastructure Assignee: bugs <bugs>
Status: CLOSED UPSTREAM QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: bugs, dkhandel, gluster-infra, mscherer
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-12 12:13:05 UTC Type: Bug

Description Nigel Babu 2019-01-11 06:57:04 UTC
I want a report that tells us which Jenkins nodes are offline and why. "Offline" here means offline as reported by Jenkins. Nodes fail fairly often, and it takes us a few weeks to get around to fixing them.

This bug covers both choosing a solution and implementing it.

Option 1: A Jenkins job which makes API calls and sends us an email if any machines are offline.

Option 2: A Nagios check which alerts us. This is slightly more explosive :)
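
For illustration, the API call that either option would need could look roughly like the sketch below. It assumes the standard Jenkins remote API endpoint /computer/api/json, which lists each node with "displayName", "offline" and "offlineCauseReason" fields. The base URL is a placeholder, the function names are hypothetical, and authentication (which many Jenkins instances require) is left out.

```python
import json
import urllib.request

# Placeholder URL, not the real Jenkins instance for this project.
JENKINS_URL = "https://jenkins.example.org"


def fetch_computers(base_url):
    """Fetch node status from the Jenkins /computer/api/json endpoint."""
    url = base_url.rstrip("/") + "/computer/api/json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def offline_nodes(payload):
    """Return (node name, reason) pairs for nodes Jenkins reports as offline."""
    result = []
    for node in payload.get("computer", []):
        if node.get("offline"):
            reason = node.get("offlineCauseReason") or "no reason recorded"
            result.append((node.get("displayName", "unknown"), reason))
    return result


def format_report(nodes):
    """Build a plain-text report suitable for an email body."""
    if not nodes:
        return "All Jenkins nodes are online."
    lines = ["Offline Jenkins nodes:"]
    lines += [f"  {name}: {reason}" for name, reason in nodes]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(offline_nodes(fetch_computers(JENKINS_URL))))
```

Run from a periodic Jenkins job, the printed report could be mailed by whatever notification step the job already uses.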

Comment 1 M. Scherer 2019-01-14 10:23:14 UTC
I suspect option 2 is not what we want. 

But yeah, Nagios does handle this quite well, doing notifications, etc. We would still need to write the basic script that does the API call anyway; the difference would be between "send an email" and "do an API call to Nagios to trigger an alert", and I think we could switch between them quite easily if needed.
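
A rough sketch of why switching would be easy: once the script has a list of offline nodes, only the last reporting step differs. The example below assumes the check runs as an ordinary Nagios plugin, which reports state through its exit code (0 for OK, 2 for CRITICAL) plus one line of output, rather than pushing results to Nagios through an API. The function names and the (name, reason) input format are hypothetical.

```python
import sys

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def nagios_result(offline):
    """Map a list of (node, reason) pairs to a Nagios exit code and status line."""
    if not offline:
        return OK, "OK - all Jenkins nodes online"
    details = ", ".join(f"{name} ({reason})" for name, reason in offline)
    return CRITICAL, f"CRITICAL - {len(offline)} node(s) offline: {details}"


def email_body(offline):
    """Render the same list as a plain-text email body instead."""
    if not offline:
        return "All Jenkins nodes are online."
    return "\n".join(f"{name}: {reason}" for name, reason in offline)


if __name__ == "__main__":
    # Hard-coded sample data; a real check would get this from the Jenkins API.
    sample = [("builder1", "disk full")]
    code, message = nagios_result(sample)
    print(message)
    sys.exit(code)
```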

Comment 2 sankarshan 2019-05-27 02:09:34 UTC
Is there any decision on whether Option 1 can be implemented? Deepshikha, can we have Naresh look into this?

Comment 3 Deepshikha khandelwal 2019-05-27 04:00:05 UTC
In my opinion, we should have this in Nagios rather than in an alerting Jenkins job. Nagios is already in place for the builders and alerts on things like memory failures. I don't currently receive its notifications (that's a different story), but it would be good to have a single source of alerting.

Naresh can look at the script if we agree on this.

Comment 5 Worker Ant 2020-03-12 12:13:05 UTC
This bug has been moved to https://github.com/gluster/project-infrastructure/issues/6 and will be tracked there from now on. Visit the GitHub issue URL for further details.