Bug 1665361 - Alerts for offline nodes
Summary: Alerts for offline nodes
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-01-11 06:57 UTC by Nigel Babu
Modified: 2020-03-12 12:13 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:13:05 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nigel Babu 2019-01-11 06:57:04 UTC
I want to have a report that tells us which Jenkins nodes are offline and why they're offline. This is offline in terms of Jenkins. We often have failures in a few nodes and it takes us a few weeks to get around to fixing them.

This bug covers both choosing a solution and implementing it.

Option 1: A Jenkins job which makes API calls and sends us an email in case there are machines offline.

Option 2: Nagios check which alerts us. This is slightly more explosive :)
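
Either option needs the same first step: asking Jenkins which nodes it considers offline and why. A minimal sketch of that step follows, assuming the standard Jenkins JSON endpoint /computer/api/json and its per-node fields displayName, offline and offlineCauseReason. The base URL and the function names are placeholders made up for illustration, and authentication is left out.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the real Jenkins instance.
JENKINS_URL = "https://jenkins.example.org"


def fetch_computers(base_url=JENKINS_URL):
    """Fetch node status from Jenkins' /computer/api/json endpoint."""
    url = f"{base_url}/computer/api/json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def offline_nodes(payload):
    """Return (node name, reason) pairs for every node Jenkins marks offline."""
    result = []
    for node in payload.get("computer", []):
        if node.get("offline"):
            # Jenkins leaves the reason empty when nobody recorded one.
            reason = node.get("offlineCauseReason") or "no reason recorded"
            result.append((node.get("displayName", "unknown"), reason))
    return result


def format_report(nodes):
    """Render the offline list as plain text for an email or a job log."""
    if not nodes:
        return "All Jenkins nodes are online."
    lines = [f"{len(nodes)} Jenkins node(s) offline:"]
    lines += [f"  {name}: {reason}" for name, reason in nodes]
    return "\n".join(lines)
```

A periodic job could then run print(format_report(offline_nodes(fetch_computers()))) and mail the output whenever the list is not empty.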

Comment 1 M. Scherer 2019-01-14 10:23:14 UTC
I suspect option 2 is not what we want. 

But yeah, Nagios does handle this quite well, doing notifications, etc. We would still need to write the basic script that does the API call anyway; the only difference would be between "send an email" and "do an API call to Nagios to trigger an alert", and I think we could switch between them quite easily if needed.
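
To illustrate how small the difference between the two outputs could be, here is a rough sketch in which the same offline-node data (a dict of node name to reason, assumed to come from the Jenkins API call) is turned either into a Nagios-style check result or into an email. It assumes the usual Nagios plugin convention of exit code 0 for OK and 2 for CRITICAL rather than a call to a Nagios API; the function names and addresses are hypothetical.

```python
from email.message import EmailMessage

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def as_nagios_check(offline):
    """Map offline nodes to a (exit code, status line) pair for a Nagios plugin."""
    if offline:
        names = ", ".join(sorted(offline))
        return CRITICAL, f"CRITICAL - offline Jenkins nodes: {names}"
    return OK, "OK - all Jenkins nodes online"


def as_email(offline, sender, recipient):
    """Build (but do not send) an email listing each offline node and its reason."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(offline)} Jenkins node(s) offline"
    msg["From"] = sender
    msg["To"] = recipient
    body = "\n".join(f"{name}: {reason}" for name, reason in sorted(offline.items()))
    msg.set_content(body)
    return msg
```

A Nagios plugin wrapper would print the status line and exit with the code, while the email variant would pass the message to smtplib; the data-gathering part stays the same in both cases.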

Comment 2 sankarshan 2019-05-27 02:09:34 UTC
Is there any decision on whether Option 1 can be implemented? Deepshikha, can we have Naresh look into this?

Comment 3 Deepshikha khandelwal 2019-05-27 04:00:05 UTC
In my opinion, we should have this in Nagios rather than in an alerting Jenkins job. Nagios is already in place for the builders to alert about memory failures and the like. I don't currently receive its notifications (that's a different story), but it would be good to have just one source of alerting.

Naresh can look at the script if we agree on this.

Comment 5 Worker Ant 2020-03-12 12:13:05 UTC
This bug has been moved to https://github.com/gluster/project-infrastructure/issues/6 and will be tracked there from now on. Visit the GitHub issue URL for further details.

