Bug 1679609 - RFE: ceph-dashboard should be able to send SNMP trap upon change of cluster status
Summary: RFE: ceph-dashboard should be able to send SNMP trap upon change of cluster s...
Keywords:
Status: CLOSED DUPLICATE of bug 1259160
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.1
Assignee: Paul Cuzner
QA Contact: Sunil Angadi
Docs Contact: Karen Norteman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-21 14:41 UTC by Matthias Muench
Modified: 2022-01-27 10:21 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-18 13:57:21 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 52708 | 0 | None | None | None | 2021-10-06 16:40:26 UTC
Github | ceph ceph pull 43274 | 0 | None | open | monitoring:Adding the Ceph MIB | 2021-10-06 16:40:26 UTC
Red Hat Issue Tracker | RHCEPH-1676 | 0 | None | None | None | 2021-09-13 07:50:14 UTC

Description Matthias Muench 2019-02-21 14:41:54 UTC
Description of problem:
Integrating Ceph with existing enterprise monitoring tools requires, at a minimum, the ability to send an SNMP trap to an SNMP trap destination server (or ideally to a list of several).

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
Integration of Ceph with existing enterprise monitoring systems is not possible because no SNMP trap is generated when the cluster status changes.


Expected results:
At a minimum, a change in cluster health from healthy to any other state should generate an SNMP trap, sent to a list of configured SNMP trap destination servers.
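The expected behaviour could be sketched roughly as follows. This is only an illustration of the request, not an existing Ceph or dashboard API: `notify_on_health_change` and the `send_trap` callback are hypothetical names, and actual trap delivery is left to whatever SNMP library is chosen. The health state names (HEALTH_OK, HEALTH_WARN, HEALTH_ERR) are the ones Ceph reports.

```python
from typing import Callable, Iterable

HEALTH_OK = "HEALTH_OK"  # Ceph health states: HEALTH_OK, HEALTH_WARN, HEALTH_ERR


def notify_on_health_change(
    previous: str,
    current: str,
    destinations: Iterable[str],
    send_trap: Callable[[str, str], None],
) -> int:
    """Request one trap per destination when the cluster leaves HEALTH_OK.

    send_trap(destination, message) is a hypothetical callback that would
    perform the real SNMP delivery. Returns the number of traps requested.
    """
    if previous != HEALTH_OK or current == HEALTH_OK:
        return 0  # only the healthy -> not-healthy transition is reported
    message = f"cluster health changed: {previous} -> {current}"
    sent = 0
    for destination in destinations:
        send_trap(destination, message)
        sent += 1
    return sent
```

Passing the delivery function in as a callback keeps the transition logic independent of the SNMP stack, so it can be exercised without a trap receiver.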


Additional info:
An alternative implementation would be at the ceph-mon/mgr level; however, this would require individual configuration for every Ceph cluster. Using the dashboard as the central point of monitoring could provide either a single configuration for all clusters (changes for all clusters reported to the same destinations, as an initial solution) or a more sophisticated setup with separate SNMP trap destinations per cluster, so that traps can be routed differently depending on assignment within organisations.
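The two configuration models described above (one destination list for all clusters, or per-cluster destinations) could look something like this minimal sketch. The function name, the cluster names and the host names are all hypothetical and only illustrate the lookup; nothing here reflects an existing dashboard setting.

```python
from typing import Dict, List


def destinations_for(
    cluster: str,
    default: List[str],
    overrides: Dict[str, List[str]],
) -> List[str]:
    """Return the trap destinations for a cluster.

    With an empty overrides mapping every cluster reports to the same
    default list (the "initial solution"); an entry in overrides sends
    that cluster's traps elsewhere.
    """
    return list(overrides.get(cluster, default))


# Hypothetical example configuration.
default_destinations = ["nms1.example.com", "nms2.example.com"]
per_cluster = {"cluster-b": ["team-b-nms.example.com"]}
```

Here `destinations_for("cluster-a", default_destinations, per_cluster)` falls back to the shared list, while `cluster-b` gets its own destination.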

Comment 1 Ernesto Puerta 2019-03-27 17:30:02 UTC
An approach explored in the past consisted of:
-  ceph-mgr ==> Prometheus exporter ==> Prometheus ==> Prometheus AlertManager ==> HTTP Webhook API ==> Prometheus SNMPTrapper Webhook (https://github.com/chrusty/prometheus_webhook_snmptrapper)

However, that latter project has shown no activity for two years. On the other hand, another webhook integration (https://github.com/maxwo/snmp_notifier) has recently been released. Both rely on the Net-SNMP stack.
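For illustration, the webhook end of such a pipeline might do something like the following. This is a sketch under assumptions, not the implementation of either project above: `TrapMessage` and `alerts_to_trap_messages` are hypothetical names, the field selection is arbitrary, and sending the trap itself is omitted. It relies only on the general shape of the Alertmanager webhook payload (a JSON object with an "alerts" list whose entries carry "status", "labels" and "annotations").

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TrapMessage:
    """Hypothetical intermediate form of one alert, ready to be sent as a trap."""
    alert_name: str
    severity: str
    status: str
    summary: str


def alerts_to_trap_messages(body: str) -> List[TrapMessage]:
    """Convert an Alertmanager webhook request body into trap messages."""
    payload = json.loads(body)
    messages = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        messages.append(
            TrapMessage(
                alert_name=labels.get("alertname", "unknown"),
                severity=labels.get("severity", "unknown"),
                # Fall back to the group-level status if the alert has none.
                status=alert.get("status", payload.get("status", "unknown")),
                summary=annotations.get("summary", ""),
            )
        )
    return messages
```

Each resulting message would then be handed to an SNMP library to build and send the actual trap.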

That said, Ceph-Dashboard is not strictly required for this. However, the current upstream approach is to expose AlertManager in Dashboard, so technically we could reserve a place there for the UI.

Pros:
- No code changes required in Ceph, as long as all metrics to be sent as 'traps' are already exported to Prometheus.
- Prometheus and Alertmanager are already building blocks.
- No major reliability caveats, given that SNMP traps should not be relied on alone anyway when reliability is a key concern.

Cons:
- Complexity moved to deployment/configuration stage.
- No FOSS License assessment performed yet on those projects.
- Both projects seem to have marginal community adoption/response (little or no record of issue or bug-fixing activity), which leaves a big question mark over code and SNMP implementation quality.

Comment 3 Giridhar Ramaraju 2019-08-20 06:58:03 UTC
Level-setting the severity of this defect to "High" with a bulk update. Please refine it to a closer value, as defined by the severity definition in https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity

Comment 12 Sebastian Wagner 2022-01-18 12:47:41 UTC
backport pr: https://github.com/ceph/ceph/pull/44529

Comment 13 Sebastian Wagner 2022-01-18 13:57:21 UTC

*** This bug has been marked as a duplicate of bug 1259160 ***

