Bug 1679609

Summary: RFE: ceph-dashboard should be able to send SNMP trap upon change of cluster status
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Matthias Muench <mmuench>
Component: Cephadm Assignee: Paul Cuzner <pcuzner>
Status: CLOSED DUPLICATE QA Contact: Sunil Angadi <sangadi>
Severity: high Docs Contact: Karen Norteman <knortema>
Priority: high    
Version: 3.2 CC: anharris, ceph-eng-bugs, epuertat, flucifre, gsitlani, jelopez, mmuench, pcuzner, rlepaksh, rmandyam, sangadi, sewagner, tserlin, vereddy
Target Milestone: --- Keywords: FutureFeature
Target Release: 5.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-18 13:57:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Matthias Muench 2019-02-21 14:41:54 UTC
Description of problem:
Integrating Ceph with existing enterprise monitoring tools requires, at a minimum, the ability to send an SNMP trap to an SNMP trap destination server (or, ideally, to a list of several).

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
Integration of Ceph with existing enterprise monitoring systems is not possible because no SNMP trap is generated on status changes.


Expected results:
At a minimum, a change in cluster health from healthy to any other state should generate an SNMP trap that is sent to a list of configured SNMP trap destination servers.
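
A minimal sketch of the expected behaviour, assuming the health values HEALTH_OK, HEALTH_WARN and HEALTH_ERR that Ceph reports. The function name, the message format and the destination addresses are hypothetical. The sketch only decides which notifications to emit; encoding and sending the actual SNMP trap is left out.

```python
from typing import List, Tuple

HEALTHY = "HEALTH_OK"  # Ceph's healthy state; the others are HEALTH_WARN and HEALTH_ERR


def traps_for_transition(previous: str, current: str,
                         destinations: List[str]) -> List[Tuple[str, str]]:
    """Return (destination, message) pairs for a health change away from healthy.

    Hypothetical helper: nothing is sent here. The caller would hand each
    pair to whatever SNMP trap sender is in use.
    """
    if previous != HEALTHY or current == HEALTHY:
        return []  # only healthy -> non-healthy transitions produce a trap
    message = f"cluster health changed: {previous} -> {current}"
    return [(dest, message) for dest in destinations]


# Hypothetical trap receivers (documentation-range addresses).
pending = traps_for_transition("HEALTH_OK", "HEALTH_WARN",
                               ["192.0.2.10:162", "192.0.2.11:162"])
for dest, msg in pending:
    print(dest, msg)
```

Every configured destination receives the same message, which matches the "list of destinations" requirement without assuming anything about the trap format itself.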


Additional info:
An alternative implementation would be at the ceph-mon/mgr level; however, this would require individual configuration for every Ceph cluster. Using the dashboard as the central point of monitoring could provide either one configuration for all clusters (changes for all clusters reported to the same destinations, as an initial solution) or a more sophisticated setup that separates SNMP trap destinations per cluster, so that traps can be routed according to how clusters are assigned within an organisation.
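
The two configuration options described above (one shared destination list, or per-cluster destinations) could be modelled roughly as follows. This is only a sketch: the function name, cluster names and addresses are hypothetical and do not correspond to any existing dashboard setting.

```python
from typing import Dict, List


def destinations_for(cluster: str,
                     per_cluster: Dict[str, List[str]],
                     default: List[str]) -> List[str]:
    """Pick trap destinations for a cluster, falling back to the shared list."""
    return per_cluster.get(cluster, default)


# Hypothetical configuration: one shared list plus an override for one cluster.
default_destinations = ["192.0.2.10:162"]
per_cluster_destinations = {"cluster-b": ["198.51.100.5:162", "198.51.100.6:162"]}

print(destinations_for("cluster-a", per_cluster_destinations, default_destinations))
print(destinations_for("cluster-b", per_cluster_destinations, default_destinations))
```

With an empty per-cluster mapping this reduces to the "one config for all" initial solution.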

Comment 1 Ernesto Puerta 2019-03-27 17:30:02 UTC
An approach explored in the past consisted of:
-  ceph-mgr ==> Prometheus exporter ==> Prometheus ==> Prometheus AlertManager ==> HTTP Webhook API ==> Prometheus SNMPTrapper Webhook (https://github.com/chrusty/prometheus_webhook_snmptrapper)

However, the latter project has shown no activity for two years. On the other hand, another webhook integration (https://github.com/maxwo/snmp_notifier) was released recently. Both rely on the Net-SNMP stack.
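
The last hop of the chain above, from the Alertmanager webhook to a trap sender, can be sketched as follows. The payload fields used (`alerts`, `status`, `labels`, `annotations`) follow Alertmanager's webhook JSON format. The function name, the sample alert names and the output record layout are hypothetical, and this is not how either linked project is implemented. Sending the actual trap is left out.

```python
import json
from typing import Dict, List


def firing_alerts(payload: dict) -> List[Dict[str, str]]:
    """Reduce an Alertmanager webhook payload to one record per firing alert."""
    records = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts are skipped in this sketch
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        records.append({
            "name": labels.get("alertname", "unknown"),
            "severity": labels.get("severity", "unknown"),
            "summary": annotations.get("summary", ""),
        })
    return records


# Hypothetical webhook body, trimmed to the fields used above.
body = json.loads("""
{
  "status": "firing",
  "alerts": [
    {"status": "firing",
     "labels": {"alertname": "CephHealthWarning", "severity": "warning"},
     "annotations": {"summary": "Cluster is in HEALTH_WARN"}},
    {"status": "resolved",
     "labels": {"alertname": "CephHealthError", "severity": "critical"},
     "annotations": {"summary": "Cluster is in HEALTH_ERR"}}
  ]
}
""")
for record in firing_alerts(body):
    print(record)
```

Each record would then be handed to an SNMP trap sender; which alerts exist depends entirely on the Prometheus alerting rules that are deployed.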

That said, Ceph-Dashboard is not strictly required for this. However, the current upstream approach is to expose AlertManager in the Dashboard, so technically we could reserve a place there for a UI.

Pros:
- No code changes required in Ceph, as long as all metrics to send as 'traps' are already exported to Prometheus.
- Prometheus and Alertmanager are already building blocks.
- No major reliability caveats, given that SNMP traps should not be used (alone) anyway if reliability is a key concern.

Cons:
- Complexity moved to deployment/configuration stage.
- No FOSS License assessment performed yet on those projects.
- Both projects seem to have marginal community adoption/response (little or no track record of issue or bug-fixing activity), which leaves a big question mark over code and SNMP implementation quality.

Comment 3 Giridhar Ramaraju 2019-08-20 06:58:03 UTC
Level-setting the severity of this defect to "High" with a bulk update. Please refine it to a closer value, as defined by the severity definition in https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity

Comment 12 Sebastian Wagner 2022-01-18 12:47:41 UTC
backport pr: https://github.com/ceph/ceph/pull/44529

Comment 13 Sebastian Wagner 2022-01-18 13:57:21 UTC

*** This bug has been marked as a duplicate of bug 1259160 ***