Created attachment 1356901
full snmptrapd.log file

Description of problem
======================

A single MIB structure (or a small subset of one) is used in all SNMP trap
messages that tendrl-notifier sends for alerts.

This seems to be implemented as planned in upstream, judging by the
snmp spec[1]:

> design and implementation of a browseable MIB or MIB for Tendrl
> is not currently planned;
> this will allow us to translate the potentially large number of Tendrl
> events and alerts into a very limited number of named element types
> third-party trap handlers may be required to perform some parsing on
> the generalized trap messages (we’ve found precedent for this in, e.g.,
> OpenNMS and Nagios),

The SNMP trap message structure is not documented in upstream.

I'm creating this BZ to validate this design wrt downstream requirements.

[1] https://github.com/Tendrl/specifications/issues/185

Version-Release
===============

tendrl-notifier-1.5.4-2.el7rhgs.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHGS WA using tendrl-ansible
2. Configure alerting to send events via snmp
3. Import gluster trusted storage pool with a volume
4. Stop and start again glusterd on some storage node
5. Stop and start again the whole volume
6. Check the snmp trap messages you have received, eg:
   journalctl -u snmptrapd > snmptrapd.log

When the qe playbook for alerting test setup is used:

https://github.com/usmqe/usmqe-setup/blob/master/test_setup.snmp.yml

one can check incoming snmp trap messages via:

# journalctl -u snmptrapd -fe

Actual results
==============

Checking the snmptrapd log file, all alerts use the same MIB structure:

> 'DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (0) 0:00:00.00 SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-MIB::coldStart SNMPv2-SMI::private.2312.19.1.0 = STRING:

As can be demonstrated here:

```
$ grep '\[Tendrl Alert\]' snmptrapd.log | wc -l
38
$ grep 'DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (0) 0:00:00.00 SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-MIB::coldStart SNMPv2-SMI::private.2312.19.1.0 = STRING:' snmptrapd.log | wc -l
38
```

To provide full context for the example shown above: among these 38
messages, we have only 5 types of alerts:

```
$ grep 'STRING:' snmptrapd.log | sed 's/.*STRING: //' | sed 's/\(INFO\|WARNING\).*//' | sort | uniq
"[Tendrl Alert] Afr Quorum State,
"[Tendrl Alert] Afr Subvol State,
"[Tendrl Alert] Brick Status,
"[Tendrl Alert] Peer Status,
"[Tendrl Alert] Volume Status,
```

Expected results
================

?

Additional info
===============

The options we have are to:

* consider whether we need to document the structure of the snmp trap
  messages we are sending (especially if the structure of a trap message
  is different in some cases)
* consider whether we need to change the structure a bit, without creating
  our own MIB
* plan development of our own MIB for the next version
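Since every trap shares the same OID, a third-party handler has to tell alerts apart by parsing the STRING payload, as the spec quote anticipates. The following is a minimal Python sketch of that parsing, mirroring the grep/sed pipeline above. The function name `count_alert_types` is hypothetical, and the regular expression assumes that every payload starts with `[Tendrl Alert] <type>,` as seen in the log excerpts; other payload layouts are not handled.

```python
import re
from collections import Counter

# Assumption: the payload looks like 'STRING: "[Tendrl Alert] <type>, ...',
# which is the only layout visible in the snmptrapd.log excerpts.
ALERT_RE = re.compile(r'STRING: "?\[Tendrl Alert\] ([^,]+),')


def count_alert_types(lines):
    """Count trap messages per alert type in snmptrapd log lines."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            # The captured group is the text between the tag and the first comma.
            counts[match.group(1).strip()] += 1
    return counts
```

Passing an open `snmptrapd.log` file object to `count_alert_types` would give the per-type totals that the shell pipeline only lists as unique names.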
Asking Luboš to provide additional information and insight, as he originally suggested to check this.
For sure, if we want to have SNMP correctly implemented, we should also have a proper MIB and OIDs, where:

MIB stands for Management Information Base and is a collection of definitions that describe the properties of the managed objects within the device to be managed. MIB files are written in an independent format and the object information they contain is organized hierarchically. The various pieces of information can be accessed by SNMP.

OIDs, or Object Identifiers, uniquely identify managed objects in the MIB.

Of course anyone can send an snmp trap without deeper knowledge. In other words, use some arbitrary ID, as that's a mandatory attribute, and just put a different message there for different traps. But that's not how things should look.
If you look at the logged snmp traps, some MIB was used, because one has to be. SNMPv2-MIB.mib was used, and so all web admin traps are "SNMPv2-MIB::coldStart" (OID 1.3.6.1.6.3.1.1.5.1), which means:

1 iso
3 identified-organization, org, iso-identified-organization
6 dod
1 internet
6 snmpV2
3 snmpModules
1 snmpAlarmNextIndex, snmpMIB
1 snmpMIBObjects
5 snmpTraps
1 coldStart

DESCRIPTION
"A coldStart trap signifies that the SNMP entity, supporting a notification originator application, is reinitializing itself and that its configuration may have been altered."

Shouldn't the OID be something under '1.3.6.1.4.1.2312'?

1 iso
3 identified-organization, org, iso-identified-organization
6 dod
1 internet
4 private
1 enterprise, enterprises
2312 RedHat Software

For the upcoming release we could choose an OID from the existing ones; it's clear we should choose a more appropriate one. It still will not be good, as all traps will have the same OID, which they should not, because then it's not easy to process them. But at least it will make more sense than it does now.

I admit I am not sure what exactly SNMPv2-SMI::private.2312.19.1.0 means. However, if 2312 here stands for 'RedHat Software', then it should be 'enterprises.2312', not 'private.2312'.
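The OID reasoning above can be checked mechanically. The Python sketch below treats OIDs as tuples of arcs and tests whether an OID lies under the Red Hat enterprise subtree 1.3.6.1.4.1.2312. The helper names `parse_oid` and `is_under` are hypothetical; the only facts assumed beyond this report are the standard SNMPv2-SMI assignments `private` = 1.3.6.1.4 and `enterprises` = 1.3.6.1.4.1.

```python
# Standard SNMPv2-SMI subtrees, written as tuples of arcs.
PRIVATE = (1, 3, 6, 1, 4)
ENTERPRISES = PRIVATE + (1,)
REDHAT = ENTERPRISES + (2312,)  # 1.3.6.1.4.1.2312, as listed in this report


def parse_oid(text):
    """Turn a dotted OID string such as '1.3.6.1' into a tuple of ints."""
    return tuple(int(arc) for arc in text.strip(".").split("."))


def is_under(oid, prefix):
    """Return True if oid equals prefix or lies below it in the OID tree."""
    return oid[:len(prefix)] == prefix


# Trap OID used for every alert: SNMPv2-MIB::coldStart.
cold_start = parse_oid("1.3.6.1.6.3.1.1.5.1")

# Payload OID from the log, SNMPv2-SMI::private.2312.19.1.0, expanded.
payload = PRIVATE + parse_oid("2312.19.1.0")

print(is_under(cold_start, REDHAT))  # False: coldStart lives under snmpV2
print(is_under(payload, REDHAT))     # False: the 'enterprises' arc (1) is missing
```

Under these assumptions, neither OID seen in the traps falls under 1.3.6.1.4.1.2312, which is consistent with the observation that 'private.2312' is not the same subtree as 'enterprises.2312'.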
Proposing to address this in a future release. Moving this out
This would incur a huge development effort. Looking at the current scope, I don't see this getting implemented in the near future. Closing the BZ.