Bug 1388118

Summary: snmpd starts before the network stack is online (network.target vs network-online.target)
Product: Red Hat Enterprise Linux 7 Reporter: Sven Hoexter <sven>
Component: net-snmp Assignee: Josef Ridky <jridky>
Status: CLOSED ERRATA QA Contact: David Jež <djez>
Severity: low Docs Contact:
Priority: unspecified    
Version: 7.2 CC: djez, mike, msekleta, ovasik, sven
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: net-snmp-5.7.2-44.el7 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-31 19:54:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
journalctl Boot Log Including snmptrapd Failure with "debug" none

Description Sven Hoexter 2016-10-24 14:13:17 UTC
Description of problem:
Hi,
Since we migrated a number of our systems to faster storage, we have been encountering a race condition between the network stack coming online and snmpd starting up.
If snmpd is faster than the network, it will hang or fail to bind to
all interfaces (not yet 100% sure; I am still collecting information).

The caveat here is that we're not using NetworkManager but the old
ifcfg script stack. I'm now experimenting with a systemd drop-in to
depend on network-online.target instead of network.target.
# cat /etc/systemd/system/snmpd.service.d/afternetworkonline.conf 
[Unit]
Wants=network-online.target
After=network-online.target


So from my point of view it might be a good idea for the snmpd unit
file to depend on network-online.target instead of network.target in
general, since it does not work properly if it's starting before the
network stack is online.
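
The workaround above can be sketched as a small shell helper. This is only an illustration: write_dropin and its optional second argument (a base directory, so the helper can be tried outside /etc) are hypothetical names and not part of net-snmp or systemd. The file name and unit contents are the ones quoted in this report.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that orders a unit after network-online.target.
# write_dropin and its optional base-directory argument are hypothetical;
# on a real host the base directory is /etc/systemd/system.
write_dropin() {
    unit="$1"                          # e.g. snmpd.service
    base="${2:-/etc/systemd/system}"   # where drop-in directories live
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # Wants= pulls the target into the boot transaction,
    # After= only orders the unit behind it.
    cat > "$dir/afternetworkonline.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# Usage on a live system (as root), then reload and inspect the ordering:
#   write_dropin snmpd.service
#   systemctl daemon-reload
#   systemctl show -p After snmpd.service
```

After the reload, network-online.target should appear in the After= list printed by systemctl show.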

Version-Release number of selected component (if applicable):
net-snmp-5.7.2-24.el7_2.1.x86_64

How reproducible:
Store your VMs on very fast storage, reboot a few times, then try to query snmpd over the network.


Actual results:
snmpd does not answer.


Expected results:
It just works. :)

Comment 2 Michael Chinn 2017-02-07 23:51:38 UTC
We have a similar issue when attempting to limit snmptrapd to a single interface. Based on the logs below, it appears that snmptrapd performs a DNS lookup on all snmpTrapAddr arguments and fails if one of them cannot be resolved. Manually starting snmptrapd with the same configuration from the console works as expected. Removing the snmpTrapAddr configuration option from /etc/snmp/snmptrapd.conf allows the service to start. Adding an "After=network-online.target" drop-in (no "Wants=") works as expected over a half-dozen boots.

We've elected to use the snmptrapd default (all IPv4 interfaces) and limit traffic via the firewall, but others may benefit from delaying snmptrapd startup.

-- Reboot --
Feb 07 17:23:11 anon systemd[1]: Starting Simple Network Management Protocol (SNMP) Trap Daemon....
Feb 07 17:23:11 anon snmptrapd[969]: getaddrinfo("192.168.10.1", NULL, ...): Address family for hostname not supported
Feb 07 17:23:11 anon snmptrapd[969]: couldn't open 192.168.10.1 -- errno 99 ("Cannot assign requested address")
Feb 07 17:23:11 anon systemd[1]: snmptrapd.service: main process exited, code=exited, status=1/FAILURE
Feb 07 17:23:11 anon systemd[1]: Failed to start Simple Network Management Protocol (SNMP) Trap Daemon..
Feb 07 17:23:11 anon systemd[1]: Unit snmptrapd.service entered failed state.
Feb 07 17:23:11 anon systemd[1]: snmptrapd.service failed.
Feb 07 17:24:50 anon systemd[1]: Starting Simple Network Management Protocol (SNMP) Trap Daemon....
Feb 07 17:24:50 anon snmptrapd[1373]: NET-SNMP version 5.7.2
Feb 07 17:24:50 anon systemd[1]: Started Simple Network Management Protocol (SNMP) Trap Daemon..
Feb 07 18:26:05 anon snmptrapd[1373]: 2017-02-07 18:26:05 NET-SNMP version 5.7.2 Stopped.
Feb 07 18:26:05 anon snmptrapd[1373]: Stopping snmptrapd
Feb 07 18:26:05 anon systemd[1]: Stopping Simple Network Management Protocol (SNMP) Trap Daemon....
Feb 07 18:26:06 anon systemd[1]: Stopped Simple Network Management Protocol (SNMP) Trap Daemon..
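
The errno 99 in the log above is EADDRNOTAVAIL on Linux, which is consistent with 192.168.10.1 not yet being assigned to any interface at 17:23:11. A rough way to check for that condition is sketched below. has_addr is a hypothetical helper name, and the sketch assumes the one-line format printed by `ip -o -4 addr show`, where the fourth field is ADDRESS/PREFIX.

```shell
#!/bin/sh
# Sketch: succeed only if the given IPv4 address appears in
# `ip -o -4 addr show` output supplied on stdin.
# has_addr is a hypothetical helper name.
has_addr() {
    awk -v want="$1" '
        {
            split($4, parts, "/")        # field 4 is ADDRESS/PREFIX
            if (parts[1] == want) found = 1
        }
        END { exit !found }              # exit 0 when found, 1 otherwise
    '
}

# Usage on a live system:
#   ip -o -4 addr show | has_addr 192.168.10.1 && echo "address is configured"
```

Running this check early in boot and again after network.service finishes would show whether the bind address was simply not configured yet when snmptrapd started.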

Comment 5 Josef Ridky 2017-02-08 13:27:26 UTC
May I ask you for a debug log from an unmodified RHEL 7 system (one without the modifications described in the previous comments)?

To obtain this log, add 'debug' to the kernel command line 's/test/ten'

Comment 6 Josef Ridky 2017-02-08 14:22:26 UTC
See comment #5.

Comment 7 Sven Hoexter 2017-02-10 16:10:44 UTC
Hi Josef,
At the moment I cannot reproduce the issue. I know that it was a race condition before we put the workaround in place. I also still have a journal excerpt in an internal ticket that shows the issue:

Oct 13 12:18:29 voc01 snmpd[628]: NET-SNMP version 5.7.2
Oct 13 12:18:29 voc01 systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..
Oct 13 12:18:30 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave/: Can't contact LDAP server: Connection refused
Oct 13 12:18:30 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave1/: Can't contact LDAP server: Connection refused
Oct 13 12:18:30 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave2/: Can't contact LDAP server: Connection refused
Oct 13 12:18:30 voc01 nslcd[659]: [8b4567] <group/member="root"> no available LDAP server found, sleeping 1 seconds
Oct 13 12:18:31 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave/: Can't contact LDAP server: Connection refused
Oct 13 12:18:31 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave1/: Can't contact LDAP server: Connection refused
Oct 13 12:18:31 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave2/: Can't contact LDAP server: Connection refused
Oct 13 12:18:31 voc01 nslcd[659]: [8b4567] <group/member="root"> no available LDAP server found, sleeping 1 seconds
Oct 13 12:18:31 voc01 network[627]: Bringing up interface eth0:  [  OK  ]

If I can reproduce it somehow, I will try to gather debug output. But maybe Michael Chinn can help out in the meantime.

Comment 8 Michael Chinn 2017-02-10 22:34:28 UTC
Created attachment 1249171
journalctl Boot Log Including snmptrapd Failure with "debug"

Search for "getaddrinfo" to locate the snmptrapd start failure.
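
One way to do that search is sketched below, assuming the attachment has been saved to a local file. find_trapd_failure and the file name boot.log are hypothetical. The -A option is a GNU/BSD grep extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: print each getaddrinfo line, with its line number, plus the line
# that follows it, from a saved journal file.
# find_trapd_failure is a hypothetical helper name.
find_trapd_failure() {
    grep -n -A 1 'getaddrinfo' "$1"
}

# Usage, assuming the attachment was saved as boot.log (hypothetical name):
#   find_trapd_failure boot.log
# Or directly against the journal of the current boot:
#   journalctl -b -u snmptrapd.service --no-pager | grep -n -A 1 getaddrinfo
```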

Comment 9 Josef Ridky 2019-01-10 11:55:40 UTC
Move to RHEL-7.8

Comment 15 errata-xmlrpc 2020-03-31 19:54:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1081