Description of problem:
Hi, since we migrated a number of our systems to faster storage, we have been hitting a race condition between the network stack coming online and snmpd starting up. If snmpd is faster than the network, it will hang or fail to bind to all interfaces (I am not yet 100% sure which; I am still collecting information). The caveat here is that we are not using NetworkManager but the old ifcfg script stack.

I'm now experimenting with a systemd drop-in to depend on network-online.target instead of network.target.

# cat /etc/systemd/system/snmpd.service.d/afternetworkonline.conf
[Unit]
Wants=network-online.target
After=network-online.target

So from my point of view it might be a good idea for the snmpd unit file to depend on network-online.target instead of network.target in general, since snmpd does not work properly if it starts before the network stack is online.

Version-Release number of selected component (if applicable):
net-snmp-5.7.2-24.el7_2.1.x86_64

How reproducible:
Store your VMs on very fast storage, reboot a few times, and then try to query snmpd from the network.

Actual results:
snmpd does not answer.

Expected results:
It just works. :)
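For anyone who wants to try the same workaround, the drop-in from the description can be created with a few shell commands. This is only a minimal sketch of the reporter's workaround, not the fix that was eventually shipped. The `write_dropin` helper and the `DROPIN_DIR` variable are hypothetical names introduced here, and by default the script writes to a scratch directory so it can be run without root.

```shell
#!/bin/sh
# Sketch: write the drop-in from this report into a given directory.
# write_dropin and DROPIN_DIR are illustrative names, not part of net-snmp.
write_dropin() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/afternetworkonline.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# Default to a scratch directory; on a real system, set
# DROPIN_DIR=/etc/systemd/system/snmpd.service.d and run as root.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)/snmpd.service.d}"
write_dropin "$DROPIN_DIR"
cat "$DROPIN_DIR/afternetworkonline.conf"

# After writing to the real location, systemd has to re-read its units:
#   systemctl daemon-reload
#   systemctl show -p After -p Wants snmpd.service
```

The `systemctl show` call at the end is one way to confirm that network-online.target now appears in the unit's ordering and dependency lists.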
We have a similar issue when attempting to limit snmptrapd to a single interface. Based on the logs below, it appears that snmptrapd performs a DNS lookup on all snmpTrapAddr arguments and fails if one of them cannot be resolved. Manually starting snmptrapd with the same configuration from the console works as expected. Removing the snmpTrapAddr configuration option from /etc/snmp/snmptrapd.conf allows the service to start. Adding an "After=network-online.target" drop-in (no "Wants=") works as expected over a half-dozen boots.

We've elected to use the snmptrapd default (all IPv4 interfaces) and limit traffic via the firewall, but others may benefit from delaying snmptrapd startup.

-- Reboot --
Feb 07 17:23:11 anon systemd[1]: Starting Simple Network Management Protocol (SNMP) Trap Daemon....
Feb 07 17:23:11 anon snmptrapd[969]: getaddrinfo("192.168.10.1", NULL, ...): Address family for hostname not supported
Feb 07 17:23:11 anon snmptrapd[969]: couldn't open 192.168.10.1 -- errno 99 ("Cannot assign requested address")
Feb 07 17:23:11 anon systemd[1]: snmptrapd.service: main process exited, code=exited, status=1/FAILURE
Feb 07 17:23:11 anon systemd[1]: Failed to start Simple Network Management Protocol (SNMP) Trap Daemon..
Feb 07 17:23:11 anon systemd[1]: Unit snmptrapd.service entered failed state.
Feb 07 17:23:11 anon systemd[1]: snmptrapd.service failed.
Feb 07 17:24:50 anon systemd[1]: Starting Simple Network Management Protocol (SNMP) Trap Daemon....
Feb 07 17:24:50 anon snmptrapd[1373]: NET-SNMP version 5.7.2
Feb 07 17:24:50 anon systemd[1]: Started Simple Network Management Protocol (SNMP) Trap Daemon..
Feb 07 18:26:05 anon snmptrapd[1373]: 2017-02-07 18:26:05 NET-SNMP version 5.7.2 Stopped.
Feb 07 18:26:05 anon snmptrapd[1373]: Stopping snmptrapd
Feb 07 18:26:05 anon systemd[1]: Stopping Simple Network Management Protocol (SNMP) Trap Daemon....
Feb 07 18:26:06 anon systemd[1]: Stopped Simple Network Management Protocol (SNMP) Trap Daemon..
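On Linux, errno 99 is EADDRNOTAVAIL, which a bind returns when the requested address is not assigned to any local interface at that moment. To check whether that is the situation at the time a service starts, something like the following could be used. It is a rough sketch under stated assumptions: `addr_assigned` is a hypothetical helper name, it expects the one-line-per-address output of `ip -o -4 addr show` on stdin, and 192.168.10.1 is simply the address taken from the log above.

```shell
#!/bin/sh
# Sketch: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` output read from stdin.
# addr_assigned is an illustrative helper, not an existing tool.
addr_assigned() {
    awk -v want="$1" '
        { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit !found }
    '
}

# Only query the live system if the ip tool is available.
if command -v ip >/dev/null 2>&1; then
    if ip -o -4 addr show | addr_assigned 192.168.10.1; then
        echo "192.168.10.1 is assigned; a bind to it should be possible"
    else
        echo "192.168.10.1 is not assigned yet; a bind would fail with errno 99"
    fi
fi
```

Running this early in boot and again after the network is up would show whether the address was simply not configured yet when snmptrapd first started.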
May I ask you for a debug log from an unmodified RHEL 7 system (a system without the modifications described in the previous comments)? To obtain this log, add 'debug' to the kernel command line. 's/test/ten'
See comment #5.
Hi Josef, at the moment I cannot reproduce the issue. I know that it was a race condition before we put the workaround in place. I also still have a journal excerpt in an internal ticket that shows the issue:

Oct 13 12:18:29 voc01 snmpd[628]: NET-SNMP version 5.7.2
Oct 13 12:18:29 voc01 systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..
Oct 13 12:18:30 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave/: Can't contact LDAP server: Connection refused
Oct 13 12:18:30 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave1/: Can't contact LDAP server: Connection refused
Oct 13 12:18:30 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave2/: Can't contact LDAP server: Connection refused
Oct 13 12:18:30 voc01 nslcd[659]: [8b4567] <group/member="root"> no available LDAP server found, sleeping 1 seconds
Oct 13 12:18:31 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave/: Can't contact LDAP server: Connection refused
Oct 13 12:18:31 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave1/: Can't contact LDAP server: Connection refused
Oct 13 12:18:31 voc01 nslcd[659]: [8b4567] <group/member="root"> failed to bind to LDAP server ldap://ldap-slave2/: Can't contact LDAP server: Connection refused
Oct 13 12:18:31 voc01 nslcd[659]: [8b4567] <group/member="root"> no available LDAP server found, sleeping 1 seconds
Oct 13 12:18:31 voc01 network[627]: Bringing up interface eth0: [ OK ]

Note that snmpd reports "Started" at 12:18:29, two seconds before the network script brings up eth0 at 12:18:31.

If I manage to reproduce it, I will try to gather debug output. But maybe Michael Chinn can help out in the meantime.
Created attachment 1249171
journalctl Boot Log Including snmptrapd Failure with "debug"

Search for "getaddrinfo" to locate the snmptrapd start failure.
Move to RHEL-7.8
Commit: http://pkgs.devel.redhat.com/cgit/rpms/net-snmp/commit/?id=a3d5a4690cbce8b9f064fcce4c6b06579a26407a
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:1081