Description of problem: Using cmsg.ipi.ipi_ifindex = if_index instead of cmsg.ipi.ipi_ifindex = 0 in mentioned patch (netsnmp_udp_sendto()) pushes kernel to send answer locally (not via gateway) on some interfaces. Upstream SVN uses cmsg.ipi.ipi_ifindex = 0. I suggest this change to preserve broadcast answering issue: 84c84 < + cmsg.ipi.ipi_ifindex = if_index; --- > + cmsg.ipi.ipi_ifindex = 0; 102a103 > + cmsg.ipi.ipi_ifindex = if_index; This thread can describe it better: http://www.cs.helsinki.fi/linux/linux-kernel/2003-07/0711.html Version-Release number of selected component (if applicable): net-snmp-5.3.2.2-9.el5 net-snmp-5.3.2.2-9.el5_5.1 How reproducible: always (on some particular machines - seems to depends on concrete route table) Steps to Reproduce: ? Actual results: $ snmpget -v 1 -c public -r 0 10.107.1.1 sysUpTime.0 Timeout: No Response from 10.107.1.1. 14:22:51.155776 IP 10.1.220.105.38585 > 10.107.1.1.snmp: GetRequest(28) .1.3.6.1.2.1.1.3.0 14:22:51.157855 arp who-has 10.1.220.105 tell 10.107.1.1 14:22:52.157253 arp who-has 10.1.220.105 tell 10.107.1.1 14:22:53.157907 arp who-has 10.1.220.105 tell 10.107.1.1 Expected results: (net-snmp with ipi_ifindex = 0) $ snmpget -v 1 -c public -r 0 10.107.1.1 sysUpTime.0 DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (891) 0:00:08.91 14:24:15.671491 IP 10.1.220.105.40785 > 10.107.1.1.snmp: GetRequest(28) .1.3.6.1.2.1.1.3.0 14:24:15.671957 IP 10.107.1.1.snmp > 10.1.220.105.40785: GetResponse(30) .1.3.6.1.2.1.1.3.0=891 Additional info: multi-interface machine, quagga, ospfd, all interfaces responds to SNMP except one on _some_ machines
to be more exact: kernel sends datagram to local link every time the interface chosen via ipi_ifindex differs from interface used for routing to destination - it is unwanted when destination is not on local broadcast domain
I haven't been able to reproduce this bug in a simple virtual test environment with three machines connected together and MASTER rule them all (via ssh): TEST1 -------+ | | | X | | | ROUTER ------X--MASTER | | X | | | | TEST2 -------+ Where 'X' is a bridge, i.e. there are two cables from TEST1 connected to (virtual) bridge where ROUTER is connected. The same for TEST2. TEST1 and TEST2 are in different networks (192.168.101.0/24, 192.168.102.0/24) with ROUTER routing between them (with obvious routing table). All snmpgets from TEST1 to both IP addresses of TEST2 succeeded as expected,. I don't see any weird ARP queries on TEST2 trying to send responses to TEST1 on local link, all responses are correctly routed via ROUTER. I understand, my environment is 1) very simple and 2) virtual. While I think the second does not matter - the packets look real to the kernel - I need to know how does your environment look like. Would you be able to describe it in full detail? I.e. simplify it to the smallest set of active elements and show me your interface configurations (ifconfig) and routing tables on all of them. > Upstream SVN uses cmsg.ipi.ipi_ifindex = 0. No, current SVN trunk uses my patches with ipi_ifindex = if_index/ > This thread can describe it better: > > http://www.cs.helsinki.fi/linux/linux-kernel/2003-07/0711.html The thread is not much useful there, just that man 7 ip shows wrong information. It was corrected (but it's still a bit misleading - full behaviour, especially re broadcast packets is not described anywhere).
R2 (10.1.0.2) ---- (10.1.0.1) R1 (10.3.0.1) ---- (10.3.0.2) PC R2 (10.2.0.2) ---- (10.2.0.1) R1 nets are /16, R2 def gw is 10.1.0.1 - PC can't get SNMP response from 10.2.0.2 more configuration info (cut): PC ip a inet 10.3.0.2/16 brd 10.3.255.255 scope global eth0 ip r 10.3.0.0/16 dev eth0 proto kernel scope link src 10.3.0.2 default via 10.3.0.1 dev eth0 R1 ip a inet 10.1.0.1/16 brd 10.1.255.255 scope global eth1 inet 10.2.0.1/16 brd 10.2.255.255 scope global eth2 inet 10.3.0.1/16 brd 10.3.255.255 scope global eth3 ip r 10.1.0.0/16 dev eth1 proto kernel scope link src 10.1.0.1 10.2.0.0/16 dev eth2 proto kernel scope link src 10.2.0.1 10.3.0.0/16 dev eth3 proto kernel scope link src 10.3.0.1 R2 ip a inet 10.1.0.2/16 brd 10.1.255.255 scope global eth1 inet 10.2.0.2/16 brd 10.2.255.255 scope global eth2 ip r 10.1.0.0/16 dev eth1 proto kernel scope link src 10.1.0.2 10.2.0.0/16 dev eth2 proto kernel scope link src 10.2.0.2 default via 10.1.0.1 dev eth1 results: PC $ snmpget -v1 -cpublic -r0 10.1.0.2 sysUpTime.0 DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (16588) 0:02:45.88 $ snmpget -v1 -cpublic -r0 10.2.0.2 sysUpTime.0 Timeout: No Response from 10.2.0.2. R2 tshark -i any, :c8 - eth1, :c9 - eth2 0.000000 10.3.0.2 -> 10.1.0.2 SNMP get-request SNMPv2-MIB::sysUpTime.0 0.001073 Intel_e1:68:c8 -> ARP Who has 10.1.0.1? Tell 10.1.0.2 0.001159 Intel_d4:f5:ef -> ARP 10.1.0.1 is at 00:04:23:d4:f5:ef 0.001169 10.1.0.2 -> 10.3.0.2 SNMP get-response SNMPv2-MIB::sysUpTime.0 2.738342 10.3.0.2 -> 10.2.0.2 SNMP get-request SNMPv2-MIB::sysUpTime.0 2.740060 Intel_e1:68:c9 -> ARP Who has 10.3.0.2? Tell 10.2.0.2 3.739052 Intel_e1:68:c9 -> ARP Who has 10.3.0.2? Tell 10.2.0.2 4.739049 Intel_e1:68:c9 -> ARP Who has 10.3.0.2? Tell 10.2.0.2 5.739082 10.2.0.2 -> 10.2.0.2 ICMP Destination unreachable (Host unreachable) > No, current SVN trunk uses my patches with ipi_ifindex = if_index/ ok, I was wrong (checked against branches/V5-4-patches) > The thread is not much useful there, just that man 7 ip shows wrong > information. It was corrected (but it's still a bit misleading - full > behaviour, especially re broadcast packets is not described anywhere). the thread explains the same experience with kernel - see "particular behaviour" another zeroing ipi_ifindex patch: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=181701
I reproduced it internally with the exact network setup as in comment #3. sendmsg() with both ipi_spec_dst and ipi_ifindex nozero sends the message only when there is route defined from the ipi_ifindex interface and the packet destination. I'll investigate what can be done, clearing ipi_ifindex is one of the possibilities (but I think I get into troubles responding to broadcast requests, which is required by other customers...) You can add new default route to R2 as temporary workaround: default via 10.2.0.1 dev eth2
I've checked a fix to upstream SVN, http://net-snmp.svn.sourceforge.net/viewvc/net-snmp?view=revision&revision=19846
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-1076.html