Red Hat Bugzilla – Bug 600319
net-snmp-126.96.36.199-broadcast-response.patch broke answering non-local hosts on some interfaces
Last modified: 2011-07-21 08:22:27 EDT
Description of problem:
Setting cmsg.ipi.ipi_ifindex = if_index instead of cmsg.ipi.ipi_ifindex = 0 in the mentioned patch (netsnmp_udp_sendto()) makes the kernel send the answer on the local link (not via the gateway) on some interfaces.
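For readers unfamiliar with the mechanism: the reply's source address and outgoing interface are passed to sendmsg() as an IP_PKTINFO control message. The sketch below shows how such a message is typically built on Linux/glibc. The helper name attach_pktinfo and the union pktinfo_cbuf type are hypothetical illustrations, not the net-snmp code; only struct in_pktinfo, IP_PKTINFO and the CMSG_* macros are real APIs.

```c
#define _GNU_SOURCE /* exposes struct in_pktinfo and IP_PKTINFO on glibc */
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Control buffer big enough for one IP_PKTINFO message; the cmsghdr
 * member only forces the alignment CMSG_FIRSTHDR expects. */
union pktinfo_cbuf {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
};

/* Hypothetical helper, not the net-snmp code: attach an IP_PKTINFO
 * control message to msg. src becomes ipi_spec_dst (the reply's source
 * address); ifindex becomes ipi_ifindex, where 0 leaves the choice of
 * outgoing interface to the routing table and a nonzero value pins the
 * send to that interface. */
static void attach_pktinfo(struct msghdr *msg, union pktinfo_cbuf *cbuf,
                           struct in_addr src, int ifindex)
{
    struct in_pktinfo ipi;
    struct cmsghdr *cm;

    assert(ifindex >= 0);
    memset(cbuf, 0, sizeof(*cbuf));
    msg->msg_control = cbuf->buf;
    msg->msg_controllen = sizeof(cbuf->buf);

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = IP_PKTINFO;
    cm->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));

    memset(&ipi, 0, sizeof(ipi));
    ipi.ipi_ifindex = ifindex;
    ipi.ipi_spec_dst = src;
    memcpy(CMSG_DATA(cm), &ipi, sizeof(ipi));
}
```

The caller would then fill in msg_name and msg_iov and call sendmsg(fd, &msg, 0). The two variants discussed in this report differ only in the ifindex argument (if_index versus 0).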
Upstream SVN uses cmsg.ipi.ipi_ifindex = 0.
I suggest this change, which keeps the fix for the broadcast-answering issue:
< + cmsg.ipi.ipi_ifindex = if_index;
> + cmsg.ipi.ipi_ifindex = 0;
> + cmsg.ipi.ipi_ifindex = if_index;
This thread can describe it better:
Version-Release number of selected component (if applicable):
always (on some particular machines; it seems to depend on the specific routing table)
Steps to Reproduce:
$ snmpget -v 1 -c public -r 0 10.107.1.1 sysUpTime.0
Timeout: No Response from 10.107.1.1.
14:22:51.155776 IP 10.1.220.105.38585 > 10.107.1.1.snmp: GetRequest(28) .1.3.6.1.2.1.1.3.0
14:22:51.157855 arp who-has 10.1.220.105 tell 10.107.1.1
14:22:52.157253 arp who-has 10.1.220.105 tell 10.107.1.1
14:22:53.157907 arp who-has 10.1.220.105 tell 10.107.1.1
(net-snmp with ipi_ifindex = 0)
$ snmpget -v 1 -c public -r 0 10.107.1.1 sysUpTime.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (891) 0:00:08.91
14:24:15.671491 IP 10.1.220.105.40785 > 10.107.1.1.snmp: GetRequest(28) .1.3.6.1.2.1.1.3.0
14:24:15.671957 IP 10.107.1.1.snmp > 10.1.220.105.40785: GetResponse(30) .1.3.6.1.2.1.1.3.0=891
Multi-interface machine with quagga and ospfd: all interfaces respond to SNMP except one, on _some_ machines.
To be more exact: the kernel sends the datagram to the local link every time the interface chosen via ipi_ifindex differs from the interface used for routing to the destination. This is unwanted when the destination is not in the local broadcast domain.
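The behaviour described above can be captured in a toy next-hop model. This is an illustration under stated assumptions, not kernel code: when a lookup is restricted to the ipi_ifindex interface and no route through that interface matches, the model assumes the destination is on-link and ARPs for it, which matches the traces in this report. The table is R2's routing table as posted later in this report; the interface indices ETH1/ETH2 and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Toy routing-table entry; addresses in host byte order. */
struct route {
    uint32_t prefix;
    int len;      /* prefix length, 0..32 (0 = default route) */
    uint32_t gw;  /* 0 for a directly connected ("scope link") route */
    int dev;      /* interface index */
};

/* R2's table from this report; the ETH1/ETH2 index values are made up. */
enum { ETH1 = 2, ETH2 = 3 };
static const struct route r2_routes[] = {
    { IP4(10, 1, 0, 0), 16, 0, ETH1 },
    { IP4(10, 2, 0, 0), 16, 0, ETH2 },
    { 0, 0, IP4(10, 1, 0, 1), ETH1 },  /* default via 10.1.0.1 dev eth1 */
};

static int route_matches(const struct route *r, uint32_t dst)
{
    uint32_t mask;

    assert(r->len >= 0 && r->len <= 32);
    mask = r->len ? ~(uint32_t)0 << (32 - r->len) : 0;
    return (dst & mask) == (r->prefix & mask);
}

/* Returns the address the toy model would ARP for, or 0 if unreachable.
 * With oif == 0 this is a plain longest-prefix match. With oif != 0 only
 * routes on that interface are considered, and if none matches the model
 * assumes dst is on-link. */
static uint32_t next_hop(const struct route *tbl, size_t n, uint32_t dst, int oif)
{
    const struct route *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (oif != 0 && tbl[i].dev != oif)
            continue;
        if (route_matches(&tbl[i], dst) && (best == NULL || tbl[i].len > best->len))
            best = &tbl[i];
    }
    if (best == NULL)
        return oif != 0 ? dst : 0;
    return best->gw != 0 ? best->gw : dst;
}
```

With oif = 0 or ETH1, a reply to 10.3.0.2 goes via 10.1.0.1; pinned to ETH2, the model returns 10.3.0.2 itself, i.e. R2 ARPs for the PC on eth2, as in the failing trace. Adding a default route via 10.2.0.1 on eth2 makes the pinned lookup return 10.2.0.1 instead.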
I haven't been able to reproduce this bug in a simple virtual test environment: three machines connected together, with a MASTER machine controlling them all (via ssh):
[Diagram: TEST1 =X= ROUTER =X= TEST2; MASTER reaches all three via ssh]
Here 'X' is a bridge, i.e. there are two cables from TEST1 connected to the (virtual) bridge to which ROUTER is connected. The same holds for TEST2. TEST1 and TEST2 are in different networks (192.168.101.0/24, 192.168.102.0/24), with ROUTER routing between them (with the obvious routing table). All snmpgets from TEST1 to both IP addresses of TEST2 succeeded as expected. I don't see any weird ARP queries on TEST2 trying to send responses to TEST1 on the local link; all responses are correctly routed via ROUTER.
I understand that my environment is 1) very simple and 2) virtual. While I think the second does not matter (the packets look real to the kernel), I need to know what your environment looks like. Could you describe it in full detail? I.e. simplify it to the smallest set of active elements and show me the interface configurations (ifconfig) and routing tables on all of them.
> Upstream SVN uses cmsg.ipi.ipi_ifindex = 0.
No, current SVN trunk uses my patches with ipi_ifindex = if_index.
> This thread can describe it better:
The thread is not very useful here, except that it shows that man 7 ip contained wrong information. That was corrected (but it's still a bit misleading: the full behaviour, especially regarding broadcast packets, is not described anywhere).
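On the receive side, the agent learns the arrival interface (the if_index later reused in the reply) from the same IP_PKTINFO structure. Per man 7 ip, on receive ipi_ifindex is the interface the packet arrived on, ipi_spec_dst is the local address and ipi_addr is the destination address in the packet header (the broadcast address for a broadcast request). The helper get_pktinfo below is a hypothetical sketch, assuming IP_PKTINFO was enabled on the socket with setsockopt().

```c
#define _GNU_SOURCE /* struct in_pktinfo and IP_PKTINFO on glibc */
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: find the IP_PKTINFO control message in a msghdr
 * filled in by recvmsg() on a socket where IP_PKTINFO was enabled with
 * setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)).
 * Returns 1 and fills *out if found, 0 otherwise. */
static int get_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
    struct cmsghdr *cm;

    assert(msg != NULL && out != NULL);
    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IP && cm->cmsg_type == IP_PKTINFO) {
            memcpy(out, CMSG_DATA(cm), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

Comparing ipi_addr with the interface's broadcast address is one way an agent could tell a broadcast request from a unicast one; that check is left out here.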
R2 (10.1.0.2) ---- (10.1.0.1) R1 (10.3.0.1) ---- (10.3.0.2) PC
R2 (10.2.0.2) ---- (10.2.0.1) R1
All nets are /16 and R2's default gateway is 10.1.0.1; PC can't get an SNMP response from 10.2.0.2.
More configuration info (trimmed):
PC:
inet 10.3.0.2/16 brd 10.3.255.255 scope global eth0
10.3.0.0/16 dev eth0 proto kernel scope link src 10.3.0.2
default via 10.3.0.1 dev eth0
R1:
inet 10.1.0.1/16 brd 10.1.255.255 scope global eth1
inet 10.2.0.1/16 brd 10.2.255.255 scope global eth2
inet 10.3.0.1/16 brd 10.3.255.255 scope global eth3
10.1.0.0/16 dev eth1 proto kernel scope link src 10.1.0.1
10.2.0.0/16 dev eth2 proto kernel scope link src 10.2.0.1
10.3.0.0/16 dev eth3 proto kernel scope link src 10.3.0.1
R2:
inet 10.1.0.2/16 brd 10.1.255.255 scope global eth1
inet 10.2.0.2/16 brd 10.2.255.255 scope global eth2
10.1.0.0/16 dev eth1 proto kernel scope link src 10.1.0.2
10.2.0.0/16 dev eth2 proto kernel scope link src 10.2.0.2
default via 10.1.0.1 dev eth1
$ snmpget -v1 -cpublic -r0 10.1.0.2 sysUpTime.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (16588) 0:02:45.88
$ snmpget -v1 -cpublic -r0 10.2.0.2 sysUpTime.0
Timeout: No Response from 10.2.0.2.
tshark -i any on R2 (MAC suffix :c8 is eth1, :c9 is eth2):
0.000000 10.3.0.2 -> 10.1.0.2 SNMP get-request SNMPv2-MIB::sysUpTime.0
0.001073 Intel_e1:68:c8 -> ARP Who has 10.1.0.1? Tell 10.1.0.2
0.001159 Intel_d4:f5:ef -> ARP 10.1.0.1 is at 00:04:23:d4:f5:ef
0.001169 10.1.0.2 -> 10.3.0.2 SNMP get-response SNMPv2-MIB::sysUpTime.0
2.738342 10.3.0.2 -> 10.2.0.2 SNMP get-request SNMPv2-MIB::sysUpTime.0
2.740060 Intel_e1:68:c9 -> ARP Who has 10.3.0.2? Tell 10.2.0.2
3.739052 Intel_e1:68:c9 -> ARP Who has 10.3.0.2? Tell 10.2.0.2
4.739049 Intel_e1:68:c9 -> ARP Who has 10.3.0.2? Tell 10.2.0.2
5.739082 10.2.0.2 -> 10.2.0.2 ICMP Destination unreachable (Host unreachable)
> No, current SVN trunk uses my patches with ipi_ifindex = if_index.
OK, I was wrong (I had checked against branches/V5-4-patches).
> The thread is not very useful here, except that it shows that man 7 ip
> contained wrong information. That was corrected (but it's still a bit misleading:
> the full behaviour, especially regarding broadcast packets, is not described anywhere).
The thread describes the same experience with the kernel; see "particular behaviour".
Another patch that zeroes ipi_ifindex:
I reproduced it internally with exactly the network setup from comment #3.
sendmsg() with both ipi_spec_dst and ipi_ifindex nonzero sends the message only when there is a route to the packet destination through the ipi_ifindex interface. I'll investigate what can be done. Clearing ipi_ifindex is one of the possibilities, but I think I would then get into trouble responding to broadcast requests, which other customers require...
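The trade-off above (clearing ipi_ifindex fixes replies to hosts behind a gateway but may break replies to broadcast requests) suggests a try-then-fall-back policy. The sketch below is one possible shape of it, not necessarily what was committed upstream. Two things are assumptions: the pktinfo_send_fn callback (a hypothetical wrapper around sendmsg()) and the idea that an unpinned reply with a broadcast source address fails with EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transport callback: sends one reply with IP_PKTINFO
 * carrying the given ipi_ifindex (e.g. by wrapping sendmsg()); returns
 * bytes sent, or -1 with errno set. */
typedef long (*pktinfo_send_fn)(void *ctx, int ifindex);

/* Given the result of a first send with ipi_ifindex = 0, return the
 * ifindex to retry with, or -1 if the result should be returned as is.
 * ASSUMPTION: a reply whose source is a broadcast address fails with
 * EINVAL when not pinned to an interface; this errno is not confirmed
 * by this report. */
static int retry_ifindex(long rc, int err, int if_index)
{
    assert(if_index >= 0);
    if (rc >= 0 || err != EINVAL || if_index == 0)
        return -1;
    return if_index;
}

/* Try the routing table first (ipi_ifindex = 0, so replies to off-link
 * hosts go via the gateway), then fall back to the arrival interface. */
static long send_reply(pktinfo_send_fn send_fn, void *ctx, int if_index)
{
    long rc = send_fn(ctx, 0);
    int retry = retry_ifindex(rc, errno, if_index);

    return retry < 0 ? rc : send_fn(ctx, retry);
}
```

Keeping the retry decision in a separate pure function makes the policy easy to check without opening sockets; send_reply only wires it to the transport.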
As a temporary workaround, you can add a new default route on R2: default via 10.2.0.1 dev eth2
I've checked in a fix to upstream SVN: http://net-snmp.svn.sourceforge.net/viewvc/net-snmp?view=revision&revision=19846
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.