Description of problem: This may be related to the response of bug 133144. If so, sorry. /proc/sys/net/ipv4/conf/all/arp_{announce,ignore} are set to 0 by default in all released versions of RHEL. This endorses promiscuous ARP replies. Besides being confusing, it can interfere with performance and with failover (IEEE 802.3ad) when the failover is initiated by the network device (versus initiation by the RHEL host). Setting the values to 1 changes the behavior to match the behavior of other operating systems.

Version-Release number of selected component (if applicable): Tested with RHEL3 and all errata, RHEL4.

How reproducible: Configure a RHEL machine to have multiple links, all on the same LAN segment. Generate ARP requests for all the addresses of active interfaces assigned to those links. Observe the replies, which contain the correct response but originate from the wrong MAC address.

Steps to Reproduce: 1. 2. 3.

Actual results: Promiscuous ARP replies. Sometimes it takes a stream of requests to reproduce, but it happens frequently under load.

Expected results: ARP replies advertising MAC addresses should be seen only from the interfaces those addresses belong to.

Additional info: Happy to provide tcpdump . . .
This behavior is consistent with the Linux IPv4 stack's adherence to the host-based model of IPv4 addressing rather than the interface-based model, and the interface-based model is essentially what you are asking for. Both addressing models are described by the RFCs and are completely valid. This should not be changed and is not a bug. If you want different behavior, the tunables are there for you to tweak, but that does not make them right for everyone as a default.
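For anyone who wants to check or change the tunables mentioned above, here is a minimal sketch in Python. The function names and the `base` parameter are my own, added so the code can be pointed at a scratch directory instead of the real /proc tree; the only thing taken from this report is the path under /proc/sys/net/ipv4/conf.

```python
from pathlib import Path

# Location of the per-interface IPv4 tunables on Linux.
PROC_CONF = Path("/proc/sys/net/ipv4/conf")

def read_arp_tunables(iface="all", base=PROC_CONF):
    """Return the current arp_announce and arp_ignore values for iface."""
    conf = Path(base) / iface
    return {
        name: int((conf / name).read_text().strip())
        for name in ("arp_announce", "arp_ignore")
    }

def write_arp_tunable(name, value, iface="all", base=PROC_CONF):
    """Set one tunable; writing to the real /proc tree requires root."""
    if name not in ("arp_announce", "arp_ignore"):
        raise ValueError(f"unexpected tunable: {name}")
    (Path(base) / iface / name).write_text(f"{value}\n")
```

Calling `read_arp_tunables()` with no arguments reads the `all` entries on the running host. Values written this way do not survive a reboot, so a persistent change would still need to go into the system's sysctl configuration.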
Okay, that makes sense; thanks for the information! Out of curiosity, what is the motivation for using the host based model versus the interface based model?
In many circumstances, it increases the likelihood that two hosts can communicate successfully. By definition, if we only associate IP addresses with specific interfaces, there are cases where we will choose not to respond, and communication would then fail.
What does that mean for SO_BINDTODEVICE? (Obviously, it breaks with the host based model; is it appropriate for that to be advertised anywhere?)
Hi David,

Thank you for your responses on this bug report. Cisco and Nortel routers silently drop an ARP request when its sender protocol address is not in the same subnet as the target protocol address bound to the interface on which the router receives the request. In that case the Linux sender fails to resolve the hardware address of the router. Below I will try to describe a scenario in which this issue occurs.

A Linux host has two interfaces, eth0 and eth1. IP address 192.168.0.1/24 is bound to eth0 and IP address 10.0.0.1/8 is bound to eth1. The default gateway is a Cisco router at IP address 10.0.0.2. No other specific network or host routes are configured on the Linux host. arp_announce is not configured (0). The hardware address of the Cisco router has not yet been resolved via ARP.

The Linux host receives a datagram on eth0 with source IP address 172.16.0.1 and destination IP address 192.168.0.1. The Linux host needs to reply to 172.16.0.1 via the default gateway. The source IP address in the reply datagram will be 192.168.0.1 and the destination IP address 172.16.0.1. Because the Linux host has not yet resolved the hardware address of the Cisco router, it broadcasts an ARP request from eth1 with the hardware address of eth1 as sender hardware address, sender protocol address 192.168.0.1 and target protocol address 10.0.0.2.

The Cisco router receives this broadcast on the interface bound to 10.0.0.2. It drops the request and does not add an ARP table entry mapping 192.168.0.1 to the hardware address of the Linux host's eth1, because it does not consider 192.168.0.1 to be on the same physical network as that interface. Because the Linux host fails to resolve the hardware address of the default gateway, it cannot return its reply datagram to 172.16.0.1 and communication fails.

This scenario does not cause any trouble when arp_announce is set to 1 or 2, because the Linux host will then set the sender protocol address in the ARP request to 10.0.0.1 (instead of 192.168.0.1). To help me understand why you want to keep the default setting of arp_announce 0, can you describe a scenario in which ARP will fail when arp_announce is set to 1, please? Thank you very much for your understanding and help, Fons
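The scenario above can be sketched as a small Python model. This is a simplification of the arp_announce levels as I understand them from the kernel documentation, not the kernel's actual selection code. The function names are made up for illustration, and the router's /8 netmask is an assumption (the scenario only gives the Linux host's mask).

```python
from ipaddress import ip_address, ip_interface

# Addresses from the scenario above, keyed by interface name.
LOCAL = {
    "eth0": [ip_interface("192.168.0.1/24")],
    "eth1": [ip_interface("10.0.0.1/8")],
}

def arp_sender_ip(packet_src, target, out_iface, arp_announce, local=LOCAL):
    """Simplified model of the sender protocol address put in an ARP request."""
    packet_src, target = ip_address(packet_src), ip_address(target)
    # Local addresses on the outgoing interface whose subnet contains the target.
    on_link = [a for a in local[out_iface] if target in a.network]
    if arp_announce == 0:
        return packet_src  # any local address may be announced
    if arp_announce == 1 and any(packet_src in a.network for a in on_link):
        return packet_src  # the source already shares the target's subnet
    # Level 2, and the level 1 fallback: prefer an address on the target's subnet.
    if on_link:
        return on_link[0].ip
    candidates = local[out_iface] or [a for addrs in local.values() for a in addrs]
    return candidates[0].ip

def router_accepts(sender_ip, router_if):
    """Router behaviour as reported in this thread: drop off-subnet senders."""
    return ip_address(sender_ip) in ip_interface(router_if).network

for level in (0, 1, 2):
    sender = arp_sender_ip("192.168.0.1", "10.0.0.2", "eth1", level)
    print(level, sender, router_accepts(sender, "10.0.0.2/8"))
```

Under this model, level 0 announces 192.168.0.1, which a router behaving as described would drop, while levels 1 and 2 announce 10.0.0.1, which it would accept.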
Hi, I'm also having an issue with these settings. I have a system with 4 network interfaces, all on the same network, with eth0 holding the gateway. I found that ARP flux occurred as network interfaces went up and down. When trying to ping eth0 from another network, I noticed that eth1 was responding to ARP requests for eth0, which caused the ping to fail. I found the information on ARP flux and made the following changes: rp_filter=1, arp_ignore=1, arp_announce=2. This seemed to fix the ARP flux issue with eth0 at first, but I later noticed that only eth0 could respond to ping requests, even when the ping came from an interface on the local subnet. My understanding of the documentation is that local addresses should still be able to ping an interface with this configuration. Can you explain why this is occurring? BTW, I've found information on the host model (weak and strong) but can't find anything on the interface model. Thanks, Aaron
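To make the arp_ignore setting concrete, here is a minimal Python sketch of the reply decision for levels 0 to 2, based on my reading of the kernel documentation. It is only a model: it does not cover the remaining levels or rp_filter, and it does not by itself explain the ping failures described above. The function name is made up, and the 192.0.2.x addresses are hypothetical because no real addresses were given.

```python
from ipaddress import ip_address, ip_interface

def should_reply(target, sender, in_iface, arp_ignore, local):
    """Simplified model: does the host answer an ARP request seen on in_iface?"""
    target, sender = ip_address(target), ip_address(sender)
    # Is the target configured on any interface, or on the receiving one?
    on_any = any(target == a.ip for addrs in local.values() for a in addrs)
    mine = [a for a in local[in_iface] if a.ip == target]
    if arp_ignore == 0:
        return on_any  # reply for any local address on any interface
    if arp_ignore == 1:
        return bool(mine)  # only if the target is on the incoming interface
    if arp_ignore == 2:
        # As level 1, and the sender must share that address's subnet.
        return any(sender in a.network for a in mine)
    raise ValueError("other arp_ignore levels are not modelled here")

# Hypothetical addresses: two interfaces on the same subnet.
local = {
    "eth0": [ip_interface("192.0.2.10/24")],
    "eth1": [ip_interface("192.0.2.11/24")],
}
for level in (0, 1):
    print(level, should_reply("192.0.2.10", "192.0.2.50", "eth1", level, local))
```

In this model, a request for eth0's address that arrives on eth1 is answered at level 0 (the ARP flux case) and ignored at level 1.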