Bug 245787

Summary: Piranha send_arp utility generates incorrectly-formed gratuitous ARP's (patch to fix attached)
Product: [Retired] Red Hat Cluster Suite
Reporter: Vince Worthington <vincew>
Component: piranha
Assignee: Marek Grac <mgrac>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint
Hardware: All
OS: Linux
Fixed In Version: RHBA-2008-0794
Doc Type: Bug Fix
Last Closed: 2008-07-25 19:08:52 UTC
Attachments:
1. Patch to make gratuitous ARP's generated by send_arp RFC2002-compliant (flags: none)
2. small tcpdump from pulse service startup containing example bad gratuitous arp pkts (flags: none)

Description Vince Worthington 2007-06-26 18:00:21 UTC
Description of problem:
The customer has observed that their routers and layer 3 switches do not update
their arp tables when a failover event occurs (i.e., the active Piranha router
fails and the backup takes over).

In an attempt to troubleshoot this themselves, they collected tcpdumps and
noticed that the gratuitous arps for the virtual IPs the backup Piranha node is
"taking over" did not seem to be formed correctly.  They believed this to be the
reason the routers and other network equipment were not updating their arp
tables when the backup Piranha LVS router takes over the load balancing.

At the core of this issue seems to be Piranha's failure to generate gratuitous
arps that are formed in an RFC-compliant manner.  I examined RFC 2002 section
4.6, which seems to be the defining document for gratuitous arp behavior.  (I
wouldn't doubt that this was defined prior to 1996; I just couldn't find an
earlier ARP-related RFC that defined gratuitous arp behavior.)  I then compared
what this RFC defines to the arp traffic in the tcpdump provided by the
customer.

RFC 2002 section 4.6 says:

      -  A Gratuitous ARP [23] is an ARP packet sent by a node in order to
         spontaneously cause other nodes to update an entry in their ARP
         cache.  A gratuitous ARP MAY use either an ARP Request or an ARP
         Reply packet.  In either case, the ARP Sender Protocol Address
         and ARP Target Protocol Address are both set to the IP address
         of the cache entry to be updated, and the ARP Sender Hardware
         Address is set to the link-layer address to which this cache
         entry should be updated.  When using an ARP Reply packet, the
         Target Hardware Address is also set to the link-layer address to
         which this cache entry should be updated (this field is not used
         in an ARP Request packet).

         In either case, for a gratuitous ARP, the ARP packet MUST be
         transmitted as a local broadcast packet on the local link.  As
         specified in [16], any node receiving any ARP packet (Request or
         Reply) MUST update its local ARP cache with the Sender Protocol
         and Hardware Addresses in the ARP packet, if the receiving node
         has an entry for that IP address already in its ARP cache.  This
         requirement in the ARP protocol applies even for ARP Request
         packets, and for ARP Reply packets that do not match any ARP
         Request transmitted by the receiving node [16].

---
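To make the quoted field rules concrete, here is a minimal Python sketch (not
the send_arp code, and not the attached patch) that builds a gratuitous ARP
frame the way I read RFC 2002 section 4.6.  The function names and the MAC
address are made up for illustration; zeroing the target hardware address in
the request case is my assumption, since the RFC only says the field is not
used there.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ARP_REQUEST, ARP_REPLY = 1, 2


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp(mac: str, ip: str, opcode: int = ARP_REQUEST) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP for `ip`."""
    sha = mac_to_bytes(mac)      # sender hardware address
    spa = socket.inet_aton(ip)   # sender protocol address
    # Reply: target hardware address equals the sender hardware address.
    # Request: the field is unused; it is zeroed here (an assumption).
    tha = sha if opcode == ARP_REPLY else b"\x00" * 6
    # Ethernet header: broadcast destination, our source, EtherType ARP.
    eth = struct.pack("!6s6sH", BROADCAST_MAC, sha, 0x0806)
    # ARP body: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4.
    # Sender and target protocol addresses are both the IP being updated.
    arp = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, opcode,
                      sha, spa, tha, spa)
    return eth + arp


# Hypothetical MAC; the IP is the virtual IP seen in the capture.
frame = build_gratuitous_arp("00:11:22:33:44:55", "192.168.1.250", ARP_REPLY)
print(len(frame), frame.hex())
```

Actually transmitting the frame would need a raw socket and root privileges,
which this sketch leaves out on purpose.
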

The packets that send_arp (the tool called by pulse) sends to gratuitously arp
the network on service startup or failover do not comply with the RFC
requirements quoted above.

As a result, the upstream switches and routers on the network are not updating
their arp caches, leading to an interruption of services on the virtual IP(s).

A patch which I believe corrects the problem is attached.  However, the original
customer reporting the problem never tested the test packages, and their IT was
closed.  tcpdumps taken with the patch applied *appear* to show correctly-formed
gratuitous ARPs (although wireshark only seems to label arp-request-style
packets as gratuitous arps).  Someone may wish to double-check to make sure I
have the right idea.

Version-Release number of selected component (if applicable):
Piranha (all versions 0.7.x - 0.8.4)

How reproducible:
ALWAYS

Steps to Reproduce:
1. start tcpdump
2. start pulse service (or fail from primary to backup, vice-versa)
3. inspect gratuitous arps sent by send_arp util called by pulse
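For step 3, something like the following Python sketch could list the ARP
frames in a saved capture without opening wireshark.  It assumes a classic
(non-pcapng) pcap file with Ethernet link type; the function names and the file
name in the usage comment are hypothetical.

```python
import socket
import struct


def iter_arp_frames(pcap_bytes: bytes):
    """Yield (frame_number, frame) for each ARP frame in a classic pcap."""
    magic = pcap_bytes[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"   # file written little-endian
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"   # file written big-endian
    else:
        raise ValueError("not a classic pcap file")
    offset = 24        # skip the 24-byte global header
    number = 0
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16])
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        number += 1    # 1-based, like the frame numbers wireshark shows
        if len(frame) >= 42 and frame[12:14] == b"\x08\x06":
            yield number, frame


def format_mac(raw: bytes) -> str:
    return ":".join("%02x" % b for b in raw)


def describe_arp(frame: bytes) -> str:
    """Summarise opcode and sender/target addresses of an ARP frame."""
    op, sha, spa, tha, tpa = struct.unpack("!H6s4s6s4s", frame[20:42])
    return "op=%d sender=%s/%s target=%s/%s" % (
        op, format_mac(sha), socket.inet_ntoa(spa),
        format_mac(tha), socket.inet_ntoa(tpa))


# Usage (hypothetical file name):
#   with open("pulse-startup.pcap", "rb") as f:
#       for number, frame in iter_arp_frames(f.read()):
#           print(number, describe_arp(frame))
```
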
  
Actual results:
Incorrectly-formed gratuitous arp packets that don't convince strictly
RFC-compliant routers and hosts to update their ARP cache

Expected results:
piranha should be able to correctly update ARP cache on other hosts

Additional info:
The attached .pcap file is a packet capture I generated from a test standalone
Piranha system.  The odd-numbered frames between frame 13 and 21 are the 5
"gratuitous arps" pulse/send_arp generated as the pulse daemon started and
attempted to gratuitously arp the network.

Based on what RFC2002 says (cited above), the following are problems with the
arp packets being sent:

1. Both the ARP sender and ARP target protocol addresses are to be set to the IP
address of the ARP cache entry to be updated.  The packets sent have the correct
sender IP address (192.168.1.250), but the target IP is incorrectly set to the
broadcast address of the local IP subnet (192.168.1.255).

2. When an ARP Reply type packet is used, the sender AND target hardware
addresses MUST be set to the MAC address of the ARP cache entry to be updated. 
The packets sent by pulse/send_arp contain the correct ARP sender MAC address,
but set the ARP target MAC address to the broadcast address (FF:FF:FF:FF:FF:FF).

Therefore any hosts or layer 3+ switches on the network which are "picky" about
RFC compliance could not be expected to update their local ARP cache tables from
these gratuitous arps.
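The two problems above can be expressed as a small checker.  This is only a
Python sketch of my reading of RFC 2002 section 4.6, not anything a real switch
runs; the function name and the sender MAC are hypothetical, the IP addresses
are the ones from the capture, and the sample frame assumes the reply-style
packet described in problem 2.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ARP_REPLY = 2


def check_gratuitous_arp(frame: bytes) -> list:
    """Return the RFC 2002 section 4.6 violations found in `frame`."""
    if len(frame) < 42:
        return ["frame too short for Ethernet + ARP"]
    dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        return ["not an ARP frame"]
    _htype, _ptype, _hlen, _plen, op, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42])
    problems = []
    if dst != BROADCAST_MAC:
        problems.append("frame is not sent as a link-layer broadcast")
    if tpa != spa:
        problems.append("target IP %s differs from sender IP %s"
                        % (socket.inet_ntoa(tpa), socket.inet_ntoa(spa)))
    if op == ARP_REPLY and tha != sha:
        problems.append("ARP reply: target MAC differs from sender MAC")
    return problems


# Hypothetical sender MAC; IP addresses as seen in the capture.
sender_mac = bytes.fromhex("001122334455")
bad_frame = struct.pack("!6s6sH", BROADCAST_MAC, sender_mac, 0x0806) + struct.pack(
    "!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, ARP_REPLY,
    sender_mac, socket.inet_aton("192.168.1.250"),
    BROADCAST_MAC, socket.inet_aton("192.168.1.255"))

for problem in check_gratuitous_arp(bad_frame):
    print(problem)
```
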

The arping utility from iputils package sends correctly-formed gratuitous arps,
although they are arp-request-style packets.  (RFC2002 defines both arp request
and arp response style gratuitous arp packets).

--vince

Comment 1 Vince Worthington 2007-06-26 18:00:21 UTC
Created attachment 157925 [details]
Patch to make gratuitous ARP's generated by send_arp RFC2002-compliant

Comment 2 Vince Worthington 2007-06-26 18:06:34 UTC
Created attachment 157928 [details]
small tcpdump from pulse service startup containing example bad gratuitous arp pkts

Comment 4 Marek Grac 2007-07-17 14:27:55 UTC
Patch is in CVS

Comment 10 errata-xmlrpc 2008-07-25 19:08:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0794.html