Bug 1380405 - send_arp usage() needs update to reflect send_arp.libnet compatibility options
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: pre-dev-freeze
Target Release: 7.4
Assigned To: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
Blocks: 1394959
Reported: 2016-09-29 09:50 EDT by Vagner Farias
Modified: 2017-08-01 10:55 EDT
Fixed In Version: resource-agents-3.9.5-94.el7
Last Closed: 2017-08-01 10:55:11 EDT
Type: Bug

External Trackers:
Red Hat Product Errata RHBA-2017:1844 (priority: normal, status: SHIPPED_LIVE): resource-agents bug fix and enhancement update, last updated 2017-08-01 13:49:20 EDT

Description Vagner Farias 2016-09-29 09:50:38 EDT
Description of problem:
The IPaddr2 resource agent appears to call send_arp with the wrong arguments, which could be why network switches in some environments do not update their ARP tables.

Looking at /usr/lib/ocf/resource.d/heartbeat/IPaddr2, I could see that:

a) SENDARP=$HA_BIN/send_arp
b) HA_BIN=/usr/libexec/heartbeat
c) arguments to SENDARP:
  . ARGS="-i $OCF_RESKEY_arp_interval -r $OCF_RESKEY_arp_count -p $SENDARPPIDFILE $NIC $OCF_RESKEY_ip auto not_used not_used"
  or
  . ARGS="-i $OCF_RESKEY_arp_interval -r $OCF_RESKEY_arp_count -p $SENDARPPIDFILE $NIC $OCF_RESKEY_ip $MY_MAC not_used not_used"
d) arguments accepted by /usr/libexec/heartbeat/send_arp
Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination
  -f : quit on first reply
  -q : be quiet
  -b : keep broadcasting, don't go unicast
  -D : duplicate address detection mode
  -U : Unsolicited ARP mode, update your neighbours
  -A : ARP answer mode, update your neighbours
  -V : print version and exit
  -c count : how many packets to send
  -w timeout : how long to wait for a reply
  -I device : which ethernet device to use (eth0)
  -s source : source ip address
  destination : ask for what ip address
e) No other send_arp file in the system
[root@dc01-controller-0 ~]# find / -iname send_arp
/usr/libexec/heartbeat/send_arp

Running send_arp manually, using the arguments above:

[root@dc01-controller-0 heartbeat]# ./send_arp -i 200 -r 5 -p /tmp/arp vlan20 10.3.28.21 auto not_used not_used
ARPING 10.3.28.21 from 10.3.28.21 vlan20
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
[root@dc01-controller-0 heartbeat]# echo $?
0

It's not failing, but it's probably not doing the right thing, because it's sending ARP requests to itself (10.3.28.21 is the local interface address).

Now, using the expected syntax:

[root@dc01-controller-0 heartbeat]# ./send_arp -c 5 -U -I vlan20 -s 10.3.28.21 10.3.28.1
ARPING 10.3.28.1 from 10.3.28.21 vlan20
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
[root@dc01-controller-0 heartbeat]# echo $?
0

Note that it's sending ARP requests to 10.3.28.1 (which could be the switch IP) from the local interface address (10.3.28.21). I'm not getting any replies because I don't have a switch at the target address.

By the way, the IPaddr2 resource agent is using the arguments expected by another tool, send_arp.libnet, which is also part of the ClusterLabs resource agents but which we do not seem to ship [1].

1: https://github.com/ClusterLabs/resource-agents/blob/master/tools/send_arp.libnet.c

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-54.el7_2.16.x86_64

How reproducible:
In at least one environment, with Cisco Nexus 9000 switches, VIP failover failed every time.

It's worth noting that in this same environment, running the following command would force the switch to update its ARP table:

# arping -U -c 1 -I ${IFACE} -s ${IFACE_IP} ${SWITCH_IP}

Steps to Reproduce:
1. Force VIP failover
2. Check whether gratuitous ARPs are being sent the correct way (tcpdump?)


Actual results:
send_arp is called with the wrong arguments, which probably results in incorrect gratuitous ARP requests.

Expected results:
send_arp should be called with correct arguments.
Comment 3 Andreas Karis 2016-10-01 13:33:56 EDT
Hi,

=============

[root@dc01-controller-0 heartbeat]# ./send_arp -i 200 -r 5 -p /tmp/arp vlan20 10.3.28.21 auto not_used not_used
ARPING 10.3.28.21 from 10.3.28.21 vlan20
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
[root@dc01-controller-0 heartbeat]# echo $?
0

It's not failing, but it's probably not doing the right thing, because it's sending ARP requests to itself (10.3.28.21 is the local interface address).

==============

It's doing exactly what it should: sending a properly crafted gratuitous ARP packet. From the Wireshark wiki:
https://wiki.wireshark.org/Gratuitous_ARP
~~~
A gratuitous ARP request is an Address Resolution Protocol request packet where the source and destination IP are both set to the IP of the machine issuing the packet and the destination MAC is the broadcast address ff:ff:ff:ff:ff:ff.
~~~

RFC 5944 gives the official definition as well:
https://tools.ietf.org/html/rfc5944#page-74
~~~
A Gratuitous ARP [45] is an ARP packet sent by a node in order to
      spontaneously cause other nodes to update an entry in their ARP
      cache.  A gratuitous ARP MAY use either an ARP Request or an ARP
      Reply packet.  In either case, the ARP Sender Protocol Address and
      ARP Target Protocol Address are both set to the IP address of the
      cache entry to be updated, and the ARP Sender Hardware Address is
      set to the link-layer address to which this cache entry should be
      updated.  When using an ARP Reply packet, the Target Hardware
      Address is also set to the link-layer address to which this cache
      entry should be updated (this field is not used in an ARP Request
      packet).
~~~
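
As a rough illustration of the RFC 5944 definition quoted above, the following C sketch fills in the 28-byte ARP payload for IPv4 over Ethernet so that the sender and target protocol addresses are identical. The names `build_gratuitous_arp` and `is_gratuitous` are hypothetical helpers made up for this example; they are not part of send_arp, and the sketch only builds the payload in memory without sending anything.

```c
#include <stdint.h>
#include <string.h>

/* Size of an ARP payload for IPv4 over Ethernet, and the two opcodes. */
enum { ARP_IPV4_LEN = 28, ARP_OP_REQUEST = 1, ARP_OP_REPLY = 2 };

/*
 * Hypothetical helper: fill buf with a gratuitous ARP announcing that
 * ip is reachable at mac. op is ARP_OP_REQUEST or ARP_OP_REPLY.
 */
static void build_gratuitous_arp(uint8_t buf[ARP_IPV4_LEN],
                                 const uint8_t mac[6],
                                 const uint8_t ip[4], int op)
{
    static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

    buf[0] = 0x00; buf[1] = 0x01;        /* hardware type: Ethernet (1) */
    buf[2] = 0x08; buf[3] = 0x00;        /* protocol type: IP (0x0800)  */
    buf[4] = 6;                          /* hardware size               */
    buf[5] = 4;                          /* protocol size               */
    buf[6] = 0x00; buf[7] = (uint8_t)op; /* opcode                      */
    memcpy(buf + 8, mac, 6);             /* sender hardware address     */
    memcpy(buf + 14, ip, 4);             /* sender protocol address     */
    /* Target hardware address: unused in a request (broadcast here, as
     * in the capture in this comment); the announced MAC in a reply. */
    memcpy(buf + 18, op == ARP_OP_REPLY ? mac : bcast, 6);
    memcpy(buf + 24, ip, 4);             /* target protocol == sender   */
}

/* Hypothetical helper: gratuitous means sender IP equals target IP. */
static int is_gratuitous(const uint8_t buf[ARP_IPV4_LEN])
{
    return memcmp(buf + 14, buf + 24, 4) == 0;
}
```

With the MAC f2:67:52:70:09:d1 and the IP 10.3.28.21, a request built this way has the same sender and target IP, which is the property tshark reports as "Is gratuitous: True".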

~~~
[root@overcloud-controller-0 ~]# yum install wireshark -y
[root@overcloud-controller-0 ~]# tshark -ivlan905 "arp and ether host f2:67:52:70:09:d1" -O arp & /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /tmp/arp vlan905 10.3.28.21 auto not_used not_used
[1] 31475
ARPING 10.3.28.21 from 10.3.28.21 vlan905
Running as user "root" and group "root". This could be dangerous.
Capturing on 'vlan905'
Frame 1: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
Ethernet II, Src: f2:67:52:70:09:d1 (f2:67:52:70:09:d1), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request/gratuitous ARP)
    Hardware type: Ethernet (1)
    Protocol type: IP (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    [Is gratuitous: True]
    Sender MAC address: f2:67:52:70:09:d1 (f2:67:52:70:09:d1)
    Sender IP address: 10.3.28.21 (10.3.28.21)
    Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
    Target IP address: 10.3.28.21 (10.3.28.21)
~~~

This is likely not working because the Cisco device either never receives the broadcast (it gets blocked somewhere, e.g. because ARP flooding is disabled) or does not react correctly to the ARP request. In the case of Cisco ACI, perhaps we or Cisco should tell customers to enable ARP flooding?

The reason why your approach is working is that you do not send a gratuitous ARP packet, but a normal ARP request with all 1s in the target MAC address, asking the device at 10.3.28.1 to return its MAC address.
~~~
[root@overcloud-controller-0 ~]# tshark -ivlan905 "arp and ether host f2:67:52:70:09:d1" -O arp & /usr/libexec/heartbeat/send_arp -c 5 -U -I vlan905 -s 10.3.28.21 10.3.28.1
[1] 2329
ARPING 10.3.28.1 from 10.3.28.21 vlan905
Running as user "root" and group "root". This could be dangerous.
Capturing on 'vlan905'
Frame 1: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
Ethernet II, Src: f2:67:52:70:09:d1 (f2:67:52:70:09:d1), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IP (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: f2:67:52:70:09:d1 (f2:67:52:70:09:d1)
    Sender IP address: 10.3.28.21 (10.3.28.21)
    Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
    Target IP address: 10.3.28.1 (10.3.28.1)
~~~

Compare this to a "normal" ARP request
~~~
[root@overcloud-controller-0 ~]# tshark -ivlan905 "arp and ether host f2:67:52:70:09:d1" -O arp & ping 10.0.0.100
[1] 8462
PING 10.0.0.100 (10.0.0.100) 56(84) bytes of data.
Running as user "root" and group "root". This could be dangerous.
Capturing on 'vlan905'
Frame 1: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
Ethernet II, Src: f2:67:52:70:09:d1 (f2:67:52:70:09:d1), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IP (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: f2:67:52:70:09:d1 (f2:67:52:70:09:d1)
    Sender IP address: 10.0.0.5 (10.0.0.5)
    Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Target IP address: 10.0.0.100 (10.0.0.100)
~~~

According to RFC 826, it doesn't matter whether the target hardware address is all 1s or all 0s:
https://tools.ietf.org/html/rfc826
~~~
ares_hrd$Ethernet, ar$pro to the protocol type that is being
resolved, ar$hln to 6 (the number of bytes in a 48.bit Ethernet
address), ar$pln to the length of an address in that protocol,
ar$op to ares_op$REQUEST, ar$sha with the 48.bit ethernet address
of itself, ar$spa with the protocol address of itself, and ar$tpa
with the protocol address of the machine that is trying to be
accessed.  It does not set ar$tha to anything in particular,
because it is this value that it is trying to determine.  It
could set ar$tha to the broadcast address for the hardware (all
ones in the case of the 10Mbit Ethernet) if that makes it
convenient for some aspect of the implementation.
~~~

Long story short, the resource agent's behavior looks OK to me; something else in the network is misbehaving if this is not working.
Comment 4 Vagner Farias 2016-10-01 13:51:06 EDT
I won't argue about whether the resource agent is sending a correct GARP, as I trust your research. I also know that it works when ACI is used with ARP flooding enabled, although I can't say whether that is good practice.

My main concern is this may be working by accident. send_arp is being called with wrong arguments. It expects:

Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination
  -f : quit on first reply
  -q : be quiet
  -b : keep broadcasting, don't go unicast
  -D : duplicate address detection mode
  -U : Unsolicited ARP mode, update your neighbours
  -A : ARP answer mode, update your neighbours
  -V : print version and exit
  -c count : how many packets to send
  -w timeout : how long to wait for a reply
  -I device : which ethernet device to use (eth0)
  -s source : source ip address
  destination : ask for what ip address

We're sending: -i 200 -r 5 -p /tmp/arp vlan905 10.3.28.21 auto not_used not_used

No match between expected and used options.
Comment 5 Andreas Karis 2016-10-01 14:15:09 EDT
Ah, OK, got it. I think the usage text is just a copy-paste of arping's help output:

https://github.com/ClusterLabs/resource-agents/blob/ca1e614c6cf9f85fb7341a6086b003735589a3a6/tools/send_arp.linux.c

~~~
void usage(void)
{
	fprintf(stderr,
		"Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination\n"
		"  -f : quit on first reply\n"
		"  -q : be quiet\n"
		"  -b : keep broadcasting, don't go unicast\n"
		"  -D : duplicate address detection mode\n"
		"  -U : Unsolicited ARP mode, update your neighbours\n"
		"  -A : ARP answer mode, update your neighbours\n"
		"  -V : print version and exit\n"
		"  -c count : how many packets to send\n"
		"  -w timeout : how long to wait for a reply\n"
		"  -I device : which ethernet device to use"
#ifdef DEFAULT_DEVICE_STR
			" (" DEFAULT_DEVICE_STR ")"
#endif
			"\n"
		"  -s source : source ip address\n"
		"  destination : ask for what ip address\n"
		);
	exit(2);
}
~~~

The actual options are here
~~~
while ((ch = getopt(argc, argv, "h?bfDUAqc:w:s:I:Vr:i:p:")) != EOF) {
		switch(ch) {
		case 'b':
			broadcast_only=1;
			break;
		case 'D':
			dad++;
			quit_on_reply=1;
			break;
		case 'U':
			unsolicited++;
			break;
		case 'A':
			advert++;
			unsolicited++;
			break;
		case 'q':
			quiet++;
			break;
		case 'r': /* send_arp.libnet compatibility option */
			hb_mode = 1;
			/* fall-through */
		case 'c':
			count = atoi(optarg);
			break;
		case 'w':
			timeout = atoi(optarg);
			break;
		case 'I':
			device.name = optarg;
			break;
		case 'f':
			quit_on_reply=1;
			break;
		case 's':
			source = optarg;
			break;
		case 'V':
			printf("send_arp utility, based on arping from iputils-%s\n", SNAPSHOT);
			exit(0);
		case 'p':
		case 'i':
		    hb_mode = 1;
		    /* send_arp.libnet compatibility options, ignore */
		    break;
		case 'h':
		case '?':
		default:
			usage();
		}
	}
~~~

Note how '-p' and '-i' have their arguments ignored (they only set hb_mode), and -r does this:
~~~
		case 'r': /* send_arp.libnet compatibility option */
			hb_mode = 1;
			/* fall-through */
~~~

Check then how hb_mode changes the interpretation of arguments
~~~
	if(hb_mode) {
	    /* send_arp.libnet compatibility mode */
	    if (argc - optind != 5) {
		usage();
		return 1;
	    }
	    /*
	     *	argv[optind+1] DEVICE		dc0,eth0:0,hme0:0,
	     *	argv[optind+2] IP		192.168.195.186
	     *	argv[optind+3] MAC ADDR		00a0cc34a878
	     *	argv[optind+4] BROADCAST	192.168.195.186
	     *	argv[optind+5] NETMASK		ffffffffffff
	     */

	    unsolicited = 1;
	    device.name = argv[optind];
	    target = argv[optind+1];

	}
~~~

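Note also that, although five positional arguments are required, the last three (MAC address, broadcast, and netmask; optind+3, +4, and +5 in the comment's numbering) are not used in this code.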

So this bug should be about the usage() function, which needs an update.
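
For illustration only, an updated help text could document both syntaxes along the following lines. The wording, the program name in the usage line, and the function name `send_arp_usage_text` are assumptions made for this sketch; this is not the text of any upstream patch. The option meanings are taken from the getopt excerpt above and from the IPaddr2 arguments in the description.

```c
#include <stdio.h>

/* Hypothetical replacement help text; not the wording of the actual fix. */
static const char *send_arp_usage_text(void)
{
    return
        "Usage: send_arp [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination\n"
        "   or: send_arp [-i interval] [-r count] [-p pidfile] device ip mac broadcast netmask\n"
        "\n"
        "send_arp.libnet compatibility mode (enabled by -i, -r or -p):\n"
        "  -r count    : how many packets to send (same as -c)\n"
        "  -i interval : accepted for compatibility, ignored\n"
        "  -p pidfile  : accepted for compatibility, ignored\n"
        "  device      : which ethernet device to use\n"
        "  ip          : address to announce\n"
        "  mac broadcast netmask : required, but ignored\n";
}

/* Prints the help text to stderr; the real usage() also calls exit(2). */
static void usage(void)
{
    fputs(send_arp_usage_text(), stderr);
}
```

Keeping the text in a separate function is just a convenience for this example, so the string can be inspected without the process exiting.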


I haven't checked the rest of the code, because this capture shows that it nevertheless does what it needs to do:
~~~
[root@overcloud-controller-0 ~]# yum install wireshark -y
[root@overcloud-controller-0 ~]# tshark -ivlan905 "arp and ether host f2:67:52:70:09:d1" -O arp & /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /tmp/arp vlan905 10.3.28.21 auto not_used not_used
[1] 31475
ARPING 10.3.28.21 from 10.3.28.21 vlan905
Running as user "root" and group "root". This could be dangerous.
Capturing on 'vlan905'
Frame 1: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 0
Ethernet II, Src: f2:67:52:70:09:d1 (f2:67:52:70:09:d1), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request/gratuitous ARP)
    Hardware type: Ethernet (1)
    Protocol type: IP (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    [Is gratuitous: True]
    Sender MAC address: f2:67:52:70:09:d1 (f2:67:52:70:09:d1)
    Sender IP address: 10.3.28.21 (10.3.28.21)
    Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
    Target IP address: 10.3.28.21 (10.3.28.21)
~~~

But I agree with you that the `-h` output and the usage() function need a fix.
Comment 6 Vagner Farias 2016-10-03 12:41:15 EDT
Thanks for the comprehensive analysis, Andreas. I'll admit I opened the source code and stopped reading at the usage() function. I never thought anyone would "forget" to update it.

I'll update the summary to reflect the real issue.
Comment 7 Oyvind Albrigtsen 2016-10-21 06:17:29 EDT
There seem to be other issues with it as well.

"The send_arp utility for linux ignores the src_hw_addr, broadcast_ip_addr, and netmask arguments. This results in the utility sending out the wrong mac address when called by IProute2 in clone (clusterip) mode."

https://github.com/ClusterLabs/resource-agents/issues/860
Comment 11 Oyvind Albrigtsen 2017-04-04 07:56:42 EDT
Tested and working patch: https://github.com/ClusterLabs/resource-agents/pull/961
Comment 14 Marian Krcmarik 2017-06-23 19:48:00 EDT
Verified based on comment 13. (resource-agents-3.9.5-105)
Comment 15 errata-xmlrpc 2017-08-01 10:55:11 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1844
