Description of problem:
I have 2 systems with 4 NICs each. The NICs are configured into 2 bonds on each system: bond0 is made up of eth0 and eth2; bond1 is made up of eth1 and eth3. bond0 has a public IP address. bond1 has a private IP address that serves as the interconnect for an Oracle RAC cluster.

If I run "ping -I eth1 <interconnect address>", I get "Destination Host Unreachable" messages. If I then run "ping -I eth3 <interconnect address>", I get "Destination Host Unreachable" messages, followed by the public network connections dropping. I was able to ssh from the 2nd system back into the "down" system across the private interconnect; however, it responds extremely slowly or not at all to many commands. I have to power the system down to restart it in any reasonable amount of time. There are no messages in /var/log/messages to indicate any error.

Version-Release number of selected component (if applicable):
iputils-20020927-18.EL4.2

How reproducible:
2 for 2.

Steps to Reproduce:
1. See above description.
Actual results:

Expected results:

Additional info:

# cat /etc/modprobe.conf
alias eth0 tg3
alias eth1 tg3
alias eth3 e1000
alias eth2 e1000
alias bond0 bonding
options bond0 miimon=100 mode=5 max_bonds=2
alias bond1 bonding
options bond1 miimon=100 mode=5

[root@rac01lt log]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
IPADDR=xxx.129.234.10
NETMASK=255.255.255.224
ONBOOT=yes

[root@rac01lt log]# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
BOOTPROTO=none
IPADDR=192.168.234.10
NETMASK=255.255.255.224
ONBOOT=yes

[root@rac01lt log]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Broadcom Corporation|NetXtreme BCM5703 Gigabit Ethernet
DEVICE=eth0
BOOTPROTO=none
HWADDR=00:02:A5:4E:04:7E
MASTER=bond0
ONBOOT=yes
SLAVE=yes
TYPE=Ethernet

[root@rac01lt log]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Broadcom Corporation|NetXtreme BCM5703 Gigabit Ethernet
DEVICE=eth1
BOOTPROTO=none
HWADDR=00:02:A5:4E:04:7F
MASTER=bond1
ONBOOT=yes
SLAVE=yes
TYPE=Ethernet

[root@rac01lt log]# cat /etc/sysconfig/network-scripts/ifcfg-eth2
# Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
DEVICE=eth2
BOOTPROTO=none
HWADDR=00:0E:7F:F1:3E:0D
MASTER=bond0
ONBOOT=yes
SLAVE=yes
TYPE=Ethernet

[root@rac01lt log]# cat /etc/sysconfig/network-scripts/ifcfg-eth3
# Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
DEVICE=eth3
BOOTPROTO=none
HWADDR=00:0E:7F:F1:3E:0C
MASTER=bond1
ONBOOT=yes
SLAVE=yes
TYPE=Ethernet
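For reference, one quick way to sanity-check a setup like the one above is to ask the bonding driver itself which slaves it holds and what state they are in (a diagnostic sketch, not run on the reporter's system; the interface names match the configs shown, and the /proc paths assume the Linux bonding driver is loaded):

```shell
# Show each bond's mode, its current slaves, and their MII link status
# (mode 5 here is balance-tlb, per the options lines in /etc/modprobe.conf)
cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1

# Confirm the slave-to-bond assignment from the running kernel's view
grep -H "Slave Interface" /proc/net/bonding/bond*
```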
Created attachment 302370 [details] sysreport from node 1 (the system that locks up). both systems should be identical.
If a device is in a bond, you should never use it directly for network communication; always go through the bond interface. The transmit will probably be fine, but when the reply comes back in on the slave interface, the stack assumes the bond interface should receive the traffic and passes it up to bond0. Since your socket (from the ping command) is not listening on bond0, the traffic is never delivered to the ping command and the command fails (though a reply is probably received by the kernel). If you run ping -I bond0 or ping -I bond1, things should work correctly.
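Concretely, that means binding the ping to the bond master rather than to one of its slaves (a sketch using the interconnect addressing from this report; 192.168.234.11 is an assumed stand-in for the peer node's interconnect address):

```shell
# Wrong: binds the ICMP socket to a slave interface. Replies are handed
# to the bond master instead, so this invocation reports
# "Destination Host Unreachable" as described in the bug.
ping -I eth1 192.168.234.11

# Right: bind to the bond master, which actually owns the IP address.
ping -I bond1 192.168.234.11
```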
Shouldn't the command be prevented from running against one leg of a bond since it hangs the system?
The command could be restricted by changing its permissions so that only root can execute it, since root knows the system's configuration. But if the devices are in a bond, there should be a reason for that, and therefore the bond interface should be used. Besides, ping is not the only software that uses ICMP echo requests ...