Bug 717000 - Bond interface mode 0 and DHCP inconsistently obtains IP address
Summary: Bond interface mode 0 and DHCP inconsistently obtains IP address
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Veaceslav Falico
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-27 17:29 UTC by Mark Heslin
Modified: 2014-09-30 23:44 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-07 17:48:11 UTC
Target Upstream Version:



Description Mark Heslin 2011-06-27 17:29:42 UTC
Description of problem:

Bringing up a bond interface with DHCP and bond mode 0 (balance-rr) only
intermittently obtains an IP address. In mode 0 the bond comes up reliably
when configured with static IP addressing, and in mode 1 (active-backup) it
comes up reliably with both DHCP and static addressing.

Note: I stumbled across this during the early stages of configuring
      a RHEL 6.1 3-node HA cluster. The cluster itself was not yet
      configured, but the behavior has been seen on the public-facing
      bonds/interfaces on all three nodes. The issue can be worked
      around by using static IP addressing or by setting the bond
      interface to mode 1.
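As a sketch of the static-addressing workaround, an ifcfg-bond0 along these lines keeps mode 0 but skips DHCP. The IPADDR and NETMASK values are the ones the DHCP server handed out in the logs further down; GATEWAY is an assumption, inferred from the 10.16.143.254 address that is pinged during the failed bring-up in step 4. Switching BONDING_OPTS to mode=1 and restoring BOOTPROTO=dhcp would give the other workaround instead.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (static-addressing workaround sketch)
DEVICE=bond0
ONBOOT=yes
# Static addressing instead of DHCP; values taken from the DHCP lease in the logs
BOOTPROTO=none
IPADDR=10.16.143.151
NETMASK=255.255.248.0
# Assumption: gateway inferred from the address pinged in step 4
GATEWAY=10.16.143.254
BONDING_OPTS="mode=0 miimon=100"
USERCTL=no
```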

Version-Release number of selected component (if applicable):

How reproducible:

Inconsistent: it sometimes fails and sometimes succeeds in obtaining an IP address.
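To put a number on "inconsistent", a loop along these lines could bounce the bond repeatedly and count failed bring-ups. It is a hypothetical helper (the count_failed_bringups name is made up for illustration), and it assumes ifup exits non-zero when it prints "failed." as in step 4; the down and up commands can be overridden for a dry run.

```shell
# Hypothetical helper: bounce an interface N times and print how many
# bring-ups failed. Assumes the up command exits non-zero on failure
# (as ifup is assumed to do when DHCP reports "failed.").
# Usage: count_failed_bringups <iterations> <iface> [down_cmd] [up_cmd]
count_failed_bringups() {
  local iterations iface down_cmd up_cmd failures i
  iterations=$1
  iface=$2
  down_cmd=${3:-ifdown}
  up_cmd=${4:-ifup}
  failures=0
  i=0
  while [ "$i" -lt "$iterations" ]; do
    # Errors from the down command are ignored; the bond may already be down
    "$down_cmd" "$iface" >/dev/null 2>&1
    if ! "$up_cmd" "$iface" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}
```

For example, running count_failed_bringups 20 bond0 as root under mode 0, and again after switching BONDING_OPTS to mode=1, would give comparable failure counts for the two modes.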

Steps to Reproduce:

1. Create a bond interface (bond0) with 2 slave interfaces (eth0, eth1), using DHCP and bond mode 1:

   # cd /etc/sysconfig/network-scripts
   # cat ifcfg-bond0
     DEVICE=bond0
     ONBOOT=yes
     BOOTPROTO=dhcp
     BONDING_OPTS="mode=1 miimon=100"
     USERCTL=no
  
2. Bring up the bond0 interface (succeeds)

   # ifup bond0

   Determining IP information for bond0... done.
   # ifconfig bond0
   bond0     Link encap:Ethernet  HWaddr 00:17:A4:77:24:3C  
             inet addr:10.16.143.151  Bcast:10.16.143.255  Mask:255.255.248.0
             UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
             RX packets:18 errors:0 dropped:0 overruns:0 frame:0
             TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
             collisions:0 txqueuelen:0 
             RX bytes:2732 (2.6 KiB)  TX bytes:692 (692.0 b)

3. Change bond interface (bond0) to use bond mode 0:

   # cat ifcfg-bond0
     DEVICE=bond0
     ONBOOT=yes
     BOOTPROTO=dhcp
     BONDING_OPTS="mode=0 miimon=100"
     USERCTL=no

4. Bring up the bond0 interface (fails)

   # ifdown bond0
   # ifup bond0

   Determining IP information for bond0...PING 10.16.143.254 (10.16.143.254) from 10.16.143.151 bond0: 56(84) bytes of data.

   --- 10.16.143.254 ping statistics ---
   4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3001ms
   pipe 3
    failed.

5. Try again (succeeds)

   # ifup bond0
   Device eth1 has different MAC address than expected, ignoring.

   Determining IP information for bond0... done.
   # ifconfig bond0
   bond0     Link encap:Ethernet  HWaddr 00:17:A4:77:24:3C  
             inet addr:10.16.143.151  Bcast:10.16.143.255  Mask:255.255.248.0
             UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
             RX packets:303 errors:0 dropped:0 overruns:0 frame:0
             TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
             collisions:0 txqueuelen:0 
             RX bytes:33174 (32.3 KiB)  TX bytes:6366 (6.2 KiB)
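Since each bring-up applies the mode from BONDING_OPTS (the kernel log below shows a "setting mode to" line for every attempt), it can help to confirm which mode the running bond actually uses before each attempt. A minimal sketch, assuming the bonding driver's sysfs attribute /sys/class/net/<bond>/bonding/mode, which reports the mode name followed by its number (for example "balance-rr 0"); the bond_mode function name is hypothetical.

```shell
# Hypothetical helper: print the bonding mode the kernel reports for a bond.
# Reads /sys/class/net/<bond>/bonding/mode, whose contents look like
# "balance-rr 0" or "active-backup 1". An alternate file can be passed as
# the second argument (useful for testing without a real bond).
bond_mode() {
  local bond mode_file name number
  bond=${1:-bond0}
  mode_file=${2:-/sys/class/net/$bond/bonding/mode}
  if [ ! -r "$mode_file" ]; then
    echo "cannot read $mode_file" >&2
    return 1
  fi
  read -r name number < "$mode_file"
  echo "$bond: mode $number ($name)"
}
```

After step 2 this would be expected to print "bond0: mode 1 (active-backup)", and after step 4 "bond0: mode 0 (balance-rr)".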


Actual results:

The bond interface inconsistently acquires an IP address.


Expected results:

The bond interface should acquire an IP address consistently each time.


Additional info:

   System Configuration

   Host:     HP ProLiant BL460c G6 (HP BladeSystem)
   OS:       RHEL 6.1
   Network: 

   eth0 -
         |- bond0 (using DHCP, bond mode 0)
   eth1 -

   # cat /etc/sysconfig/network-scripts/ifcfg-eth0
     DEVICE=eth0
     BOOTPROTO=none
     HWADDR=00:17:A4:77:24:3C
     ONBOOT=yes
     MASTER=bond0
     SLAVE=yes
     USERCTL=no

   # cat /etc/sysconfig/network-scripts/ifcfg-eth1
     DEVICE=eth1
     BOOTPROTO=none
     HWADDR=00:17:A4:77:24:3E
     ONBOOT=yes
     MASTER=bond0
     SLAVE=yes
     USERCTL=no
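The "Device eth1 has different MAC address than expected, ignoring." message in step 5 is likely a side effect of enslavement: while enslaved in this mode, eth1 carries the bond's MAC address (taken from eth0, 00:17:A4:77:24:3C, as the "still in use by bond0" warning in the log suggests) rather than its own. When step 5 re-runs ifup against a bond left assembled by the failed attempt in step 4, eth1 probably no longer matches the HWADDR in its ifcfg file. A hypothetical check comparing the two, assuming the plain HWADDR= line format shown above and the /sys/class/net/<iface>/address attribute (the current, not permanent, MAC):

```shell
# Hypothetical check: compare the HWADDR= value in an ifcfg file with the
# MAC address the interface currently reports. The comparison ignores case,
# since ifcfg files often use upper case and sysfs reports lower case.
# Usage: check_hwaddr <ifcfg_file> <address_file>
#   e.g. check_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth1 \
#                     /sys/class/net/eth1/address
check_hwaddr() {
  local ifcfg addr_file expected actual
  ifcfg=$1
  addr_file=$2
  # Value after "HWADDR=", quotes stripped, first occurrence only
  expected=$(sed -n 's/^[[:space:]]*HWADDR=//p' "$ifcfg" | head -n 1 |
    tr -d "\"'" | tr '[:upper:]' '[:lower:]')
  actual=$(tr '[:upper:]' '[:lower:]' < "$addr_file")
  if [ -z "$expected" ]; then
    echo "no HWADDR= line in $ifcfg" >&2
    return 2
  fi
  if [ "$expected" = "$actual" ]; then
    echo "match: $actual"
  else
    echo "mismatch: ifcfg has $expected, device has $actual"
    return 1
  fi
}
```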
   
   # tail -70 /var/log/messages
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: setting mode to active-backup (1).
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: Adding slave eth0.
   Jun 27 11:36:07 ha-web1 kernel: bnx2x 0000:02:00.0: firmware: requesting bnx2x/bnx2x-e1h-6.2.9.0.fw
   Jun 27 11:36:07 ha-web1 kernel: bnx2x 0000:02:00.0: eth0: using MSI-X  IRQs: sp 57  fp[0] 59 ... fp[2] 61
   Jun 27 11:36:07 ha-web1 kernel: bnx2x 0000:02:00.0: eth0: NIC Link is Up, 7500 Mbps full duplex, receive & transmit flow control ON
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: making interface eth0 the new active one.
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: first active interface up!
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: enslaving eth0 as an active interface with an up link.
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: Adding slave eth1.
   Jun 27 11:36:07 ha-web1 kernel: bnx2x 0000:02:00.1: firmware: requesting bnx2x/bnx2x-e1h-6.2.9.0.fw
   Jun 27 11:36:07 ha-web1 kernel: bnx2x 0000:02:00.1: eth1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[2] 66
   Jun 27 11:36:07 ha-web1 kernel: bnx2x 0000:02:00.1: eth1: NIC Link is Up, 7500 Mbps full duplex, receive & transmit flow control ON
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
   Jun 27 11:36:07 ha-web1 kernel: bonding: bond0: enslaving eth1 as a backup interface with an up link.
   Jun 27 11:36:07 ha-web1 dhclient[20904]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x6a990483)
   Jun 27 11:36:12 ha-web1 dhclient[20904]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x6a990483)
   Jun 27 11:36:12 ha-web1 dhclient[20904]: DHCPACK from 10.16.136.1 (xid=0x6a990483)
   Jun 27 11:36:12 ha-web1 NET[20951]: /sbin/dhclient-script : updated /etc/resolv.conf
   Jun 27 11:36:12 ha-web1 dhclient[20904]: bound to 10.16.143.151 -- renewal in 10668 seconds.

   Jun 27 11:37:49 ha-web1 kernel: bonding: bond0: Removing slave eth0
   Jun 27 11:37:49 ha-web1 kernel: bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:17:a4:77:24:3c - is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
   Jun 27 11:37:49 ha-web1 kernel: bonding: bond0: releasing active interface eth0
   Jun 27 11:37:49 ha-web1 kernel: bonding: bond0: making interface eth1 the new active one.
   Jun 27 11:37:50 ha-web1 kernel: bonding: bond0: Removing slave eth1
   Jun 27 11:37:50 ha-web1 kernel: bonding: bond0: releasing active interface eth1
   Jun 27 11:37:50 ha-web1 NET[21180]: /sbin/dhclient-script : updated /etc/resolv.conf
   Jun 27 11:37:56 ha-web1 kernel: bonding: bond0: setting mode to balance-rr (0).
   Jun 27 11:37:56 ha-web1 kernel: bonding: bond0: Setting MII monitoring interval to 100.
   Jun 27 11:37:56 ha-web1 kernel: bonding: bond0: Adding slave eth0.
   Jun 27 11:37:56 ha-web1 kernel: bnx2x 0000:02:00.0: firmware: requesting bnx2x/bnx2x-e1h-6.2.9.0.fw
   Jun 27 11:37:56 ha-web1 kernel: bnx2x 0000:02:00.0: eth0: using MSI-X  IRQs: sp 57  fp[0] 59 ... fp[2] 61
   Jun 27 11:37:56 ha-web1 kernel: bnx2x 0000:02:00.0: eth0: NIC Link is Up, 7500 Mbps full duplex, receive & transmit flow control ON
   Jun 27 11:37:56 ha-web1 kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
   Jun 27 11:37:56 ha-web1 kernel: bonding: bond0: enslaving eth0 as an active interface with an up link.
   Jun 27 11:37:56 ha-web1 kernel: bonding: bond0: Adding slave eth1.
   Jun 27 11:37:56 ha-web1 kernel: bnx2x 0000:02:00.1: firmware: requesting bnx2x/bnx2x-e1h-6.2.9.0.fw
   Jun 27 11:37:57 ha-web1 kernel: bnx2x 0000:02:00.1: eth1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[2] 66
   Jun 27 11:37:57 ha-web1 kernel: bnx2x 0000:02:00.1: eth1: NIC Link is Up, 7500 Mbps full duplex, receive & transmit flow control ON
   Jun 27 11:37:57 ha-web1 kernel: bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
   Jun 27 11:37:57 ha-web1 kernel: bonding: bond0: enslaving eth1 as an active interface with an up link.
   Jun 27 11:37:58 ha-web1 dhclient[21304]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x2250e47e)
   Jun 27 11:38:03 ha-web1 dhclient[21304]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x2250e47e)
   Jun 27 11:38:11 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 3 (xid=0x26c61821)
   Jun 27 11:38:14 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 7 (xid=0x26c61821)
   Jun 27 11:38:14 ha-web1 dhclient[21304]: DHCPOFFER from 10.16.136.1
   Jun 27 11:38:14 ha-web1 dhclient[21304]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x26c61821)
   Jun 27 11:38:18 ha-web1 dhclient[21304]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x26c61821)
   Jun 27 11:38:28 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 4 (xid=0x529e9e8c)
   Jun 27 11:38:32 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 7 (xid=0x529e9e8c)
   Jun 27 11:38:39 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 14 (xid=0x529e9e8c)
   Jun 27 11:38:39 ha-web1 dhclient[21304]: DHCPOFFER from 10.16.136.1
   Jun 27 11:38:39 ha-web1 dhclient[21304]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x529e9e8c)
   Jun 27 11:38:45 ha-web1 dhclient[21304]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x529e9e8c)
   Jun 27 11:38:53 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 7 (xid=0x101f19b4)
   Jun 27 11:39:00 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 9 (xid=0x101f19b4)
   Jun 27 11:39:09 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 10 (xid=0x101f19b4)
   Jun 27 11:39:19 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 14 (xid=0x101f19b4)
   Jun 27 11:39:33 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 8 (xid=0x101f19b4)
   Jun 27 11:39:41 ha-web1 dhclient[21304]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 13 (xid=0x101f19b4)
   Jun 27 11:39:54 ha-web1 dhclient[21304]: No DHCPOFFERS received.
   Jun 27 11:39:54 ha-web1 dhclient[21304]: Trying recorded lease 10.16.143.151

   Jun 27 11:53:51 ha-web1 kernel: bonding: bond0: setting mode to balance-rr (0).
   Jun 27 11:53:51 ha-web1 kernel: bonding: bond0: Setting MII monitoring interval to 100.
   Jun 27 11:53:51 ha-web1 dhclient[21421]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x6c453ee5)
   Jun 27 11:53:51 ha-web1 dhclient[21421]: DHCPACK from 10.16.136.1 (xid=0x6c453ee5)
   Jun 27 11:53:52 ha-web1 NET[21468]: /sbin/dhclient-script : updated /etc/resolv.conf
   Jun 27 11:53:52 ha-web1 dhclient[21421]: bound to 10.16.143.151 -- renewal in 8358 seconds.
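To line up each mode change with the DHCP outcome that followed it, a filter along these lines could condense /var/log/messages. It is a hypothetical sketch (summarize_bond_dhcp is an illustrative name) keyed to the exact bonding and dhclient message texts shown above; other kernel or dhclient versions may word them differently.

```shell
# Hypothetical filter: summarize bonding mode changes and DHCP outcomes
# from syslog-format input (default: /var/log/messages). Matches the
# message texts seen in this report:
#   "setting mode to <name> (<n>)", "bound to <ip>", "No DHCPOFFERS received."
summarize_bond_dhcp() {
  local log
  log=${1:-/var/log/messages}
  awk '
    # Kernel line, e.g. "bonding: bond0: setting mode to balance-rr (0)."
    /bonding: .*setting mode to / {
      sub(/.*setting mode to /, "")
      sub(/\.$/, "")
      print "mode:   " $0
    }
    # dhclient success, e.g. "bound to 10.16.143.151 -- renewal in ..."
    /dhclient\[[0-9]+\]: bound to / {
      sub(/.*bound to /, "")
      sub(/ --.*/, "")
      print "  ok:   bound to " $0
    }
    # dhclient gave up waiting for an offer
    /dhclient\[[0-9]+\]: No DHCPOFFERS received/ {
      print "  fail: no DHCPOFFERS received"
    }
  ' "$log"
}
```

Run against the excerpt above, it would show the active-backup bring-up binding, the first balance-rr bring-up ending with no DHCPOFFERS received, and the later balance-rr attempt binding again.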

Comment 2 Mark Heslin 2011-06-27 17:48:16 UTC
Quick follow-up: I will file a second bug for the speed/duplex detection issue seen in the output of /var/log/messages.

Comment 3 Andy Gospodarek 2011-07-01 20:42:39 UTC
The fact that you cannot reliably get an address via DHCP is quite interesting.  RR mode is a funny one because it can cause problems with some switches that do not like to see MAC addresses move too quickly.

Comment 4 RHEL Program Management 2011-10-07 15:38:59 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 6 Mark Heslin 2012-05-02 15:23:37 UTC
>------- Additional Comments from Veaceslav Falico <vfalico>
>I have been trying for several months to reproduce this behaviour on every box
>I can get hold of, but I still can't hit it. Could you please specify exactly
>which box/switch/network equipment you reproduced it on? A set-up reproducer
>would be ideal.

Ahoj Veaceslav,

My apologies for not detailing more of the configuration here - I had written some of it up on the other bug I filed in parallel to this one:

  https://bugzilla.redhat.com/show_bug.cgi?id=717074

The 3 systems are HP ProLiant BL460c G6 blades housed in an HP BladeSystem c7000 chassis. Each blade has Quad Socket, Quad Core (16 cores total) Intel Xeon X5550 2.67GHz CPUs, 48 GB memory, 2 x 146 GB SATA drives (mirrored, RAID 1), 2 x QLogic QMH2562 8Gb FC HBAs and 8 x Broadcom NetXtreme II BCM57711E Gb NICs.

The blades were configured in the BladeSystem chassis with public and private (cluster interconnect) networks, each bonded across 2 NICs. A total of 10Gb of bandwidth is available for both networks.

I think Andy's previous comment is spot-on. I am about to re-deploy this hardware for the development of a new Clustered Samba reference architecture and will also use bonding for this configuration but on RHEL 6.2. If you can wait for a week or two, I'll have this configuration built and can report back here on my findings. Let me know if this is good for you timing wise.

Ciao,

-m

Comment 7 Veaceslav Falico 2012-05-02 15:55:57 UTC
(In reply to comment #6)
> I think Andy's previous comment is spot-on. I am about to re-deploy this
> hardware for the development of a new Clustered Samba reference architecture
> and will also use bonding for this configuration but on RHEL 6.2. If you can
> wait for a week or two, I'll have this configuration built and can report back
> here on my findings. Let me know if this is good for you timing wise.

Yep, that's completely OK; write here or ping me once the systems are available.

Comment 8 Mark Heslin 2012-05-02 17:42:57 UTC
Veaceslav,

Ok - Will do, thanks.

-m

Comment 9 Veaceslav Falico 2014-01-07 17:48:11 UTC
Closing due to insufficient data. Feel free to reopen once a reproducer is found.

