Red Hat Bugzilla – Bug 114355
Bonding has been taking a LONG TIME to FAIL OVER!
Last modified: 2007-11-30 17:07:00 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5)
Description of problem:
I have a serious problem with bonding!
A customer has deployed bonding for network connection failover.
But the bonding does not recover the connection cleanly after the following sequence:
Step1) eth0: connected, eth1: connected
Step2) eth0: disconnected, eth1: connected
Step3) eth0: recovering the connection, eth1: connected
==> *"bond0" showed about 50% packet loss for 20-30 seconds.*
Why did "bond0" lose about 50% of its packets for 20-30 seconds after
re-connecting the NIC cable?
--- additional story -------------------------------
I set up network bonding on RHEL3 as below
and tested the fault tolerance of the network connection.
Step1) eth4: connected  eth5: connected
Step2) eth4: disconnected  eth5: connected
Step3) eth4: connected  eth5: connected
Step4) eth4: connected  eth5: disconnected
Step5) eth4: connected  eth5: connected
The curious result was some packet loss at Step3 and Step5, lasting
about 20-30 seconds after the network connection was restored. Any
packet loss lasting 20-30 seconds will be a terrible problem for
customers who have been trying to deploy RHEL3.
So I tried mode=0,1,2,3,4,5 and 6, but I could not find a mode that
loses no packets. I also tried changing the downdelay and updelay
parameters.
*In the end, I find it hard to trust the bonding.*
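For reference, the bonding mode and delay parameters mentioned above are set as module options, e.g. in /etc/modules.conf on a 2.4-based RHEL3 system. A minimal sketch; the interface name and values are illustrative, not taken from this report:

```
# /etc/modules.conf -- illustrative values only
alias bond0 bonding
# mode=1 is active-backup; miimon enables MII link monitoring (in ms);
# downdelay/updelay are also in ms and should be multiples of miimon.
options bond0 mode=1 miimon=100 downdelay=200 updelay=200
```

After editing, the bonding module must be reloaded (or the machine rebooted) for new option values to take effect.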
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Disconnect the network cable from the NIC.
2. Re-connect the network cable to the NIC.
3. While the connection recovers, about 50% of packets are lost.
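The loss figure in step 3 can be measured with a continuous ping run across the failure window. A rough sketch; the target address and the sent/received counts below are assumptions for illustration, not values from this report:

```shell
#!/bin/sh
# Hypothetical test target -- replace with a host on the bonded network.
TARGET=${TARGET:-192.168.0.1}

# Helper: compute percentage packet loss from sent/received counts.
loss_pct() {
    sent=$1; received=$2
    awk -v s="$sent" -v r="$received" 'BEGIN { printf "%.0f", (s - r) * 100 / s }'
}

# 1. Start a continuous ping in the background, then pull the cable:
#      ping -i 0.2 "$TARGET" > /tmp/bondtest.log &
# 2. Re-plug the cable, wait ~30s, then stop the ping and check its summary:
#      kill %1; grep 'packet loss' /tmp/bondtest.log
# A clean failover should show ~0% loss; this report saw roughly:
loss_pct 200 100    # prints 50
```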
That's really strange.
For RHEL3 we (Rik for kernel and myself for userland) made very sure
that the whole ipbonding parts were updated to the latest version.
I have had several very successful reports of bonding from customers
and have even used it myself a couple of times. And the failover usually
goes very fast with default settings (meaning no special parameters
set).
I'll contact one of our consulting guys who set up a bonding and
failover solution at a customer and let you know what exactly he did.
My only guess is that the drivers used are buggy. So the setup seems
to work fine, but during operation you have problems which also leads
me to believe that it's more of a kernel problem than a userland
problem. Userland isn't involved at all after the setup is done, so
i'm reassigning this bug to kernel.
Read ya, Phil
PS: For the kernel folks, please add the exact hardware you run your
tests on, that might already give them a clue.
Hello, please add the relevant hardware/controller information
to this bug report. Thank you.
The Network Controller is Intel PWLA8492MT(Dual Port).
Cisco3750 : Gigabit switch, using RJ-45 connector.
Currently, I have found the best mode: active-backup with updelay=20000!
Even when packet loss happened, the loss was only one or two packets.
The other modes still show long failover times.
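On a running system, the currently active slave can be read from /proc/net/bonding/bond0, which is useful for watching when a failover actually completes. A small sketch that parses that format; the sample text is illustrative, and the field names follow later bonding drivers (the early 2.4 driver's output may differ slightly):

```shell
# Extract the active slave from bonding status text, e.g.:
#   active_slave < /proc/net/bonding/bond0
active_slave() {
    grep -i 'Currently Active Slave' | awk -F': ' '{print $2}'
}

# Abridged sample of the /proc file's layout (illustrative):
sample='Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth4
MII Status: up'

printf '%s\n' "$sample" | active_slave    # prints eth4
```

Polling this in a loop while pulling and re-plugging cables shows how long the driver takes to switch slaves, which is what the updelay setting above is trading off against.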
I definitely get poor/inconsistent behaviour w/ bonding and this card
using the RHEL3 U1 kernel. Later kernels seem to work a lot better.
Can you recreate this problem using a later kernel? (e.g. RHEL3 U3)
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.3.EL).
An errata has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.