Red Hat Bugzilla – Bug 151054
kernel panic when bringing up and down multiple interfaces simultaneously
Last modified: 2007-11-30 17:07:06 EST
Description of problem:
While trying to write a simple test script to demonstrate
the issue reported in Bugzilla bug 150130, I wrote a pair of
scripts to bring all interfaces up and down. To my surprise, the box
panicked less than a minute after I started the test script. I have
reproduced the problem a number of times (I tried 5 times and the box
panicked all 5 times).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Configure a box with many Ethernet interfaces (I use 11 e1000 interfaces).
2. Attach the interfaces in question to a network with a DHCP server.
3. Configure all interfaces to start up using DHCP.
4. Run the attached script.
Actual Results: The box will panic.
Expected Results: The box should never panic.
Created attachment 111979 [details]
Script to reproduce failure.
Modify the "master" script to match your configuration. Place both scripts in
the same directory. Run master.
Created attachment 111980 [details]
companion script to "master"
This is the other half of the script pair to reproduce this problem.
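The attached scripts themselves are not reproduced in this report. Purely as a hypothetical sketch of the kind of pair described (the function names cycle_iface and cycle_all and the IFDOWN_CMD/IFUP_CMD override variables are assumptions, not taken from the attachments), a "companion" loop that cycles one interface and a "master" that runs one companion per interface in parallel might look like this:

```shell
#!/bin/sh
# Hypothetical sketch of a master/companion reproducer; NOT the attached scripts.
# IFDOWN_CMD/IFUP_CMD are illustrative overrides so the loops can be dry-run.
IFDOWN_CMD=${IFDOWN_CMD:-ifdown}
IFUP_CMD=${IFUP_CMD:-ifup}

# "Companion": take one interface down and back up COUNT times.
cycle_iface() {
    local iface=$1 count=$2 i=0
    while [ "$i" -lt "$count" ]; do
        "$IFDOWN_CMD" "$iface"
        "$IFUP_CMD" "$iface"
        i=$((i + 1))
    done
}

# "Master": start one companion per interface in the background,
# so all interfaces are cycled simultaneously, then wait for all of them.
cycle_all() {
    local count=$1 iface
    shift
    for iface in "$@"; do
        cycle_iface "$iface" "$count" &
    done
    wait
}
```

After sourcing the file as root on a test box, something like cycle_all 1000 eth0 eth1 eth2 would cycle three interfaces concurrently; setting IFDOWN_CMD=echo and IFUP_CMD=echo beforehand gives a dry run that touches nothing.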
Created attachment 111981 [details]
This is the oops output for the 5 oopses that I have recorded.
David, can this problem be reproduced with an untainted kernel?
Created attachment 112001 [details]
panic with untainted kernel
Yes, the problem occurs with an untainted kernel.
All of the EIP values in the OOPS traces look wrong.
They all point into areas outside of the kernel image
or module area, so the symbols they resolve to
are garbage as well.
Can some x86 guru interpret this or suggest a way to
get more reasonable dump output?
I think I just hit this same bug. I set up a new firewall box the same way I have
set up dozens of others, but this new one (the first running FC3 770) has panicked
repeatedly during my testing. I narrowed it down (I think) to the ifdown/ifup cycles
my watchdog scripts run on interfaces that are down, unpingable, or unplugged.
As a definitive test, I was reliably able to make the kernel panic by repeatedly
typing: ifdown eth2; ifup eth2
After 5-15 repetitions, entered as fast as I can hit up-arrow/return, the machine panics.
It did this with cheap SMC NICs (Linksys-chip driver, I think), with my
tried-and-true Realtek 8139s, and with the onboard r8169. The only NIC that was
constant in the testing was the onboard r8169, for obvious reasons. I've never
had problems with either the SMC or 8139 cards in the past, and I run them on many
machines. The 8169 is new to me, so I can't vouch for it.
The interfaces I am resetting to cause the panic are NOT running dhclient
(DHCP); they have static addresses. So in that sense what I've hit appears
different from David's. Perhaps, David, you could try setting those interfaces
static for a brief test and see if yours still panics? I bet it's not related
to DHCP vs. static.
The machine appears 100% stable as long as few ifdown/ifup cycles are done. It has
been up 2 days with no panic. If I ifdown/ifup it right now, I guarantee it will panic.
Also, the ifdown/ifup count seems to be cumulative over a long period of time:
if my scripts do it once every 10 minutes, the box crashes after 30-60 minutes.
The new system I tested is also the first P4 HT box I've done, so I tried
turning off HT, booting with noapic, booting with noacpi, BIOS set to MPS 1.1
and 1.4, running UP instead of SMP, but nothing affected the panic.
I have screenshots of many of the panics that I can attach if this looks like
the same bug; otherwise I'll open a new bug. Right now I live in fear of
network problems that will cause my scripts to ifdown/ifup and hang the (remote) machine.
This is definitely a bug in the kernel. I have reproduced it on 3 different
firewall boxes running FC3 766 and 770, using at least 5 different brands/models
of NIC. I think it's a timing issue. I do not think it is hardware dependent
(it panicked on both i865 and i7205).
On my own main workstation (4 Ethernet interfaces), I can crash it in 20 seconds by
running ifdown eth1; ifup eth1 (or eth2) repeatedly. It always crashes after 2-12
iterations. The interfaces I tested were NOT running dhclient; they were static.
However, if you "sync; ifdown eth1" -- pause -- "sync; ifup eth1", the system
DOES NOT seem to crash.
It appears that ifdown has not finished its work before ifup
starts. The stack trace is interesting.
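A hypothetical sketch of the manual sequence described above (sync, ifdown, pause, sync, ifup), for anyone who wants to script it. No pause length was given, so the 2-second default is an assumption, and the function name and the SYNC_CMD/IFDOWN_CMD/IFUP_CMD overrides are illustrative only. This mirrors the reporter's observation; it is not a confirmed fix:

```shell
#!/bin/sh
# Hypothetical sketch of the "sync; ifdown eth1 -- pause -- sync; ifup eth1" sequence.
# SYNC_CMD, IFDOWN_CMD and IFUP_CMD are illustrative overrides for dry runs.
SYNC_CMD=${SYNC_CMD:-sync}
IFDOWN_CMD=${IFDOWN_CMD:-ifdown}
IFUP_CMD=${IFUP_CMD:-ifup}

# Cycle IFACE COUNT times, flushing buffers before each step and sleeping
# PAUSE seconds (default 2, an arbitrary choice) after each ifdown and ifup.
cycle_iface_paused() {
    local iface=$1 count=$2 pause=${3:-2} i=0
    while [ "$i" -lt "$count" ]; do
        "$SYNC_CMD"
        "$IFDOWN_CMD" "$iface"
        sleep "$pause"
        "$SYNC_CMD"
        "$IFUP_CMD" "$iface"
        sleep "$pause"
        i=$((i + 1))
    done
}
```

For example, cycle_iface_paused eth1 100 5 would cycle eth1 a hundred times with 5-second gaps.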
I do not _think_ that this bug has been in the kernel for long because I'm sure
that my watchdog scripts would have crashed machines before now. It's probably
safe to say that it was not in 2.4, but I can't be sure.
A friend tested this on a box with only 1 NIC (static) and it did NOT crash for
him (FC3 770). It must depend on multiple NICs, or on something else unusual I
am doing, such as custom iptables scripts or named/dhcpd/smbd/etc. listening on the interfaces.
I will attach photos of the panic screens that I took with my digital camera.
Created attachment 112205 [details]
panic with HT/SMP (I think) and SMC NICs (I think)
Created attachment 112206 [details]
another panic HT/SMP (I think) and Realtek cards (I think)
Created attachment 112207 [details]
panic with noapic boot (maybe SMP off?)
Created attachment 112208 [details]
panic on 766 UP (others were 770), my workstation
I have done some research, and I believe that comment 8 through comment 13
relate to a different problem. In fact, I believe I have the patch to fix that.
Trevor, please open a bug against Fedora Core 3 to cover the issue you are
seeing. Assign it to me if you can, and please post the bug number here for reference.
This bug will remain open to deal with the (strikingly similar, but different)
problem observed on RHEL3.
Created attachment 112232 [details]
more oops output
This file has the oops data from three failures with an untainted kernel.
Hopefully, these will work better.
Oh yeah. The interfaces are configured with static IP addresses now, so it's
not DHCP. The box now has 15 interfaces that I am bringing up and down.
For comment 8 through comment 13, see new bug 151874
Hmmm...well, I don't doubt that there is a problem...but the oopses from comment
17, while consistent, don't seem to narrow down the problem. In fact, they just
don't make sense... :-(
I speculate that there is a connection between this and bug 150130, and probably
bug 145959 as well...I'm just not sure what it is yet...
I have no idea if this is related, but I have also seen this same configuration
hang hard a number of times (3+). Magic SysRq doesn't work. The box doesn't
respond to pings, etc...
Please see bug 150130 comment 9...thanks!
Created attachment 114108 [details]
Oops with latest kernel (2.4.21-32.3.EL.jwltest.22smp)
This took 384 seconds to happen on a box with 4 interfaces.
Created attachment 114251 [details]
oops with an even newer kernel (2.4.21-32.3.EL.jwltest.24smp)
Ran for 1456 seconds before failing.
Could you find these lines in /etc/sysconfig/network-scripts/ifup?
# Is there a firewall running, and does it look like one we configured?
if iptables -L -n 2>/dev/null | LC_ALL=C grep -q RH-Lokkit-0-50-INPUT ; then
modprobe -r iptable_filter >/dev/null 2>&1
Once you find them, comment them out (i.e. put a "#" at the beginning of each
of those lines). Then please run your test again and post the results.
If the problem persists, please attach a copy of your
modified /etc/sysconfig/network-scripts/ifup to ensure that I told you to do
the right thing... :-)
I commented out the requested lines and my test script is still running (after 2
days, 19 hours and over 41,000 iterations). Looks like a clue :^)
*** Bug 150130 has been marked as a duplicate of this bug. ***
Looks like doing a loop which inserts and removes iptable_filter repeatedly
will trigger the same problem.
iptable_filter depends on ip_tables...doing the loop with ip_tables causes the
same problem as well...getting closer?
Looks like almost any module will do...the loop module does it as well...
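To illustrate the reduced reproducer described above, here is a hypothetical sketch of a module load/unload loop. The function name cycle_module, the MODPROBE_CMD override, and the default iteration count of 1000 are assumptions; the report does not show the loop that was actually used. Run it as root, and only on a test machine:

```shell
#!/bin/sh
# Hypothetical sketch: repeatedly load and unload one kernel module.
# MODPROBE_CMD is an illustrative override for dry runs.
MODPROBE_CMD=${MODPROBE_CMD:-modprobe}

# Load and unload MODULE (default iptable_filter) COUNT times (default 1000),
# returning 1 at the first modprobe failure.
cycle_module() {
    local module=${1:-iptable_filter} count=${2:-1000} i=0
    while [ "$i" -lt "$count" ]; do
        "$MODPROBE_CMD" "$module" || return 1
        "$MODPROBE_CMD" -r "$module" || return 1
        i=$((i + 1))
    done
}
```

For example, cycle_module iptable_filter 10000 or cycle_module loop 10000; stopping at the first failure keeps the loop from spinning on a module that can no longer be loaded or removed.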
I've posted some test kernels here:
I no longer seem to be able to recreate the insmod failure when using these
kernels. Would you mind giving them a try and posting the results? Thanks!
Created attachment 115505 [details]
I have retested with kernel version 2.4.21-32.8.EL.jwltest.32smp on i686. It
is working great.
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.10.EL).
Removing dependency of bug 145959 on this one, since the former is against Fedora.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.