Bug 210119

Summary: Kernel panic after loading iptables nat rule and producing traffic over gigabit ethernet card
Product: [Fedora] Fedora
Reporter: Egon Kastelijn <redhat2>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 5
CC: pfrields, twoerner, wtogami
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-11-05 21:03:17 UTC

Description Egon Kastelijn 2006-10-10 05:30:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.0.7) Gecko/20060913 Fedora/1.5.0.7-1.fc5 Firefox/1.5.0.7 pango-text

Description of problem:
When an iptables NAT rule is loaded (which loads the modules ip_conntrack, xt_state, ipt_MASQUERADE, iptable_nat and ip_nat), an external connection is made to the machine over another gigabit Ethernet card that is not filtered by iptables, and the machine then sends a reasonable amount of outbound traffic (for example, running 'dmesg' over an SSH connection), the kernel panics immediately.
This happens with multiple types of Ethernet cards (Intel PRO/1000, Realtek, Sky2).
If the firewall NAT rules are unloaded, but the ip_conntrack module(s) are still loaded, then the problem still exists.
If the ip_conntrack module is also unloaded then the problem does not happen anymore.
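
To check which of the modules named above are loaded at a given moment, the contents of /proc/modules can be parsed. The following is a minimal sketch and not part of the original report; the function name `loaded_netfilter_modules` is hypothetical, and the only assumption about the file is that each line starts with the module name.

```python
# Modules named in this report.
REPORTED_MODULES = ("ip_conntrack", "xt_state", "ipt_MASQUERADE",
                    "iptable_nat", "ip_nat")


def loaded_netfilter_modules(proc_modules_text, wanted=REPORTED_MODULES):
    """Return the modules from `wanted` that appear in /proc/modules text.

    Only the first whitespace-separated field of each line (the module
    name) is used; names that merely appear in a dependency list on
    another line are ignored.
    """
    loaded = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            loaded.add(fields[0])
    # Keep the order of `wanted` so the output is stable.
    return [name for name in wanted if name in loaded]
```

On the affected machine it could be called as `loaded_netfilter_modules(open("/proc/modules").read())` before and after unloading the NAT rules.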

Version-Release number of selected component (if applicable):
kernel-2.6.17-1.2187_FC5, iptables-1.3.5-1.2

How reproducible:
Always


Steps to Reproduce:
1. Do an FC5 installation on a Pentium D 940 dual-core x86_64 machine with multiple gigabit Ethernet cards.
2. Load an iptables NAT rule on ethernet card eth1.
3. Establish an SSH connection from an external machine to the x86_64 machine inbound over eth0.
4. Type: 'dmesg'
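
The report does not give the exact NAT rule. As a sketch only, assuming a masquerading rule on eth1 (an assumption suggested solely by ipt_MASQUERADE appearing in the module list), steps 2 to 4 could be written out as below. The helper names and the `user@host` target are hypothetical, and nothing is executed unless `dry_run` is set to False.

```python
import shlex
import subprocess


def nat_rule_command(interface="eth1"):
    """Build an iptables call that appends a MASQUERADE rule.

    The rule itself is an assumption; the report only says "a NAT rule".
    """
    return ["iptables", "-t", "nat", "-A", "POSTROUTING",
            "-o", interface, "-j", "MASQUERADE"]


def traffic_command(target):
    """Build the SSH call that runs dmesg on the test machine."""
    return ["ssh", target, "dmesg"]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)
```

The iptables command would be run as root on the x86_64 machine, and the SSH command from the external machine, for example `run(traffic_command("user@host"))`.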

Actual Results:
Kernel Panic.

Expected Results:
No kernel panic, just the output of dmesg over SSH.

Additional info:
Just typing 'ls' over SSH does not trigger the problem.
It can also be reproduced with other, heavier Ethernet traffic.

The machine is running with Intel D940 Dual-core 3.2GHz, Asus P5WD2-Premium motherboard, 3x SATA-2 Maxtor 6H500F0 disks, Software RAID5, LVM2, Ext3, 1GB of PC6400 DDR2 memory (800MHz FSB).

# lspci
00:00.0 Host bridge: Intel Corporation Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation PCI Express Graphics Port (rev c0)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controllers cc=AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller
01:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
01:02.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 SATA controller: Marvell Technology Group Ltd. Unknown device 6141 (rev 01)
04:00.0 VGA compatible controller: nVidia Corporation GeForce 6200 TurboCache(TM) (rev a1)

Comment 1 Thomas Woerner 2006-10-10 14:11:00 UTC
iptables is a userland config tool. Assigning to kernel.

Comment 2 Egon Kastelijn 2006-10-14 09:18:02 UTC
After some more testing, the problem has been narrowed down a bit.
It is not the ip_conntrack module that is causing the problem but the
iptable_nat module.

Here are some test results:
-----------------------------
Module      | Loaded?       |
-----------------------------
iptable_nat | n | n | n | y |
-----------------------------
ip_nat      | n | n | y | y |
-----------------------------
nfnetlink   | n | y | y | y |
=============================
panic?      | N | N | N | Y |

From these test results it can be concluded that the problem can only be
reproduced when the iptable_nat module is loaded.
Tests with the mangle, filter and raw tables do not result in a kernel panic.

The problem is also not present when the SSH connection (with dmesg) is done
over the lo interface.


Comment 3 Dave Jones 2006-10-16 21:34:33 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 4 Egon Kastelijn 2006-10-18 18:41:28 UTC
I am unable to boot the new kernel, because my root filesystem lives on a RAID5
disk.
The mkinitrd does not handle the raid456.ko kernel module correctly, rendering
my initrd useless.
As soon as this problem has been fixed I would be happy to give it another try.

Comment 5 Egon Kastelijn 2006-10-19 05:56:36 UTC
My mkinitrd problem is covered in bug 211030.
I used the workaround in Comment #2 from Dieter Stolte.
This made 2.6.18-1.2200.fc5 bootable for me.

I tested the new 2.6.18-1.2200.fc5 kernel and I am unable to reproduce the
problem now. It looks to me that the problem has been solved!

Great work guys!
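
For anyone checking whether a machine already runs a kernel at or beyond the build tested here, the release string can be compared numerically. This is a rough sketch under the assumption that the release string has the form seen in this report (for example `2.6.17-1.2187_FC5`); the function names are hypothetical, and the comparison is not a full RPM version comparison.

```python
import re

# First build in this report on which the panic could not be reproduced.
FIRST_GOOD = "2.6.18-1.2200.fc5"


def kernel_key(release):
    """Extract (major, minor, patch, rpm_release, build) as integers.

    The distribution suffix (such as _FC5 or .fc5) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def at_least_first_good(release, first_good=FIRST_GOOD):
    """True if `release` is the same as or newer than `first_good`."""
    return kernel_key(release) >= kernel_key(first_good)
```

The running kernel's release string is available from `uname -r`, or in Python from `platform.release()`.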