Bug 521362 - Kernel panic in conntrack with libvirt and iptables
Summary: Kernel panic in conntrack with libvirt and iptables
Keywords:
Status: CLOSED DUPLICATE of bug 533087
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: i686
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-09-05 05:56 UTC by Laurentiu Badea
Modified: 2010-02-03 08:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-03 08:45:35 UTC


Attachments
Screen photo of kernel panic stack dump (147.60 KB, image/jpeg)
2009-09-05 06:00 UTC, Laurentiu Badea

Description Laurentiu Badea 2009-09-05 05:56:42 UTC
Couldn't find a netfilter category so I chose iptables.

Description of problem:

Fedora 11 host OS with one F11 fully virtualized domain (KVM) directly connected to host eth0 via a bridge br0. Both fresh installs.

Once the virtual machine is set to autostart and the host rebooted, the host locks up shortly after bootup (under a minute, faster with a faster CPU), with a message like "BUG: unable to handle kernel NULL pointer dereference at 00000001" (not always same message) followed by a hefty stack dump, sometimes repeated.

If autostart is disabled and the domain is started manually after boot then the system appears to be stable (which is why the install succeeded).

The problem also went away _completely_, even with autostart enabled, once I took the bridge traffic off iptables (in sysctl.conf) and rebooted. The iptables setting is the only one that mattered, but all three should be off on a server for security anyway. I realize this probably breaks the default NATed bridge setup, but that is not useful on a server.

net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-arptables = 0 
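
A minimal sketch for checking that a sysctl.conf file really carries all three overrides above. It is only an illustration: the function names are hypothetical, and the parser handles plain "key = value" lines and "#" or ";" comments, nothing more.

```python
# The three sysctl keys named in this report.
BRIDGE_NF_KEYS = (
    "net.bridge.bridge-nf-call-iptables",
    "net.bridge.bridge-nf-call-ip6tables",
    "net.bridge.bridge-nf-call-arptables",
)


def parse_sysctl_conf(text):
    """Return {key: value} for 'key = value' lines, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        settings[key.strip()] = value.strip()
    return settings


def missing_bridge_nf_overrides(text):
    """Return the bridge-nf-call keys that are not explicitly set to 0."""
    settings = parse_sysctl_conf(text)
    return [key for key in BRIDGE_NF_KEYS if settings.get(key) != "0"]


def check_file(path="/etc/sysctl.conf"):
    """Convenience wrapper: run the check against a file on disk."""
    with open(path) as fh:
        return missing_bridge_nf_overrides(fh.read())
```

Calling check_file() on the host should return an empty list once all three lines are present. It only inspects the file, not the values the running kernel is using.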

Version-Release number of selected component (if applicable):
  kernel-PAE-2.6.29.6-217.2.16.fc11.i686
  libvirt-0.6.2-14.fc11.i586
  iptables-1.4.3.1-1.fc11.i586
  bridge-utils-1.2-7.fc11.i586

How reproducible:
Lockup happens in under a minute with the above setup.
On a faster machine (E6400 @2.1GHz) it happens almost as soon as the login prompt appears.

Steps to Reproduce:
1. Install Fedora 11 with virtualization
2. Disable NetworkManager
   chkconfig NetworkManager off
   chkconfig network on
3. Create bridge for eth0
   Create /etc/sysconfig/network-scripts/ifcfg-br0
     DEVICE=br0
     TYPE=Bridge
     BOOTPROTO=dhcp
     ONBOOT=yes
   Modify /etc/sysconfig/network-scripts/ifcfg-eth0
     DEVICE=eth0
     ONBOOT=yes
     BOOTPROTO=none
     BRIDGE=br0
   add "-A FORWARD -i br0 -o br0 -j ACCEPT" to /etc/sysconfig/iptables
   set default runlevel to 3 in /etc/inittab (to watch panics)
   reboot
4. Install a fully-virtualized Fedora 11 domain "test", its network device connected to br0 (host eth0).
5. Confirm system is stable while domain "test" is up.
6. virsh autostart test; reboot
7. Within 1 minute after the login prompt appears, the host OS hangs. The kernel
panic is not always the same; sometimes the kernel cannot finish writing it out.

8. Stop the bridge from sending traffic via netfilter:
   Add to /etc/sysctl.conf
     net.bridge.bridge-nf-call-iptables = 0
     net.bridge.bridge-nf-call-ip6tables = 0
     net.bridge.bridge-nf-call-arptables = 0
   virsh autostart test
   reboot
9. The system does not lock up any more.
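
To confirm after the reboot in step 8 that the running kernel picked up the settings, the values can be read back from /proc/sys/net/bridge. The following is a rough sketch, not part of the original reproduction. The function names are hypothetical, and it assumes the three files appear under that directory only once the bridge code is loaded.

```python
import os

# Runtime location of the bridge netfilter sysctls.
PROC_BRIDGE_DIR = "/proc/sys/net/bridge"
BRIDGE_NF_FILES = (
    "bridge-nf-call-iptables",
    "bridge-nf-call-ip6tables",
    "bridge-nf-call-arptables",
)


def read_bridge_nf_state(base_dir=PROC_BRIDGE_DIR):
    """Return {name: int or None}; None means the file is absent."""
    state = {}
    for name in BRIDGE_NF_FILES:
        path = os.path.join(base_dir, name)
        try:
            with open(path) as fh:
                state[name] = int(fh.read().strip())
        except FileNotFoundError:
            state[name] = None  # bridge code probably not loaded yet
    return state


def workaround_active(state):
    """True only if every bridge-nf-call value was read and equals 0."""
    return all(value == 0 for value in state.values())
```

Running workaround_active(read_bridge_nf_state()) on the host reports whether all three hooks are disabled. A missing file counts as "not active" so that an unloaded bridge is not mistaken for a safe configuration.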

Actual results:
Host hangs when virtual domain is set to autostart.

Expected results:
Host should not hang under any circumstance.

Additional info:
  System: Dell Dimension 9200, 
  CPU Intel Core 2 Duo E6300 @1.8GHz, 
  eth0 is Intel 82566DC Gigabit onboard, 
  runlevel 3 in fb text console mode to eliminate xorg video driver.
  The physical network contains a lot of Windows machines that do a lot of broadcast chatter (this traffic appears to be needed to trigger the panic, since the e1000 driver is always listed somewhere in the stack dumps).

May be related to bug 501137, but that one does not have enough info to be sure.

Comment 1 Laurentiu Badea 2009-09-05 06:00:43 UTC
Created attachment 359859
Screen photo of kernel panic stack dump

Comment 2 Thomas Woerner 2009-09-07 09:15:46 UTC
This is a kernel problem. Reassigning to kernel.

Comment 3 Chuck Ebbert 2009-09-08 17:29:54 UTC
Please try the latest update, kernel-2.6.30.5-43

Comment 4 Laurentiu Badea 2009-09-09 01:25:40 UTC
Same thing with 2.6.30.5-43, tried both PAE and non-PAE.

Comment 5 Ian Pilcher 2009-09-17 05:38:48 UTC
I think I'm seeing the same bug -- 32-bit F11 guest on 64-bit F11 host.  Everything is fine when guest is started manually, but autostarting causes host kernel panic.

I already have net.bridge.bridge-nf-call-iptables = 0; I'll try setting arptables and ip6tables to 0 as well.

Comment 6 Ian Pilcher 2009-09-17 05:59:54 UTC
Just to confirm, setting net.bridge.bridge-nf-call-{arptables,ip6tables} to 0 gets rid of the panic for me as well.

Comment 7 Jon Masters 2010-01-27 02:12:11 UTC
I have these sysctls set (by default, it would seem) on F12 and have a box falling over under very similar circumstances. This isn't fixed.

Comment 8 Jon Masters 2010-02-03 08:45:35 UTC

*** This bug has been marked as a duplicate of bug 533087 ***

