Bug 521362

Summary: Kernel panic in conntrack with libvirt and iptables
Product: [Fedora] Fedora
Reporter: Laurentiu Badea <bugzilla-redhat>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 11
CC: ipilcher, itamar, jcm, kernel-maint, tre-bugzilla-redhat, twoerner
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-02-03 08:45:35 UTC
Attachments:
  Screen photo of kernel panic stack dump (flags: none)

Description Laurentiu Badea 2009-09-05 05:56:42 UTC
Couldn't find a netfilter category so I chose iptables.

Description of problem:

Fedora 11 host OS with one F11 fully virtualized domain (KVM) directly connected to host eth0 via a bridge br0. Both fresh installs.

Once the virtual machine is set to autostart and the host is rebooted, the host locks up shortly after boot (in under a minute, faster with a faster CPU) with a message like "BUG: unable to handle kernel NULL pointer dereference at 00000001" (the message is not always the same), followed by a hefty stack dump, sometimes repeated.

If autostart is disabled and the domain is started manually after boot then the system appears to be stable (which is why the install succeeded).

The problem also _completely went away_, even with autostart enabled, once I stopped bridge traffic from passing through iptables (via sysctl.conf) and rebooted. The iptables setting is the only one that mattered, but all three should be off on a server for security anyway. I realize this probably breaks the default NATed bridge setup, but that is not useful on a server.

net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-arptables = 0 
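
To confirm that these settings are actually in effect on a running host, the values can be read back from procfs. The following is a minimal sketch, not part of the original report: the function name check_bridge_nf is hypothetical, and it assumes the bridge netfilter sysctls are exposed under /proc/sys/net/bridge (they only appear once the bridge code that provides them is loaded).

```shell
# check_bridge_nf [DIR]
# Print the three bridge-nf-call-* values found under DIR
# (default: /proc/sys/net/bridge) and return non-zero if any of
# them is missing or not 0. The function name is illustrative only.
check_bridge_nf() {
    dir="${1:-/proc/sys/net/bridge}"
    status=0
    for table in iptables ip6tables arptables; do
        file="$dir/bridge-nf-call-$table"
        if [ -r "$file" ]; then
            value=$(cat "$file")
        else
            # The entry is absent, e.g. because the bridge code is not loaded.
            value="missing"
        fi
        echo "net.bridge.bridge-nf-call-$table = $value"
        [ "$value" = "0" ] || status=1
    done
    return $status
}
```

Calling `check_bridge_nf` with no argument inspects the live system; passing a directory makes it possible to try the function against a copy of the files.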

Version-Release number of selected component (if applicable):
  kernel-PAE-2.6.29.6-217.2.16.fc11.i686
  libvirt-0.6.2-14.fc11.i586
  iptables-1.4.3.1-1.fc11.i586
  bridge-utils-1.2-7.fc11.i586

How reproducible:
Lockup happens in under a minute with the above setup.
On a faster machine (E6400 @2.1GHz) it happens almost as soon as login prompt appears.

Steps to Reproduce:
1. Install Fedora 11 with virtualization
2. Disable NetworkManager
   chkconfig NetworkManager off
   chkconfig network on
3. Create bridge for eth0
   Create /etc/sysconfig/network-scripts/ifcfg-br0:
      DEVICE=br0
      TYPE=Bridge
      BOOTPROTO=dhcp
      ONBOOT=yes
   Modify /etc/sysconfig/network-scripts/ifcfg-eth0:
      DEVICE=eth0
      ONBOOT=yes
      BOOTPROTO=none
      BRIDGE=br0
   Add "-A FORWARD -i br0 -o br0 -j ACCEPT" to /etc/sysconfig/iptables
   Set default runlevel to 3 in /etc/inittab (to watch panics)
   Reboot
4. Install a fully-virtualized Fedora 11 domain "test", its network device connected to br0 (host eth0).
5. Confirm system is stable while domain "test" is up.
6. virsh autostart test; reboot
7. Within 1 minute after the login prompt appears, the host OS hangs. The kernel
panic is not always the same; sometimes the kernel cannot finish writing it.

8. Stop the bridge from sending traffic via netfilter:
   Add to /etc/sysctl.conf:
      net.bridge.bridge-nf-call-iptables = 0
      net.bridge.bridge-nf-call-ip6tables = 0
      net.bridge.bridge-nf-call-arptables = 0
   virsh autostart test
   reboot
9. The system does not lock up any more.
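
Before rebooting in step 6, it may help to verify that the reproduction preconditions from the steps above really hold. The following is a rough sketch with hypothetical helper names (bridge_has_port, domain_autostarts). It assumes that bridge ports are listed under /sys/class/net/<bridge>/brif/ and that libvirt's QEMU driver records autostart as an entry named <domain>.xml in /etc/libvirt/qemu/autostart; the latter path is an assumption and is not stated in this report.

```shell
# bridge_has_port SYSFS_NET BRIDGE PORT
# Succeeds if PORT is listed as a port of BRIDGE under SYSFS_NET
# (normally /sys/class/net). Helper name is illustrative only.
bridge_has_port() {
    [ -e "$1/$2/brif/$3" ]
}

# domain_autostarts AUTOSTART_DIR DOMAIN
# Succeeds if DOMAIN has an autostart entry in AUTOSTART_DIR.
# Assumption: libvirt keeps one <domain>.xml entry per autostarted
# domain in that directory (usually a symlink).
domain_autostarts() {
    [ -e "$1/$2.xml" ] || [ -L "$1/$2.xml" ]
}
```

For the setup described here, the checks would be `bridge_has_port /sys/class/net br0 eth0` and `domain_autostarts /etc/libvirt/qemu/autostart test`; both return a shell exit status, so they can be combined with `&&` in a script.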

Actual results:
Host hangs when virtual domain is set to autostart.

Expected results:
Host should not hang under any circumstance.

Additional info:
  System: Dell Dimension 9200, 
  CPU Intel Core 2 Duo E6300 @1.8GHz, 
  eth0 is Intel 82566DC Gigabit onboard, 
  runlevel 3 in fb text console mode to eliminate xorg video driver.
  The physical network contains many Windows machines that generate a lot of broadcast traffic (this traffic appears to be needed to trigger the bug, since the e1000 driver is always listed somewhere in the stack dumps).

May be related to bug 501137, but that one does not have enough info to be sure.

Comment 1 Laurentiu Badea 2009-09-05 06:00:43 UTC
Created attachment 359859
Screen photo of kernel panic stack dump

Comment 2 Thomas Woerner 2009-09-07 09:15:46 UTC
This is a kernel problem. Reassigning to kernel.

Comment 3 Chuck Ebbert 2009-09-08 17:29:54 UTC
Please try the latest update, kernel-2.6.30.5-43

Comment 4 Laurentiu Badea 2009-09-09 01:25:40 UTC
Same thing with 2.6.30.5-43, tried both PAE and non-PAE.

Comment 5 Ian Pilcher 2009-09-17 05:38:48 UTC
I think I'm seeing the same bug -- 32-bit F11 guest on 64-bit F11 host.  Everything is fine when guest is started manually, but autostarting causes host kernel panic.

I already have net.bridge.bridge-nf-call-iptables = 0; I'll try setting arptables and ip6tables to 0 as well.

Comment 6 Ian Pilcher 2009-09-17 05:59:54 UTC
Just to confirm, setting net.bridge.bridge-nf-call-{arptables,ip6tables} to 0 gets rid of the panic for me as well.

Comment 7 Jon Masters 2010-01-27 02:12:11 UTC
I have these sysctls set (by default, it would seem) on F12 and have a box falling over under very similar circumstances. This isn't fixed.

Comment 8 Jon Masters 2010-02-03 08:45:35 UTC

*** This bug has been marked as a duplicate of bug 533087 ***