Bug 521362

Summary:

Kernel panic in conntrack with libvirt and iptables

Product:

[Fedora] Fedora

Reporter:

Laurentiu Badea <bugzilla-redhat>

Component:

kernel

Assignee:

Kernel Maintainer List <kernel-maint>

Status:

CLOSED DUPLICATE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Docs Contact:

Priority:

low

Version:

CC:

ipilcher, itamar, jcm, kernel-maint, tre-bugzilla-redhat, twoerner

Target Milestone:

---

Target Release:

---

Hardware:

i686

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Last Closed:

2010-02-03 08:45:35 UTC

Attachments:

Description	Flags
Screen photo of kernel panic stack dump	none

Description Laurentiu Badea 2009-09-05 05:56:42 UTC

Couldn't find a netfilter category so I chose iptables.

Description of problem:

Fedora 11 host OS with one F11 fully virtualized domain (KVM) directly connected to host eth0 via a bridge br0. Both fresh installs.

Once the virtual machine is set to autostart and the host is rebooted, the host locks up shortly after bootup (in under a minute, faster with a faster CPU). It prints a message like "BUG: unable to handle kernel NULL pointer dereference at 00000001" (not always the same message), followed by a hefty stack dump, sometimes repeated.

If autostart is disabled and the domain is started manually after boot then the system appears to be stable (which is why the install succeeded).

The problem also _completely went away_, even with autostart enabled, once I stopped bridge traffic from passing through iptables (settings below, in sysctl.conf) and rebooted. Only the iptables setting made a difference here, but all three should be off on a server for security anyway. I realize that this probably breaks the default NATed bridge setup, but that setup is not useful on a server.

net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-arptables = 0
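
To confirm after a reboot that these three settings actually took effect, something like the following could be used. This is only a sketch: report_bridge_nf is a made-up helper name, and it assumes the values are exposed under /proc/sys/net/bridge, which is only present while the bridge netfilter code is loaded.

```shell
#!/bin/sh
# Sketch: print the current bridge-netfilter switches.
# report_bridge_nf is a hypothetical helper, not part of any package.
# The optional argument is the directory to read; it defaults to
# /proc/sys/net/bridge (assumed to exist only when bridging is loaded).
report_bridge_nf() {
    dir="${1:-/proc/sys/net/bridge}"
    any_enabled=0
    for name in iptables ip6tables arptables; do
        file="$dir/bridge-nf-call-$name"
        if [ ! -r "$file" ]; then
            echo "bridge-nf-call-$name: not available"
            continue
        fi
        value=$(cat "$file")
        echo "bridge-nf-call-$name = $value"
        # A non-zero value means bridged frames still go through that table.
        [ "$value" = "0" ] || any_enabled=1
    done
    # Return 0 only if every readable switch is 0.
    return "$any_enabled"
}
```

Calling report_bridge_nf with no argument reads the live values; it returns non-zero if any of the three is still enabled.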

Version-Release number of selected component (if applicable):
kernel-PAE-2.6.29.6-217.2.16.fc11.i686
libvirt-0.6.2-14.fc11.i586
iptables-1.4.3.1-1.fc11.i586
bridge-utils-1.2-7.fc11.i586

How reproducible:
Lockup happens in under a minute with the above setup.
On a faster machine (E6400 @2.1GHz) it happens almost as soon as login prompt appears.

Steps to Reproduce:
1. Install Fedora 11 with virtualization
2. Disable NetworkManager
     chkconfig NetworkManager off
     chkconfig network on
3. Create bridge for eth0
   Create /etc/sysconfig/network-scripts/ifcfg-br0
     DEVICE=br0
     TYPE=Bridge
     BOOTPROTO=dhcp
     ONBOOT=yes
   Modify /etc/sysconfig/network-scripts/ifcfg-eth0
     DEVICE=eth0
     ONBOOT=yes
     BOOTPROTO=none
     BRIDGE=br0
   Add "-A FORWARD -i br0 -o br0 -j ACCEPT" to /etc/sysconfig/iptables
   Set default runlevel to 3 in /etc/inittab (to watch panics)
   Reboot
4. Install a fully-virtualized Fedora 11 domain "test", its network device connected to br0 (host eth0).
5. Confirm system is stable while domain "test" is up.
6. virsh autostart test; reboot
7. Within 1 min after the login prompt appears, the host OS hangs. The kernel panic is not always the same; sometimes the kernel can't finish writing it.
8. Stop the bridge from sending traffic via netfilter:
   Add to /etc/sysctl.conf
     net.bridge.bridge-nf-call-iptables = 0
     net.bridge.bridge-nf-call-ip6tables = 0
     net.bridge.bridge-nf-call-arptables = 0
   virsh autostart test
   reboot
9. The system does not lock up any more.
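
The sysctl.conf edit in step 8 can be scripted so that repeating it does not duplicate lines. This is a rough sketch only: ensure_bridge_nf_off is a hypothetical helper name, and it only appends missing keys; an existing entry set to a different value is left untouched.

```shell
#!/bin/sh
# Sketch: append the three bridge-nf-call settings to a sysctl config
# file, once each. ensure_bridge_nf_off is a hypothetical helper name.
# Limitation: a key that is already present (with any value) is skipped,
# not rewritten.
ensure_bridge_nf_off() {
    conf="$1"
    if [ -z "$conf" ]; then
        echo "usage: ensure_bridge_nf_off FILE" >&2
        return 2
    fi
    touch "$conf" || return 1
    for name in iptables ip6tables arptables; do
        key="net.bridge.bridge-nf-call-$name"
        # Skip keys that already have an uncommented entry.
        if grep -Eq "^[[:space:]]*$key[[:space:]]*=" "$conf"; then
            continue
        fi
        echo "$key = 0" >> "$conf"
    done
}
```

As root, this would be run as "ensure_bridge_nf_off /etc/sysctl.conf", followed by the reboot from step 8.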

Actual results:
Host hangs when virtual domain is set to autostart.

Expected results:
Host should not hang under any circumstance.

Additional info:
System: Dell Dimension 9200,
CPU Intel Core 2 Duo E6300 @1.8GHz,
eth0 is Intel 82566DC Gigabit onboard,
runlevel 3 in fb text console mode to eliminate xorg video driver.
The physical network contains a lot of Windows machines that generate a lot of broadcast chatter. That traffic is apparently needed to trigger the panic, because I noticed the e1000 driver is always listed somewhere in the stack dumps.

May be related to bug 501137, but that one does not have enough info to be sure.

Comment 1 Laurentiu Badea 2009-09-05 06:00:43 UTC

Created attachment 359859 [details]
Screen photo of kernel panic stack dump

Comment 2 Thomas Woerner 2009-09-07 09:15:46 UTC

This is a kernel problem. Reassigning to kernel.

Comment 3 Chuck Ebbert 2009-09-08 17:29:54 UTC

Please try the latest update, kernel-2.6.30.5-43

Comment 4 Laurentiu Badea 2009-09-09 01:25:40 UTC

Same thing with 2.6.30.5-43, tried both PAE and non-PAE.

Comment 5 Ian Pilcher 2009-09-17 05:38:48 UTC

I think I'm seeing the same bug -- 32-bit F11 guest on 64-bit F11 host.  Everything is fine when guest is started manually, but autostarting causes host kernel panic.

I already have net.bridge.bridge-nf-call-iptables = 0; I'll try setting arptables and ip6tables to 0 as well.

Comment 6 Ian Pilcher 2009-09-17 05:59:54 UTC

Just to confirm, setting net.bridge.bridge-nf-call-{arptables,ip6tables} to 0 gets rid of the panic for me as well.

Comment 7 Jon Masters 2010-01-27 02:12:11 UTC

I have these sysctls set (by default, it would seem) on F12 and have a box falling over under very similar circumstances. This isn't fixed.

Comment 8 Jon Masters 2010-02-03 08:45:35 UTC


*** This bug has been marked as a duplicate of bug 533087 ***