Bug 980254 - fix for unreliable guest->host multicast triggers oops
Summary: fix for unreliable guest->host multicast triggers oops
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 18
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 979838 980857 981052 981075 981437 981702 981725 981868 981999 982218 982431 983224 983441 983576 984073 985626 987849
Depends On:
Blocks: CVE-2013-4129
Reported: 2013-07-01 21:28 UTC by James Ralston
Modified: 2013-08-06 19:43 UTC (History)
CC List: 51 users

Fixed In Version: kernel-3.10.3-300.fc19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 03:45:41 UTC
Type: Bug
Embargoed:



Description James Ralston 2013-07-01 21:28:19 UTC
Description of problem:

kernel-3.9.8-200.fc18 contains a patch to address bug 880035.

Unfortunately, whenever I shut down a particular KVM virtual guest (Windows 7, paravirt drivers, using hugemem) when the host system is running kernel-3.9.8-200.fc18, the host system throws an oops. I've tried 4 times so far, and the kernel has oopsed every single time.

I can attach a camera picture of the oops screen if requested, but the beginning of the call trace is telling:

Call Trace:
 [<ffffffffa07782e0>] br_multicast_del_pg.isra.20+0x100/0x130 [bridge]
 [<ffffffffa0778c88>] br_multicast_disable_port+0x58/0xc0 [bridge]
 [<ffffffffa0771e79>] br_stp_disable_port+0xa9/0x100 [bridge]
 [<ffffffffa0770668>] br_device_event+0x208/0x210 [bridge]
 [<ffffffff8166661d>] notifier_call_chain+0x3d/0x70

I'm fairly confident that the fix for bug 880035 is itself broken.

The Windows 7 virtual guest is the only guest I have on this system, so I can't tell if the oops is specific to the guest configuration/OS, or this happens for all virtual guests.

Regardless, kernel-3.9.8-200.fc18 should not be pushed to stable.
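
For triage, the call trace in the description can be tallied by module to see where the oops originates. The following is only a minimal sketch: the regular expression and the helper name parse_frames are illustrative assumptions, and frames without a module tag are simply labelled "vmlinux". The sample text is the four bridge frames quoted above.

```python
import re
from collections import Counter

# Matches frames such as:
#  [<ffffffffa07782e0>] br_multicast_del_pg.isra.20+0x100/0x130 [bridge]
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_frames(text):
    """Return (symbol, module, unreliable) tuples for each frame found in text."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            # Frames without a "[module]" suffix belong to the core kernel image.
            module = m.group("module") or "vmlinux"
            frames.append((m.group("symbol"), module, bool(m.group("unreliable"))))
    return frames

TRACE = """\
 [<ffffffffa07782e0>] br_multicast_del_pg.isra.20+0x100/0x130 [bridge]
 [<ffffffffa0778c88>] br_multicast_disable_port+0x58/0xc0 [bridge]
 [<ffffffffa0771e79>] br_stp_disable_port+0xa9/0x100 [bridge]
 [<ffffffffa0770668>] br_device_event+0x208/0x210 [bridge]
"""

frames = parse_frames(TRACE)
modules = Counter(module for _symbol, module, _unreliable in frames)
print(modules.most_common())
```

All four frames land in the bridge module, which matches the observation that the oops happens while the bridge is disabling the guest's port.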

Comment 1 Cong Wang 2013-07-02 13:53:34 UTC
Please try this quick fix:
https://bugzilla.redhat.com/show_bug.cgi?id=880035#c53

Comment 2 Josh Boyer 2013-07-03 11:31:08 UTC
*** Bug 980857 has been marked as a duplicate of this bug. ***

Comment 3 Josh Boyer 2013-07-03 13:12:33 UTC
For all of those having trouble with vhost and/or bridging in guests, please try the scratch build below when it completes.  It contains the patch from bug 880035 for the timer fix and the use-after-free fix for vhost-net backported to 3.9.8.

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569247

Comment 4 Josh Boyer 2013-07-03 14:22:42 UTC
Sigh.  Of course, it would help if I didn't typo the patch.  Anyway, here is a scratch build that should actually finish building:

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569571

Comment 5 Josh Boyer 2013-07-03 16:37:01 UTC
Third time is a charm.  This one actually looks like it built.  Sigh, sorry about that.

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569631

Comment 6 Josh Boyer 2013-07-05 12:28:41 UTC
*** Bug 981075 has been marked as a duplicate of this bug. ***

Comment 7 Josh Boyer 2013-07-05 12:45:03 UTC
*** Bug 981437 has been marked as a duplicate of this bug. ***

Comment 8 Josh Boyer 2013-07-05 12:52:42 UTC
*** Bug 981052 has been marked as a duplicate of this bug. ***

Comment 9 Josh Boyer 2013-07-05 13:03:35 UTC
I've applied the patches to F17-F19 now.  Assuming the testing holds, this should be fixed with the next update.

Comment 10 Richard Chan 2013-07-05 14:34:56 UTC
Works for me: shutdown Windows 7 VM no longer causes oops on F19.
Using Josh's 3.9.8-300.7.fc19.x86_64

Comment 11 Josh Boyer 2013-07-05 15:51:58 UTC
*** Bug 981725 has been marked as a duplicate of this bug. ***

Comment 12 Fedora Update System 2013-07-05 19:04:06 UTC
kernel-3.9.9-201.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.9.9-201.fc18

Comment 13 Michael Hampton 2013-07-05 20:24:30 UTC
Josh Boyer's scratch build of 3.9.8-300.7.fc19 fixes the issue for me as well. However, the 3.9.9-301.fc19 in updates-testing still panics.

Comment 14 Josh Boyer 2013-07-05 20:27:00 UTC
(In reply to Michael Hampton from comment #13)
> Josh Boyer's scratch build of 3.9.8-300.7.fc19 fixes the issue for me as
> well. However, the 3.9.9-301.fc19 in updates-testing still panics.

3.9.9-301 doesn't contain the patches.  They just went into the Fedora git repo this morning.

Comment 15 Fedora Update System 2013-07-07 01:38:46 UTC
Package kernel-3.9.9-201.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.9-201.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-12530/kernel-3.9.9-201.fc18
then log in and leave karma (feedback).

Comment 16 Josh Boyer 2013-07-07 20:16:53 UTC
*** Bug 981999 has been marked as a duplicate of this bug. ***

Comment 17 PJ Waskiewicz 2013-07-07 23:49:08 UTC
Bug 981075 was also marked as a duplicate of this and 981999.  However, kernel-3.9.9-301.fc19 (x86_64 here) still shows the same problem outlined in 981075, with the exact same backtrace.  It doesn't seem these are the same bug.

Comment 18 Josh Boyer 2013-07-08 00:21:52 UTC
(In reply to PJ Waskiewicz from comment #17)
> Bug 981075 was also marked as a duplicate of this and 981999.  However,
> getting the kernel-3.9.9-301.fc19 (x86_64 here) is still having the same
> problem outlined in 981075.  It's the same exact backtrace.  It doesn't seem
> these are the same bug.

See comment #14.  3.9.9-301 doesn't fix this.

Comment 19 PJ Waskiewicz 2013-07-08 02:01:13 UTC
(In reply to Josh Boyer from comment #18)
> (In reply to PJ Waskiewicz from comment #17)
> > Bug 981075 was also marked as a duplicate of this and 981999.  However,
> > getting the kernel-3.9.9-301.fc19 (x86_64 here) is still having the same
> > problem outlined in 981075.  It's the same exact backtrace.  It doesn't seem
> > these are the same bug.
> 
> See comment #14.  3.9.9-301 doesn't fix this.

I did see that comment, and replied to this because I want to make sure it doesn't get forgotten as it's called a duplicate of this bug.

Comment 20 Josh Boyer 2013-07-08 12:18:04 UTC
*** Bug 982218 has been marked as a duplicate of this bug. ***

Comment 21 Mat Booth 2013-07-08 12:50:47 UTC
Josh Boyer:

Sorry for raising the duplicate bug #982218. I did not see this bug because my search did not include bugs that are "ON_QA" -- is that the correct state for this bug to be in? There does not seem to be any pending updates that fix this bug...

Comment 22 Josh Boyer 2013-07-08 13:03:39 UTC
(In reply to Mat Booth from comment #21)
> Josh Boyer:
> 
> Sorry for raising the duplicate bug #982218. I did not see this bug because
> my search did not include bugs that are "ON_QA" -- is that the correct state
> for this bug to be in? There does not seem to be any pending updates that
> fix this bug...

Yes, because there's an F18 fix pending for it and this was originally reported on F18.  The F19 build is complete in koji, but not filed as an update yet:

http://koji.fedoraproject.org/koji/buildinfo?buildID=431939

Bodhi will leave the usual comments here when it hits F19 updates-testing.

Comment 23 Mat Booth 2013-07-08 13:20:15 UTC
Great stuff, thanks for the info.

Comment 24 John Ellson 2013-07-08 18:04:31 UTC
I've been hit by this bug too, during a shutdown of a Windows VM.

This bug reliably damages BTRFS, requiring a btrfs-zero-log (which crashes, but seems to fix the corruption first.)

The related BTRFS bug is BUG #953443

Comment 25 Cole Robinson 2013-07-08 19:46:35 UTC
*** Bug 981868 has been marked as a duplicate of this bug. ***

Comment 26 Cole Robinson 2013-07-08 19:48:36 UTC
*** Bug 981702 has been marked as a duplicate of this bug. ***

Comment 28 Cole Robinson 2013-07-09 12:25:47 UTC
*** Bug 982431 has been marked as a duplicate of this bug. ***

Comment 29 Josh Boyer 2013-07-09 13:43:18 UTC
*** Bug 979838 has been marked as a duplicate of this bug. ***

Comment 30 Charles R. Anderson 2013-07-09 20:08:53 UTC
It crashed again with kernel-3.9.9-302.fc19 from:

http://koji.fedoraproject.org/koji/buildinfo?buildID=431939

I have bridges and a KVM Windows XP guest.  The crash this time happened after restarting the guest with a change of guest video adapter (from Cirrus to VGA), but crashes on the previous 301 kernel happened "spontaneously" with no particular activity with a guest or KVM.

[56183.974489] br0: port 2(vnet0) entered disabled state
[56197.059509] device vnet0 entered promiscuous mode
[56197.070239] br0: port 2(vnet0) entered forwarding state
[56197.075467] br0: port 2(vnet0) entered forwarding state
[56212.128010] br0: port 2(vnet0) entered forwarding state
[56325.754469] ------------[ cut here ]------------
[56325.758051] WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0()
[56325.758051] Hardware name: OptiPlex 960
[56325.758051] list_add corruption. prev->next should be next (ffff88022c631648), but was ffffffff811c9d60. (prev=ffff8802167ec100).
[56325.758051] Modules linked in: arc4 md4 nls_utf8 cifs dns_resolver fscache tun fuse ebtable_nat xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables bridge bnep bluetooth rfkill stp llc ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_analog snd_hda_intel snd_hda_codec dell_wmi sparse_keymap iTCO_wdt iTCO_vendor_support snd_hwdep ppdev snd_seq snd_seq_device dcdbas snd_pcm mperf coretemp snd_page_alloc snd_timer snd kvm_intel e1000e kvm tg3 ptp pps_core mei microcode natsemi tulip wmi soundcore i2c_i801 lpc_ich serio_raw mfd_core joydev parport_pc parport uinput dm_crypt usb_storage raid1 radeon i2c_algo_bit drm_kms_helper ttm drm ata_generic i2c_core pata_acpi
[56325.758051] Pid: 0, comm: swapper/3 Not tainted 3.9.9-302.fc19.x86_64 #1
[56325.758051] Call Trace:
[56325.758051]  <IRQ>  [<ffffffff81306d00>] ? __list_add+0x30/0xc0
[56325.758051]  [<ffffffff8105cc56>] warn_slowpath_common+0x66/0x80
[56325.758051]  [<ffffffff8105ccbc>] warn_slowpath_fmt+0x4c/0x50
[56325.758051]  [<ffffffff811c9d60>] ? invalidate_bh_lrus+0x30/0x30
[56325.758051]  [<ffffffff81306d7c>] __list_add+0xac/0xc0
[56325.758051]  [<ffffffff8106bed3>] __internal_add_timer+0x113/0x130
[56325.758051]  [<ffffffff8106c527>] internal_add_timer+0x17/0x40
[56325.758051]  [<ffffffff8106d812>] mod_timer+0x102/0x210
[56325.758051]  [<ffffffff8106d938>] add_timer+0x18/0x20
[56325.758051]  [<ffffffff815e9dd1>] addrconf_verify+0x171/0x330
[56325.758051]  [<ffffffff815e69db>] ? inet6_ifa_notify+0xab/0xe0
[56325.758051]  [<ffffffff815eb05b>] addrconf_prefix_rcv+0x23b/0xac0
[56325.758051]  [<ffffffff815468df>] ? neigh_update+0x2bf/0x5c0
[56325.758051]  [<ffffffff815f98a8>] ndisc_rcv+0x7f8/0xf20
[56325.758051]  [<ffffffff81600968>] icmpv6_rcv+0x498/0x720
[56325.758051]  [<ffffffff8156880b>] ? nf_iterate+0x8b/0xa0
[56325.758051]  [<ffffffff81646d75>] ? _raw_read_unlock_bh+0x15/0x20
[56325.758051]  [<ffffffff815e3f31>] ip6_input_finish+0xd1/0x410
[56325.758051]  [<ffffffff815e46d2>] ip6_input+0x22/0x60
[56325.758051]  [<ffffffff815e47d2>] ip6_mc_input+0xc2/0x200
[56325.758051]  [<ffffffff815e3e50>] ip6_rcv_finish+0x80/0x90
[56325.758051]  [<ffffffff815e4523>] ipv6_rcv+0x2b3/0x440
[56325.758051]  [<ffffffff8153b612>] __netif_receive_skb_core+0x622/0x7f0
[56325.758051]  [<ffffffff8153b7f8>] __netif_receive_skb+0x18/0x60
[56325.758051]  [<ffffffff8153b873>] netif_receive_skb+0x33/0xb0
[56325.758051]  [<ffffffffa043a5a7>] br_handle_frame_finish+0x237/0x330 [bridge]
[56325.758051]  [<ffffffffa043a825>] br_handle_frame+0x185/0x270 [bridge]
[56325.758051]  [<ffffffff8153b232>] __netif_receive_skb_core+0x242/0x7f0
[56325.758051]  [<ffffffff8101a300>] ? native_read_tsc+0x20/0x20
[56325.758051]  [<ffffffff8153b7f8>] __netif_receive_skb+0x18/0x60
[56325.758051]  [<ffffffff8153b873>] netif_receive_skb+0x33/0xb0
[56325.758051]  [<ffffffff8153c250>] napi_gro_receive+0x80/0xb0
[56325.758051]  [<ffffffffa0480f93>] e1000_receive_skb+0x73/0xd0 [e1000e]
[56325.758051]  [<ffffffffa048240a>] e1000_clean_rx_irq+0x24a/0x400 [e1000e]
[56325.758051]  [<ffffffffa0489f2d>] e1000e_poll+0x6d/0x310 [e1000e]
[56325.758051]  [<ffffffff8153bbe9>] net_rx_action+0x149/0x240
[56325.758051]  [<ffffffff81065657>] __do_softirq+0xf7/0x240
[56325.758051]  [<ffffffff81065925>] irq_exit+0xa5/0xb0
[56325.758051]  [<ffffffff81650f56>] do_IRQ+0x56/0xc0
[56325.758051]  [<ffffffff816471ad>] common_interrupt+0x6d/0x6d
[56325.758051]  <EOI>  [<ffffffff810937f8>] ? sched_clock_cpu+0xa8/0x100
[56325.758051]  [<ffffffff81042106>] ? native_safe_halt+0x6/0x10
[56325.758051]  [<ffffffff8101b54a>] default_idle+0x4a/0x100
[56325.758051]  [<ffffffff8101c03f>] cpu_idle+0xef/0x140
[56325.758051]  [<ffffffff81635f49>] start_secondary+0x249/0x24b
[56325.758051] ---[ end trace 2eff8e813e8c988c ]---
[56380.384055] ------------[ cut here ]------------
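
For readers unfamiliar with the "list_add corruption" warning above: it comes from a sanity check made when an entry is inserted into a doubly linked list, namely that the two neighbours still point at each other. The following is a user-space model of that check, not kernel code. The Node class and list_add_checked are hypothetical names, and unlike this model (which reports the problem and skips the insertion), what the kernel does after warning is not modelled here.

```python
class Node:
    """A list entry; a freshly created node is a one-element circular list."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def list_add_checked(new, prev, next_):
    """Insert `new` between `prev` and `next_` if the neighbours agree.

    Returns a list of problem descriptions; an empty list means the
    insertion was performed.
    """
    problems = []
    if next_.prev is not prev:
        problems.append(
            f"next->prev should be prev ({prev.name}), but was {next_.prev.name}"
        )
    if prev.next is not next_:
        problems.append(
            f"prev->next should be next ({next_.name}), but was {prev.next.name}"
        )
    if problems:
        return problems
    # Neighbours are consistent: link the new entry in.
    next_.prev = new
    new.next = next_
    new.prev = prev
    prev.next = new
    return problems


head = Node("head")
a = Node("a")
print(list_add_checked(a, head, head.next))  # consistent list, nothing reported

# Simulate a stale entry: something overwrote a.next behind the list's back.
a.next = Node("stray")
print(list_add_checked(Node("b"), a, head))
```

In the trace above the check fires inside the timer code, which suggests that a timer entry already on the pending list had its links overwritten (for example by being re-initialised or freed while still queued).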

Comment 31 Charles R. Anderson 2013-07-09 22:31:58 UTC
Happened again with kernel-3.9.9-302.fc19, even when I refrained from starting any KVM guests.

Comment 32 Trapier Marshall 2013-07-09 23:16:13 UTC
(In reply to Josh Boyer from comment #5)
> http://koji.fedoraproject.org/koji/taskinfo?taskID=5569631

kernel-3.9.9-301.fc19.x86_64 crashed ~90% of the time with the stack trace in the description (comment #0) when stopping a corosync (multicast) cluster running on RHEL 6 kvm guests.  On the scratch-build kernel-3.9.8-300.7.fc19.x86_64, the crash did not occur in 10 consecutive cluster stop and start cycles.

Comment 33 Josh Boyer 2013-07-10 02:00:42 UTC
(In reply to Charles R. Anderson from comment #31)
> Happened again with kernel-3.9.9-302.fc19, even when I refrained from
> starting any KVM guests.

It's plausible you're hitting the timer issue with the bridge mdb code.  The patch we have was from bug 880035 and lacks the fix that went into br_mdb.c with this upstream commit:

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1

Cong, is there a way you can tell if that might be the case with the oops in comment #32?

Comment 34 Cong Wang 2013-07-10 02:39:49 UTC
(In reply to Josh Boyer from comment #33)
> (In reply to Charles R. Anderson from comment #31)
> > Happened again with kernel-3.9.9-302.fc19, even when I refrained from
> > starting any KVM guests.
> 
> It's plausible you're hitting the timer issue with the bridge mdb code.  The
> patch we have was from bug 880035 and lacks the fix that went into br_mdb.c
> with this upstream commit:
> 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1
> 
> Cong, is there a way you can tell if that might be the case with the oops in
> comment #32?

I assume you mean comment #30.

The crash in comment #30 is related to another timer, not the one I fixed. Looking at the code, it is:

net/ipv6/addrconf.c:

        add_timer(&addr_chk_timer);
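
The reasoning above (look at which code armed the timer when the list check fired) can be applied to a trace mechanically. This is a rough sketch only: timer_caller and the TIMER_PLUMBING set are illustrative assumptions, with the function names taken from the trace in comment #30, and frames marked "?" are skipped as unreliable.

```python
import re

# Captures the optional "? " marker and the symbol name of a trace frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Assumption: these frames are generic timer/list plumbing, not the real caller.
TIMER_PLUMBING = {
    "__list_add",
    "__internal_add_timer",
    "internal_add_timer",
    "mod_timer",
    "add_timer",
}

def timer_caller(trace):
    """Return the first reliable frame below the timer plumbing, or None."""
    seen_timer = False
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if not m or m.group(1):  # not a frame, or an unreliable "?" frame
            continue
        symbol = m.group(2)
        if symbol in TIMER_PLUMBING:
            seen_timer = True
        elif seen_timer:
            return symbol
    return None

COMMENT_30_EXCERPT = """\
[56325.758051]  <IRQ>  [<ffffffff81306d00>] ? __list_add+0x30/0xc0
[56325.758051]  [<ffffffff8105cc56>] warn_slowpath_common+0x66/0x80
[56325.758051]  [<ffffffff8105ccbc>] warn_slowpath_fmt+0x4c/0x50
[56325.758051]  [<ffffffff811c9d60>] ? invalidate_bh_lrus+0x30/0x30
[56325.758051]  [<ffffffff81306d7c>] __list_add+0xac/0xc0
[56325.758051]  [<ffffffff8106bed3>] __internal_add_timer+0x113/0x130
[56325.758051]  [<ffffffff8106c527>] internal_add_timer+0x17/0x40
[56325.758051]  [<ffffffff8106d812>] mod_timer+0x102/0x210
[56325.758051]  [<ffffffff8106d938>] add_timer+0x18/0x20
[56325.758051]  [<ffffffff815e9dd1>] addrconf_verify+0x171/0x330
[56325.758051]  [<ffffffff815e69db>] ? inet6_ifa_notify+0xab/0xe0
[56325.758051]  [<ffffffff815eb05b>] addrconf_prefix_rcv+0x23b/0xac0
"""

print(timer_caller(COMMENT_30_EXCERPT))  # addrconf_verify
```

For the comment #30 trace this points at addrconf_verify in the IPv6 address code rather than at the bridge multicast code.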

Comment 35 Cong Wang 2013-07-10 02:42:39 UTC
(In reply to Cong Wang from comment #34)
> 
> The crash in comment #30 is related with another timer, not the one I fixed,
> looking at the code, it is:
> 
> net/ipv6/addrconf.c:
> 
>         add_timer(&addr_chk_timer);


And here is a quick fix:

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index cfdcf7b..002ef92 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5255,6 +5255,8 @@ void addrconf_cleanup(void)
                WARN_ON(!hlist_empty(&inet6_addr_lst[i]));
        spin_unlock_bh(&addrconf_hash_lock);
 
+       spin_lock(&addrconf_verify_lock);
        del_timer(&addr_chk_timer);
+       spin_unlock(&addrconf_verify_lock);
        rtnl_unlock();
 }

Please give it a try?

Comment 36 Josh Boyer 2013-07-10 19:06:00 UTC
*** Bug 983224 has been marked as a duplicate of this bug. ***

Comment 37 Matt Ford 2013-07-11 08:56:29 UTC
The latest kernel, 302, fixed this hang on shutdown issue for me.  Many thanks.

Comment 38 Izhar Firdaus 2013-07-11 09:07:05 UTC
On -302 too; haven't crashed so far.

Comment 39 Josh Boyer 2013-07-11 12:47:07 UTC
*** Bug 983441 has been marked as a duplicate of this bug. ***

Comment 40 Josh Boyer 2013-07-11 14:51:18 UTC
*** Bug 983576 has been marked as a duplicate of this bug. ***

Comment 41 galens 2013-07-11 20:31:36 UTC
Description of problem:
Unknown.

I was installing a software package in a Windows XP 64-bit VM (playon.tv's application, using KVM and virt-manager), had Chrome open and connected to Pandora (but paused), and had Firefox open and browsing a number of different websites.

These crashes seemed to start with the 3.9 kernels; I don't recall any similar ones on the 3.8 or earlier kernels, but it has been happening several times a day since.  

Version-Release number of selected component:
kernel

Additional info:
reporter:       libreport-2.1.5
cmdline:        BOOT_IMAGE=/vmlinuz-3.9.9-301.fc19.x86_64 root=/dev/mapper/vg_blackcompany-bc_root ro rd.md=0 rd.dm=0 rd.lvm.lv=vg_blackcompany/bc_root rd.luks=0 vconsole.keymap=us
kernel:         3.9.9-301.fc19.x86_64
runlevel:       N 5
type:           Kerneloops

Truncated backtrace:
WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0()
Hardware name:         
list_add corruption. prev->next should be next (ffff880129b75758), but was           (null). (prev=ffff880100ccfc40).
Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables tun bridge stp llc dm_service_time snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm usb_storage r8169 iTCO_wdt iTCO_vendor_support acpi_cpufreq mperf coretemp kvm_intel kvm usblp snd_page_alloc snd_timer snd soundcore lpc_ich mfd_core i2c_i801 mii microcode binfmt_misc dm_multipath nouveau mxm_wmi wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core video [last unloaded: iptable_mangle]
Pid: 1851, comm: firefox Not tainted 3.9.9-301.fc19.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81306d00>] ? __list_add+0x30/0xc0
 [<ffffffff8105cc56>] warn_slowpath_common+0x66/0x80
 [<ffffffff8105ccbc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff81571324>] ? ip_rcv_finish+0x184/0x320
 [<ffffffff81306d7c>] __list_add+0xac/0xc0
 [<ffffffff8106bed3>] __internal_add_timer+0x113/0x130
 [<ffffffff8106c527>] internal_add_timer+0x17/0x40
 [<ffffffff8106d812>] mod_timer+0x102/0x210
 [<ffffffffa032b21b>] br_multicast_rcv+0x86b/0x1220 [bridge]
 [<ffffffffa0321370>] ? br_handle_local_finish+0x60/0x60 [bridge]
 [<ffffffffa03215ba>] br_handle_frame_finish+0x24a/0x330 [bridge]
 [<ffffffffa0321825>] br_handle_frame+0x185/0x270 [bridge]
 [<ffffffff8153b232>] __netif_receive_skb_core+0x242/0x7f0
 [<ffffffff8101a300>] ? native_read_tsc+0x20/0x20
 [<ffffffff8153b7f8>] __netif_receive_skb+0x18/0x60
 [<ffffffff8153b873>] netif_receive_skb+0x33/0xb0
 [<ffffffff8153c250>] napi_gro_receive+0x80/0xb0
 [<ffffffffa028cc8e>] rtl8169_poll+0x15e/0x658 [r8169]
 [<ffffffff8153bbe9>] net_rx_action+0x149/0x240
 [<ffffffff81065657>] __do_softirq+0xf7/0x240
 [<ffffffff81065925>] irq_exit+0xa5/0xb0
 [<ffffffff81650f56>] do_IRQ+0x56/0xc0
 [<ffffffff816471ad>] common_interrupt+0x6d/0x6d
 <EOI>

Comment 42 Fedora Update System 2013-07-11 22:15:59 UTC
kernel-3.9.9-302.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.9.9-302.fc19

Comment 43 Charles R. Anderson 2013-07-12 00:33:24 UTC
(In reply to Cong Wang from comment #35)
> (In reply to Cong Wang from comment #34)
> > 
> > The crash in comment #30 is related with another timer, not the one I fixed,
> > looking at the code, it is:
> > 
> > net/ipv6/addrconf.c:
> > 
> >         add_timer(&addr_chk_timer);
> 
> 
> And here is a quick fix:
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index cfdcf7b..002ef92 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -5255,6 +5255,8 @@ void addrconf_cleanup(void)
>                 WARN_ON(!hlist_empty(&inet6_addr_lst[i]));
>         spin_unlock_bh(&addrconf_hash_lock);
>  
> +       spin_lock(&addrconf_verify_lock);
>         del_timer(&addr_chk_timer);
> +       spin_unlock(&addrconf_verify_lock);
>         rtnl_unlock();
>  }
> 
> Please give it a try?

I've been running for 30 minutes with this patch applied, bridges re-enabled, and KVM guest started.  It's been fine so far, but let's see how it goes over the next few days.  Thanks.

Comment 44 Fedora Update System 2013-07-12 03:09:59 UTC
kernel-3.9.9-201.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 45 Joshua Rosen 2013-07-12 11:04:11 UTC
When is it going to be pushed to F19?

Comment 46 Josh Boyer 2013-07-12 13:26:19 UTC
(In reply to Joshua Rosen from comment #45)
> When is it going to be pushed to F19?

https://admin.fedoraproject.org/updates/kernel-3.9.9-302.fc19

Comment 47 Charles R. Anderson 2013-07-12 15:08:29 UTC
(In reply to Charles R. Anderson from comment #43)
> (In reply to Cong Wang from comment #35)
> > (In reply to Cong Wang from comment #34)
> > > 
> > > The crash in comment #30 is related with another timer, not the one I fixed,
> > > looking at the code, it is:
> > > 
> > > net/ipv6/addrconf.c:
> > > 
> > >         add_timer(&addr_chk_timer);
> > 
> > 
> > And here is a quick fix:
> > 
> > diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> > index cfdcf7b..002ef92 100644
> > --- a/net/ipv6/addrconf.c
> > +++ b/net/ipv6/addrconf.c
> > @@ -5255,6 +5255,8 @@ void addrconf_cleanup(void)
> >                 WARN_ON(!hlist_empty(&inet6_addr_lst[i]));
> >         spin_unlock_bh(&addrconf_hash_lock);
> >  
> > +       spin_lock(&addrconf_verify_lock);
> >         del_timer(&addr_chk_timer);
> > +       spin_unlock(&addrconf_verify_lock);
> >         rtnl_unlock();
> >  }
> > 
> > Please give it a try?
> 
> I've been running for 30 minutes with this patch applied, bridges
> re-enabled, and KVM guest started.  It's been fine so far, but let's see how
> it goes over the next few days.  Thanks.

No good, still crashed with 302 kernel + above patch.  There is no IPv6 mentioned in this backtrace, so I think there are still problems in the bridge code.

[26852.239898] ------------[ cut here ]------------
[26852.242163] WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0()
[26852.242163] Hardware name: OptiPlex 960
[26852.242163] list_add corruption. prev->next should be next (ffff88022c631668), but was           (null). (prev=ffff88022ba751c0).
[26852.242163] Modules linked in: arc4 md4 nls_utf8 cifs dns_resolver fscache tun bnep bluetooth rfkill fuse ebtable_nat xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables bridge ip6table_filter ip6_tables stp llc snd_hda_codec_hdmi snd_hda_codec_analog iTCO_wdt iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm mperf dell_wmi coretemp sparse_keymap e1000e kvm_intel tg3 snd_page_alloc ptp mei pps_core kvm dcdbas natsemi lpc_ich i2c_i801 tulip mfd_core snd_timer snd microcode serio_raw joydev soundcore parport_pc parport wmi uinput dm_crypt raid1 usb_storage radeon i2c_algo_bit drm_kms_helper ttm drm ata_generic i2c_core pata_acpi
[26852.242163] Pid: 0, comm: swapper/3 Not tainted 3.9.9-302.bz980254c35.fc19.x86_64 #1
[26852.242163] Call Trace:
[26852.242163]  <IRQ>  [<ffffffff81306d00>] ? __list_add+0x30/0xc0
[26852.242163]  [<ffffffff8105cc56>] warn_slowpath_common+0x66/0x80
[26852.242163]  [<ffffffff8105ccbc>] warn_slowpath_fmt+0x4c/0x50
[26852.242163]  [<ffffffff81571324>] ? ip_rcv_finish+0x184/0x320
[26852.242163]  [<ffffffff81306d7c>] __list_add+0xac/0xc0
[26852.242163]  [<ffffffff8106bed3>] __internal_add_timer+0x113/0x130
[26852.242163]  [<ffffffff8106c527>] internal_add_timer+0x17/0x40
[26852.242163]  [<ffffffff8106d812>] mod_timer+0x102/0x210
[26852.242163]  [<ffffffffa03fa22b>] br_multicast_rcv+0x86b/0x1220 [bridge]
[26852.242163]  [<ffffffffa03f0370>] ? br_handle_local_finish+0x60/0x60 [bridge]
[26852.242163]  [<ffffffffa03f05ba>] br_handle_frame_finish+0x24a/0x330 [bridge]
[26852.242163]  [<ffffffffa03f0825>] br_handle_frame+0x185/0x270 [bridge]
[26852.242163]  [<ffffffff8153b232>] __netif_receive_skb_core+0x242/0x7f0
[26852.242163]  [<ffffffff8101a300>] ? native_read_tsc+0x20/0x20
[26852.242163]  [<ffffffff8153b7f8>] __netif_receive_skb+0x18/0x60
[26852.242163]  [<ffffffff8153b873>] netif_receive_skb+0x33/0xb0
[26852.242163]  [<ffffffff8153c250>] napi_gro_receive+0x80/0xb0
[26852.242163]  [<ffffffffa038ff93>] e1000_receive_skb+0x73/0xd0 [e1000e]
[26852.242163]  [<ffffffffa039140a>] e1000_clean_rx_irq+0x24a/0x400 [e1000e]
[26852.242163]  [<ffffffffa0398f2d>] e1000e_poll+0x6d/0x310 [e1000e]
[26852.242163]  [<ffffffff8153bbe9>] net_rx_action+0x149/0x240
[26852.242163]  [<ffffffff81065657>] __do_softirq+0xf7/0x240
[26852.242163]  [<ffffffff81065925>] irq_exit+0xa5/0xb0
[26852.242163]  [<ffffffff81650f56>] do_IRQ+0x56/0xc0
[26852.242163]  [<ffffffff816471ad>] common_interrupt+0x6d/0x6d
[26852.242163]  <EOI>  [<ffffffff81042106>] ? native_safe_halt+0x6/0x10
[26852.242163]  [<ffffffff8101b54a>] default_idle+0x4a/0x100
[26852.242163]  [<ffffffff8101c03f>] cpu_idle+0xef/0x140
[26852.242163]  [<ffffffff81635f59>] start_secondary+0x249/0x24b
[26852.242163] ---[ end trace cf3e357e4da807c3 ]---
[26852.242163] ------------[ cut here ]------------

Comment 48 Josh Boyer 2013-07-12 18:40:32 UTC
*** Bug 984073 has been marked as a duplicate of this bug. ***

Comment 49 galens 2013-07-12 19:35:47 UTC
I'm also still getting the crash in fc19 with the 302 kernel (but without the patch; I've also never seen IPv6 mentioned in the backtrace).

What data would be useful to collect at this point?

Comment 50 Fedora Update System 2013-07-13 01:53:05 UTC
Package kernel-3.9.9-302.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.9-302.fc19'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-12901/kernel-3.9.9-302.fc19
then log in and leave karma (feedback).

Comment 51 verdoux 2013-07-13 11:50:00 UTC
I still have the problem with 3.9.9-201.fc18

Comment 52 Fedora Update System 2013-07-14 03:30:08 UTC
kernel-3.9.9-302.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 53 Fedora Update System 2013-07-14 11:23:39 UTC
kernel-3.9.10-100.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.9.10-100.fc17

Comment 54 galens 2013-07-14 18:36:13 UTC
Making a note that the issue still exists with 3.9.9-302.fc19 ; I got hit with it overnight.

Given that my guest VM is primarily for windows-only media-serving software, this bug is making the VM useless.

Comment 55 Torbjorn Jansson 2013-07-14 22:05:03 UTC
Kernel 3.9.9-302.fc19 fixed the issue I had with shutting down KVM guests running Windows.

Comment 56 Alexander Dyadyun 2013-07-15 02:38:20 UTC
Host: Fedora 19 64bit + qemu-kvm
Network: classic "network" service with network bridging (both IPv4 and IPv6 protocols are in use)
Guests: Win2K8 R2 SP1 + VirtIO hdd&network drivers

Before 3.9.9-302.fc19, my host machine crashed completely when running Win2k8R2 guests.

With 3.9.9-302.fc19 the situation is much better (no host crash), but the Win2k8R2 guests behave strangely (sometimes they lose their network connection and become zombies).

As a temporary workaround I changed the VirtIO network drivers to Realtek on my guests. No problems so far(?).

Comment 57 Andrew Birch 2013-07-15 02:42:23 UTC
I just submitted a crash from F19 kernel-3.9.9-302.fc19.x86_64, and abrt pointed me to this bug. (actually 981052)

I was running a win8 KVM guest with a NetworkManager managed bridge.

Comment 58 Cong Wang 2013-07-15 12:17:34 UTC
(In reply to Charles R. Anderson from comment #47)
> 
> No good, still crashed with 302 kernel + above patch.  There is no IPv6
> mentioned in this backtrace, so I think there are still problems in the
> bridge code.
> 

To me, this looks more like a bug in kernel/timer.c than in the bridge code, especially given that the same list corruption (the one in comment #30) also appears on a non-bridge path.

So it is probably not related to the bridge code.

Comment 59 Cong Wang 2013-07-15 12:40:41 UTC
(In reply to Cong Wang from comment #58)
> 
> For me, this is more likely to be a bug in kernel/timer.c rather than
> bridge, also given the fact same list corruption bug (the one in comment
> #30) appears in non-bridge too.
> 

Or, even more likely, there is another place with a buggy timer list operation, like the one I noticed in comment #35.

So, do you *always* get the same backtrace? Your /proc/timer_list from right before the crash would help too, but I think it is hard to capture it right before the crash happens.

Or maybe running kernel-debug could give more information...

Thanks!

Comment 60 Stephen Murray 2013-07-15 17:42:20 UTC
I have seen this problem on both of my KVM machines, but on neither of my non-KVM machines. It has occurred on every kernel *after* the one that came with the F19 DVD images. The original 3.9.5-301.fc19 has *never* crashed, and as a workaround I boot into it on the KVM machines. All of the other kernels have crashed with the same error, including the latest 3.9.9-302.fc19. It appears that a patch introduced after 3.9.5-301.fc19 is the culprit.

I just noticed that the "last reboot" command is showing weird uptimes below. Perhaps caused by rebooting into an earlier kernel.

[root@murraysj log]# last reboot
reboot   system boot  3.9.5-301.fc19.x Mon Jul 15 11:26 - 13:36  (02:09)    
reboot   system boot  3.9.9-302.fc19.x Mon Jul 15 09:45 - 13:36  (03:50)    
reboot   system boot  3.9.5-301.fc19.x Mon Jul  8 13:38 - 09:45 (6+20:06)   
reboot   system boot  3.9.9-301.fc19.x Mon Jul  8 13:30 - 09:45 (6+20:14)   
reboot   system boot  3.9.8-300.fc19.x Mon Jul  8 13:20 - 13:30  (00:09)    
reboot   system boot  3.9.8-300.fc19.x Fri Jul  5 12:07 - 13:30 (3+01:22)   
reboot   system boot  3.9.8-300.fc19.x Fri Jul  5 11:44 - 12:06  (00:22)    
reboot   system boot  3.9.5-301.fc19.x Fri Jul  5 11:15 - 11:43  (00:27)    
reboot   system boot  3.9.5-301.fc19.x Fri Jul  5 11:12 - 11:15  (00:03)    
reboot   system boot  3.9.5-301.fc19.x Fri Jul  5 11:08 - 11:12  (00:03)    
reboot   system boot  3.9.5-301.fc19.x Fri Jul  5 11:03 - 11:08  (00:05)    
reboot   system boot  3.9.5-301.fc19.x Fri Jul  5 10:47 - 11:02  (00:14)    

wtmp begins Fri Jul  5 10:47:33 2013
[root@murraysj log]#

Comment 61 Fedora Update System 2013-07-18 06:11:09 UTC
kernel-3.9.10-100.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
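
With updates now announced for Fedora 17, 18 and 19, a quick way to check whether an installed kernel is at least as new as the build announced for this bug is to compare version strings. The sketch below is a simplification under stated assumptions: the FIRST_FIXED table is taken from the update notices in this report, includes_patches is a hypothetical helper, and the comparison is a plain numeric one rather than RPM's full version-comparison rules. Scratch builds (such as 3.9.8-300.7.fc19) and locally patched builds are not covered, and several later comments report crashes that persist on these builds, so a True result only says the kernel is not older than the announced update.

```python
import re

# First update announced for each release in this report (comments 12, 42, 53).
FIRST_FIXED = {
    "fc17": "3.9.10-100",
    "fc18": "3.9.9-201",
    "fc19": "3.9.9-302",
}

NVR_RE = re.compile(r"(?:kernel-)?(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.(fc\d+)")

def split_kernel(kernel):
    """'3.9.9-301.fc19.x86_64' -> ((3, 9, 9), (301,), 'fc19')"""
    m = NVR_RE.match(kernel)
    if not m:
        raise ValueError(f"unrecognised kernel string: {kernel!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    release = tuple(int(part) for part in m.group(2).split("."))
    return version, release, m.group(3)

def includes_patches(kernel):
    """True if `kernel` is not older than the announced update for its release."""
    version, release, dist = split_kernel(kernel)
    if dist not in FIRST_FIXED:
        raise ValueError(f"no update recorded for {dist}")
    fixed_version, fixed_release, _ = split_kernel(f"{FIRST_FIXED[dist]}.{dist}")
    # Tuples compare element by element, so (3, 9, 10) > (3, 9, 9).
    return (version, release) >= (fixed_version, fixed_release)

for candidate in ("3.9.9-301.fc19.x86_64", "3.9.9-302.fc19.x86_64", "3.9.8-200.fc18"):
    print(candidate, includes_patches(candidate))
```

On a running system the input string would typically come from the output of uname -r.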

Comment 62 Milovidov Mikhail 2013-07-18 06:51:05 UTC
For me, 3.9.9-302.fc19 resolves this issue. Thanks!

Comment 64 galens 2013-07-18 17:02:28 UTC
At this point, I believe my continuing crashes are a separate bug.  I opened Bug 98567 to address those issues.

This fix certainly has made my system more stable; thank you.

Comment 65 Izhar Firdaus 2013-07-19 02:35:18 UTC
(In reply to Alexander Dyadyun from comment #56)
> 
> With 3.9.9-302.fc19 situation is much more better (no host crash), but
> Win2k8R guests are working strange (sometimes they losts network connection
> and becomes zombie)
> 
> As temporary workaround I change VirtIO network drivers to Realtek on my
> guests. No problems so far(?).

This problem sounds like this bug : https://bugzilla.redhat.com/show_bug.cgi?id=975065

Comment 66 Alexander Dyadyun 2013-07-19 06:53:25 UTC
(In reply to Izhar Firdaus from comment #65)

> This problem sounds like this bug :
> https://bugzilla.redhat.com/show_bug.cgi?id=975065

Thanks for the info! This is definitely my case.

Comment 67 Michael Simoni 2013-07-19 15:51:24 UTC
I started experiencing this bug (or a bug) with 3.9.8-100.fc17.x86_64. My knowledge is limited, so I am guessing this is the most relevant thread.
I updated to 3.9.10-100.fc17.x86_64, and although it does seem to be more stable, I have still experienced a crash. Before these two kernels I was running 3.8.13-100.fc17.x86_64, which was very stable.
I have three vmcores since I upgraded to 3.9.8-100. The COMMAND in each crash header is different. The PANIC line of one vmcore said "kernel BUG at kernel/timer.c:729", which led me to this Bugzilla thread. I have had many more crashes than that. I do use the Fedora VM engine, but at the time of the crash I had no guest actively running. I will continue to monitor this.

Approximate uptime since the last crash, as of this post:
 11:48:35 up  3:20,  3 users,  load average: 0.25, 0.19, 0.18

BIOS Information
        Vendor: American Megatrends Inc.
        Version: 0307   
        Release Date: 12/15/2010
Base Board Information
        Manufacturer: ASUSTeK Computer INC.
        Product Name: M4A88T-V EVO/USB3
Processor Information
        Socket Designation: AM3
        Type: Central Processor
        Family: Athlon II
        Manufacturer: AMD              
        ID: 52 0F 10 00 FF FB 8B 17
        Signature: Family 16, Model 5, Stepping 2
        Version: AMD Athlon(tm) II X4 620 Processor 

Current Crash data:
/etc/sysctl.conf
   # Reboot 5 seconds after panic
   kernel.panic = 5
   # Panic if a hung task was found
   kernel.hung_task_panic = 1
   # Setup timeout for hung task to 300 seconds
   kernel.hung_task_timeout_secs = 300

KERNEL: /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux
    DUMPFILE: ./vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Fri Jul 19 08:24:22 2013
      UPTIME: 02:45:01
LOAD AVERAGE: 0.03, 0.15, 0.20
       TASKS: 401
    NODENAME: *****
     RELEASE: 3.9.10-100.fc17.x86_64
     VERSION: #1 SMP Sun Jul 14 01:31:27 UTC 2013
     MACHINE: x86_64  (2611 Mhz)
      MEMORY: 15.7 GB
       PANIC: ""
         PID: 19
     COMMAND: "ksoftirqd/2"
        TASK: ffff88040986b4f0  [THREAD_INFO: ffff88040988e000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

This is from /var/log/messages:
[ 1379.737499] ------------[ cut here ]------------
[ 1379.737519] WARNING: at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[ 1379.737525] Hardware name: System Product Name
[ 1379.737531] list_add corruption. prev->next should be next (ffff8804098bd538), but was           (null). (prev=ffff880402afa700).
[ 1379.737535] Modules linked in: bnep bluetooth rfkill fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer acpi_cpufreq mperf snd sp5100_tco shpchp i2c_piix4 edac_core soundcore edac_mce_amd microcode serio_raw k10temp asus_atk0110 vhost_net tun macvtap macvlan kvm_amd kvm ecryptfs encrypted_keys trusted tpm tpm_bios nfsd auth_rpcgss nfs_acl lockd uinput binfmt_misc raid1 ata_generic pata_acpi firewire_ohci firewire_core 3c59x crc_itu_t tulip pata_atiixp r8169 mii wmi radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi
[ 1379.737673] Pid: 0, comm: swapper/2 Not tainted 3.9.8-100.fc17.x86_64 #1
[ 1379.737678] Call Trace:
[ 1379.737683]  <IRQ>  [<ffffffff8105efd5>] warn_slowpath_common+0x75/0xa0
[ 1379.737702]  [<ffffffff8105f0b6>] warn_slowpath_fmt+0x46/0x50
[ 1379.737711]  [<ffffffff81586594>] ? ip_rcv_finish+0x194/0x340
[ 1379.737718]  [<ffffffff813152ce>] __list_add+0xbe/0xd0
[ 1379.737725]  [<ffffffff8106e4d3>] __internal_add_timer+0x113/0x130
[ 1379.737731]  [<ffffffff8106eae0>] internal_add_timer+0x20/0x50
[ 1379.737737]  [<ffffffff8106fdf4>] mod_timer+0x124/0x200
[ 1379.737759]  [<ffffffffa05f8552>] br_multicast_rcv+0x862/0x1330 [bridge]
[ 1379.737767]  [<ffffffff8157d7d6>] ? nf_iterate+0x86/0xb0
[ 1379.737782]  [<ffffffffa05ee4a0>] ? br_handle_local_finish+0x60/0x60 [bridge]
[ 1379.737796]  [<ffffffffa05ee6f2>] br_handle_frame_finish+0x252/0x330 [bridge]
[ 1379.737810]  [<ffffffffa05ee946>] br_handle_frame+0x176/0x280 [bridge]
[ 1379.737818]  [<ffffffff8154fce2>] __netif_receive_skb_core+0x352/0x7f0
[ 1379.737826]  [<ffffffff8101b913>] ? native_sched_clock+0x13/0x80
[ 1379.737833]  [<ffffffff815501a1>] __netif_receive_skb+0x21/0x70
[ 1379.737839]  [<ffffffff815503a3>] netif_receive_skb+0x33/0xb0
[ 1379.737846]  [<ffffffff81550dc8>] napi_gro_receive+0x98/0xd0
[ 1379.737864]  [<ffffffffa0174deb>] rtl8169_poll+0x17b/0x6c0 [r8169]
[ 1379.737872]  [<ffffffff81550aa9>] net_rx_action+0x149/0x240
[ 1379.737879]  [<ffffffff81067688>] __do_softirq+0xe8/0x230
[ 1379.737886]  [<ffffffff81067955>] irq_exit+0xa5/0xb0
[ 1379.737894]  [<ffffffff81668963>] do_IRQ+0x63/0xe0
[ 1379.737901]  [<ffffffff8165e96d>] common_interrupt+0x6d/0x6d
[ 1379.737905]  <EOI>  [<ffffffff81044136>] ? native_safe_halt+0x6/0x10
[ 1379.737919]  [<ffffffff8101c641>] default_idle+0x41/0x100
[ 1379.737926]  [<ffffffff8101c792>] amd_e400_idle+0x92/0x120
[ 1379.737933]  [<ffffffff8101d16e>] cpu_idle+0xfe/0x120
[ 1379.737941]  [<ffffffff8164db66>] start_secondary+0x24f/0x251
[ 1379.737947] ---[ end trace d41df35e202e8e84 ]---
[ 1379.737950] ------------[ cut here ]------------
[ 1379.737956] WARNING: at lib/list_debug.c:36 __list_add+0x9c/0xd0()
[ 1379.737960] Hardware name: System Product Name
[ 1379.737964] list_add double add: new=ffff880402afa700, prev=ffff880402afa700, next=ffff8804098bd538.
[ 1379.737967] Modules linked in: bnep bluetooth rfkill fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer acpi_cpufreq mperf snd sp5100_tco shpchp i2c_piix4 edac_core soundcore edac_mce_amd microcode serio_raw k10temp asus_atk0110 vhost_net tun macvtap macvlan kvm_amd kvm ecryptfs encrypted_keys trusted tpm tpm_bios nfsd auth_rpcgss nfs_acl lockd uinput binfmt_misc raid1 ata_generic pata_acpi firewire_ohci firewire_core 3c59x crc_itu_t tulip pata_atiixp r8169 mii wmi radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi
[ 1379.738077] Pid: 0, comm: swapper/2 Tainted: G        W    3.9.8-100.fc17.x86_64 #1
[ 1379.738080] Call Trace:
[ 1379.738083]  <IRQ>  [<ffffffff8105efd5>] warn_slowpath_common+0x75/0xa0
[ 1379.738097]  [<ffffffff8105f0b6>] warn_slowpath_fmt+0x46/0x50
[ 1379.738103]  [<ffffffff81586594>] ? ip_rcv_finish+0x194/0x340
[ 1379.738109]  [<ffffffff813152ac>] __list_add+0x9c/0xd0
[ 1379.738116]  [<ffffffff8106e4d3>] __internal_add_timer+0x113/0x130
[ 1379.738122]  [<ffffffff8106eae0>] internal_add_timer+0x20/0x50
[ 1379.738128]  [<ffffffff8106fdf4>] mod_timer+0x124/0x200
[ 1379.738145]  [<ffffffffa05f8552>] br_multicast_rcv+0x862/0x1330 [bridge]
[ 1379.738152]  [<ffffffff8157d7d6>] ? nf_iterate+0x86/0xb0
[ 1379.738166]  [<ffffffffa05ee4a0>] ? br_handle_local_finish+0x60/0x60 [bridge]
[ 1379.738180]  [<ffffffffa05ee6f2>] br_handle_frame_finish+0x252/0x330 [bridge]
[ 1379.738194]  [<ffffffffa05ee946>] br_handle_frame+0x176/0x280 [bridge]
[ 1379.738201]  [<ffffffff8154fce2>] __netif_receive_skb_core+0x352/0x7f0
[ 1379.738208]  [<ffffffff8101b913>] ? native_sched_clock+0x13/0x80
[ 1379.738215]  [<ffffffff815501a1>] __netif_receive_skb+0x21/0x70
[ 1379.738221]  [<ffffffff815503a3>] netif_receive_skb+0x33/0xb0
[ 1379.738229]  [<ffffffff81550dc8>] napi_gro_receive+0x98/0xd0
[ 1379.738245]  [<ffffffffa0174deb>] rtl8169_poll+0x17b/0x6c0 [r8169]
[ 1379.738253]  [<ffffffff81550aa9>] net_rx_action+0x149/0x240
[ 1379.738260]  [<ffffffff81067688>] __do_softirq+0xe8/0x230
[ 1379.738267]  [<ffffffff81067955>] irq_exit+0xa5/0xb0
[ 1379.738274]  [<ffffffff81668963>] do_IRQ+0x63/0xe0
[ 1379.738280]  [<ffffffff8165e96d>] common_interrupt+0x6d/0x6d
[ 1379.738283]  <EOI>  [<ffffffff81044136>] ? native_safe_halt+0x6/0x10
[ 1379.738296]  [<ffffffff8101c641>] default_idle+0x41/0x100
[ 1379.738303]  [<ffffffff8101c792>] amd_e400_idle+0x92/0x120
[ 1379.738310]  [<ffffffff8101d16e>] cpu_idle+0xfe/0x120
[ 1379.738317]  [<ffffffff8164db66>] start_secondary+0x24f/0x251
[ 1379.738322] ---[ end trace d41df35e202e8e85 ]---
[ 1382.857847] ------------[ cut here ]------------
[ 1382.857868] WARNING: at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[ 1382.857873] Hardware name: System Product Name
[ 1382.857880] list_add corruption. prev->next should be next (ffff8804098bd538), but was           (null). (prev=ffff880402afa700).
[ 1382.857884] Modules linked in: bnep bluetooth rfkill fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer acpi_cpufreq mperf snd sp5100_tco shpchp i2c_piix4 edac_core soundcore edac_mce_amd microcode serio_raw k10temp asus_atk0110 vhost_net tun macvtap macvlan kvm_amd kvm ecryptfs encrypted_keys trusted tpm tpm_bios nfsd auth_rpcgss nfs_acl lockd uinput binfmt_misc raid1 ata_generic pata_acpi firewire_ohci firewire_core 3c59x crc_itu_t tulip pata_atiixp r8169 mii wmi radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi
[ 1382.858024] Pid: 0, comm: swapper/2 Tainted: G        W    3.9.8-100.fc17.x86_64 #1
[ 1382.858028] Call Trace:
[ 1382.858033]  <IRQ>  [<ffffffff8105efd5>] warn_slowpath_common+0x75/0xa0
[ 1382.858052]  [<ffffffff8105f0b6>] warn_slowpath_fmt+0x46/0x50
[ 1382.858060]  [<ffffffff81141573>] ? put_page+0x53/0x60
[ 1382.858067]  [<ffffffff813152ce>] __list_add+0xbe/0xd0
[ 1382.858074]  [<ffffffff8106e4d3>] __internal_add_timer+0x113/0x130
[ 1382.858081]  [<ffffffff8106eae0>] internal_add_timer+0x20/0x50
[ 1382.858087]  [<ffffffff8106fdf4>] mod_timer+0x124/0x200
[ 1382.858107]  [<ffffffffa05f61ba>] br_multicast_query_received+0x5a/0xe0 [bridge]
[ 1382.858125]  [<ffffffffa05f846e>] br_multicast_rcv+0x77e/0x1330 [bridge]
[ 1382.858133]  [<ffffffff8157d7d6>] ? nf_iterate+0x86/0xb0
[ 1382.858148]  [<ffffffffa05ee4a0>] ? br_handle_local_finish+0x60/0x60 [bridge]
[ 1382.858162]  [<ffffffffa05ee6f2>] br_handle_frame_finish+0x252/0x330 [bridge]
[ 1382.858176]  [<ffffffffa05ee946>] br_handle_frame+0x176/0x280 [bridge]
[ 1382.858184]  [<ffffffff8154fce2>] __netif_receive_skb_core+0x352/0x7f0
[ 1382.858191]  [<ffffffff815501a1>] __netif_receive_skb+0x21/0x70
[ 1382.858198]  [<ffffffff815503a3>] netif_receive_skb+0x33/0xb0
[ 1382.858205]  [<ffffffff81550dc8>] napi_gro_receive+0x98/0xd0
[ 1382.858223]  [<ffffffffa0174deb>] rtl8169_poll+0x17b/0x6c0 [r8169]
[ 1382.858230]  [<ffffffff81550aa9>] net_rx_action+0x149/0x240
[ 1382.858238]  [<ffffffff81067688>] __do_softirq+0xe8/0x230
[ 1382.858245]  [<ffffffff81067955>] irq_exit+0xa5/0xb0
[ 1382.858254]  [<ffffffff81668963>] do_IRQ+0x63/0xe0
[ 1382.858261]  [<ffffffff8165e96d>] common_interrupt+0x6d/0x6d
[ 1382.858264]  <EOI>  [<ffffffff81044136>] ? native_safe_halt+0x6/0x10
[ 1382.858279]  [<ffffffff8101c641>] default_idle+0x41/0x100
[ 1382.858286]  [<ffffffff8101c792>] amd_e400_idle+0x92/0x120
[ 1382.858293]  [<ffffffff8101d16e>] cpu_idle+0xfe/0x120
[ 1382.858301]  [<ffffffff8164db66>] start_secondary+0x24f/0x251
[ 1382.858306] ---[ end trace d41df35e202e8e86 ]---
[ 1604.979306] ------------[ cut here ]------------
[ 1604.979327] WARNING: at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[ 1604.979333] Hardware name: System Product Name
[ 1604.979339] list_add corruption. prev->next should be next (ffff8804098bd538), but was           (null). (prev=ffff880402afa700).
[ 1604.979343] Modules linked in: bnep bluetooth rfkill fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle bridge stp llc ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer acpi_cpufreq mperf snd sp5100_tco shpchp i2c_piix4 edac_core soundcore edac_mce_amd microcode serio_raw k10temp asus_atk0110 vhost_net tun macvtap macvlan kvm_amd kvm ecryptfs encrypted_keys trusted tpm tpm_bios nfsd auth_rpcgss nfs_acl lockd uinput binfmt_misc raid1 ata_generic pata_acpi firewire_ohci firewire_core 3c59x crc_itu_t tulip pata_atiixp r8169 mii wmi radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi
[ 1604.979484] Pid: 0, comm: swapper/2 Tainted: G        W    3.9.8-100.fc17.x86_64 #1
[ 1604.979488] Call Trace:
[ 1604.979493]  <IRQ>  [<ffffffff8105efd5>] warn_slowpath_common+0x75/0xa0
[ 1604.979512]  [<ffffffff8105f0b6>] warn_slowpath_fmt+0x46/0x50
[ 1604.979519]  [<ffffffff813152ce>] __list_add+0xbe/0xd0
[ 1604.979526]  [<ffffffff8106e4d3>] __internal_add_timer+0x113/0x130
[ 1604.979532]  [<ffffffff8106eae0>] internal_add_timer+0x20/0x50
[ 1604.979539]  [<ffffffff8106fc5f>] mod_timer_pending+0xff/0x170
[ 1604.979557]  [<ffffffffa06154e8>] __nf_ct_refresh_acct+0xa8/0xc0 [nf_conntrack]
[ 1604.979575]  [<ffffffffa061db6d>] udp_packet+0x5d/0xa0 [nf_conntrack]
[ 1604.979590]  [<ffffffffa06176fd>] nf_conntrack_in+0x38d/0xa30 [nf_conntrack]
[ 1604.979600]  [<ffffffff81093e10>] ? try_to_wake_up+0x2d0/0x2d0
[ 1604.979610]  [<ffffffffa0631601>] ipv4_conntrack_in+0x21/0x30 [nf_conntrack_ipv4]
[ 1604.979618]  [<ffffffff8157d7d6>] nf_iterate+0x86/0xb0
[ 1604.979626]  [<ffffffff81586400>] ? inet_add_protocol+0x50/0x50
[ 1604.979633]  [<ffffffff8157d874>] nf_hook_slow+0x74/0x130
[ 1604.979638]  [<ffffffff81586400>] ? inet_add_protocol+0x50/0x50
[ 1604.979645]  [<ffffffff81586df8>] ip_rcv+0x298/0x360
[ 1604.979653]  [<ffffffff81550012>] __netif_receive_skb_core+0x682/0x7f0
[ 1604.979660]  [<ffffffff815501a1>] __netif_receive_skb+0x21/0x70
[ 1604.979667]  [<ffffffff815503a3>] netif_receive_skb+0x33/0xb0
[ 1604.979683]  [<ffffffffa05ee6df>] br_handle_frame_finish+0x23f/0x330 [bridge]
[ 1604.979697]  [<ffffffffa05ee946>] br_handle_frame+0x176/0x280 [bridge]
[ 1604.979704]  [<ffffffff8154fce2>] __netif_receive_skb_core+0x352/0x7f0
[ 1604.979713]  [<ffffffff8109bff4>] ? enqueue_entity+0x384/0x970
[ 1604.979719]  [<ffffffff815501a1>] __netif_receive_skb+0x21/0x70
[ 1604.979726]  [<ffffffff815503a3>] netif_receive_skb+0x33/0xb0
[ 1604.979733]  [<ffffffff81550dc8>] napi_gro_receive+0x98/0xd0
[ 1604.979752]  [<ffffffffa0174deb>] rtl8169_poll+0x17b/0x6c0 [r8169]
[ 1604.979759]  [<ffffffff81550aa9>] net_rx_action+0x149/0x240
[ 1604.979767]  [<ffffffff81067688>] __do_softirq+0xe8/0x230
[ 1604.979774]  [<ffffffff81067955>] irq_exit+0xa5/0xb0
[ 1604.979782]  [<ffffffff81668963>] do_IRQ+0x63/0xe0
[ 1604.979789]  [<ffffffff8165e96d>] common_interrupt+0x6d/0x6d
[ 1604.979792]  <EOI>  [<ffffffff81044136>] ? native_safe_halt+0x6/0x10
[ 1604.979807]  [<ffffffff8101c641>] default_idle+0x41/0x100
[ 1604.979814]  [<ffffffff8101c792>] amd_e400_idle+0x92/0x120
[ 1604.979821]  [<ffffffff8101d16e>] cpu_idle+0xfe/0x120
[ 1604.979828]  [<ffffffff8164db66>] start_secondary+0x24f/0x251
[ 1604.979833] ---[ end trace d41df35e202e8e87 ]---

Comment 68 Jérémie Grauer 2013-07-19 17:25:23 UTC
I have had the same problem since I started using kernel 3.9.9-xxx; I didn't have the problem with 3.9.8.

I have a bridge configured and KVM installed, but I haven't used any KVM guest for a while.

It's a very annoying crash that happens at random: sometimes I'm just typing, other times I come back to my computer and find it frozen...

The system freezes at least 4 times a day.

Fedora 19 kernel 3.9.9-302.fc19.x86_64, Dell Laptop Latitude E6530, Optimus disabled in BIOS, trace:

Jul 19 18:59:26 lpt-001 kernel: [ 2474.521784] ------------[ cut here ]------------
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521801] WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0()
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521803] Hardware name: Latitude E6530
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521806] list_add corruption. prev->next should be next (ffff880221d494e8), but was           (null). (prev=ffff880206e0f1c0).
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521808] Modules linked in: fuse ip6table_filter ip6_tables ebtable_nat ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc bnep bluetooth snd_hda_codec_hdmi snd_hda_codec_idt iTCO_wdt iTCO_vendor_support xfs libcrc32c arc4 acpi_cpufreq mperf coretemp dell_wmi ppdev sparse_keymap kvm_intel kvm crc32_pclmul crc32c_intel dell_laptop iwldvm dcdbas ghash_clmulni_intel mac80211 microcode snd_hda_intel snd_hda_codec iwlwifi snd_hwdep uvcvideo videobuf2_vmalloc snd_seq cfg80211 videobuf2_memops i2c_i801 videobuf2_core videodev snd_seq_device media sdhci_pci snd_pcm sdhci lpc_ich mmc_core mfd_core rfkill e1000e snd_page_alloc snd_timer snd mei ptp pps_core soundcore parport_pc parport uinput nouveau mxm_wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core wmi hid_logitech_dj video [last unloaded: ipmi_msghandler]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521892] Pid: 2312, comm: Socket Thread Not tainted 3.9.9-302.fc19.x86_64 #1
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521894] Call Trace:
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521897]  <IRQ>  [<ffffffff81306d00>] ? __list_add+0x30/0xc0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521912]  [<ffffffff8105cc56>] warn_slowpath_common+0x66/0x80
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521917]  [<ffffffff8105ccbc>] warn_slowpath_fmt+0x4c/0x50
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521925]  [<ffffffff810886f8>] ? __wake_up_common+0x58/0x90
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521930]  [<ffffffff81306d7c>] __list_add+0xac/0xc0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521937]  [<ffffffff8106bed3>] __internal_add_timer+0x113/0x130
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521943]  [<ffffffff8106c527>] internal_add_timer+0x17/0x40
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521947]  [<ffffffff8106d69b>] mod_timer_pending+0xfb/0x170
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521958]  [<ffffffffa06c6910>] __nf_ct_refresh_acct+0xb0/0xc0 [nf_conntrack]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521968]  [<ffffffffa06cda93>] tcp_packet+0x6b3/0x1530 [nf_conntrack]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521974]  [<ffffffff81568894>] ? nf_hook_slow+0x74/0x130
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521982]  [<ffffffffa06c6d12>] ? ____nf_conntrack_find+0x122/0x160 [nf_conntrack]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521991]  [<ffffffffa06c8672>] nf_conntrack_in+0x382/0xa00 [nf_conntrack]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.521998]  [<ffffffffa04202e1>] ipv4_conntrack_in+0x21/0x30 [nf_conntrack_ipv4]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522001]  [<ffffffff8156880b>] nf_iterate+0x8b/0xa0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522012]  [<ffffffffa06b51c0>] ? br_nf_pre_routing_finish_ipv6+0x150/0x150 [bridge]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522016]  [<ffffffff81568894>] nf_hook_slow+0x74/0x130
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522024]  [<ffffffffa06b51c0>] ? br_nf_pre_routing_finish_ipv6+0x150/0x150 [bridge]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522032]  [<ffffffffa06b5aa0>] br_nf_pre_routing+0x570/0x640 [bridge]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522036]  [<ffffffff8156880b>] nf_iterate+0x8b/0xa0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522044]  [<ffffffffa06ae370>] ? br_handle_local_finish+0x60/0x60 [bridge]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522048]  [<ffffffff81568894>] nf_hook_slow+0x74/0x130
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522054]  [<ffffffffa06ae370>] ? br_handle_local_finish+0x60/0x60 [bridge]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522062]  [<ffffffffa06ae870>] br_handle_frame+0x1d0/0x270 [bridge]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522067]  [<ffffffff8153b232>] __netif_receive_skb_core+0x242/0x7f0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522072]  [<ffffffff8101a300>] ? native_read_tsc+0x20/0x20
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522076]  [<ffffffff8153b7f8>] __netif_receive_skb+0x18/0x60
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522081]  [<ffffffff8153b873>] netif_receive_skb+0x33/0xb0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522084]  [<ffffffff8153c250>] napi_gro_receive+0x80/0xb0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522102]  [<ffffffffa0233f93>] e1000_receive_skb+0x73/0xd0 [e1000e]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522115]  [<ffffffffa023540a>] e1000_clean_rx_irq+0x24a/0x400 [e1000e]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522128]  [<ffffffffa023cf2d>] e1000e_poll+0x6d/0x310 [e1000e]
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522132]  [<ffffffff8153bbe9>] net_rx_action+0x149/0x240
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522137]  [<ffffffff81065657>] __do_softirq+0xf7/0x240
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522142]  [<ffffffff81065925>] irq_exit+0xa5/0xb0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522148]  [<ffffffff81650f56>] do_IRQ+0x56/0xc0
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522154]  [<ffffffff816471ad>] common_interrupt+0x6d/0x6d
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522155]  <EOI>  [<ffffffff8164f440>] ? sysret_audit+0x17/0x21
Jul 19 18:59:26 lpt-001 kernel: [ 2474.522162] ---[ end trace 2eb432fe7e1f2323 ]---

Comment 69 Michael Simoni 2013-07-19 20:40:15 UTC
Update: the system froze. No vmcore, just frozen. And it froze again as I was writing this.
Linux 3.9.10-100.fc17.x86_64 is still unstable.
I have another server that has 3.9.10-100 loaded, but it seems to be stable.
One difference is that the stable one has KVM installed but no guest running. It has no bridge interfaces configured, while the unstable one has two bridge interfaces configured with three standard Ethernet interfaces (actually four, but one is inactive). One Ethernet interface is not bridged.
The information from /var/log/messages looks like the same stack trace as before.

Comment 70 Paul P Komkoff Jr 2013-07-20 14:34:16 UTC
I'm just going to reopen it, since I'm (on 3.9.9-302.fc19) affected as well.
I have tried to upload my oopses using abrt, which pointed me to this bug.

Comment 71 Robert Sigler 2013-07-20 14:56:41 UTC
Description of problem:
System completely hangs. It seems to be random; I've yet to determine a series of events that leads to the issue. Sometimes I'm web browsing, sometimes cp'ing a large number of files; a few times I've rebooted my system, left it at the login prompt, and gone to bed... next morning, frozen system.

Version-Release number of selected component:
kernel

Additional info:
reporter:       libreport-2.1.5
cmdline:        BOOT_IMAGE=/boot/vmlinuz-3.9.9-301.fc19.x86_64 root=/dev/mapper/vg_pyro-lv_fedora_system ro rd.md=0 rd.dm=0 vconsole.keymap=us rd.lvm.lv=vg_pyro/lv_fedora_system rd.luks=0 vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_pyro/lv_swap rhgb quiet LANG=en_US.UTF-8
kernel:         3.9.9-301.fc19.x86_64
runlevel:       N 5
type:           Kerneloops

Truncated backtrace:
WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0()
Hardware name: System Product Name
list_add corruption. prev->next should be next (ffff88032e7e9578), but was           (null). (prev=ffff88030656dc00).
Modules linked in: vhost_net macvtap macvlan tun bnep bluetooth rfkill fuse ebtable_nat xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat bridge stp llc iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_hdmi raid1 iTCO_wdt iTCO_vendor_support acpi_cpufreq mperf coretemp kvm_intel kvm crc32c_intel uvcvideo microcode videobuf2_vmalloc videobuf2_memops videobuf2_core videodev snd_hda_codec_realtek media snd_usb_audio snd_usbmidi_lib serio_raw snd_rawmidi snd_hda_intel joydev snd_hda_codec i2c_i801 snd_hwdep snd_seq lpc_ich snd_seq_device mfd_core snd_pcm sky2 snd_page_alloc snd_timer i7core_edac snd soundcore edac_core asus_atk0110 uinput xfs libcrc32c nouveau video mxm_wmi i2c_algo_bit drm_kms_helper ttm drm i2c_core wmi
Pid: 0, comm: swapper/7 Not tainted 3.9.9-301.fc19.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81306d00>] ? __list_add+0x30/0xc0
 [<ffffffff8105cc56>] warn_slowpath_common+0x66/0x80
 [<ffffffff8105ccbc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff8152d4ac>] ? consume_skb+0x2c/0x80
 [<ffffffff81306d7c>] __list_add+0xac/0xc0
 [<ffffffff8106bed3>] __internal_add_timer+0x113/0x130
 [<ffffffff8106c527>] internal_add_timer+0x17/0x40
 [<ffffffff8106d812>] mod_timer+0x102/0x210
 [<ffffffffa050821b>] br_multicast_rcv+0x86b/0x1220 [bridge]
 [<ffffffffa04fe5ba>] br_handle_frame_finish+0x24a/0x330 [bridge]
 [<ffffffffa04fe825>] br_handle_frame+0x185/0x270 [bridge]
 [<ffffffff8153b232>] __netif_receive_skb_core+0x242/0x7f0
 [<ffffffff8101a300>] ? native_read_tsc+0x20/0x20
 [<ffffffff8153b7f8>] __netif_receive_skb+0x18/0x60
 [<ffffffff8153b873>] netif_receive_skb+0x33/0xb0
 [<ffffffff8153c250>] napi_gro_receive+0x80/0xb0
 [<ffffffffa02b8a4f>] sky2_poll+0x71f/0xcb0 [sky2]
 [<ffffffff813cecac>] ? add_interrupt_randomness+0x15c/0x190
 [<ffffffff8153bbe9>] net_rx_action+0x149/0x240
 [<ffffffff81065657>] __do_softirq+0xf7/0x240
 [<ffffffff814f0c80>] ? intel_pstate_timer_func+0x2c0/0x2c0
 [<ffffffff81065925>] irq_exit+0xa5/0xb0
 [<ffffffff81650f56>] do_IRQ+0x56/0xc0
 [<ffffffff816471ad>] common_interrupt+0x6d/0x6d
 <EOI>  [<ffffffff814f1601>] ? cpuidle_wrap_enter+0x41/0x80
 [<ffffffff814f1650>] cpuidle_enter_tk+0x10/0x20
 [<ffffffff814f13b2>] cpuidle_idle_call+0xb2/0x1e0
 [<ffffffff8101c035>] cpu_idle+0xe5/0x140
 [<ffffffff81635f49>] start_secondary+0x249/0x24b

Comment 72 Paul P Komkoff Jr 2013-07-20 16:08:53 UTC
One more thing. My test machine used to hang as well, but I used the shiny, wonderful systemd watchdog feature and I no longer need to send a runner to reboot it! And I get a nicely formatted oops to use with abrt.

Comment 73 Cong Wang 2013-07-21 02:52:56 UTC
OK, the above crash has just been fixed upstream:

commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1
Author: Eric Dumazet <edumazet>
Date:   Fri Jul 19 20:07:16 2013 -0700

    bridge: do not call setup_timer() multiple times
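
For readers wondering why a repeated setup_timer() would produce the list_add warnings quoted earlier in this bug, here is a minimal userspace sketch. It is not kernel code: the toy_* functions and struct names are hypothetical stand-ins, and it assumes (as I understand the 3.9 timer code) that initialising a timer clears its list linkage and that a cleared linkage is what marks a timer as "not pending".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Doubly linked list node, loosely modelled on the kernel's struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Toy timer: only the list linkage matters for this illustration. */
struct toy_timer {
    struct list_node entry;
};

/*
 * Insert `node` between `prev` and `next`, running two sanity checks that
 * mirror the messages in the logs. Returns the number of problems found;
 * the node is linked only when there are none.
 */
static int checked_list_add(struct list_node *node, struct list_node *prev,
                            struct list_node *next)
{
    int problems = 0;

    if (prev->next != next) {
        printf("list_add corruption. prev->next should be next, but was %s.\n",
               prev->next ? "another node" : "(null)");
        problems++;
    }
    if (node == prev || node == next) {
        printf("list_add double add.\n");
        problems++;
    }
    if (problems == 0) {
        next->prev = node;
        node->next = next;
        node->prev = prev;
        prev->next = node;
    }
    return problems;
}

/* Hypothetical stand-in for setup_timer(): resets the linkage without unlinking. */
static void toy_setup_timer(struct toy_timer *t)
{
    t->entry.next = NULL;
    t->entry.prev = NULL;
}

/* Hypothetical stand-in for mod_timer(): detach if pending, then queue at the tail. */
static int toy_mod_timer(struct list_node *wheel, struct toy_timer *t)
{
    if (t->entry.next != NULL) {
        t->entry.prev->next = t->entry.next;
        t->entry.next->prev = t->entry.prev;
        t->entry.next = NULL;
        t->entry.prev = NULL;
    }
    return checked_list_add(&t->entry, wheel->prev, wheel);
}

/* Arms a timer, re-initialises it while it is still queued, then arms it again. */
static int demo_double_setup(void)
{
    struct list_node wheel = { &wheel, &wheel };  /* empty circular list */
    struct toy_timer t;
    int first;

    toy_setup_timer(&t);
    first = toy_mod_timer(&wheel, &t);  /* first arm: list is clean */
    assert(first == 0);
    (void)first;

    toy_setup_timer(&t);                /* the bug: t is still on the list */
    return toy_mod_timer(&wheel, &t);   /* both list checks fire here */
}

#ifdef DEMO_MAIN
int main(void)
{
    printf("problems detected: %d\n", demo_double_setup());
    return 0;
}
#endif
```

Built with -DDEMO_MAIN, the second arm reports both a corruption (prev->next is null) and a double add, because the stale list tail is the timer itself. That is the same pair of warnings seen in comment 67, where prev and new are the same address.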

Comment 74 Carlos Vidal 2013-07-22 13:51:37 UTC
The same here with 3.9.9-302.fc19.x86_64 on two Dell R510 servers. I just installed a custom-built kernel with the "setup_timer" patch mentioned above by Cong Wang. It used to crash in less than 3 days; I will report by the end of the week whether they are still up and running.

Comment 75 Josh Boyer 2013-07-22 18:59:24 UTC
I've applied the commit in comment #73 to the F18-rawhide kernels.

Comment 76 Josh Boyer 2013-07-22 19:04:51 UTC
*** Bug 985626 has been marked as a duplicate of this bug. ***

Comment 77 Fedora Update System 2013-07-23 00:10:58 UTC
kernel-3.9.11-200.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.9.11-200.fc18

Comment 78 Luc de Louw 2013-07-23 07:56:59 UTC
Both builds, kernel-3.9.11-200.fc18 and kernel-3.10.2-301.fc19, seem to fix the problem. Thanks :-)

Comment 79 Fedora Update System 2013-07-24 03:45:41 UTC
kernel-3.9.11-200.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 80 Josh Boyer 2013-07-24 11:37:17 UTC
*** Bug 987849 has been marked as a duplicate of this bug. ***

Comment 81 Fedora Update System 2013-07-26 03:24:13 UTC
kernel-3.10.3-300.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.10.3-300.fc19

Comment 82 Fedora Update System 2013-07-26 22:59:10 UTC
kernel-3.10.3-300.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 83 Carlos Vidal 2013-07-30 12:37:44 UTC
Unfortunately the bug persists after compiling 3.9.9-302.fc19.x86_64 with Cong Wang's patch (comment #73, see patch below). It ran for 5 days on 2 servers, but eventually crashed on one of them with:

[37374.697760] WARNING: at include/linux/mm.h:280 put_compound_page+0x68/0x270()
[37374.697762] Hardware name: PowerEdge R510
[37374.697764] Modules linked in: nfsd auth_rpcgss nfs_acl lockd sunrpc vhost_net macvtap macvlan tun drbd lru_cache ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle ipmi_si ipmi_devintf ipmi_msghandler bridge stp llc dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison libcrc32c acpi_cpufreq mperf coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support crc32_pclmul crc32c_intel ghash_clmulni_intel dcdbas microcode serio_raw lpc_ich mfd_core i7core_edac edac_core bnx2 mgag200 i2c_algo_bit drm_kms_helper ttm drm mpt2sas i2c_core raid_class scsi_transport_sas
[37374.697834] Pid: 2967, comm: vhost-2966 Not tainted 3.9.9-303.fc19.x86_64
[37374.697836] Call Trace:
[37374.697846]  [<ffffffff8105cc56>] warn_slowpath_common+0x66/0x80
[37374.697852]  [<ffffffffa02effd4>] ? tun_get_user+0x724/0x810 [tun]
[37374.697856]  [<ffffffff8105cd2a>] warn_slowpath_null+0x1a/0x20
[37374.697859]  [<ffffffff8113ca98>] put_compound_page+0x68/0x270
[37374.697863]  [<ffffffffa02effd4>] ? tun_get_user+0x724/0x810 [tun]
[37374.697867]  [<ffffffff8113cceb>] put_page+0x4b/0x60
[37374.697884]  [<ffffffff8152d13a>] __kfree_skb+0x1a/0xb0
[37374.697897]  [<ffffffff8152d202>] kfree_skb+0x
[37374.697907]  [<ffffffffa02effd4>] tun_get_user+0x724/0x810 [tun]
[37374.697912]  [<ffffffffa02f0117>] tun_sendmsg+0x57/0x80 [tun]
[37374.697917]  [<ffffffffa02ffa78>] handle_tx+0x1c8/0x640 [vhost_net]
[37374.697922]  [<ffffffffa02fff25>] handle_tx_kick+0x15/0x20 [vhost_net]
[37374.697926]  [<ffffffffa02fc81d>] vhost_worker+0xed/0x190 [vhost_net]
[37374.697930]  [<ffffffffa02fc730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net]


The network card was blocked by one of the VMs running W7 with virtio network drivers. The crash happened on Sunday night, at 20:00, when the guest was idle. The network card got so screwed up that IPMI stopped working on it until the server was unplugged!

I'm trying now with e1000 network cards. All the VMs in the server that did not crash use this card, while in the server that crashes all VMs use virtio.

Here is the patch I applied:

--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -619,6 +619,9 @@ rehash:
    mp->br = br;
    mp->addr = *group;

+   setup_timer(&mp->timer, br_multicast_group_expired,
+           (unsigned long)mp);
+
    hlist_add_head_rcu(&mp->hlist[mdb->ver], &mdb->mhash[hash]);
    mdb->size++;

@@ -1126,7 +1129,6 @@ static int br_ip4_multicast_query(struct net_bridge *br,
    if (!mp)
        goto out;

-   setup_timer(&mp->timer, br_multicast_group_expired, (unsigned long)mp);
    mod_timer(&mp->timer, now + br->multicast_membership_interval);
    mp->timer_armed = true;

@@ -1204,7 +1206,6 @@ static int br_ip6_multicast_query(struct net_bridge *br,
    if (!mp)
        goto out;

-   setup_timer(&mp->timer, br_multicast_group_expired, (unsigned long)mp);
    mod_timer(&mp->timer, now + br->multicast_membership_interval);
    mp->timer_armed = true;
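
The shape of the patch (initialise the timer once where the group entry is created, and only re-arm it in the query handlers) can be sketched in userspace as follows. This is an illustration under assumptions, not the bridge code: struct toy_group, the toy_* helpers, and the interval of 100 ticks are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Doubly linked list node, loosely modelled on the kernel's struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Hypothetical stand-in for a multicast group entry (`mp` in the patch). */
struct toy_group {
    struct list_node timer_entry;  /* linkage of the group's timer */
    unsigned long expires;         /* tick at which the timer would fire */
};

/* Stand-in for setup_timer(): marks the timer as not queued. */
static void toy_setup_timer(struct toy_group *mp)
{
    mp->timer_entry.next = NULL;
    mp->timer_entry.prev = NULL;
    mp->expires = 0;
}

/* Stand-in for mod_timer(): detach if queued, then queue at the tail. */
static void toy_mod_timer(struct list_node *wheel, struct toy_group *mp,
                          unsigned long expires)
{
    struct list_node *e = &mp->timer_entry;

    if (e->next != NULL) {
        e->prev->next = e->next;
        e->next->prev = e->prev;
    }
    e->prev = wheel->prev;
    e->next = wheel;
    wheel->prev->next = e;
    wheel->prev = e;
    mp->expires = expires;
}

/* Creation path: the one place the timer is initialised (the added hunk). */
static void toy_group_create(struct toy_group *mp)
{
    toy_setup_timer(mp);
}

/* Query path after the patch: re-arm only, no re-initialisation. */
static void toy_group_query(struct list_node *wheel, struct toy_group *mp,
                            unsigned long now, unsigned long interval)
{
    toy_mod_timer(wheel, mp, now + interval);
}

/* Walk the list and count entries; returns -1 if any link is inconsistent. */
static int wheel_count(const struct list_node *wheel)
{
    const struct list_node *p = wheel;
    int n = 0;

    do {
        if (p->next == NULL || p->next->prev != p)
            return -1;
        p = p->next;
        if (p != wheel)
            n++;
    } while (p != wheel);
    return n;
}

/* Two groups receive five queries each; the list should stay consistent. */
static int demo_repeated_queries(void)
{
    struct list_node wheel = { &wheel, &wheel };  /* empty circular list */
    struct toy_group a, b;

    toy_group_create(&a);
    toy_group_create(&b);
    assert(wheel_count(&wheel) == 0);

    for (unsigned long now = 0; now < 5; now++) {
        toy_group_query(&wheel, &a, now, 100);  /* 100 is an arbitrary interval */
        toy_group_query(&wheel, &b, now, 100);
    }
    return wheel_count(&wheel);
}

#ifdef DEMO_MAIN
int main(void)
{
    printf("timers queued: %d\n", demo_repeated_queries());
    return 0;
}
#endif
```

With initialisation confined to the creation path, each query just moves the existing entry to the tail, so the list keeps exactly one entry per group however many queries arrive.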

