Bug 954181

Summary: macvlan/vhost_new: BUG: unable to handle kernel paging request macvlan_start_xmit
Product: [Fedora] Fedora
Reporter: Reartes Guillermo <rtguille>
Component: kernel
Assignee: jason wang <jasowang>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 19
CC: bugzilla, c.schmitt, dzrudy, gansalmon, gleb, itamar, ivo.sir, jk, jonathan, kernel-maint, kevin, madhu.chinakonda, mrunge, mst, nhorman, paul.f.fee, pp, seanlkml, sgehwolf, smooge, suren, tkopecek, unforgiving2
Keywords: Reopened
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-3.9.5-301.fc19
Doc Type: Bug Fix
Last Closed: 2013-10-08 17:22:30 UTC
Type: Bug
Attachments (description, flags):
* freeze & call traces captured via serial port from another system (flags: none)
* new kernel pannic with net kernel and requested vhost_net module parameter (flags: none)
* dmesg + 3.10rc1 + sysrq+w (flags: none)
* ps aux | grep qemu (flags: none)
* virsh dumpxml file (flags: none)
* Photo of screen after panic - Kernel 3.9.5-201.fc18 (flags: none)

Description Reartes Guillermo 2013-04-21 16:18:29 UTC
Description of problem:

I am crashing the host from inside a guest (both F19a) when executing
a 'yum update' remotely via ssh.

My Systems:

* SYSTEM #1: HOST F17
* SYSTEM #2: HOST F19 (KVM Server) with a F19 Guest (System #3) [KDE + Virtualization + Other]
* SYSTEM #3: GUEST F19 (running on System #2) [Minimal Install + Standard]

Connecting via ssh from System #1 (F17) to System #3 (F19 Guest) and executing
a 'yum update' causes System #2 (F19 Host) to freeze, crash, or panic.

System #3 (F19 Guest) goes silent: it gets past 'Dependencies Resolved' but
never reaches the end of the list.

Examining System #2 (F19 Host) shows it is also silent. Further examination of
System #2 (F19 Host) shows that the screen is black and the keyboard caps-lock key does not work, so it looks like a freeze/crash. A reset or power-cycle is needed.

I reproduced the issue, but before executing the offending 'yum update' I pressed
a key on System #2 to be able to see what happens. I got some pretty bad pictures: there are several traces, but only the last one is visible. It is from drivers/gpu/drm/drm_crtc.c. I could not obtain anything from the logs.

Version-Release number of selected component (if applicable):
F19a RC Guest running on an updated F19a HOST

How reproducible:
always

Steps to Reproduce:

0. Start System #3 (F19 Guest) on System #2 (F19 Host)
1. Connect from System #1 (F17 Host) to System #3 (F19 Guest)
2. Execute 'yum update'
3. System #2 (F19 Host) gets frozen after some call traces.

Actual results:
System #2 is frozen and must be reset or power-cycled.

Expected results:
do not freeze nor crash.

Comment 1 Reartes Guillermo 2013-04-21 18:13:44 UTC
Update: 

It seems that just scrolling text (which happens on yum) triggers it.
I just executed a 'find /' on system #3 (f19a guest) and it froze 
the system #2 (f19a host).

* SYSTEM #2: HOST F19 (KVM Server) 

Phenom II + M4N72-E (nvidia) with 8 GB RAM
Radeon HD5670 (Redwood)

Comment 2 Reartes Guillermo 2013-04-21 20:04:11 UTC
Update:

System #3 (F19 KVM Guest) has two ethernet devices:

From Virt-Manager the F19a Guest has:

eth0 -> virtio -> Source Device: Host Device enp1s9: macvtap -> Source Mode: Bridge 

eth1 -> virtio -> Source Device: Virtual Network 'xxxx': isolated network

Normally i do connect from system #1 (F17 Host) to system #3 (F19a Guest) via eth0.

I logged in to system #2 using KDE (locally) and connected to system #3 via eth1 using ssh, i was able to scroll and even perform the 'yum update'.

So, it looks like there is some issue with the macvtap (bridge) that brings down the host.
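
For anyone comparing setups, here is a minimal sketch for spotting which guest NICs are macvtap-backed. In libvirt domain XML these appear as <interface type='direct'> elements. The function name and the guest name in the usage note are illustrative and not taken from this report.

```shell
# Print the <source .../> line of every <interface type='direct'> (macvtap)
# element found in libvirt domain XML read from stdin.
list_macvtap_sources() {
    awk '
        /<interface type=.direct.>/ { in_direct = 1 }
        in_direct && /<source /     { sub(/^[ \t]+/, ""); print }
        /<\/interface>/             { in_direct = 0 }
    '
}
```

Typical use would be something like: virsh dumpxml guest-name | list_macvtap_sources. An interface on an isolated virtual network has type='network' and is not printed.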

Comment 3 Reartes Guillermo 2013-04-26 22:37:24 UTC
I configured system #2 to use ttyS0 as the console. I can capture the boot messages and reach the login prompt. So I tried to get the complete call traces via a USB serial port.
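
A minimal sketch of how one might confirm that a serial console is configured before trying to reproduce. It assumes the console was enabled with a kernel parameter of the form console=ttyS0,<baud>. The baud rate in the usage note and the function name are assumptions, not details from this report.

```shell
# Succeeds if the given kernel command line names a serial console
# (a console=ttyS<N> argument).
has_serial_console() {
    case " $1 " in
        *" console=ttyS"[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel's command line when /proc/cmdline is available.
if [ -r /proc/cmdline ]; then
    if has_serial_console "$(cat /proc/cmdline)"; then
        echo "serial console configured"
    else
        echo "no serial console on the kernel command line"
    fi
fi
```

A command line such as "console=tty0 console=ttyS0,115200n8" keeps the local screen and also sends kernel messages to the serial port, where another machine can record them.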

Host:  3.9.0-0.rc8.git0.2.fc19.x86_64
Guest: 3.9.0-0.rc8.git0.2.fc19.x86_64

I got the call traces, and the freeze, this is it:
(will put the complete one in the next attachment)

[ 2891.918168] BUG: unable to handle kernel paging request at ffff880310d8500b
[ 2891.939118] IP: [<ffffffff8153b96c>] dev_queue_xmit+0xec/0x470
[ 2891.956648] PGD 1fc4067 PUD 0 
[ 2891.965908] Oops: 0000 [#1] SMP 
[ 2891.975662] Modules linked in: ebtable_nat xt_CHECKSUM bridge stp llc bnep bluetooth rfkill t
[ 2892.201923] CPU 0 
[ 2892.207444] Pid: 1441, comm: vhost-1440 Not tainted 3.9.0-0.rc8.git0.2.fc19.x86_64 #1 SystemE
[ 2892.243957] RIP: 0010:[<ffffffff8153b96c>]  [<ffffffff8153b96c>] dev_queue_xmit+0xec/0x470
[ 2892.268772] RSP: 0018:ffff880223839ca0  EFLAGS: 00010202
[ 2892.284690] RAX: ffff880210d85ec0 RBX: ffff880212bd6a00 RCX: ffff880310d84fff
[ 2892.306068] RDX: 0000000000000b92 RSI: 00000000ffffffff RDI: 0000000000000000
[ 2892.327448] RBP: ffff880223839cd8 R08: 0000000000000000 R09: 0000000000000000                
[ 2892.348831] R10: ffff880227001d00 R11: ffff880212509600 R12: ffff880221ba8a00                
[ 2892.370210] R13: ffff880221c4b000 R14: 000000000000000c R15: ffff880212bd6a9c                
[ 2892.391593] FS:  00007f29ce69a700(0000) GS:ffff88022fc00000(0000) knlGS:0000000000000000     
[ 2892.415832] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b                                
[ 2892.433050] CR2: ffff880310d8500b CR3: 000000020ba4b000 CR4: 00000000000007f0                
[ 2892.454430] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                
[ 2892.475810] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400                
[ 2892.497194] Process vhost-1440 (pid: 1441, threadinfo ffff880223838000, task ffff880221c7000)
[ 2892.522992] Stack:                                                                           
[ 2892.529027]  ffff880223839d08 ffff880223a05000 ffff880221c4b000 ffff88022293d000             
[ 2892.551395]  0000000000000b92 000000000000000c ffff880221c4b000 ffff880223839d08             
[ 2892.573713]  ffffffffa00ba2d2 ffff88020bbda000 0000000000000001 0000000000000b92             
[ 2892.596056] Call Trace:                                                                      
[ 2892.603405]  [<ffffffffa00ba2d2>] macvlan_start_xmit+0x62/0x100 [macvlan]                    
[ 2892.623740]  [<ffffffffa02351f5>] macvtap_get_user+0x305/0x4a0 [macvtap]                     
[ 2892.643819]  [<ffffffffa02353bb>] macvtap_sendmsg+0x2b/0x30 [macvtap]                        
[ 2892.663119]  [<ffffffffa0242bae>] handle_tx+0x31e/0x630 [vhost_net]                          
[ 2892.681898]  [<ffffffffa0242ef5>] handle_tx_kick+0x15/0x20 [vhost_net]                       
[ 2892.701457]  [<ffffffffa023f81d>] vhost_worker+0xed/0x190 [vhost_net]                        
[ 2892.720758]  [<ffffffffa023f730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net]               
[ 2892.742394]  [<ffffffff81080380>] kthread+0xc0/0xd0                                          
[ 2892.757013]  [<ffffffff810802c0>] ? insert_kthread_work+0x40/0x40                            
[ 2892.775268]  [<ffffffff8164d6ac>] ret_from_fork+0x7c/0xb0                                    
[ 2892.791448]  [<ffffffff810802c0>] ? insert_kthread_work+0x40/0x40                            
[ 2892.809703] Code: 01 c8 41 89 55 28 66 83 78 02 00 74 3b 41 8b b5 c8 00 00 00 41 8b bd d0 00 
[ 2892.868771] RIP  [<ffffffff8153b96c>] dev_queue_xmit+0xec/0x470                              
[ 2892.886562]  RSP <ffff880223839ca0>                                                          
[ 2892.897018] CR2: ffff880310d8500b

Comment 4 Reartes Guillermo 2013-04-26 22:42:04 UTC
Created attachment 740647 [details]
freeze & call traces captured via serial port from another system

Note: 

I transferred a 17mb file via scp and it worked ok.
But just doing a '# find /' on system #3 causes it.

Doing the very same find command on system #3 connected from #2 KDE konsole via a virtual private network works ok.

Comment 5 Josh Boyer 2013-04-29 13:11:46 UTC
Neil, want to peek at this one?  I know we have another vhost_net issue reported somewhere, but I can't tell if it's the same and just causing macv{tap,lan} trouble or if this is something different.

Comment 6 Reartes Guillermo 2013-05-05 21:29:53 UTC
I tried with a new kernel version:

Host:  3.9.0-301.fc19.x86_64
Guest: 3.9.0-301.fc19.x86_64

It is still happening.

[   99.586170] BUG: unable to handle kernel paging request at ffff88031a03600b
[   99.612013] IP: [<ffffffff8153ba3c>] dev_queue_xmit+0xec/0x470
[   99.634404] PGD 1fc4067 PUD 0 
[   99.648485] Oops: 0000 [#1] SMP 
[   99.663054] Modules linked in: ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat bnep iptable_mangle nf_conntrack_ipv4 bluetooth nf_defrag_ipv4 xt_conntrack rfkill nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device acpi_cpufreq mperf snd_pcm microcode serio_raw k10temp edac_core edac_mce_amd snd_page_alloc sundance snd_timer mii snd asus_atk0110 soundcore video i2c_nforce2 wmi vhost_net tun macvtap macvlan kvm_amd kvm radeon i2c_algo_bit drm_kms_helper ttm drm firewire_ohci ata_generic i2c_core firewire_core pata_acpi sata_sil24 crc_itu_t pata_amd usb_storage uinput
[   99.911420] CPU 2 
[   99.916957] Pid: 1179, comm: vhost-1178 Not tainted 3.9.0-301.fc19.x86_64 #1 System manufacturer System Product Name/M4N72-E
[   99.963155] RIP: 0010:[<ffffffff8153ba3c>]  [<ffffffff8153ba3c>] dev_queue_xmit+0xec/0x470
[   99.994175] RSP: 0018:ffff8802226d1ca0  EFLAGS: 00010202
[  100.016286] RAX: ffff88021a036ec0 RBX: ffff880222350c00 RCX: ffff88031a035fff
[  100.043903] RDX: 0000000000000b92 RSI: 00000000ffffffff RDI: 0000000000000000
[  100.071542] RBP: ffff8802226d1cd8 R08: 0000000000000000 R09: 0000000000000000
[  100.099173] R10: ffff880227001d00 R11: ffff88020fc223c0 R12: ffff8802238f3800
[  100.126833] R13: ffff880222562a00 R14: 000000000000000c R15: ffff880222350c9c
[  100.154480] FS:  00007ff1fd406700(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
[  100.185067] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  100.208627] CR2: ffff88031a03600b CR3: 000000020f943000 CR4: 00000000000007e0
[  100.236395] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  100.264144] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  100.291865] Process vhost-1178 (pid: 1179, threadinfo ffff8802226d0000, task ffff88021be54650)
[  100.324126] Stack:
[  100.336559]  ffff8802226d1d08 ffff880221749000 ffff880222562a00 ffff88021be49000
[  100.365447]  0000000000000b92 000000000000000c ffff880222562a00 ffff8802226d1d08
[  100.394332]  ffffffffa01ab2d2 ffff88021a3bf400 0000000000000001 0000000000000b92
[  100.423236] Call Trace:
[  100.437180]  [<ffffffffa01ab2d2>] macvlan_start_xmit+0x62/0x100 [macvlan]
[  100.464229]  [<ffffffffa00b91f5>] macvtap_get_user+0x305/0x4a0 [macvtap]
[  100.491028]  [<ffffffffa00b93bb>] macvtap_sendmsg+0x2b/0x30 [macvtap]
[  100.516990]  [<ffffffffa022cbae>] handle_tx+0x31e/0x630 [vhost_net]
[  100.542394]  [<ffffffffa022cef5>] handle_tx_kick+0x15/0x20 [vhost_net]
[  100.568582]  [<ffffffffa022981d>] vhost_worker+0xed/0x190 [vhost_net]
[  100.594523]  [<ffffffffa0229730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net]
[  100.622801]  [<ffffffff81080380>] kthread+0xc0/0xd0
[  100.644026]  [<ffffffff810802c0>] ? insert_kthread_work+0x40/0x40
[  100.668934]  [<ffffffff8164d76c>] ret_from_fork+0x7c/0xb0
[  100.691781]  [<ffffffff810802c0>] ? insert_kthread_work+0x40/0x40
[  100.716688] Code: 01 c8 41 89 55 28 66 83 78 02 00 74 3b 41 8b b5 c8 00 00 00 41 8b bd d0 00 00 00 48 01 f1 48 29 fe f6 40 06 11 0f 84 6a 03 00 00 <0f> b6 49 0c c0 e9 04 0f b6 c9 8d 0c 8e 0f b7 40 04 83 e8 01 0f 
[  100.783311] RIP  [<ffffffff8153ba3c>] dev_queue_xmit+0xec/0x470
[  100.808087]  RSP <ffff8802226d1ca0>
[  100.825506] CR2: ffff88031a03600b
[  100.892828] ---[ end trace ac2f080b1d8ffef0 ]---
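
A small arithmetic check on the two oopses above, offered as a reading of the register dump rather than anything stated in this thread. The marked bytes in the Code line, <0f> b6 49 0c, decode as a one-byte load from 0xc(%rcx), so the faulting address should be RCX + 0xc. The sketch drops the leading "ffff" of each address so the sums stay inside signed 64-bit shell arithmetic. The function name is illustrative.

```shell
# Add a hex displacement to the low 48 bits of a kernel address.
# Arguments are hex digits without the 0x prefix.
fault_addr() {
    printf '%x\n' $(( 0x$1 + 0x$2 ))
}

fault_addr 880310d84fff c   # Comment 3: RCX + 0xc, prints 880310d8500b
fault_addr 88031a035fff c   # Comment 6: RCX + 0xc, prints 88031a03600b

# Comment 6: RCX minus RSI (0xffffffff), prints 88021a036000
printf '%x\n' $(( 0x88031a035fff - 0xffffffff ))
```

Both sums match the low 48 bits of CR2 in the respective traces. The last line suggests, as an interpretation only, that a 32-bit offset of 0xffffffff was added to a page-aligned buffer pointer just before the load, which would explain why the address lands roughly 4 GB past valid memory.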

Comment 7 Reartes Guillermo 2013-05-05 22:27:13 UTC
I tried changing the emulated nic from virtio to these types:

* rtl8139
  Unaffected. I ran the command several times and dropped caches, but got no crash.

* pcnet
  Affected, but differently (no crash / freeze / panic)

Host: responsive via serial and ssh, but the ssh session ended when the crash was supposed to happen. It was possible to connect again via ssh to the host.
        
Guest: Looks like it was going to crash, output from 'find /' stopped. It is not possible to connect to the guest via ssh anymore. Using virt-manager shows that the guest is alive, one can login via the virt-manager console. Restarting sshd.service on the guest does not improve the situation. In fact, the guest cannot be reached by ping anymore from my F17 host. Restarting NetworkManager.service does not restore the network connectivity (ens3 does not have an ip). Stopping and Restarting NetworkManager.service does restore the network. After this i was able to perform several 'find /' commands without issue. I later rebooted the host and re-tested but i could not reproduce this.
  
* ne2k_pci
  Unaffected. I ran the command several times and dropped caches, but got no crash.
  
* e1000
  Unaffected. I ran the command several times and dropped caches, but got no crash.

Comment 8 Josh Boyer 2013-05-08 17:54:12 UTC
*** Bug 958936 has been marked as a duplicate of this bug. ***

Comment 9 Josh Boyer 2013-05-15 13:40:15 UTC
Can you try adding a file called /etc/modprobe.d/vhost_net.conf that contains:

options vhost_net experimental_zcopytx=0

and see if you can reproduce the issue?
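
A minimal sketch of creating that file from a shell. The directory argument is an illustrative convenience for trying the function outside /etc and is not part of the request above.

```shell
# Write the requested modprobe option into <dir>/vhost_net.conf.
# <dir> defaults to /etc/modprobe.d, which needs root to write.
write_vhost_net_conf() {
    dir=${1:-/etc/modprobe.d}
    printf '%s\n' 'options vhost_net experimental_zcopytx=0' > "$dir/vhost_net.conf"
}
```

The option only takes effect when the module is next loaded, so after running write_vhost_net_conf as root one would either reboot or, with all guests shut down, run "modprobe -r vhost_net" followed by "modprobe vhost_net".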

Comment 10 Reartes Guillermo 2013-05-16 01:37:13 UTC
Created attachment 748544 [details]
new kernel pannic with net kernel and requested vhost_net module parameter

I tried again:

* Host: 3.9.2-301.fc19.x86_64
  /etc/modprobe.d/vhost_net.conf with options vhost_net experimental_zcopytx=0

# lsmod | grep vhost
vhost_net              33940  0 
tun                    27056  2 vhost_net
macvtap                18240  1 vhost_net

# modinfo vhost_net
filename:       /lib/modules/3.9.2-301.fc19.x86_64/kernel/drivers/vhost/vhost_net.ko
alias:          devname:vhost-net
alias:          char-major-10-238
description:    Host kernel accelerator for virtio net
author:         Michael S. Tsirkin
license:        GPL v2
version:        0.0.1
srcversion:     DF2CCD5DF82536A9CAD3CBB
depends:        tun,macvtap
intree:         Y
vermagic:       3.9.2-301.fc19.x86_64 SMP mod_unload 
signer:         Fedora kernel signing key
sig_key:        26:38:0E:8A:E1:3E:2D:5A:7F:97:3D:47:D8:53:EA:19:5D:8F:2A:E5
sig_hashalgo:   sha256
parm:           experimental_zcopytx:Enable Zero Copy TX; 1 -Enable; 0 - Disable (int)

A very handy one-liner i found over the net:

# cat /proc/modules | cut -f 1 -d " " | while read module; do  echo "Module: $module";  if [ -d "/sys/module/$module/parameters" ]; then   ls /sys/module/$module/parameters/ | while read parameter; do    echo -n "Parameter: $parameter --> ";    cat /sys/module/$module/parameters/$parameter;   done;  fi;  echo; done | grep -A 2  vhost_net
Module: vhost_net
Parameter: experimental_zcopytx --> 1

* Guest: 3.9.2-301.fc19.x86_64

Issuing a '# find /' on the Guest does still lead to a kernel panic on the host.
Attached captured serial port output.

Comment 11 Reartes Guillermo 2013-05-16 01:53:19 UTC
It seems i forgot to put the 'options' part...

# cat /etc/modprobe.d/vhost_net.conf
options vhost_net experimental_zcopytx=0

#  cat /proc/modules | cut -f 1 -d " " | while read module; do  echo "Module: $module";  if [ -d "/sys/module/$module/parameters" ]; then   ls /sys/module/$module/parameters/ | while read parameter; do    echo -n "Parameter: $parameter --> ";    cat /sys/module/$module/parameters/$parameter;   done;  fi;  echo; done | grep -A 2  vhost_net
Module: vhost_net
Parameter: experimental_zcopytx --> 0

It still happens. I will get the serial output tomorrow to confirm, but most likely yes.
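
For reference, a shorter sketch of the two checks involved here: that each line of the modprobe.d file starts with a directive keyword (so a missing 'options' is flagged), and what value the loaded module actually reports in sysfs. The function names are illustrative. The keyword list is the set of directives documented in modprobe.d(5).

```shell
# Flag non-comment, non-blank lines that do not start with a modprobe.d directive.
# Returns non-zero if any suspicious line is found.
check_modprobe_conf() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        $1 !~ /^(alias|blacklist|install|options|remove|softdep)$/ {
            print "suspicious line " NR ": " $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Print the live value of a module parameter from sysfs,
# or a note if the module is not loaded or has no such parameter.
module_param() {
    f=/sys/module/$1/parameters/$2
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "not loaded or no such parameter"
    fi
}
```

Usage would be: check_modprobe_conf /etc/modprobe.d/vhost_net.conf, then module_param vhost_net experimental_zcopytx.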

Comment 12 Josh Boyer 2013-05-16 12:45:33 UTC
Jason, Michael, have either of you seen this before?  We have this bug reported, plus bug 950002 and 918015 all involving vhost_net.

Is there something upstream that isn't in 3.9 that might solve these issues?

Comment 13 Michael S. Tsirkin 2013-05-16 13:12:07 UTC
and all involving macvtap, I note.
curious if this triggers with bridge as well.

Comment 14 jason wang 2013-05-16 14:34:57 UTC
3.10 should solve the problem, I'm working on net-next and haven't tried to use 3.9 to reproduce the issue.

Comment 15 Josh Boyer 2013-05-16 15:42:16 UTC
(In reply to comment #14)
> 3.10 should solve the problem, I'm working on net-next and haven't tried to
> use 3.9 to reproduce the issue.

Great!  Can you point me to which commit fixes it?

Comment 16 Josh Boyer 2013-05-17 01:23:02 UTC
Also, we now have 963966 which is a duplicate most likely.  The reporter says it happens with 3.10-rc1.  See the oops in the dmesg in https://bugzilla.redhat.com/show_bug.cgi?id=963966#c3

Comment 17 Chris Murphy 2013-05-17 02:44:34 UTC
I mention this in the other bug report, but this is reproducible F19 qemu-kvm guest on F19 baremetal host. It is not reproducible with the same guest, on the same host running fully updated Fedora 18 with kernel 3.9.2-200.fc18.x86_64.

So I wonder if the newer qemu tools from F19 are poking things differently, to trigger this bug in a way that F18 qemu/libvirt/virt-manager tools are not.

Comment 18 Josh Boyer 2013-05-17 14:53:51 UTC
*** Bug 961358 has been marked as a duplicate of this bug. ***

Comment 19 jason wang 2013-05-20 08:30:27 UTC
(In reply to Josh Boyer from comment #15)
> (In reply to comment #14)
> > 3.10 should solve the problem, I'm working on net-next and haven't tried to
> > use 3.9 to reproduce the issue.
> 
> Great!  Can you point me to which commit fixes it?

Commits:

9b4d669bc06c215d64f56f1eb0d4eb96e14d689d
38502af77e07b5d6650b9ff99a0b482d86366592
c1aad275b0293d2b1905ec95a945422262470684
f9ca8f74399f9195fd8e01f67a8424a8d33efa55
15e5a030716468dce4032fa0f398d840fa2756f6
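
A minimal sketch for checking whether commits like these are contained in a given tag of a local kernel clone. The function name and the clone path in the usage note are illustrative. It relies only on "git merge-base --is-ancestor".

```shell
# For each commit ID after the first two arguments, report whether it is an
# ancestor of ref $1 in the git repository at path $2.
commits_in_ref() {
    ref=$1
    repo=$2
    shift 2
    for sha in "$@"; do
        if git -C "$repo" merge-base --is-ancestor "$sha" "$ref" 2>/dev/null; then
            echo "$sha: in $ref"
        else
            echo "$sha: NOT in $ref (or unknown to this clone)"
        fi
    done
}
```

For example: commits_in_ref v3.10-rc1 /path/to/linux 9b4d669bc06c215d64f56f1eb0d4eb96e14d689d 38502af77e07b5d6650b9ff99a0b482d86366592. A commit missing from the clone is reported the same way as one that is not an ancestor, so a stale clone can give false negatives.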

Comment 20 Josh Boyer 2013-05-20 13:27:05 UTC
(In reply to jason wang from comment #19)
> (In reply to Josh Boyer from comment #15)
> > (In reply to comment #14)
> > > 3.10 should solve the problem, I'm working on net-next and haven't tried to
> > > use 3.9 to reproduce the issue.
> > 
> > Great!  Can you point me to which commit fixes it?
> 
> Commits:
> 
> 9b4d669bc06c215d64f56f1eb0d4eb96e14d689d
> 38502af77e07b5d6650b9ff99a0b482d86366592
> c1aad275b0293d2b1905ec95a945422262470684
> f9ca8f74399f9195fd8e01f67a8424a8d33efa55
> 15e5a030716468dce4032fa0f398d840fa2756f6

Hm.  So all of those appear to be in 3.10-rc1, yet we still are getting reports of this issue with rawhide kernels at that or newer.  See comment #16.

Also, none of these are in the stable kernel releases.  Should they be?

Comment 21 Chris Murphy 2013-05-20 17:31:50 UTC
Created attachment 750659 [details]
dmesg + 3.10rc1 + sysrq+w

3.10.0-0.rc1.git6.1.fc20.x86_64
echo w > /proc/sysrq-trigger

Comment 22 Chris Murphy 2013-05-20 17:33:33 UTC
Created attachment 750660 [details]
ps aux | grep qemu

qemu command line for dmesg event in comment 12.

Comment 23 Chris Murphy 2013-05-20 17:35:12 UTC
Created attachment 750661 [details]
virsh dumpxml file

Comment 24 jason wang 2013-05-21 03:08:07 UTC
(In reply to Josh Boyer from comment #20)
> (In reply to jason wang from comment #19)
> > (In reply to Josh Boyer from comment #15)
> > > (In reply to comment #14)
> > > > 3.10 should solve the problem, I'm working on net-next and haven't tried to
> > > > use 3.9 to reproduce the issue.
> > > 
> > > Great!  Can you point me to which commit fixes it?
> > 
> > Commits:
> > 
> > 9b4d669bc06c215d64f56f1eb0d4eb96e14d689d
> > 38502af77e07b5d6650b9ff99a0b482d86366592
> > c1aad275b0293d2b1905ec95a945422262470684
> > f9ca8f74399f9195fd8e01f67a8424a8d33efa55
> > 15e5a030716468dce4032fa0f398d840fa2756f6
> 
> Hm.  So all of those appear to be in 3.10-rc1, yet we still are getting
> reports of this issue with rawhide kernels at that or newer.  See comment
> #16.
> 
> Also, none of these are in the stable kernel releases.  Should they be?

Maybe, but they have some dependencies. Another choice is to disable the precise packet length from untrusted sources. Let me ask upstream which is preferred.

Thanks

Comment 25 Cole Robinson 2013-05-25 19:03:38 UTC
*** Bug 963966 has been marked as a duplicate of this bug. ***

Comment 26 Andrew Jones 2013-06-03 08:43:57 UTC
*** Bug 968587 has been marked as a duplicate of this bug. ***

Comment 27 Suren Karapetyan 2013-06-10 08:22:32 UTC
I can reproduce this on a 3.9.4-200.fc18.x86_64 and qemu-system-x86-1.2.2-11.fc18.x86_64 host with a 2.6.32-358.6.1.el6.x86_64 guest. Also using macvtap (both: with VEPA and Bridge)

Comment 28 Michael S. Tsirkin 2013-06-10 09:05:10 UTC
These are the patches to fix this I think:

http://patchwork.ozlabs.org/patch/249261/

and

http://www.gossamer-threads.com/lists/linux/kernel/1727159
(why isn't it in patchwork? Jason do you know?)

Comment 29 Josh Boyer 2013-06-11 11:59:29 UTC
(In reply to Michael S. Tsirkin from comment #28)
> These are the patches to fix this I think:
> 
> http://patchwork.ozlabs.org/patch/249261/

OK, that one Dave grabbed and queued for stable.  I'll pick it up today.

> and
> 
> http://www.gossamer-threads.com/lists/linux/kernel/1727159
> (why isn't it in patchwork? Jason do you know?)

I can't find the actual patch email anywhere.  Only your reply to it.  Perhaps it wasn't originally sent to any lists?

Comment 30 Josh Boyer 2013-06-11 13:20:53 UTC
http://patchwork.ozlabs.org/patch/250547/

magic ;)

Comment 31 Josh Boyer 2013-06-11 13:29:26 UTC
I've grabbed both of those patches now.  They will be included in the next update for each release.

Comment 32 Fedora Update System 2013-06-11 21:41:37 UTC
kernel-3.9.5-301.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.9.5-301.fc19

Comment 33 Fedora Update System 2013-06-11 21:47:13 UTC
kernel-3.9.5-101.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/FEDORA-2013-9123/kernel-3.9.5-101.fc17

Comment 34 Fedora Update System 2013-06-11 21:57:06 UTC
kernel-3.9.5-201.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.9.5-201.fc18

Comment 35 Fedora Update System 2013-06-12 19:12:44 UTC
Package kernel-3.9.5-301.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.5-301.fc19'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-10689/kernel-3.9.5-301.fc19
then log in and leave karma (feedback).
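
After rebooting, a minimal sketch for confirming that the running kernel is at least the version named above. It assumes GNU sort with the -V (version sort) option. The function name is illustrative.

```shell
# True if version string $1 is greater than or equal to $2
# under GNU sort's version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running=$(uname -r)
# Strip the trailing architecture suffix (for example .x86_64) before comparing.
if version_at_least "${running%.*}" 3.9.5-301.fc19; then
    echo "running kernel is at or past 3.9.5-301.fc19"
else
    echo "running kernel is older than 3.9.5-301.fc19"
fi
```

This only confirms which kernel is booted. It says nothing about whether the crash itself is gone.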

Comment 36 Fedora Update System 2013-06-13 06:03:40 UTC
kernel-3.9.5-201.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 37 Paul Fee 2013-06-13 13:58:50 UTC
Applied kernel update to F18 system, using virtio NIC within KVM guest.  Hard Kernel panic still occurs, power cycle required to recover.

Workaround: Change NIC "hardware" present to guest from virtio to e1000.

Kernel version: 3.9.5-201.fc18.x86_64
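
A minimal sketch of applying that workaround from the command line instead of through virt-manager, assuming the guest is managed by libvirt. The function name and the guest name in the usage note are illustrative. Note that the substitution touches every <model type='virtio'/> element in the XML, so the result should be reviewed before it is defined.

```shell
# Rewrite virtio NIC model elements to e1000 in libvirt domain XML read from stdin.
swap_virtio_to_e1000() {
    sed "s|<model type='virtio'/>|<model type='e1000'/>|"
}
```

One possible sequence: virsh dumpxml --inactive guest-name | swap_virtio_to_e1000 > guest-name.xml, review the file, then virsh define guest-name.xml and restart the guest.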

Comment 38 Josh Boyer 2013-06-13 15:06:15 UTC
(In reply to Paul Fee from comment #37)
> Applied kernel update to F18 system, using virtio NIC within KVM guest. 
> Hard Kernel panic still occurs, power cycle required to recover.
> 
> Workaround: Change NIC "hardware" present to guest from virtio to e1000.
> 
> Kernel version: 3.9.5-201.fc18.x86_64

Can you please post the panic backtrace?

Comment 39 Paul Fee 2013-06-13 15:10:31 UTC
Created attachment 760756 [details]
Photo of screen after panic - Kernel 3.9.5-201.fc18

Kernel panic when using virtio NICs within KVM guests.  Host kernel 3.9.5-201.fc18.

Comment 40 Tomas Kopecek 2013-06-13 15:15:51 UTC
In about 70% of cases it fails in such a way that the host hangs and /var/log/messages is not synced. In the remaining cases the output looks like this.

Kernel: 3.9.5-201.fc18.x86_64
libvirt: 0.10.2.5-1.fc18.x86_64

Jun 13 16:47:02 localhost kernel: [  494.765547] general protection fault: 0000 [#1] SMP 
Jun 13 16:47:02 localhost kernel: [  494.767777] Modules linked in: ebtable_nat xt_CHECKSUM ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle bridge stp llc lockd sunrpc bnep bluetooth ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_hdmi arc4 iwldvm mac80211 snd_hda_codec_conexant iwlwifi cfg80211 thinkpad_acpi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e uvcvideo snd_page_alloc snd_timer snd rfkill videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media mei iTCO_wdt iTCO_vendor_support lpc_ich ptp pps_core acpi_cpufreq mperf coretemp soundcore microcode i2c_i801 mfd_core vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc dm_crypt crc32_pclmul crc32c_intel i915 ghash_clmulni_intel i2c_algo_bit drm_kms_helper sdhci_pci drm sdhci mmc_core i2c_core wmi video
Jun 13 16:47:02 localhost kernel: [  494.787393] CPU 1                                                                                                                                                          
Jun 13 16:47:02 localhost kernel: [  494.787405] Pid: 2719, comm: qemu-kvm Not tainted 3.9.5-201.fc18.x86_64 #1 LENOVO 4291EJ3/4291EJ3                                                                          
Jun 13 16:47:02 localhost kernel: [  494.789378] RIP: 0010:[<ffffffff81186869>]  [<ffffffff81186869>] kfree+0x69/0x190                                                                                          
Jun 13 16:47:02 localhost kernel: [  494.790500] RSP: 0018:ffff8801a44e9b28  EFLAGS: 00010207                                                                                                                   
Jun 13 16:47:02 localhost kernel: [  494.793572] RAX: 037ab5e000008000 RBX: dead000000200200 RCX: 00003ffffffff000                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.796911] RDX: 000077ff80000000 RSI: 0000000000000080 RDI: dead000000200200                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.800306] RBP: ffff8801a44e9b68 R08: 0000000000000004 R09: 0000000000000002                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.803705] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8801fa895940                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.807101] R13: 037a9fe000008000 R14: 0000000000000001 R15: ffff8801a44e9cc8                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.810570] FS:  00007fbc91c2b700(0000) GS:ffff88021e240000(0000) knlGS:0000000000000000                                                                                   
Jun 13 16:47:02 localhost kernel: [  494.814039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.817561] CR2: 0000000000000000 CR3: 00000001ba3d5000 CR4: 00000000000427e0                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.821137] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.824775] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400                                                                                              
Jun 13 16:47:02 localhost kernel: [  494.828420] Process qemu-kvm (pid: 2719, threadinfo ffff8801a44e8000, task ffff88019a4bc650)                                                                               
Jun 13 16:47:02 localhost kernel: [  494.832111] Stack:                                                                                                                                                         
Jun 13 16:47:02 localhost kernel: [  494.835753]  ffff8801f9516880 0000000000000001 0000000000000001 ffffc9000577a000                                                                                           
Jun 13 16:47:02 localhost kernel: [  494.838161]  ffff8801fa895940 0000000000000001 0000000000000001 ffff8801a44e9cc8                                                                                           
Jun 13 16:47:02 localhost kernel: [  494.841744]  ffff8801a44e9b98 ffffffff8116f24a 0000000000000001 0000000000000400                                                                                           
Jun 13 16:47:02 localhost kernel: [  494.845422] Call Trace:                                                                                                                                                    
Jun 13 16:47:03 localhost kernel: [  494.849098]  [<ffffffff8116f24a>] __vunmap+0xea/0x110                                                                                                                      
Jun 13 16:47:03 localhost kernel: [  494.852868]  [<ffffffff8116f378>] vfree+0x28/0x30                                                                                                                          
Jun 13 16:47:03 localhost kernel: [  494.856638]  [<ffffffff8116f1f4>] __vunmap+0x94/0x110                                                                                                                      
Jun 13 16:47:03 localhost kernel: [  494.860402]  [<ffffffff8116f378>] vfree+0x28/0x30                                                                                                                                                       
Jun 13 16:47:03 localhost kernel: [  494.864180]  [<ffffffffa0185845>] kvm_kvfree+0x35/0x40 [kvm]                                                                                                                                            
Jun 13 16:47:03 localhost kernel: [  494.867927]  [<ffffffffa019f265>] kvm_arch_free_memslot+0x55/0x100 [kvm]                                                                                                                                
Jun 13 16:47:03 localhost kernel: [  494.871760]  [<ffffffffa01863d6>] __kvm_set_memory_region+0x376/0x790 [kvm]                                                                                                                             
Jun 13 16:47:03 localhost kernel: [  494.875532]  [<ffffffffa0186831>] kvm_set_memory_region+0x41/0x70 [kvm]                                                                                                                                 
Jun 13 16:47:03 localhost kernel: [  494.879381]  [<ffffffffa018687b>] kvm_vm_ioctl_set_memory_region+0x1b/0x20 [kvm]                                                                                                                        
Jun 13 16:47:03 localhost kernel: [  494.883188]  [<ffffffffa0186cd5>] kvm_vm_ioctl+0x455/0x580 [kvm]                                                                                                                                        
Jun 13 16:47:03 localhost kernel: [  494.887054]  [<ffffffff81071c01>] ? dequeue_signal+0x41/0x170                                                                                                                                           
Jun 13 16:47:03 localhost kernel: [  494.890836]  [<ffffffff811b17e7>] do_vfs_ioctl+0x97/0x580                                                                                                                                               
Jun 13 16:47:03 localhost kernel: [  494.894623]  [<ffffffff810751d4>] ? sys_rt_sigtimedwait+0xd4/0x100                                                                                                                                      
Jun 13 16:47:03 localhost kernel: [  494.898390]  [<ffffffff811b1d61>] sys_ioctl+0x91/0xb0
Jun 13 16:47:03 localhost kernel: [  494.902235]  [<ffffffff8166a2d9>] system_call_fastpath+0x16/0x1b
Jun 13 16:47:03 localhost kernel: [  494.905986] Code: 00 00 00 80 ff 77 00 00 49 bd 00 00 00 00 00 ea ff ff 48 01 d8 48 0f 42 15 b5 c7 a8 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 49 01 c5 <49> 8b 45 00 f6 c4 80 0f 85 06 01 00 00 49 8b 45 00 a8 80 0f 84 
Jun 13 16:47:03 localhost kernel: [  494.914446] RIP  [<ffffffff81186869>] kfree+0x69/0x190
Jun 13 16:47:03 localhost kernel: [  494.918627]  RSP <ffff8801a44e9b28>
Jun 13 16:47:03 localhost kernel: [  494.933715] ---[ end trace cb6a4a6df843e00b ]---

Comment 41 Michael S. Tsirkin 2013-06-13 15:57:33 UTC
Were there any warnings in the log before that?

Comment 42 Paul Fee 2013-06-13 16:21:07 UTC
I see no warnings preceding the crash in /var/log/messages.  The crash is sudden and severe.

BTW, as suggested earlier /etc/modprobe.d/vhost_net.conf contains:
  options vhost_net experimental_zcopytx=0

I can see it's taken effect with:
$ cat /sys/module/vhost_net/parameters/experimental_zcopytx
0

However, that workaround hasn't helped avoid crashes on my system.  Changing the guest NIC type to e1000 has proven to be a more reliable workaround, but it incurs a performance hit.
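
For scripting, the sysfs check above can be wrapped in a small helper. This is only a sketch: the function name zcopytx_state and its optional path argument are made up for illustration; the default path is the one shown above.

```shell
# Hypothetical helper: report whether vhost_net zero-copy TX is enabled.
# $1 optionally overrides the sysfs path (handy on a machine where the
# vhost_net module is not loaded).
zcopytx_state() {
    param_file=${1:-/sys/module/vhost_net/parameters/experimental_zcopytx}
    if [ ! -r "$param_file" ]; then
        echo "unknown"          # module not loaded or parameter not exposed
        return 1
    fi
    case "$(cat "$param_file")" in
        0)  echo "disabled" ;;
        '') echo "unknown"; return 1 ;;
        *)  echo "enabled" ;;   # any non-zero value turns zero-copy TX on
    esac
}
```

Running zcopytx_state with no argument reads the live module parameter.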

Comment 43 Michael S. Tsirkin 2013-06-13 21:15:45 UTC
How about disabling vhost_net?
Does this help when using virtio?

Comment 44 Fedora Update System 2013-06-14 04:50:01 UTC
kernel-3.9.5-301.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 45 Paul Fee 2013-06-14 08:59:30 UTC
I've disabled vhost_net and restarted my VMs.

Procedure:
# virsh edit <guest name>

For each interface section, beside "<model type='virtio'/>", inserted:
  <driver name='qemu'/>

Confirmed that QEMU process doesn't mention "vhost":
# ps -ef | grep qemu | grep vhost

Previously vhost_net would cause a kernel panic within a couple of hours of use.  I'll report back later.  If this works, it will be a better workaround than using e1000.
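
The vhost check above prints nothing when it passes, which is easy to misread. Below is a sketch of a more explicit version. check_qemu_vhost is a hypothetical helper name, and like the grep pipeline above it only looks for the string "vhost" on qemu lines of a process listing.

```shell
# Hypothetical helper: read `ps -ef`-style text on stdin and report
# whether any qemu command line still mentions vhost.
check_qemu_vhost() {
    if grep qemu | grep vhost >/dev/null; then
        echo "vhost still in use"
        return 1
    fi
    echo "vhost not in use"
}
```

Usage: ps -ef | check_qemu_vhost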

Comment 46 Michael S. Tsirkin 2013-06-14 11:26:51 UTC
Another thing to try (in any case) is a debug kernel.
You might get some warnings before the crash with that.

Comment 47 Paul Fee 2013-06-14 12:40:55 UTC
My F18 host has been running for almost four hours now without a vhost-induced crash, so disabling vhost_net while continuing to use virtio NICs within the guest VMs is proving to be a suitable workaround.

Comment 48 Paul Fee 2013-06-14 14:32:19 UTC
Perhaps I spoke too soon.  I've had a couple of freezes now: the console keeps showing the X desktop and doesn't switch to text mode with a kernel panic dump.  The keyboard LEDs stopped working and the machine was no longer pingable.  I can't be certain it's related to virtio, as no debug info is available.

I had installed and booted from debug kernel (3.9.5-201.fc18.x86_64.debug).

No useful additional information in /var/log/messages preceding the reboot.  Though I do see a sequence of 218 null bytes (binary value zero in hex dump) between the last normal message before the freeze and the first message logged by rsyslog following the reboot.

Are there steps I can take to capture better debugging information?

Comment 49 Laurent Jacquot 2013-06-14 20:52:47 UTC
You should install kdump: you'll be able to capture the state of the kernel and, for example, extract the backtrace on each CPU. See the related bug #968587 and http://docs.fedoraproject.org/en-US/Fedora/16/html/System_Administrators_Guide/ch-kdump.html
I hope you're able to debug it further!

Comment 50 Paul Fee 2013-06-14 22:31:53 UTC
Thanks for the kdump tip, that should help a lot.  It'll likely be Tuesday 2013-06-18 before I can try it.

Comment 51 jason wang 2013-06-17 07:51:13 UTC
Looks like the commits I required were still not in 3.9.y. I will ask Greg to include those explicitly.

Comment 52 Severin Gehwolf 2013-06-18 16:59:20 UTC
I'm running kernel-3.9.5-201.fc18.x86_64 and my host system freezes when I fire up child VMs with virtio as the NIC device. Therefore, I can't use the kernel that came with the latest F18 update :(

Are you guys aware?

Comment 53 Severin Gehwolf 2013-06-18 17:01:11 UTC
FWIW, 3.9.4-200.fc18.x86_64 does not seem to have this issue.

Comment 54 Stephen John Smoogen 2013-06-18 19:04:06 UTC
I started to have this problem after updating to kernel-3.9.5-301.fc19.x86_64

System is a Lenovo T520 and the lockups occur when trying to install Red Hat EL 5.9 into a virtual machine. System locks up during any intense network action on guest or host.

Comment 55 Paul Fee 2013-06-21 09:46:29 UTC
Hi Laurent,

As suggested in comment #49, I tried installing kdump.  The system-config-kdump GUI failed to configure grub correctly.  I manually added "crashkernel=128M" to /etc/grub2.cfg and that allowed Linux to boot and reserve memory for kdump. "systemctl status kdump" would then indicate the service had started.

However, testing with "echo 1 > /proc/sys/kernel/sysrq; echo c > /proc/sysrq-trigger" failed to write the crash dump to disk.
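
Before triggering a test crash, it may be worth confirming that the memory reservation and the crash kernel are actually in place. The sketch below assumes the standard /proc/cmdline and /sys/kernel/kexec_crash_loaded files; the helper names and their optional path arguments are hypothetical.

```shell
# Print the value of crashkernel= from the kernel command line, if any.
# $1 optionally overrides the file to read (default: /proc/cmdline).
crashkernel_arg() {
    cmdline_file=${1:-/proc/cmdline}
    # Put each boot argument on its own line, then keep only crashkernel=
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^crashkernel=//p'
}

# Succeed only if a crash (kdump) kernel is currently loaded.
# $1 optionally overrides the sysfs file to read.
crash_kernel_loaded() {
    loaded_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$loaded_file" ] && [ "$(cat "$loaded_file")" = "1" ]
}
```

If crashkernel_arg prints nothing, the boot parameter did not reach the running kernel; if crash_kernel_loaded fails, the sysrq crash has no dump kernel to boot into.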

If it's been determined that the patches didn't actually make it into 3.9.5, then I'll wait for the next kernel update.

Comment 56 Paul Fee 2013-06-21 09:47:56 UTC
Hi Jason,

In comment #51 you mentioned the vhost_net patches aren't actually in 3.9.x yet.  I've had crashes with 3.9.5-201.fc18.x86_64.  I've upgraded to kernel-3.9.6-200.fc18.x86_64 and 3.9.7 is on its way.  Do you know if these have the necessary patches?

Is it worth switching my VMs from e1000 NICs to virtio with vhost_net enabled to test if the fix is working?

Comment 57 jason wang 2013-06-21 11:24:51 UTC
(In reply to Paul Fee from comment #56)
> Hi Jason,
> 
> In comment #51 you mentioned the vhost_net patches aren't actually in 3.9.x
> yet.  I've had crashes with 3.9.5-201.fc18.x86_64.  I've upgraded to
> kernel-3.9.6-200.fc18.x86_64 and 3.9.7 is on its way.  Do you know if these
> have the necessary patches?
> 
> Is it worth switching my VMs from e1000 NICs to virtio with vhost_net
> enabled to test if the fix is working?

Unfortunately, they were still not in 3.9.7. I've pinged Dave upstream to include those patches. Looks like we still need to wait for those patches.

Comment 58 Josh Boyer 2013-06-21 12:44:59 UTC
Below is a scratch build with all the patches mentioned, plus another that Michael found for a use-after-free issue in vhost_net.  Stephen Smoogen tested it successfully yesterday.  Please test.

http://koji.fedoraproject.org/koji/taskinfo?taskID=5523794

Comment 59 Suren Karapetyan 2013-06-24 21:49:18 UTC
Neither 3.9.6-200.fc18 nor 3.9.7-200.fc18 fixes this for me. I can trigger this pretty consistently.

Comment 60 Michael S. Tsirkin 2013-06-25 15:40:39 UTC
Does the scratch build from Comment 58 help?

Comment 61 Suren Karapetyan 2013-06-25 17:18:28 UTC
I haven't tried it because I'm on Fedora 18 and wasn't sure it would work. Should I rebuild the SRPM or just install the fc19 version?

Comment 62 Pekka Pietikäinen 2013-07-03 12:38:15 UTC
The scratch build is already gone, so I couldn't test it. Bug #976789 seems to be a leftover from this bug.

Comment 63 Josh Boyer 2013-07-03 13:12:57 UTC
For all of those having trouble with vhost and/or bridging in guests, please try the scratch build below when it completes.  It contains the patch from bug 880035 for the timer fix and the use-after-free fix for vhost-net backported to 3.9.8.

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569247

Comment 64 Josh Boyer 2013-07-03 14:22:53 UTC
Sigh.  Of course, it would help if I didn't typo the patch.  Anyway, here is a scratch build that should actually finish building:

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569571

Comment 65 Josh Boyer 2013-07-03 16:36:42 UTC
Third time is a charm.  This one actually looks like it built.  Sigh, sorry about that.

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569631

Comment 66 Stephen John Smoogen 2013-07-05 23:15:06 UTC
Installed the debug kernel from http://koji.fedoraproject.org/koji/taskinfo?taskID=5569631. No crash occurred while installing or running RHEL-5.

Comment 67 Suren Karapetyan 2013-07-09 06:49:37 UTC
Can we have a build for fc18 or does kernel-3.9.9-201.fc18 include these fixes?

Comment 68 Josh Boyer 2013-07-09 11:45:50 UTC
(In reply to Suren Karapetyan from comment #67)
> Can we have a build for fc18 or does kernel-3.9.9-201.fc18 include these
> fixes?

It likely should, yes.  None of the original reporters confirmed the scratch build in comment #65 worked, so I did not include this bug in the list that 3.9.9-201.fc18 fixes.  Testing that would be much appreciated.

Comment 69 Suren Karapetyan 2013-07-09 20:35:14 UTC
kernel-3.9.9-201.fc18 at least seems to have fixed this for me (Fedora 18 host, CentOS 6.4 guest). The host used to crash a few seconds after starting the guest, but it has been up for a few minutes now. If no problems arise by morning, I'd say this bug is fixed for me. Thank you.

Comment 70 Suren Karapetyan 2013-07-10 06:48:17 UTC
10 hours uptime - still no crash: kernel-3.9.9-201.fc18 fixes this bug for me.

Comment 71 Josh Boyer 2013-07-10 12:05:24 UTC
Great, thanks for testing.

Comment 72 Josh Boyer 2013-07-12 13:27:02 UTC
The same fixes that went into F18 for this issue are in this F19 update:

https://admin.fedoraproject.org/updates/kernel-3.9.9-302.fc19
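
To check whether a host is already running one of the fixed builds, the version-release string can be compared against the ones named in this bug. The following is a rough sketch: kernel_at_least is a hypothetical helper, it relies on GNU sort -V, and it assumes that any later version-release on the same Fedora branch also carries the fixes.

```shell
# Hypothetical helper: succeed if the running kernel's version-release is
# at least the one given in $1 (e.g. 3.9.9-201 on F18, 3.9.9-302 on F19).
# $2 optionally overrides the output of `uname -r`.
kernel_at_least() {
    wanted=$1
    running=${2:-$(uname -r)}
    running=${running%%.fc*}    # drop the ".fcNN.arch" suffix
    # sort -V orders version strings; the first line is the lower one
    lowest=$(printf '%s\n%s\n' "$wanted" "$running" | sort -V | head -n 1)
    [ "$lowest" = "$wanted" ]
}
```

Usage on an F18 host: kernel_at_least 3.9.9-201 && echo "fixed kernel running"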

Comment 73 Laurent Jacquot 2013-07-13 09:49:18 UTC
kernel-3.9.9-201.fc18 also fixes my issue (described in bug #968587).

Comment 74 Ivo Sir 2013-07-19 10:00:36 UTC
Hi all,

Last night I experienced another crash. Both the virtual machine and the host got stuck during heavy network activity (an rsync transfer), and a hard reset of the host was needed. I'm not sure whether it's related to this bug, though.

host: Fedora 18, kernel 3.9.9-201.fc18.x86_64
guest: Debian 6, kernel 2.6.32-5-amd64

Jul 17 23:14:51 kvm4 kernel: [199727.539014] ------------[ cut here ]------------
Jul 17 23:14:51 kvm4 kernel: [199727.539037] WARNING: at lib/list_debug.c:33 __list_add+0xbe/0xd0()
Jul 17 23:14:51 kvm4 kernel: [199727.539042] Hardware name: PowerEdge M605
Jul 17 23:14:51 kvm4 kernel: [199727.539048] list_add corruption. prev->next should be next (ffffffff81ed9778), but was           (null). (prev=ffff880401caa940).
Jul 17 23:14:51 kvm4 kernel: [199727.539051] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables dm_service_time scsi_dh_rdac bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net nv_tco dcdbas i2c_nforce2 acpi_cpufreq k10temp tun macvtap macvlan shpchp mperf kvm_amd kvm serio_raw amd64_edac_mod edac_core edac_mce_amd bnx2 microcode usb_storage i2c_algo_bit drm_kms_helper ttm mptsas drm mptscsih mptbase i2c_core scsi_transport_sas dm_multipath
Jul 17 23:14:51 kvm4 kernel: [199727.539146] Pid: 0, comm: swapper/0 Not tainted 3.9.9-201.fc18.x86_64 #1
Jul 17 23:14:51 kvm4 kernel: [199727.539150] Call Trace:
Jul 17 23:14:51 kvm4 kernel: [199727.539155]  <IRQ>  [<ffffffff8105efc5>] warn_slowpath_common+0x75/0xa0
Jul 17 23:14:51 kvm4 kernel: [199727.539176]  [<ffffffff8105f0a6>] warn_slowpath_fmt+0x46/0x50
Jul 17 23:14:51 kvm4 kernel: [199727.539184]  [<ffffffff813171fe>] __list_add+0xbe/0xd0
Jul 17 23:14:51 kvm4 kernel: [199727.539192]  [<ffffffff8106e4c3>] __internal_add_timer+0x113/0x130
Jul 17 23:14:51 kvm4 kernel: [199727.539199]  [<ffffffff8106ead0>] internal_add_timer+0x20/0x50
Jul 17 23:14:51 kvm4 kernel: [199727.539206]  [<ffffffff8106fde4>] mod_timer+0x124/0x200
Jul 17 23:14:51 kvm4 kernel: [199727.539229]  [<ffffffffa033d562>] br_multicast_rcv+0x862/0x1330 [bridge]
Jul 17 23:14:51 kvm4 kernel: [199727.539240]  [<ffffffff81581a66>] ? nf_iterate+0x86/0xb0
Jul 17 23:14:51 kvm4 kernel: [199727.539255]  [<ffffffffa03334a0>] ? br_handle_local_finish+0x60/0x60 [bridge]
Jul 17 23:14:51 kvm4 kernel: [199727.539270]  [<ffffffffa03336f2>] br_handle_frame_finish+0x252/0x330 [bridge]
Jul 17 23:14:51 kvm4 kernel: [199727.539284]  [<ffffffffa0333946>] br_handle_frame+0x176/0x280 [bridge]
Jul 17 23:14:51 kvm4 kernel: [199727.539294]  [<ffffffff81553f52>] __netif_receive_skb_core+0x352/0x7f0
Jul 17 23:14:51 kvm4 kernel: [199727.539302]  [<ffffffff81554411>] __netif_receive_skb+0x21/0x70
Jul 17 23:14:51 kvm4 kernel: [199727.539310]  [<ffffffff81554613>] netif_receive_skb+0x33/0xb0
Jul 17 23:14:51 kvm4 kernel: [199727.539317]  [<ffffffff81555038>] napi_gro_receive+0x98/0xd0
Jul 17 23:14:51 kvm4 kernel: [199727.539335]  [<ffffffffa0131694>] bnx2_poll_work+0x804/0x1390 [bnx2]
Jul 17 23:14:51 kvm4 kernel: [199727.539349]  [<ffffffffa0132342>] bnx2_poll+0x62/0x274 [bnx2]
Jul 17 23:14:51 kvm4 kernel: [199727.539358]  [<ffffffff81662c2d>] ? common_interrupt+0x6d/0x6d
Jul 17 23:14:51 kvm4 kernel: [199727.539366]  [<ffffffff81554d19>] net_rx_action+0x149/0x240
Jul 17 23:14:51 kvm4 kernel: [199727.539374]  [<ffffffff81067678>] __do_softirq+0xe8/0x230
Jul 17 23:14:51 kvm4 kernel: [199727.539382]  [<ffffffff81067945>] irq_exit+0xa5/0xb0
Jul 17 23:14:51 kvm4 kernel: [199727.539391]  [<ffffffff8166cc23>] do_IRQ+0x63/0xe0
Jul 17 23:14:51 kvm4 kernel: [199727.539399]  [<ffffffff81662c2d>] common_interrupt+0x6d/0x6d
Jul 17 23:14:51 kvm4 kernel: [199727.539403]  <EOI>  [<ffffffff81044136>] ? native_safe_halt+0x6/0x10
Jul 17 23:14:51 kvm4 kernel: [199727.539419]  [<ffffffff8101c651>] default_idle+0x41/0x100
Jul 17 23:14:51 kvm4 kernel: [199727.539426]  [<ffffffff8101c769>] amd_e400_idle+0x59/0x120
Jul 17 23:14:51 kvm4 kernel: [199727.539434]  [<ffffffff8101d17e>] cpu_idle+0xfe/0x120
Jul 17 23:14:51 kvm4 kernel: [199727.539441]  [<ffffffff81647342>] rest_init+0x72/0x80
Jul 17 23:14:51 kvm4 kernel: [199727.539449]  [<ffffffff81d03ed6>] start_kernel+0x3f2/0x3ff
Jul 17 23:14:51 kvm4 kernel: [199727.539456]  [<ffffffff81d038e3>] ? repair_env_string+0x5e/0x5e
Jul 17 23:14:51 kvm4 kernel: [199727.539467]  [<ffffffff81d035dc>] x86_64_start_reservations+0x2a/0x2c
Jul 17 23:14:51 kvm4 kernel: [199727.539473]  [<ffffffff81d036cf>] x86_64_start_kernel+0xf1/0x100
Jul 17 23:14:51 kvm4 kernel: [199727.539479] ---[ end trace b309ba560eecc4d1 ]---
Jul 17 23:14:51 kvm4 kernel: [199727.539482] ------------[ cut here ]------------
Jul 17 23:14:51 kvm4 kernel: [199727.539489] WARNING: at lib/list_debug.c:36 __list_add+0x9c/0xd0()
Jul 17 23:14:51 kvm4 kernel: [199727.539492] Hardware name: PowerEdge M605
Jul 17 23:14:51 kvm4 kernel: [199727.539497] list_add double add: new=ffff880401caa940, prev=ffff880401caa940, next=ffffffff81ed9778.
Jul 17 23:14:51 kvm4 kernel: [199727.539500] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables dm_service_time scsi_dh_rdac bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net nv_tco dcdbas i2c_nforce2 acpi_cpufreq k10temp tun macvtap macvlan shpchp mperf kvm_amd kvm serio_raw amd64_edac_mod edac_core edac_mce_amd bnx2 microcode usb_storage i2c_algo_bit drm_kms_helper ttm mptsas drm mptscsih mptbase i2c_core scsi_transport_sas dm_multipath
Jul 17 23:14:51 kvm4 kernel: [199727.539568] Pid: 0, comm: swapper/0 Tainted: G        W    3.9.9-201.fc18.x86_64 #1
Jul 17 23:14:51 kvm4 kernel: [199727.539572] Call Trace:
Jul 17 23:14:51 kvm4 kernel: [199727.539575]  <IRQ>  [<ffffffff8105efc5>] warn_slowpath_common+0x75/0xa0
Jul 17 23:14:51 kvm4 kernel: [199727.539589]  [<ffffffff8105f0a6>] warn_slowpath_fmt+0x46/0x50
Jul 17 23:14:51 kvm4 kernel: [199727.539597]  [<ffffffff813171dc>] __list_add+0x9c/0xd0
Jul 17 23:14:51 kvm4 kernel: [199727.539604]  [<ffffffff8106e4c3>] __internal_add_timer+0x113/0x130
Jul 17 23:14:51 kvm4 kernel: [199727.539610]  [<ffffffff8106ead0>] internal_add_timer+0x20/0x50
Jul 17 23:14:51 kvm4 kernel: [199727.539617]  [<ffffffff8106fde4>] mod_timer+0x124/0x200
Jul 17 23:14:51 kvm4 kernel: [199727.539634]  [<ffffffffa033d562>] br_multicast_rcv+0x862/0x1330 [bridge]
Jul 17 23:14:51 kvm4 kernel: [199727.539642]  [<ffffffff81581a66>] ? nf_iterate+0x86/0xb0
Jul 17 23:14:51 kvm4 kernel: [199727.539656]  [<ffffffffa03334a0>] ? br_handle_local_finish+0x60/0x60 [bridge]
Jul 17 23:14:51 kvm4 kernel: [199727.539671]  [<ffffffffa03336f2>] br_handle_frame_finish+0x252/0x330 [bridge]
Jul 17 23:14:51 kvm4 kernel: [199727.539685]  [<ffffffffa0333946>] br_handle_frame+0x176/0x280 [bridge]
Jul 17 23:14:51 kvm4 kernel: [199727.539693]  [<ffffffff81553f52>] __netif_receive_skb_core+0x352/0x7f0
Jul 17 23:14:51 kvm4 kernel: [199727.539701]  [<ffffffff81554411>] __netif_receive_skb+0x21/0x70
Jul 17 23:14:51 kvm4 kernel: [199727.539709]  [<ffffffff81554613>] netif_receive_skb+0x33/0xb0
Jul 17 23:14:51 kvm4 kernel: [199727.539716]  [<ffffffff81555038>] napi_gro_receive+0x98/0xd0
Jul 17 23:14:51 kvm4 kernel: [199727.539730]  [<ffffffffa0131694>] bnx2_poll_work+0x804/0x1390 [bnx2]
Jul 17 23:14:51 kvm4 kernel: [199727.539743]  [<ffffffffa0132342>] bnx2_poll+0x62/0x274 [bnx2]
Jul 17 23:14:51 kvm4 kernel: [199727.539751]  [<ffffffff81662c2d>] ? common_interrupt+0x6d/0x6d
Jul 17 23:14:51 kvm4 kernel: [199727.539759]  [<ffffffff81554d19>] net_rx_action+0x149/0x240
Jul 17 23:14:51 kvm4 kernel: [199727.539767]  [<ffffffff81067678>] __do_softirq+0xe8/0x230
Jul 17 23:14:51 kvm4 kernel: [199727.539774]  [<ffffffff81067945>] irq_exit+0xa5/0xb0
Jul 17 23:14:51 kvm4 kernel: [199727.539782]  [<ffffffff8166cc23>] do_IRQ+0x63/0xe0
Jul 17 23:14:51 kvm4 kernel: [199727.539790]  [<ffffffff81662c2d>] common_interrupt+0x6d/0x6d
Jul 17 23:14:51 kvm4 kernel: [199727.539793]  <EOI>  [<ffffffff81044136>] ? native_safe_halt+0x6/0x10
Jul 17 23:14:51 kvm4 kernel: [199727.539807]  [<ffffffff8101c651>] default_idle+0x41/0x100
Jul 17 23:14:51 kvm4 kernel: [199727.539814]  [<ffffffff8101c769>] amd_e400_idle+0x59/0x120
Jul 17 23:14:51 kvm4 kernel: [199727.539822]  [<ffffffff8101d17e>] cpu_idle+0xfe/0x120
Jul 17 23:14:51 kvm4 kernel: [199727.539828]  [<ffffffff81647342>] rest_init+0x72/0x80
Jul 17 23:14:51 kvm4 kernel: [199727.539835]  [<ffffffff81d03ed6>] start_kernel+0x3f2/0x3ff
Jul 17 23:14:51 kvm4 kernel: [199727.539841]  [<ffffffff81d038e3>] ? repair_env_string+0x5e/0x5e
Jul 17 23:14:51 kvm4 kernel: [199727.539851]  [<ffffffff81d035dc>] x86_64_start_reservations+0x2a/0x2c
Jul 17 23:14:51 kvm4 kernel: [199727.539857]  [<ffffffff81d036cf>] x86_64_start_kernel+0xf1/0x100
Jul 17 23:14:51 kvm4 kernel: [199727.539862] ---[ end trace b309ba560eecc4d2 ]---

Many thanks!

Ivo
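
When a host has to be hard-reset like this, it can help to know how often the list debugging checks fired before the hang. A small sketch follows; count_list_debug_warnings is a hypothetical helper that counts the two message types seen in the trace above ("list_add corruption" and "list_add double add") in log text read from stdin.

```shell
# Hypothetical helper: count list_debug messages in a kernel log on stdin.
# Matches the "list_add corruption" and "list_add double add" lines only,
# not the generic "WARNING: at lib/list_debug.c" headers.
count_list_debug_warnings() {
    grep -c -E 'list_add (corruption|double add)'
}
```

Usage: count_list_debug_warnings < /var/log/messages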

Comment 75 Josh Boyer 2013-09-18 20:34:03 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 76 Josh Boyer 2013-10-08 17:22:30 UTC
This was fixed a while ago.