Description of problem: I noticed the kernel oops in the logs this morning. Additionally, a Windows 7 VM is now unable to use the RH VirtIO virtual NIC (Other VMs' networking is functioning just fine). I also attempted to shutdown the Windows 7 VM gracefully. It never got past "Logging Off". Now, "#virsh destroy mnich-win7" fails with: error: Failed to destroy domain mnich-win7 error: operation failed: failed to kill qemu process with SIGTERM virt-manager now cannot connect. I can't kill the qemu-kvm process manually either, i.e. it's a wedged zombie: qemu 3117 8.9 26.0 11249264 8563280 ? D Jun14 391:02 /usr/bin/qemu-kvm -name mnich-win7 -S -M pc-1.2 -enable-kvm -m 8192 -smp 2,sockets=2,cores=1,threads=1 -uuid aba8fb50-d883-d9ce-3706-51574b8dad78 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mnich-win7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=cd,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-ccid,id=ccid0 -drive file=/raid1/vms/mnich-win7.img,if=none,id=drive-virtio-disk0,format=qed,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/raid1/iso/virtio-win-0.1-59.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b7:97:fb,bus=pci.0,addr=0x3 -device ccid-card-emulated,backend=nss-emulated,id=smartcard0,bus=ccid0.0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 I tried killing the process with SIGTERM, SIGILL, SIGSEGV, and SIGKILL. It doesn't notice at all. I think rebooting is all I can do now. Additional info: reporter: libreport-2.1.4 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81141851>] put_page+0x11/0x60 PGD 0 Oops: 0000 [#1] SMP Modules linked in: lp parport tcp_lp ebtable_nat xt_CHECKSUM bridge stp llc fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack be2iscsi iscsi_boot_sysfs bnep bluetooth bnx2i cnic uio ebtable_filter cxgb4i cxgb4 ebtables rfkill cxgb3i ip6table_filter ip6_tables cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_realtek raid1 snd_hda_intel snd_hda_codec iTCO_wdt dcdbas snd_hwdep snd_seq snd_seq_device iTCO_vendor_support snd_pcm vhost_net snd_page_alloc tun macvtap macvlan acpi_cpufreq mperf coretemp snd_timer snd i2c_i801 soundcore e1000e ptp pps_core mei lpc_ich mfd_core usblp kvm_intel kvm microcode tpm_tis joydev serio_raw tpm tpm_bios binfmt_misc uinput dm_crypt radeon crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper ttm ghash_clmulni_intel drm i2c_core video CPU 1 Pid: 3118, comm: vhost-3117 Not tainted 3.9.5-201.fc18.x86_64 #1 Dell Inc. OptiPlex 9010 RIP: 0010:[<ffffffff81141851>] [<ffffffff81141851>] put_page+0x11/0x60 RSP: 0018:ffff8804ba6e7bf8 EFLAGS: 00010292 RAX: ffff8807bd9f1cc0 RBX: 0000000000000000 RCX: ffffea000c6b801c RDX: 0000000000000140 RSI: 0000000000000246 RDI: 0000000000000000 RBP: ffff8804ba6e7c08 R08: ffff8807a3774518 R09: 0000000000000010 R10: 0000000000000000 R11: 00007f0c40000000 R12: ffff880757f5a700 R13: ffffffffa01b3b2c R14: ffff880757f5a700 R15: ffff8807c00e8800 FS: 0000000000000000(0000) GS:ffff88081dc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000696f6e000 CR4: 00000000001427f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process vhost-3117 (pid: 3118, threadinfo ffff8804ba6e6000, task ffff880760768000) Stack: ffffea000c81a540 0000000000000012 ffff8804ba6e7c28 ffffffff81545e3f ffff880757f5a700 ffff880757f5a700 ffff8804ba6e7c48 ffffffff81545ede ffffea000cacbe80 ffff8807a37743d8 ffff8804ba6e7c78 ffffffff81545f96 Call Trace: [<ffffffff81545e3f>] skb_release_data+0x8f/0x110 [<ffffffff81545ede>] __kfree_skb+0x1e/0xa0 [<ffffffff81545f96>] kfree_skb+0x36/0xa0 [<ffffffffa01b3b2c>] tun_get_user+0x69c/0x7a0 [tun] [<ffffffffa01b3c87>] tun_sendmsg+0x57/0x80 [tun] [<ffffffffa02fad87>] handle_tx+0x287/0x680 [vhost_net] [<ffffffffa02fb1b5>] handle_tx_kick+0x15/0x20 [vhost_net] [<ffffffffa02f795d>] vhost_worker+0xed/0x190 [vhost_net] [<ffffffffa02f7870>] ? vhost_work_flush+0x110/0x110 [vhost_net] [<ffffffff81082ae0>] kthread+0xc0/0xd0 [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_flush+0x20/0xb0 [<ffffffff81082a20>] ? kthread_create_on_node+0x120/0x120 [<ffffffff8166a22c>] ret_from_fork+0x7c/0xb0 [<ffffffff81082a20>] ? kthread_create_on_node+0x120/0x120 Code: 45 fc 65 48 01 04 25 70 02 01 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 <48> f7 07 00 c0 00 00 75 34 8b 47 1c 85 c0 74 1a f0 ff 4b 1c 0f RIP [<ffffffff81141851>] put_page+0x11/0x60 RSP <ffff8804ba6e7bf8> CR2: 0000000000000000
Created attachment 762043 [details] File: dmesg
*** Bug 975187 has been marked as a duplicate of this bug. ***
*** Bug 975306 has been marked as a duplicate of this bug. ***
*** Bug 975615 has been marked as a duplicate of this bug. ***
Based on reports this appears to be a regression from 3.9.5-200 to 3.9.5-201, I'm suspicious of Fedora commit commit ce5fae0641cc7b51f5dd43ce2c4b00572d6e5bd0 Author: Josh Boyer <jwboyer> Date: Tue Jun 11 09:26:41 2013 -0400 Add two patches to fix issues with vhost_net and macvlan (rhbz 954181) which is a backport of two patch from Jason Wang. Adding him to CC. I've already added Michael Tsirkin.
Please test this scratch build when it completes: http://koji.fedoraproject.org/koji/taskinfo?taskID=5520279 It contains the currently queued networking patches for 3.9.y stable, which includes a number of patches Jason asked to be included.
I've got this installed. I can pretty much make this happen at will by mounting an iscsi disk exported from the f19 host on a rhel6 VM and then doing something like a find on the mount. I should be able to report back success/failure later today.
No joy. [ 9241.659487] Modules linked in: vhost_net macvtap macvlan hidp fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle tun bridge stp llc rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_conntrack nf_conntrack ip6_tables xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm arc4 iwldvm mac80211 mperf coretemp snd_page_alloc btusb kvm_intel bluetooth snd_timer kvm iTCO_wdt iTCO_vendor_support iwlwifi uvcvideo lpc_ich cfg80211 microcode i2c_i801 thinkpad_acpi videobuf2_vmalloc sdhci_pci mei rfkill videobuf2_memops sdhci videobuf2_core videodev mmc_core e1000e ptp snd soundcore pps_core wmi media mfd_core uinput dm_crypt crc32_pclmul [ 9241.659917] crc32c_intel i915 ghash_clmulni_intel i2c_algo_bit drm_kms_helper firewire_ohci drm firewire_core usb_storage crc_itu_t i2c_core video [ 9241.659979] CPU 1 [ 9241.659991] Pid: 4293, comm: vhost-4292 Not tainted 3.9.6-301.4.fc19.x86_64 #1 LENOVO 4243BQ3/4243BQ3 [ 9241.660028] RIP: 0010:[<ffffffff8113caaa>] [<ffffffff8113caaa>] put_page+0xa/0x60 [ 9241.660063] RSP: 0018:ffff8801187adc78 EFLAGS: 00010202 [ 9241.660084] RAX: ffff8801a2666cc0 RBX: 0000000000000012 RCX: ffff8801a2666cc0 [ 9241.660113] RDX: 0000000000000140 RSI: ffff8801a2666cc0 RDI: 0980000100fc0056 [ 9241.660140] RBP: ffff8801187adc80 R08: 0000000000000001 R09: 0000000000000010 [ 9241.660167] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880058711b00 [ 9241.660210] R13: ffffffffa070efd4 R14: 000000000000000c R15: ffff880058711b00 [ 9241.660238] FS: 0000000000000000(0000) GS:ffff88021e240000(0000) knlGS:0000000000000000 [ 9241.660269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9241.660291] CR2: 00007f2556f21000 CR3: 0000000153e00000 CR4: 00000000000427e0 [ 9241.660336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 9241.660364] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 9241.660391] Process vhost-4292 (pid: 4293, threadinfo ffff8801187ac000, task ffff88015d301770) [ 9241.660423] Stack: [ 9241.660433] 0000000000000012 ffff8801187adca0 ffffffff8152cb87 ffff880058711b00 [ 9241.660467] ffff880058711b00 ffff8801187adcb8 ffffffff8152cc1a ffff8801187c4800 [ 9241.660501] ffff8801187adce0 ffffffff8152cce2 ffff8801187c4800 000000000000c162 [ 9241.660535] Call Trace: [ 9241.660548] [<ffffffff8152cb87>] skb_release_data+0x87/0x100 [ 9241.660572] [<ffffffff8152cc1a>] __kfree_skb+0x1a/0xb0 [ 9241.660593] [<ffffffff8152cce2>] kfree_skb+0x32/0x90 [ 9241.660616] [<ffffffffa070efd4>] tun_get_user+0x724/0x810 [tun] [ 9241.660641] [<ffffffffa070f117>] tun_sendmsg+0x57/0x80 [tun] [ 9241.660666] [<ffffffffa0788a58>] handle_tx+0x1c8/0x640 [vhost_net] [ 9241.660691] [<ffffffffa0788f05>] handle_tx_kick+0x15/0x20 [vhost_net] [ 9241.660718] [<ffffffffa078581d>] vhost_worker+0xed/0x190 [vhost_net] [ 9241.660745] [<ffffffffa0785730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net] [ 9241.660775] [<ffffffff81080200>] kthread+0xc0/0xd0 [ 9241.660796] [<ffffffff81080140>] ? insert_kthread_work+0x40/0x40 [ 9241.660822] [<ffffffff8164ed2c>] ret_from_fork+0x7c/0xb0 [ 9241.660845] [<ffffffff81080140>] ? insert_kthread_work+0x40/0x40 [ 9241.660868] Code: 00 00 01 75 f3 e9 b6 fe ff ff be 47 01 00 00 48 c7 c7 6f 51 9d 81 e8 46 02 f2 ff e9 a0 fe ff ff 90 66 66 66 66 90 55 48 89 e5 53 <48> f7 07 00 c0 00 00 48 89 fb 75 30 8b 47 1c 85 c0 74 16 f0 ff [ 9241.661044] RIP [<ffffffff8113caaa>] put_page+0xa/0x60 [ 9241.661067] RSP <ffff8801187adc78> [ 9241.671158] ---[ end trace bb59724f985e6711 ]---
*** Bug 975765 has been marked as a duplicate of this bug. ***
I could not reproduce the issue, I try both win7 and rhel6 guest. Was the issue reproduced 100%? And I need the following information: - What's the network configuration (firewall, bonding or else) in host? - I notice that NAT were used, does the issue disappear when you use bridge? - What does ifconfig $tap looks like after you meet this issue? Thanks
virbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255 ether 52:54:00:1e:b9:2a txqueuelen 0 (Ethernet) RX packets 21587 bytes 1721347 (1.6 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 23259 bytes 25956588 (24.7 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 vnet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet6 fe80::fc54:ff:feea:2884 prefixlen 64 scopeid 0x20<link> ether fe:54:00:ea:28:84 txqueuelen 500 (Ethernet) RX packets 8191 bytes 746109 (728.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 14252 bytes 6931280 (6.6 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 vnet1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet6 fe80::fc54:ff:fec7:b1b prefixlen 64 scopeid 0x20<link> ether fe:54:00:c7:0b:1b txqueuelen 500 (Ethernet) RX packets 13396 bytes 1277456 (1.2 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 22550 bytes 19703863 (18.7 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 wlp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.1.110 netmask 255.255.255.0 broadcast 192.168.1.255 inet6 fe80::2677:3ff:fe7a:6da0 prefixlen 64 scopeid 0x20<link> ether 24:77:03:7a:6d:a0 txqueuelen 1000 (Ethernet) RX packets 1072277 bytes 1206021906 (1.1 GiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 959151 bytes 141807014 (135.2 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 tun0 as well but I will not post those details and I don't think it relevant. $ sudo iptables -L -n -t nat Chain PREROUTING (policy ACCEPT) target prot opt source destination Chain INPUT (policy ACCEPT) target prot opt source destination Chain OUTPUT (policy ACCEPT) target prot opt source destination Chain POSTROUTING (policy ACCEPT) target prot opt source destination MASQUERADE tcp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535 MASQUERADE udp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-65535 MASQUERADE all -- 192.168.122.0/24 !192.168.122.0/24 $ iptables -L -n iptables v1.4.18: can't initialize iptables table `filter': Permission denied (you must be root) Perhaps iptables or your kernel needs to be upgraded. [brad@dhcp-1-109 14-05-2013]$ sudo iptables -L -n [sudo] password for brad: Chain INPUT (policy ACCEPT) target prot opt source destination ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:53 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:53 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:67 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:67 Chain FORWARD (policy ACCEPT) target prot opt source destination ACCEPT all -- 0.0.0.0/0 192.168.122.0/24 ctstate RELATED,ESTABLISHED ACCEPT all -- 192.168.122.0/24 0.0.0.0/0 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-port-unreachable $ sudo tgtadm --lld iscsi --op show --mode target Target 1: dhcp-1-109.bne.xxx.com:server.target1 System information: Driver: iscsi State: ready I_T nexus information: I_T nexus: 1 Initiator: iqn.xxxx-05.com.xxx:c42dff198233 alias: rhel6 Connection: 0 IP Address: 192.168.122.118 I_T nexus: 2 Initiator: iqn.xxxx-05.com.xxx:196d64165ef alias: localhost.localdomain Connection: 0 IP Address: 192.168.122.65 LUN information: LUN: 0 Type: controller SCSI ID: IET 00010000 SCSI SN: beaf10 Size: 0 MB, Block size: 1 Online: Yes Removable media: No Prevent removal: No Readonly: No Thin-provisioning: No Backing store type: null Backing store path: None Backing store flags: LUN: 1 Type: disk SCSI ID: IET 00010001 SCSI SN: beaf11 Size: 26844 MB, Block size: 512 Online: Yes Removable media: No Prevent removal: No Readonly: No Thin-provisioning: No Backing store type: rdwr Backing store path: /dev/mapper/vg-scratch Backing store flags: Account information: ACL information: ALL I'll gather the ifconfig next time I see the issue and will consider testing with bridging when I have available time. I'm also working on gathering a vmcore but kdump doesn't like the fact my /home partition is encrypted apparently.
Nothing special, thanks for the information. Please try the following build when it was finished (which contains some debug logs before crash which can help in analysing the root cause) and collect the console output. Thanks http://koji.fedoraproject.org/koji/taskinfo?taskID=5523650
unable to reproduce, but went over related code again, and found a bug. It could be this: http://patchwork.ozlabs.org/patch/252885/
This scratch build contains the same changes the previous one did, plus the new patch Michael pointed to in comment #13 http://koji.fedoraproject.org/koji/taskinfo?taskID=5523794
*** Bug 976363 has been marked as a duplicate of this bug. ***
(In reply to jason wang from comment #10) > I could not reproduce the issue, I try both win7 and rhel6 guest. Was the > issue reproduced 100%? And I need the following information: > > - What's the network configuration (firewall, bonding or else) in host? > - I notice that NAT were used, does the issue disappear when you use bridge? > - What does ifconfig $tap looks like after you meet this issue? > > Thanks In my case, no, it is not a recurring issue. After I rebooted, I have not experienced the problem again (so far). I have left my Win7 VM running (along with several other VMs: Solaris 10, OpenBSD 5.3 (x6) & RHEL{5,6}-{32,64}) as I had done before the issue occurred. Given that the issue happened around 17:01 on a Sunday, I'm wondering if there's a scheduled job either on the F18 system or the Win7 VM (more likely) that triggered the problem. Also, I found it interesting that the virtual network worked with all of the other VMs (BSD/Solaris/RHEL), even when it failed on the Windows VM. I was able to SSH to those other VMs via the virtual network and shut down those VMs/OSes cleanly.
*** Bug 976038 has been marked as a duplicate of this bug. ***
With 3.9.5-201.fc20 [ 408.330642] skb ffff880146adf200frag 0 addr ffffea000436fb00 compound 32768 [ 408.330647] frag 1 addr ffffea000436f740 compound 32768 [ 408.330649] frag 2 addr ffffea000480c5c0 compound 32768 [ 408.330650] frag 3 addr ffffea00043697c0 compound 32768 [ 408.330651] frag 4 addr ffffea000436f840 compound 32768 [ 408.330652] frag 5 addr ffffea000480c5c0 compound 32768 [ 408.330653] frag 6 addr ffffea0004224b40 compound 32768 [ 408.330654] frag 7 addr ffffea0004224b80 compound 32768 [ 408.330655] frag 8 addr ffffea000480c5c0 compound 32768 [ 408.330656] frag 9 addr ffffea0004223dc0 compound 32768 [ 408.330658] frag 10 addr ffffea0004223d80 compound 32768 [ 408.330659] frag 11 addr ffffea000480c5c0 compound 32768 [ 408.330660] frag 12 addr ffffea00044aa800 compound 32768 [ 408.330661] frag 13 addr ffffea00044aa7c0 compound 32768 [ 408.330662] frag 14 addr ffffea000480c5c0 compound 32768 [ 408.330663] frag 15 addr ffffea00044aedc0 compound 32768 [ 408.330664] frag 16 addr ffffea00044af0c0 compound 32768 [ 408.330684] BUG: unable to handle kernel NULL pointer dereference at (null) [ 408.330725] IP: [<ffffffffa07424a5>] tun_audit_skb+0x65/0xa0 [tun] [ 408.330757] PGD 0 [ 408.330769] Oops: 0000 [#1] SMP [ 408.330789] Modules linked in: vhost_net macvtap macvlan hidp ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM fuse iptable_mangle tun bridge stp llc rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_conntrack nf_conntrack ip6_tables xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_conexant acpi_cpufreq mperf snd_usb_audio coretemp kvm_intel snd_usbmidi_lib snd_rawmidi snd_hda_intel snd_hda_codec snd_hwdep snd_seq arc4 iwldvm iTCO_wdt mac80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core kvm snd_seq_device iTCO_vendor_support btusb videodev bluetooth e1000e snd_pcm iwlwifi media microcode snd_page_alloc snd_timer sdhci_pci sdhci lpc_ich i2c_i801 ptp pps_core wmi cfg80211 mmc_core mei thinkpad_acpi [ 408.331197] snd mfd_core rfkill soundcore uinput dm_crypt i915 crc32_pclmul crc32c_intel ghash_clmulni_intel i2c_algo_bit drm_kms_helper firewire_ohci firewire_core drm crc_itu_t usb_storage i2c_core video [ 408.331298] CPU 1 [ 408.331311] Pid: 3863, comm: vhost-3857 Not tainted 3.9.5-201.fc20.x86_64 #1 LENOVO 4243BQ3/4243BQ3 [ 408.331351] RIP: 0010:[<ffffffffa07424a5>] [<ffffffffa07424a5>] tun_audit_skb+0x65/0xa0 [tun] [ 408.331393] RSP: 0018:ffff880146a47cd8 EFLAGS: 00010246 [ 408.331418] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000000006 [ 408.331451] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffffffffa07460b0 [ 408.331483] RBP: ffff880146a47ce8 R08: 3320646e756f706d R09: 0000000000000413 [ 408.331515] R10: 3434303030616566 R11: 6f63203063306661 R12: ffff880146adf200 [ 408.331547] R13: 000000000000000c R14: ffff880146adf200 R15: ffff880146880800 [ 408.331579] FS: 0000000000000000(0000) GS:ffff88021e240000(0000) knlGS:0000000000000000 [ 408.331615] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.331642] CR2: 0000000000000000 CR3: 000000020fb3b000 CR4: 00000000000427e0 [ 408.331673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 408.331705] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 408.331738] Process vhost-3857 (pid: 3863, threadinfo ffff880146a46000, task ffff880151254650) [ 408.331776] Stack: [ 408.331786] 000000000000c162 000000000000c16e ffff880146a47d70 ffffffffa074403c [ 408.331825] ffff880146adf200 01ffffffa07bd171 0000000000001042 ffff880146a5ddc0 [ 408.331863] 0000000000000014 ffff880199a71000 ffff8801ffffffa6 00ff880100000001 [ 408.331901] Call Trace: [ 408.331918] [<ffffffffa074403c>] tun_get_user+0x6ec/0x7e0 [tun] [ 408.331947] [<ffffffffa0744187>] tun_sendmsg+0x57/0x80 [tun] [ 408.331975] [<ffffffffa07bea58>] handle_tx+0x1c8/0x640 [vhost_net] [ 408.332006] [<ffffffffa07bef05>] handle_tx_kick+0x15/0x20 [vhost_net] [ 408.332036] [<ffffffffa07bb81d>] vhost_worker+0xed/0x190 [vhost_net] [ 408.332067] [<ffffffffa07bb730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net] [ 408.332102] [<ffffffff8107fcc0>] kthread+0xc0/0xd0 [ 408.332126] [<ffffffff8107fc00>] ? insert_kthread_work+0x40/0x40 [ 408.332156] [<ffffffff8164f22c>] ret_from_fork+0x7c/0xb0 [ 408.332183] [<ffffffff8107fc00>] ? insert_kthread_work+0x40/0x40 [ 408.332210] Code: 00 00 00 80 38 00 74 47 66 0f 1f 44 00 00 48 63 d3 89 de 48 c7 c7 b0 60 74 a0 48 83 c2 03 83 c3 01 48 c1 e2 04 48 8b 14 10 31 c0 <8b> 0a 81 e1 00 c0 00 00 e8 96 a6 ef e0 41 8b 84 24 d8 00 00 00 [ 408.332376] RIP [<ffffffffa07424a5>] tun_audit_skb+0x65/0xa0 [tun] [ 408.332407] RSP <ffff880146a47cd8> [ 408.332424] CR2: 0000000000000000 [ 408.342024] ---[ end trace 4be7caf4081c8d7f ]---
With 3.9.6-301.6.fc19 [ 336.562144] Modules linked in: vhost_net macvtap macvlan hidp ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle fuse tun bridge stp llc rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_conntrack nf_conntrack xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_hwdep arc4 iwldvm snd_seq mac80211 mperf snd_seq_device coretemp kvm_intel iwlwifi snd_pcm uvcvideo kvm iTCO_wdt iTCO_vendor_support videobuf2_vmalloc btusb cfg80211 mei videobuf2_memops videobuf2_core videodev snd_page_alloc snd_timer microcode sdhci_pci lpc_ich media sdhci mmc_core thinkpad_acpi bluetooth i2c_i801 e1000e mfd_core snd ptp wmi pps_core [ 336.562477] rfkill soundcore uinput dm_crypt crc32_pclmul crc32c_intel ghash_clmulni_intel i915 firewire_ohci firewire_core crc_itu_t i2c_algo_bit usb_storage drm_kms_helper drm i2c_core video [ 336.562542] CPU 0 [ 336.562551] Pid: 3892, comm: vhost-3890 Not tainted 3.9.6-301.6.fc19.x86_64 #1 LENOVO 4243BQ3/4243BQ3 [ 336.562582] RIP: 0010:[<ffffffff8113caaa>] [<ffffffff8113caaa>] put_page+0xa/0x60 [ 336.562610] RSP: 0018:ffff8801b0711c78 EFLAGS: 00010202 [ 336.562626] RAX: ffff8801a537cec0 RBX: 0000000000000012 RCX: 0000000000000003 [ 336.562647] RDX: 0000000000000140 RSI: 0000000000000246 RDI: 0000000000000002 [ 336.562667] RBP: ffff8801b0711c80 R08: 0000000000000001 R09: 0000000000000010 [ 336.562688] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011d935900 [ 336.562708] R13: ffffffffa074afd4 R14: 000000000000000c R15: ffff88011d935900 [ 336.562729] FS: 0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000 [ 336.562752] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 336.562768] CR2: 0000000000000002 CR3: 0000000150821000 CR4: 00000000000427e0 [ 336.562789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 336.562809] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 336.562829] Process vhost-3890 (pid: 3892, threadinfo ffff8801b0710000, task ffff88015083ddc0) [ 336.562853] Stack: [ 336.562861] 0000000000000012 ffff8801b0711ca0 ffffffff8152cb87 ffff88011d935900 [ 336.562886] ffff88011d935900 ffff8801b0711cb8 ffffffff8152cc1a ffff88015089c800 [ 336.562912] ffff8801b0711ce0 ffffffff8152cce2 ffff88015089c800 000000000000c162 [ 336.562937] Call Trace: [ 336.562948] [<ffffffff8152cb87>] skb_release_data+0x87/0x100 [ 336.562965] [<ffffffff8152cc1a>] __kfree_skb+0x1a/0xb0 [ 336.562981] [<ffffffff8152cce2>] kfree_skb+0x32/0x90 [ 336.562999] [<ffffffffa074afd4>] tun_get_user+0x724/0x810 [tun] [ 336.563018] [<ffffffffa074b117>] tun_sendmsg+0x57/0x80 [tun] [ 336.563036] [<ffffffffa07bda78>] handle_tx+0x1c8/0x640 [vhost_net] [ 336.563056] [<ffffffffa07bdf25>] handle_tx_kick+0x15/0x20 [vhost_net] [ 336.563076] [<ffffffffa07ba81d>] vhost_worker+0xed/0x190 [vhost_net] [ 336.563096] [<ffffffffa07ba730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net] [ 336.563119] [<ffffffff81080200>] kthread+0xc0/0xd0 [ 336.563134] [<ffffffff81080140>] ? insert_kthread_work+0x40/0x40 [ 336.563153] [<ffffffff8164ed2c>] ret_from_fork+0x7c/0xb0 [ 336.563170] [<ffffffff81080140>] ? insert_kthread_work+0x40/0x40 [ 336.563188] Code: 00 00 01 75 f3 e9 b6 fe ff ff be 47 01 00 00 48 c7 c7 6f 51 9d 81 e8 46 02 f2 ff e9 a0 fe ff ff 90 66 66 66 66 90 55 48 89 e5 53 <48> f7 07 00 c0 00 00 48 89 fb 75 30 8b 47 1c 85 c0 74 16 f0 ff [ 336.563322] RIP [<ffffffff8113caaa>] put_page+0xa/0x60 [ 336.563340] RSP <ffff8801b0711c78> [ 336.563350] CR2: 0000000000000002 [ 336.571304] ---[ end trace 81ec45b5866408e5 ]--- I notice these in the dmesg output and they seem to "roughly" (I'm afraid I don't have a hell of a lot of time for this at the moment, even on the weekends!) coincide with the period this has been happening. [ 141.324100] virbr0: topology change detected, propagating [ 141.324107] virbr0: port 2(vnet0) entered forwarding state [ 141.324122] virbr0: port 2(vnet0) entered forwarding state [ 141.554272] qemu-system-x86: sending ioctl 5326 to a partition! [ 141.554286] qemu-system-x86: sending ioctl 80200204 to a partition! [ 143.860809] device vnet1 entered promiscuous mode [ 143.865931] virbr0: topology change detected, propagating [ 143.865937] virbr0: port 3(vnet1) entered forwarding state [ 143.865946] virbr0: port 3(vnet1) entered forwarding state [ 143.991838] qemu-system-x86: sending ioctl 5326 to a partition! [ 143.991850] qemu-system-x86: sending ioctl 80200204 to a partition!
Very interesting. Can this be triggered on debug build? We clearly hit some errors on data path, on debug build there will be some info printks that tell us what's causing this.
OK, I'll try to install the latest debug kernel tomorrow.
So, it turns out I can reproduce the problem by kicking off a Windows backup in the guest Windows 7 VM. The symptoms are the same as before. The backup destination is a locally hosted SMB (samba) share on the virtual network. I have installed the debug kernel, but I'm not sure exactly what else I can/should gather. Give me some instructions/commands to run and I'll post the info. --James # uname -a Linux mnich.quotient-inc.com 3.9.6-200.fc18.x86_64.debug #1 SMP Thu Jun 13 18:47:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux [ 2990.175623] general protection fault: 0000 [#1] SMP [ 2990.176239] Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i bnep cxgb3 ebtable_filter ebtables bluetooth mdio libcxgbi ip6table_filter ib_iser ip6_tables rfkill rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel vhost_net tun macvtap snd_hda_codec snd_hwdep acpi_cpufreq iTCO_wdt snd_seq snd_seq_device macvlan snd_pcm raid1 mperf coretemp e1000e tpm_tis iTCO_vendor_support joydev dcdbas tpm i2c_i801 [ 2990.178474] tpm_bios snd_page_alloc snd_timer mei ptp pps_core lpc_ich mfd_core usblp microcode serio_raw snd kvm_intel soundcore kvm uinput binfmt_misc dm_crypt crc32_pclmul crc32c_intel ghash_clmulni_intel radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core video [ 2990.180050] CPU 3 [ 2990.180058] Pid: 6011, comm: vhost-6009 Not tainted 3.9.6-200.fc18.x86_64.debug #1 Dell Inc. OptiPlex 9010/0M9KCM [ 2990.181628] RIP: 0010:[<ffffffff81175d71>] [<ffffffff81175d71>] put_page+0x11/0x60 [ 2990.182433] RSP: 0018:ffff8807b1e69b58 EFLAGS: 00010292 [ 2990.183235] RAX: ffff8807afb37bd0 RBX: cccccccccccccccc RCX: 0000000000000006 [ 2990.184043] RDX: 0000000000000140 RSI: ffff88079def8700 RDI: cccccccccccccccc [ 2990.184855] RBP: ffff8807b1e69b68 R08: 0000000000000000 R09: 0000000000000000 [ 2990.185674] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880784a94b40 [ 2990.186496] R13: ffffffffa035cb54 R14: 000000000000f57e R15: ffff880784a94b40 [ 2990.187320] FS: 0000000000000000(0000) GS:ffff8807fd400000(0000) knlGS:0000000000000000 [ 2990.188154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2990.188993] CR2: 000000000a450000 CR3: 000000079fe2d000 CR4: 00000000001427e0 [ 2990.189838] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2990.190685] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2990.191539] Process vhost-6009 (pid: 6011, threadinfo ffff8807b1e68000, task ffff88079def8000) [ 2990.192446] Stack: [ 2990.193303] ffffea0015929040 0000000000000012 ffff8807b1e69b88 ffffffff815dcf8f [ 2990.194169] ffff880784a94b40 ffff880784a94b40 ffff8807b1e69ba8 ffffffff815dd35e [ 2990.195050] ffffea0015903240 ffff8807a2bccb00 ffff8807b1e69bd8 ffffffff815dc9da [ 2990.195929] Call Trace: [ 2990.196799] [<ffffffff815dcf8f>] skb_release_data+0x8f/0x110 [ 2990.197690] [<ffffffff815dd35e>] __kfree_skb+0x1e/0xa0 [ 2990.198579] [<ffffffff815dc9da>] kfree_skb+0x3a/0x170 [ 2990.199463] [<ffffffffa035cb54>] tun_get_user+0x724/0x860 [tun] [ 2990.200343] [<ffffffffa035cced>] tun_sendmsg+0x5d/0x90 [tun] [ 2990.201223] [<ffffffffa0364cb4>] handle_tx+0x294/0x6f0 [vhost_net] [ 2990.202107] [<ffffffffa0365145>] handle_tx_kick+0x15/0x20 [vhost_net] [ 2990.203121] [<ffffffffa0361ad8>] vhost_worker+0xf8/0x1a0 [vhost_net] [ 2990.204091] [<ffffffffa03619e0>] ? translate_desc.isra.8+0x270/0x270 [vhost_net] [ 2990.205032] [<ffffffff8109626d>] kthread+0xed/0x100 [ 2990.205972] [<ffffffff81096180>] ? kthread_create_on_node+0x160/0x160 [ 2990.206916] [<ffffffff8173f02c>] ret_from_fork+0x7c/0xb0 [ 2990.207860] [<ffffffff81096180>] ? kthread_create_on_node+0x160/0x160 [ 2990.208805] Code: 45 fc 65 48 01 04 25 f0 06 1d 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 <48> f7 07 00 c0 00 00 75 34 8b 47 1c 85 c0 74 1a f0 ff 4b 1c 0f [ 2990.209925] RIP [<ffffffff81175d71>] put_page+0x11/0x60 [ 2990.210962] RSP <ffff8807b1e69b58> [ 2990.218669] ---[ end trace da97fde49d42ce05 ]--- [ 3159.494278] virbr0: port 2(vnet0) entered disabled state [ 3159.496652] device vnet0 left promiscuous mode [ 3159.496671] virbr0: port 2(vnet0) entered disabled state [ 3195.260761] virbr0: port 3(vnet1) entered disabled state [ 3195.261617] device vnet1 left promiscuous mode [ 3195.261636] virbr0: port 3(vnet1) entered disabled state Jun 24 16:17:45 mnich libvirtd[1477]: 2013-06-24 20:17:45.261+0000: 1477: error : qemuMonitorIO:607 : internal error End of file from monitor Jun 24 16:35:48 mnich libvirtd[1477]: 2013-06-24 20:35:48.053+0000: 1524: warning : qemuProcessKill:3903 : Timed out waiting after SIGTERM to process 6009, sending SIGKILL Jun 24 16:35:53 mnich libvirtd[1477]: 2013-06-24 20:35:53.058+0000: 1524: warning : qemuProcessKill:3935 : Timed out waiting after SIGKILL to process 6009 Jun 24 16:35:53 mnich libvirtd[1477]: 2013-06-24 20:35:53.058+0000: 1524: error : qemuDomainDestroyFlags:1967 : operation failed: failed to kill qemu process with SIGTERM Jun 24 16:36:04 mnich libvirtd[1477]: 2013-06-24 20:36:04.424+0000: 1477: error : virNetSocketReadWire:1185 : End of file while reading data: Input/output error
Debug kernel hasn't really given us any additional information AFAICT although, the fact the debug kernel is not detecting any memory hijinks may tell us something in itself. I'm including my output although with James' comment mine is redundant. We really need a vmcore I think but, with my encrypted /home partition this is problematic for me and probably involved me mounting a new, temporary /home whilst I generate the core. I'll try and get this done anyway. Is it the influence of the debug kernel that changes it from a null pointer dereference to a GPF? [ 704.771850] Modules linked in: vhost_net macvtap macvlan hidp ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle fuse tun bridge stp llc rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_conntrack nf_conntrack ip6_tables xfs libcrc32c snd_usb_audio snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_usbmidi_lib snd_hda_codec snd_rawmidi mperf snd_hwdep coretemp arc4 iwldvm snd_seq mac80211 uvcvideo kvm_intel kvm snd_seq_device snd_pcm videobuf2_vmalloc videobuf2_memops videobuf2_core snd_page_alloc thinkpad_acpi iTCO_wdt iTCO_vendor_support lpc_ich videodev snd_timer mei iwlwifi media cfg80211 snd i2c_i801 btusb microcode mfd_core e1000e sdhci_pci sdhci mmc_core bluetooth ptp soundcore pps_core [ 704.772158] rfkill wmi uinput dm_crypt crc32_pclmul crc32c_intel ghash_clmulni_intel i915 i2c_algo_bit drm_kms_helper firewire_ohci firewire_core usb_storage crc_itu_t drm i2c_core video [ 704.772229] CPU 1 [ 704.772245] Pid: 4307, comm: vhost-4306 Not tainted 3.9.6-301.fc19.x86_64.debug #1 LENOVO 4243BQ3/4243BQ3 [ 704.772276] RIP: 0010:[<ffffffff81171bea>] [<ffffffff81171bea>] put_page+0xa/0x60 [ 704.772304] RSP: 0018:ffff8801143bfbd8 EFLAGS: 00010202 [ 704.772321] RAX: ffff8801d1718de0 RBX: 0000000000000012 RCX: 0000000000000006 [ 704.772341] RDX: 0000000000000140 RSI: ffff88010c152bb0 RDI: cccccccccccccccc [ 704.772362] RBP: ffff8801143bfbe0 R08: 0000000000000001 R09: 0000000000000000 [ 704.772383] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8802113d9d40 [ 704.772403] R13: ffffffffa07e9a84 R14: 000000000000000c R15: ffff8802113d9d40 [ 704.772428] FS: 0000000000000000(0000) GS:ffff880214e00000(0000) knlGS:0000000000000000 [ 704.772452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 704.772469] CR2: 00007fabf03a6000 CR3: 000000010db57000 CR4: 00000000000427e0 [ 704.772490] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 704.772511] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 704.772532] Process vhost-4306 (pid: 4307, threadinfo ffff8801143be000, task ffff88010c1524b0) [ 704.772556] Stack: [ 704.772563] 0000000000000012 ffff8801143bfc00 ffffffff815c3067 ffff8802113d9d40 [ 704.772590] ffff8802113d9d40 ffff8801143bfc18 ffffffff815c30fa ffff8801143b4b00 [ 704.772619] ffff8801143bfc40 ffffffff815c31c6 ffff8801143b4b00 000000000000c162 [ 704.772645] Call Trace: [ 704.772657] [<ffffffff815c3067>] skb_release_data+0x87/0x100 [ 704.772675] [<ffffffff815c30fa>] __kfree_skb+0x1a/0xb0 [ 704.772691] [<ffffffff815c31c6>] kfree_skb+0x36/0x160 [ 704.772710] [<ffffffffa07e9a84>] tun_get_user+0x744/0x880 [tun] [ 704.772730] [<ffffffffa07e9c14>] tun_sendmsg+0x54/0x80 [tun] [ 704.772749] [<ffffffffa085ead7>] handle_tx+0x1f7/0x6d0 [vhost_net] [ 704.772769] [<ffffffffa085efe5>] handle_tx_kick+0x15/0x20 [vhost_net] [ 704.772789] [<ffffffffa085a560>] vhost_worker+0xf0/0x190 [vhost_net] [ 704.772810] [<ffffffffa085a470>] ? vhost_zerocopy_done_signal+0x20/0x20 [vhost_net] [ 704.772835] [<ffffffff8109482d>] kthread+0xed/0x100 [ 704.772852] [<ffffffff81094740>] ? insert_kthread_work+0x80/0x80 [ 704.772871] [<ffffffff817241ec>] ret_from_fork+0x7c/0xb0 [ 704.772888] [<ffffffff81094740>] ? insert_kthread_work+0x80/0x80 [ 704.772906] Code: e0 ff ff 01 e9 a8 fe ff ff be 47 01 00 00 48 c7 c7 ac f1 9e 81 e8 f7 73 ef ff e9 76 fe ff ff 66 90 66 66 66 66 90 55 48 89 e5 53 <48> f7 07 00 c0 00 00 48 89 fb 75 30 8b 47 1c 85 c0 74 16 f0 ff [ 704.773047] RIP [<ffffffff81171bea>] put_page+0xa/0x60 [ 704.773065] RSP <ffff8801143bfbd8> [ 704.780604] ---[ end trace 400da5f479419cd4 ]---
Ahhh, but I tell a lie. It has told us something. In both cases %rdi looks to be a poisoned value so this is likely a use after free of a page pointer. I guess that probably also explains the NPD vs. GPF.
there should be a KERN_INFO (level 6) message that should be displayed before the failure. Guessing that is masked now. Can you please enable level 6 messages so we can see if it's there?
these two patches should treat the crash on error observed here. http://patchwork.ozlabs.org/patch/253566/ http://patchwork.ozlabs.org/patch/253563/ but it's still not clear to me what's the source of the error.
(In reply to Michael S. Tsirkin from comment #25) > there should be a KERN_INFO (level 6) message that should > be displayed before the failure. > Guessing that is masked now. > Can you please enable level 6 messages so we can see if it's there? (In reply to Michael S. Tsirkin from comment #26) > > but it's still not clear to me what's the source of the error. # uname -a Linux mnich.quotient-inc.com 3.9.6-200.fc18.x86_64.debug #1 SMP Thu Jun 13 18:47:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux # sysctl -a |grep printk kernel.printk = 4 4 1 7 # sysctl -w kernel.printk=7 kernel.printk = 7 # sysctl -a |grep printk kernel.printk = 7 4 1 7 I did the above in hope that it will show the level 6 messages too... (please correct me if need to do something else) Jun 25 12:28:34 mnich kernel: [ 991.278430] device vnet0 left promiscuous mode Jun 25 12:28:34 mnich kernel: [ 991.278465] virbr0: port 2(vnet0) entered disabled state Jun 25 12:31:20 mnich kernel: [ 1157.192303] virbr0: port 3(vnet1) entered disabled state Jun 25 12:31:20 mnich kernel: [ 1157.193792] device vnet1 left promiscuous mode Jun 25 12:31:20 mnich kernel: [ 1157.193844] virbr0: port 3(vnet1) entered disabled state Jun 25 12:46:25 mnich kernel: [ 2062.829758] general protection fault: 0000 [#1] SMP Jun 25 12:46:25 mnich kernel: [ 2062.830373] Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle be2iscsi iscsi_boot_sysfs nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack bnx2i cnic uio ebtable_filter bnep cxgb4i cxgb4 ebtables bluetooth cxgb3i cxgb3 ip6table_filter rfkill mdio ip6_tables libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep acpi_cpufreq snd_seq mperf snd_seq_device raid1 snd_pcm snd_page_alloc mei snd_timer i2c_i801 iTCO_wdt iTCO_vendor_support coretemp lpc_ich e1000e snd dcdbas ptp pps_core serio_raw mfd_core usblp microcode joydev soundcore tpm_tis tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc dm_crypt crc32_pclmul crc32c_intel ghash_clmulni_intel radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core video Jun 25 12:46:25 mnich kernel: [ 2062.834157] CPU 1 Jun 25 12:46:25 mnich kernel: [ 2062.834165] Pid: 4151, comm: vhost-4150 Not tainted 3.9.6-200.fc18.x86_64.debug #1 Dell Inc. OptiPlex 9010/0M9KCM Jun 25 12:46:25 mnich kernel: [ 2062.835705] RIP: 0010:[<ffffffff81175d71>] [<ffffffff81175d71>] put_page+0x11/0x60 Jun 25 12:46:25 mnich kernel: [ 2062.836481] RSP: 0018:ffff88078594fb58 EFLAGS: 00010292 Jun 25 12:46:25 mnich kernel: [ 2062.837270] RAX: ffff8807aac21470 RBX: cccccccccccccccc RCX: 0000000000000006 Jun 25 12:46:25 mnich kernel: [ 2062.838068] RDX: 0000000000000140 RSI: ffff8807805d8700 RDI: cccccccccccccccc Jun 25 12:46:25 mnich kernel: [ 2062.838863] RBP: ffff88078594fb68 R08: 0000000000000000 R09: 0000000000000000 Jun 25 12:46:25 mnich kernel: [ 2062.839701] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8807aa3ba640 Jun 25 12:46:25 mnich kernel: [ 2062.840502] R13: ffffffffa016db54 R14: 000000000000f57e R15: ffff8807aa3ba640 Jun 25 12:46:25 mnich kernel: [ 2062.841311] FS: 0000000000000000(0000) GS:ffff8807fd000000(0000) knlGS:0000000000000000 Jun 25 12:46:25 mnich kernel: [ 2062.842130] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 25 12:46:25 mnich kernel: [ 2062.842941] CR2: 0000000006e10000 CR3: 000000079e356000 CR4: 00000000001427f0 Jun 25 12:46:25 mnich kernel: [ 2062.843805] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jun 25 12:46:25 mnich kernel: [ 2062.844630] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Jun 25 12:46:25 mnich kernel: [ 2062.845451] Process vhost-4150 (pid: 4151, threadinfo ffff88078594e000, task ffff8807805d8000) Jun 25 12:46:25 mnich kernel: [ 2062.846280] Stack: Jun 25 12:46:25 mnich kernel: [ 2062.847105] ffffea0016a56480 0000000000000012 ffff88078594fb88 ffffffff815dcf8f Jun 25 12:46:25 mnich kernel: [ 2062.847970] ffff8807aa3ba640 ffff8807aa3ba640 ffff88078594fba8 ffffffff815dd35e Jun 25 12:46:25 mnich kernel: [ 2062.848817] ffffea0016a32680 ffff8807a0fe0b00 ffff88078594fbd8 ffffffff815dc9da Jun 25 12:46:25 mnich kernel: [ 2062.849691] Call Trace: Jun 25 12:46:25 mnich kernel: [ 2062.850533] [<ffffffff815dcf8f>] skb_release_data+0x8f/0x110 Jun 25 12:46:25 mnich kernel: [ 2062.851405] [<ffffffff815dd35e>] __kfree_skb+0x1e/0xa0 Jun 25 12:46:25 mnich kernel: [ 2062.852277] [<ffffffff815dc9da>] kfree_skb+0x3a/0x170 Jun 25 12:46:25 mnich kernel: [ 2062.853131] [<ffffffffa016db54>] tun_get_user+0x724/0x860 [tun] Jun 25 12:46:25 mnich kernel: [ 2062.853993] [<ffffffffa016dced>] tun_sendmsg+0x5d/0x90 [tun] Jun 25 12:46:25 mnich kernel: [ 2062.854852] [<ffffffffa020acb4>] handle_tx+0x294/0x6f0 [vhost_net] Jun 25 12:46:25 mnich kernel: [ 2062.855711] [<ffffffffa020b145>] handle_tx_kick+0x15/0x20 [vhost_net] Jun 25 12:46:25 mnich kernel: [ 2062.856571] [<ffffffffa0207ad8>] vhost_worker+0xf8/0x1a0 [vhost_net] Jun 25 12:46:25 mnich kernel: [ 2062.857451] [<ffffffffa02079e0>] ? translate_desc.isra.8+0x270/0x270 [vhost_net] Jun 25 12:46:25 mnich kernel: [ 2062.858316] [<ffffffff8109626d>] kthread+0xed/0x100 Jun 25 12:46:25 mnich kernel: [ 2062.859179] [<ffffffff81096180>] ? kthread_create_on_node+0x160/0x160 Jun 25 12:46:25 mnich kernel: [ 2062.860066] [<ffffffff8173f02c>] ret_from_fork+0x7c/0xb0 Jun 25 12:46:25 mnich kernel: [ 2062.860932] [<ffffffff81096180>] ? kthread_create_on_node+0x160/0x160 Jun 25 12:46:25 mnich kernel: [ 2062.861836] Code: 45 fc 65 48 01 04 25 f0 06 1d 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 <48> f7 07 00 c0 00 00 75 34 8b 47 1c 85 c0 74 1a f0 ff 4b 1c 0f Jun 25 12:46:25 mnich kernel: [ 2062.862858] RIP [<ffffffff81175d71>] put_page+0x11/0x60 Jun 25 12:46:25 mnich kernel: [ 2062.863823] RSP <ffff88078594fb58> Jun 25 12:46:25 mnich kernel: [ 2062.870684] ---[ end trace ca7c6b1a2e74659f ]--- Jun 25 12:47:17 mnich kernel: [ 2114.716969] qemu-kvm (6152) used greatest stack depth: 2192 bytes left Jun 25 12:56:21 mnich kernel: [ 2659.820219] virbr0: port 7(vnet5) entered disabled state Jun 25 12:56:21 mnich kernel: [ 2659.821036] device vnet5 left promiscuous mode Jun 25 12:56:21 mnich kernel: [ 2659.821057] virbr0: port 7(vnet5) entered disabled state Jun 25 12:56:22 mnich kernel: [ 2660.391553] virbr0: port 6(vnet4) entered disabled state Jun 25 12:56:22 mnich kernel: [ 2660.392103] device vnet4 left promiscuous mode Jun 25 12:56:22 mnich kernel: [ 2660.392122] virbr0: port 6(vnet4) entered disabled state Jun 25 12:56:36 mnich kernel: [ 2674.443919] virbr0: port 8(vnet6) entered disabled state Jun 25 12:56:36 mnich kernel: [ 2674.444784] device vnet6 left promiscuous mode Jun 25 12:56:36 mnich kernel: [ 2674.444808] virbr0: port 8(vnet6) entered disabled state Jun 25 12:56:41 mnich kernel: [ 2679.619934] virbr0: port 9(vnet7) entered disabled state Jun 25 12:56:41 mnich kernel: [ 2679.620901] device vnet7 left promiscuous mode Jun 25 12:56:41 mnich kernel: [ 2679.620929] virbr0: port 9(vnet7) entered disabled state Jun 25 12:56:44 mnich kernel: [ 2682.289503] virbr0: port 11(vnet9) entered disabled state Jun 25 12:56:44 mnich kernel: [ 2682.290488] device vnet9 left promiscuous mode Jun 25 12:56:44 mnich kernel: [ 2682.290509] virbr0: port 11(vnet9) entered disabled state Jun 25 12:56:47 mnich kernel: [ 2685.437255] virbr0: port 10(vnet8) entered disabled state Jun 25 12:56:47 mnich kernel: [ 2685.438005] device vnet8 left promiscuous mode Jun 25 12:56:47 mnich kernel: [ 2685.438031] virbr0: port 10(vnet8) entered disabled state After the oops: kernel.printk = 15 4 1 7 I don't know if it would help you or not, but I was able to get the qemu-kvm process to dump core with 'kill -4'.. Abrt wrote it to /var/tmp/abrt/, but it's over 840M even when compressed with XZ. Also, here's the ps, before, during, and after the kill/dump: # ps auxww |grep qemu qemu 4150 16.1 25.2 11236168 8305748 ? Sl 12:23 7:50 /usr/bin/qemu-kvm -name mnich-win7 -S -M pc-1.2 -enable-kvm -m 8192 -smp 2,sockets=2,cores=1,threads=1 -uuid aba8fb50-d883-d9ce-3706-51574b8dad78 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mnich-win7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=cd,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-ccid,id=ccid0 -drive file=/raid1/vms/mnich-win7.img,if=none,id=drive-virtio-disk0,format=qed,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/raid1/iso/virtio-win-0.1-59.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=36 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b7:97:fb,bus=pci.0,addr=0x3 -device ccid-card-emulated,backend=nss-emulated,id=smartcard0,bus=ccid0.0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -spice port=5910,addr=127.0.0.1,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 root 8759 0.0 0.0 109180 916 pts/0 S+ 13:12 0:00 grep --color=auto qemu # kill -4 4150 # ps auxww |grep 4150 qemu 4150 16.2 25.2 11236168 8305776 ? Sl 12:23 7:59 /usr/bin/qemu-kvm -name mnich-win7 -S -M pc-1.2 -enable-kvm -m 8192 -smp 2,sockets=2,cores=1,threads=1 -uuid aba8fb50-d883-d9ce-3706-51574b8dad78 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mnich-win7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=cd,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device usb-ccid,id=ccid0 -drive file=/raid1/vms/mnich-win7.img,if=none,id=drive-virtio-disk0,format=qed,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/raid1/iso/virtio-win-0.1-59.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=36 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:b7:97:fb,bus=pci.0,addr=0x3 -device ccid-card-emulated,backend=nss-emulated,id=smartcard0,bus=ccid0.0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -spice port=5910,addr=127.0.0.1,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 root 4157 0.0 0.0 0 0 ? S 12:23 0:00 [kvm-pit/4150] root 8784 94.9 0.0 41016 1352 ? R 13:12 0:10 /usr/libexec/abrt-hook-ccpp 4 0 4150 107 107 1372180374 e root 8799 0.0 0.0 109180 920 pts/0 S+ 13:13 0:00 grep --color=auto 4150 # ps auxww |grep 4150 qemu 4150 15.9 0.0 0 0 ? D 12:23 8:03 [qemu-kvm] root 8850 0.0 0.0 109184 984 pts/0 S+ 13:14 0:00 grep --color=auto 4150
I think you need to dump log with dmesg.
also can you pls try this: for file in /sys/class/net/tun0/statistics/*; do echo $file; cat $file;done replace tun0 with name of the tun device used for your guest.
#28: The syslogs appear to have exactly the same information as dmesg. #29: (Since I'm not sure which is the tun interface... and I'm assuming it wasn't named '*tun*' and somehow destroyed because of the issue) # pwd /sys/class/net # ls em1 lo virbr0 virbr0-nic vnet10 # for X in v*; do find ./${X}/statistics -type f -print -exec cat '{}' \;; done ./virbr0/statistics/rx_fifo_errors 0 ./virbr0/statistics/collisions 0 ./virbr0/statistics/rx_errors 0 ./virbr0/statistics/rx_compressed 0 ./virbr0/statistics/rx_dropped 0 ./virbr0/statistics/tx_packets 4379 ./virbr0/statistics/tx_errors 0 ./virbr0/statistics/rx_missed_errors 0 ./virbr0/statistics/rx_over_errors 0 ./virbr0/statistics/tx_carrier_errors 0 ./virbr0/statistics/tx_heartbeat_errors 0 ./virbr0/statistics/rx_crc_errors 0 ./virbr0/statistics/multicast 0 ./virbr0/statistics/tx_fifo_errors 0 ./virbr0/statistics/tx_aborted_errors 0 ./virbr0/statistics/rx_bytes 2835998 ./virbr0/statistics/tx_compressed 0 ./virbr0/statistics/tx_dropped 0 ./virbr0/statistics/rx_packets 3697 ./virbr0/statistics/tx_bytes 7798275 ./virbr0/statistics/tx_window_errors 0 ./virbr0/statistics/rx_frame_errors 0 ./virbr0/statistics/rx_length_errors 0 ./virbr0-nic/statistics/rx_fifo_errors 0 ./virbr0-nic/statistics/collisions 0 ./virbr0-nic/statistics/rx_errors 0 ./virbr0-nic/statistics/rx_compressed 0 ./virbr0-nic/statistics/rx_dropped 0 ./virbr0-nic/statistics/tx_packets 0 ./virbr0-nic/statistics/tx_errors 0 ./virbr0-nic/statistics/rx_missed_errors 0 ./virbr0-nic/statistics/rx_over_errors 0 ./virbr0-nic/statistics/tx_carrier_errors 0 ./virbr0-nic/statistics/tx_heartbeat_errors 0 ./virbr0-nic/statistics/rx_crc_errors 0 ./virbr0-nic/statistics/multicast 0 ./virbr0-nic/statistics/tx_fifo_errors 0 ./virbr0-nic/statistics/tx_aborted_errors 0 ./virbr0-nic/statistics/rx_bytes 0 ./virbr0-nic/statistics/tx_compressed 0 ./virbr0-nic/statistics/tx_dropped 0 ./virbr0-nic/statistics/rx_packets 0 ./virbr0-nic/statistics/tx_bytes 0 ./virbr0-nic/statistics/tx_window_errors 0 ./virbr0-nic/statistics/rx_frame_errors 0 ./virbr0-nic/statistics/rx_length_errors 0 ./vnet10/statistics/rx_fifo_errors 0 ./vnet10/statistics/collisions 0 ./vnet10/statistics/rx_errors 0 ./vnet10/statistics/rx_compressed 0 ./vnet10/statistics/rx_dropped 1 ./vnet10/statistics/tx_packets 1071 ./vnet10/statistics/tx_errors 0 ./vnet10/statistics/rx_missed_errors 0 ./vnet10/statistics/rx_over_errors 0 ./vnet10/statistics/tx_carrier_errors 0 ./vnet10/statistics/tx_heartbeat_errors 0 ./vnet10/statistics/rx_crc_errors 0 ./vnet10/statistics/multicast 0 ./vnet10/statistics/tx_fifo_errors 0 ./vnet10/statistics/tx_aborted_errors 0 ./vnet10/statistics/rx_bytes 2275036 ./vnet10/statistics/tx_compressed 0 ./vnet10/statistics/tx_dropped 5665 ./vnet10/statistics/rx_packets 640 ./vnet10/statistics/tx_bytes 112819 ./vnet10/statistics/tx_window_errors 0 ./vnet10/statistics/rx_frame_errors 0 ./vnet10/statistics/rx_length_errors 0 # for X in v* ; do ifconfig ${X}; done virbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.100.1 netmask 255.255.255.0 broadcast 192.168.100.255 ether 52:54:00:ca:49:ef txqueuelen 0 (Ethernet) RX packets 3697 bytes 2835998 (2.7 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 4379 bytes 7798275 (7.4 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 virbr0-nic: flags=4098<BROADCAST,MULTICAST> mtu 1500 ether 52:54:00:ca:49:ef txqueuelen 500 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 vnet10: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet6 fe80::fc54:ff:feb7:97fb prefixlen 64 scopeid 0x20<link> ether fe:54:00:b7:97:fb txqueuelen 500 (Ethernet) RX packets 640 bytes 2275036 (2.1 MiB) RX errors 0 dropped 1 overruns 0 frame 0 TX packets 1071 bytes 112819 (110.1 KiB) TX errors 0 dropped 5735 overruns 0 carrier 0 collisions 0
(In reply to Michael S. Tsirkin from comment #29) > also can you pls try this: > for file in /sys/class/net/tun0/statistics/*; do echo $file; cat $file;done > > replace tun0 with name of the tun device used for your guest. tun0 was on my system (vpn connection) and should not be relevant to the guest as the guest doesn't use it for transferring data to/from the host. My output came straight from dmesg so I don't think the message is being masked.
(In reply to James Boyle from comment #30) > #28: The syslogs appear to have exactly the same information as dmesg. > > #29: (Since I'm not sure which is the tun interface... and I'm assuming it > wasn't named '*tun*' and somehow destroyed because of the issue) > > # pwd > /sys/class/net > # ls > em1 lo virbr0 virbr0-nic vnet10 > # for X in v*; do find ./${X}/statistics -type f -print -exec cat '{}' \;; > done > > ./virbr0/statistics/rx_fifo_errors > 0 > ./virbr0/statistics/collisions > 0 > ./virbr0/statistics/rx_errors > 0 > ./virbr0/statistics/rx_compressed > 0 > ./virbr0/statistics/rx_dropped > 0 > ./virbr0/statistics/tx_packets > 4379 > ./virbr0/statistics/tx_errors > 0 > ./virbr0/statistics/rx_missed_errors > 0 > ./virbr0/statistics/rx_over_errors > 0 > ./virbr0/statistics/tx_carrier_errors > 0 > ./virbr0/statistics/tx_heartbeat_errors > 0 > ./virbr0/statistics/rx_crc_errors > 0 > ./virbr0/statistics/multicast > 0 > ./virbr0/statistics/tx_fifo_errors > 0 > ./virbr0/statistics/tx_aborted_errors > 0 > ./virbr0/statistics/rx_bytes > 2835998 > ./virbr0/statistics/tx_compressed > 0 > ./virbr0/statistics/tx_dropped > 0 > ./virbr0/statistics/rx_packets > 3697 > ./virbr0/statistics/tx_bytes > 7798275 > ./virbr0/statistics/tx_window_errors > 0 > ./virbr0/statistics/rx_frame_errors > 0 > ./virbr0/statistics/rx_length_errors > 0 > ./virbr0-nic/statistics/rx_fifo_errors > 0 > ./virbr0-nic/statistics/collisions > 0 > ./virbr0-nic/statistics/rx_errors > 0 > ./virbr0-nic/statistics/rx_compressed > 0 > ./virbr0-nic/statistics/rx_dropped > 0 > ./virbr0-nic/statistics/tx_packets > 0 > ./virbr0-nic/statistics/tx_errors > 0 > ./virbr0-nic/statistics/rx_missed_errors > 0 > ./virbr0-nic/statistics/rx_over_errors > 0 > ./virbr0-nic/statistics/tx_carrier_errors > 0 > ./virbr0-nic/statistics/tx_heartbeat_errors > 0 > ./virbr0-nic/statistics/rx_crc_errors > 0 > ./virbr0-nic/statistics/multicast > 0 > ./virbr0-nic/statistics/tx_fifo_errors > 0 > ./virbr0-nic/statistics/tx_aborted_errors > 0 > ./virbr0-nic/statistics/rx_bytes > 0 > ./virbr0-nic/statistics/tx_compressed > 0 > ./virbr0-nic/statistics/tx_dropped > 0 > ./virbr0-nic/statistics/rx_packets > 0 > ./virbr0-nic/statistics/tx_bytes > 0 > ./virbr0-nic/statistics/tx_window_errors > 0 > ./virbr0-nic/statistics/rx_frame_errors > 0 > ./virbr0-nic/statistics/rx_length_errors > 0 > ./vnet10/statistics/rx_fifo_errors > 0 > ./vnet10/statistics/collisions > 0 > ./vnet10/statistics/rx_errors > 0 > ./vnet10/statistics/rx_compressed > 0 > ./vnet10/statistics/rx_dropped > 1 So there's one packet dropped in rx. Looks like Michael's path may help here. Please try this build which contains all patches mentioned above: http://koji.fedoraproject.org/koji/taskinfo?taskID=5541985 Thanks
(In reply to jason wang from comment #32) > So there's one packet dropped in rx. Looks like Michael's path may help here. > > Please try this build which contains all patches mentioned above: > > http://koji.fedoraproject.org/koji/taskinfo?taskID=5541985 > > Thanks I downloaded the rpms from the link above, installed, and ran a quick test. # uname -a Linux mnich.quotient-inc.com 3.9.6-301.6.fc20.x86_64 #1 SMP Wed Jun 26 04:21:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux from dmesg: [ 162.550324] hrtimer: interrupt took 3294 ns [ 866.145123] virbr0: port 5(vnet3) entered disabled state [ 866.146414] device vnet3 left promiscuous mode [ 866.146421] virbr0: port 5(vnet3) entered disabled state [ 901.154890] virbr0: port 4(vnet2) entered disabled state [ 901.155641] device vnet2 left promiscuous mode [ 901.155648] virbr0: port 4(vnet2) entered disabled state [ 939.288118] BUG: unable to handle kernel NULL pointer dereference at (null) [ 939.288147] IP: [<ffffffff8113caaa>] put_page+0xa/0x60 [ 939.288165] PGD 0 [ 939.288173] Oops: 0000 [#1] SMP [ 939.288185] Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat be2iscsi iscsi_boot_sysfs bnx2i iptable_mangle cnic uio cxgb4i cxgb4 nf_conntrack_ipv4 cxgb3i nf_defrag_ipv4 cxgb3 xt_conntrack mdio libcxgbi nf_conntrack bnep ebtable_filter ib_iser ebtables rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core bluetooth ip6table_filter rfkill ip6_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi acpi_cpufreq snd_hda_codec_hdmi mperf coretemp snd_hda_codec_realtek joydev snd_hda_intel snd_hda_codec snd_hwdep e1000e ptp iTCO_wdt iTCO_vendor_support vhost_net microcode snd_seq snd_seq_device pps_core snd_pcm snd_page_alloc snd_timer snd [ 939.288443] mei dcdbas lpc_ich mfd_core raid1 tpm_tis usblp i2c_i801 soundcore tun macvtap macvlan tpm tpm_bios serio_raw kvm_intel kvm binfmt_misc uinput dm_crypt radeon crc32_pclmul i2c_algo_bit drm_kms_helper crc32c_intel ttm ghash_clmulni_intel drm i2c_core video [ 939.288533] CPU 2 [ 939.288541] Pid: 3192, comm: vhost-3191 Not tainted 3.9.6-301.6.fc20.x86_64 #1 Dell Inc. OptiPlex 9010/0M9KCM [ 939.288567] RIP: 0010:[<ffffffff8113caaa>] [<ffffffff8113caaa>] put_page+0xa/0x60 [ 939.288588] RSP: 0018:ffff8807d18d3c78 EFLAGS: 00010202 [ 939.288602] RAX: ffff8807d19776c0 RBX: 0000000000000012 RCX: 0000000000000003 [ 939.288620] RDX: 0000000000000140 RSI: 0000000000000246 RDI: 0000000000000000 [ 939.288638] RBP: ffff8807d18d3c80 R08: 0000000000000003 R09: 0000000000000010 [ 939.288656] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8807ce74aa00 [ 939.288674] R13: ffffffffa01f8fd4 R14: 000000000000000c R15: ffff8807ce74aa00 [ 939.288693] FS: 0000000000000000(0000) GS:ffff88081dd00000(0000) knlGS:0000000000000000 [ 939.288713] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 939.288728] CR2: 0000000000000000 CR3: 00000007d18c8000 CR4: 00000000001427e0 [ 939.288746] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 939.288764] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 939.288783] Process vhost-3191 (pid: 3192, threadinfo ffff8807d18d2000, task ffff8807c8a7c650) [ 939.288804] Stack: [ 939.288810] 0000000000000012 ffff8807d18d3ca0 ffffffff8152cb87 ffff8807ce74aa00 [ 939.288833] ffff8807ce74aa00 ffff8807d18d3cb8 ffffffff8152cc1a ffff8807c286c800 [ 939.288856] ffff8807d18d3ce0 ffffffff8152cce2 ffff8807c286c800 000000000000f572 [ 939.288878] Call Trace: [ 939.288887] [<ffffffff8152cb87>] skb_release_data+0x87/0x100 [ 939.288903] [<ffffffff8152cc1a>] __kfree_skb+0x1a/0xb0 [ 939.288917] [<ffffffff8152cce2>] kfree_skb+0x32/0x90 [ 939.288934] [<ffffffffa01f8fd4>] tun_get_user+0x724/0x810 [tun] [ 939.288952] [<ffffffffa01f9117>] tun_sendmsg+0x57/0x80 [tun] [ 939.288970] [<ffffffffa02e5a78>] handle_tx+0x1c8/0x640 [vhost_net] [ 939.288988] [<ffffffffa02e5f25>] handle_tx_kick+0x15/0x20 [vhost_net] [ 939.289007] [<ffffffffa02e281d>] vhost_worker+0xed/0x190 [vhost_net] [ 939.289025] [<ffffffffa02e2730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net] [ 939.289045] [<ffffffff81080200>] kthread+0xc0/0xd0 [ 939.289059] [<ffffffff81080140>] ? insert_kthread_work+0x40/0x40 [ 939.289076] [<ffffffff8164ed2c>] ret_from_fork+0x7c/0xb0 [ 939.289091] [<ffffffff81080140>] ? insert_kthread_work+0x40/0x40 [ 939.289106] Code: 00 00 01 75 f3 e9 b6 fe ff ff be 47 01 00 00 48 c7 c7 6f 51 9d 81 e8 46 02 f2 ff e9 a0 fe ff ff 90 0f 1f 44 00 00 55 48 89 e5 53 <48> f7 07 00 c0 00 00 48 89 fb 75 30 8b 47 1c 85 c0 74 16 f0 ff [ 939.289229] RIP [<ffffffff8113caaa>] put_page+0xa/0x60 [ 939.289244] RSP <ffff8807d18d3c78> [ 939.289254] CR2: 0000000000000000 [ 939.295848] ---[ end trace bab0f053102bcd45 ]--- I'll rerun with the debug kernel later today or tomorrow when I have the time. --James
Since it's reproducible so easily for you, will you be able to try a special kernel with all kind of instrumentation, to figure out what's going on here? If yes I'll put here a patch with this extra debug info.
(In reply to Michael S. Tsirkin from comment #34) > Since it's reproducible so easily for you, will you be able to try > a special kernel with all kind of instrumentation, to figure out > what's going on here? > If yes I'll put here a patch with this extra debug info. Please do Michael, I'm sure either James or I will be able to test it.
(In reply to Michael S. Tsirkin from comment #34) > Since it's reproducible so easily for you, will you be able to try > a special kernel with all kind of instrumentation, to figure out > what's going on here? > If yes I'll put here a patch with this extra debug info. Sure, sign me up!
*** Bug 979644 has been marked as a duplicate of this bug. ***
# sysctl -a |grep printk kernel.printk = 4 4 1 7 kernel.printk_delay = 0 kernel.printk_ratelimit = 5 kernel.printk_ratelimit_burst = 10 # sysctl -w kernel.printk=7 kernel.printk = 7 # sysctl -a |grep printk kernel.printk = 7 4 1 7 kernel.printk_delay = 0 kernel.printk_ratelimit = 5 kernel.printk_ratelimit_burst = 10 # uname -a Linux mnich.quotient-inc.com 3.9.6-301.6.fc20.x86_64.debug #1 SMP Wed Jun 26 03:56:13 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux [232223.854747] sysctl: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case. If you have one, please send an email to linux-mm. [232361.566934] general protection fault: 0000 [#1] SMP [232361.566961] Modules linked in: tcp_lp fuse ebtable_nat xt_CHECKSUM bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE be2iscsi iscsi_boot_sysfs bnx2i ip6table_mangle cnic uio cxgb4i cxgb4 cxgb3i ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 cxgb3 mdio libcxgbi iptable_nat nf_nat_ipv4 nf_nat ib_iser iptable_mangle bnep rdma_cm ib_addr nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iw_cm ib_cm ib_sa ib_mad ib_core ebtable_filter bluetooth rfkill ebtables iscsi_tcp ip6table_filter ip6_tables libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel acpi_cpufreq snd_hda_codec snd_hwdep mperf snd_seq snd_seq_device coretemp raid1 snd_pcm iTCO_wdt iTCO_vendor_support snd_page_alloc snd_timer usblp snd e1000e ptp dcdbas soundcore [232361.567243] i2c_i801 microcode joydev lpc_ich serio_raw pps_core tpm_tis mei vhost_net mfd_core tpm tun macvtap macvlan tpm_bios kvm_intel kvm uinput binfmt_misc dm_crypt crc32_pclmul crc32c_intel ghash_clmulni_intel radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core video [232361.567343] CPU 0 [232361.567352] Pid: 4315, comm: vhost-4314 Not tainted 3.9.6-301.6.fc20.x86_64.debug #1 Dell Inc. OptiPlex 9010/0M9KCM [232361.567382] RIP: 0010:[<ffffffff81171bea>] [<ffffffff81171bea>] put_page+0xa/0x60 [232361.567409] RSP: 0000:ffff88042fef5bd0 EFLAGS: 00010202 [232361.567424] RAX: ffff8807c84457b8 RBX: 0000000000000012 RCX: 0000000000000006 [232361.567445] RDX: 0000000000000140 RSI: ffff8803c9650700 RDI: cccccccccccccccc [232361.567465] RBP: ffff88042fef5bd8 R08: 0000000000000001 R09: 0000000000000000 [232361.567485] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8807e729db00 [232361.567504] R13: ffffffffa0207b24 R14: 000000000000000c R15: ffff8807e729db00 [232361.567525] FS: 0000000000000000(0000) GS:ffff8807fce00000(0000) knlGS:0000000000000000 [232361.567547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [232361.567563] CR2: 00000000072f0000 CR3: 00000003b8ba8000 CR4: 00000000001427e0 [232361.567583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [232361.567603] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [232361.567624] Process vhost-4314 (pid: 4315, threadinfo ffff88042fef4000, task ffff8803c9650000) [232361.567676] Stack: [232361.567690] 0000000000000012 ffff88042fef5bf8 ffffffff815c30f7 ffff8807e729db00 [232361.567743] ffff8807e729db00 ffff88042fef5c10 ffffffff815c318a ffff8803b8e48b00 [232361.567795] ffff88042fef5c38 ffffffff815c3256 ffff8803b8e48b00 000000000000f572 [232361.567847] Call Trace: [232361.567868] [<ffffffff815c30f7>] skb_release_data+0x87/0x100 [232361.567905] [<ffffffff815c318a>] __kfree_skb+0x1a/0xb0 [232361.567938] [<ffffffff815c3256>] kfree_skb+0x36/0x160 [232361.567978] [<ffffffffa0207b24>] tun_get_user+0x7c4/0x900 [tun] [232361.568020] [<ffffffffa0207cb4>] tun_sendmsg+0x54/0x80 [tun] [232361.568062] [<ffffffffa0218af7>] handle_tx+0x1f7/0x6d0 [vhost_net] [232361.568105] [<ffffffffa0219005>] handle_tx_kick+0x15/0x20 [vhost_net] [232361.568149] [<ffffffffa0214560>] vhost_worker+0xf0/0x190 [vhost_net] [232361.568193] [<ffffffffa0214470>] ? vhost_zerocopy_done_signal+0x20/0x20 [vhost_net] [232361.568234] [<ffffffff8109482d>] kthread+0xed/0x100 [232361.568250] [<ffffffff81094740>] ? insert_kthread_work+0x80/0x80 [232361.568289] [<ffffffff8172456c>] ret_from_fork+0x7c/0xb0 [232361.568324] [<ffffffff81094740>] ? insert_kthread_work+0x80/0x80 [232361.568360] Code: e0 ff ff 01 e9 a8 fe ff ff be 47 01 00 00 48 c7 c7 ac f1 9e 81 e8 f7 73 ef ff e9 76 fe ff ff 66 90 0f 1f 44 00 00 55 48 89 e5 53 <48> f7 07 00 c0 00 00 48 89 fb 75 30 8b 47 1c 85 c0 74 16 f0 ff [232361.568622] RIP [<ffffffff81171bea>] put_page+0xa/0x60 [232361.568657] RSP <ffff88042fef5bd0> [232361.582117] ---[ end trace ce5d8fc725094591 ]--- (after the crash) # sysctl -a |grep printk kernel.printk = 15 4 1 7 kernel.printk_delay = 0 kernel.printk_ratelimit = 5 kernel.printk_ratelimit_burst = 10 I'm still happy to try the "specially instrumented" kernel :^)
I am able to hit this issue once when I copy a large file from a win7 guest to remote smb server. But it was not reproduce-able for me since I only see it after about 50 times try. Will continue look at this.
Just to ask the question again: would it be helpful for anyone troubleshooting to have a core dump of the qemu-kvm process?
James. What helped the kernel programmers for at least my part of the pie was to use the rawhide kernel on the host with the extra debugging. This caused it from going from a crash to an oops which pointed where one of the problems might be.
*** Bug 978785 has been marked as a duplicate of this bug. ***
FYI I have this bug and can reproduce on demand. (Win7 64 smb file copy blows up instantly). If anyone wants me to test I can do so. Stephen Degler
Created attachment 768692 [details] debugging patch 1 please try the attached patch - it should still crash but report error cause in dmesg before crash.
Created attachment 768693 [details] debugging patch 2 (obsoletes 1) this patch can't fix crash but should print a diagnostic error in dmesg before the crash.
Progress. I found some time this weekend and did a mock build locally with the patch from comment 45 and got the following. $ cat /sys/class/net/vnet1/statistics/rx_dropped 1 [ 561.307591] unable to build skb from iovec. zerocopy 1 offset 12 count 20 len 49506 [ 561.307619] BUG: unable to handle kernel paging request at 00000099020e001c [ 561.307662] IP: [<ffffffff8113ccaa>] put_page+0xa/0x60 [ 561.307692] PGD 0 [ 561.307704] Oops: 0000 [#1] SMP [ 561.307724] Modules linked in: vhost_net macvtap macvlan hidp ebtable_nat ebtables fuse ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle tun bridge stp llc rfcomm bnep ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables xfs libcrc32c arc4 iwldvm iTCO_wdt iTCO_vendor_support uvcvideo videobuf2_vmalloc mac80211 videobuf2_memops videobuf2_core snd_hda_codec_hdmi snd_hda_codec_conexant mperf videodev media coretemp kvm_intel snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_hda_intel snd_hda_codec snd_hwdep iwlwifi snd_seq snd_seq_device sdhci_pci sdhci mmc_core snd_pcm cfg80211 btusb kvm bluetooth lpc_ich mfd_core snd_page_alloc snd_timer mei i2c_i801 microcode e1000e thinkpad_acpi ptp pps_core snd soundcore [ 561.308075] rfkill wmi uinput dm_crypt crc32_pclmul crc32c_intel ghash_clmulni_intel i915 firewire_ohci firewire_core i2c_algo_bit drm_kms_helper crc_itu_t usb_storage drm i2c_core video [ 561.308145] CPU 1 [ 561.308156] Pid: 4305, comm: vhost-4304 Not tainted 3.9.9-302.bz975065.fc19.x86_64 #1 LENOVO 4243BQ3/4243BQ3 [ 561.308187] RIP: 0010:[<ffffffff8113ccaa>] [<ffffffff8113ccaa>] put_page+0xa/0x60 [ 561.308215] RSP: 0018:ffff8800b0701c78 EFLAGS: 00010202 [ 561.308233] RAX: ffff8801a88e42c0 RBX: 0000000000000012 RCX: 0000000000000003 [ 561.308256] RDX: 0000000000000140 RSI: 0000000000000246 RDI: 00000099020e001c [ 561.308278] RBP: ffff8800b0701c80 R08: 0000000020656c62 R09: 000000000000048d [ 561.308300] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880100e2e400 [ 561.308324] R13: ffffffffa073b054 R14: 000000000000000c R15: ffff880100e2e400 [ 561.308347] FS: 0000000000000000(0000) GS:ffff88021e240000(0000) knlGS:0000000000000000 [ 561.308373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 561.308392] CR2: 00000099020e001c CR3: 00000000b5b98000 CR4: 00000000000427e0 [ 561.308414] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 561.308438] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 561.308470] Process vhost-4304 (pid: 4305, threadinfo ffff8800b0700000, task ffff8802124c1770) [ 561.308497] Stack: [ 561.308505] 0000000000000012 ffff8800b0701ca0 ffffffff8152d0a7 ffff880100e2e400 [ 561.308534] ffff880100e2e400 ffff8800b0701cb8 ffffffff8152d13a ffff8800b07bc800 [ 561.308563] ffff8800b0701ce0 ffffffff8152d202 ffff8800b07bc800 000000000000c162 [ 561.308591] Call Trace: [ 561.308603] [<ffffffff8152d0a7>] skb_release_data+0x87/0x100 [ 561.308622] [<ffffffff8152d13a>] __kfree_skb+0x1a/0xb0 [ 561.308640] [<ffffffff8152d202>] kfree_skb+0x32/0x90 [ 561.308659] [<ffffffffa073b054>] tun_get_user+0x7a4/0x8b0 [tun] [ 561.308680] [<ffffffffa073b1b7>] tun_sendmsg+0x57/0x80 [tun] [ 561.308701] [<ffffffffa07b1a78>] handle_tx+0x1c8/0x640 [vhost_net] [ 561.308722] [<ffffffffa07b1f25>] handle_tx_kick+0x15/0x20 [vhost_net] [ 561.308744] [<ffffffffa07ae81d>] vhost_worker+0xed/0x190 [vhost_net] [ 561.308767] [<ffffffffa07ae730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net] [ 561.308794] [<ffffffff810802a0>] kthread+0xc0/0xd0 [ 561.308811] [<ffffffff810801e0>] ? insert_kthread_work+0x40/0x40 [ 561.308832] [<ffffffff8164f26c>] ret_from_fork+0x7c/0xb0 [ 561.308850] [<ffffffff810801e0>] ? insert_kthread_work+0x40/0x40 [ 561.308870] Code: 00 00 01 75 f3 e9 b6 fe ff ff be 47 01 00 00 48 c7 c7 4e 53 9d 81 e8 76 00 f2 ff e9 a0 fe ff ff 90 66 66 66 66 90 55 48 89 e5 53 <48> f7 07 00 c0 00 00 48 89 fb 75 30 8b 47 1c 85 c0 74 16 f0 ff [ 561.309015] RIP [<ffffffff8113ccaa>] put_page+0xa/0x60 [ 561.309035] RSP <ffff8800b0701c78> [ 561.309047] CR2: 00000099020e001c [ 561.318175] ---[ end trace 8acb97eb9b242669 ]---
Just hit this with 3.9.9-302.fc19. Using WIN7 VM to Samba share on the host.
*** Bug 982437 has been marked as a duplicate of this bug. ***
Looks like something wrong during skb linearize. The head part were miscount into data part which lead nr_frags were unsynchronized with i in zerocopy_sg_from_iovec(). Will provide a patch for test.
Created attachment 770879 [details] Patch to fix the linear part len.
(In reply to Michael S. Tsirkin from comment #45) > Created attachment 768693 [details] > debugging patch 2 (obsoletes 1) > > this patch can't fix crash but should print a diagnostic > error in dmesg before the crash. Output from kernel built with your patch. (took a while because it's been a while since I've had to build from source + have a day job ;^) ) [ 2681.177066] unable to build skb from iovec. zerocopy 1 offset 12 count 21 len 62834 [ 2681.177071] ------------[ cut here ]------------ [ 2681.177076] WARNING: at include/linux/mm.h:280 put_page+0x4c/0x60() [ 2681.177077] Hardware name: OptiPlex 9010 [ 2681.177078] Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE be2iscsi iscsi_boot_sysfs bnx2i ip6table_mangle cnic uio cxgb4i cxgb4 ip6t_REJECT cxgb3i nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat cxgb3 mdio iptable_mangle nf_conntrack_ipv4 libcxgbi nf_defrag_ipv4 ib_iser xt_conntrack nf_conntrack bnep rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core ebtable_filter iscsi_tcp ebtables ip6table_filter bluetooth ip6_tables libiscsi_tcp rfkill libiscsi scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq acpi_cpufreq raid1 mperf snd_seq_device snd_pcm vhost_net coretemp tun macvtap macvlan snd_page_alloc kvm_intel kvm snd_timer iTCO_wdt joydev dcdbas tpm_tis [ 2681.177108] microcode e1000e iTCO_vendor_support snd cdc_acm i2c_i801 mei usblp ptp pps_core lpc_ich mfd_core uinput soundcore serio_raw tpm tpm_bios binfmt_misc dm_crypt usb_storage radeon crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper ghash_clmulni_intel ttm drm i2c_core video [ 2681.177122] Pid: 4105, comm: vhost-4104 Not tainted 3.9.6-200.jwb.975065.fc18.x86_64 #1 [ 2681.177123] Call Trace: [ 2681.177127] [<ffffffff8105ef85>] warn_slowpath_common+0x75/0xa0 [ 2681.177132] [<ffffffffa0255bcf>] ? tun_get_user+0x73f/0x840 [tun] [ 2681.177134] [<ffffffff8105efca>] warn_slowpath_null+0x1a/0x20 [ 2681.177135] [<ffffffff811418cc>] put_page+0x4c/0x60 [ 2681.177138] [<ffffffff8154612f>] skb_release_data+0x8f/0x110 [ 2681.177140] [<ffffffff815461ce>] __kfree_skb+0x1e/0xa0 [ 2681.177142] [<ffffffff81546286>] kfree_skb+0x36/0xa0 [ 2681.177144] [<ffffffffa0255bcf>] tun_get_user+0x73f/0x840 [tun] [ 2681.177147] [<ffffffffa0255d27>] tun_sendmsg+0x57/0x80 [tun] [ 2681.177150] [<ffffffffa0319d87>] handle_tx+0x287/0x680 [vhost_net] [ 2681.177153] [<ffffffffa031a1b5>] handle_tx_kick+0x15/0x20 [vhost_net] [ 2681.177156] [<ffffffffa031695d>] vhost_worker+0xed/0x190 [vhost_net] [ 2681.177159] [<ffffffffa0316870>] ? vhost_work_flush+0x110/0x110 [vhost_net] [ 2681.177162] [<ffffffff81082af0>] kthread+0xc0/0xd0 [ 2681.177165] [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_flush+0x20/0xb0 [ 2681.177167] [<ffffffff81082a30>] ? kthread_create_on_node+0x120/0x120 [ 2681.177169] [<ffffffff8166a52c>] ret_from_fork+0x7c/0xb0 [ 2681.177171] [<ffffffff81082a30>] ? kthread_create_on_node+0x120/0x120 [ 2681.177172] ---[ end trace fb59906a52dcec67 ]--- [ 2681.177179] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2681.177209] IP: [<ffffffff81141891>] put_page+0x11/0x60 [ 2681.177227] PGD 0 [ 2681.177235] Oops: 0000 [#1] SMP [ 2681.177248] Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE be2iscsi iscsi_boot_sysfs bnx2i ip6table_mangle cnic uio cxgb4i cxgb4 ip6t_REJECT cxgb3i nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat cxgb3 mdio iptable_mangle nf_conntrack_ipv4 libcxgbi nf_defrag_ipv4 ib_iser xt_conntrack nf_conntrack bnep rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core ebtable_filter iscsi_tcp ebtables ip6table_filter bluetooth ip6_tables libiscsi_tcp rfkill libiscsi scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq acpi_cpufreq raid1 mperf snd_seq_device snd_pcm vhost_net coretemp tun macvtap macvlan snd_page_alloc kvm_intel kvm snd_timer iTCO_wdt joydev dcdbas tpm_tis [ 2681.177808] microcode e1000e iTCO_vendor_support snd cdc_acm i2c_i801 mei usblp ptp pps_core lpc_ich mfd_core uinput soundcore serio_raw tpm tpm_bios binfmt_misc dm_crypt usb_storage radeon crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper ghash_clmulni_intel ttm drm i2c_core video [ 2681.178020] CPU 3 [ 2681.178037] Pid: 4105, comm: vhost-4104 Tainted: G W 3.9.6-200.jwb.975065.fc18.x86_64 #1 Dell Inc. OptiPlex 9010/0M9KCM [ 2681.178110] RIP: 0010:[<ffffffff81141891>] [<ffffffff81141891>] put_page+0x11/0x60 [ 2681.178162] RSP: 0018:ffff8807ce3cdbf8 EFLAGS: 00010292 [ 2681.178196] RAX: ffff8807b91e20c0 RBX: 0000000000000000 RCX: 0000000000000006 [ 2681.178241] RDX: 0000000000000150 RSI: 0000000000000000 RDI: 0000000000000000 [ 2681.178286] RBP: ffff8807ce3cdc08 R08: 000000000000000a R09: 0000000000000481 [ 2681.178330] R10: 0000000000000000 R11: 0000000000000480 R12: ffff8807d6626d00 [ 2681.178374] R13: ffffffffa0255bcf R14: ffff8807d6626d00 R15: ffff8807d59f8800 [ 2681.178419] FS: 0000000000000000(0000) GS:ffff88081dd80000(0000) knlGS:0000000000000000 [ 2681.178470] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2681.178506] CR2: 0000000000000000 CR3: 00000007c169e000 CR4: 00000000001427e0 [ 2681.178550] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2681.178595] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2681.178640] Process vhost-4104 (pid: 4105, threadinfo ffff8807ce3cc000, task ffff8807916addc0) [ 2681.178693] Stack: [ 2681.178708] 000000000000f572 0000000000000013 ffff8807ce3cdc28 ffffffff8154612f [ 2681.178762] ffff8807d6626d00 ffff8807d6626d00 ffff8807ce3cdc48 ffffffff815461ce [ 2681.178816] ffff8807ce3cdc88 ffff8807ba8b43d8 ffff8807ce3cdc78 ffffffff81546286 [ 2681.178870] Call Trace: [ 2681.178889] [<ffffffff8154612f>] skb_release_data+0x8f/0x110 [ 2681.178927] [<ffffffff815461ce>] __kfree_skb+0x1e/0xa0 [ 2681.178962] [<ffffffff81546286>] kfree_skb+0x36/0xa0 [ 2681.178999] [<ffffffffa0255bcf>] tun_get_user+0x73f/0x840 [tun] [ 2681.179042] [<ffffffffa0255d27>] tun_sendmsg+0x57/0x80 [tun] [ 2681.179082] [<ffffffffa0319d87>] handle_tx+0x287/0x680 [vhost_net] [ 2681.179126] [<ffffffffa031a1b5>] handle_tx_kick+0x15/0x20 [vhost_net] [ 2681.179172] [<ffffffffa031695d>] vhost_worker+0xed/0x190 [vhost_net] [ 2681.179216] [<ffffffffa0316870>] ? vhost_work_flush+0x110/0x110 [vhost_net] [ 2681.179263] [<ffffffff81082af0>] kthread+0xc0/0xd0 [ 2681.179296] [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_flush+0x20/0xb0 [ 2681.179344] [<ffffffff81082a30>] ? kthread_create_on_node+0x120/0x120 [ 2681.179387] [<ffffffff8166a52c>] ret_from_fork+0x7c/0xb0 [ 2681.179423] [<ffffffff81082a30>] ? kthread_create_on_node+0x120/0x120 [ 2681.179464] Code: 45 fc 65 48 01 04 25 70 02 01 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 <48> f7 07 00 c0 00 00 75 34 8b 47 1c 85 c0 74 1a f0 ff 4b 1c 0f [ 2681.179708] RIP [<ffffffff81141891>] put_page+0x11/0x60 [ 2681.179745] RSP <ffff8807ce3cdbf8> [ 2681.179768] CR2: 0000000000000000 [ 2681.195958] ---[ end trace fb59906a52dcec68 ]---
(In reply to jason wang from comment #50) > Created attachment 770879 [details] > Patch to fix the linear part len. I'll try to build a kernel to test this tonight.
okay, these two should fix this crash: http://patchwork.ozlabs.org/patch/257972/ http://patchwork.ozlabs.org/patch/257973/
(In reply to Michael S. Tsirkin from comment #53) > okay, these two should fix this crash: > http://patchwork.ozlabs.org/patch/257972/ > http://patchwork.ozlabs.org/patch/257973/ I've built with those two patches; it didn't OOPS or crash, but the virtio interface wedged on the Win7 guest (no functionality after the skb error), and I still got this message: Jul 10 13:59:58 mnich kernel: [ 1045.529209] unable to build skb from iovec. zerocopy 1 offset 12 count 20 len 62834
Thanks for the report. Yes I see how this can happen. With crashes finally behind us, looking at the proper fix for the error.
Perhaps I'm seeing this issue aswell. I see this trace on the console of a f19 machine [upgraded from F18] running a few VMs with network set to bridge mode. The machine and VMs are still useable - so the machine didn't crash.[my /var/log/messages is empty - but I'm able to grab stuff from dmesg.] [root@maverick ~]# uname -a Linux maverick.mcs.anl.gov 3.9.9-302.fc19.x86_64 #1 SMP Sat Jul 6 13:41:07 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux [root@maverick ~]# uptime 13:29:43 up 7 days, 12 min, 1 user, load average: 1.97, 2.15, 2.13 [root@maverick ~]# [502486.657719] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c [502486.657762] IP: [<ffffffff8113cada>] put_compound_page+0xaa/0x270 [502486.657792] PGD 0 [502486.657802] Oops: 0000 [#1] SMP [502486.657818] Modules linked in: vhost_net macvtap macvlan ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle tun bnep bluetooth bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack snd_hda_codec_analog iTCO_wdt hp_wmi mei sparse_keymap iTCO_vendor_support rfkill ppdev snd_hda_intel snd_hda_codec acpi_cpufreq snd_hwdep e1000e snd_seq snd_seq_device mperf coretemp kvm_intel snd_pcm snd_page_alloc snd_timer snd ptp kvm lpc_ich pps_core mfd_core soundcore microcode tpm_tis parport_pc parport wmi tpm_infineon tpm tpm_bios nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc uinput i915 i2c_algo_bit drm_kms_helper drm i2c_core video [502486.658003] CPU 1 [502486.658003] Pid: 1173, comm: vhost-1172 Not tainted 3.9.9-302.fc19.x86_64 #1 Hewlett-Packard HP Compaq dc5800 Small Form Factor/2820h [502486.658003] RIP: 0010:[<ffffffff8113cada>] [<ffffffff8113cada>] put_compound_page+0xaa/0x270 [502486.658003] RSP: 0018:ffff880164919c50 EFLAGS: 00010286 [502486.658003] RAX: ffff88004e46f800 RBX: ffff88004e46f600 RCX: 000000000000000f [502486.658003] RDX: 0000000000000140 RSI: 0000000000000246 RDI: ffff88004e46f600 [502486.658003] RBP: ffff880164919c68 R08: 0000000000000304 R09: 0000000000000010 [502486.658003] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [502486.658003] R13: ffffffffa04befd4 R14: 000000000000000c R15: ffff880166a6c000 [502486.658003] FS: 0000000000000000(0000) GS:ffff88019bc80000(0000) knlGS:0000000000000000 [502486.658003] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [502486.658003] CR2: 000000000000001c CR3: 00000001649b8000 CR4: 00000000000027f0 [502486.658003] DR0: 0000000000000001 DR1: 0000000000000002 DR2: 0000000000000001 [502486.658003] DR3: 000000000000000a DR6: 00000000ffff0ff0 DR7: 0000000000000400 [502486.658003] Process vhost-1172 (pid: 1173, threadinfo ffff880164918000, task ffff88017ea24650) [502486.658003] Stack: [502486.658003] ffff88004e46f600 ffff880166a6c000 ffffffffa04befd4 ffff880164919c80 [502486.658003] ffffffff8113cceb 0000000000000012 ffff880164919ca0 ffffffff8152d0a7 [502486.658003] ffff880166a6c000 ffff880166a6c000 ffff880164919cb8 ffffffff8152d13a [502486.658003] Call Trace: [502486.658003] [<ffffffffa04befd4>] ? tun_get_user+0x724/0x810 [tun] [502486.658003] [<ffffffff8113cceb>] put_page+0x4b/0x60 [502486.658003] [<ffffffff8152d0a7>] skb_release_data+0x87/0x100 [502486.658003] [<ffffffff8152d13a>] __kfree_skb+0x1a/0xb0 [502486.658003] [<ffffffff8152d202>] kfree_skb+0x32/0x90 [502486.658003] [<ffffffffa04befd4>] tun_get_user+0x724/0x810 [tun] [502486.658003] [<ffffffffa04bf117>] tun_sendmsg+0x57/0x80 [tun] [502486.658003] [<ffffffffa050fa78>] handle_tx+0x1c8/0x640 [vhost_net] [502486.658003] [<ffffffffa050ff05>] handle_tx_net+0x15/0x20 [vhost_net] [502486.658003] [<ffffffffa050c81d>] vhost_worker+0xed/0x190 [vhost_net] [502486.658003] [<ffffffffa050c730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net] [502486.658003] [<ffffffff810802a0>] kthread+0xc0/0xd0 [502486.658003] [<ffffffff810801e0>] ? insert_kthread_work+0x40/0x40 [502486.658003] [<ffffffff8164f26c>] ret_from_fork+0x7c/0xb0 [502486.658003] [<ffffffff810801e0>] ? insert_kthread_work+0x40/0x40 [502486.658003] Code: 48 c7 c7 27 08 9f 81 e8 55 02 f2 ff 48 89 df e8 1d f7 ff ff 85 c0 74 a4 eb 9a 4c 8b 67 30 48 8b 07 f6 c4 80 74 cd 4c 39 e7 74 c8 <41> 8b 54 24 1c 85 d2 74 bf 8d 4a 01 89 d0 f0 41 0f b1 4c 24 1c [502486.658003] RIP [<ffffffff8113cada>] put_compound_page+0xaa/0x270 [502486.658003] RSP <ffff880164919c50> [502486.658003] CR2: 000000000000001c [502486.789205] ---[ end trace 62bd766243ff78d0 ]---
(In reply to Satish Balay from comment #56) > The machine and VMs are still useable - so the machine didn't crash. I mis-spoke. One of the VMs [bsd with virtio for network/disk] is not responding.
(In reply to Satish Balay from comment #57) > (In reply to Satish Balay from comment #56) > > > The machine and VMs are still useable - so the machine didn't crash. > > I mis-spoke. One of the VMs [bsd with virtio for network/disk] is not > responding. Yep, that fits the profile.
happening with freebsd 9 guest here, repeatably when trying to run ezjail-admin update -b -p ..
Created attachment 774591 [details] /var/log/message of failure with Freebsd 9.1 guest with virtio blk/net , running ezjail-admin update -b -p
Created attachment 775369 [details] updated-patches-3.9.9 I updated to the 3.9.9 kernel (f18) and rebuilt with the attached. (I applied the last hunk manually, just wanted to keep it to one attachment) $ uname -a Linux mnich.quotient-inc.com 3.9.9-201.bz975065.jwb.fc18.x86_64 #1 SMP Wed Jul 17 10:56:19 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux Jul 18 10:07:55 mnich kernel: [57896.472550] unable to build skb from iovec. zerocopy 1 offset 12 count 20 len 62834 So, nothing new, I just wanted to check on the newer kernel. It looks the same.
Those should fix this issue: http://patchwork.ozlabs.org/patch/259997/ http://patchwork.ozlabs.org/patch/259996/ Please test those with patches mentioned in comment #53 Thanks
(In reply to jason wang from comment #62) > Those should fix this issue: > > http://patchwork.ozlabs.org/patch/259997/ > http://patchwork.ozlabs.org/patch/259996/ > > Please test those with patches mentioned in comment #53 > > Thanks $ uname -a Linux mnich.quotient-inc.com 3.9.9-201.bz975065.jwb.2.fc18.x86_64 #1 SMP Thu Jul 18 23:42:21 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux Jul 19 09:44:27 mnich kernel: [ 1022.788596] unable to build skb from iovec. zerocopy 1 offset 12 count 19 len 62834 Jul 19 09:44:46 mnich kernel: [ 1041.745407] unable to build skb from iovec. zerocopy 1 offset 12 count 19 len 62834 I rebuilt with the patches indicated; The VirtIO interface took longer to seize this time, and I got two skb messages instead of one.
(In reply to James Boyle from comment #63) > (In reply to jason wang from comment #62) > > Those should fix this issue: > > > > http://patchwork.ozlabs.org/patch/259997/ > > http://patchwork.ozlabs.org/patch/259996/ > > > > Please test those with patches mentioned in comment #53 > > > > Thanks > > $ uname -a > Linux mnich.quotient-inc.com 3.9.9-201.bz975065.jwb.2.fc18.x86_64 #1 SMP Thu > Jul 18 23:42:21 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux > > Jul 19 09:44:27 mnich kernel: [ 1022.788596] unable to build skb from iovec. > zerocopy 1 offset 12 count 19 len 62834 > Jul 19 09:44:46 mnich kernel: [ 1041.745407] unable to build skb from iovec. > zerocopy 1 offset 12 count 19 len 62834 > > > I rebuilt with the patches indicated; The VirtIO interface took longer to > seize this time, and I got two skb messages instead of one. Thanks a lot for the testing. Does the network still usable after hitting this message. btw, I didn't hit this error after about 2 hours testing.
This patch looks like it might resolve the remaining issue: http://patchwork.ozlabs.org/patch/252885/
(In reply to jason wang from comment #64) > (In reply to James Boyle from comment #63) > > (In reply to jason wang from comment #62) > > > Those should fix this issue: > > > > > > http://patchwork.ozlabs.org/patch/259997/ > > > http://patchwork.ozlabs.org/patch/259996/ > > > > > > Please test those with patches mentioned in comment #53 > > > > > > Thanks > > > > $ uname -a > > Linux mnich.quotient-inc.com 3.9.9-201.bz975065.jwb.2.fc18.x86_64 #1 SMP Thu > > Jul 18 23:42:21 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux > > > > Jul 19 09:44:27 mnich kernel: [ 1022.788596] unable to build skb from iovec. > > zerocopy 1 offset 12 count 19 len 62834 > > Jul 19 09:44:46 mnich kernel: [ 1041.745407] unable to build skb from iovec. > > zerocopy 1 offset 12 count 19 len 62834 > > > > > > I rebuilt with the patches indicated; The VirtIO interface took longer to > > seize this time, and I got two skb messages instead of one. > > Thanks a lot for the testing. Does the network still usable after hitting > this message. > > btw, I didn't hit this error after about 2 hours testing. I'm going by memory at this point, but I think virtio networking worked briefly after the first skb message. I'll test it again to confirm, with ping and added timestamps... --James Oh! I was just about to submit the above when BZ showed Rik's message. I'll try that patch as well.
(In reply to Rik van Riel from comment #65) > This patch looks like it might resolve the remaining issue: > > http://patchwork.ozlabs.org/patch/252885/ F18: [jwboyer@zod kernel]$ grep use-after kernel.spec Patch25062: vhost-net-fix-use-after-free-in-vhost_net_flush.patch ApplyPatch vhost-net-fix-use-after-free-in-vhost_net_flush.patch F19: [jwboyer@zod kernel]$ grep use-after kernel.spec Patch25062: vhost-net-fix-use-after-free-in-vhost_net_flush.patch ApplyPatch vhost-net-fix-use-after-free-in-vhost_net_flush.patch We've been carrying that since Jul 5.
Hi Josh: Please try this kernel which has more verbose debug info that may help analysis the issue. Thanks a lot. http://koji.fedoraproject.org/koji/taskinfo?taskID=5643506
(In reply to jason wang from comment #68) > Hi Josh: > > Please try this kernel which has more verbose debug info that may help > analysis the issue. Thanks a lot. > > http://koji.fedoraproject.org/koji/taskinfo?taskID=5643506 I'm unable to reproduce the issue with the provided kernel (3.10.2-301.fc20.x86_64) & associated packages. :^) Thanks! --James
(In reply to James Boyle from comment #69) > I'm unable to reproduce the issue with the provided kernel > (3.10.2-301.fc20.x86_64) & associated packages. :^) > > Thanks! > --James Thanks for all your testing James, it's appreciated.
I'm glad that I could help :^) --James
Description of problem: Travailler avec Virt Manager en executant la commande mkfs sur une partition iscsi Version-Release number of selected component: kernel Additional info: reporter: libreport-2.1.5 cmdline: BOOT_IMAGE=/vmlinuz-3.9.9-302.fc19.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.md=0 rd.dm=0 rd.lvm.lv=fedora/usr vconsole.keymap=fr rd.luks=0 rd.lvm.lv=fedora/root rhgb quiet LANG=fr_FR.utf8 kernel: 3.9.9-302.fc19.x86_64 runlevel: N 5 type: Kerneloops Truncated backtrace: BUG: unable to handle kernel NULL pointer dereference at 000000000000037f IP: [<ffffffff8113ccaa>] put_page+0xa/0x60 PGD 0 Oops: 0000 [#1] SMP Modules linked in: rfcomm fuse vhost_net macvtap macvlan ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc bnep bluetooth rfkill acpi_cpufreq mperf coretemp kvm_intel kvm snd_hda_codec_hdmi crc32_pclmul snd_hda_codec_realtek crc32c_intel ghash_clmulni_intel snd_hda_intel snd_hda_codec microcode iTCO_wdt iTCO_vendor_support snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer gspca_vc032x gspca_main videodev media r8169 i2c_i801 serio_raw intel_ips snd mii soundcore lpc_ich mfd_core binfmt_misc uinput ata_generic pata_acpi i915 video i2c_algo_bit drm_kms_helper drm pata_jmicron i2c_core CPU 3 Pid: 1541, comm: vhost-1539 Not tainted 3.9.9-302.fc19.x86_64 #1 BIOSTAR Group TH55B HD/TH55B HD RIP: 0010:[<ffffffff8113ccaa>] [<ffffffff8113ccaa>] put_page+0xa/0x60 RSP: 0018:ffff8802124bbc78 EFLAGS: 00010202 RAX: ffff880186f33cc0 RBX: 0000000000000012 RCX: 0000000000000003 RDX: 0000000000000140 RSI: 0000000000000246 RDI: 000000000000037f RBP: ffff8802124bbc80 R08: 0000000000000001 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802103dea00 R13: ffffffffa0398fd4 R14: 000000000000000c R15: ffff8802103dea00 FS: 0000000000000000(0000) GS:ffff88023fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000000000037f CR3: 0000000206110000 CR4: 00000000000027e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process vhost-1539 (pid: 1541, threadinfo ffff8802124ba000, task ffff880233cc8000) Stack: 0000000000000012 ffff8802124bbca0 ffffffff8152d0a7 ffff8802103dea00 ffff8802103dea00 ffff8802124bbcb8 ffffffff8152d13a ffff880212468800 ffff8802124bbce0 ffffffff8152d202 ffff880212468800 000000000000c162 Call Trace: [<ffffffff8152d0a7>] skb_release_data+0x87/0x100 [<ffffffff8152d13a>] __kfree_skb+0x1a/0xb0 [<ffffffff8152d202>] kfree_skb+0x32/0x90 [<ffffffffa0398fd4>] tun_get_user+0x724/0x810 [tun] [<ffffffffa0399117>] tun_sendmsg+0x57/0x80 [tun] [<ffffffffa0403a78>] handle_tx+0x1c8/0x640 [vhost_net] [<ffffffffa0403f25>] handle_tx_kick+0x15/0x20 [vhost_net] [<ffffffffa040081d>] vhost_worker+0xed/0x190 [vhost_net] [<ffffffffa0400730>] ? __vhost_add_used_n+0x100/0x100 [vhost_net] [<ffffffff810802a0>] kthread+0xc0/0xd0 [<ffffffff810801e0>] ? insert_kthread_work+0x40/0x40 [<ffffffff8164f26c>] ret_from_fork+0x7c/0xb0 [<ffffffff810801e0>] ? insert_kthread_work+0x40/0x40 Code: 00 00 01 75 f3 e9 b6 fe ff ff be 47 01 00 00 48 c7 c7 47 53 9d 81 e8 76 00 f2 ff e9 a0 fe ff ff 90 66 66 66 66 90 55 48 89 e5 53 <48> f7 07 00 c0 00 00 48 89 fb 75 30 8b 47 1c 85 c0 74 16 f0 ff RIP [<ffffffff8113ccaa>] put_page+0xa/0x60 RSP <ffff8802124bbc78> CR2: 000000000000037f
+1 for 3.10.3-300.fc19.x86_64: it has been behaving well. I have used a Windows 7 64-bit VM with CIFS (to a share on the host) and internet access overnight; it has not triggered the oops. This VM used to trigger both this oops and the other infamous 980254. Thank you very much to both fedora kernel team and testers.
Put 3.10.3-300.fc19.x86_64 onto two of my desktops with KVM, no problems so far after 24 hours of running. Thanks !
We've not seen any recent reports of these issues. I believe they are now fixed in all supported releases with the 3.10.y and 3.11-rcX kernels. Is anyone still having this issue?
No problems at all for me after almost three weeks of running.
(In reply to Josh Boyer from comment #75) > We've not seen any recent reports of these issues. I believe they are now > fixed in all supported releases with the 3.10.y and 3.11-rcX kernels. > > Is anyone still having this issue? I haven't seen any issues on 3.10.4-100 or the latest 3.10.6-100 for Fedora 18. Everything looks OK! Thanks again, --James