Bug 976789 - vhost_net experimental_zcopytx results in unable to handle kernel paging request (anon_vma_chain_link+0x12)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: x86_64
OS: Unspecified
Target Milestone: ---
Assignee: Radim Krčmář
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:75b6530a2676cc639f7c5176986...
Duplicates: 972715 975691 976996 981919 981969 982773 983047
Depends On:
Blocks:
Reported: 2013-06-21 13:13 UTC by Jakub Hrozek
Modified: 2013-07-18 06:09 UTC (History)
CC List: 18 users

Fixed In Version: kernel-3.9.10-100.fc17
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-12 03:10:13 UTC
Type: ---
Embargoed:


Attachments
File: dmesg (137.75 KB, text/plain), 2013-06-21 13:13 UTC, Jakub Hrozek
3.9.3 dmesg (78.38 KB, text/plain), 2013-07-02 18:27 UTC, Pekka Pietikäinen
3.9.5-300 dmesg (92.86 KB, text/plain), 2013-07-02 18:28 UTC, Pekka Pietikäinen
3.9.5-301 dmesg (78.88 KB, text/plain), 2013-07-02 18:29 UTC, Pekka Pietikäinen
3.9.6 dmesg (82.55 KB, text/plain), 2013-07-02 18:29 UTC, Pekka Pietikäinen

Description Jakub Hrozek 2013-06-21 13:13:22 UTC
Description of problem:
I don't know for sure. I was running several VMs at the time.

Additional info:
reporter:       libreport-2.1.5
BUG: unable to handle kernel paging request at 00007f819f7bd000
IP: [<ffffffff81164312>] anon_vma_chain_link+0x12/0x40
PGD 2e7bb4067 PUD 2ea906067 PMD 2ca876067 PTE 80000002c41c7065
Oops: 0003 [#1] SMP 
Modules linked in: vhost_net macvtap macvlan hidp fuse ebtable_nat xt_CHECKSUM tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables rfcomm bnep btusb bluetooth snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep iTCO_wdt snd_seq iTCO_vendor_support snd_seq_device snd_pcm acpi_cpufreq mperf coretemp kvm_intel arc4 kvm microcode iwldvm uvcvideo videobuf2_vmalloc mac80211 videobuf2_memops videobuf2_core videodev media i2c_i801 iwlwifi sdhci_pci sdhci cfg80211 mmc_core snd_page_alloc snd_timer thinkpad_acpi e1000e wmi snd rfkill lpc_ich ptp mei soundcore pps_core mfd_core uinput binfmt_misc dm_crypt crc32_pclmul crc32c_intel i915 ghash_clmulni_intel i2c_algo_bit drm_kms_helper drm i2c_core video
CPU 3 
Pid: 2189, comm: gnome-terminal- Not tainted 3.9.5-301.fc19.x86_64 #1 LENOVO 2429BP3/2429BP3
RIP: 0010:[<ffffffff81164312>]  [<ffffffff81164312>] anon_vma_chain_link+0x12/0x40
RSP: 0018:ffff8802ca9b5d58  EFLAGS: 00010246
RAX: ffff8802e7355348 RBX: 00007f819f7bd000 RCX: ffff8802ca9b5fd8
RDX: ffff8802e7355340 RSI: 00007f819f7bd000 RDI: ffff88030a366170
RBP: ffff8802ca9b5d68 R08: 0000000000016d60 R09: ffffffff81165fd9
R10: 0000000000000006 R11: ffffffffffffffd0 R12: ffff8802e7355340
R13: ffff8802e7355340 R14: ffff8802e7355340 R15: 00007f819f7bd000
FS:  00007f819f8eea00(0000) GS:ffff88031e2c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f819f7bd000 CR3: 00000002ca949000 CR4: 00000000001427f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gnome-terminal- (pid: 2189, threadinfo ffff8802ca9b4000, task ffff8802cd402ee0)
Stack:
 ffff8802e7355340 ffff8802e7933500 ffff8802ca9b5db0 ffffffff81166012
 ffff8802f80e9580 ffff88030a366170 ffff8802f80e9508 0000000000000000
 ffff8802f80e9508 ffff88030a366170 ffff88030a366170 ffff8802ca9b5de8
Call Trace:
 [<ffffffff81166012>] anon_vma_clone+0x82/0x140
 [<ffffffff811660fe>] anon_vma_fork+0x2e/0x100
 [<ffffffff8105a536>] dup_mm+0x266/0x660
 [<ffffffff8105b33d>] copy_process.part.24+0x9dd/0x13d0
 [<ffffffff8105be2d>] do_fork+0xad/0x330
 [<ffffffff811b5450>] ? get_unused_fd_flags+0x30/0x40
 [<ffffffff8105c136>] sys_clone+0x16/0x20
 [<ffffffff8164eaf9>] stub_clone+0x69/0x90
 [<ffffffff8164e799>] ? system_call_fastpath+0x16/0x1b
Code: d5 b8 f4 ff ff ff 45 31 e4 eb cb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 d4 53 48 89 f3 <48> 89 3e 48 89 53 08 48 8b 57 78 48 8d 77 78 48 8d 7b 10 e8 66 
RIP  [<ffffffff81164312>] anon_vma_chain_link+0x12/0x40
 RSP <ffff8802ca9b5d58>
CR2: 00007f819f7bd000

Potential duplicate: bug 972715
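
For anyone reading the trace above: the value after "Oops:" is the x86 page-fault error code, printed in hex. Below is a minimal sketch that decodes its three low bits. The function name decode_oops_code is made up for illustration and is not part of any kernel tool.

```shell
#!/bin/sh
# Hypothetical helper (not a kernel tool): decode the low three bits of an
# x86 page-fault error code as printed after "Oops:" in a kernel trace.
decode_oops_code() {
    code=$((0x$1))    # the Oops line prints the code in hex, e.g. 0003

    # bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then
        cause="protection violation"
    else
        cause="page not present"
    fi

    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then
        access="write access"
    else
        access="read access"
    fi

    # bit 2: 0 = fault raised in kernel mode, 1 = in user mode
    if [ $((code & 4)) -ne 0 ]; then
        mode="user mode"
    else
        mode="kernel mode"
    fi

    echo "$cause, $access, $mode"
}

decode_oops_code 0003
```

For the 0003 in this report, that reads as a kernel-mode write hitting a page that is mapped but not writable. This fits CR2 holding a user-space address and the PTE value ending in 065 (present, writable bit clear).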

Comment 1 Jakub Hrozek 2013-06-21 13:13:29 UTC
Created attachment 763837
File: dmesg

Comment 2 Andrew Jones 2013-06-21 13:19:14 UTC
Do you have ksmtuned enabled?

Comment 3 Jakub Hrozek 2013-06-21 14:41:30 UTC
The ksm package is not even installed, so I guess I don't.

Comment 4 Andrew Jones 2013-06-24 06:42:38 UTC
(In reply to Jakub Hrozek from comment #3)
> The ksm package is not even installed, so I guess I don't.

There is no ksm package. ksmtuned is in the qemu package, qemu-common (at least for f18) to be precise. You can check whether it is active with:

systemctl status ksmtuned
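
A slightly fuller check, as a sketch only: ksmtuned merely tunes KSM, so it can also help to read the kernel's own KSM switch in /sys/kernel/mm/ksm/run (0 = stopped, 1 = running, 2 = stopped with merged pages being unmerged). The helper name ksm_state is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report the kernel's KSM state from its sysfs switch.
# An alternative file can be passed as $1, which is handy for trying it out.
ksm_state() {
    run_file="${1:-/sys/kernel/mm/ksm/run}"
    if [ ! -r "$run_file" ]; then
        echo "unavailable"    # KSM not built in, or sysfs not mounted
        return 0
    fi
    case "$(cat "$run_file")" in
        0) echo "stopped" ;;
        1) echo "running" ;;
        2) echo "unmerging" ;;
        *) echo "unknown" ;;
    esac
}

echo "KSM: $(ksm_state)"

# Ask systemd about the tuning daemon only where systemctl exists.
# is-active exits non-zero for an inactive or missing unit, hence "|| true".
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-active ksmtuned.service || true
fi
```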

Comment 5 Andrew Jones 2013-06-24 08:56:11 UTC
*** Bug 976996 has been marked as a duplicate of this bug. ***

Comment 6 Andrew Jones 2013-06-24 09:06:00 UTC
Based on information from bug 976996 this appears to be a regression since 3.9.2 and ksm doesn't need to be running.

Comment 7 Jakub Hrozek 2013-06-24 09:16:43 UTC
(In reply to Andrew Jones from comment #4)
> (In reply to Jakub Hrozek from comment #3)
> > The ksm package is not even installed, so I guess I don't.
> 
> There is no ksm package. ksmtuned is in the qemu package, qemu-common (at
> least for f18) to be precise. You can check if it's active with
> 

In F19 there seems to be a separate package. I didn't have the ksmtuned service installed at all, so I ran yum provides \*ksmtuned.service and got:

2:ksm-1.4.2-3.fc19.x86_64 : Kernel Samepage Merging services
Repo        : fedora
Matched from:
Filename    : /lib/systemd/system/ksmtuned.service


> systemctl status ksmtuned

Should say "systemctl status ksmtuned.service" I guess? :-) But it's not running, no.

Thank you for the investigation so far.

Comment 8 Pekka Pietikäinen 2013-06-28 09:38:51 UTC
Could be f18 -> f19 userspace changes (I have a hunch that it may relate to qxl kms stuff if so), but 

reboot   system boot  3.9.4-200.fc18.x Sat Jun  1 18:19 - 22:18 (13+03:58)

was fine and 

reboot   system boot  3.9.6-301.fc19.x Tue Jun 25 14:42 - 14:42  (00:00)

is when I started getting problems. Could someone try F18 with the F19 kernel to find out whether that works? (I'll try the reverse and report back.)

Comment 9 Pekka Pietikäinen 2013-07-02 18:25:57 UTC
OK, so I tested with a workload of logging in, starting up 5 VMs (win8, f19, 3x centos 6.4), and then randomly rebooting, stopping, and restarting them. The tests weren't THAT long, so "works" might mean "didn't manage to trigger it".

3.9.3-200.fc18.x86_64 and 3.9.3-301.fc19.x86_64 both seem to work (so probably not gcc 4.8 miscompiling something)

3.9.5-300.fc19 seems to work (and running now for longer-term tests)

3.9.5-301.fc19, 3.9.6-301.fc19 reproduce the anon_vma_chain_link+0x12 thing in minutes.

3.9.8-300.fc19 had a different backtrace, tg_something_something. Smells like the same bug. Didn't get logged, alas.

Now, it appears I'm hitting #973185 with -300, but at least I'm not crashing.

Attaching dmesgs (3.9.5-301 crash apparently didn't get logged on disk)

Comment 10 Pekka Pietikäinen 2013-07-02 18:27:46 UTC
Created attachment 767890
3.9.3 dmesg

Comment 11 Pekka Pietikäinen 2013-07-02 18:28:32 UTC
Created attachment 767891
3.9.5-300 dmesg

Comment 12 Pekka Pietikäinen 2013-07-02 18:29:15 UTC
Created attachment 767892
3.9.5-301 dmesg

Comment 13 Pekka Pietikäinen 2013-07-02 18:29:46 UTC
Created attachment 767893
3.9.6 dmesg

Comment 14 Pekka Pietikäinen 2013-07-03 12:15:43 UTC
modprobe vhost_net experimental_zcopytx=0 appears to make 3.9.8-300 work a-ok with my workload, so this was almost certainly introduced by 

- Add two patches to fix issues with vhost_net and macvlan (rhbz 954181)
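
The workaround above can be sketched as follows. The sysfs path follows the usual /sys/module/<module>/parameters/<name> layout. The helper name zcopytx_state and the modprobe.d file name are assumptions for illustration, not something taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: report whether vhost_net zerocopy TX is switched on.
# An alternative file can be passed as $1, which is handy for trying it out.
zcopytx_state() {
    param="${1:-/sys/module/vhost_net/parameters/experimental_zcopytx}"
    if [ ! -r "$param" ]; then
        echo "unavailable"    # module not loaded, or parameter not exposed
        return 0
    fi
    if [ "$(cat "$param")" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

echo "vhost_net zerocopy TX: $(zcopytx_state)"

# To apply the workaround by hand (as root, with no guest using vhost_net):
#   modprobe -r vhost_net
#   modprobe vhost_net experimental_zcopytx=0
#
# To keep it across reboots, an options line in a modprobe.d file should do,
# e.g. in /etc/modprobe.d/vhost-net.conf (file name chosen for illustration):
#   options vhost_net experimental_zcopytx=0
```

Reading the parameter back after reloading the module confirms that the option took effect before the crash workload is rerun.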

Comment 15 Josh Boyer 2013-07-03 13:12:42 UTC
For all of those having trouble with vhost and/or bridging in guests, please try the scratch build below when it completes.  It contains the patch from bug 880035 for the timer fix and the use-after-free fix for vhost-net backported to 3.9.8.

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569247

Comment 16 Josh Boyer 2013-07-03 14:22:47 UTC
Sigh.  Of course, it would help if I didn't typo the patch.  Anyway, here is a scratch build that should actually finish building:

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569571

Comment 17 Eric Paris 2013-07-03 14:59:43 UTC
*** Bug 972715 has been marked as a duplicate of this bug. ***

Comment 18 Eric Paris 2013-07-03 15:11:44 UTC
*** Bug 975691 has been marked as a duplicate of this bug. ***

Comment 19 Josh Boyer 2013-07-03 16:36:51 UTC
Third time is a charm.  This one actually looks like it built.  Sigh, sorry about that.

http://koji.fedoraproject.org/koji/taskinfo?taskID=5569631

Comment 20 Pekka Pietikäinen 2013-07-03 17:19:04 UTC
Running the same workload as before, the problem seems to be gone with the build from comment 19.

Comment 21 Adam Williamson 2013-07-03 19:32:07 UTC
Is the change going to be applied to Rawhide kernels? I'm on Rawhide.

Comment 22 Adam Williamson 2013-07-05 00:17:48 UTC
I gave the scratch build a bit of a run, tried some scp's from guests and some force resets, and haven't crashed the host yet. Looks good.

Comment 23 Josh Boyer 2013-07-05 13:07:55 UTC
I've applied the fixes to F17-rawhide now.  Should be in the next builds.

Comment 24 Fedora Update System 2013-07-05 19:04:15 UTC
kernel-3.9.9-201.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.9.9-201.fc18

Comment 25 Fedora Update System 2013-07-07 01:38:55 UTC
Package kernel-3.9.9-201.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.9.9-201.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-12530/kernel-3.9.9-201.fc18
then log in and leave karma (feedback).

Comment 26 Tomoaki Nakajima 2013-07-09 03:23:19 UTC
*** Bug 981919 has been marked as a duplicate of this bug. ***

Comment 27 Josh Boyer 2013-07-10 17:11:39 UTC
*** Bug 983047 has been marked as a duplicate of this bug. ***

Comment 28 Björn 'besser82' Esser 2013-07-11 14:09:24 UTC
*** Bug 981969 has been marked as a duplicate of this bug. ***

Comment 29 Fedora Update System 2013-07-11 22:15:41 UTC
kernel-3.9.9-302.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.9.9-302.fc19

Comment 30 Felipe van Schaik Willig 2013-07-12 00:08:09 UTC
Description of problem:
I had one CentOS VM running in virt-manager with KVM. I pressed the Super key and clicked on the dock to open Firefox. Then the system froze.

Version-Release number of selected component:
kernel

Additional info:
reporter:       libreport-2.1.5
cmdline:        BOOT_IMAGE=/vmlinuz-3.9.9-301.fc19.x86_64 root=/dev/mapper/ubuntu-root ro rd.lvm.lv=ubuntu/swap_1 rd.md=0 rd.dm=0 rd.luks.uuid=luks-e93a533a-bdf1-40da-8271-55e8f8a3a097 vconsole.font=latarcyrheb-sun16 vconsole.keymap=br-abnt2 rd.lvm.lv=ubuntu/root rhgb quiet LANG=en_US.UTF-8
kernel:         3.9.9-301.fc19.x86_64
runlevel:       N 5
type:           Kerneloops

Truncated backtrace:
BUG: unable to handle kernel paging request at 0000003d3dc25000
IP: [<ffffffff81164582>] anon_vma_chain_link+0x12/0x40
PGD 20f074067 PUD 20ec4b067 PMD 1f1aba067 PTE 80000001ee5f6065
Oops: 0003 [#1] SMP 
Modules linked in: vhost_net macvtap macvlan tun fuse ebtable_nat xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack bridge stp llc rfcomm ebtable_filter ebtables ip6table_filter bnep ip6_tables arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm iTCO_wdt snd_page_alloc snd_timer acpi_cpufreq mperf coretemp kvm_intel kvm tg3 ptp btusb bluetooth dell_wmi sparse_keymap ppdev iwlwifi iTCO_vendor_support joydev snd dell_laptop pps_core dcdbas sdhci_pci soundcore sdhci mmc_core microcode cfg80211 intel_ips rfkill wmi parport_pc parport lpc_ich i2c_i801 mfd_core mei uinput dm_crypt i915 i2c_algo_bit drm_kms_helper crc32c_intel firewire_ohci drm firewire_core crc_itu_t i2c_core video
CPU 1 
Pid: 1509, comm: gnome-shell Not tainted 3.9.9-301.fc19.x86_64 #1 Dell Inc. Latitude E5410/05C67D
RIP: 0010:[<ffffffff81164582>]  [<ffffffff81164582>] anon_vma_chain_link+0x12/0x40
RSP: 0018:ffff880206357da0  EFLAGS: 00010246
RAX: ffff8801f19f8c88 RBX: 0000003d3dc25000 RCX: ffff880206357fd8
RDX: ffff8801e80a1f00 RSI: 0000003d3dc25000 RDI: ffff8801f1a63730
RBP: ffff880206357db0 R08: 0000000000016d60 R09: ffffffff811663b1
R10: 0000000000000047 R11: ffff8801f1998000 R12: ffff8801e80a1f00
R13: ffff880206351958 R14: ffff8801f1a63730 R15: 0000003d3dc25000
FS:  00007f832e6caa00(0000) GS:ffff88021bc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003d3dc25000 CR3: 00000001ecd63000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gnome-shell (pid: 1509, threadinfo ffff880206356000, task ffff88020e174650)
Stack:
 ffff8801e80a1f00 0000000000000000 ffff880206357de8 ffffffff811663eb
 ffff880206351958 ffff88020df24280 ffff8801f1a63398 0000000000000002
 ffff8801f1a63730 ffff880206357e58 ffffffff8105a566 ffff88020df242e8
Call Trace:
 [<ffffffff811663eb>] anon_vma_fork+0xab/0x100
 [<ffffffff8105a566>] dup_mm+0x266/0x660
 [<ffffffff8105b36d>] copy_process.part.24+0x9dd/0x13d0
 [<ffffffff8105be5d>] do_fork+0xad/0x330
 [<ffffffff811b56c0>] ? get_unused_fd_flags+0x30/0x40
 [<ffffffff8105c166>] sys_clone+0x16/0x20
 [<ffffffff8164f679>] stub_clone+0x69/0x90
 [<ffffffff8164f319>] ? system_call_fastpath+0x16/0x1b
Code: d5 b8 f4 ff ff ff 45 31 e4 eb cb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 41 54 49 89 d4 53 48 89 f3 <48> 89 3e 48 89 53 08 48 8b 57 78 48 8d 77 78 48 8d 7b 10 e8 36 
RIP  [<ffffffff81164582>] anon_vma_chain_link+0x12/0x40
 RSP <ffff880206357da0>
CR2: 0000003d3dc25000

Comment 31 Fedora Update System 2013-07-12 03:10:13 UTC
kernel-3.9.9-201.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 32 Josh Boyer 2013-07-14 01:12:55 UTC
*** Bug 982773 has been marked as a duplicate of this bug. ***

Comment 33 Fedora Update System 2013-07-14 03:29:44 UTC
kernel-3.9.9-302.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 34 Fedora Update System 2013-07-14 11:21:54 UTC
kernel-3.9.10-100.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.9.10-100.fc17

Comment 35 Fedora Update System 2013-07-18 06:09:30 UTC
kernel-3.9.10-100.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

