Bug 480822
Summary: | pvmmu causes net driver oops during guest install (DEBUG_PAGEALLOC, pvmmu, missing flush_lazy_mmu_mode() in change_page_attr()) | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Mark McLoughlin <markmc>
Component: | kvm | Assignee: | Marcelo Tosatti <mtosatti>
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | medium | Docs Contact: |
Priority: | low | |
Version: | rawhide | CC: | berrange, bugproxy, clalance, dcantrell, gcosta, jlaska, markmc, mtosatti, quintela, rolandd, virt-maint
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 480850 | Environment: |
Last Closed: | 2009-02-13 18:45:25 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 476774, 480594 | |
Attachments: |
|
Similar reports dating back to 2.6.26:

http://www.kerneloops.org/oops.php?number=39986
http://www.kerneloops.org/oops.php?number=38367

(gdb) l *0xffffffff810d5b8c
0xffffffff810d5b8c is in new_slab (mm/slub.c:1139).
warning: Source file is more recent than executable.
1134            __SetPageSlubDebug(page);
1135
1136    start = page_address(page);
1137
1138    if (unlikely(s->flags & SLAB_POISON))
1139            memset(start, POISON_INUSE, PAGE_SIZE << compound_order(page));
1140
1141    last = start;
1142    for_each_object(p, s, start, page->objects) {
1143            setup_object(s, page, last);
(gdb) x/i 0xffffffff810d5b8c
0xffffffff810d5b8c <new_slab+353>: rep stos %al,%es:(%rdi)

PTE flags == 0x160 == ACCESSED|DIRTY|GLOBAL, i.e. PRESENT (bit zero) is not set, hence the page fault.

Unfortunately I don't think this is something we can get fixed in time for alpha, and it's not a major wide case. Moving along to Beta.

Created attachment 330358 [details]
oops.txt
Just observed a similar oops during an install
(In reply to comment #5)
> Just observed a similar oops during an install

this is with f11-alpha.2, kernel is 2.6.29-0.53.rc2.git1.f11

Bug #480850 is similar, but with rtl8139 rather than virtio_net

*** Bug 480850 has been marked as a duplicate of this bug. ***

Created attachment 330663 [details]
e1000-oops
Oops while installing using e1000
James, Mark,

Can you please disable paravirt mmu for a test? For convenience I've attached a library compiled on FC10 x86_64 so that you can "LD_PRELOAD=nopvmmu.so qemu-kvm parameters" instead of recompiling qemu-kvm.

All reports I've seen so far seem to be under KVM, so there is the possibility that the kernel pagetable is not being set up correctly.

Created attachment 330902 [details]
ioctl(CHECK_EXTENSION, KVM_CAP_PV_MMU) returns false
(In reply to comment #10)
> Can you please disable paravirt mmu for a test? For convenience i've attached a
> library compiled on FC10 x86_64 so that you can "LD_PRELOAD=nopvmmu.so qemu-kvm
> parameters" instead of recompiling qemu-kvm.

Sneaky! I like it :-)

I'm failing to reproduce now even without nopvmmu.so, though. I've just done installs of 20090205 rawhide without both virtio_net and rtl8139. No oops.

Have they disabled CONFIG_DEBUG_PAGEALLOC recently?

I've successfully completed one install using `LD_PRELOAD=/tmp/nopvmmu.so qemu-kvm ...`

Testing again on the same rawhide without using nopvmmu.so and I continue to panic:

http://fpaste.org/paste/2945

Thanks James, I think that confirms Marcelo's suspicion that it is a pvmmu problem.

Attaching a patch that should fix it. You can also find a kernel rpm (currently building) at:

http://koji.fedoraproject.org/koji/taskinfo?taskID=1115980

Mark, I'm not sure what the best way is for James to test this?

Created attachment 331361 [details]
only batch user pte updates
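The patch title and the bug summary both point at batched ("lazy") page-table updates. What follows is a toy Python model of that failure mode, not kernel code: the names `ToyPageTable`, `enter_lazy`, `set_pte`, `flush` and `map_and_touch` are hypothetical and do not correspond to any kernel or KVM interface. It assumes only that an update queued in lazy mode is invisible until the queue is flushed, so a caller that marks a page present and touches it straight away faults unless a flush happens in between.

```python
PRESENT = 0x001  # bit zero of a page-table entry


class ToyPageTable:
    """Toy page table with an optional lazy (batched) update mode."""

    def __init__(self):
        self.entries = {}   # page number -> flags visible to the "hardware"
        self.pending = []   # (page, flags) updates queued while in lazy mode
        self.lazy = False

    def enter_lazy(self):
        self.lazy = True

    def flush(self):
        """Apply every queued update in order and empty the queue."""
        for page, flags in self.pending:
            self.entries[page] = flags
        self.pending.clear()

    def leave_lazy(self):
        self.flush()
        self.lazy = False

    def set_pte(self, page, flags):
        """Queue the update in lazy mode, otherwise apply it immediately."""
        if self.lazy:
            self.pending.append((page, flags))
        else:
            self.entries[page] = flags

    def touch(self, page):
        """Return True if the access succeeds, False if it would fault."""
        return bool(self.entries.get(page, 0) & PRESENT)


def map_and_touch(pt, page, flush_first):
    """Mark a page present, optionally flush, then access it."""
    pt.set_pte(page, PRESENT)
    if flush_first:
        pt.flush()
    return pt.touch(page)


pt = ToyPageTable()
pt.enter_lazy()
# Hypothetical page numbers, chosen only for illustration.
faulted_without_flush = not map_and_touch(pt, 0x80C8, flush_first=False)
ok_with_flush = map_and_touch(pt, 0x80C9, flush_first=True)
pt.leave_lazy()

print("fault without flush:", faulted_without_flush)
print("access ok with flush:", ok_with_flush)
```

In this model the first access fails because the PRESENT update is still sitting in the queue, which mirrors the symptom in the oops (a PTE without PRESENT set). Whether the real fix flushes the queue or avoids batching kernel mappings altogether is a matter for the attached and upstream patches, not something this sketch settles.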
Is this fix intended for the guest or the host? If the guest, I _think_ I can re-build the rawhide install images using the kernel you've referenced above using the scripts/upd-kernel utility provided by anaconda.

http://git.fedorahosted.org/git/?p=anaconda.git;a=blob;f=scripts/upd-kernel;h=30871135eff0b4ad87bad8e0c84498a46bbf05d0;hb=HEAD

I'll take a look once the build finishes. Thanks.

(In reply to comment #18)
> Is this fix intended for the guest or the host? If the guest, I _think_ I can
> re-build the rawhide install images using the kernel you've referenced above
> using the scripts/upd-kernel utility provided by anaconda.
>
> http://git.fedorahosted.org/git/?p=anaconda.git;a=blob;f=scripts/upd-kernel;h=30871135eff0b4ad87bad8e0c84498a46bbf05d0;hb=HEAD
>
> I'll take a look once the build finishes. Thanks.

Yep, the patch is for the guest kernel. It'd be awesome if you could rebuild the install images and test the fix for Marcelo.

upd-kernel is what you want alright, but last time I used it I had to fix it first, so let me know if you have any problems.

Test results so far ... not entirely what I expected.

[PASSED] - rawhide (w/ updated kernel from comment#16)
> I've done 6 kvm installs using the upd-kernel script and the supplied kernel build (x86_64).

[PASSED] - rawhide (w/ stock rawhide kernel)
> Oddly enough, I also am *not* hitting the reported problem. I'm confused here ... timing?

[FAILED] - F-11-Alpha-x86_64
> confirmed that I still hit the reported bug

I'm unclear why unmodified rawhide no longer fails for me. I'll keep testing and double-checking my work.

James,

What are the kernel versions for stock rawhide and F-11-Alpha-x86_64?
rawhide-x86_64 - kernel-2.6.29-0.99.rc4.git1.fc11.x86_64
F-11-Alpha-x86_64 - kernel-2.6.29-0.66.rc3.fc11.x86_64

Okay, some discussion upstream on Marcelo's patch has yielded a patch from Jeremy Fitzhardinge:

http://patchwork.kernel.org/patch/6531/

I imagine Jeremy will push this to ingo/linus soon for 2.6.29

Ingo has it queued up in tip:x86/urgent:

http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commitdiff;h=9cf161a01f

Okay, it's merged in Linus's tree now too. Since James can't reproduce this with current rawhide, and the fix will be in rawhide pretty soon anyway, I'm going to close this.

James - re-open if you see it again, of course.

*** Bug 455097 has been marked as a duplicate of this bug. ***

*** Bug 491631 has been marked as a duplicate of this bug. ***

Pretty sure this is a guest pvmmu issue in the F11Alpha kernel which has since been fixed. Please re-open if you see it again in the beta

See also:

http://www.mail-archive.com/kvm@vger.kernel.org/msg10312.html

*** This bug has been marked as a duplicate of 480822 ***

Created attachment 336609 [details]
total boot log
*** Bug 480929 has been marked as a duplicate of this bug. ***

------- Comment From pavan.naregundi.com 2009-04-09 01:34 EDT-------
Could not reproduce this issue in F11beta.

Thanks
Pavan

------- Comment From anoop.vijayan.com 2009-04-09 02:21 EDT-------
(In reply to comment #15)
> Could not reproduce this issue in F11beta.
>
> Thanks
> Pavan
>

Thanks for verifying .. Closing.. |
Created attachment 329497 [details]
virtio-oops

Tried an x86_64 KVM guest install on x86_64 host, passing "console=ttyS0 vnc" to the guest and got this virtio_net oops:

BUG: unable to handle kernel paging request at ffff8800080c8000
IP: [<ffffffff810d5b8c>] new_slab+0x161/0x1d5
PGD 202063 PUD 206063 PMD 140067 PTE 80c8160
...
Call Trace:
 <IRQ> <0> [<ffffffff810d61a9>] __slab_alloc+0x246/0x3b5
 [<ffffffff812e30b5>] ? __netdev_alloc_skb+0x31/0x4d
 [<ffffffff810d7130>] ? __kmalloc_node_track_caller+0x91/0x136
 [<ffffffff810d7171>] __kmalloc_node_track_caller+0xd2/0x136
 [<ffffffff812e30b5>] ? __netdev_alloc_skb+0x31/0x4d
 [<ffffffff812e24dc>] __alloc_skb+0x6f/0x130
 [<ffffffff812e30b5>] __netdev_alloc_skb+0x31/0x4d
 [<ffffffffa019861a>] try_fill_recv_maxbufs+0x5a/0x20d [virtio_net]
 [<ffffffffa01987ef>] try_fill_recv+0x22/0x17e [virtio_net]
 [<ffffffff812e8cb9>] ? netif_receive_skb+0x491/0x4a3
 [<ffffffff812e894c>] ? netif_receive_skb+0x124/0x4a3
 [<ffffffffa019945a>] virtnet_poll+0x57d/0x5eb [virtio_net]
...

Full oops attached
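
The page-table line of the oops can be decoded mechanically. The Python sketch below is an illustration only; the helper names `parse_oops_entries` and `decode_pte` are made up for this example. It uses the standard x86-64 flag bit positions for the low nine bits of an entry and reproduces the reading given in the comments: the PTE value 80c8160 carries flags 0x160, that is ACCESSED, DIRTY and GLOBAL without PRESENT.

```python
# Low flag bits of an x86-64 page-table entry.
# Bit 7 means PAT in a PTE and PSE (large page) in upper-level entries.
PTE_FLAGS = [
    (0x001, "PRESENT"),
    (0x002, "RW"),
    (0x004, "USER"),
    (0x008, "PWT"),
    (0x010, "PCD"),
    (0x020, "ACCESSED"),
    (0x040, "DIRTY"),
    (0x080, "PAT/PSE"),
    (0x100, "GLOBAL"),
]

# Physical frame address occupies bits 12..51 of an entry.
FRAME_MASK = 0x000FFFFFFFFFF000


def decode_pte(value):
    """Split a page-table entry into (frame address, list of flag names)."""
    flags = [name for bit, name in PTE_FLAGS if value & bit]
    return value & FRAME_MASK, flags


def parse_oops_entries(line):
    """Turn 'PGD 202063 PUD 206063 ...' into {'PGD': 0x202063, ...}."""
    tokens = line.split()
    return {tokens[i]: int(tokens[i + 1], 16) for i in range(0, len(tokens), 2)}


entries = parse_oops_entries("PGD 202063 PUD 206063 PMD 140067 PTE 80c8160")
for level, value in entries.items():
    frame, flags = decode_pte(value)
    print(f"{level}: frame={frame:#x} flags={'|'.join(flags)}")

pte_frame, pte_flags = decode_pte(entries["PTE"])
fault_address = 0xFFFF8800080C8000  # from the "unable to handle" line
```

The upper-level entries all decode with PRESENT set, so only the final PTE is unmapped. The PTE's frame address, 0x80c8000, also matches the low bits of the faulting address; that would be consistent with a direct-map address if the kernel direct map starts at 0xffff880000000000, which is an assumption about this kernel build rather than something stated in the report.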