Bug 480822

Summary: pvmmu causes net driver oops during guest install (DEBUG_PAGEALLOC, pvmmu, missing flush_lazy_mmu_mode() in change_page_attr())
Product: Fedora
Component: kvm
Version: rawhide
Reporter: Mark McLoughlin <markmc>
Assignee: Marcelo Tosatti <mtosatti>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: berrange, bugproxy, clalance, dcantrell, gcosta, jlaska, markmc, mtosatti, quintela, rolandd, virt-maint
Status: CLOSED RAWHIDE
Severity: medium
Priority: low
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of:
: 480850 (view as bug list) Environment:
Last Closed: 2009-02-13 18:45:25 UTC
Bug Blocks: 476774, 480594
Attachments:
virtio-oops (flags: none)
oops.txt (flags: none)
e1000-oops (flags: none)
ioctl(CHECK_EXTENSION, KVM_CAP_PV_MMU) returns false (flags: none)
only batch user pte updates (flags: none)
total boot log (flags: none)

Description Mark McLoughlin 2009-01-20 18:22:43 UTC
Created attachment 329497
virtio-oops

Tried an x86_64 KVM guest install on an x86_64 host, passing "console=ttyS0 vnc" to the guest, and got this virtio_net oops:

BUG: unable to handle kernel paging request at ffff8800080c8000
IP: [<ffffffff810d5b8c>] new_slab+0x161/0x1d5
PGD 202063 PUD 206063 PMD 140067 PTE 80c8160
...
Call Trace:
 <IRQ> <0> [<ffffffff810d61a9>] __slab_alloc+0x246/0x3b5
 [<ffffffff812e30b5>] ? __netdev_alloc_skb+0x31/0x4d
 [<ffffffff810d7130>] ? __kmalloc_node_track_caller+0x91/0x136
 [<ffffffff810d7171>] __kmalloc_node_track_caller+0xd2/0x136
 [<ffffffff812e30b5>] ? __netdev_alloc_skb+0x31/0x4d
 [<ffffffff812e24dc>] __alloc_skb+0x6f/0x130
 [<ffffffff812e30b5>] __netdev_alloc_skb+0x31/0x4d
 [<ffffffffa019861a>] try_fill_recv_maxbufs+0x5a/0x20d [virtio_net]
 [<ffffffffa01987ef>] try_fill_recv+0x22/0x17e [virtio_net]
 [<ffffffff812e8cb9>] ? netif_receive_skb+0x491/0x4a3
 [<ffffffff812e894c>] ? netif_receive_skb+0x124/0x4a3
 [<ffffffffa019945a>] virtnet_poll+0x57d/0x5eb [virtio_net]
...

Full oops attached

Comment 1 Mark McLoughlin 2009-01-20 22:16:49 UTC
Similar reports date back to 2.6.26:

http://www.kerneloops.org/oops.php?number=39986
http://www.kerneloops.org/oops.php?number=38367

Comment 2 Mark McLoughlin 2009-01-20 23:43:39 UTC
(gdb) l *0xffffffff810d5b8c
0xffffffff810d5b8c is in new_slab (mm/slub.c:1139).
warning: Source file is more recent than executable.
1134			__SetPageSlubDebug(page);
1135	
1136		start = page_address(page);
1137	
1138		if (unlikely(s->flags & SLAB_POISON))
1139			memset(start, POISON_INUSE, PAGE_SIZE << compound_order(page));
1140	
1141		last = start;
1142		for_each_object(p, s, start, page->objects) {
1143			setup_object(s, page, last);
(gdb) x/i 0xffffffff810d5b8c
0xffffffff810d5b8c <new_slab+353>:	rep stos %al,%es:(%rdi)
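The faulting address in the oops is where the `rep stos` (the compiled memset() that poisons the new slab) was writing, and page_address() returns a direct-map address. As a sketch, assuming the x86_64 direct mapping starts at 0xffff880000000000 as in kernels of this era (2.6.27 onward, before KASLR), the address can be converted back to a physical frame and compared with the frame field of the PTE printed in the oops (PTE 80c8160). The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed: base of the x86_64 kernel direct mapping (2.6.27+, pre-KASLR). */
#define DIRECT_MAP_BASE 0xffff880000000000ULL
#define PAGE_SHIFT_4K   12
#define PTE_FRAME_MASK  0x000ffffffffff000ULL   /* PTE bits 12..51 */

/* Direct-map virtual address -> physical address (what __pa() does for it). */
static uint64_t direct_map_phys(uint64_t vaddr)
{
    return vaddr - DIRECT_MAP_BASE;
}

/* Page frame number backing a direct-map virtual address. */
static uint64_t direct_map_pfn(uint64_t vaddr)
{
    return direct_map_phys(vaddr) >> PAGE_SHIFT_4K;
}

/* Physical address of the 4 KiB frame a PTE points at. */
static uint64_t pte_frame(uint64_t pte)
{
    return pte & PTE_FRAME_MASK;
}
```

Both sides come out as 0x80c8000 (pfn 0x80c8), consistent with the PTE in the oops being the entry for the page the memset() was poisoning.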

Comment 3 Mark McLoughlin 2009-01-20 23:56:41 UTC
PTE flags == 0x160 == ACCESSED|DIRTY|GLOBAL

i.e. PRESENT (bit 0) is not set, hence the page fault.
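Decoding the low 12 bits by hand is easy to get wrong. A small sketch using the architectural x86 PTE bit positions for 4 KiB pages (the PTE_* names are illustrative, not the kernel's _PAGE_* macros) reproduces the reading above:

```c
#include <stdint.h>
#include <string.h>

/* Architectural x86 PTE flag bits (4 KiB pages); names are illustrative. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_PWT      (1ULL << 3)
#define PTE_PCD      (1ULL << 4)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_PAT      (1ULL << 7)
#define PTE_GLOBAL   (1ULL << 8)

static const struct {
    uint64_t bit;
    const char *name;
} pte_flag_table[] = {
    { PTE_PRESENT, "PRESENT" }, { PTE_RW, "RW" },       { PTE_USER, "USER" },
    { PTE_PWT, "PWT" },         { PTE_PCD, "PCD" },     { PTE_ACCESSED, "ACCESSED" },
    { PTE_DIRTY, "DIRTY" },     { PTE_PAT, "PAT" },     { PTE_GLOBAL, "GLOBAL" },
};

/* Return the set flags of pte as "NAME|NAME|...".
   Uses a static buffer, so the result is overwritten by the next call. */
static const char *pte_flag_names(uint64_t pte)
{
    static char buf[64];    /* all nine names plus separators fit in 50 bytes */
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof pte_flag_table / sizeof pte_flag_table[0]; i++) {
        if (!(pte & pte_flag_table[i].bit))
            continue;
        if (buf[0] != '\0')
            strcat(buf, "|");
        strcat(buf, pte_flag_table[i].name);
    }
    return buf;
}

/* The CPU only uses the translation if PRESENT (bit 0) is set. */
static int pte_present(uint64_t pte)
{
    return (pte & PTE_PRESENT) != 0;
}
```

For the PTE in the oops, pte_flag_names(0x80c8160) yields ACCESSED|DIRTY|GLOBAL and pte_present() returns 0.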

Comment 4 Jesse Keating 2009-01-21 01:05:15 UTC
Unfortunately, I don't think this is something we can get fixed in time for Alpha, and it's not a major, widespread case. Moving it along to Beta.

Comment 5 Mark McLoughlin 2009-01-29 14:45:11 UTC
Created attachment 330358
oops.txt

Just observed a similar oops during an install

Comment 6 Mark McLoughlin 2009-01-29 14:50:59 UTC
(In reply to comment #5)

> Just observed a similar oops during an install

This is with f11-alpha.2; the kernel is 2.6.29-0.53.rc2.git1.f11.

Comment 7 Mark McLoughlin 2009-02-02 18:07:35 UTC
Bug #480850 is similar, but with rtl8139 rather than virtio_net

Comment 8 James Laska 2009-02-02 18:55:00 UTC
*** Bug 480850 has been marked as a duplicate of this bug. ***

Comment 9 James Laska 2009-02-02 18:58:43 UTC
Created attachment 330663
e1000-oops

Oops while installing with e1000.

Comment 10 Marcelo Tosatti 2009-02-04 18:32:02 UTC
James, Mark,

Can you please disable paravirt mmu for a test? For convenience, I've attached a library compiled on FC10 x86_64 so that you can run "LD_PRELOAD=nopvmmu.so qemu-kvm parameters" instead of recompiling qemu-kvm.

All reports I've seen so far seem to be under KVM, so there is the possibility that the kernel pagetable is not being set up correctly.

Comment 11 Marcelo Tosatti 2009-02-04 18:33:46 UTC
Created attachment 330902
ioctl(CHECK_EXTENSION, KVM_CAP_PV_MMU) returns false
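The source of attachment 330902 isn't included in this bug. As a hypothetical sketch of how such an LD_PRELOAD shim can work, an interposed ioctl() can answer KVM_CHECK_EXTENSION for KVM_CAP_PV_MMU with 0 (not supported) and forward every other call to the real libc ioctl() found via dlsym(RTLD_NEXT, ...). The request and capability values are copied from <linux/kvm.h> as understood here (an assumption; check them against your headers), and the sketch assumes glibc's ioctl() prototype.

```c
#define _GNU_SOURCE                 /* exposes RTLD_NEXT in <dlfcn.h> */
#include <dlfcn.h>
#include <errno.h>
#include <stdarg.h>
#include <sys/ioctl.h>

/* Values as in <linux/kvm.h>; defined locally so the sketch builds without it. */
#ifndef KVM_CHECK_EXTENSION
#define KVM_CHECK_EXTENSION 0xAE03UL    /* _IO(KVMIO, 0x03) with KVMIO == 0xAE */
#endif
#ifndef KVM_CAP_PV_MMU
#define KVM_CAP_PV_MMU 13
#endif

typedef int (*ioctl_fn)(int, unsigned long, ...);

/* Hypothetical nopvmmu-style shim: claim the PV MMU capability is absent and
   forward every other ioctl to the next definition (normally libc's). */
int ioctl(int fd, unsigned long request, ...)
{
    static ioctl_fn real_ioctl;
    va_list ap;
    void *arg;

    /* ioctl passes at most one extra argument; fetch it generically. */
    va_start(ap, request);
    arg = va_arg(ap, void *);
    va_end(ap);

    /* KVM_CHECK_EXTENSION takes the capability number directly; 0 = unsupported.
       Only the low int is compared, since callers pass the capability as an int. */
    if (request == KVM_CHECK_EXTENSION && (int)(long)arg == KVM_CAP_PV_MMU)
        return 0;

    if (!real_ioctl) {
        real_ioctl = (ioctl_fn)dlsym(RTLD_NEXT, "ioctl");
        if (!real_ioctl) {
            errno = ENOSYS;
            return -1;
        }
    }
    return real_ioctl(fd, request, arg);
}
```

Built with something like `gcc -shared -fPIC -o nopvmmu.so nopvmmu.c -ldl` (the file names are hypothetical), it would be used as in comment 10: `LD_PRELOAD=./nopvmmu.so qemu-kvm ...`. Interposing at the libc boundary avoids rebuilding qemu-kvm, which is the convenience comment 10 is after.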

Comment 12 Mark McLoughlin 2009-02-05 15:31:09 UTC
(In reply to comment #10)

> Can you please disable paravirt mmu for a test? For convenience i've attached a
> library compiled on FC10 x86_64 so that you can "LD_PRELOAD=nopvmmu.so qemu-kvm
> parameters" instead of recompiling qemu-kvm.

Sneaky! I like it :-)

I'm failing to reproduce now even without nopvmmu.so, though. I've just done installs of 20090205 rawhide with both virtio_net and rtl8139, without nopvmmu.so. No oops.

Comment 13 Marcelo Tosatti 2009-02-05 16:16:47 UTC
Have they disabled CONFIG_DEBUG_PAGEALLOC recently?
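For context on why CONFIG_DEBUG_PAGEALLOC matters here: with it enabled, pages are unmapped from the kernel's direct mapping when freed and mapped again when allocated, so kernel PTEs change constantly. A sketch of that bit arithmetic, assuming (as in arch/x86/mm/pageattr.c of this era) that unmapping clears PRESENT and RW, and assuming a mapped kernel data page carries PRESENT|RW|ACCESSED|DIRTY|GLOBAL (NX ignored). The function names are hypothetical.

```c
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_GLOBAL   (1ULL << 8)

/* Assumed flags of a mapped kernel data page (NX and frame bits left out). */
#define KERNEL_DATA_FLAGS \
    (PTE_PRESENT | PTE_RW | PTE_ACCESSED | PTE_DIRTY | PTE_GLOBAL)

/* DEBUG_PAGEALLOC-style unmap on free: clear PRESENT and RW (assumed mask). */
static uint64_t debug_pagealloc_unmap(uint64_t pte)
{
    return pte & ~(PTE_PRESENT | PTE_RW);
}

/* DEBUG_PAGEALLOC-style map on allocation: set PRESENT and RW again. */
static uint64_t debug_pagealloc_map(uint64_t pte)
{
    return pte | PTE_PRESENT | PTE_RW;
}
```

Under those assumptions, unmapping frame 0x80c8000 gives exactly the 80c8160 PTE from the oops, which fits a page that was freed and unmapped but whose re-mapping on allocation had not taken effect.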

Comment 14 James Laska 2009-02-05 17:30:57 UTC
I've successfully completed one install using `LD_PRELOAD=/tmp/nopvmmu.so qemu-kvm ...`

Testing again on the same rawhide without nopvmmu.so, I continue to hit the panic: http://fpaste.org/paste/2945

Comment 15 Mark McLoughlin 2009-02-05 17:38:28 UTC
Thanks, James. I think that confirms Marcelo's suspicion that it is a pvmmu problem.

Comment 16 Marcelo Tosatti 2009-02-09 20:34:48 UTC
Attaching a patch that should fix it. You can also find a kernel rpm (currently building) at:

http://koji.fedoraproject.org/koji/taskinfo?taskID=1115980

Mark, I'm not sure what the best way is for James to test this.

Comment 17 Marcelo Tosatti 2009-02-09 20:36:47 UTC
Created attachment 331361
only batch user pte updates

Comment 18 James Laska 2009-02-09 20:47:31 UTC
Is this fix intended for the guest or the host? If the guest, I _think_ I can rebuild the rawhide install images with the kernel you referenced above, using the scripts/upd-kernel utility provided by anaconda.

http://git.fedorahosted.org/git/?p=anaconda.git;a=blob;f=scripts/upd-kernel;h=30871135eff0b4ad87bad8e0c84498a46bbf05d0;hb=HEAD

I'll take a look once the build finishes.  Thanks.

Comment 19 Mark McLoughlin 2009-02-09 21:23:00 UTC
(In reply to comment #18)
> Is this fix intended for the guest or the host?  If the guest, I _think_ I can
> re-build the rawhide install images using the kernel you've referenced about
> using the scripts/upd-kernel utility provided by anaconda.
> 
> http://git.fedorahosted.org/git/?p=anaconda.git;a=blob;f=scripts/upd-kernel;h=30871135eff0b4ad87bad8e0c84498a46bbf05d0;hb=HEAD
> 
> I'll take a look once the build finishes.  Thanks.

Yep, the patch is for the guest kernel.

It'd be awesome if you could rebuild the install images and test the fix for Marcelo. upd-kernel is what you want, all right, but the last time I used it I had to fix it first, so let me know if you have any problems.

Comment 20 James Laska 2009-02-10 16:12:11 UTC
Test results so far ... not entirely what I expected.

[PASSED] - rawhide (w/ updated kernel from comment#16)
  > I've done 6 kvm installs using the upd-kernel script and the supplied kernel build (x86_64).

[PASSED] - rawhide (w/ stock rawhide kernel)
  > Oddly enough, I also am *not* hitting the reported problem.  I'm confused here ... timing?

[FAILED] - F-11-Alpha-x86_64
  > confirmed that I still hit the reported bug

I'm unclear why unmodified rawhide no longer fails for me. I'll keep testing and double-checking my work.

Comment 21 Marcelo Tosatti 2009-02-10 18:25:42 UTC
James,

What are the kernel versions for stock rawhide and F-11-Alpha-x86_64?

Comment 22 James Laska 2009-02-10 18:36:17 UTC
rawhide-x86_64 - kernel-2.6.29-0.99.rc4.git1.fc11.x86_64
F-11-Alpha-x86_64 - kernel-2.6.29-0.66.rc3.fc11.x86_64

Comment 23 Mark McLoughlin 2009-02-11 19:45:05 UTC
Okay, some discussion upstream on Marcelo's patch has yielded a patch from Jeremy Fitzhardinge:

  http://patchwork.kernel.org/patch/6531/

I imagine Jeremy will push this to Ingo/Linus soon for 2.6.29.
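A toy model (not KVM or kernel code) of the failure mode named in the bug summary: while a lazy-MMU batch is open, PTE writes are queued and only reach the page table when the batch is flushed. If change_page_attr() re-maps a kernel page into that queue and does not flush, an access made right afterwards still sees the old non-present PTE. The sketch compares three hypothetical policies: batching everything with no flush, batching only user PTE updates (loosely after the title of Marcelo's attachment 331361), and flushing at the end of change_page_attr() (as the summary describes). All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1ULL
#define PTE_RW      0x2ULL
#define MAX_PENDING 16

/* Hypothetical toy MMU: a tiny page table plus a queue of deferred writes. */
struct toy_mmu {
    uint64_t ptes[8];                          /* entries actually in use */
    struct { size_t idx; uint64_t val; } pending[MAX_PENDING];
    size_t npending;
    int lazy;                                  /* 1 while a batch is open */
};

/* Apply every queued write, in order, and empty the queue. */
static void toy_flush(struct toy_mmu *m)
{
    for (size_t i = 0; i < m->npending; i++)
        m->ptes[m->pending[i].idx] = m->pending[i].val;
    m->npending = 0;
}

/* Write a PTE: queued while lazy, immediate otherwise. */
static void toy_set_pte(struct toy_mmu *m, size_t idx, uint64_t val)
{
    if (!m->lazy) {
        m->ptes[idx] = val;
        return;
    }
    if (m->npending == MAX_PENDING)
        toy_flush(m);                          /* queue full: drain it first */
    m->pending[m->npending].idx = idx;
    m->pending[m->npending].val = val;
    m->npending++;
}

enum toy_policy { BATCH_ALL_NO_FLUSH, BATCH_USER_ONLY, FLUSH_AFTER_CPA };

/* Re-map kernel PTE 0 (set PRESENT|RW) while a batch is open and report
   whether an access made right afterwards would find the page present. */
static int toy_kernel_remap_visible(enum toy_policy p, uint64_t unmapped_pte)
{
    struct toy_mmu m = { .lazy = 1 };
    m.ptes[0] = unmapped_pte;

    uint64_t mapped = unmapped_pte | PTE_PRESENT | PTE_RW;
    if (p == BATCH_USER_ONLY) {
        m.lazy = 0;                            /* kernel PTEs bypass the batch */
        toy_set_pte(&m, 0, mapped);
        m.lazy = 1;
    } else {
        toy_set_pte(&m, 0, mapped);
    }
    if (p == FLUSH_AFTER_CPA)
        toy_flush(&m);                         /* flush before returning */

    return (m.ptes[0] & PTE_PRESENT) != 0;
}
```

Only the first policy leaves the page non-present; either of the other two makes the kernel mapping visible immediately, at the cost of batching less or flushing more often.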

Comment 24 Mark McLoughlin 2009-02-11 20:23:17 UTC
Ingo has it queued up in tip:x86/urgent:

http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commitdiff;h=9cf161a01f

Comment 25 Mark McLoughlin 2009-02-13 18:45:25 UTC
Okay, it's merged in Linus's tree now too.

Since James can't reproduce this with current rawhide, and the fix will be in rawhide pretty soon anyway, I'm going to close this.

James - re-open if you see it again, of course.

Comment 26 Marcelo Tosatti 2009-03-04 21:50:37 UTC
*** Bug 455097 has been marked as a duplicate of this bug. ***

Comment 27 Mark McLoughlin 2009-03-25 10:16:23 UTC
*** Bug 491631 has been marked as a duplicate of this bug. ***

Comment 28 IBM Bug Proxy 2009-03-25 10:23:17 UTC
Pretty sure this is a guest pvmmu issue in the F11 Alpha kernel, which has since been fixed. Please re-open if you see it again in the Beta.

See also:

http://www.mail-archive.com/kvm@vger.kernel.org/msg10312.html

*** This bug has been marked as a duplicate of 480822 ***

Comment 29 IBM Bug Proxy 2009-03-25 10:23:28 UTC
Created attachment 336609
total boot log

Comment 30 Mark McLoughlin 2009-03-25 17:55:47 UTC
*** Bug 480929 has been marked as a duplicate of this bug. ***

Comment 31 IBM Bug Proxy 2009-04-09 05:41:16 UTC
------- Comment From pavan.naregundi.com 2009-04-09 01:34 EDT-------
Could not reproduce this issue in F11beta.

Thanks
Pavan

Comment 32 IBM Bug Proxy 2009-04-09 06:31:00 UTC
------- Comment From anoop.vijayan.com 2009-04-09 02:21 EDT-------
(In reply to comment #15)
> Could not reproduce this issue in F11beta.
>
> Thanks
> Pavan
>

Thanks for verifying. Closing.