Bug 653890 - kvm winxp guest crashes host after login
Summary: kvm winxp guest crashes host after login
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-11-16 11:58 UTC by Enrico Scholz
Modified: 2012-08-16 18:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-16 18:44:04 UTC
Type: ---


Attachments
1st oops (12.28 KB, text/plain)
2010-11-16 11:59 UTC, Enrico Scholz
2nd oops (4.36 KB, text/plain)
2010-11-16 11:59 UTC, Enrico Scholz
3rd oops (12.87 KB, text/plain)
2010-11-16 12:00 UTC, Enrico Scholz
debug trace showing bad kfree() (12.29 KB, text/plain)
2010-11-18 16:54 UTC, Enrico Scholz

Description Enrico Scholz 2010-11-16 11:58:13 UTC
Description of problem:

When I log into a WinXP guest (qemu-kvm, 32-bit guest) over RDP, the
host hangs.  I was able to record the attached oopses over a serial
console; devices such as network and disk no longer respond.

The crash happens when:

* I am logged into an X11 session (--> krb5 credentials and access to
  an NFS4 krb5i-sec mounted share); I do not see it when only kdm is
  running and I access the VM from another machine

* It happens immediately after the VM tries to access the remote home
  directory.  E.g. the VM runs fine while only the login screen is shown.

The key lines of the attached oopses are:

[ 2015.867860] general protection fault: 0000 [#1] SMP
...
[ 2015.868449] RIP: 0010:[<ffffffff810f69ec>]  [<ffffffff810f69ec>] __get_vm_area_node+0x169/0x197
...
[ 2015.868449]  [<ffffffffa007d284>] ? drm_calloc_large+0x51/0x53 [i915]
[ 2015.868449]  [<ffffffff810f7553>] __vmalloc_node+0x6d/0x9b
[ 2015.868449]  [<ffffffffa007d284>] ? drm_calloc_large+0x51/0x53 [i915]
[ 2015.868449]  [<ffffffffa007d0b9>] ? drm_malloc_ab+0x3b/0x53 [i915]
[ 2015.868449]  [<ffffffff810f76bc>] __vmalloc+0x20/0x22
[ 2015.868449]  [<ffffffffa007d284>] drm_calloc_large+0x51/0x53 [i915]
[ 2015.868449]  [<ffffffffa008072e>] i915_gem_do_execbuffer+0x1b7/0xddc [i915]

----

[48665.486793] general protection fault: 0000 [#1] SMP
...
[48666.673347] RIP: 0010:[<ffffffff8110af41>]  [<ffffffff8110af41>] __kmalloc_track_caller+0xe0/0x132
...
[48666.673347]  [<ffffffff811eec56>] ? security_context_to_sid_core+0xd2/0x184
[48666.673347]  [<ffffffff810e549f>] kstrdup+0x31/0x49
[48666.673347]  [<ffffffff811eec56>] security_context_to_sid_core+0xd2/0x184
[48666.673347]  [<ffffffff811b1ac6>] ? ext4_xattr_set+0xaa/0xe5
[48666.673347]  [<ffffffff811ef5f3>] security_context_to_sid_force+0x1c/0x1e

---

[ 2844.193127] general protection fault: 0000 [#1] SMP
...
[ 2845.264272] RIP: 0010:[<ffffffff81109825>]  [<ffffffff81109825>] kmem_cache_alloc+0x83/0x105
...
[ 2845.264272]  [<ffffffff81104d5b>] ksm_scan_thread+0x219/0xa30
[ 2845.264272]  [<ffffffff810663c3>] ? autoremove_wake_function+0x0/0x39
[ 2845.264272]  [<ffffffff81104b42>] ? ksm_scan_thread+0x0/0xa30
[ 2845.264272]  [<ffffffff81065f29>] kthread+0x7f/0x87


Version-Release number of selected component (if applicable):

qemu-0.13.0-1.fc14.x86_64
kernel-2.6.35.6-48.fc14.x86_64


How reproducible:

95%

Comment 1 Enrico Scholz 2010-11-16 11:59:25 UTC
Created attachment 460818 [details]
1st oops

Comment 2 Enrico Scholz 2010-11-16 11:59:50 UTC
Created attachment 460819 [details]
2nd oops

Comment 3 Enrico Scholz 2010-11-16 12:00:18 UTC
Created attachment 460821 [details]
3rd oops

Comment 4 Enrico Scholz 2010-11-17 12:24:40 UTC
some analysis:

The 1st and 3rd oopses are both in an inlined slab_alloc(), at this point:

        local_irq_save(flags);
        c = __this_cpu_ptr(s->cpu_slab);
        object = c->freelist;
                ...
                c->freelist = get_freepointer(s, object);   <<<<<

| get_freepointer():
| /usr/src/debug/kernel-2.6.35.fc14/linux-2.6.35.x86_64/mm/slub.c:258
| ffffffff81109820:       49 63 44 24 18          movslq 0x18(%r12),%rax
| /usr/src/debug/kernel-2.6.35.fc14/linux-2.6.35.x86_64/mm/slub.c:1714
| ffffffff81109825:       48 8b 04 03             mov    (%rbx,%rax,1),%rax <<<<
| ffffffff81109829:       49 89 00                mov    %rax,(%r8)


-->  These two oopses happen while c->freelist[0] is accessed; the addresses involved are 0xc502a8c000000000 and 0x5100a8c000000000.


The second address is interesting because it is accessed by the 2nd oops:

| static void insert_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
|                               unsigned long flags, void *caller)
| ...
|                 if (tmp->addr >= vm->addr)   <<<<<

| /usr/src/debug/kernel-2.6.35.fc14/linux-2.6.35.x86_64/mm/vmalloc.c:1211
| ffffffff810f69e8:       48 8b 4b 08             mov    0x8(%rbx),%rcx
| ffffffff810f69ec:       48 39 48 08             cmp    %rcx,0x8(%rax) <<<<

with 'vm' or 'tmp' being the 0x5100a8c000000000 above.


Can such (low) addresses really occur for kernel structures in 64-bit Linux?
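One way to approach the question above is to test whether the two values are canonical x86-64 addresses. The sketch below assumes 48-bit virtual addressing (an assumption about this host, though it was the usual configuration for kernels of this era); `is_canonical` is a hypothetical helper name, not a kernel function. An access through a non-canonical pointer raises a general protection fault rather than a page fault, which would be consistent with the "general protection fault" lines in the oopses.

```python
def is_canonical(addr, va_bits=48):
    """Return True if addr is a canonical x86-64 virtual address.

    With va_bits of implemented address bits (48 is assumed here),
    bits 63 down to va_bits-1 must be all zeros or all ones.
    """
    top = addr >> (va_bits - 1)                 # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 set bits for va_bits=48
    return top == 0 or top == all_ones


if __name__ == "__main__":
    # The two suspicious freelist values, then two kernel addresses
    # quoted elsewhere in this report for comparison.
    for value in (0xc502a8c000000000, 0x5100a8c000000000,
                  0xffffffff81109825, 0xffff88011f6a3aa0):
        print(hex(value), "canonical" if is_canonical(value) else "non-canonical")
```

Under that assumption neither freelist value is a usable kernel pointer, which suggests the freelist slot was overwritten with unrelated data rather than pointing at a stale object.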

Comment 5 Enrico Scholz 2010-11-18 16:54:49 UTC
Created attachment 461336 [details]
debug trace showing bad kfree()

The observed 0xc502a8c000000000 and 0x5100a8c000000000 "addresses" are IP
numbers. The .debug kernel gives:

[  868.856540] INFO: Allocated in 0x6b6b6b6b6b6b6b6b age=16890 cpu=27499 pid=2600
[  868.856540] INFO: Freed in sg_kfree+0x18/0x1a age=12887 cpu=0 pid=2600
[  868.856540] INFO: Slab 0xffffea0003edf3a8 objects=30 used=25 fp=0xffff88011f6a37f8 flags=0x400000000000c3
[  868.856540] INFO: Object 0xffff88011f6a3aa0 @offset=2720 fp=0x6b6b6b6b6b6b6b6b
...
[  868.856540]  [<ffffffff81425f0d>] ? ip_push_pending_frames+0x25e/0x2c2
[  868.856540]  [<ffffffff8111a35e>] kfree+0x102/0x136
[  868.856540]  [<ffffffff81425f0d>] ip_push_pending_frames+0x25e/0x2c2
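The reading of the two bogus pointer values as IP numbers can be checked with a small sketch. It assumes the 64-bit value was loaded from little-endian memory (true for x86-64) and that the IPv4 address occupies the upper four bytes in network byte order; `upper_dword_as_ipv4` is a hypothetical helper name introduced only for this illustration.

```python
import ipaddress
import struct


def upper_dword_as_ipv4(value):
    """Interpret the upper 4 bytes of a 64-bit little-endian load as IPv4.

    Assumption: the bytes at offsets 4..7 in memory hold an IPv4
    address in network byte order.
    """
    raw = struct.pack("<Q", value)      # the 8 bytes as laid out in memory
    return str(ipaddress.IPv4Address(raw[4:8]))


if __name__ == "__main__":
    for value in (0xc502a8c000000000, 0x5100a8c000000000):
        print(hex(value), "->", upper_dword_as_ipv4(value))
```

Under these assumptions both values decode to addresses in 192.168.0.0/16, which fits the interpretation that networking data was written into a slab object after it had been freed.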

Comment 6 Enrico Scholz 2010-11-18 20:34:49 UTC
probably unrelated (although the kfree() is done on such a 'cork.opt' object):

net/ipv4/ip_output.c
| 820  inet->cork.opt = kmalloc(sizeof(struct ip_options) + 40, sk->sk_allocation);
| 824  memcpy(inet->cork.opt, opt, sizeof(struct ip_options)+opt->optlen);

This looks unclean (I do not see how opt->optlen is limited to <40), and the size calculation should be replaced by 'optlength(opt)'.

Comment 7 Fedora End Of Life 2012-08-16 18:44:07 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advance warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

