Bug 724995 - xen mmu: fix a race window causing leave_mm BUG()
Summary: xen mmu: fix a race window causing leave_mm BUG()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: Radim Krčmář
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-07-22 14:30 UTC by Radim Krčmář
Modified: 2011-12-06 13:53 UTC (History)

Fixed In Version: kernel-2.6.32-176.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 13:53:40 UTC
Target Upstream Version:


Attachments
xen mmu: fix a race window causing leave_mm BUG() (4.86 KB, patch)
2011-07-22 14:30 UTC, Radim Krčmář


Links
System: Red Hat Product Errata
ID: RHSA-2011:1530
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update
Last Updated: 2011-12-06 01:45:35 UTC

Description Radim Krčmář 2011-07-22 14:30:49 UTC
Created attachment 514712
xen mmu: fix a race window causing leave_mm BUG()

Cherry-pick 7899891c7d161752f29abcc9bc0a9c6c3a3af26c from upstream.
Original commit message:

    xen mmu: fix a race window causing leave_mm BUG()
    
    There's a race window in xen_drop_mm_ref, where the remote cpu may exit
    the dirty bitmap between the check on this cpu and the point where the
    remote cpu handles the drop request. So in drop_other_mm_ref we need to
    check whether the TLB state is still lazy before calling into leave_mm.
    This bug is rarely observed in earlier kernels, but is exaggerated by
    commit 831d52bc153971b70e64eccfbed2b232394f22f8
    ("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm"),
    which clears the bitmap after changing the TLB state. The call trace is as below:
    
    ---------------------------------
    kernel BUG at arch/x86/mm/tlb.c:61!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
    CPU 1
    Modules linked in: 8021q garp xen_netback xen_blkback blktap blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sb
    Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285
    RIP: e030:[<ffffffff8103a3cb>]  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
    RSP: e02b:ffff88002805be48  EFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0
    RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200
    R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880
    R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000
    FS:  00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000
    CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process khelper (pid: 25581, threadinfo ffff88007691e000, task ffff88009b92db40)
    Stack:
     ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88
    <0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78
    <0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108
    Call Trace:
     <IRQ>
     [<ffffffff8100e4ae>] drop_other_mm_ref+0x2a/0x53
     [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc
     [<ffffffff81010108>] xen_call_function_single_interrupt+0x13/0x28
     [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120
     [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e
     [<ffffffff8128c1c0>] __xen_evtchn_do_upcall+0x1ab/0x27d
     [<ffffffff8128dd11>] xen_evtchn_do_upcall+0x33/0x46
     [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
     <EOI>
     [<ffffffff814472b2>] ? _spin_unlock_irqrestore+0x15/0x17
     [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
     [<ffffffff81113f71>] ? flush_old_exec+0x3ac/0x500
     [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
     [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
     [<ffffffff8115115d>] ? load_elf_binary+0x398/0x17ef
     [<ffffffff81042fcf>] ? need_resched+0x23/0x2d
     [<ffffffff811f4648>] ? process_measurement+0xc0/0xd7
     [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef
     [<ffffffff81113094>] ? search_binary_handler+0xc8/0x255
     [<ffffffff81114362>] ? do_execve+0x1c3/0x29e
     [<ffffffff8101155d>] ? sys_execve+0x43/0x5d
     [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
     [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0
     [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
     [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1
     [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e
     [<ffffffff81013daa>] ? child_rip+0xa/0x20
     [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f
     [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b
     [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6
     [<ffffffff81013da0>] ? child_rip+0x0/0x20
    Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
    RIP  [<ffffffff8103a3cb>] leave_mm+0x15/0x46
     RSP <ffff88002805be48>
    ---[ end trace ce9cee6832a9c503 ]---

    Tested-by: Maoxiaoyun<tinnycloud>
    Signed-off-by: Kevin Tian <kevin.tian>
    [v1: Fleshed out the git description a bit]
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk>
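
For readers who want the shape of the fix without opening the attachment, here is a minimal user-space sketch of the check the commit message describes. It is not the kernel code: `struct cpu_model`, `struct mm_model`, and the `*_model`, `*_guarded` and `*_unguarded` names are hypothetical stand-ins, and the unguarded variant is only an assumption about how the pre-fix path behaved. The `TLBSTATE_OK` / `TLBSTATE_LAZY` names follow arch/x86; their numeric values here are illustrative. Where the real leave_mm would hit BUG(), the model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the per-CPU TLB state; values are illustrative only. */
enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

struct mm_model { int id; };

struct cpu_model {
    enum tlb_state state;        /* models cpu_tlbstate.state */
    struct mm_model *active_mm;  /* models cpu_tlbstate.active_mm */
    bool left_mm;                /* set once leave_mm_model() has run */
};

/*
 * Models leave_mm(): only legal on a CPU in lazy TLB mode.
 * Returns false where the kernel would hit BUG() (state is OK).
 */
static bool leave_mm_model(struct cpu_model *cpu)
{
    if (cpu->state == TLBSTATE_OK)
        return false;
    cpu->left_mm = true;
    return true;
}

/* Assumed pre-fix behaviour: only the active mm is compared. */
static bool drop_other_mm_ref_unguarded(struct cpu_model *cpu,
                                        struct mm_model *mm)
{
    if (cpu->active_mm == mm)
        return leave_mm_model(cpu);
    return true;
}

/*
 * Behaviour described by the commit message: re-check that the TLB
 * state is still lazy before calling into leave_mm.
 */
static bool drop_other_mm_ref_guarded(struct cpu_model *cpu,
                                      struct mm_model *mm)
{
    if (cpu->active_mm == mm && cpu->state != TLBSTATE_OK)
        return leave_mm_model(cpu);
    return true;
}
```

In this model the race corresponds to the remote CPU having already switched its state to `TLBSTATE_OK` by the time the drop request is handled: the unguarded variant then reports the BUG condition, while the guarded variant skips leave_mm and returns normally.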

Comment 2 Qixiang Wan 2011-07-28 11:53:35 UTC
Hi Radim, do you have any instructions for QE to reproduce and verify this bug, or is a SanityCheck enough?

Comment 4 Aristeu Rozanski 2011-08-02 13:58:11 UTC
Patch(es) available on kernel-2.6.32-176.el6

Comment 9 errata-xmlrpc 2011-12-06 13:53:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

