Red Hat Bugzilla – Bug 1459056
Boot/Shutdown RT guest with kernel-rt-debug will cause "BUG: sleeping function called from invalid context at kernel/rtmutex.c:818"
Last modified: 2017-08-01 20:26:40 EDT
Description of problem:
With kernel-rt-debug installed in both host and guest, starting or shutting down the guest produces a "Call Trace" and a "BUG" message in the host's dmesg output.

Version-Release number of selected component (if applicable):
3.10.0-678.rt56.600.el7.x86_64.debug (in both host and guest)
qemu-kvm-rhev-2.9.0-7.el7.x86_64
libvirt-3.2.0-6.el7.x86_64
tuned-2.8.0-3.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Boot the rt VM. Errors appear in the host, but not in the guest.
# dmesg
...
[ 100.466864] BUG: sleeping function called from invalid context at kernel/rtmutex.c:818
[ 100.466865] in_atomic(): 0, irqs_disabled(): 1, pid: 2674, name: qemu-kvm
[ 100.466865] INFO: lockdep is turned off.
[ 100.466866] irq event stamp: 228992
[ 100.466870] hardirqs last enabled at (228991): [<ffffffff8176ef65>] _raw_spin_unlock_irqrestore+0x55/0xb0
[ 100.466871] hardirqs last disabled at (228992): [<ffffffff8176f093>] _raw_spin_lock_irqsave+0x43/0xc0
[ 100.466875] softirqs last enabled at (0): [<ffffffff8108488a>] copy_process+0x82a/0x1e60
[ 100.466876] softirqs last disabled at (0): [< (null)>] (null)
[ 100.466877] CPU: 5 PID: 2674 Comm: qemu-kvm Not tainted 3.10.0-675.rt56.597.el7.x86_64.debug #1
[ 100.466878] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.0.1 04/11/2016
[ 100.466880] ffff881052554000 00000000ebb8d451 ffff8810447cbc68 ffffffff817657fb
[ 100.466881] ffff8810447cbc90 ffffffff810cfb0d ffff881035f8f910 ffff881035f8f910
[ 100.466882] 00007fff71b84ac0 ffff8810447cbcb0 ffffffff8176e274 ffff881035f8f910
[ 100.466883] Call Trace:
[ 100.466886] [<ffffffff817657fb>] dump_stack+0x19/0x1b
[ 100.466890] [<ffffffff810cfb0d>] __might_sleep+0x12d/0x1f0
[ 100.466892] [<ffffffff8176e274>] rt_spin_lock+0x24/0x60
[ 100.466906] [<ffffffffc05c69a6>] __get_kvmclock_ns+0x36/0x110 [kvm]
[ 100.466909] [<ffffffff811159d3>] ? futex_wait_queue_me+0x103/0x160
[ 100.466920] [<ffffffffc05d37a2>] kvm_arch_vm_ioctl+0xa2/0xd70 [kvm]
[ 100.466923] [<ffffffff8111637c>] ? futex_wait+0x1ac/0x2a0
[ 100.466931] [<ffffffffc05bd104>] kvm_vm_ioctl+0xa4/0x850 [kvm]
[ 100.466933] [<ffffffff811157d3>] ? futex_wake+0x93/0x190
[ 100.466940] [<ffffffffc05b980a>] ? kvm_dev_ioctl+0x1ca/0x740 [kvm]
[ 100.466942] [<ffffffff81254805>] do_vfs_ioctl+0x365/0x5b0
[ 100.466944] [<ffffffff8126184f>] ? fget_light+0xef/0x500
[ 100.466945] [<ffffffff81254af1>] SyS_ioctl+0xa1/0xc0
[ 100.466948] [<ffffffff817781ed>] tracesys+0xdd/0xe2
[ 100.466948] ---------------------------
[ 100.466949] | preempt count: 00000000 ]
[ 100.466949] | 0-level deep critical section nesting:
[ 100.466950] ----------------------------------------

Actual results:
Errors are logged in the host.

Expected results:
There should be no errors.

Additional info:
1. 3.10.0-327.rt56.204.el7.x86_64.debug also hits issues. Both host and guest show "Call Trace" and "BUG" messages in dmesg like the one below, so perhaps this is not a regression.
# dmesg
[ 2600.420836] BUG: sleeping function called from invalid context at kernel/rtmutex.c:729
[ 2600.420837] in_atomic(): 1, irqs_disabled(): 0, pid: 13201, name: bash
[ 2600.420837] INFO: lockdep is turned off.
[ 2600.420839] CPU: 0 PID: 13201 Comm: bash Not tainted 3.10.0-327.rt56.204.el7.x86_64.debug #1
[ 2600.420839] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.7 06/16/2016
[ 2600.420842] ffff8807a4e0d060 000000008b20534a ffff88078b067bd0 ffffffff816e679c
[ 2600.420843] ffff88078b067bf8 ffffffff810be49d ffffc9010eb33098 0000000000000000
[ 2600.420844] ffffc9010eb33070 ffff88078b067c18 ffffffff816ed9b4 0000000000000000
[ 2600.420845] Call Trace:
[ 2600.420850] [<ffffffff816e679c>] dump_stack+0x19/0x1b
[ 2600.420852] [<ffffffff810be49d>] __might_sleep+0x12d/0x1f0
[ 2600.420855] [<ffffffff816ed9b4>] rt_spin_lock+0x24/0x60
[ 2600.420857] [<ffffffff8111f091>] res_counter_uncharge_until+0x51/0xc0
[ 2600.420858] [<ffffffff8111f113>] res_counter_uncharge+0x13/0x20
[ 2600.420860] [<ffffffff811fcf33>] drain_stock.isra.28+0x43/0x80
[ 2600.420863] [<ffffffff81202693>] __mem_cgroup_try_charge+0x603/0x9d0
[ 2600.420864] [<ffffffff81202360>] ? __mem_cgroup_try_charge+0x2d0/0x9d0
[ 2600.420866] [<ffffffff812031c6>] mem_cgroup_charge_common+0x46/0x90
[ 2600.420868] [<ffffffff812055bf>] mem_cgroup_newpage_charge+0x3f/0x60
[ 2600.420870] [<ffffffff811c4494>] do_wp_page+0x194/0x8f0
[ 2600.420872] [<ffffffff811c72cb>] ? handle_mm_fault+0x26b/0xdf0
[ 2600.420873] [<ffffffff811c767c>] handle_mm_fault+0x61c/0xdf0
[ 2600.420875] [<ffffffff816f2a26>] ? __do_page_fault+0x216/0x530
[ 2600.420876] [<ffffffff816f2a88>] __do_page_fault+0x278/0x530
[ 2600.420879] [<ffffffff810f1e0d>] ? trace_hardirqs_on+0xd/0x10
[ 2600.420880] [<ffffffff816f2d63>] do_page_fault+0x23/0x80
[ 2600.420882] [<ffffffff816ef018>] page_fault+0x28/0x30
2. This bug was found when verifying bug [1].

[1] Bug 1416403 - kvm-rt: change async pagefault code locking for rt-preempt
Additional info (cont.):
3. Both starting and shutting down the guest cause a call trace in the host.
Reassigning to Luis Claudio; he has already backported the fix and is preparing a build.
==Verification==
Versions:
3.10.0-679.rt56.602.el7.x86_64.debug (in both host and guest)
qemu-kvm-rhev-2.9.0-7.el7.x86_64
libvirt-3.2.0-6.el7.x86_64
tuned-2.8.0-4.el7.noarch

Steps:
1. Reboot the rt host.
2. The first boot of the rt guest hits issue [1].
3. Repeat boot/shutdown/reboot of the rt guest several times; no errors show in either host or guest.

Hi Luis, there is still a problem with kernel-rt-debug in step 2. As the issue looks different, I filed a new bug [1]. Please feel free to set it as a duplicate and fail on_qa for this bug.

[1] Bug 1459739 - Call Trace shows in rt host with kernel-rt-debug when first booting rt guest

Best Regards,
Pei
==Re-verification==
Versions:
3.10.0-680.rt56.604.el7.x86_64.debug (in both host and guest)
tuned-2.8.0-5.el7.noarch
qemu-kvm-rhev-2.9.0-9.el7.x86_64
libvirt-3.2.0-9.el7.x86_64

Steps:
1. Boot the rt host.
2. Boot the rt guest.
3. Repeat reboot/shutdown/boot of the rt guest several times; no errors show in either host or guest.

So this bug has been fixed. Thanks.
Moving the status of this bug to 'VERIFIED'.
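The "no error info" check in the verification steps can be scripted rather than read by eye. The following is a minimal sketch, not the procedure used in this report: it assumes the dmesg output is available as a text string, and the function name find_sleeping_bugs and the returned dictionary keys are hypothetical. It matches only the two message formats visible in the traces quoted in this bug.

```python
import re

# Matches the headline of the report, e.g.
# "BUG: sleeping function called from invalid context at kernel/rtmutex.c:818"
BUG_RE = re.compile(
    r"BUG: sleeping function called from invalid context at (?P<where>\S+)"
)
# Matches the context line that follows the headline in the quoted traces.
CTX_RE = re.compile(
    r"in_atomic\(\): (?P<atomic>\d+), irqs_disabled\(\): (?P<irqs>\d+), "
    r"pid: (?P<pid>\d+), name: (?P<name>\S+)"
)


def find_sleeping_bugs(dmesg_text):
    """Return one dict per 'sleeping function' BUG found in dmesg_text."""
    reports = []
    for line in dmesg_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            reports.append({"where": bug.group("where")})
            continue
        ctx = CTX_RE.search(line)
        # Attach the context line to the most recent headline, once.
        if ctx and reports and "pid" not in reports[-1]:
            reports[-1].update(
                in_atomic=int(ctx.group("atomic")),
                irqs_disabled=int(ctx.group("irqs")),
                pid=int(ctx.group("pid")),
                name=ctx.group("name"),
            )
    return reports
```

Running this on the host's dmesg text after each boot/shutdown cycle should return an empty list on a fixed kernel. On the first trace in the description it would report kernel/rtmutex.c:818 with irqs_disabled set to 1 for qemu-kvm.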
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2077