Bug 1459056

Summary: Boot/Shutdown RT guest with kernel-rt-debug will cause "BUG: sleeping function called from invalid context at kernel/rtmutex.c:818"
Product: Red Hat Enterprise Linux 7 Reporter: Pei Zhang <pezhang>
Component: kernel-rt Assignee: Luis Claudio R. Goncalves <lgoncalv>
kernel-rt sub component: KVM QA Contact: Pei Zhang <pezhang>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: bhu, chayang, juzhang, knoel, lgoncalv, michen, mtosatti, pagupta, riel, virt-maint, williams, xiywang
Version: 7.4   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 19:05:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1175461    
Bug Blocks: 1353018    

Description Pei Zhang 2017-06-06 08:06:43 UTC
Description of problem:
With kernel-rt-debug installed in both host and guest, starting or shutting down the guest produces "Call Trace" and "BUG" messages in the host's dmesg output.


Version-Release number of selected component (if applicable):
3.10.0-678.rt56.600.el7.x86_64.debug(in both host and guest)
qemu-kvm-rhev-2.9.0-7.el7.x86_64
libvirt-3.2.0-6.el7.x86_64
tuned-2.8.0-3.el7.noarch


How reproducible:
100%


Steps to Reproduce:
1. Boot the rt VM. Error messages appear in the host's dmesg; none appear in the guest.

# dmesg
...
[  100.466864] BUG: sleeping function called from invalid context at kernel/rtmutex.c:818
[  100.466865] in_atomic(): 0, irqs_disabled(): 1, pid: 2674, name: qemu-kvm
[  100.466865] INFO: lockdep is turned off.
[  100.466866] irq event stamp: 228992
[  100.466870] hardirqs last  enabled at (228991): [<ffffffff8176ef65>] _raw_spin_unlock_irqrestore+0x55/0xb0
[  100.466871] hardirqs last disabled at (228992): [<ffffffff8176f093>] _raw_spin_lock_irqsave+0x43/0xc0
[  100.466875] softirqs last  enabled at (0): [<ffffffff8108488a>] copy_process+0x82a/0x1e60
[  100.466876] softirqs last disabled at (0): [<          (null)>]           (null)
[  100.466877] CPU: 5 PID: 2674 Comm: qemu-kvm Not tainted 3.10.0-675.rt56.597.el7.x86_64.debug #1
[  100.466878] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.0.1 04/11/2016
[  100.466880]  ffff881052554000 00000000ebb8d451 ffff8810447cbc68 ffffffff817657fb
[  100.466881]  ffff8810447cbc90 ffffffff810cfb0d ffff881035f8f910 ffff881035f8f910
[  100.466882]  00007fff71b84ac0 ffff8810447cbcb0 ffffffff8176e274 ffff881035f8f910
[  100.466883] Call Trace:
[  100.466886]  [<ffffffff817657fb>] dump_stack+0x19/0x1b
[  100.466890]  [<ffffffff810cfb0d>] __might_sleep+0x12d/0x1f0
[  100.466892]  [<ffffffff8176e274>] rt_spin_lock+0x24/0x60
[  100.466906]  [<ffffffffc05c69a6>] __get_kvmclock_ns+0x36/0x110 [kvm]
[  100.466909]  [<ffffffff811159d3>] ? futex_wait_queue_me+0x103/0x160
[  100.466920]  [<ffffffffc05d37a2>] kvm_arch_vm_ioctl+0xa2/0xd70 [kvm]
[  100.466923]  [<ffffffff8111637c>] ? futex_wait+0x1ac/0x2a0
[  100.466931]  [<ffffffffc05bd104>] kvm_vm_ioctl+0xa4/0x850 [kvm]
[  100.466933]  [<ffffffff811157d3>] ? futex_wake+0x93/0x190
[  100.466940]  [<ffffffffc05b980a>] ? kvm_dev_ioctl+0x1ca/0x740 [kvm]
[  100.466942]  [<ffffffff81254805>] do_vfs_ioctl+0x365/0x5b0
[  100.466944]  [<ffffffff8126184f>] ? fget_light+0xef/0x500
[  100.466945]  [<ffffffff81254af1>] SyS_ioctl+0xa1/0xc0
[  100.466948]  [<ffffffff817781ed>] tracesys+0xdd/0xe2
[  100.466948] ---------------------------
[  100.466949] | preempt count: 00000000 ]
[  100.466949] | 0-level deep critical section nesting:
[  100.466950] ----------------------------------------

Actual results:
Error messages are logged in the host.

Expected results:
There should be no error.
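
The check for this report can be automated by scanning dmesg output for the "sleeping function called from invalid context" header and the context line that follows it. The following is a minimal sketch, assuming the two-line layout seen in the trace above; the function name find_sleeping_bugs and the SAMPLE text are illustrative and not part of any existing tool.

```python
import re
from typing import Dict, List

# Header line printed for the report, e.g.
# "BUG: sleeping function called from invalid context at kernel/rtmutex.c:818"
BUG_RE = re.compile(
    r"BUG: sleeping function called from invalid context at (?P<file>\S+):(?P<line>\d+)"
)
# Follow-up line carrying the context flags and the offending task.
CTX_RE = re.compile(
    r"in_atomic\(\): (?P<in_atomic>\d+), irqs_disabled\(\): (?P<irqs_disabled>\d+), "
    r"pid: (?P<pid>\d+), name: (?P<name>\S+)"
)


def find_sleeping_bugs(dmesg_text: str) -> List[Dict[str, object]]:
    """Return one dict per 'sleeping function' report found in dmesg_text."""
    reports: List[Dict[str, object]] = []
    lines = dmesg_text.splitlines()
    for i, line in enumerate(lines):
        bug = BUG_RE.search(line)
        if not bug:
            continue
        report: Dict[str, object] = {
            "file": bug.group("file"),
            "line": int(bug.group("line")),
        }
        # The context line normally follows directly; look a few lines ahead.
        for follow in lines[i + 1 : i + 4]:
            ctx = CTX_RE.search(follow)
            if ctx:
                report.update(
                    in_atomic=int(ctx.group("in_atomic")),
                    irqs_disabled=int(ctx.group("irqs_disabled")),
                    pid=int(ctx.group("pid")),
                    name=ctx.group("name"),
                )
                break
        reports.append(report)
    return reports


# Sample lines copied from the host trace in this report.
SAMPLE = """\
[  100.466864] BUG: sleeping function called from invalid context at kernel/rtmutex.c:818
[  100.466865] in_atomic(): 0, irqs_disabled(): 1, pid: 2674, name: qemu-kvm
[  100.466865] INFO: lockdep is turned off.
"""

if __name__ == "__main__":
    for r in find_sleeping_bugs(SAMPLE):
        print(r)
```

In practice the text would come from the host's dmesg after each guest boot or shutdown; an empty result list corresponds to the expected outcome of no error.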

Additional info:
1. 3.10.0-327.rt56.204.el7.x86_64.debug also hits issues: both host and guest show 'Call Trace' and 'BUG' info in dmesg, as below. So this is perhaps not a regression.

# dmesg
[ 2600.420836] BUG: sleeping function called from invalid context at kernel/rtmutex.c:729
[ 2600.420837] in_atomic(): 1, irqs_disabled(): 0, pid: 13201, name: bash
[ 2600.420837] INFO: lockdep is turned off.
[ 2600.420839] CPU: 0 PID: 13201 Comm: bash Not tainted 3.10.0-327.rt56.204.el7.x86_64.debug #1
[ 2600.420839] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.7 06/16/2016
[ 2600.420842]  ffff8807a4e0d060 000000008b20534a ffff88078b067bd0 ffffffff816e679c
[ 2600.420843]  ffff88078b067bf8 ffffffff810be49d ffffc9010eb33098 0000000000000000
[ 2600.420844]  ffffc9010eb33070 ffff88078b067c18 ffffffff816ed9b4 0000000000000000
[ 2600.420845] Call Trace:
[ 2600.420850]  [<ffffffff816e679c>] dump_stack+0x19/0x1b
[ 2600.420852]  [<ffffffff810be49d>] __might_sleep+0x12d/0x1f0
[ 2600.420855]  [<ffffffff816ed9b4>] rt_spin_lock+0x24/0x60
[ 2600.420857]  [<ffffffff8111f091>] res_counter_uncharge_until+0x51/0xc0
[ 2600.420858]  [<ffffffff8111f113>] res_counter_uncharge+0x13/0x20
[ 2600.420860]  [<ffffffff811fcf33>] drain_stock.isra.28+0x43/0x80
[ 2600.420863]  [<ffffffff81202693>] __mem_cgroup_try_charge+0x603/0x9d0
[ 2600.420864]  [<ffffffff81202360>] ? __mem_cgroup_try_charge+0x2d0/0x9d0
[ 2600.420866]  [<ffffffff812031c6>] mem_cgroup_charge_common+0x46/0x90
[ 2600.420868]  [<ffffffff812055bf>] mem_cgroup_newpage_charge+0x3f/0x60
[ 2600.420870]  [<ffffffff811c4494>] do_wp_page+0x194/0x8f0
[ 2600.420872]  [<ffffffff811c72cb>] ? handle_mm_fault+0x26b/0xdf0
[ 2600.420873]  [<ffffffff811c767c>] handle_mm_fault+0x61c/0xdf0
[ 2600.420875]  [<ffffffff816f2a26>] ? __do_page_fault+0x216/0x530
[ 2600.420876]  [<ffffffff816f2a88>] __do_page_fault+0x278/0x530
[ 2600.420879]  [<ffffffff810f1e0d>] ? trace_hardirqs_on+0xd/0x10
[ 2600.420880]  [<ffffffff816f2d63>] do_page_fault+0x23/0x80
[ 2600.420882]  [<ffffffff816ef018>] page_fault+0x28/0x30

2. This bug was found while verifying bug [1].
[1] Bug 1416403 - kvm-rt: change async pagefault code locking for rt-preempt

Comment 2 Pei Zhang 2017-06-06 08:26:38 UTC
Additional info(cont):
3. Both starting and shutting down the guest cause a call trace in the host.

Comment 7 Luiz Capitulino 2017-06-07 13:22:46 UTC
Re-assigning to Luis Claudio; he has already backported the fix and is preparing a build.

Comment 10 Pei Zhang 2017-06-08 03:37:17 UTC
==Verification==
Versions:
3.10.0-679.rt56.602.el7.x86_64.debug(in both host&guest)
qemu-kvm-rhev-2.9.0-7.el7.x86_64
libvirt-3.2.0-6.el7.x86_64
tuned-2.8.0-4.el7.noarch


Steps:
1. Reboot rt host

2. The first boot of the rt guest hits issue [1].

3. Repeat boot/shutdown/reboot of the rt guest several times; no error info shows in either host or guest.


Hi Luis, there is still a problem with kernel-rt-debug in step 2. As the issue looks different, I filed a new bug [1]. Please feel free to set it as a duplicate and fail ON_QA for this bug.


[1] Bug 1459739 - Call Trace shows in rt host with kernel-rt-debug when first booting rt guest



Best Regards,
Pei

Comment 11 Pei Zhang 2017-06-14 01:56:42 UTC
==Re-verification==

Versions:
3.10.0-680.rt56.604.el7.x86_64.debug(in both host and guest)
tuned-2.8.0-5.el7.noarch
qemu-kvm-rhev-2.9.0-9.el7.x86_64
libvirt-3.2.0-9.el7.x86_64


Steps:

1. Boot rt host

2. Boot rt guest

3. Repeat reboot/shutdown/boot of the rt guest several times; no error info shows in either host or guest.


So this bug has been fixed. Thanks.


Move status of this bug to 'VERIFIED'.

Comment 12 errata-xmlrpc 2017-08-01 19:05:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2077

Comment 13 errata-xmlrpc 2017-08-02 00:26:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2077