Bug 1459056 - Boot/Shutdown RT guest with kernel-rt-debug will cause "BUG: sleeping function called from invalid context at kernel/rtmutex.c:818"
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.4
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Luis Claudio R. Goncalves
QA Contact: Pei Zhang
Depends On: 1175461
Blocks: 1353018
Reported: 2017-06-06 04:06 EDT by Pei Zhang
Modified: 2017-08-01 20:26 EDT (History)

Last Closed: 2017-08-01 15:05:29 EDT
Type: Bug
Description Pei Zhang 2017-06-06 04:06:43 EDT
Description of problem:
With kernel-rt-debug installed on both host and guest, starting or shutting down the guest produces a "Call Trace" and a "BUG" message in the host's dmesg output.


Version-Release number of selected component (if applicable):
3.10.0-678.rt56.600.el7.x86_64.debug(in both host and guest)
qemu-kvm-rhev-2.9.0-7.el7.x86_64
libvirt-3.2.0-6.el7.x86_64
tuned-2.8.0-3.el7.noarch


How reproducible:
100%


Steps to Reproduce:
1. Boot the RT VM. Errors appear in the host's dmesg output, but none appear in the guest.

# dmesg
...
[  100.466864] BUG: sleeping function called from invalid context at kernel/rtmutex.c:818
[  100.466865] in_atomic(): 0, irqs_disabled(): 1, pid: 2674, name: qemu-kvm
[  100.466865] INFO: lockdep is turned off.
[  100.466866] irq event stamp: 228992
[  100.466870] hardirqs last  enabled at (228991): [<ffffffff8176ef65>] _raw_spin_unlock_irqrestore+0x55/0xb0
[  100.466871] hardirqs last disabled at (228992): [<ffffffff8176f093>] _raw_spin_lock_irqsave+0x43/0xc0
[  100.466875] softirqs last  enabled at (0): [<ffffffff8108488a>] copy_process+0x82a/0x1e60
[  100.466876] softirqs last disabled at (0): [<          (null)>]           (null)
[  100.466877] CPU: 5 PID: 2674 Comm: qemu-kvm Not tainted 3.10.0-675.rt56.597.el7.x86_64.debug #1
[  100.466878] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.0.1 04/11/2016
[  100.466880]  ffff881052554000 00000000ebb8d451 ffff8810447cbc68 ffffffff817657fb
[  100.466881]  ffff8810447cbc90 ffffffff810cfb0d ffff881035f8f910 ffff881035f8f910
[  100.466882]  00007fff71b84ac0 ffff8810447cbcb0 ffffffff8176e274 ffff881035f8f910
[  100.466883] Call Trace:
[  100.466886]  [<ffffffff817657fb>] dump_stack+0x19/0x1b
[  100.466890]  [<ffffffff810cfb0d>] __might_sleep+0x12d/0x1f0
[  100.466892]  [<ffffffff8176e274>] rt_spin_lock+0x24/0x60
[  100.466906]  [<ffffffffc05c69a6>] __get_kvmclock_ns+0x36/0x110 [kvm]
[  100.466909]  [<ffffffff811159d3>] ? futex_wait_queue_me+0x103/0x160
[  100.466920]  [<ffffffffc05d37a2>] kvm_arch_vm_ioctl+0xa2/0xd70 [kvm]
[  100.466923]  [<ffffffff8111637c>] ? futex_wait+0x1ac/0x2a0
[  100.466931]  [<ffffffffc05bd104>] kvm_vm_ioctl+0xa4/0x850 [kvm]
[  100.466933]  [<ffffffff811157d3>] ? futex_wake+0x93/0x190
[  100.466940]  [<ffffffffc05b980a>] ? kvm_dev_ioctl+0x1ca/0x740 [kvm]
[  100.466942]  [<ffffffff81254805>] do_vfs_ioctl+0x365/0x5b0
[  100.466944]  [<ffffffff8126184f>] ? fget_light+0xef/0x500
[  100.466945]  [<ffffffff81254af1>] SyS_ioctl+0xa1/0xc0
[  100.466948]  [<ffffffff817781ed>] tracesys+0xdd/0xe2
[  100.466948] ---------------------------
[  100.466949] | preempt count: 00000000 ]
[  100.466949] | 0-level deep critical section nesting:
[  100.466950] ----------------------------------------
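The first two lines of the report above carry the key facts: the source location of the check (kernel/rtmutex.c:818) and the context flags. Here in_atomic() is 0 but irqs_disabled() is 1, so rt_spin_lock was reached with interrupts disabled. As a minimal sketch, not part of the original report, the following Python extracts those fields from captured dmesg text. The function name find_sleeping_bugs and the dictionary keys are made up for illustration.

```python
import re

# Patterns matched against the dmesg lines quoted in this report. The keys
# of the returned dicts are illustrative, not a kernel-defined schema.
BUG_RE = re.compile(
    r"BUG: sleeping function called from invalid context at "
    r"(?P<file>\S+):(?P<line>\d+)"
)
CTX_RE = re.compile(
    r"in_atomic\(\): (?P<in_atomic>\d+), "
    r"irqs_disabled\(\): (?P<irqs_disabled>\d+), "
    r"pid: (?P<pid>\d+), name: (?P<name>\S+)"
)


def find_sleeping_bugs(dmesg_text):
    """Return one dict per 'sleeping function' report found in dmesg_text."""
    reports = []
    current = None  # report still waiting for its context line
    for line in dmesg_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            current = {"file": bug.group("file"), "line": int(bug.group("line"))}
            reports.append(current)
            continue
        ctx = CTX_RE.search(line)
        if ctx and current is not None:
            current.update(
                in_atomic=int(ctx.group("in_atomic")),
                irqs_disabled=int(ctx.group("irqs_disabled")),
                pid=int(ctx.group("pid")),
                name=ctx.group("name"),
            )
            current = None
    return reports


# Two lines copied from the host dmesg output in this report.
SAMPLE = """\
[  100.466864] BUG: sleeping function called from invalid context at kernel/rtmutex.c:818
[  100.466865] in_atomic(): 0, irqs_disabled(): 1, pid: 2674, name: qemu-kvm
"""

if __name__ == "__main__":
    for report in find_sleeping_bugs(SAMPLE):
        print(report)
```

Feeding the full output of dmesg to find_sleeping_bugs would give one entry per report, which makes it easy to compare the flags between the host trace and the older-kernel trace.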

Actual results:
Errors are logged on the host.

Expected results:
There should be no error.

Additional info:
1. 3.10.0-327.rt56.204.el7.x86_64.debug also hits issues: both host and guest show 'Call Trace' and 'BUG' messages in dmesg, as below. So this is perhaps not a regression.

# dmesg
[ 2600.420836] BUG: sleeping function called from invalid context at kernel/rtmutex.c:729
[ 2600.420837] in_atomic(): 1, irqs_disabled(): 0, pid: 13201, name: bash
[ 2600.420837] INFO: lockdep is turned off.
[ 2600.420839] CPU: 0 PID: 13201 Comm: bash Not tainted 3.10.0-327.rt56.204.el7.x86_64.debug #1
[ 2600.420839] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.7 06/16/2016
[ 2600.420842]  ffff8807a4e0d060 000000008b20534a ffff88078b067bd0 ffffffff816e679c
[ 2600.420843]  ffff88078b067bf8 ffffffff810be49d ffffc9010eb33098 0000000000000000
[ 2600.420844]  ffffc9010eb33070 ffff88078b067c18 ffffffff816ed9b4 0000000000000000
[ 2600.420845] Call Trace:
[ 2600.420850]  [<ffffffff816e679c>] dump_stack+0x19/0x1b
[ 2600.420852]  [<ffffffff810be49d>] __might_sleep+0x12d/0x1f0
[ 2600.420855]  [<ffffffff816ed9b4>] rt_spin_lock+0x24/0x60
[ 2600.420857]  [<ffffffff8111f091>] res_counter_uncharge_until+0x51/0xc0
[ 2600.420858]  [<ffffffff8111f113>] res_counter_uncharge+0x13/0x20
[ 2600.420860]  [<ffffffff811fcf33>] drain_stock.isra.28+0x43/0x80
[ 2600.420863]  [<ffffffff81202693>] __mem_cgroup_try_charge+0x603/0x9d0
[ 2600.420864]  [<ffffffff81202360>] ? __mem_cgroup_try_charge+0x2d0/0x9d0
[ 2600.420866]  [<ffffffff812031c6>] mem_cgroup_charge_common+0x46/0x90
[ 2600.420868]  [<ffffffff812055bf>] mem_cgroup_newpage_charge+0x3f/0x60
[ 2600.420870]  [<ffffffff811c4494>] do_wp_page+0x194/0x8f0
[ 2600.420872]  [<ffffffff811c72cb>] ? handle_mm_fault+0x26b/0xdf0
[ 2600.420873]  [<ffffffff811c767c>] handle_mm_fault+0x61c/0xdf0
[ 2600.420875]  [<ffffffff816f2a26>] ? __do_page_fault+0x216/0x530
[ 2600.420876]  [<ffffffff816f2a88>] __do_page_fault+0x278/0x530
[ 2600.420879]  [<ffffffff810f1e0d>] ? trace_hardirqs_on+0xd/0x10
[ 2600.420880]  [<ffffffff816f2d63>] do_page_fault+0x23/0x80
[ 2600.420882]  [<ffffffff816ef018>] page_fault+0x28/0x30
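When reading traces like the one above, note that frames prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the actual call chain. The following is a rough sketch, assuming the frame format shown in this report, that keeps only the unmarked frames. The name reliable_frames and the regular expression are illustrative, not taken from any kernel tool.

```python
import re

# Matches frames such as:
#   [<ffffffff816ed9b4>] rt_spin_lock+0x24/0x60
#   [<ffffffffc05c69a6>] __get_kvmclock_ns+0x36/0x110 [kvm]
#   [<ffffffff81202360>] ? __mem_cgroup_try_charge+0x2d0/0x9d0
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return symbol names of frames not marked '?', innermost first."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            frames.append(match.group("symbol"))
    return frames


# Lines copied from the 3.10.0-327 trace in this report.
SAMPLE = """\
[ 2600.420855]  [<ffffffff816ed9b4>] rt_spin_lock+0x24/0x60
[ 2600.420857]  [<ffffffff8111f091>] res_counter_uncharge_until+0x51/0xc0
[ 2600.420858]  [<ffffffff8111f113>] res_counter_uncharge+0x13/0x20
[ 2600.420860]  [<ffffffff811fcf33>] drain_stock.isra.28+0x43/0x80
[ 2600.420863]  [<ffffffff81202693>] __mem_cgroup_try_charge+0x603/0x9d0
[ 2600.420864]  [<ffffffff81202360>] ? __mem_cgroup_try_charge+0x2d0/0x9d0
"""

if __name__ == "__main__":
    print(" <- ".join(reliable_frames(SAMPLE)))
```

For this older-kernel trace the filtered chain runs from rt_spin_lock back through the memory cgroup charge path, whereas the host trace on the newer kernel goes through __get_kvmclock_ns in the kvm module, so the two reports involve different call sites.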

2. This bug was found when verifying bug[1]
[1]Bug 1416403 - kvm-rt: change async pagefault code locking for rt-preempt
Comment 2 Pei Zhang 2017-06-06 04:26:38 EDT
Additional info(cont):
3. Both starting and shutting down the guest cause a call trace in the host.
Comment 7 Luiz Capitulino 2017-06-07 09:22:46 EDT
Reassigning to Luis Claudio; he's already backported the fix and is preparing a build.
Comment 10 Pei Zhang 2017-06-07 23:37:17 EDT
==Verification==
Versions:
3.10.0-679.rt56.602.el7.x86_64.debug(in both host&guest)
qemu-kvm-rhev-2.9.0-7.el7.x86_64
libvirt-3.2.0-6.el7.x86_64
tuned-2.8.0-4.el7.noarch


Steps:
1. Reboot rt host

2. The first boot of the RT guest hits issue [1]

3. Repeat boot/shutdown/reboot of the RT guest several times; no errors appear in either host or guest.


Hi Luis, there is still a problem with kernel-rt-debug in step 2. Since the issue looks different, I filed a new bug [1]. Please feel free to mark it as a duplicate and to fail ON_QA on this bug.


[1] Bug 1459739 - Call Trace shows in rt host with kernel-rt-debug when first booting rt guest



Best Regards,
Pei
Comment 11 Pei Zhang 2017-06-13 21:56:42 EDT
==Re-verification==

Versions:
3.10.0-680.rt56.604.el7.x86_64.debug(in both host and guest)
tuned-2.8.0-5.el7.noarch
qemu-kvm-rhev-2.9.0-9.el7.x86_64
libvirt-3.2.0-9.el7.x86_64


Steps:

1. Boot rt host

2. Boot rt guest

3. Repeat reboot/shutdown/boot of the RT guest several times; no errors appear in either host or guest.
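
The repeated boot/shutdown check in the steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the procedure actually used for verification: it assumes the guest is managed through libvirt's virsh, that it runs as root on the host, and that the caller supplies a wait callable (hypothetical) that blocks until the guest has finished booting or shutting down. The names cycle_guest and run_command are made up for illustration.

```python
import subprocess

BUG_MARKER = "BUG: sleeping function called from invalid context"


def run_command(argv):
    """Run a command and return its stdout as text (raises on failure)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def cycle_guest(domain, cycles, run=run_command, wait=lambda: None):
    """Start and shut down `domain` `cycles` times.

    Returns the 1-based cycle numbers in which the host dmesg contained
    the BUG marker. `wait` is a placeholder: `virsh shutdown` only
    requests a shutdown, so a real run needs to block until the guest
    has actually changed state.
    """
    failures = []
    for cycle in range(1, cycles + 1):
        run(["dmesg", "--clear"])          # start each cycle with an empty log
        run(["virsh", "start", domain])
        wait()
        run(["virsh", "shutdown", domain])
        wait()
        if BUG_MARKER in run(["dmesg"]):
            failures.append(cycle)
    return failures
```

An empty list from something like cycle_guest("rt-guest", 5, wait=...) would correspond to the "no errors" outcome reported here; "rt-guest" is a placeholder domain name.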


So this bug has been fixed. Thanks.


Moving the status of this bug to 'VERIFIED'.
Comment 12 errata-xmlrpc 2017-08-01 15:05:29 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2077