Bug 2213852

Summary: Processes of VMs hang if CPU limits are applied.
Product: Red Hat Enterprise Linux 7
Reporter: redhat
Component: kernel
Kernel sub component: Scheduler
Assignee: Phil Auld <pauld>
QA Contact: Chunyu Hu <chuhu>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
CC: bhu
Version: 7.8
Keywords: Triaged
Target Milestone: rc
Hardware: x86_64
OS: Linux
Last Closed: 2023-06-21 12:22:24 UTC
Type: Bug
Embargoed:
Attachments: lshw on the node where the issue reproduces.

Description redhat 2023-06-09 15:04:08 UTC
Created attachment 1969966 [details]
lshw on the node where the issue reproduces.

Description of problem:

While live migrating VMs with CPU limits from a host running kernel 3.10.0-1160.90.1.el7.x86_64, the VMs start to hang. A hung VM can be unblocked by removing its CPU limits. The issue does not reproduce on VMs without CPU limits.

Version-Release number of selected component (if applicable):

 3.10.0-1160.90.1.el7

How reproducible:

It takes between 2 and 4 hours to reproduce under our lab conditions.

Steps to Reproduce:

1. On a machine with 256 GiB of RAM and more than one NUMA node, start 31 VMs, each with 2 vCPUs, 8 GiB of memory, and the following CPU limits:
<cputune>
	<shares>4000</shares>
	<period>10000</period>
	<quota>6200</quota>
  </cputune>

2. Start some load in the VMs (for example, memtest & stress -c)

3. Every 20 seconds, shut down one VM and start another.
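For reference, the <cputune> values in step 1 cap the whole VM at quota/period of one CPU per scheduling period (libvirt passes these values, in microseconds, through to the cgroup cpu.cfs_period_us and cpu.cfs_quota_us files). A quick sanity check of the configured numbers, as a sketch:

```shell
# Sketch: effective CPU cap implied by the <cputune> values above.
# period and quota are in microseconds, as in the reproduction config.
period_us=10000
quota_us=6200
echo "cap: $((quota_us * 100 / period_us))% of one CPU per period"
```

This prints "cap: 62% of one CPU per period", i.e. each VM is hard-limited well below one full CPU, which makes frequent CFS bandwidth throttling likely under the load in step 2.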


Actual results:

VMs hang, with the following logged on the host:

…
[Wed Jun  7 11:09:19 2023] INFO: task qemu-kvm:24375 blocked for more than 120 seconds.
[Wed Jun  7 11:09:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:09:19 2023] qemu-kvm    	D ffff9532fba43840 	0 24375  	1 0x00000080
[Wed Jun  7 11:09:19 2023] Call Trace:
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b9455>] rwsem_down_write_failed+0x215/0x3c0
[Wed Jun  7 11:09:19 2023]  [<ffffffff98dae417>] call_rwsem_down_write_failed+0x17/0x30
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b716d>] down_write+0x2d/0x41
[Wed Jun  7 11:09:19 2023]  [<ffffffff98c080d4>] vm_munmap+0x54/0xb0
[Wed Jun  7 11:09:19 2023]  [<ffffffff98c09412>] SyS_munmap+0x22/0x30
[Wed Jun  7 11:09:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:09:19 2023] INFO: task qemu-kvm:24376 blocked for more than 120 seconds.
[Wed Jun  7 11:09:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:09:19 2023] qemu-kvm    	D ffff9532fc73be80 	0 24376  	1 0x00000080
[Wed Jun  7 11:09:19 2023] Call Trace:
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b9705>] rwsem_down_read_failed+0x105/0x1d0
[Wed Jun  7 11:09:19 2023]  [<ffffffff98dae3e8>] call_rwsem_down_read_failed+0x18/0x30
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b7120>] down_read+0x20/0x40
[Wed Jun  7 11:09:19 2023]  [<ffffffff98bf9799>] SyS_madvise+0x639/0xac0
[Wed Jun  7 11:09:19 2023]  [<ffffffffc08e79a5>] ? kvm_on_user_return+0x75/0xb0 [kvm]
[Wed Jun  7 11:09:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:09:19 2023] INFO: task qemu-kvm:24377 blocked for more than 120 seconds.
[Wed Jun  7 11:09:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:09:19 2023] qemu-kvm    	D ffff9532fba43840 	0 24377  	1 0x00000080
[Wed Jun  7 11:09:19 2023] Call Trace:
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b9705>] rwsem_down_read_failed+0x105/0x1d0
[Wed Jun  7 11:09:19 2023]  [<ffffffff98dae3e8>] call_rwsem_down_read_failed+0x18/0x30
[Wed Jun  7 11:09:19 2023]  [<ffffffff991b7120>] down_read+0x20/0x40
[Wed Jun  7 11:09:19 2023]  [<ffffffff98bf9799>] SyS_madvise+0x639/0xac0
[Wed Jun  7 11:09:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:11:19 2023] INFO: task qemu-kvm:24375 blocked for more than 120 seconds.
[Wed Jun  7 11:11:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:11:19 2023] qemu-kvm    	D ffff9532fba43840 	0 24375  	1 0x00000080
[Wed Jun  7 11:11:19 2023] Call Trace:
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b9455>] rwsem_down_write_failed+0x215/0x3c0
[Wed Jun  7 11:11:19 2023]  [<ffffffff98dae417>] call_rwsem_down_write_failed+0x17/0x30
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b716d>] down_write+0x2d/0x41
[Wed Jun  7 11:11:19 2023]  [<ffffffff98c080d4>] vm_munmap+0x54/0xb0
[Wed Jun  7 11:11:19 2023]  [<ffffffff98c09412>] SyS_munmap+0x22/0x30
[Wed Jun  7 11:11:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:11:19 2023] INFO: task qemu-kvm:24376 blocked for more than 120 seconds.
[Wed Jun  7 11:11:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:11:19 2023] qemu-kvm    	D ffff9532fc73be80 	0 24376  	1 0x00000080
[Wed Jun  7 11:11:19 2023] Call Trace:
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b9705>] rwsem_down_read_failed+0x105/0x1d0
[Wed Jun  7 11:11:19 2023]  [<ffffffff98dae3e8>] call_rwsem_down_read_failed+0x18/0x30
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b7120>] down_read+0x20/0x40
[Wed Jun  7 11:11:19 2023]  [<ffffffff98bf9799>] SyS_madvise+0x639/0xac0
[Wed Jun  7 11:11:19 2023]  [<ffffffffc08e79a5>] ? kvm_on_user_return+0x75/0xb0 [kvm]
[Wed Jun  7 11:11:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:11:19 2023] INFO: task qemu-kvm:24377 blocked for more than 120 seconds.
[Wed Jun  7 11:11:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:11:19 2023] qemu-kvm    	D ffff9532fba43840 	0 24377  	1 0x00000080
[Wed Jun  7 11:11:19 2023] Call Trace:
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b9705>] rwsem_down_read_failed+0x105/0x1d0
[Wed Jun  7 11:11:19 2023]  [<ffffffff98dae3e8>] call_rwsem_down_read_failed+0x18/0x30
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b7120>] down_read+0x20/0x40
[Wed Jun  7 11:11:19 2023]  [<ffffffff98bf9799>] SyS_madvise+0x639/0xac0
[Wed Jun  7 11:11:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:11:19 2023] INFO: task pgrep:24800 blocked for more than 120 seconds.
[Wed Jun  7 11:11:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:11:19 2023] pgrep       	D ffff9510efa61660 	0 24800  24799 0x00000080
[Wed Jun  7 11:11:19 2023] Call Trace:
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b9705>] rwsem_down_read_failed+0x105/0x1d0
[Wed Jun  7 11:11:19 2023]  [<ffffffff98dae3e8>] call_rwsem_down_read_failed+0x18/0x30
[Wed Jun  7 11:11:19 2023]  [<ffffffff991b7120>] down_read+0x20/0x40
[Wed Jun  7 11:11:19 2023]  [<ffffffff98cdb1b2>] proc_pid_cmdline_read+0xb2/0x5e0
[Wed Jun  7 11:11:19 2023]  [<ffffffff98d1b174>] ? security_file_permission+0x94/0xb0
[Wed Jun  7 11:11:19 2023]  [<ffffffff98c5bb2f>] vfs_read+0x9f/0x170
[Wed Jun  7 11:11:19 2023]  [<ffffffff98c5c9a5>] SyS_read+0x55/0xd0
[Wed Jun  7 11:11:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:13:19 2023] INFO: task qemu-kvm:24375 blocked for more than 120 seconds.
[Wed Jun  7 11:13:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:13:19 2023] qemu-kvm    	D ffff9532fba43840 	0 24375  	1 0x00000080
[Wed Jun  7 11:13:19 2023] Call Trace:
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b9455>] rwsem_down_write_failed+0x215/0x3c0
[Wed Jun  7 11:13:19 2023]  [<ffffffff98dae417>] call_rwsem_down_write_failed+0x17/0x30
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b716d>] down_write+0x2d/0x41
[Wed Jun  7 11:13:19 2023]  [<ffffffff98c080d4>] vm_munmap+0x54/0xb0
[Wed Jun  7 11:13:19 2023]  [<ffffffff98c09412>] SyS_munmap+0x22/0x30
[Wed Jun  7 11:13:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:13:19 2023] INFO: task qemu-kvm:24376 blocked for more than 120 seconds.
[Wed Jun  7 11:13:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:13:19 2023] qemu-kvm    	D ffff9532fc73be80 	0 24376  	1 0x00000080
[Wed Jun  7 11:13:19 2023] Call Trace:
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b9705>] rwsem_down_read_failed+0x105/0x1d0
[Wed Jun  7 11:13:19 2023]  [<ffffffff98dae3e8>] call_rwsem_down_read_failed+0x18/0x30
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b7120>] down_read+0x20/0x40
[Wed Jun  7 11:13:19 2023]  [<ffffffff98bf9799>] SyS_madvise+0x639/0xac0
[Wed Jun  7 11:13:19 2023]  [<ffffffffc08e79a5>] ? kvm_on_user_return+0x75/0xb0 [kvm]
[Wed Jun  7 11:13:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
[Wed Jun  7 11:13:19 2023] INFO: task qemu-kvm:24377 blocked for more than 120 seconds.
[Wed Jun  7 11:13:19 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun  7 11:13:19 2023] qemu-kvm    	D ffff9532fba43840 	0 24377  	1 0x00000080
[Wed Jun  7 11:13:19 2023] Call Trace:
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b7ca9>] schedule+0x29/0x70
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b9705>] rwsem_down_read_failed+0x105/0x1d0
[Wed Jun  7 11:13:19 2023]  [<ffffffff98dae3e8>] call_rwsem_down_read_failed+0x18/0x30
[Wed Jun  7 11:13:19 2023]  [<ffffffff991b7120>] down_read+0x20/0x40
[Wed Jun  7 11:13:19 2023]  [<ffffffff98bf9799>] SyS_madvise+0x639/0xac0
[Wed Jun  7 11:13:19 2023]  [<ffffffff991c539a>] system_call_fastpath+0x25/0x2a
...


Expected results:

The VMs shouldn't hang.




Additional Info:

To list the machine.slice cgroups that have a CPU quota set, use the following command:
find /sys/fs/cgroup/cpu,cpuacct/machine.slice -name cpu.cfs_quota_us -print0 | xargs -0 grep . | grep -v -e ':-1'

To remove CPU limits for all processes in machine.slice, use the following command:
for f in $(find /sys/fs/cgroup/cpu,cpuacct/machine.slice/ -name cpu.cfs_quota_us); do echo -1 > "$f"; done

The removal of the limits unblocks the hanging processes.

Comment 4 Phil Auld 2023-06-14 14:30:27 UTC
Thanks for raising this. Could you please open a support ticket for this issue? That will help us see what we can do here.

Cheers,
Phil

Comment 5 redhat 2023-06-14 14:34:32 UTC
Hi Phil,

Not sure in which system I need to open this ticket, and as this looks to be a bug in the kernel, isn't this the right place?

Comment 6 Phil Auld 2023-06-14 14:47:40 UTC
At the Red Hat Customer support portal which I think is here: https://access.redhat.com/support/

Yes, it is a kernel bug, but 7.9.z is in a late maintenance phase, so it would be good to have a support ticket.

That said, I have other cases of this but no good way to reproduce it. You seem to have a way to reproduce it. We might be able to help each other :)

Comment 7 redhat 2023-06-14 16:02:05 UTC
I have opened case 03538458. We also have the environment and everything needed to reproduce it; I can provide you with the details or just give you access to the test node in question. You can ping me directly at vk for access.

Comment 8 Phil Auld 2023-06-14 16:18:25 UTC
Thanks!  My hope would be to provide you with a test kernel. That will be a lot easier once we get RH support people looped in, hence the support ticket.

Comment 9 redhat 2023-06-14 17:17:25 UTC
Ok, let me know if anything else is needed either here or in the case.

Comment 11 redhat 2023-06-15 02:51:55 UTC
Hi Phil,

The case got closed, as it apparently requires a different support level than we have. Does this mean that RHEL 7 cases are not looked into except at specific support levels?

Comment 12 Phil Auld 2023-06-16 12:58:37 UTC
Apparently you have a self-support level, which means no hands-on from the support side. However, I think you can still consume new 7.9.z releases. I'll see about getting you the test kernel myself first; then we can hopefully get this fixed in an official 7.9.z release.

Comment 13 redhat 2023-06-18 05:55:29 UTC
Ack, let me know if there's anything I can do on my side as well. Since this reproduces, I can collect more information and provide it.

Comment 14 Phil Auld 2023-06-20 13:21:10 UTC
I've put a tarball with the kernel in my people.redhat directory.  You should be able to get it with something like:

  wget https://people.redhat.com/~pauld/for_storpool.tgz

Let me know when you've gotten it, and how it works for you. Thanks for trying it out.

This includes a handful of patches which are supposed to fix the race we are hitting here. What I believe is happening is that the limited task is getting throttled without the period timer being armed, so it never gets unthrottled. Sometimes this happens while the task is holding the mmap_sem (while taking a page fault), which brings the whole system grinding to a halt.
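If it helps confirm that diagnosis, the CFS bandwidth counters exposed in each cgroup's cpu.stat file (nr_periods, nr_throttled, throttled_time) can be inspected for the VM scopes. This is only a diagnostic sketch; the path assumes the same cgroup-v1 machine.slice layout as the commands earlier in this report:

```shell
# Sketch: dump CFS bandwidth accounting for each VM scope. For a group
# stuck in the throttled-but-never-unthrottled state described above,
# throttled_time keeps growing while its tasks stay blocked in D state.
for f in /sys/fs/cgroup/cpu,cpuacct/machine.slice/*/cpu.stat; do
    [ -e "$f" ] || { echo "no cgroup-v1 cpu controller found"; break; }
    echo "== $f"
    cat "$f"   # nr_periods, nr_throttled, throttled_time
done
```

Comparing two snapshots a minute apart for an affected VM scope should make the stuck state visible without any instrumentation.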

Comment 15 redhat 2023-06-20 13:49:24 UTC
Thanks, we'll test and get back to you.

Comment 16 redhat 2023-06-21 09:02:37 UTC
The issue does not reproduce with the above kernel, so it looks to be fixed.

Comment 17 Phil Auld 2023-06-21 12:22:24 UTC
Thank you very much for testing it out. I'm going to close this now as a duplicate of the bug that is carrying that kernel with those fixes into 7.9.z, and work on getting it officially released. Your testing is important to that, since we haven't gotten it to reproduce reliably here.

Cheers,
Phil

*** This bug has been marked as a duplicate of bug 2180681 ***

Comment 18 Phil Auld 2023-06-21 12:25:22 UTC
You likely can't see that bug, sorry about that. It has a number of other customer cases attached, though, and is the same issue. The number is the important part, as I think it will show up in the errata notes.