Bug 1897869
Summary: Running oslat in RT guest, guest kernel shows Call Trace: INFO: task kcompactd0:35 blocked for more than 600 seconds.

Product: Red Hat Enterprise Linux 8
Component: kernel-rt
kernel-rt sub component: Memory Management
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 8.4
Target Milestone: rc
Target Release: 8.4
Hardware: Unspecified
OS: Unspecified
Keywords: Regression, Triaged
Reporter: Pei Zhang <pezhang>
Assignee: Marcelo Tosatti <mtosatti>
QA Contact: Pei Zhang <pezhang>
CC: bhu, chayang, chwhite, jinzhao, jlelli, juzhang, lcapitulino, mm-maint, mtosatti, nilal, peterx, qzhao, rt-maint, rt-qe, virt-maint
Fixed In Version: kernel-rt-4.18.0-288.rt7.53.el8
Last Closed: 2021-05-18 15:12:53 UTC
Type: Bug
Bug Blocks: 1883636, 1898189
Update: This issue doesn't affect the oslat latency value; we still get the expected results.

== Results ==
(1) Single VM with 1 rt vCPU:
Max Latency: 8 (us)
(2) Single VM with 8 rt vCPUs:
Max Latency: 10 10 10 9 10 9 9 9 (us)
(3) Multiple VMs, each with 1 rt vCPU:
- VM1 Max Latency: 8 (us)
- VM2 Max Latency: 8 (us)
- VM3 Max Latency: 8 (us)
- VM4 Max Latency: 10 (us)

I believe this is caused by the fact that we needed to revert Marcelo's change because of https://bugzilla.redhat.com/show_bug.cgi?id=1894793#c7

I only took a quick look at the stack trace, but my suspicion is that drain_all_pages queues work on the isolated cores that are running oslat busy threads, and kcompactd thus gets starved by them. I guess this is a sign that we need to adapt Marcelo's change to 8.4.

Created attachment 1746548 [details]
Full Call Trace dmesg log with 4.18.0-272.rt7.37.el8.x86_64
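To illustrate the starvation hypothesis above, here is a simplified, non-compilable sketch of the drain_all_pages() path that appears in the backtraces, paraphrased from upstream mm/page_alloc.c around v4.18. The helper and variable names are illustrative assumptions, not the literal RHEL source:

```c
/* Simplified sketch, paraphrased from upstream mm/page_alloc.c (~v4.18);
 * names and details are illustrative, not the literal RHEL source. */
void drain_all_pages(struct zone *zone)
{
	int cpu;

	/* Queue one drain work item on every CPU that holds per-cpu (pcp)
	 * pages, including isolated CPUs running the oslat busy loop... */
	for_each_cpu(cpu, &cpus_with_pcps)
		queue_work_on(cpu, mm_percpu_wq, per_cpu_ptr(&pcpu_drain, cpu));

	/* ...then block until each item has run. A SCHED_FIFO oslat thread
	 * monopolizing an isolated CPU prevents that CPU's workqueue kthread
	 * from ever running, so flush_work() never returns and the caller
	 * (kcompactd here, or a task in the page-allocation slow path)
	 * hangs -- producing the "blocked for more than 600 seconds"
	 * reports in this bug. Remote per-cpu-pages draining avoids this
	 * by draining another CPU's pcp lists without scheduling work
	 * on it. */
	for_each_cpu(cpu, &cpus_with_pcps)
		flush_work(per_cpu_ptr(&pcpu_drain, cpu));
}
```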
(In reply to Pei Zhang from comment #1)
> This issue doesn't affect the oslat latency value, we still can get expected results.

In the latest 2 testing runs, the oslat max latency exceeds 15us (run 1 is 16us, run 2 is 21us). So this issue also affects the performance.

== Run 1 ==
Results:
(1) Single VM with 1 rt vCPU:
Maximum: 8 (us)
(2) Single VM with 8 rt vCPUs:
Maximum: 10 9 9 9 9 8 8 8 (us)
(3) Multiple VMs, each with 1 rt vCPU:
- VM1 Maximum: 15 (us)
- VM2 Maximum: 16 (us)
- VM3 Maximum: 15 (us)
- VM4 Maximum: 14 (us)

Versions:
kernel-rt-4.18.0-269.rt7.34.el8.x86_64
qemu-kvm-5.2.0-2.scrmod+el8.4.0+9296+986afc72.wrb210106.x86_64
tuned-2.15.0-1.el8.noarch
libvirt-6.10.0-1.scrmod+el8.4.0+9296+986afc72.x86_64
python3-libvirt-6.6.0-1.module+el8.3.0+7572+bcbf6b90.x86_64
microcode_ctl-20201112-1.el8.x86_64
rt-tests-1.9-2.el8.x86_64

== Run 2 ==
Results:
(1) Single VM with 1 rt vCPU:
Maximum: 10 (us)
(2) Single VM with 8 rt vCPUs:
Maximum: 10 10 10 10 10 10 10 10 (us)
(3) Multiple VMs, each with 1 rt vCPU:
- VM1 Maximum: 16 (us)
- VM2 Maximum: 17 (us)
- VM3 Maximum: 18 (us)
- VM4 Maximum: 21 (us)

Versions:
kernel-rt-4.18.0-272.rt7.37.el8.x86_64
qemu-kvm-5.2.0-2.scrmod+el8.4.0+9296+986afc72.wrb210106.x86_64
tuned-2.15.0-1.el8.noarch
libvirt-6.10.0-1.scrmod+el8.4.0+9296+986afc72.x86_64
microcode_ctl-20201112-1.el8.x86_64
rt-tests-1.9-2.el8.x86_64

Hi Pei,

Can you please confirm this kernel-rt build fixes the problem?

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34531073

Hi Marcelo,

I was looking through your last upstream posting of "re-enable remote per-cpu-pages draining". In that thread, Sebastian mentioned [1] that Joe Korty reported that the remote draining patch set somehow broke a workload for him. I haven't been able to find the thread where Joe reported this; maybe you will know.

Do you know if that issue still exists?

[1] https://www.spinics.net/lists/linux-rt-users/msg22238.html

Thanks,
Nitesh

(In reply to Marcelo Tosatti from comment #19)
> Can you please confirm this kernel-rt build fixes the problem?

Hi Marcelo,

Testing with this build is ongoing now. Results will be updated tomorrow. Thanks.

Best regards,
Pei

(In reply to Nitesh Narayan Lal from comment #20)
> Do you know if that issue still exists?

Nitesh,

I haven't got access to Joe Korty's report, but for https://bugzilla.redhat.com/show_bug.cgi?id=1894793#c7 the issue was that, with the new local lock interface, "local_lock_irqsave" was not disabling interrupts (attaching the patch). Maybe it's the same issue.

Created attachment 1751221 [details]
drain_all_pages + local_irq_save fix
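For context on the attached fix: on PREEMPT_RT the local_lock_irqsave() interface maps to a per-CPU sleeping lock and, per the comment above, was not actually disabling hardware interrupts, so code that genuinely needs interrupts off must request that explicitly. A hypothetical sketch of the pitfall; the lock name is made up and this is not the attached patch itself:

```c
/* Hypothetical sketch; "example_lock" is made up and this is not the
 * attached patch (see attachment 1751221 for the real fix). */

/* On PREEMPT_RT this acquires a per-CPU sleeping lock: it serializes
 * against other tasks on this CPU but does NOT turn off hardware
 * interrupts, despite the _irqsave name. */
local_lock_irqsave(example_lock, flags);

/* A section that truly requires interrupts to be disabled (such as the
 * pcp list manipulation in the drain path) must say so directly: */
local_irq_save(flags);
/* ... manipulate per-CPU page lists ... */
local_irq_restore(flags);
```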
(In reply to Marcelo Tosatti from comment #19)
> Can you please confirm this kernel-rt build fixes the problem?

Hello Marcelo,

4.18.0-275.rt7.40.el8rdrain2.x86_64 doesn't work very well. Here are the testing results from our latest several runs:

1. The Call Trace issue is gone. There are no Call Traces in the RT guest any more.

2. There is an oslat latency spike (19us) within 1 hour of testing. This run was testing with "-w memmove -m 4K".
(1) Single VM with 1 rt vCPU:
Maximum: 10 (us)
(2) Single VM with 8 rt vCPUs:
Maximum: 10 10 19 10 10 10 10 10 (us)
               ^^
(3) Multiple VMs, each with 1 rt vCPU:
- VM1 Maximum: 10 (us)
- VM2 Maximum: 10 (us)
- VM3 Maximum: 10 (us)
- VM4 Maximum: 10 (us)

== Versions ==
kernel-rt-4.18.0-275.rt7.40.el8rdrain2.x86_64
qemu-kvm-5.2.0-3.scrmod+el8.4.0+9533+d0097f55.wrb210120.x86_64
tuned-2.15.0-1.el8.noarch
libvirt-7.0.0-2.scrmod+el8.4.0+9533+d0097f55.x86_64
python3-libvirt-6.10.0-1.module+el8.4.0+8948+a39b3f3a.x86_64
microcode_ctl-20201112-1.el8.x86_64
rt-tests-1.10-1.el8.x86_64

beaker job: https://beaker.engineering.redhat.com/jobs/5039943

3. We hit an RT guest hang during the oslat testing. This issue is not 100% reproducible; I hit it just once. As I cannot get access to the guest, no other info can be provided for now.

Best regards,
Pei

(In reply to Pei Zhang from comment #24)
> 4.18.0-275.rt7.40.el8rdrain2.x86_64 doesn't work very well.

Pei,

The patch is very broken. Should have a new brew build today.

(In reply to Marcelo Tosatti from comment #25)
> The patch is very broken. Should have a new brew build today.

Pei,

This should work:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34606898

(In reply to Marcelo Tosatti from comment #26)
> This should work:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34606898

Thanks, Marcelo. The testing with this build is ongoing now. Results will be updated soon (should be before Wednesday).

Best regards,
Pei

Verification:

Versions:
kernel-rt-4.18.0-288.rt7.53.el8.x86_64
qemu-kvm-5.2.0-7.module+el8.4.0+9943+d64b3717.x86_64
tuned-2.15.0-1.el8.noarch
libvirt-7.0.0-3.scrmod+el8.4.0+9884+244fb11b.x86_64
microcode_ctl-20201112-2.el8.x86_64
rt-tests-1.10-1.el8.x86_64

== 12 hours oslat: max latency is 6us. No errors were shown.
(1) Single VM with 1 rt vCPU:
Maximum: 6 (us)
(2) Single VM with 8 rt vCPUs:
Maximum: 6 6 6 6 6 6 6 6 (us)
(3) Multiple VMs, each with 1 rt vCPU:
- VM1 Maximum: 6 (us)
- VM2 Maximum: 6 (us)
- VM3 Maximum: 6 (us)
- VM4 Maximum: 6 (us)

== 12 hours cyclictest: max latency is 29us. No errors were shown.
(1) Single VM with 1 rt vCPU:
# Min Latencies: 00007
# Avg Latencies: 00010
# Max Latencies: 00024
(2) Single VM with 8 rt vCPUs:
# Min Latencies: 00007 00012 00012 00012 00012 00012 00012 00012
# Avg Latencies: 00009 00012 00012 00012 00012 00012 00012 00012
# Max Latencies: 00021 00022 00023 00022 00022 00022 00022 00022
(3) Multiple VMs, each with 1 rt vCPU:
- VM1
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00022
- VM2
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00020
- VM3
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00022
- VM4
# Min Latencies: 00007
# Avg Latencies: 00009
# Max Latencies: 00029

So this bug has been fixed. Moving to 'VERIFIED'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel-rt security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1739
Created attachment 1729461 [details]
Full dmesg log of RT guest

Description of problem:
This issue was triggered during kvm-rt regular testing with oslat. When running oslat in the RT guest, the RT guest kernel shows a Call Trace error.

Version-Release number of selected component (if applicable):
kernel-rt-4.18.0-249.rt4.14.el8.x86_64
qemu-kvm-5.1.0-14.module+el8.3.0+8438+644aff69.x86_64
tuned-2.14.0-3.el8.noarch
python3-libvirt-6.6.0-1.module+el8.3.0+7572+bcbf6b90.x86_64
microcode_ctl-20201027-1.el8.x86_64
rt-tests-1.8-11.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up the RT host.
2. Set up the RT guest.
3. In the RT guest, run oslat on the RT vCPUs and compile a kernel on the housekeeping vCPUs in parallel:
# oslat --cpu-list 1 --rtprio 1 --runtime 12h
# while true; do cd /home/nfv-virt-rt-kvm/src/kernel-rt/linux-4.18.0-249.rt4.14.el8; make -j2 >/dev/null; make clean >/dev/null; done
4. Check the RT guest dmesg; Call Traces are shown:
# dmesg
...
[ 4362.264644] INFO: task kcompactd0:35 blocked for more than 600 seconds.
[ 4362.265735] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1
[ 4362.266649] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4362.267857] kcompactd0 D 0 35 2 0x80084000
[ 4362.268739] Call Trace:
[ 4362.269195] __schedule+0x34d/0x8a0
[ 4362.269796] ? migrate_enable+0x118/0x3a0
[ 4362.270456] schedule+0x56/0xe0
[ 4362.271003] __rt_mutex_slowlock+0xbe/0x130
[ 4362.275672] rt_mutex_slowlock_locked+0xbc/0x260
[ 4362.276451] rt_mutex_slowlock.constprop.31+0x5f/0x90
[ 4362.277271] drain_all_pages+0x184/0x190
[ 4362.277927] kcompactd_do_work+0x26f/0x330
[ 4362.278603] ? kcompactd_do_work+0x330/0x330
[ 4362.285985] kcompactd+0x8b/0x200
[ 4362.286555] ? finish_wait+0x70/0x70
[ 4362.287420] kthread+0x15d/0x180
[ 4362.287999] ? __kthread_parkme+0xa0/0xa0
[ 4362.288674] ret_from_fork+0x35/0x40
[ 4362.289302] INFO: task cc1:83995 blocked for more than 600 seconds.
[ 4362.290286] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1 [ 4362.295064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4362.296273] cc1 D 0 83995 83993 0x00084080 [ 4362.300874] Call Trace: [ 4362.301353] __schedule+0x34d/0x8a0 [ 4362.301967] schedule+0x56/0xe0 [ 4362.302505] schedule_timeout+0x250/0x400 [ 4362.306974] ? preempt_count_add+0x68/0xa0 [ 4362.307666] ? _raw_spin_unlock_irqrestore+0x20/0x60 [ 4362.308470] wait_for_completion+0x85/0xe0 [ 4362.309154] flush_work+0x123/0x1d0 [ 4362.309752] ? flush_workqueue_prep_pwqs+0x130/0x130 [ 4362.310546] ? __queue_work+0x1aa/0x590 [ 4362.323271] drain_all_pages+0x140/0x190 [ 4362.324052] __alloc_pages_slowpath+0x44c/0xe40 [ 4362.324862] __alloc_pages_nodemask+0x208/0x2f0 [ 4362.325598] alloc_pages_vma+0xc5/0x170 [ 4362.326247] __handle_mm_fault+0x618/0xb50 [ 4362.328971] ? __switch_to_asm+0x35/0x70 [ 4362.329636] handle_mm_fault+0xd2/0x1e0 [ 4362.330275] __do_page_fault+0x28e/0x5d0 [ 4362.334994] do_page_fault+0x47/0x1b0 [ 4362.335626] ? async_page_fault+0x8/0x30 [ 4362.336275] async_page_fault+0x1e/0x30 [ 4362.336921] RIP: 0033:0x56128bd8e938 [ 4362.337523] Code: Bad RIP value. [ 4362.338080] RSP: 002b:00007fffb9217fd0 EFLAGS: 00010216 [ 4362.350981] RAX: 00007feeed5cd000 RBX: 0000000000000090 RCX: 000000000000000c [ 4362.352106] RDX: 00007feeed5cd000 RSI: 000000000000001b RDI: 0000000000002424 [ 4362.355258] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000010000000 [ 4362.356403] R10: 0000000010000001 R11: 0000000000000006 R12: 000000001f955ef3 [ 4362.357513] R13: 00007feeed5bffa0 R14: 00007feeeffa45e8 R15: 0000000000000003 [ 4984.857254] INFO: task kcompactd0:35 blocked for more than 600 seconds. [ 4984.858350] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1 [ 4984.859265] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4984.862392] kcompactd0 D 0 35 2 0x80084000 [ 4984.863271] Call Trace: [ 4984.863718] __schedule+0x34d/0x8a0 [ 4984.864575] ? 
migrate_enable+0x118/0x3a0 [ 4984.865253] schedule+0x56/0xe0 [ 4984.865793] __rt_mutex_slowlock+0xbe/0x130 [ 4984.866491] rt_mutex_slowlock_locked+0xbc/0x260 [ 4984.867247] rt_mutex_slowlock.constprop.31+0x5f/0x90 [ 4984.868057] drain_all_pages+0x184/0x190 [ 4984.872802] kcompactd_do_work+0x26f/0x330 [ 4984.876563] ? kcompactd_do_work+0x330/0x330 [ 4984.877280] kcompactd+0x8b/0x200 [ 4984.877850] ? finish_wait+0x70/0x70 [ 4984.878465] kthread+0x15d/0x180 [ 4984.879019] ? __kthread_parkme+0xa0/0xa0 [ 4984.879693] ret_from_fork+0x35/0x40 [ 4984.884423] INFO: task cc1:83995 blocked for more than 600 seconds. [ 4984.885424] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1 [ 4984.886339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4984.887552] cc1 D 0 83995 83993 0x00084080 [ 4984.892510] Call Trace: [ 4984.892962] __schedule+0x34d/0x8a0 [ 4984.893568] schedule+0x56/0xe0 [ 4984.894107] schedule_timeout+0x250/0x400 [ 4984.894781] ? preempt_count_add+0x68/0xa0 [ 4984.895476] ? _raw_spin_unlock_irqrestore+0x20/0x60 [ 4984.900353] wait_for_completion+0x85/0xe0 [ 4984.901043] flush_work+0x123/0x1d0 [ 4984.901652] ? flush_workqueue_prep_pwqs+0x130/0x130 [ 4984.902460] ? __queue_work+0x1aa/0x590 [ 4984.903101] drain_all_pages+0x140/0x190 [ 4984.903761] __alloc_pages_slowpath+0x44c/0xe40 [ 4984.908589] __alloc_pages_nodemask+0x208/0x2f0 [ 4984.909344] alloc_pages_vma+0xc5/0x170 [ 4984.909984] __handle_mm_fault+0x618/0xb50 [ 4984.910670] ? __switch_to_asm+0x35/0x70 [ 4984.911332] handle_mm_fault+0xd2/0x1e0 [ 4984.911970] __do_page_fault+0x28e/0x5d0 [ 4984.916697] do_page_fault+0x47/0x1b0 [ 4984.917330] ? async_page_fault+0x8/0x30 [ 4984.917981] async_page_fault+0x1e/0x30 [ 4984.918632] RIP: 0033:0x56128bd8e938 [ 4984.919249] Code: Bad RIP value. 
[ 4984.919800] RSP: 002b:00007fffb9217fd0 EFLAGS: 00010216 [ 4984.924701] RAX: 00007feeed5cd000 RBX: 0000000000000090 RCX: 000000000000000c [ 4984.925818] RDX: 00007feeed5cd000 RSI: 000000000000001b RDI: 0000000000002424 [ 4984.926930] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000010000000 [ 4984.928049] R10: 0000000010000001 R11: 0000000000000006 R12: 000000001f955ef3 [ 4984.933230] R13: 00007feeed5bffa0 R14: 00007feeeffa45e8 R15: 0000000000000003 [ 5607.449833] INFO: task kcompactd0:35 blocked for more than 600 seconds. [ 5607.450967] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1 [ 5607.451887] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5607.453095] kcompactd0 D 0 35 2 0x80084000 [ 5607.454541] Call Trace: [ 5607.455005] __schedule+0x34d/0x8a0 [ 5607.455593] ? migrate_enable+0x118/0x3a0 [ 5607.456261] schedule+0x56/0xe0 [ 5607.456807] __rt_mutex_slowlock+0xbe/0x130 [ 5607.457491] rt_mutex_slowlock_locked+0xbc/0x260 [ 5607.458539] rt_mutex_slowlock.constprop.31+0x5f/0x90 [ 5607.459373] drain_all_pages+0x184/0x190 [ 5607.460031] kcompactd_do_work+0x26f/0x330 [ 5607.460704] ? kcompactd_do_work+0x330/0x330 [ 5607.461449] kcompactd+0x8b/0x200 [ 5607.462300] ? finish_wait+0x70/0x70 [ 5607.462913] kthread+0x15d/0x180 [ 5607.463464] ? __kthread_parkme+0xa0/0xa0 [ 5607.464132] ret_from_fork+0x35/0x40 [ 5607.464761] INFO: task cc1:83995 blocked for more than 600 seconds. [ 5607.465750] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1 [ 5607.466998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5607.468201] cc1 D 0 83995 83993 0x00084080 [ 5607.469078] Call Trace: [ 5607.469521] __schedule+0x34d/0x8a0 [ 5607.470338] schedule+0x56/0xe0 [ 5607.470889] schedule_timeout+0x250/0x400 [ 5607.471551] ? preempt_count_add+0x68/0xa0 [ 5607.472237] ? _raw_spin_unlock_irqrestore+0x20/0x60 [ 5607.473286] wait_for_completion+0x85/0xe0 [ 5607.476072] flush_work+0x123/0x1d0 [ 5607.482434] ? 
flush_workqueue_prep_pwqs+0x130/0x130 [ 5607.483249] ? __queue_work+0x1aa/0x590 [ 5607.483897] drain_all_pages+0x140/0x190 [ 5607.484548] __alloc_pages_slowpath+0x44c/0xe40 [ 5607.485293] __alloc_pages_nodemask+0x208/0x2f0 [ 5607.490100] alloc_pages_vma+0xc5/0x170 [ 5607.490741] __handle_mm_fault+0x618/0xb50 [ 5607.491429] ? __switch_to_asm+0x35/0x70 [ 5607.492091] handle_mm_fault+0xd2/0x1e0 [ 5607.492728] __do_page_fault+0x28e/0x5d0 [ 5607.493387] do_page_fault+0x47/0x1b0 [ 5607.498087] ? async_page_fault+0x8/0x30 [ 5607.498740] async_page_fault+0x1e/0x30 [ 5607.499394] RIP: 0033:0x56128bd8e938 [ 5607.500008] Code: Bad RIP value. [ 5607.500557] RSP: 002b:00007fffb9217fd0 EFLAGS: 00010216 [ 5607.501399] RAX: 00007feeed5cd000 RBX: 0000000000000090 RCX: 000000000000000c [ 5607.506562] RDX: 00007feeed5cd000 RSI: 000000000000001b RDI: 0000000000002424 [ 5607.507674] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000010000000 [ 5607.508783] R10: 0000000010000001 R11: 0000000000000006 R12: 000000001f955ef3 [ 5607.517970] R13: 00007feeed5bffa0 R14: 00007feeeffa45e8 R15: 0000000000000003 [ 6230.042436] INFO: task kcompactd0:35 blocked for more than 600 seconds. [ 6230.043529] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1 [ 6230.044705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 6230.045938] kcompactd0 D 0 35 2 0x80084000 [ 6230.046817] Call Trace: [ 6230.047265] __schedule+0x34d/0x8a0 [ 6230.047867] ? migrate_enable+0x118/0x3a0 [ 6230.048805] schedule+0x56/0xe0 [ 6230.049349] __rt_mutex_slowlock+0xbe/0x130 [ 6230.050053] rt_mutex_slowlock_locked+0xbc/0x260 [ 6230.050806] rt_mutex_slowlock.constprop.31+0x5f/0x90 [ 6230.051624] drain_all_pages+0x184/0x190 [ 6230.052272] kcompactd_do_work+0x26f/0x330 [ 6230.057014] ? kcompactd_do_work+0x330/0x330 [ 6230.057740] kcompactd+0x8b/0x200 [ 6230.058309] ? finish_wait+0x70/0x70 [ 6230.058920] kthread+0x15d/0x180 [ 6230.059481] ? 
__kthread_parkme+0xa0/0xa0 [ 6230.060140] ret_from_fork+0x35/0x40 [ 6230.068854] INFO: task cc1:83995 blocked for more than 600 seconds. [ 6230.069853] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1 [ 6230.070766] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 6230.071973] cc1 D 0 83995 83993 0x00084080 [ 6230.073157] Call Trace: [ 6230.073619] __schedule+0x34d/0x8a0 [ 6230.074208] schedule+0x56/0xe0 [ 6230.076804] schedule_timeout+0x250/0x400 [ 6230.077491] ? preempt_count_add+0x68/0xa0 [ 6230.078166] ? _raw_spin_unlock_irqrestore+0x20/0x60 [ 6230.084698] wait_for_completion+0x85/0xe0 [ 6230.085384] flush_work+0x123/0x1d0 [ 6230.085983] ? flush_workqueue_prep_pwqs+0x130/0x130 [ 6230.086801] ? __queue_work+0x1aa/0x590 [ 6230.087460] drain_all_pages+0x140/0x190 [ 6230.088115] __alloc_pages_slowpath+0x44c/0xe40 [ 6230.090916] __alloc_pages_nodemask+0x208/0x2f0 [ 6230.091680] alloc_pages_vma+0xc5/0x170 [ 6230.092319] __handle_mm_fault+0x618/0xb50 [ 6230.097059] ? __switch_to_asm+0x35/0x70 [ 6230.097727] handle_mm_fault+0xd2/0x1e0 [ 6230.098366] __do_page_fault+0x28e/0x5d0 [ 6230.099024] do_page_fault+0x47/0x1b0 [ 6230.099648] ? async_page_fault+0x8/0x30 [ 6230.100296] async_page_fault+0x1e/0x30 [ 6230.105005] RIP: 0033:0x56128bd8e938 [ 6230.105638] Code: Bad RIP value. [ 6230.106190] RSP: 002b:00007fffb9217fd0 EFLAGS: 00010216 [ 6230.107293] RAX: 00007feeed5cd000 RBX: 0000000000000090 RCX: 000000000000000c [ 6230.112461] RDX: 00007feeed5cd000 RSI: 000000000000001b RDI: 0000000000002424 [ 6230.113596] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000010000000 [ 6230.114707] R10: 0000000010000001 R11: 0000000000000006 R12: 000000001f955ef3 [ 6230.115815] R13: 00007feeed5bffa0 R14: 00007feeeffa45e8 R15: 0000000000000003 [ 6852.634992] INFO: task kcompactd0:35 blocked for more than 600 seconds. [ 6852.636091] Not tainted 4.18.0-249.rt4.14.el8.x86_64 #1 [ 6852.641392] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 6852.642621] kcompactd0 D 0 35 2 0x80084000
[ 6852.643500] Call Trace:
[ 6852.643948] __schedule+0x34d/0x8a0
[ 6852.644552] ? migrate_enable+0x118/0x3a0
[ 6852.645510] schedule+0x56/0xe0
[ 6852.646064] __rt_mutex_slowlock+0xbe/0x130
[ 6852.646750] rt_mutex_slowlock_locked+0xbc/0x260
[ 6852.647511] rt_mutex_slowlock.constprop.31+0x5f/0x90
[ 6852.648330] drain_all_pages+0x184/0x190
[ 6852.654084] kcompactd_do_work+0x26f/0x330
[ 6852.654797] ? kcompactd_do_work+0x330/0x330
[ 6852.655507] kcompactd+0x8b/0x200
[ 6852.656081] ? finish_wait+0x70/0x70
[ 6852.656681] kthread+0x15d/0x180
[ 6852.666891] ? __kthread_parkme+0xa0/0xa0
[ 6852.667571] ret_from_fork+0x35/0x40
....

Actual results:
There are Call Traces in the RT guest kernel dmesg.

Expected results:
No Call Trace info should be shown.

Additional info:
1. Only the oslat testing hit this issue. cyclictest testing looks good and did not hit this problem.