Bug 1843410 - RT guest gets Call Trace "INFO: task ld:274531 blocked for more than 600 seconds."
Summary: RT guest gets Call Trace "INFO: task ld:274531 blocked for more than 600 seconds."
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel-rt
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Marcelo Tosatti
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1817732 1823810 1855779
 
Reported: 2020-06-03 09:11 UTC by Pei Zhang
Modified: 2020-11-04 02:26 UTC
CC List: 13 users

Fixed In Version: kernel-rt-4.18.0-220.rt7.32.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1855779
Environment:
Last Closed: 2020-11-04 02:25:41 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
VM Call Trace info (64.66 KB, text/plain), 2020-06-03 09:11 UTC, Pei Zhang
Another Call Trace info in guest (57.05 KB, text/plain), 2020-06-04 08:50 UTC, Pei Zhang

Description Pei Zhang 2020-06-03 09:11:11 UTC
Created attachment 1694764 [details]
VM Call Trace info

Description of problem:
During KVM-RT oslat testing, Call Trace messages appear in the RT guest.

Version-Release number of selected component (if applicable):
4.18.0-208.rt5.20.el8.x86_64
qemu-img-5.0.0-0.scrmod+el8.3.0+6791+fbd26444.wrb200527.x86_64
libvirt-6.3.0-1.scrmod+el8.3.0+6791+fbd26444.x86_64
tuned-2.13.0-6.el8.noarch


How reproducible:
1/1

Steps to Reproduce:
1. Setup rt host

2. Boot rt guest

3. Start a 12h oslat run in the guest. Meanwhile, compile the kernel on the housekeeping cores of both host and guest.

# /home/nfv-virt-rt-kvm/tools/oslat --cpu-list 2,3,4,5,6,7,8,9 --rtprio 1 --runtime 12h

4. Check the guest dmesg log; Call Trace messages are present. The full log is attached.

[ 7475.821066] INFO: task ld:274531 blocked for more than 600 seconds.
[ 7475.822157]       Not tainted 4.18.0-208.rt5.20.el8.x86_64 #1
[ 7475.823094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7475.824392] ld              D    0 274531 274530 0x00084080
[ 7475.825307] Call Trace:
[ 7475.825761]  __schedule+0x342/0x850
[ 7475.826377]  schedule+0x39/0xd0
[ 7475.826923]  schedule_timeout+0x20e/0x410
[ 7475.827610]  ? __schedule+0x34a/0x850
[ 7475.828247]  ? ___preempt_schedule+0x16/0x18
[ 7475.828953]  wait_for_completion+0x85/0xe0
[ 7475.829653]  flush_work+0x11a/0x1c0
[ 7475.830313]  ? flush_workqueue_prep_pwqs+0x130/0x130
[ 7475.831148]  drain_all_pages+0x140/0x190
[ 7475.831803]  __alloc_pages_slowpath+0x3f8/0xe20
[ 7475.832571]  ? mem_cgroup_commit_charge+0xcb/0x510
[ 7475.833371]  __alloc_pages_nodemask+0x1ca/0x2b0
[ 7475.834134]  pagecache_get_page+0xb5/0x2d0
[ 7475.834814]  ? account_page_dirtied+0x11a/0x220
[ 7475.835579]  grab_cache_page_write_begin+0x1f/0x40
[ 7475.836379]  iomap_write_begin.constprop.44+0x1c1/0x370
[ 7475.837241]  ? iomap_write_end+0x91/0x290
[ 7475.837911]  iomap_write_actor+0x92/0x170
[ 7475.838603]  ? iomap_write_end+0x290/0x290
[ 7475.839310]  iomap_apply+0xba/0x130
[ 7475.839905]  ? iomap_write_end+0x290/0x290
[ 7475.840610]  iomap_file_buffered_write+0x62/0x90
[ 7475.841385]  ? iomap_write_end+0x290/0x290
[ 7475.842132]  xfs_file_buffered_aio_write+0xd6/0x390 [xfs]
[ 7475.843057]  ? xfs_file_buffered_aio_read+0x5b/0x120 [xfs]
[ 7475.843941]  new_sync_write+0x124/0x170
[ 7475.844610]  vfs_write+0xa5/0x1a0
[ 7475.845201]  ksys_write+0x52/0xc0
[ 7475.845770]  do_syscall_64+0x87/0x1a0
[ 7475.846411]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 7475.847256] RIP: 0033:0x7f74c8a0caa8
[ 7475.847865] Code: Bad RIP value.
[ 7475.848447] RSP: 002b:00007ffeb5854a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7475.849644] RAX: ffffffffffffffda RBX: 00000000016fb000 RCX: 00007f74c8a0caa8
[ 7475.850780] RDX: 00000000016fb000 RSI: 00007f74b0bfb478 RDI: 0000000000000006
[ 7475.851921] RBP: 00007f74b0bfb478 R08: fffffffffffffff8 R09: 00000000006fbd04
[ 7475.853069] R10: 00005593590ba700 R11: 0000000000000246 R12: 0000559358308fa0
[ 7475.854206] R13: 00000000016fb000 R14: 00007f74c8cd8880 R15: 00000000016fb000
[ 8098.413957] INFO: task ld:274531 blocked for more than 600 seconds.
[ 8098.415126]       Not tainted 4.18.0-208.rt5.20.el8.x86_64 #1
[ 8098.416079] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8098.417319] ld              D    0 274531 274530 0x00084080
[ 8098.418236] Call Trace:
[ 8098.418690]  __schedule+0x342/0x850
[ 8098.419312]  schedule+0x39/0xd0
[ 8098.419860]  schedule_timeout+0x20e/0x410
[ 8098.420558]  ? __schedule+0x34a/0x850
[ 8098.421205]  ? ___preempt_schedule+0x16/0x18
[ 8098.421932]  wait_for_completion+0x85/0xe0
[ 8098.422618]  flush_work+0x11a/0x1c0
[ 8098.423235]  ? flush_workqueue_prep_pwqs+0x130/0x130
[ 8098.424067]  drain_all_pages+0x140/0x190
[ 8098.424725]  __alloc_pages_slowpath+0x3f8/0xe20
[ 8098.425492]  ? mem_cgroup_commit_charge+0xcb/0x510
[ 8098.426297]  __alloc_pages_nodemask+0x1ca/0x2b0
[ 8098.427064]  pagecache_get_page+0xb5/0x2d0
[ 8098.427747]  ? account_page_dirtied+0x11a/0x220
[ 8098.428517]  grab_cache_page_write_begin+0x1f/0x40
[ 8098.429324]  iomap_write_begin.constprop.44+0x1c1/0x370
[ 8098.430187]  ? iomap_write_end+0x91/0x290
[ 8098.430857]  iomap_write_actor+0x92/0x170
[ 8098.431551]  ? iomap_write_end+0x290/0x290
[ 8098.432255]  iomap_apply+0xba/0x130
[ 8098.432851]  ? iomap_write_end+0x290/0x290
[ 8098.433555]  iomap_file_buffered_write+0x62/0x90
[ 8098.434334]  ? iomap_write_end+0x290/0x290
[ 8098.435085]  xfs_file_buffered_aio_write+0xd6/0x390 [xfs]
[ 8098.436051]  ? xfs_file_buffered_aio_read+0x5b/0x120 [xfs]
[ 8098.436958]  new_sync_write+0x124/0x170
[ 8098.437610]  vfs_write+0xa5/0x1a0
[ 8098.438206]  ksys_write+0x52/0xc0
[ 8098.438780]  do_syscall_64+0x87/0x1a0
[ 8098.439426]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 8098.440271] RIP: 0033:0x7f74c8a0caa8
[ 8098.440903] Code: Bad RIP value.
[ 8098.441464] RSP: 002b:00007ffeb5854a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 8098.442668] RAX: ffffffffffffffda RBX: 00000000016fb000 RCX: 00007f74c8a0caa8
[ 8098.443808] RDX: 00000000016fb000 RSI: 00007f74b0bfb478 RDI: 0000000000000006
[ 8098.444967] RBP: 00007f74b0bfb478 R08: fffffffffffffff8 R09: 00000000006fbd04
[ 8098.446108] R10: 00005593590ba700 R11: 0000000000000246 R12: 0000559358308fa0
[ 8098.447247] R13: 00000000016fb000 R14: 00007f74c8cd8880 R15: 00000000016fb000


Actual results:
Call Trace messages appear in the RT guest dmesg.

Expected results:
There should be no error messages in the RT guest dmesg.

Additional info:
1. This Call Trace only shows in the RT guest; the RT host works well.

Comment 2 Luiz Capitulino 2020-06-03 19:32:11 UTC
Clark,

By looking at the trace, my initial guess for this issue
would be that drain_all_pages() or flush_work() is scheduling
work on the CPU where oslat is running (which is isolated).

In other words, work that originated on a housekeeping CPU
is being scheduled on an isolated CPU, which will cause a stall,
since oslat won't let the kworker run.

Now, I might be completely wrong, but it does ring a bell to me
that we used to have a patch in RHEL7 RT that would prevent either
drain_all_pages() or flush_work() from scheduling work on isolated CPUs.

I hope I'm not just creating noise, but if this is correct then
this issue should be considered serious since it will affect
DPDK workloads on RHEL8.2 the same way.
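
To make the suspected mechanism concrete, here is a minimal userspace sketch (illustrative only, not from the bug) of what oslat effectively does on each isolated CPU: a SCHED_FIFO busy loop that never yields, so the normal-priority per-CPU kworker on that CPU never runs, and any flush_work() waiting on work queued there blocks until the hung-task watchdog fires.

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>

        /* Sketch, assuming CPU 2 is one of the isolated CPUs: pin a
         * SCHED_FIFO priority-1 busy loop there. Per-CPU kworkers on
         * that CPU are starved for as long as this loop runs. */
        int main(void)
        {
                cpu_set_t set;
                struct sched_param sp = { .sched_priority = 1 };

                CPU_ZERO(&set);
                CPU_SET(2, &set);
                if (sched_setaffinity(0, sizeof(set), &set) != 0 ||
                    sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
                        perror("setup");
                        return 1;
                }
                for (;;)
                        ;       /* never yields: the kworker never gets the CPU */
        }

(Needs root, or CAP_SYS_NICE, to set SCHED_FIFO.)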

Comment 3 Pei Zhang 2020-06-04 08:50:01 UTC
Created attachment 1695007 [details]
Another Call Trace info in guest

"(3)Multiple VMs each with 1 rt vCPU" testing has finished. There are Call Trace in one of the four guests. It looks a bit different with above. I'm not sure if they are same issue, so I also paste it here and full # dmesg is attached. 

Luiz, Clark,

Please let me know if they are different issues; then I can file a new bug to track it. Thanks.


[ 4363.236900] INFO: task cc1:77565 blocked for more than 600 seconds.
[ 4363.238025]       Not tainted 4.18.0-208.rt5.20.el8.x86_64 #1
[ 4363.239458] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4363.240796] cc1             D    0 77565  77560 0x00084080
[ 4363.241784] Call Trace:
[ 4363.242363]  __schedule+0x342/0x850
[ 4363.243059]  schedule+0x39/0xd0
[ 4363.243609]  schedule_timeout+0x20e/0x410
[ 4363.245349]  ? __schedule+0x34a/0x850
[ 4363.247042]  ? ___preempt_schedule+0x16/0x18
[ 4363.247757]  wait_for_completion+0x85/0xe0
[ 4363.249505]  flush_work+0x11a/0x1c0
[ 4363.251160]  ? flush_workqueue_prep_pwqs+0x130/0x130
[ 4363.253032]  drain_all_pages+0x140/0x190
[ 4363.253703]  __alloc_pages_slowpath+0x3f8/0xe20
[ 4363.255508]  __alloc_pages_nodemask+0x1ca/0x2b0
[ 4363.257310]  alloc_pages_vma+0xc5/0x170
[ 4363.260023]  ? __switch_to_asm+0x35/0x70
[ 4363.260700]  __handle_mm_fault+0x5a5/0xa70
[ 4363.262451]  ? __switch_to_asm+0x35/0x70
[ 4363.264165]  ? __switch_to_asm+0x41/0x70
[ 4363.264827]  handle_mm_fault+0xd2/0x1e0
[ 4363.266536]  __do_page_fault+0x28e/0x5d0
[ 4363.268246]  do_page_fault+0x47/0x1b0
[ 4363.269925]  ? async_page_fault+0x8/0x30
[ 4363.270598]  async_page_fault+0x1e/0x30
[ 4363.272298] RIP: 0033:0x56115b27ffb0
[ 4363.274020] Code: Bad RIP value.
[ 4363.274580] RSP: 002b:00007ffe5c70d8d0 EFLAGS: 00010202
[ 4363.276479] RAX: 00007feb25025000 RBX: 00007feb24f01370 RCX: 000000000000000c
[ 4363.278653] RDX: 000056115beb4c40 RSI: 00007feb25213cb0 RDI: 0000000000002227
[ 4363.281835] RBP: 00007feb26692c60 R08: 0000000000000001 R09: 0000000000000002
[ 4363.285054] R10: 00007feb25022400 R11: 0000000000000006 R12: 0000000000000000
[ 4363.287225] R13: 00007feb24f013b8 R14: 00007ffe5c70d8f0 R15: 00007feb25025008
[ 4985.829739] INFO: task cc1:77565 blocked for more than 600 seconds.
[ 4985.830822]       Not tainted 4.18.0-208.rt5.20.el8.x86_64 #1
[ 4985.832039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4985.833382] cc1             D    0 77565  77560 0x00084080
[ 4985.834375] Call Trace:
[ 4985.834930]  __schedule+0x342/0x850
[ 4985.835534]  schedule+0x39/0xd0
[ 4985.836225]  schedule_timeout+0x20e/0x410
[ 4985.838968]  ? __schedule+0x34a/0x850
[ 4985.839595]  ? ___preempt_schedule+0x16/0x18
[ 4985.842383]  wait_for_completion+0x85/0xe0
[ 4985.845132]  flush_work+0x11a/0x1c0
[ 4985.847752]  ? flush_workqueue_prep_pwqs+0x130/0x130
[ 4985.848572]  drain_all_pages+0x140/0x190
[ 4985.852288]  __alloc_pages_slowpath+0x3f8/0xe20
[ 4985.855085]  __alloc_pages_nodemask+0x1ca/0x2b0
[ 4985.857889]  alloc_pages_vma+0xc5/0x170
[ 4985.858541]  ? __switch_to_asm+0x35/0x70
[ 4985.860546]  __handle_mm_fault+0x5a5/0xa70
[ 4985.862838]  ? __switch_to_asm+0x35/0x70
[ 4985.863498]  ? __switch_to_asm+0x41/0x70
[ 4985.866215]  handle_mm_fault+0xd2/0x1e0
[ 4985.867911]  __do_page_fault+0x28e/0x5d0
[ 4985.868575]  do_page_fault+0x47/0x1b0
[ 4985.870238]  ? async_page_fault+0x8/0x30
[ 4985.871946]  async_page_fault+0x1e/0x30
[ 4985.872601] RIP: 0033:0x56115b27ffb0
[ 4985.874263] Code: Bad RIP value.
[ 4985.875868] RSP: 002b:00007ffe5c70d8d0 EFLAGS: 00010202
[ 4985.877749] RAX: 00007feb25025000 RBX: 00007feb24f01370 RCX: 000000000000000c
[ 4985.879910] RDX: 000056115beb4c40 RSI: 00007feb25213cb0 RDI: 0000000000002227
[ 4985.882072] RBP: 00007feb26692c60 R08: 0000000000000001 R09: 0000000000000002
[ 4985.885242] R10: 00007feb25022400 R11: 0000000000000006 R12: 0000000000000000
[ 4985.887400] R13: 00007feb24f013b8 R14: 00007ffe5c70d8f0 R15: 00007feb25025008

Comment 4 Marcelo Tosatti 2020-06-05 20:45:38 UTC
drain_all_pages does:

        for_each_cpu(cpu, &cpus_with_pcps) {
                struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu);

                drain->zone = zone;
                INIT_WORK(&drain->work, drain_local_pages_wq);
                queue_work_on(cpu, mm_percpu_wq, &drain->work);
        }
        for_each_cpu(cpu, &cpus_with_pcps)
                flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work);


And the worker thread can't run on an isolated CPU while oslat (or DPDK with FIFO priority) is running there.

Comment 5 Nitesh Narayan Lal 2020-06-05 21:30:13 UTC
One way to fix it would be to make drain_all_pages() aware of isolated CPUs, probably by deriving the housekeeping CPUs via housekeeping_cpumask().
If so, then there are other callers of queue_work_on() that may not respect CPU isolation, and maybe we should look into them as well,
e.g. swap.c:
        for_each_online_cpu(cpu) {
                struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
...
                        INIT_WORK(work, lru_add_drain_per_cpu);
                        queue_work_on(cpu, mm_percpu_wq, work);

However, we have to test this change extensively to make sure that we are not breaking anything.
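
A minimal sketch of that idea, as an assumption rather than the actual fix (it assumes HK_FLAG_WQ is the right housekeeping class to check, and needs <linux/sched/isolation.h>), applied to the drain_all_pages() loop quoted in comment 4:

        for_each_cpu(cpu, &cpus_with_pcps) {
                struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu);

                /* sketch: never queue drain work on isolated CPUs */
                if (!housekeeping_cpu(cpu, HK_FLAG_WQ))
                        continue;

                drain->zone = zone;
                INIT_WORK(&drain->work, drain_local_pages_wq);
                queue_work_on(cpu, mm_percpu_wq, &drain->work);
        }
        for_each_cpu(cpu, &cpus_with_pcps) {
                if (!housekeeping_cpu(cpu, HK_FLAG_WQ))
                        continue;
                flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work);
        }

The trade-off is that pages sitting on the per-CPU lists of isolated CPUs would simply never be drained by this path, which is exactly why the extensive testing mentioned above would be needed.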

Comment 6 Marcelo Tosatti 2020-06-05 21:36:45 UTC
Options:

1) Do a first pass on non-isolated CPUs trying to drain their per-CPU queues.
Not guaranteed to work (but perhaps a workable intermediate fix).

2) Reduce the limit of pages in the per-CPU free queue to a value that is
not very significant (I don't think it's a good idea: if there is a free page
somewhere, we should be able to use it before OOM'ing).

3) Drain pages when entering "task isolated" mode.

From task isolated patchset:

+
+       /* Drain the pagevecs to avoid unnecessary IPI flushes later. */
+       lru_add_drain();
+
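
For illustration, a sketch of what option 3 could look like (the entry hook name is hypothetical; only the lru_add_drain() call is from the patchset quoted above):

        /* Sketch: flush this CPU's per-CPU caches once, on entry to
         * "task isolated" mode, so later drains have nothing to flush
         * here and need not queue work on this CPU. */
        static void task_isolation_enter(void)
        {
                /* Drain the pagevecs to avoid unnecessary IPI flushes later. */
                lru_add_drain();

                /* Also drain this CPU's free-page lists for all zones. */
                drain_local_pages(NULL);
        }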

Comment 7 Nitesh Narayan Lal 2020-06-05 22:10:37 UTC
(In reply to Marcelo Tosatti from comment #6)
> Options:
> 
> 1) Do a first pass on non-isolated CPUs trying to drain their per-CPU queues.
> Not guaranteed to work (but perhaps a workable intermediate fix).
> 

Yes, I agree there is a fair chance that this will fail. Since you mentioned this, I don't think I will favor this now.

> 2) Reduce the limit of pages in the per-CPU free queue to a value that is
> not very significant (I don't think it's a good idea: if there is a free page
> somewhere, we should be able to use it before OOM'ing).

+1, we should avoid doing this.

> 
> 3) Drain pages when entering "task isolated" mode.
> 
> From task isolated patchset:
> 
> +
> +       /* Drain the pagevecs to avoid unnecessary IPI flushes later. */
> +       lru_add_drain();
> +

It sounds like a decent idea, but I will also go through the code to understand if there are any downsides to it.

Comment 19 Pei Zhang 2020-07-02 09:39:11 UTC
Verified with 4.18.0-221.rt7.33.el8.x86_64:

No Call Trace at all in the RT guest.

(2) Single VM with 8 rt vCPUs:
 Max Latency:	 9 8 8 8 8 8 8 8 (us)


==Versions==
kernel-rt-4.18.0-221.rt7.33.el8.x86_64
qemu-kvm-5.0.0-0.scrmod+el8.3.0+7241+ccee766e.wrb200701.x86_64
tuned-2.14.0-2.el8.noarch
libvirt-6.4.0-1.scrmod+el8.3.0+7241+ccee766e.x86_64
python3-libvirt-6.3.0-1.module+el8.3.0+6485+695fc960.x86_64
microcode_ctl-20200609-2.el8.x86_64
rt-tests-1.8-3.el8.x86_64

So this bug has been fixed. Moving to 'VERIFIED'.

Comment 25 errata-xmlrpc 2020-11-04 02:25:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel-rt security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4609

