Bug 1779046 - RT host hang with Call Trace "WARNING: CPU: 16 PID: 153 at kernel/smp.c:333 smp_call_function_single_async+0x7a/"
Keywords:
Status: CLOSED DUPLICATE of bug 1830014
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel-rt
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.3
Assignee: Peter Xu
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On: 1830014
Blocks: 1817732 1823810 1825271
 
Reported: 2019-12-03 07:02 UTC by Pei Zhang
Modified: 2020-12-20 06:45 UTC (History)
CC List: 13 users

Fixed In Version: 4.18.0-198
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-19 16:33:09 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments:
Call Trace info (61.55 KB, text/plain), 2019-12-03 07:02 UTC, Pei Zhang

Description Pei Zhang 2019-12-03 07:02:25 UTC
Created attachment 1641562
Call Trace info

Description of problem:
The RT host hangs with a Call Trace during the standard kvm-rt testing scenarios.

Version-Release number of selected component (if applicable):
kernel-rt-4.18.0-159.rt13.16.el8.x86_64
microcode_ctl-20191115-2.el8.x86_64
tuned-2.13.0-0.1.rc1.el8.noarch
rt-tests-1.5-15.el8.x86_64
qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb.x86_64
libvirt-5.9.0-4.module+el8.2.0+4836+a8e32ad7.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Install a RHEL 8 host.

2. Set up the RHEL 8 RT host.

3. Run cyclictest in the scenario with a single VM with a single RT vCPU. This looks good.

4. Next, test the scenario with multiple VMs, each with a single RT vCPU. The Call Trace shows up on the RT host while the 4 VMs are being installed.

Actual results:
The RT host hangs with a Call Trace. The full log is attached.

[11826.666976] WARNING: CPU: 16 PID: 153 at kernel/smp.c:333 smp_call_function_single_async+0x7a/0
[11826.666977] Modules linked in: vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE xt_conntrack iptd
[11826.667011] CPU: 16 PID: 153 Comm: ksoftirqd/16 Kdump: loaded Tainted: G        W        -----1
[11826.667011] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.0.1 04/11/2016
[11826.667014] RIP: 0010:smp_call_function_single_async+0x7a/0x90
[11826.667015] Code: ff 7f 74 05 89 d8 5b 5d c3 65 48 8b 04 25 c0 5e 01 00 8b 50 0c 85 d2 75 eb 40
[11826.667016] RSP: 0018:ffffb670ca93bd58 EFLAGS: 00010002
[11826.667018] RAX: 0000000000000002 RBX: ffff969e5fa1cc00 RCX: 0000000000000000
[11826.667018] RDX: 0000000000000000 RSI: ffff969e5fa1cc00 RDI: 0000000000000001
[11826.667019] RBP: 0000000000000010 R08: 0000000000000035 R09: ffffd670bfc90238
[11826.667019] R10: 000000000000002c R11: 000000000000002c R12: ffff969e51508c10
[11826.667020] R13: ffff969e5fa1cc40 R14: ffff969e51508c20 R15: ffff969e51508c30
[11826.667021] FS:  0000000000000000(0000) GS:ffff969e5fa00000(0000) knlGS:0000000000000000
[11826.667022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11826.667022] CR2: 00005573a6e1b018 CR3: 0000000f7360e001 CR4: 00000000001626e0
[11826.667023] Call Trace:
[11826.667031]  mod_timer+0x382/0x3e0
[11826.667034]  run_timer_softirq+0x1b2/0x910
[11826.667038]  ? blk_stat_free_callback_rcu+0x30/0x30
[11826.667043]  ? finish_task_switch+0x108/0x300
[11826.667047]  __do_softirq+0x10a/0x3a4
[11826.667049]  ? migrate_disable+0x38/0xc0
[11826.667053]  ? smpboot_register_percpu_thread_cpumask+0x130/0x130
[11826.667056]  run_ksoftirqd+0x47/0x60
[11826.667058]  smpboot_thread_fn+0x1d6/0x2c0
[11826.667060]  kthread+0x112/0x130
[11826.667062]  ? kthread_flush_work_fn+0x10/0x10
[11826.667065]  ret_from_fork+0x1f/0x40
[11826.667068] ---[ end trace 0000000000000003 ]---
[11854.005718] restraintd[2567]: *** Current Time: Tue Dec 03 01:02:58 2019 Localwatchdog at: Wed9
[11886.728599] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[11886.728603] rcu: 	Tasks blocked on level-1 rcu_node (CPUs 10-19): P10232
[11886.728607] rcu: 	16-...0: (695975 GPs behind) idle=632/1/0x4000000000000000 softirq=0/0 fq 
[11886.728608] rcu: 	(detected by 0, t=60002 jiffies, g=2782837, q=17262)
[11886.728614] Sending NMI from CPU 0 to CPUs 16:
[11886.729622] NMI backtrace for cpu 16
[11886.729623] CPU: 16 PID: 153 Comm: ksoftirqd/16 Kdump: loaded Tainted: G        W        -----1
[11886.729623] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.0.1 04/11/2016
[11886.729624] RIP: 0010:smp_call_function_single_async+0x88/0x90
[11886.729625] Code: c0 5e 01 00 8b 50 0c 85 d2 75 eb 48 8b 00 a9 00 00 04 00 74 e1 e8 8d e9 ea f4
[11886.729625] RSP: 0018:ffffb670ca93bd58 EFLAGS: 00000002
[11886.729626] RAX: 0000000000000001 RBX: ffff969e5fa1cc00 RCX: 0000000000000000
[11886.729626] RDX: 0000000000000000 RSI: ffff969e5fa1cc00 RDI: 0000000000000001
[11886.729627] RBP: 0000000000000010 R08: 0000000000000035 R09: ffffd670bfc90238
[11886.729627] R10: 000000000000002c R11: 000000000000002c R12: ffff969e51508c10
[11886.729628] R13: ffff969e5fa1cc40 R14: ffff969e51508c20 R15: ffff969e51508c30
[11886.729628] FS:  0000000000000000(0000) GS:ffff969e5fa00000(0000) knlGS:0000000000000000
[11886.729629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11886.729629] CR2: 00005573a6e1b018 CR3: 0000000f7360e001 CR4: 00000000001626e0
[11886.729630] Call Trace:
[11886.729630]  mod_timer+0x382/0x3e0
[11886.729630]  run_timer_softirq+0x1b2/0x910
[11886.729631]  ? blk_stat_free_callback_rcu+0x30/0x30
[11886.729631]  ? finish_task_switch+0x108/0x300
[11886.729632]  __do_softirq+0x10a/0x3a4
[11886.729632]  ? migrate_disable+0x38/0xc0
[11886.729632]  ? smpboot_register_percpu_thread_cpumask+0x130/0x130
[11886.729633]  run_ksoftirqd+0x47/0x60
[11886.729633]  smpboot_thread_fn+0x1d6/0x2c0
[11886.729633]  kthread+0x112/0x130
[11886.729634]  ? kthread_flush_work_fn+0x10/0x10
[11886.729634]  ret_from_fork+0x1f/0x40
[11886.729638] CPU 0/KVM       R  running task        0 10232      1 0x000801a0
[11886.729640] Call Trace:
[11886.729647]  ? __schedule+0x316/0x7c0
[11886.729651]  ? ___preempt_schedule+0x16/0x18
[11886.729653]  preempt_schedule_common+0x23/0x80
[11886.729655]  ___preempt_schedule+0x16/0x18
[11886.729658]  _raw_spin_unlock_irqrestore+0x52/0x60
[11886.729683]  kvm_vcpu_kick+0x88/0xd0 [kvm]
[11886.729701]  __apic_accept_irq+0x7d/0x340 [kvm]
[11886.729718]  kvm_irq_delivery_to_apic_fast+0x221/0x410 [kvm]
[11886.729735]  kvm_irq_delivery_to_apic+0x63/0x2c0 [kvm]
[11886.729750]  kvm_emulate_hypercall+0x139/0x500 [kvm]
[11886.729754]  ? sched_clock+0x5/0x10
[11886.729756]  ? get_vtime_delta+0x13/0xc0
[11886.729770]  vcpu_enter_guest+0x3cd/0x19b0 [kvm]
[11886.729781]  ? kvm_vcpu_kick+0x57/0xd0 [kvm]
[11886.729796]  ? __apic_accept_irq+0x1ad/0x340 [kvm]
[11886.729811]  kvm_arch_vcpu_ioctl_run+0x107/0x580 [kvm]
[11886.729822]  kvm_vcpu_ioctl+0x232/0x630 [kvm]
[11886.729827]  do_vfs_ioctl+0xa4/0x630
[11886.729830]  ksys_ioctl+0x60/0x90
[11886.729832]  __x64_sys_ioctl+0x16/0x20
[11886.729834]  do_syscall_64+0x87/0x1a0
[11886.729837]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[11886.729839] RIP: 0033:0x7f12ffdb387b
[11886.729845] Code: Bad RIP value.
[11886.729846] RSP: 002b:00007f12f17b4618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[11886.729847] RAX: ffffffffffffffda RBX: 00005645326b72c0 RCX: 00007f12ffdb387b
[11886.729848] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019
[11886.729849] RBP: 0000000000000001 R08: 0000564530571c30 R09: 00000000000000ff
[11886.729849] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
[11886.729850] R13: 000056453054e1e0 R14: 0000000000000000 R15: 00007f1305140000
[11914.052282] restraintd[2567]: *** Current Time: Tue Dec 03 01:03:58 2019 Localwatchdog at: Wed9
[11974.084129] restraintd[2567]: *** Current Time: Tue Dec 03 01:04:58 2019 Localwatchdog at: Wed9
[12034.048580] restraintd[2567]: *** Current Time: Tue Dec 03 01:05:58 2019 Localwatchdog at: Wed9
[12066.733756] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[12066.733759] rcu: 	Tasks blocked on level-1 rcu_node (CPUs 10-19): P10232
[12066.733764] rcu: 	16-...0: (695975 GPs behind) idle=632/1/0x4000000000000000 softirq=0/0 fq 
[12066.733764] rcu: 	(detected by 0, t=240007 jiffies, g=2782837, q=27500)
[12066.733769] Sending NMI from CPU 0 to CPUs 16:
[12066.734776] NMI backtrace for cpu 16
[12066.734777] CPU: 16 PID: 153 Comm: ksoftirqd/16 Kdump: loaded Tainted: G        W        -----1
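
For readers triaging a log like the one above, here is a minimal Python sketch that pulls out the WARNING header and the RCU stall summary lines and relates their timestamps. It is not part of the original report. The names `summarize`, `estimate_hz`, and `SAMPLE` are hypothetical, the sample holds three lines copied from the log above, and the sketch assumes that `t=` counts jiffies since the stalled grace period began.

```python
import re

# Kernel WARNING header: timestamp, CPU, PID, source file:line, function name.
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+"
)

# RCU stall summary line: timestamp, detecting CPU, stall length in jiffies.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] rcu:\s+\(detected by (?P<by>\d+), "
    r"t=(?P<jiffies>\d+) jiffies"
)


def summarize(log_text):
    """Return (warnings, stalls) found in a dmesg-style log."""
    warnings = [m.groupdict() for m in WARNING_RE.finditer(log_text)]
    stalls = [
        (float(m["ts"]), int(m["jiffies"])) for m in STALL_RE.finditer(log_text)
    ]
    return warnings, stalls


def estimate_hz(stalls):
    """Estimate ticks per second from the first and last stall reports.

    Assumes both reports belong to the same stalled grace period.
    """
    (t0, j0), (t1, j1) = stalls[0], stalls[-1]
    return (j1 - j0) / (t1 - t0)


# Three lines copied from the log in this report.
SAMPLE = """\
[11826.666976] WARNING: CPU: 16 PID: 153 at kernel/smp.c:333 smp_call_function_single_async+0x7a/0
[11886.728608] rcu: \t(detected by 0, t=60002 jiffies, g=2782837, q=17262)
[12066.733764] rcu: \t(detected by 0, t=240007 jiffies, g=2782837, q=27500)
"""

if __name__ == "__main__":
    warnings, stalls = summarize(SAMPLE)
    hz = round(estimate_hz(stalls))
    first_ts, first_jiffies = stalls[0]
    # Back-compute when the stalled grace period started.
    stall_start = first_ts - first_jiffies / hz
    for w in warnings:
        print(f"WARNING on CPU {w['cpu']} in {w['func']} ({w['file']}:{w['line']})")
        print(f"estimated stall start is {stall_start - float(w['ts']):.3f}s after it")
```

On these three lines the estimate comes out close to 1000 ticks per second, and the back-computed stall start falls within a fraction of a second after the WARNING on CPU 16. That is consistent with the stall beginning at the warning, but it is an inference from timestamps, not a root-cause analysis.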


Expected results:
The RT host should keep working well, and there should be no Call Trace.

Additional info:

Comment 13 Pei Zhang 2020-02-03 08:12:05 UTC
Update:

In my testing, I hit this issue again with 4.18.0-174.rt13.31.el8.x86_64.

Comment 24 Luiz Capitulino 2020-04-29 20:20:04 UTC
Peter,

I think the downstream kernel-rt tree is already open for business :)

Adding Beth for devel_ack.

Btw, my understanding is that we also want this for 8.2.z. Do you agree, Peter?

Comment 41 Pei Zhang 2020-06-16 09:07:36 UTC
We didn't hit this hang issue in the 10 runs below.

kernel-rt-4.18.0-199.rt5.11.el8.x86_64 (cyclictest, 24h)
kernel-rt-4.18.0-201.rt5.13.el8.x86_64 (cyclictest, 24h)
kernel-rt-4.18.0-203.rt5.15.el8.x86_64 (cyclictest, 24h)
kernel-rt-4.18.0-208.rt5.20.el8.x86_64 (cyclictest, 12h)
kernel-rt-4.18.0-208.rt5.20.el8.x86_64 (oslat, 12h)
kernel-rt-4.18.0-210.rt5.22.el8.x86_64 (cyclictest, 12h)
kernel-rt-4.18.0-210.rt5.22.el8.x86_64 (oslat, 12h)
kernel-rt-4.18.0-211.rt5.23.el8.x86_64 (cyclictest, 12h)
kernel-rt-4.18.0-211.rt5.23.el8.x86_64 (oslat, 12h)
kernel-rt-4.18.0-214.rt7.26.el8.x86_64 (oslat, 12h)

So this bug has been fixed. Moving to 'VERIFIED'.
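
The run list above can be tallied mechanically. The following Python sketch is not part of the original comment. The names `RUNS`, `RUN_RE`, and `tally` are hypothetical, and treating the "Fixed In Version" field (4.18.0-198) as a plain build number to compare against is an assumption. It checks that every verified build is at or after the fixed-in build and sums the test hours per tool.

```python
import re
from collections import defaultdict

# Build number taken from the bug's "Fixed In Version: 4.18.0-198" field.
FIXED_IN_BUILD = 198

# The ten verification runs, copied from the comment above.
RUNS = """\
kernel-rt-4.18.0-199.rt5.11.el8.x86_64 (cyclictest, 24h)
kernel-rt-4.18.0-201.rt5.13.el8.x86_64 (cyclictest, 24h)
kernel-rt-4.18.0-203.rt5.15.el8.x86_64 (cyclictest, 24h)
kernel-rt-4.18.0-208.rt5.20.el8.x86_64 (cyclictest, 12h)
kernel-rt-4.18.0-208.rt5.20.el8.x86_64 (oslat, 12h)
kernel-rt-4.18.0-210.rt5.22.el8.x86_64 (cyclictest, 12h)
kernel-rt-4.18.0-210.rt5.22.el8.x86_64 (oslat, 12h)
kernel-rt-4.18.0-211.rt5.23.el8.x86_64 (cyclictest, 12h)
kernel-rt-4.18.0-211.rt5.23.el8.x86_64 (oslat, 12h)
kernel-rt-4.18.0-214.rt7.26.el8.x86_64 (oslat, 12h)
"""

# Captures the build number, the test tool, and the run length in hours.
RUN_RE = re.compile(
    r"kernel-rt-4\.18\.0-(?P<build>\d+)\.\S+ "
    r"\((?P<tool>\w+), (?P<hours>\d+)h\s*\)"
)


def tally(text):
    """Return (runs, hours_by_tool) parsed from the run list."""
    runs = [
        (int(m["build"]), m["tool"], int(m["hours"]))
        for m in RUN_RE.finditer(text)
    ]
    hours_by_tool = defaultdict(int)
    for _build, tool, hours in runs:
        hours_by_tool[tool] += hours
    return runs, dict(hours_by_tool)


if __name__ == "__main__":
    runs, hours_by_tool = tally(RUNS)
    all_fixed = all(build >= FIXED_IN_BUILD for build, _, _ in runs)
    print(f"runs: {len(runs)}, all at or after build {FIXED_IN_BUILD}: {all_fixed}")
    for tool, hours in sorted(hours_by_tool.items()):
        print(f"{tool}: {hours}h")
```

A tally like this only confirms coverage of the listed builds. It says nothing about whether the test duration was long enough to rule out the hang, which remains the tester's judgment.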

