RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1664257 - BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
Summary: BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel-rt
Version: 8.0
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: 8.0
Assignee: Daniel Bristot de Oliveira
QA Contact: Qiao Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1649545
TreeView+ depends on / blocked
 
Reported: 2019-01-08 08:52 UTC by Qiao Zhao
Modified: 2019-06-14 00:54 UTC (History)
6 users (show)

Fixed In Version: kernel-rt-4.18.0-58.rt9.110.el8
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1664380 (view as bug list)
Environment:
Last Closed: 2019-06-14 00:54:35 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
[PATCH RT] padata: Make padata_do_serial() use get_cpu_light() (2.44 KB, patch)
2019-01-08 22:55 UTC, Daniel Bristot de Oliveira
no flags Details | Diff
[PATCH RT] padata: Make padata_do_serial() use get_cpu_light() (2.92 KB, patch)
2019-01-09 16:14 UTC, Daniel Bristot de Oliveira
no flags Details | Diff

Description Qiao Zhao 2019-01-08 08:52:31 UTC
Description of problem:
When run LTP case cve-2017-18075 (pcrypt_aead01), the system occur panic 

[ 1343.642670] BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
[ 1343.642671] Modules linked in: authenc pcrypt crypto_user vfat fat intel_rapl nfit libnvdimm intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ses enclosure intel_uncore intel_rapl_perf mgag200 ipmi_ssif i2c_algo_bit sg pcspkr ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ioatdma hpilo tg3 hpwdt mei_me mei lpc_ich wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod uas usb_storage crc32c_intel smartpqi scsi_transport_sas dm_mod
[ 1343.642716] Preemption disabled at:
[ 1343.642728] [<ffffffffb61fd824>] padata_do_serial+0x24/0x110
[ 1343.642733] CPU: 1 PID: 24117 Comm: kworker/1:1 Not tainted 4.18.0-56.rt9.107.el8.x86_64 #1
[ 1343.642735] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/14/2017
[ 1343.642740] Workqueue: pencrypt padata_parallel_worker
[ 1343.642742] Call Trace:
[ 1343.642755]  dump_stack+0x5c/0x80
[ 1343.642760]  ? padata_do_serial+0x24/0x110
[ 1343.642769]  __schedule_bug.cold.83+0x8e/0x9b
[ 1343.642776]  __schedule+0x5a0/0x680
[ 1343.642781]  schedule+0x39/0xd0
[ 1343.642787]  rt_spin_lock_slowlock_locked+0x10e/0x2b0
[ 1343.642793]  rt_spin_lock_slowlock+0x50/0x80
[ 1343.642798]  padata_do_serial+0x4d/0x110
[ 1343.642804]  padata_parallel_worker+0xaf/0xe0
[ 1343.642809]  process_one_work+0x183/0x3b0
[ 1343.642813]  ? process_one_work+0x3b0/0x3b0
[ 1343.642816]  worker_thread+0x30/0x3d0
[ 1343.642819]  ? process_one_work+0x3b0/0x3b0
[ 1343.642824]  kthread+0x112/0x130
[ 1343.642829]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1343.642832]  ret_from_fork+0x35/0x40
[ 1343.642848] WARNING: CPU: 1 PID: 24117 at kernel/sched/core.c:7307 migrate_enable+0x23d/0x360
[ 1343.642849] Modules linked in: authenc pcrypt crypto_user vfat fat intel_rapl nfit libnvdimm intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ses enclosure intel_uncore intel_rapl_perf mgag200 ipmi_ssif i2c_algo_bit sg pcspkr ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ioatdma hpilo tg3 hpwdt mei_me mei lpc_ich wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod uas usb_storage crc32c_intel smartpqi scsi_transport_sas dm_mod
[ 1343.642885] CPU: 1 PID: 24117 Comm: kworker/1:1 Tainted: G        W        --------- ---  4.18.0-56.rt9.107.el8.x86_64 #1
[ 1343.642886] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/14/2017
[ 1343.642891] Workqueue: pencrypt padata_parallel_worker
[ 1343.642895] RIP: 0010:migrate_enable+0x23d/0x360
[ 1343.642898] Code: f2 ff e9 59 fe ff ff c7 44 24 08 04 00 00 00 e9 42 ff ff ff 48 8b 00 a9 00 00 04 00 74 9d e8 df b8 f2 ff eb 96 e8 b3 1a 0c 00 <0f> 0b e9 0e fe ff ff f6 43 2e 20 48 c7 c0 20 96 50 b7 48 89 ea bf
[ 1343.642900] RSP: 0018:ffffbf6e6471fdc0 EFLAGS: 00010246
[ 1343.642903] RAX: 0000000000000000 RBX: ffff9729d0933e00 RCX: ffff9722c7c0a008
[ 1343.642904] RDX: 0000000000000000 RSI: 0000000000100000 RDI: ffff9722c7c0a000
[ 1343.642906] RBP: ffff9729cafae450 R08: ffffdf6640066598 R09: 0000000000000001
[ 1343.642908] R10: 0000000000000000 R11: 000000000000000b R12: ffff9729dcfa0000
[ 1343.642909] R13: 0000000000000001 R14: ffffdf66400665a8 R15: ffffdf6640066598
[ 1343.642912] FS:  0000000000000000(0000) GS:ffff9729dfc40000(0000) knlGS:0000000000000000
[ 1343.642914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1343.642916] CR2: 00007ffeebd7bf80 CR3: 0000000f5f40e004 CR4: 00000000007606e0
[ 1343.642918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1343.642920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1343.642921] PKRU: 55555554
[ 1343.642922] Call Trace:
[ 1343.642930]  rt_spin_unlock+0x23/0x40
[ 1343.642934]  padata_do_serial+0x8e/0x110
[ 1343.642939]  padata_parallel_worker+0xaf/0xe0
[ 1343.642943]  process_one_work+0x183/0x3b0
[ 1343.642946]  ? process_one_work+0x3b0/0x3b0
[ 1343.642949]  worker_thread+0x30/0x3d0
[ 1343.642952]  ? process_one_work+0x3b0/0x3b0
[ 1343.642956]  kthread+0x112/0x130
[ 1343.642960]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1343.642963]  ret_from_fork+0x35/0x40
[ 1343.642968] ---[ end trace 0000000000000002 ]---
[ 1343.642970] ------------[ cut here ]------------
[ 1343.642976] WARNING: CPU: 1 PID: 24117 at kernel/sched/core.c:7273 migrate_disable+0x1ad/0x1f0
[ 1343.642977] Modules linked in: authenc pcrypt crypto_user vfat fat intel_rapl nfit libnvdimm intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ses enclosure intel_uncore intel_rapl_perf mgag200 ipmi_ssif i2c_algo_bit sg pcspkr ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ioatdma hpilo tg3 hpwdt mei_me mei lpc_ich wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod uas usb_storage crc32c_intel smartpqi scsi_transport_sas dm_mod
[ 1343.643011] CPU: 1 PID: 24117 Comm: kworker/1:1 Tainted: G        W        --------- ---  4.18.0-56.rt9.107.el8.x86_64 #1
[ 1343.643012] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/14/2017
[ 1343.643016] Workqueue: pencrypt padata_parallel_worker
[ 1343.643020] RIP: 0010:migrate_disable+0x1ad/0x1f0
[ 1343.643023] Code: 0f 8e 6a ff ff ff 8b 4b 44 48 c7 c2 40 53 02 00 48 03 14 cd 00 d7 12 b7 48 83 aa a0 08 00 00 01 e9 4b ff ff ff e8 13 28 0c 00 <0f> 0b e9 97 fe ff ff 83 bb a0 03 00 00 01 0f 8e 32 ff ff ff 8b 4b
[ 1343.643024] RSP: 0018:ffffbf6e6471fcc8 EFLAGS: 00010246
[ 1343.643027] RAX: 0000000000100000 RBX: ffff9729d0933e00 RCX: ffff9722c7c0a008
[ 1343.643028] RDX: 0000000000100000 RSI: 0000000000100000 RDI: ffff9722c7c0a000
[ 1343.643030] RBP: 0000000000000000 R08: 00040d56874a0e70 R09: 0000000000000024
[ 1343.643032] R10: 00000000000005b6 R11: 00000000000dfe2c R12: 0000000000000024
[ 1343.643034] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000006
[ 1343.643036] FS:  0000000000000000(0000) GS:ffff9729dfc40000(0000) knlGS:0000000000000000
[ 1343.643038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1343.643040] CR2: 00007ffeebd7bf80 CR3: 0000000f5f40e004 CR4: 00000000007606e0
[ 1343.643042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1343.643044] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1343.643045] PKRU: 55555554
[ 1343.643046] Call Trace:
[ 1343.643054]  vprintk_emit+0x3b9/0x460
[ 1343.643060]  printk+0x58/0x6f
[ 1343.643069]  __warn_printk+0x46/0x87
[ 1343.643073]  ? migrate_enable+0x1d8/0x360
[ 1343.643079]  preempt_count_sub+0x5a/0x90
[ 1343.643084]  padata_do_serial+0x98/0x110
[ 1343.643088]  padata_parallel_worker+0xaf/0xe0
[ 1343.643092]  process_one_work+0x183/0x3b0
[ 1343.643095]  ? process_one_work+0x3b0/0x3b0
[ 1343.643098]  worker_thread+0x30/0x3d0
[ 1343.643101]  ? process_one_work+0x3b0/0x3b0
[ 1343.643105]  kthread+0x112/0x130
[ 1343.643110]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1343.643113]  ret_from_fork+0x35/0x40
[ 1343.643117] ---[ end trace 0000000000000003 ]---
[ 1344.273085] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[ 1344.273099] WARNING: CPU: 1 PID: 24117 at kernel/sched/core.c:3324 preempt_count_sub+0x5a/0x90
[ 1344.273099] Modules linked in: authenc pcrypt crypto_user vfat fat intel_rapl nfit libnvdimm intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ses enclosure intel_uncore intel_rapl_perf mgag200 ipmi_ssif i2c_algo_bit sg pcspkr ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ioatdma hpilo tg3 hpwdt mei_me mei lpc_ich wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod uas usb_storage crc32c_intel smartpqi scsi_transport_sas dm_mod
[ 1344.273134] CPU: 1 PID: 24117 Comm: kworker/1:1 Tainted: G        W        --------- ---  4.18.0-56.rt9.107.el8.x86_64 #1
[ 1344.273136] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/14/2017
[ 1344.273140] Workqueue: pencrypt padata_parallel_worker
[ 1344.273146] RIP: 0010:preempt_count_sub+0x5a/0x90
[ 1344.273148] Code: 36 f4 49 c3 e8 b7 30 32 00 85 c0 74 f6 8b 15 1d 31 dd 01 85 d2 75 ec 48 c7 c6 99 23 08 b7 48 c7 c7 fb b1 06 b7 e8 80 42 fd ff <0f> 0b c3 84 d2 75 c9 e8 8a 30 32 00 85 c0 74 c9 8b 05 f0 30 dd 01
[ 1344.273150] RSP: 0018:ffffbf6e6471fe10 EFLAGS: 00010282
[ 1344.273152] RAX: 0000000000000000 RBX: ffffdf6640066558 RCX: 0000000000000001
[ 1344.273154] RDX: 0000000000000000 RSI: ffffffffb70b2c77 RDI: 00000000ffffffff
[ 1344.273155] RBP: ffff9729cafae450 R08: 0000000000000000 R09: ffff9729d09345b0
[ 1344.273157] R10: ffff9729d09341b0 R11: ffffffffb7c0998d R12: ffff9729dcfa0000
[ 1344.273159] R13: 0000000000000001 R14: ffffdf66400665a8 R15: ffffdf6640066598
[ 1344.273161] FS:  0000000000000000(0000) GS:ffff9729dfc40000(0000) knlGS:0000000000000000
[ 1344.273163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1344.273165] CR2: 00007ffeebd7bf80 CR3: 0000000f5f40e004 CR4: 00000000007606e0
[ 1344.273167] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1344.273169] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1344.273170] PKRU: 55555554
[ 1344.273170] Call Trace:
[ 1344.273174]  padata_do_serial+0x98/0x110
[ 1344.273180]  padata_parallel_worker+0xaf/0xe0
[ 1344.273184]  process_one_work+0x183/0x3b0
[ 1344.273187]  ? process_one_work+0x3b0/0x3b0
[ 1344.273190]  worker_thread+0x30/0x3d0
[ 1344.273193]  ? process_one_work+0x3b0/0x3b0
[ 1344.273197]  kthread+0x112/0x130
[ 1344.273201]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1344.273204]  ret_from_fork+0x35/0x40
[ 1344.273209] ---[ end trace 0000000000000004 ]---

This problem doesn't occur on old package RHEL-8.0-20180904.0 (kernel-rt-4.18.0-27.rt7.67.el8).
AND this problem doesn't occur on RHEL kernel 4.18.0-56.el8.x86_64.

Version-Release number of selected component (if applicable):
kernel-rt-4.18.0-56.rt9.107.el8.x86_64
glibc-2.28-42.el8
latest-ltp - https://github.com/linux-test-project/ltp

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Clark Williams 2019-01-08 15:50:49 UTC
(In reply to Red Hat Bugzilla Rules Engine from comment #1)
> Blocker requests need a proper justification to be processed and approved.
> 
> Please answer the following questions (after reviewing with SST leads from
> Dev/QE/PM) and provide whatever context might be helpful, including
> requested target milestone:
> * What is the risk with putting this into the release vs. doing it in the
> next release? (For example, is this a regression?)

This is a regression that *only* affects the RHEL8-RT product. 

> * Is this related to an MVP or Layered Product request, if so what?

No, this is strictly a RHEL8-RT bug, and should be addressed in an RT build.

> * Do you have signoff from your Dev/QE/PM leads on the status of this as an
> exception for the team?

Yes, this bug is due to a RHEL change calling get_cpu/put_cpu causing the RT
spinlock to be acquired in atomic context (which is a bad thing). Changing to
get_cpu_light/put_cpu_light fixes this for RT and remains the same for RHEL.

Comment 6 Daniel Bristot de Oliveira 2019-01-08 22:55:39 UTC
Created attachment 1519296 [details]
[PATCH RT] padata: Make padata_do_serial() use get_cpu_light()

Patch to be submitted upstream after testing

[PATCH RT] padata: Make padata_do_serial() use get_cpu_light()

We hit the following BUG:

  BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
  Preemption disabled at:
  [<ffffffffb61fd824>] padata_do_serial+0x24/0x110
  CPU: 1 PID: 24117 Comm: kworker/1:1 Not tainted 4.18.0-56.rt9.107.el8.x86_64 #1
  Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/14/2017
  Workqueue: pencrypt padata_parallel_worker
  Call Trace:
    dump_stack+0x5c/0x80
    ? padata_do_serial+0x24/0x110
    __schedule_bug.cold.83+0x8e/0x9b
    __schedule+0x5a0/0x680
    schedule+0x39/0xd0
    rt_spin_lock_slowlock_locked+0x10e/0x2b0
    rt_spin_lock_slowlock+0x50/0x80
    padata_do_serial+0x4d/0x110
    padata_parallel_worker+0xaf/0xe0
    process_one_work+0x183/0x3b0
    ? process_one_work+0x3b0/0x3b0
    worker_thread+0x30/0x3d0
    ? process_one_work+0x3b0/0x3b0
    kthread+0x112/0x130
    ? kthread_create_worker_on_cpu+0x70/0x70
    ret_from_fork+0x35/0x40

and the cause is a spin_lock() taken inside a get_cpu() section.

Convert the get/put_cpu to get/put_cpu_light to fix the BUG while reducing the
preempt_disable section.

Signed-off-by: Daniel Bristot de Oliveira <bristot>
Cc: Sebastian Andrzej Siewior <bigeasy>
Cc: Thomas Gleixner <tglx>
Cc: Clark Williams <williams>
Cc: linux-rt-users.org
Cc: linux-crypto.org
Cc: linux-kernel.org

Comment 7 Daniel Bristot de Oliveira 2019-01-09 16:14:48 UTC
Created attachment 1519527 [details]
[PATCH RT] padata: Make padata_do_serial() use get_cpu_light()

We hit the following BUG:

  BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
  Preemption disabled at:
  [<ffffffffb61fd824>] padata_do_serial+0x24/0x110
  CPU: 1 PID: 24117 Comm: kworker/1:1 Not tainted 4.18.0-56.rt9.107.el8.x86_64 #1
  Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/14/2017
  Workqueue: pencrypt padata_parallel_worker
  Call Trace:
    dump_stack+0x5c/0x80
    ? padata_do_serial+0x24/0x110
    __schedule_bug.cold.83+0x8e/0x9b
    __schedule+0x5a0/0x680
    schedule+0x39/0xd0
    rt_spin_lock_slowlock_locked+0x10e/0x2b0
    rt_spin_lock_slowlock+0x50/0x80
    padata_do_serial+0x4d/0x110
    padata_parallel_worker+0xaf/0xe0
    process_one_work+0x183/0x3b0
    ? process_one_work+0x3b0/0x3b0
    worker_thread+0x30/0x3d0
    ? process_one_work+0x3b0/0x3b0
    kthread+0x112/0x130
    ? kthread_create_worker_on_cpu+0x70/0x70
    ret_from_fork+0x35/0x40

and the cause is a spin_lock() taken inside a get_cpu() section.

Convert the get/put_cpu to get/put_cpu_light to fix the BUG while reducing the
preempt_disable section.

Signed-off-by: Daniel Bristot de Oliveira <bristot>
Reviewed-by: Clark Williams <williams>
Cc: Sebastian Andrzej Siewior <bigeasy>
Cc: Thomas Gleixner <tglx>
Cc: Clark Williams <williams>
Cc: linux-rt-users.org
Cc: linux-crypto.org
Cc: linux-kernel.org

Comment 8 Juri Lelli 2019-01-10 06:23:16 UTC
I managed to build kernel-rt-4.18.0-58.rt9.109.el8 + Daniel's fix: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=19731657

Qiao Zhao, could you please try to see if problem is still reproducible with the kernel above?

Comment 9 Qiao Zhao 2019-01-10 06:43:38 UTC
(In reply to Juri Lelli from comment #8)
> I managed to build kernel-rt-4.18.0-58.rt9.109.el8 + Daniel's fix:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=19731657
> 
> Qiao Zhao, could you please try to see if problem is still reproducible with
> the kernel above?

Hi Juri,

Cool, the same test case passed by fix kernel, no call trace and panic.
# uname -r
4.18.0-58.rt9.109.el8.bz1664257.x86_64
# ./pcrypt_aead01 
tst_test.c:1085: INFO: Timeout per run is 0h 05m 00s
pcrypt_aead01.c:71: PASS: Nothing bad appears to have happened

Summary:
passed   1
failed   0
skipped  0
warnings 0

--
Thanks,
Qiao

Comment 10 Clark Williams 2019-01-10 20:56:30 UTC
Fixed in build kernel-rt-4.18.0-58.rt9.110.el8


Note You need to log in before you can comment on or make changes to this bug.