Bug 1664380 - BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
Summary: BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Daniel Bristot de Oliveira
QA Contact: Tiefu
URL:
Whiteboard:
Depends On:
Blocks: 1655694
Reported: 2019-01-08 15:44 UTC by Daniel Bristot de Oliveira
Modified: 2019-08-06 12:36 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1664257
Environment:
Last Closed: 2019-08-06 12:36:27 UTC
Target Upstream Version:


Attachments
padata: Make padata_do_serial() use get_cpu_light() (2.39 KB, patch)
2019-02-05 18:07 UTC, Daniel Bristot de Oliveira


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2019:2043 0 None None None 2019-08-06 12:36:58 UTC

Comment 2 Daniel Bristot de Oliveira 2019-01-08 15:45:51 UTC
Code inspection shows that this problem can also happen in RHEL7-rt.

Comment 8 Daniel Bristot de Oliveira 2019-02-05 18:07:23 UTC
Created attachment 1527249
padata: Make padata_do_serial() use get_cpu_light()

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1664380
BrewBuild: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=20081122

Internal fix.

We hit the following BUG in RHEL8:

  BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
  Preemption disabled at:
  [<ffffffffb61fd824>] padata_do_serial+0x24/0x110
  CPU: 1 PID: 24117 Comm: kworker/1:1 Not tainted 4.18.0-56.rt9.107.el8.x86_64 #1
  Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/14/2017
  Workqueue: pencrypt padata_parallel_worker
  Call Trace:
    dump_stack+0x5c/0x80
    ? padata_do_serial+0x24/0x110
    __schedule_bug.cold.83+0x8e/0x9b
    __schedule+0x5a0/0x680
    schedule+0x39/0xd0
    rt_spin_lock_slowlock_locked+0x10e/0x2b0
    rt_spin_lock_slowlock+0x50/0x80
    padata_do_serial+0x4d/0x110
    padata_parallel_worker+0xaf/0xe0
    process_one_work+0x183/0x3b0
    ? process_one_work+0x3b0/0x3b0
    worker_thread+0x30/0x3d0
    ? process_one_work+0x3b0/0x3b0
    kthread+0x112/0x130
    ? kthread_create_worker_on_cpu+0x70/0x70
    ret_from_fork+0x35/0x40

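and the cause is a spin_lock() taken inside a get_cpu() section. On the RT
kernel, spin_lock() can sleep when contended (see rt_spin_lock_slowlock() in
the trace), but get_cpu() disables preemption, so the task tries to schedule
while atomic.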

Convert get_cpu()/put_cpu() to get_cpu_light()/put_cpu_light() to fix the BUG.
The light variants only disable migration, so this also reduces the
preempt_disable section.

As RHEL7 has the same code, it needs this patch as well. This patch differs
from the RHEL8 one because there is only one get_cpu() usage.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
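
The difference between the two variants can be illustrated with a small userspace model. This is only a sketch and not the attached patch: struct task_model and every model_* and serial_* name below are hypothetical, and it assumes (as in the RT patch set) that get_cpu() raises the preempt count while get_cpu_light() only disables migration.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only; these are NOT the kernel implementations. */
struct task_model {
    int preempt_count;   /* > 0: atomic context, sleeping is a bug */
    int migrate_disable; /* > 0: pinned to this CPU, still preemptible */
    int sched_bugs;      /* "scheduling while atomic" reports seen */
};

/* Models get_cpu(): disables preemption and returns a CPU id. */
static int model_get_cpu(struct task_model *t)
{
    t->preempt_count++;
    return 0; /* hypothetical CPU id */
}

static void model_put_cpu(struct task_model *t)
{
    assert(t->preempt_count > 0);
    t->preempt_count--;
}

/* Models get_cpu_light(): disables migration only (assumption, see above). */
static int model_get_cpu_light(struct task_model *t)
{
    t->migrate_disable++;
    return 0; /* hypothetical CPU id */
}

static void model_put_cpu_light(struct task_model *t)
{
    assert(t->migrate_disable > 0);
    t->migrate_disable--;
}

/* Models the contended slow path of an RT spin_lock(), which must sleep. */
static void model_rt_spin_lock_contended(struct task_model *t)
{
    if (t->preempt_count > 0) {
        t->sched_bugs++;
        fprintf(stderr, "BUG: scheduling while atomic (preempt_count=%d)\n",
                t->preempt_count);
    }
}

/* Shape of the code before the fix: sleeping lock inside get_cpu(). */
static int serial_with_get_cpu(struct task_model *t)
{
    int cpu = model_get_cpu(t);

    (void)cpu; /* the real code uses the CPU id; the model does not need it */
    model_rt_spin_lock_contended(t);
    model_put_cpu(t);
    return t->sched_bugs;
}

/* Shape of the code after the fix: same lock inside get_cpu_light(). */
static int serial_with_get_cpu_light(struct task_model *t)
{
    int cpu = model_get_cpu_light(t);

    (void)cpu;
    model_rt_spin_lock_contended(t);
    model_put_cpu_light(t);
    return t->sched_bugs;
}
```

Calling serial_with_get_cpu() on a zero-initialized struct task_model records one report, while serial_with_get_cpu_light() records none, because the task stays preemptible and may sleep on the lock.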

Comment 11 Tiefu 2019-05-15 09:23:09 UTC
[Tiefu Li on 15 May 2019]
I verified the fix in two different ways.
Here is the first approach:
Step 1. cd /mnt/tests/kernel/distribution/ltp/generic/ltp-full-20190115/testcases/kernel/crypto
Step 2. Run the test: ./pcrypt_aead01
All passed.

The second approach is:
Step 1. cd /mnt/tests/kernel/distribution/ltp/generic/
Step 2. FILTERTESTS="cve-2017-18075" make run

Both approaches were run on 3.10.0-999.rt56.956.el7.x86_64 and on kernel-rt-3.10.0-1010.rt56.968.el7.x86_64.
I did not see any issues, so I am marking the bug as verified.

Comment 14 errata-xmlrpc 2019-08-06 12:36:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2043

