Bug 1664380

Summary: BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
Product: Red Hat Enterprise Linux 7 Reporter: Daniel Bristot de Oliveira <daolivei>
Component: kernel-rt Assignee: Daniel Bristot de Oliveira <daolivei>
kernel-rt sub component: Scheduler QA Contact: Tiefu <tieli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: bhu, core-kernel-mgr, daolivei, lgoncalv, pmatouse, qzhao
Version: 7.7   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1664257 Environment:
Last Closed: 2019-08-06 12:36:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1655694    
Attachments:
Description Flags
padata: Make padata_do_serial() use get_cpu_light() none

Comment 2 Daniel Bristot de Oliveira 2019-01-08 15:45:51 UTC
Code inspection shows that this problem can also occur in RHEL7-rt.

Comment 8 Daniel Bristot de Oliveira 2019-02-05 18:07:23 UTC
Created attachment 1527249
padata: Make padata_do_serial() use get_cpu_light()

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1664380
BrewBuild: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=20081122

Internal fix.

We hit the following BUG in RHEL8:

  BUG: scheduling while atomic: kworker/1:1/24117/0x00000002
  Preemption disabled at:
  [<ffffffffb61fd824>] padata_do_serial+0x24/0x110
  CPU: 1 PID: 24117 Comm: kworker/1:1 Not tainted 4.18.0-56.rt9.107.el8.x86_64 #1
  Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/14/2017
  Workqueue: pencrypt padata_parallel_worker
  Call Trace:
    dump_stack+0x5c/0x80
    ? padata_do_serial+0x24/0x110
    __schedule_bug.cold.83+0x8e/0x9b
    __schedule+0x5a0/0x680
    schedule+0x39/0xd0
    rt_spin_lock_slowlock_locked+0x10e/0x2b0
    rt_spin_lock_slowlock+0x50/0x80
    padata_do_serial+0x4d/0x110
    padata_parallel_worker+0xaf/0xe0
    process_one_work+0x183/0x3b0
    ? process_one_work+0x3b0/0x3b0
    worker_thread+0x30/0x3d0
    ? process_one_work+0x3b0/0x3b0
    kthread+0x112/0x130
    ? kthread_create_worker_on_cpu+0x70/0x70
    ret_from_fork+0x35/0x40

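and the cause is a spin_lock() taken inside a get_cpu() section: get_cpu()
disables preemption, while on the RT kernel spin_lock() can sleep, as the
rt_spin_lock_slowlock() -> schedule() path in the trace shows.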

Convert the get/put_cpu to get/put_cpu_light to fix the BUG while reducing the
preempt_disable section.

As RHEL7 has the same code, it needs this patch as well. This patch
differs from the RHEL8 one because there is only one get_cpu() usage.

Signed-off-by: Daniel Bristot de Oliveira <bristot>

Comment 11 Tiefu 2019-05-15 09:23:09 UTC
[Tiefu Li on 15 May 2019]
I verified the bug in two different ways.
Here is the first approach:
Step 1. cd /mnt/tests/kernel/distribution/ltp/generic/ltp-full-20190115/testcases/kernel/crypto
Step 2. Run the test: ./pcrypt_aead01
All tests passed.

The second approach is:
Step 1. cd /mnt/tests/kernel/distribution/ltp/generic/
Step 2. FILTERTESTS="cve-2017-18075" make run

Both tests were run on 3.10.0-999.rt56.956.el7.x86_64 and on kernel-rt-3.10.0-1010.rt56.968.el7.x86_64.
I have not seen any issues, so I am marking the bug as verified.

Comment 14 errata-xmlrpc 2019-08-06 12:36:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2043