Bug 1518274 - backport: c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race
Summary: backport: c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Clark Williams
QA Contact: Jiri Kastner
URL:
Whiteboard:
Depends On:
Blocks: 1442258
Reported: 2017-11-28 14:31 UTC by Luiz Capitulino
Modified: 2018-04-10 09:02 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 09:00:10 UTC
Target Upstream Version:
Embargoed:


Attachments
c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race (5.67 KB, patch)
2017-11-28 14:33 UTC, Luiz Capitulino
Revert "[rt] avoid disabling preemption during fast iova allocations" (841 bytes, patch)
2017-11-28 14:34 UTC, Luiz Capitulino
a761e53ed4 iommu/iova: Don't disable preempt around this_cpu_ptr() (3.35 KB, patch)
2017-11-28 14:35 UTC, Luiz Capitulino
raw_cpu_ptr -> this_cpu_ptr (1.13 KB, patch)
2017-11-28 14:35 UTC, Luiz Capitulino
locking/rtmutex: Prevent dequeue vs. unlock race (6.54 KB, patch)
2017-12-07 05:21 UTC, Clark Williams
iommu/iova: Don't disable preempt around this_cpu_ptr() (3.48 KB, patch)
2017-12-07 05:22 UTC, Clark Williams


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2018:0676 | 0 | None | None | None | 2018-04-10 09:02:11 UTC

Description Luiz Capitulino 2017-11-28 14:31:05 UTC
Description of problem:

When comparing downstream and upstream rtmutex implementations, I found that we're missing this fix:

c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race

This fix seemed to help with a manual reproducer of bug 1448770. However, when trying to reproduce bug 1448770 with c4ccd6b1ce applied, I ran into a scheduling-while-atomic bug, so we actually need:

c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race
a761e53ed4 iommu/iova: Don't disable preempt around this_cpu_ptr()

However, a761e53ed4 needs work before and after applying it:

- Before: revert a0de71dc0fea, since it will conflict and since the changes in a761e53ed4 supersede it.

- After: rename raw_cpu_ptr() to this_cpu_ptr(), since that's what exists downstream.
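
The revert / apply / rename sequence above can be sketched as an ordered command plan. This is only an illustration, not the procedure actually used for the attached patches: it assumes both commits are reachable in the working tree under the IDs quoted above (IDs differ between trees), that the rename only touches drivers/iommu/iova.c (the path is an assumption, it is not named in this report), and that GNU sed is available for the `\b` word-boundary syntax. The function name `backport_plan` is hypothetical.

```python
# Hypothetical helper: builds the command sequence, it does not run anything.
REVERT_COMMIT = "a0de71dc0fea"   # "[rt] avoid disabling preemption during fast iova allocations"
PICK_COMMIT = "a761e53ed4"       # "iommu/iova: Don't disable preempt around this_cpu_ptr()"
IOVA_SOURCE = "drivers/iommu/iova.c"  # assumed location of the affected code


def backport_plan(revert_commit=REVERT_COMMIT,
                  pick_commit=PICK_COMMIT,
                  path=IOVA_SOURCE):
    """Return the three steps, in order, as argv lists."""
    return [
        # Step 1 (before): drop the superseded downstream change so the pick applies.
        ["git", "revert", "--no-edit", revert_commit],
        # Step 2: apply the upstream fix, recording its origin in the message.
        ["git", "cherry-pick", "-x", pick_commit],
        # Step 3 (after): use the accessor name that exists downstream.
        ["sed", "-i", r"s/\braw_cpu_ptr\b/this_cpu_ptr/g", path],
    ]


if __name__ == "__main__":
    for argv in backport_plan():
        print(" ".join(argv))
```

The order matters: the revert has to land first so the cherry-pick does not conflict, and the rename comes last because it edits lines the cherry-pick introduces.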

I'll attach my version of this work.

Version-Release number of selected component (if applicable): kernel-rt-3.10.0-789.rt56.723.el7

Comment 2 Luiz Capitulino 2017-11-28 14:33:47 UTC
Created attachment 1359903 [details]
c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race

Comment 3 Luiz Capitulino 2017-11-28 14:34:21 UTC
Created attachment 1359904 [details]
Revert "[rt] avoid disabling preemption during fast iova allocations"

Comment 4 Luiz Capitulino 2017-11-28 14:35:00 UTC
Created attachment 1359905 [details]
a761e53ed4 iommu/iova: Don't disable preempt around this_cpu_ptr()

Comment 5 Luiz Capitulino 2017-11-28 14:35:31 UTC
Created attachment 1359918 [details]
raw_cpu_ptr -> this_cpu_ptr

Comment 6 Luiz Capitulino 2017-11-28 16:29:44 UTC
I forgot to post the scheduling-while-atomic dump that patches 1-3 fix:

[18094.434539] BUG: scheduling while atomic: ksoftirqd/5/58/0x00000002
[18094.434567] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink bridge stp llc iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp dcdbas coretemp intel_rapl
iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif pcspkr sg ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter shpchp mei_me mei lpc_ich nfsd auth_rpcgss nfs_acl lockd grace ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 i2c_algo_bit drm_kms_helper
[18094.434574]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci tg3 i2c_core libahci ptp pps_core libata mxm_wmi megaraid_sas wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi
[18094.434576] CPU: 5 PID: 58 Comm: ksoftirqd/5 Not tainted 3.10.0-789.rt56.723.fix1.el7.x86_64 #1
[18094.434577] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.0.1 04/11/2016
[18094.434578] Call Trace:
[18094.434586]  [<ffffffffb56d8e16>] dump_stack+0x19/0x1b
[18094.434589]  [<ffffffffb56d3691>] __schedule_bug+0x62/0x70
[18094.434592]  [<ffffffffb56ddaf6>] __schedule+0x6b6/0x830
[18094.434594]  [<ffffffffb56ddca0>] schedule+0x30/0xa0
[18094.434595]  [<ffffffffb56dea7d>] rt_spin_lock_slowlock+0x13d/0x360
[18094.434597]  [<ffffffffb56dffe5>] rt_spin_lock+0x25/0x30
[18094.434601]  [<ffffffffb5575067>] alloc_iova_fast+0x167/0x220
[18094.434605]  [<ffffffffb55809e5>] intel_alloc_iova+0xa5/0xd0
[18094.434607]  [<ffffffffb5584eb5>] intel_map_sg+0xc5/0x240
[18094.434611]  [<ffffffffb54879ba>] scsi_dma_map+0xaa/0xe0
[18094.434618]  [<ffffffffc02c0c4c>] megasas_build_io_fusion+0xfc/0x8e0 [megaraid_sas]
[18094.434623]  [<ffffffffb50c1cec>] ? try_to_wake_up+0x6c/0x560
[18094.434629]  [<ffffffffc02c16dd>] megasas_build_and_issue_cmd_fusion+0xed/0x300 [megaraid_sas]
[18094.434633]  [<ffffffffc02b022e>] megasas_queue_command+0x11e/0x130 [megaraid_sas]
[18094.434635]  [<ffffffffb547cf7a>] scsi_dispatch_cmd+0xaa/0x290
[18094.434637]  [<ffffffffb5486546>] scsi_request_fn+0x4f6/0x6b0
[18094.434642]  [<ffffffffb5303233>] __blk_run_queue+0x33/0x40
[18094.434644]  [<ffffffffb5303286>] blk_run_queue+0x26/0x40
[18094.434646]  [<ffffffffb5485148>] scsi_run_queue+0x288/0x320
[18094.434647]  [<ffffffffb547c2bd>] ? __scsi_put_command+0x2d/0x90
[18094.434649]  [<ffffffffb5486740>] scsi_next_command+0x20/0x40
[18094.434650]  [<ffffffffb5486899>] scsi_end_request+0x139/0x1e0
[18094.434652]  [<ffffffffb5486b08>] scsi_io_completion+0x168/0x6a0
[18094.434655]  [<ffffffffb547b955>] scsi_finish_command+0xd5/0x130
[18094.434657]  [<ffffffffb5486022>] scsi_softirq_done+0x132/0x160
[18094.434659]  [<ffffffffb530e0d0>] blk_done_softirq+0xa0/0xe0
[18094.434661]  [<ffffffffb508a750>] do_current_softirqs+0x240/0x470
[18094.434663]  [<ffffffffb508aa6a>] run_ksoftirqd+0x3a/0x70
[18094.434665]  [<ffffffffb50b5e62>] smpboot_thread_fn+0x202/0x2d0
[18094.434667]  [<ffffffffb50b5c60>] ? lg_local_unlock+0x20/0x20
[18094.434670]  [<ffffffffb50ac98f>] kthread+0xcf/0xe0
[18094.434672]  [<ffffffffb50ac8c0>] ? kthread_worker_fn+0x170/0x170
[18094.434673]  [<ffffffffb56e8e58>] ret_from_fork+0x58/0x90
[18094.434675]  [<ffffffffb50ac8c0>] ? kthread_worker_fn+0x170/0x170
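
As a reading aid for the trace above, here is a toy Python model of the failure mode. It is a sketch under assumptions, not kernel code: it assumes the caller had preemption disabled when alloc_iova_fast() reached rt_spin_lock() (which is what the nonzero count in the BUG line and the title of a761e53ed4 suggest), and that on RT a contended spinlock sleeps by calling schedule(), as the rt_spin_lock_slowlock -> schedule frames show. All class and function names are hypothetical.

```python
class SchedulingWhileAtomic(RuntimeError):
    """Stands in for the kernel's 'BUG: scheduling while atomic' report."""


class ToyCpu:
    def __init__(self):
        self.preempt_count = 0

    def preempt_disable(self):
        self.preempt_count += 1

    def preempt_enable(self):
        self.preempt_count -= 1

    def schedule(self):
        # Scheduling is only legal when nothing has disabled preemption.
        if self.preempt_count != 0:
            raise SchedulingWhileAtomic(
                f"preempt_count=0x{self.preempt_count:08x}")

    def rt_spin_lock(self, contended):
        # On RT the lock is a sleeping lock: the contended path schedules.
        if contended:
            self.schedule()


def alloc_with_preempt_disabled(cpu, contended):
    """Pattern the fix removes: take a sleeping lock with preemption off."""
    cpu.preempt_disable()
    try:
        cpu.rt_spin_lock(contended)
    finally:
        cpu.preempt_enable()


def alloc_without_preempt_disabled(cpu, contended):
    """Pattern after the fix: the lock is taken with preemption enabled."""
    cpu.rt_spin_lock(contended)
```

In this model the first variant only fails when the lock is contended, which would be consistent with the bug showing up under load rather than on every allocation.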

Comment 7 Clark Williams 2017-11-29 17:34:58 UTC
For some reason I don't have the commit IDs listed in c#1, so I'm tracking down the equivalent mods in 7.5:

revert 295e3dc1a7187 [rt] avoid disabling preemption during fast iova allocations

apply aaffaa8a3b595 iommu/iova: Don't disable preempt around this_cpu_ptr()

Then change references to raw_cpu_ptr to this_cpu_ptr.

Comment 8 Luiz Capitulino 2017-11-29 17:44:33 UTC
They are from the upstream RT devel repo.

Comment 11 Clark Williams 2017-12-07 05:21:41 UTC
Created attachment 1364032 [details]
locking/rtmutex: Prevent dequeue vs. unlock race

Comment 12 Clark Williams 2017-12-07 05:22:12 UTC
Created attachment 1364033 [details]
iommu/iova: Don't disable preempt around this_cpu_ptr()

Comment 13 Clark Williams 2017-12-07 05:24:10 UTC
Patches sent to kernel-rt-team list

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14701302

Scratch build done against kernel-rt-3.10.0-809.rt56.745.el7

Booted, 12h rteval running on realtime-03.khw.lab.eng.bos.redhat.com

Comment 14 Luis Claudio R. Goncalves 2017-12-07 12:58:59 UTC
We need to review the "iommu/iova: Don't disable preempt around this_cpu_ptr()" patch in light of this commit, which introduced raw_cpu_ptr:

    b3ca1c10d7b3 percpu: add raw_cpu_ops

In our current code, raw_cpu_ptr() maps directly to __this_cpu_ptr(), as this is the version that does not check preemption. Upstream later reworked things so that this_cpu_ptr() dropped the checks it used to perform.

My impression is that the change will be basically this:

s/this_cpu_ptr/__this_cpu_ptr/g
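
One caveat with the expression above if it is run literally: it is unanchored, so any occurrence already spelled __this_cpu_ptr would become ____this_cpu_ptr. A minimal Python sketch of a word-bounded variant follows. The function name `rename_accessor` and the sample input lines are illustrative assumptions, not taken from the actual patch.

```python
import re

# "\b" does not match between "_" and "t", so "__this_cpu_ptr" is left alone.
_ACCESSOR = re.compile(r"\bthis_cpu_ptr\b")


def rename_accessor(source: str) -> str:
    """Replace bare this_cpu_ptr with __this_cpu_ptr, skipping prefixed uses."""
    return _ACCESSOR.sub("__this_cpu_ptr", source)


if __name__ == "__main__":
    sample = (
        "cpu_rcache = this_cpu_ptr(rcache->cpu_rcaches);\n"   # illustrative line
        "other = __this_cpu_ptr(rcache->cpu_rcaches);\n"      # already converted
    )
    print(rename_accessor(sample), end="")
```

With GNU sed the same idea would be `s/\bthis_cpu_ptr\b/__this_cpu_ptr/g`.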

Comment 18 Clark Williams 2018-02-19 17:22:39 UTC
@jiri, yes looks correct to me.

Comment 21 errata-xmlrpc 2018-04-10 09:00:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0676

