Bug 1518274 - backport: c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Clark Williams
QA Contact: Jiri Kastner
Depends On:
Blocks: 1442258
Reported: 2017-11-28 09:31 EST by Luiz Capitulino
Modified: 2018-04-10 05:02 EDT

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 05:00:10 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race (5.67 KB, patch)
2017-11-28 09:33 EST, Luiz Capitulino
Revert "[rt] avoid disabling preemption during fast iova allocations" (841 bytes, patch)
2017-11-28 09:34 EST, Luiz Capitulino
a761e53ed4 iommu/iova: Don't disable preempt around this_cpu_ptr() (3.35 KB, patch)
2017-11-28 09:35 EST, Luiz Capitulino
raw_cpu_ptr -> this_cpu_ptr (1.13 KB, patch)
2017-11-28 09:35 EST, Luiz Capitulino
locking/rtmutex: Prevent dequeue vs. unlock race (6.54 KB, patch)
2017-12-07 00:21 EST, Clark Williams
iommu/iova: Don't disable preempt around this_cpu_ptr() (3.48 KB, patch)
2017-12-07 00:22 EST, Clark Williams


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2018:0676
Priority: None
Status: None
Summary: None
Last Updated: 2018-04-10 05:02 EDT

Description Luiz Capitulino 2017-11-28 09:31:05 EST
Description of problem:

When comparing downstream and upstream rtmutex implementations, I found that we're missing this fix:

c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race

This fix seemed to help with a manual reproducer of bug 1448770. However, when trying to reproduce bug 1448770 with c4ccd6b1ce applied, I ran into a scheduling while atomic bug, so we actually need:

c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race
a761e53ed4 iommu/iova: Don't disable preempt around this_cpu_ptr()

However, a761e53ed4 needs work before and after applying it:

- Before: revert a0de71dc0fea, since it conflicts with a761e53ed4 and a761e53ed4 supersedes it.

- After: rename raw_cpu_ptr() to this_cpu_ptr(), since that's what exists downstream.

I'll attach my version of this work.

Version-Release number of selected component (if applicable): kernel-rt-3.10.0-789.rt56.723.el7
Comment 2 Luiz Capitulino 2017-11-28 09:33 EST
Created attachment 1359903
c4ccd6b1ce locking/rtmutex: Prevent dequeue vs. unlock race
Comment 3 Luiz Capitulino 2017-11-28 09:34 EST
Created attachment 1359904
Revert "[rt] avoid disabling preemption during fast iova allocations"
Comment 4 Luiz Capitulino 2017-11-28 09:35 EST
Created attachment 1359905
a761e53ed4 iommu/iova: Don't disable preempt around this_cpu_ptr()
Comment 5 Luiz Capitulino 2017-11-28 09:35 EST
Created attachment 1359918
raw_cpu_ptr -> this_cpu_ptr
Comment 6 Luiz Capitulino 2017-11-28 11:29:44 EST
Forgot to post the scheduling while atomic dump that patches 1-3 fix:

[18094.434539] BUG: scheduling while atomic: ksoftirqd/5/58/0x00000002
[18094.434567] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink bridge stp llc iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp dcdbas coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif pcspkr sg ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter shpchp mei_me mei lpc_ich nfsd auth_rpcgss nfs_acl lockd grace ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 i2c_algo_bit drm_kms_helper
[18094.434574]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci tg3 i2c_core libahci ptp pps_core libata mxm_wmi megaraid_sas wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi
[18094.434576] CPU: 5 PID: 58 Comm: ksoftirqd/5 Not tainted 3.10.0-789.rt56.723.fix1.el7.x86_64 #1
[18094.434577] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.0.1 04/11/2016
[18094.434578] Call Trace:
[18094.434586]  [<ffffffffb56d8e16>] dump_stack+0x19/0x1b
[18094.434589]  [<ffffffffb56d3691>] __schedule_bug+0x62/0x70
[18094.434592]  [<ffffffffb56ddaf6>] __schedule+0x6b6/0x830
[18094.434594]  [<ffffffffb56ddca0>] schedule+0x30/0xa0
[18094.434595]  [<ffffffffb56dea7d>] rt_spin_lock_slowlock+0x13d/0x360
[18094.434597]  [<ffffffffb56dffe5>] rt_spin_lock+0x25/0x30
[18094.434601]  [<ffffffffb5575067>] alloc_iova_fast+0x167/0x220
[18094.434605]  [<ffffffffb55809e5>] intel_alloc_iova+0xa5/0xd0
[18094.434607]  [<ffffffffb5584eb5>] intel_map_sg+0xc5/0x240
[18094.434611]  [<ffffffffb54879ba>] scsi_dma_map+0xaa/0xe0
[18094.434618]  [<ffffffffc02c0c4c>] megasas_build_io_fusion+0xfc/0x8e0 [megaraid_sas]
[18094.434623]  [<ffffffffb50c1cec>] ? try_to_wake_up+0x6c/0x560
[18094.434629]  [<ffffffffc02c16dd>] megasas_build_and_issue_cmd_fusion+0xed/0x300 [megaraid_sas]
[18094.434633]  [<ffffffffc02b022e>] megasas_queue_command+0x11e/0x130 [megaraid_sas]
[18094.434635]  [<ffffffffb547cf7a>] scsi_dispatch_cmd+0xaa/0x290
[18094.434637]  [<ffffffffb5486546>] scsi_request_fn+0x4f6/0x6b0
[18094.434642]  [<ffffffffb5303233>] __blk_run_queue+0x33/0x40
[18094.434644]  [<ffffffffb5303286>] blk_run_queue+0x26/0x40
[18094.434646]  [<ffffffffb5485148>] scsi_run_queue+0x288/0x320
[18094.434647]  [<ffffffffb547c2bd>] ? __scsi_put_command+0x2d/0x90
[18094.434649]  [<ffffffffb5486740>] scsi_next_command+0x20/0x40
[18094.434650]  [<ffffffffb5486899>] scsi_end_request+0x139/0x1e0
[18094.434652]  [<ffffffffb5486b08>] scsi_io_completion+0x168/0x6a0
[18094.434655]  [<ffffffffb547b955>] scsi_finish_command+0xd5/0x130
[18094.434657]  [<ffffffffb5486022>] scsi_softirq_done+0x132/0x160
[18094.434659]  [<ffffffffb530e0d0>] blk_done_softirq+0xa0/0xe0
[18094.434661]  [<ffffffffb508a750>] do_current_softirqs+0x240/0x470
[18094.434663]  [<ffffffffb508aa6a>] run_ksoftirqd+0x3a/0x70
[18094.434665]  [<ffffffffb50b5e62>] smpboot_thread_fn+0x202/0x2d0
[18094.434667]  [<ffffffffb50b5c60>] ? lg_local_unlock+0x20/0x20
[18094.434670]  [<ffffffffb50ac98f>] kthread+0xcf/0xe0
[18094.434672]  [<ffffffffb50ac8c0>] ? kthread_worker_fn+0x170/0x170
[18094.434673]  [<ffffffffb56e8e58>] ret_from_fork+0x58/0x90
[18094.434675]  [<ffffffffb50ac8c0>] ? kthread_worker_fn+0x170/0x170
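
For readers less familiar with this class of bug, here is a minimal userspace sketch of what the trace shows: on RT, rt_spin_lock() is a sleeping lock, so a contended acquisition ends up in schedule(), and that is a bug if the caller is in atomic context. This is a model only. It assumes the atomic section comes from preemption being disabled around the per-CPU access in the IOVA fast path, which is what the title of a761e53ed4 suggests. All model_* names are hypothetical and are not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: none of these are the kernel's real definitions. */
static int preempt_count;   /* nonzero means "atomic": sleeping is not allowed */
static int sched_bugs;      /* number of times we tried to sleep while atomic */

static void model_preempt_disable(void) { preempt_count++; }
static void model_preempt_enable(void)  { preempt_count--; }

/* Stand-in for schedule(): reports a bug when called in atomic context. */
static void model_schedule(void)
{
    if (preempt_count != 0) {
        sched_bugs++;
        fprintf(stderr, "BUG: scheduling while atomic: 0x%08x\n",
                (unsigned)preempt_count);
    }
}

/* Stand-in for an RT sleeping spinlock. */
struct model_rt_lock {
    bool held_by_other;
};

/* A contended lock takes the slow path, which sleeps via model_schedule(). */
static void model_rt_spin_lock(struct model_rt_lock *l)
{
    if (l->held_by_other) {
        model_schedule();
        l->held_by_other = false; /* pretend the owner released it meanwhile */
    }
}

/* Problem shape: the lock is taken inside a preempt-disabled section. */
static int model_alloc_preempt_off(struct model_rt_lock *l)
{
    int before = sched_bugs;

    model_preempt_disable();
    model_rt_spin_lock(l);
    model_preempt_enable();
    return sched_bugs - before;
}

/* Fixed shape: the same lock is taken with preemption left enabled. */
static int model_alloc_preempt_on(struct model_rt_lock *l)
{
    int before = sched_bugs;

    model_rt_spin_lock(l);
    return sched_bugs - before;
}
```

Calling model_alloc_preempt_off() on a contended lock records one bug, while model_alloc_preempt_on() records none. The real fix is in the attached patches; this only illustrates why sleeping in that section is flagged.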
Comment 7 Clark Williams 2017-11-29 12:34:58 EST
For some reason I don't have the commit IDs listed in c#1, so I'm tracking down the equivalent mods in 7.5:

revert 295e3dc1a7187 [rt] avoid disabling preemption during fast iova allocations

apply aaffaa8a3b595 iommu/iova: Don't disable preempt around this_cpu_ptr()

Then change references to raw_cpu_ptr to this_cpu_ptr.
Comment 8 Luiz Capitulino 2017-11-29 12:44:33 EST
They are from the upstream RT devel repo.
Comment 11 Clark Williams 2017-12-07 00:21 EST
Created attachment 1364032
locking/rtmutex: Prevent dequeue vs. unlock race
Comment 12 Clark Williams 2017-12-07 00:22 EST
Created attachment 1364033
iommu/iova: Don't disable preempt around this_cpu_ptr()
Comment 13 Clark Williams 2017-12-07 00:24:10 EST
Patches sent to kernel-rt-team list

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14701302

Scratch build done against kernel-rt-3.10.0-809.rt56.745.el7

Booted, 12h rteval running on realtime-03.khw.lab.eng.bos.redhat.com
Comment 14 Luis Claudio R. Goncalves 2017-12-07 07:58:59 EST
We need to review the "iommu/iova: Don't disable preempt around this_cpu_ptr()" patch in light of this commit, which introduced raw_cpu_ptr:

    b3ca1c10d7b3 percpu: add raw_cpu_ops

In our current code, raw_cpu_ptr() maps directly to __this_cpu_ptr(), as this is the version that does not check preemption. Upstream later reworked things so that this_cpu_ptr() dropped the checks it used to do.

My impression is that the change will be basically this:

s/this_cpu_ptr/__this_cpu_ptr/g
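
To illustrate the distinction, here is a small userspace sketch, assuming (as described above) that the downstream this_cpu_ptr() checks preemption and __this_cpu_ptr() does not. It is a model, not the kernel's per-CPU implementation, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_NR_CPUS 4

/* Userspace model only: none of these are the kernel's real definitions. */
static int model_preempt_depth;   /* 0 means the caller is preemptible */
static int model_cpu;             /* CPU the task is "running" on */
static int model_debug_warnings;  /* times the checked accessor complained */
static long model_percpu_counter[MODEL_NR_CPUS];

/* Models the unchecked accessor: returns this CPU's slot, no questions asked. */
static long *model_unchecked_this_cpu_ptr(void)
{
    return &model_percpu_counter[model_cpu];
}

/* Models a checking accessor: same pointer, plus a complaint if preemptible. */
static long *model_checked_this_cpu_ptr(void)
{
    if (model_preempt_depth == 0) {
        model_debug_warnings++;
        fprintf(stderr, "model: per-CPU pointer taken in preemptible code\n");
    }
    return model_unchecked_this_cpu_ptr();
}
```

Both accessors return the same pointer. The only difference in this model is the warning, which is why code that deliberately runs preemptible would want the unchecked variant.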
Comment 18 Clark Williams 2018-02-19 12:22:39 EST
@jiri, yes looks correct to me.
Comment 21 errata-xmlrpc 2018-04-10 05:00:10 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0676
