Bug 465862 - Warning from rt_mutex code while testing infiniband
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: All Linux
Priority: medium
Severity: medium
Target Milestone: 1.1
Target Release: ---
Assigned To: Steven Rostedt
Depends On:
Blocks:
Reported: 2008-10-06 15:39 EDT by Clark Williams
Modified: 2009-01-22 05:44 EST
Doc Type: Bug Fix
Last Closed: 2009-01-22 05:44:51 EST

Attachments
patch to correct locking order of ib driver (1.34 KB, patch)
2008-10-06 15:47 EDT, Clark Williams
patch to allow rwlocks to unlock out of order (3.22 KB, patch)
2008-10-08 23:31 EDT, Steven Rostedt
update rwlock torture test to include checking of unnested locks (8.23 KB, patch)
2008-10-08 23:32 EDT, Steven Rostedt


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2009:0009
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel security and bug fix update
Last Updated: 2009-01-22 05:43:54 EST

Description Clark Williams 2008-10-06 15:39:16 EDT
Description of problem:

Testing the openib code generated the following traceback on the -65 kernel:

WARNING: at kernel/rtmutex.c:1852 rt_read_fastunlock()
Pid: 16811, comm: ibv_rc_pingpong Not tainted 2.6.24.7-65.el5rt #1

Call Trace:
 [<ffffffff811357b2>] ? free_layer+0x37/0x3f
 [<ffffffff8105f31f>] rt_mutex_up_read+0x1a4/0x232
 [<ffffffff8105fcbc>] rt_up_read+0x9/0xb
 [<ffffffff881922b1>] :ib_uverbs:put_uobj_read+0x15/0x21
 [<ffffffff881922f7>] :ib_uverbs:put_pd_read+0xd/0xf
 [<ffffffff88194f8f>] :ib_uverbs:ib_uverbs_create_qp+0x39c/0x4cf
 [<ffffffff88191ae0>] ? :ib_uverbs:ib_uverbs_qp_event_handler+0x0/0x2d
 [<ffffffff88191843>] :ib_uverbs:ib_uverbs_write+0x96/0xb0
 [<ffffffff810b00d5>] vfs_write+0xc7/0x170
 [<ffffffff810b06e5>] sys_write+0x4a/0x76
 [<ffffffff8100c35e>] traceret+0x0/0x5

Later testing on the MRG 1.0.3 errata kernel generated this traceback (from /var/log/messages):

Oct  2 04:44:35 dhcp71-141 kernel: WARNING: at kernel/rtmutex.c:1896 rt_read_fastunlock()
Oct  2 04:44:35 dhcp71-141 kernel: Pid: 4589, comm: ibv_rc_pingpong Not tainted 2.6.24.7-81.el5rt #1
Oct  2 04:44:35 dhcp71-141 kernel: 
Oct  2 04:44:35 dhcp71-141 kernel: Call Trace:
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff81135ca6>] ? free_layer+0x37/0x3f
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff8105f521>] rt_mutex_up_read+0x1d2/0x260
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff8105fe55>] rt_up_read+0x9/0xb
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff883732b9>] :ib_uverbs:put_uobj_read+0x15/0x21
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff883732ff>] :ib_uverbs:put_pd_read+0xd/0xf
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff88375fd2>] :ib_uverbs:ib_uverbs_create_qp+0x39c/0x4cf
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff88372ae0>] ? :ib_uverbs:ib_uverbs_qp_event_handler+0x0/0x2d
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff88372843>] :ib_uverbs:ib_uverbs_write+0x96/0xb0
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff810b0479>] vfs_write+0xc7/0x170
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff810b0a89>] sys_write+0x4a/0x76
Oct  2 04:44:35 dhcp71-141 kernel:  [<ffffffff8100c37e>] traceret+0x0/0x5
Oct  2 04:44:35 dhcp71-141 kernel: 


Version-Release number of selected component (if applicable):

kernel-rt-2.6.24.7-81.el5rt

How reproducible:
always

Steps to Reproduce:
1. Install RHEL 5.2.
2. Install the MRG RT kernel.
3. Install the openib package.
4. Reboot.
Comment 1 Clark Williams 2008-10-06 15:47:05 EDT
Created attachment 319594
patch to correct locking order of ib driver

Patch from srostedt@redhat.com to address this issue:

The ib driver does not release its locks in the reverse order in which it takes them.
The RW locks in RT are very sensitive to this.

Hopefully the attached patch will fix the issue.
Comment 2 Luis Claudio R. Goncalves 2008-10-07 08:30:05 EDT
Patch added to MRG kernel -85
Comment 3 Steven Rostedt 2008-10-08 23:31:28 EDT
Created attachment 319823
patch to allow rwlocks to unlock out of order

This patch fixes the quirk inside rwlocks that expected locks to be released in the reverse order in which they were taken.

Since other places in the kernel may also release locks out of order, this is a better patch than the one already attached.
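
As a rough illustration of how a lock implementation can end up caring about release order, the toy model below keeps a per-task list of held read locks. It is not the rtmutex code, and every name in it (held_reads, note_read_lock, and so on) is made up for this sketch. The strict variant accepts only the most recently taken lock, while the tolerant variant searches the whole list, in the spirit of the change described in this comment.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 8   /* arbitrary limit for this sketch */

/* Toy per-task record of the read locks currently held, in acquisition order. */
struct held_reads {
    const void *lock[MAX_HELD];
    size_t count;
};

/* Record a newly taken read lock. Returns 0, or -1 if the table is full. */
int note_read_lock(struct held_reads *h, const void *lock)
{
    assert(h->count <= MAX_HELD);
    if (h->count == MAX_HELD)
        return -1;
    h->lock[h->count++] = lock;
    return 0;
}

/* Strict variant: only the most recently taken lock may be released. */
int note_read_unlock_strict(struct held_reads *h, const void *lock)
{
    if (h->count == 0 || h->lock[h->count - 1] != lock)
        return -1;              /* the point where a warning would fire */
    h->count--;
    return 0;
}

/* Tolerant variant: find the lock anywhere in the list and close the gap. */
int note_read_unlock_any(struct held_reads *h, const void *lock)
{
    for (size_t i = h->count; i-- > 0; ) {
        if (h->lock[i] != lock)
            continue;
        for (size_t j = i; j + 1 < h->count; j++)
            h->lock[j] = h->lock[j + 1];
        h->count--;
        return 0;
    }
    return -1;                  /* the lock was not held at all */
}
```

With two locks taken as A then B, the strict variant rejects releasing A first, while the tolerant variant removes A and leaves B as the only entry.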
Comment 4 Steven Rostedt 2008-10-08 23:32:42 EDT
Created attachment 319824
update rwlock torture test to include checking of unnested locks

This patch updates the rwlock torture test so that it also tests locks being released in a non-nested order.
Comment 5 Luis Claudio R. Goncalves 2008-10-09 10:17:01 EDT
Patches added to -85
Comment 7 David Sommerseth 2008-11-26 10:57:22 EST
Verified by code review.  Tried to trigger the misbehaviour on mrg-13.lab.bos.redhat.com on 2.6.24.7-81 without success.  The problem did indeed exist on dell-pe1950-02.rhts.bos.redhat.com, but that box was not available at the moment of testing.  No errors found using 2.6.24.7-93 on mrg-13.

Found these patches for this bugzilla: 

* ib: release locks in the proper order
2b39f5cb4d843c6d32e55e99eff32ff99518c9cb - bz465862--ib-fix-locking-order.patch

* rt: rwlock fix non nested unlocking
98184ed03651cbaa362a258b74238dbb26631290 - bz465862-rwlock-handle-bad-locking-practices.patch

* rwlock: update torture test for testing unnested locking
695c1c048aa18caebaab7b8645eadc69fbf2f633 - bz465862-rwlock-update-torture-test.patch

These patches were found in the mrg-rt.git tree, for the 2.6.24.7-93 branch.
Comment 9 errata-xmlrpc 2009-01-22 05:44:51 EST
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0009.html
