Red Hat Bugzilla – Bug 465862
Warning from rt_mutex code while testing infiniband
Last modified: 2009-01-22 05:44:51 EST
Description of problem:
testing of the openib code generated the following traceback on -65 kernel:

WARNING: at kernel/rtmutex.c:1852 rt_read_fastunlock()
Pid: 16811, comm: ibv_rc_pingpong Not tainted 2.6.24.7-65.el5rt #1

Call Trace:
 [<ffffffff811357b2>] ? free_layer+0x37/0x3f
 [<ffffffff8105f31f>] rt_mutex_up_read+0x1a4/0x232
 [<ffffffff8105fcbc>] rt_up_read+0x9/0xb
 [<ffffffff881922b1>] :ib_uverbs:put_uobj_read+0x15/0x21
 [<ffffffff881922f7>] :ib_uverbs:put_pd_read+0xd/0xf
 [<ffffffff88194f8f>] :ib_uverbs:ib_uverbs_create_qp+0x39c/0x4cf
 [<ffffffff88191ae0>] ? :ib_uverbs:ib_uverbs_qp_event_handler+0x0/0x2d
 [<ffffffff88191843>] :ib_uverbs:ib_uverbs_write+0x96/0xb0
 [<ffffffff810b00d5>] vfs_write+0xc7/0x170
 [<ffffffff810b06e5>] sys_write+0x4a/0x76
 [<ffffffff8100c35e>] traceret+0x0/0x5

Later testing on the MRG 1.0.3 errata kernel generated this traceback (from /var/log/messages):

Oct 2 04:44:35 dhcp71-141 kernel: WARNING: at kernel/rtmutex.c:1896 rt_read_fastunlock()
Oct 2 04:44:35 dhcp71-141 kernel: Pid: 4589, comm: ibv_rc_pingpong Not tainted 2.6.24.7-81.el5rt #1
Oct 2 04:44:35 dhcp71-141 kernel:
Oct 2 04:44:35 dhcp71-141 kernel: Call Trace:
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff81135ca6>] ? free_layer+0x37/0x3f
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff8105f521>] rt_mutex_up_read+0x1d2/0x260
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff8105fe55>] rt_up_read+0x9/0xb
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff883732b9>] :ib_uverbs:put_uobj_read+0x15/0x21
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff883732ff>] :ib_uverbs:put_pd_read+0xd/0xf
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff88375fd2>] :ib_uverbs:ib_uverbs_create_qp+0x39c/0x4cf
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff88372ae0>] ? :ib_uverbs:ib_uverbs_qp_event_handler+0x0/0x2d
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff88372843>] :ib_uverbs:ib_uverbs_write+0x96/0xb0
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff810b0479>] vfs_write+0xc7/0x170
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff810b0a89>] sys_write+0x4a/0x76
Oct 2 04:44:35 dhcp71-141 kernel: [<ffffffff8100c37e>] traceret+0x0/0x5
Oct 2 04:44:35 dhcp71-141 kernel:

Version-Release number of selected component (if applicable):
kernel-rt-2.6.24.7-81.el5rt

How reproducible:
always

Steps to Reproduce:
1. Install RHEL5.2
2. Install MRG RT kernel
3. Install openib package
4. reboot
Created attachment 319594
patch to correct locking order of ib driver

Patch from srostedt@redhat.com to address this issue:

The ib driver does not release its locks in the reverse of the order in which it takes them. The RW locks in RT are very sensitive to this. Hopefully the attached patch will fix the issue.
Patch added to MRG kernel -85
Created attachment 319823
patch to allow rwlocks to unlock out of order

This patch fixes the quirk inside rwlocks that expected locks to be released in the reverse of the order in which they were taken. Since other places in the kernel may also unlock out of order, this is a better patch than the one already attached.
Created attachment 319824
update rwlock torture test to include checking of unnested locks

This patch updates the rwlock torture test to cover locks being released in a non-nested order.
Patches added to -85
Verified by code review.

Tried to trigger the misbehaviour on mrg-13.lab.bos.redhat.com on 2.6.24.7-81 without success. The problem did indeed exist on dell-pe1950-02.rhts.bos.redhat.com, but that box was not available at the time of testing. No errors found using 2.6.24.7-93 on mrg-13.

Found these patches for this bugzilla:

* ib: release locks in the proper order
  2b39f5cb4d843c6d32e55e99eff32ff99518c9cb - bz465862--ib-fix-locking-order.patch
* rt: rwlock fix non nested unlocking
  98184ed03651cbaa362a258b74238dbb26631290 - bz465862-rwlock-handle-bad-locking-practices.patch
* rwlock: update torture test for testing unnested locking
  695c1c048aa18caebaab7b8645eadc69fbf2f633 - bz465862-rwlock-update-torture-test.patch

These patches were found in the mrg-rt.git tree, for the 2.6.24.7-93 branch.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0009.html