Bug 236771 - [RHEL5 RT][OPENIB]Stopping openSM process on RT kernel gives a kernel backtrace
Summary: [RHEL5 RT][OPENIB]Stopping openSM process on RT kernel gives a kernel backtrace
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Doug Ledford
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-04-17 16:17 UTC by Gurhan Ozen
Modified: 2008-02-27 19:57 UTC

Fixed In Version: -35
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-02 14:52:19 UTC
Target Upstream Version:
Embargoed:



Description Gurhan Ozen 2007-04-17 16:17:23 UTC
Description of problem:
When shutting down the opensm service on the RT kernel, I get the following backtrace:

------------[ cut here ]------------
kernel BUG at kernel/rt.c:344!
invalid opcode: 0000 [1] PREEMPT SMP 
CPU 1 
Modules linked in: autofs4 hidp l2cap bluetooth nfs lockd nfs_acl sunrpc
iscsi_tcp ib_iser libiscsi scsi_transport_iscsi ib_ucm rdma_ucm ib_srp ib_sdp
rdma_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad
loop dm_multipath video sbs i2c_ec i2c_core dock button battery asus_acpi
backlight ac parport_pc lp parport sg pcspkr ib_ipath ata_generic ib_mthca
ib_mad ib_core shpchp bnx2 serio_raw ide_cd cdrom dm_snapshot dm_zero dm_mirror
dm_mod ata_piix libata megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd
uhci_hcd
Pid: 4198, comm: opensm Not tainted 2.6.20-19.el5rt #1
RIP: 0010:[<ffffffff810b8e0a>]  [<ffffffff810b8e0a>] rt_downgrade_write+0x4/0x8
RSP: 0000:ffff81005d999c18  EFLAGS: 00010282
RAX: ffff81007cece828 RBX: ffff810076c907f8 RCX: ffff810076c90828
RDX: ffff81007cece828 RSI: 0000000000000000 RDI: ffff81007cece780
RBP: ffff81005d999c18 R08: 0000000000000000 R09: 0000000000000001
R10: ffff81005d96f6c0 R11: 0000000000000000 R12: ffff810076c90800
R13: ffff810076c907f8 R14: 0000000000000000 R15: ffff81007cece6c0
FS:  0000000000000000(0000) GS:ffff81000d510540(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003c40946ae8 CR3: 0000000001001000 CR4: 00000000000006e0
Process opensm (pid: 4198, threadinfo ffff81005d998000, task ffff81005d996700)
Stack:  ffff81005d999c58 ffffffff883614e3 ffff8100786b5d30 0000000000000008
 ffff8100786b57a0 ffff81005d96f6c0 ffff8100786b57a0 ffff8100019b1180
 ffff81005d999c98 ffffffff81012db7 ffff81007805a378 ffff81005d96f6c0
Call Trace:
 [<ffffffff883614e3>] :ib_umad:ib_umad_close+0xb7/0x10f
 [<ffffffff81012db7>] __fput+0xdd/0x1af
 [<ffffffff8102f10c>] fput+0x17/0x19
 [<ffffffff81025d81>] filp_close+0x6c/0x77
 [<ffffffff8103b01e>] put_files_struct+0x6d/0xc1
 [<ffffffff81015dbb>] do_exit+0x27f/0x8c5
 [<ffffffff8104d3b7>] cpuset_exit+0x0/0x6e
 [<ffffffff8102d772>] get_signal_to_deliver+0x432/0x483
 [<ffffffff8105fa88>] do_notify_resume+0xc2/0x7d3
 [<ffffffff81062667>] ptregscall_common+0x67/0xb0
 [<ffffffff810622d6>] sysret_signal+0x21/0x31
 [<0000003c406c48c6>]

---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff81069e97>] .... __spin_trylock+0x16/0x71
.....[<ffffffff8106b0ea>] ..   ( <= oops_begin+0x28/0x77)


Code: 0f 0b eb fe 55 48 89 e5 53 48 8d 5f 08 48 83 ec 08 85 f6 89 
RIP  [<ffffffff810b8e0a>] rt_downgrade_write+0x4/0x8
 RSP <ffff81005d999c18>
 <1>Fixing recursive fault but reboot is needed!


Version-Release number of selected component (if applicable):
# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.20-19.el5rt #1 SMP PREEMPT Mon
Apr 16 12:14:21 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Every time.

Steps to Reproduce:
1. On a system with IB hardware, run: service opensmd start ; service opensmd stop
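The reproduction step can be sketched as a small shell helper. This is only an illustration, not part of the original report: the function names `log_has_rt_bug` and `reproduce` are hypothetical, and the script assumes the `opensmd` init script and `dmesg` are available and that it is run as root on a host with IB hardware.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of opensm or the RT kernel.

# Succeeds if the text on stdin contains the BUG signature from this report.
log_has_rt_bug() {
    grep -q 'kernel BUG at kernel/rt.c'
}

# Start and stop the subnet manager, then check the kernel log.
# Must be run as root on a host with IB hardware.
reproduce() {
    service opensmd start
    service opensmd stop
    if dmesg | log_has_rt_bug; then
        echo "BUG reproduced"
    else
        echo "no BUG in kernel log"
    fi
}
```

Sourcing the file and calling `reproduce` performs the start/stop cycle. Note that the check scans the whole kernel log, so a BUG logged earlier in the same boot would also match.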
  
Actual results:


Expected results:


Additional info:

Comment 1 Gurhan Ozen 2007-04-17 23:46:45 UTC
This behavior can be observed with the ibping program as well. Just run ibping.

Comment 2 Doug Ledford 2007-07-10 20:50:51 UTC
This was resolved with the OFED 1.2 final code and updated rt port patch used to
build the kernel-rt-2.6.21-32.ofed.3.el5rt kernel (this was a scratch build, but
the updated patches were submitted to Clark Williams to be included in his rt
kernel).

Comment 3 Clark Williams 2007-07-26 14:48:46 UTC
applied to -35; testing

Comment 4 Gurhan Ozen 2007-08-13 20:57:40 UTC
Verified with -35:

[root@dell-pe1950-02 ~]# service opensmd start
Starting IB Subnet Manager                                 [  OK  ]
[root@dell-pe1950-02 ~]# service opensmd stop ; service opensmd start
Stopping IB Subnet Manager.......                          [  OK  ]
Starting IB Subnet Manager                                 [  OK  ]
[root@dell-pe1950-02 ~]# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.21-35.el5rt #1 SMP PREEMPT RT
Thu Jul 26 11:59:02 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

