Bug 236771 - [RHEL5 RT][OPENIB]Stopping openSM process on RT kernel gives a kernel backtrace
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel   
Version: 1.0
Hardware: All Linux
Target Milestone: ---
Assignee: Doug Ledford
QA Contact:
Depends On:
Reported: 2007-04-17 16:17 UTC by Gurhan Ozen
Modified: 2008-02-27 19:57 UTC (History)

Fixed In Version: -35
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2007-10-02 14:52:19 UTC


Description Gurhan Ozen 2007-04-17 16:17:23 UTC
Description of problem:
When shutting down the opensm service on the RT kernel, I get the following backtrace:

------------[ cut here ]------------
kernel BUG at kernel/rt.c:344!
invalid opcode: 0000 [1] PREEMPT SMP 
CPU 1 
Modules linked in: autofs4 hidp l2cap bluetooth nfs lockd nfs_acl sunrpc
iscsi_tcp ib_iser libiscsi scsi_transport_iscsi ib_ucm rdma_ucm ib_srp ib_sdp
rdma_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad
loop dm_multipath video sbs i2c_ec i2c_core dock button battery asus_acpi
backlight ac parport_pc lp parport sg pcspkr ib_ipath ata_generic ib_mthca
ib_mad ib_core shpchp bnx2 serio_raw ide_cd cdrom dm_snapshot dm_zero dm_mirror
dm_mod ata_piix libata megaraid_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd
Pid: 4198, comm: opensm Not tainted 2.6.20-19.el5rt #1
RIP: 0010:[<ffffffff810b8e0a>]  [<ffffffff810b8e0a>] rt_downgrade_write+0x4/0x8
RSP: 0000:ffff81005d999c18  EFLAGS: 00010282
RAX: ffff81007cece828 RBX: ffff810076c907f8 RCX: ffff810076c90828
RDX: ffff81007cece828 RSI: 0000000000000000 RDI: ffff81007cece780
RBP: ffff81005d999c18 R08: 0000000000000000 R09: 0000000000000001
R10: ffff81005d96f6c0 R11: 0000000000000000 R12: ffff810076c90800
R13: ffff810076c907f8 R14: 0000000000000000 R15: ffff81007cece6c0
FS:  0000000000000000(0000) GS:ffff81000d510540(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003c40946ae8 CR3: 0000000001001000 CR4: 00000000000006e0
Process opensm (pid: 4198, threadinfo ffff81005d998000, task ffff81005d996700)
Stack:  ffff81005d999c58 ffffffff883614e3 ffff8100786b5d30 0000000000000008
 ffff8100786b57a0 ffff81005d96f6c0 ffff8100786b57a0 ffff8100019b1180
 ffff81005d999c98 ffffffff81012db7 ffff81007805a378 ffff81005d96f6c0
Call Trace:
 [<ffffffff883614e3>] :ib_umad:ib_umad_close+0xb7/0x10f
 [<ffffffff81012db7>] __fput+0xdd/0x1af
 [<ffffffff8102f10c>] fput+0x17/0x19
 [<ffffffff81025d81>] filp_close+0x6c/0x77
 [<ffffffff8103b01e>] put_files_struct+0x6d/0xc1
 [<ffffffff81015dbb>] do_exit+0x27f/0x8c5
 [<ffffffff8104d3b7>] cpuset_exit+0x0/0x6e
 [<ffffffff8102d772>] get_signal_to_deliver+0x432/0x483
 [<ffffffff8105fa88>] do_notify_resume+0xc2/0x7d3
 [<ffffffff81062667>] ptregscall_common+0x67/0xb0
 [<ffffffff810622d6>] sysret_signal+0x21/0x31

| preempt count: 00000001 ]
| 1-level deep critical section nesting:
.. [<ffffffff81069e97>] .... __spin_trylock+0x16/0x71
.....[<ffffffff8106b0ea>] ..   ( <= oops_begin+0x28/0x77)

Code: 0f 0b eb fe 55 48 89 e5 53 48 8d 5f 08 48 83 ec 08 85 f6 89 
RIP  [<ffffffff810b8e0a>] rt_downgrade_write+0x4/0x8
 RSP <ffff81005d999c18>
 <1>Fixing recursive fault but reboot is needed!

Version-Release number of selected component (if applicable):
# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.20-19.el5rt #1 SMP PREEMPT Mon
Apr 16 12:14:21 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Steps to Reproduce:
1. On a system with IB hardware, run: service opensmd start ;
service opensmd stop
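
The reproduction steps above could be scripted roughly as follows. This is only a sketch, not something used in the original report: the helper names `find_kernel_bug` and `reproduce` are hypothetical, and it assumes the `service` and `dmesg` commands are available, that it is run as root on a machine with IB hardware, and that the kernel log is still readable after the oops.

```python
import re
import subprocess

# Matches lines such as "kernel BUG at kernel/rt.c:344!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")


def find_kernel_bug(log_text):
    """Return (source_file, line_number) for the first 'kernel BUG at'
    line found in log_text, or None if there is no such line."""
    match = BUG_RE.search(log_text)
    if match is None:
        return None
    return match.group("file"), int(match.group("line"))


def reproduce():
    """Start and stop opensmd, then scan the kernel log for a BUG line.

    Assumption: must run as root on a system with IB hardware.
    """
    subprocess.run(["service", "opensmd", "start"], check=True)
    # The stop step is where the backtrace was reported, so a failure
    # here is not treated as a script error.
    subprocess.run(["service", "opensmd", "stop"], check=False)
    log = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    return find_kernel_bug(log)
```

Calling `reproduce()` on an affected kernel would be expected to return the file and line from the BUG message, and `None` on a fixed kernel, provided the log has not been cleared in between.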
Actual results:

Expected results:

Additional info:

Comment 1 Gurhan Ozen 2007-04-17 23:46:45 UTC
This behavior can also be observed with the ibping program: just run ibping.

Comment 2 Doug Ledford 2007-07-10 20:50:51 UTC
This was resolved with the OFED 1.2 final code and updated rt port patch used to
build the kernel-rt-2.6.21-32.ofed.3.el5rt kernel (this was a scratch build, but
the updated patches were submitted to Clark Williams to be included in his rt

Comment 3 Clark Williams 2007-07-26 14:48:46 UTC
applied to -35; testing

Comment 4 Gurhan Ozen 2007-08-13 20:57:40 UTC
Verified with -35:

[root@dell-pe1950-02 ~]# service opensmd start
Starting IB Subnet Manager                                 [  OK  ]
[root@dell-pe1950-02 ~]# service opensmd stop ; service opensmd start
Stopping IB Subnet Manager.......                          [  OK  ]
Starting IB Subnet Manager                                 [  OK  ]
[root@dell-pe1950-02 ~]# uname -a
Linux dell-pe1950-02.rhts.boston.redhat.com 2.6.21-35.el5rt #1 SMP PREEMPT RT
Thu Jul 26 11:59:02 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
