Bug 713519

Summary: [RHEL5.7] ses: kobject_add failed for ArrayDevice03 with -EEXIST
Product: Red Hat Enterprise Linux 5 Reporter: Martin Wilck <martin.wilck>
Component: kernel Assignee: James Takahashi (IBM) <nobody+PNT0273897>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.7 CC: coughlan, gasmith, lcm, ltroan, peterm, qcai, thenzl
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
On servers with SAS expanders/enclosures (e.g., Fujitsu PRIMERGY TX300S6), firmware may incorrectly report duplicate element names which results in the following warnings being logged: kobject_add failed for ArrayDevice03 with -EEXIST, don't try to register things with the same name in the same directory. Please see your firmware vendor for a fix for this issue. Otherwise, you may blacklist the 'ses' module to prevent it from loading.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-17 23:44:01 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 697486, 718609    
Attachments:
Description Flags
log file with some information about the enclosure. none

Description Martin Wilck 2011-06-15 16:22:37 UTC
Description of problem:

Multiple kernel messages when loading ses.ko:

kobject_add failed for ArrayDevice03 with -EEXIST, don't try to register
things with the same name in the same directory.

Call Trace:
 [<ffffffff80155274>] kobject_add+0x166/0x191
 [<ffffffff801cd822>] class_device_add+0xa6/0x422
 [<ffffffff80057a83>] kobject_get+0x12/0x17
 [<ffffffff88775727>] :enclosure:enclosure_component_register+0xa6/0xdb
 [<ffffffff88781a9b>] :ses:ses_intf_add+0x49f/0x680
 [<ffffffff801ce005>] class_interface_register+0x76/0xb5
 [<ffffffff8822100d>] :ses:ses_init+0xd/0x35
 [<ffffffff800a93a7>] sys_init_module+0xbd/0x206
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

It is highly likely (though not verified yet) that pulling a disk in the backplane will cause a kernel panic.

Version-Release number of selected component (if applicable):
2.6.18-266.el5

How reproducible:
always

Steps to Reproduce:
1. install on PRIMERGY TX300S6 with SAS expander (enclosure) backplane

Actual results:
see above

Expected results:
no error messages, no panic

Additional info:
See bug #619422 for the same problem on RHEL6. Under RHEL5.6 the problem didn't exist because ses support was lacking. A possible workaround is to disable the ses module.
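A minimal sketch of that workaround, assuming the usual modprobe blacklist mechanism: the `blacklist ses` directive is standard modprobe configuration syntax, but the file name below is only an assumption and may differ on a given installation.

```
# /etc/modprobe.d/blacklist-ses  (hypothetical file name)
# Prevent automatic loading of the SCSI Enclosure Services driver.
blacklist ses
```

A blacklist entry only stops alias-based automatic loading. An explicit "modprobe ses" would still load the module.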

Comment 1 Martin Wilck 2011-06-15 16:25:53 UTC
Created attachment 504903
log file with some information about the enclosure.

The interesting part is sg_ses -p 7 /dev/sg0:

Element descriptor In diagnostic page:
  generation code: 0x0
    Element type: Array device, subenclosure id: 0
    Overall descriptor: ArrayDevicesInSubEnclsr0
      Element 1 descriptor: ArrayDevice00
      Element 2 descriptor: ArrayDevice01
      Element 3 descriptor: ArrayDevice02
      Element 4 descriptor: ArrayDevice03
      Element 5 descriptor: ArrayDevice03
      Element 6 descriptor: ArrayDevice03
      Element 7 descriptor: ArrayDevice03
      Element 8 descriptor: ArrayDevice03
      Element 9 descriptor: ArrayDevice03
      Element 10 descriptor: ArrayDevice03
      Element 11 descriptor: ArrayDevice03
      Element 12 descriptor: ArrayDevice03

The firmware uses the same name for all Elements 4...12. This could be a FW bug, but the kernel should be able to handle it gracefully.
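To check whether another enclosure is affected, the duplicate names can be found mechanically. The following is a rough sketch, not part of any fix: it parses text in the format of the sg_ses -p 7 output above and reports descriptors used by more than one element. The function name and the SAMPLE string are hypothetical, and the sketch treats all element lines as one namespace.

```python
import re
from collections import defaultdict

# Matches lines such as "      Element 4 descriptor: ArrayDevice03".
# "Overall descriptor" and "Element type" lines do not match.
ELEMENT_RE = re.compile(r"^\s*Element (\d+) descriptor:\s*(\S.*?)\s*$")


def find_duplicate_descriptors(sg_ses_output):
    """Return {descriptor: [element numbers]} for names used more than once."""
    seen = defaultdict(list)
    for line in sg_ses_output.splitlines():
        m = ELEMENT_RE.match(line)
        if m:
            seen[m.group(2)].append(int(m.group(1)))
    return {name: nums for name, nums in seen.items() if len(nums) > 1}


# Sample input copied from the sg_ses output quoted in this comment.
SAMPLE = """\
Element descriptor In diagnostic page:
  generation code: 0x0
    Element type: Array device, subenclosure id: 0
    Overall descriptor: ArrayDevicesInSubEnclsr0
      Element 1 descriptor: ArrayDevice00
      Element 2 descriptor: ArrayDevice01
      Element 3 descriptor: ArrayDevice02
      Element 4 descriptor: ArrayDevice03
      Element 5 descriptor: ArrayDevice03
      Element 6 descriptor: ArrayDevice03
      Element 7 descriptor: ArrayDevice03
      Element 8 descriptor: ArrayDevice03
      Element 9 descriptor: ArrayDevice03
      Element 10 descriptor: ArrayDevice03
      Element 11 descriptor: ArrayDevice03
      Element 12 descriptor: ArrayDevice03
"""

if __name__ == "__main__":
    for name, elements in find_duplicate_descriptors(SAMPLE).items():
        print(f"{name} is reported by elements {elements}")
```

On a live system the input would be the captured output of sg_ses -p 7 for the enclosure's sg device. Any name this reports is one the ses driver would try to register more than once.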

Comment 2 Tom Coughlan 2011-06-15 21:11:59 UTC
From:

https://bugzilla.redhat.com/show_bug.cgi?id=619422#c39

"It's the fact that ses.ko can't handle non-unique element names."

Considering the short time remaining, we may need to find a way to disable ses by default in 5.7, remove it and ship it as a DUP, or find some other alternative.

Comment 3 James Takahashi (IBM) 2011-06-16 00:16:08 UTC
(In reply to comment #2)
> From:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=619422#c39
> 
> "It's the fact that ses.ko can't handle non-unique element names."

Could someone please enable IBM access to bz619422 so we can read the entire thread?  Thanks in advance.

Comment 4 Qian Cai 2011-06-16 02:38:35 UTC
Could this problem be similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=703084

Comment 5 Martin Wilck 2011-06-16 09:28:25 UTC
The panic situation when a disk is pulled from the expander has *not* been reproduced on RHEL5.7.

Thus, as there is a workaround available (disable ses), and apparently no panic, we can accept this as a limitation.

Comment 6 Tom Coughlan 2011-06-16 19:17:16 UTC
Okay, so as I understand it, no code change for 5.7 is needed, but we will need a Technical Note to warn about the problem. Martin, would you be willing to draft that? 

James, if this is correct, then you can set the technical_note flag, and close this BZ.

Comment 7 James Takahashi (IBM) 2011-06-17 23:44:01 UTC
(In reply to comment #6)
> Okay, so as I understand it, no code change for 5.7 is needed, but we will need
> a Technical Note to warn about the problem. Martin, would you be willing to
> draft that? 
> 
> James, if this is correct, then you can set the technical_note flag, and close
> this BZ.

I added a tech note, and am closing per Tom's and peterm's recommendation.

Comment 8 James Takahashi (IBM) 2011-06-17 23:44:01 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
On Fujitsu PRIMERGY TX300S6 servers with SAS expander, firmware may incorrectly report duplicate element names which result in the following warnings being logged:
kobject_add failed for ArrayDevice03 with -EEXIST, don't try to register things with the same name in the same directory.
Please see your firmware vendor for a fix for this issue.  Otherwise, you may blacklist the 'ses' module to prevent it from loading.

Comment 9 Martin Wilck 2011-06-20 09:08:21 UTC
The problem is not limited to PRIMERGY TX300S6. It will occur on all PRIMERGYs with a SAS expander. In fact, it will occur on all SAS expanders with a certain LSI firmware, which may include expanders from vendors other than Fujitsu. I'd appreciate a modified release note like this:

On some servers with SAS expanders (enclosures), firmware may incorrectly report duplicate element names which result in the following warnings being logged:
kobject_add failed for ArrayDevice03 with -EEXIST, don't try to register things with the same name in the same directory.
This happens e.g. on Fujitsu PRIMERGY TX300S6.
Please see your firmware vendor for a fix for this issue.  Otherwise, you may blacklist the 'ses' module to prevent it from loading.

Comment 10 James Takahashi (IBM) 2011-06-21 00:44:56 UTC
Tweaking tech note per Martin's comment #9.

Comment 11 James Takahashi (IBM) 2011-06-21 00:44:56 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,3 +1,3 @@
-On Fujitsu PRIMERGY TX300S6 servers with SAS expander, firmware may incorrectly report duplicate element names which result in the following warnings being logged:
+On servers with SAS expanders/enclosures (e.g., Fujitsu PRIMERGY TX300S6), firmware may incorrectly report duplicate element names which results in the following warnings being logged:
 kobject_add failed for ArrayDevice03 with -EEXIST, don't try to register things with the same name in the same directory.
 Please see your firmware vendor for a fix for this issue.  Otherwise, you may blacklist the 'ses' module to prevent it from loading.