Bug 1394089
Summary: [LLNL 7.4 Bug] 7.3 regression: the kernel does not create the /sys/block/<sd device>/devices/enclosure_device symlinks

Product: Red Hat Enterprise Linux 7
Component: kernel
Kernel sub component: Storage Drivers
Reporter: Ben Woodard <woodard>
Assignee: Maurizio Lombardi <mlombard>
QA Contact: guazhang <guazhang>
Docs Contact: Jana Heves <jsvarova>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: akkornel, bdonahue, behlendorf1, bubrown, bugproxy, christopher.voltz, dhoward, emilne, foraker1, guazhang, hannsj_uhl, hutter2, jeff.johnson, jjarvis, jkachuck, joseph.szczypek, jsvarova, karen.skweres, linda.knippers, mkolaja, mlombard, myamazak, salmy, tgummels, tom.vaden, trinh.dao, troy.ablan, woodard, yizhang
Version: 7.3
Keywords: Patch, Regression, ZStream
Target Milestone: rc
Target Release: 7.4
Hardware: All
OS: Linux
Fixed In Version: kernel-3.10.0-650.el7
Doc Text: Due to a regression, the kernel previously failed to create the /sys/block/<sd device>/devices/enclosure_device symlinks. The provided patch corrects the call to the scsi_is_sas_rphy() function, which is now made on the SAS end device instead of the SCSI device.
Clones: 1425678 1427426 1427815 1460204 (view as bug list)
Last Closed: 2017-08-02 04:28:27 UTC
Type: Bug
Bug Blocks: 1299988, 1354610, 1381646, 1425678, 1427426, 1427815, 1446211, 1455358, 1460204, 1473286
Description
Ben Woodard
2016-11-11 01:16:13 UTC
Additional information pending.

7.2 kernel:

$ ls -l /sys/block/sda/device/enclosure*
lrwxrwxrwx 1 root root 0 Nov 10 17:28 /sys/block/sda/device/enclosure_device:SLOT 21 27 -> ../../../../port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/enclosure/0:0:1:0/SLOT 21 27

7.3 kernel:

$ ls -l /sys/block/sda/device/enclosure*
ls: cannot access /sys/block/sda/device/enclosure*: No such file or directory

Not that it is likely to matter, but the affected hardware is of two types:

http://www.raidinc.com/products/jbod/2u-24-bay-ssdsas-ability-ebod
http://www.raidinc.com/products/jbod/4u-84-bay-jbod

Interestingly, on both the old and the new kernels the /sys/class/enclosure/* entries that enclosure_device would point to are present; only the symlink is missing on the newer kernel.

Right now we have custom code which relies on this symlink in sysfs to map the block device to the enclosure slot. If there's a different officially supported way to do this mapping, we're open to changing our current code to use it. At the moment, however, we perceive this as an interface change and therefore a regression.

See bug 1370231; the patches in question were added to fix an install failure with certain hardware. It appears that your problem is that with these patches, scsi_is_sas_rphy() is returning false where the previous code, which used is_sas_attached(), would return true.

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index fe8e241..5c721ac 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -587,7 +587,7 @@ static void ses_match_to_enclosure(struct enclosure_device *edev,
 	ses_enclosure_data_process(edev, to_scsi_device(edev->edev.parent), 0);
 
-	if (is_sas_attached(sdev))
+	if (scsi_is_sas_rphy(&sdev->sdev_gendev))
 		efd.addr = sas_get_address(sdev);
 
 	if (efd.addr) {

As a result efd.addr remains 0, and we consequently do not send the udev change event; I think this is how the symlinks get generated.
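The mapping LLNL's custom code performs (block device to enclosure slot via the sysfs symlink) can be sketched in user space. This is an illustrative sketch, not the actual LLNL code; the function name enclosure_slot_for and the sysfs_root parameter are invented for the example:

```python
import glob
import os

def enclosure_slot_for(block_dev, sysfs_root="/sys"):
    """Return the enclosure slot name for a block device (e.g. "sda"),
    or None if the kernel did not create the enclosure_device symlink.

    Relies on the /sys/block/<dev>/device/enclosure_device:<slot> link
    that this bug reports missing on 7.3 kernels."""
    pattern = os.path.join(sysfs_root, "block", block_dev,
                           "device", "enclosure_device:*")
    links = glob.glob(pattern)
    if not links:
        return None  # the 7.3 regression: no symlink at all
    # The slot name is everything after the "enclosure_device:" prefix.
    return os.path.basename(links[0]).split(":", 1)[1]
```

On a 7.2 kernel this would return the slot name (e.g. "SLOT 21 27"); on an affected 7.3 kernel the glob matches nothing and the function returns None, which is exactly the interface break being reported.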
Are you able to provide boot log information and a dump of the sysfs hierarchy, or attach an sosreport? This will help us understand your configuration.

Created attachment 1233604 [details]
boot log
Due to the maze of symlinks that is sysfs, a find isn't practical, but I can tell you exactly which entry is missing. The "enclosure*" link under /sys/block/sda/device/ has disappeared. As I said earlier:
I can give you the boot log but I believe that I have provided enough information to recognize the regression where we changed the sysfs interface to the devices:
7.2 kernel:
$ ls -l /sys/block/sda/device/enclosure*
lrwxrwxrwx 1 root root 0 Nov 10 17:28 /sys/block/sda/device/enclosure_device:SLOT 21 27 -> ../../../../port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/enclosure/0:0:1:0/SLOT 21 27
7.3 kernel:
$ ls -l /sys/block/sda/device/enclosure*
ls: cannot access /sys/block/sda/device/enclosure*: No such file or directory
What is more clear than that?
Yes, we also identified that same patch, and we reverted it locally so that this regression wouldn't impact us. We were not affected by the boot problem that necessitated that patch.
I have attached the boot log. I don't know what additional information you will be able to glean from it. Rather than trying to dig through a huge, highly repetitive boot log, isn't there a more targeted question you can ask?
You know the patch that caused the regression.
You know the symlink that is missing.
You know the hardware.
It seems to me that what needs to happen is you need to figure out a new way to fix the original boot problem that doesn't change the sysfs interface for unaffected hardware.
Thank you for providing the boot log. It is helpful because it provides most of the information we need (SCSI device mappings, SAS addresses, ses enclosure probe) all at once, so we don't have to keep asking.

> I can give you the boot log but I believe that I have provided enough information
> to recognize the regression where we changed the sysfs interface to the devices:
> What is more clear than that?

The problem is clear, and you have identified the patch that caused the regression.

> You know the patch that caused the regression.
> You know the symlink that is missing.
> You know the hardware.

Yes, but we can't revert the patch; there are a large number of systems that won't boot without it. As you said, we have to figure out why this didn't work on your hardware. Unfortunately, we do not have your hardware, and we did not see this problem on the hardware we used for testing here. This is why we asked for more information.

What I was curious about with the sysfs information was whether all the sysfs symlinks were missing, or just some of them. You mentioned sda only, but I assume it is all of them.

> 7.2 kernel:
> $ ls -l /sys/block/sda/device/enclosure*
> lrwxrwxrwx 1 root root 0 Nov 10 17:28 /sys/block/sda/device/enclosure_device:SLOT 21 27 -> ../../../../port-0:0:1/end_device-0:0:1/target0:0:1/0:0:1:0/enclosure/0:0:1:0/SLOT 21 27

It would also help to understand the topology. In the boot log provided, 0:0:1:0 is sdb. What we are trying to understand is why the upstream fix to use scsi_is_sas_rphy() did not work in your configuration; it is presumably broken in those kernels as well, and we will have to fix it there too.

We have found that we can't revert the patch as safely as we thought. There are certain instances where we can trigger a crash by unexpectedly pulling drives without this patch. We think it is correlated with a management script accessing sysfs when the hotplug event occurs.
Without the patch from bug 1372041 the system crashes, but with the patch we have yet to be able to provoke the problem. Therefore we agree that the patch is required, but the side effect of removing the symlinks is unwanted.

All the sysfs symlinks which map the devices back to their enclosures are missing. We are surprised that there is some hardware which creates the symlinks, because we don't see them on any of our hardware. It sounds like you are saying that for all the hardware in the RH test lab the symlinks are being created, and it is just broken on all of our hardware. Our hardware is not entirely homogeneous, so we find this surprising.

I have found an external SAS JBOD enclosure that shows the same behavior when accessed. I have verified that it is broken in the upstream 4.10 scsi-fixes tree as well and am working on a solution. I am not sure why this was not discovered in RHEL QE testing; perhaps it works for simple enclosures where it is not necessary to iterate. In any event it is clearly broken and is a regression in 7.3; we just need to fix it in a way that does not cause a crash like bug 1370231 or like the one you are seeing.

We have confirmed this problem on three different HW configurations so far:

http://www.raidinc.com/products/jbod/2u-24-bay-ssdsas-ability-ebod
http://www.raidinc.com/products/jbod/4u-84-bay-jbod
LSI CORP SAS2X36 0717

The problem appears to be related to the SAS topology. With the JBOD I am using, it looks like:

/sys/devices/pci0000:00/0000:00:07.0/0000:1a:00.0/host5/port-5:0/expander-5:0/port-5:0:0/end_device-5:0:0/target5:0:0/5:0:0:0/block/sdb

And the SAS transport code called by the changes to the enclosure code only recognizes the end device and the expander as having a PHY.
Dec 20 19:23:48 storageqe-07 kernel: sd 5:0:0:0: ses_match_to_enclosure
Dec 20 19:23:48 storageqe-07 kernel: is_sas_attached 1 scsi_is_sas_rphy 0
Dec 20 19:23:48 storageqe-07 kernel: dev_name "5:0:0:0"
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "target5:0:0"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 0
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "end_device-5:0:0"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 1
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "port-5:0:0"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 0
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "expander-5:0"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 1
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "port-5:0"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 0
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "host5"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 0
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "0000:1a:00.0"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 0
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "0000:00:07.0"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 0
Dec 20 19:23:48 storageqe-07 kernel: parent dev_name "pci0000:00"
Dec 20 19:23:48 storageqe-07 kernel: parent scsi_is_sas_rphy 0
Dec 20 19:23:48 storageqe-07 kernel: efd.addr 0
Dec 20 19:23:48 storageqe-07 kernel: sas_get_address 5764824128737142441 (this is decimal)

I doubt this is what Johannes intended; I will discuss it with him. I have also asked Maurizio to assist with this, as he has been working on the enclosure code.

Per LLNL's request, I am making this BZ public to facilitate collaboration while working toward resolution. I have reviewed the case and there is no information in it which needs to remain private.

I have a feeling that ses_match_to_enclosure() is getting passed the sysfs gendev device rather than the expander/endpoint device.
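The parent walk in the debug output above can be modeled in user space to show why the 7.3 check fails: the SCSI device's gendev ("5:0:0:0") is not itself a SAS rphy, and the first rphy on the parent chain is the end_device. This is a toy model, not kernel code; the real scsi_is_sas_rphy() checks the device's release function, so the name-prefix test here is only a stand-in:

```python
# Toy model of the device-tree walk shown in the debug printks above.
class Dev:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

def is_sas_rphy(dev):
    # Stand-in for scsi_is_sas_rphy(): in the log, only end_device-*
    # and expander-* nodes report 1.
    return dev.name.startswith(("end_device-", "expander-"))

def first_rphy_ancestor(dev):
    """Walk up the parent chain (starting at the device itself),
    mirroring the order of the debug printks."""
    node = dev
    while node is not None:
        if is_sas_rphy(node):
            return node
        node = node.parent
    return None

# The topology from the JBOD in this report, root first:
chain = ["pci0000:00", "0000:00:07.0", "0000:1a:00.0", "host5",
         "port-5:0", "expander-5:0", "port-5:0:0", "end_device-5:0:0",
         "target5:0:0", "5:0:0:0"]
parent = None
for name in chain:
    parent = Dev(name, parent)
sdev = parent  # "5:0:0:0", the SCSI device gendev

# The 7.3 check tests the SCSI device itself -> false, so efd.addr stays 0:
assert not is_sas_rphy(sdev)
# The first rphy going up is the end_device, two levels above the sdev:
assert first_rphy_ancestor(sdev).name == "end_device-5:0:0"
```

This matches the log: scsi_is_sas_rphy is 0 for "5:0:0:0" and "target5:0:0", and first becomes 1 at "end_device-5:0:0".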
If you print dev_name(&sdev->sdev_gendev) in ses_match_to_enclosure(), you'll see the dev name in the form "w:x:y:z", as it gets set in scsi_sysfs_device_initialize():

scsi_sysfs_device_initialize()
...
	dev_set_name(&sdev->sdev_gendev, "%d:%d:%d:%d",
		     sdev->host->host_no, sdev->channel, sdev->id, sdev->lun);
	device_initialize(&sdev->sdev_dev);
...

...rather than the "expander-..." or "end_device-..." form from sas_expander_alloc() and sas_end_device_alloc() (where dev->release is being set):

sas_expander_alloc()
...
	rdev->rphy.dev.release = sas_expander_release;
...
	dev_set_name(&rdev->rphy.dev, "expander-%d:%d",
		     shost->host_no, rdev->rphy.scsi_target_id);

sas_end_device_alloc()
...
	rdev->rphy.dev.release = sas_end_device_release;
...
	dev_set_name(&rdev->rphy.dev, "end_device-%d:%d",
		     shost->host_no, parent->port_identifier);

FYI...

This is what I am testing with now; it seems to fix the problem for me, but I am unsure whether it will work on all configurations. I am still awaiting a response from the author of the earlier upstream patch.

[PATCH RFC] ses: Fix SAS device detection in enclosure

The call to scsi_is_sas_rphy() needs to be made on the SAS end_device, not on the SCSI device.

Signed-off-by: Ewan D. Milne <emilne>
---
 drivers/scsi/ses.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index 8c9a35c..50adabb 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -587,7 +587,7 @@ static void ses_match_to_enclosure(struct enclosure_device *edev,
 	ses_enclosure_data_process(edev, to_scsi_device(edev->edev.parent), 0);
 
-	if (scsi_is_sas_rphy(&sdev->sdev_gendev))
+	if (scsi_is_sas_rphy(sdev->sdev_target->dev.parent))
 		efd.addr = sas_get_address(sdev);
 
 	if (efd.addr) {
--
1.8.3.1

With this change, I get the sysfs links created, e.g.:

# ls -l /sys/block/sdb/device/enclosure*
lrwxrwxrwx. 1 root root 0 Dec 22 16:03 /sys/block/sdb/device/enclosure_device:SLOT 1 -> ../../../../port-1:0:19/end_device-1:0:19/target1:0:19/1:0:19:0/enclosure/1:0:19:0/SLOT 1

What I am not sure about yet is whether this will break other topologies.

Wanted to comment that the patch from comment #15 restores enclosure device symlinks on an LSI/Avago (mpt2sas) setup with SuperMicro enclosure backplanes using CentOS 7.3. Thanks for the sleuthing and hard work!

The comment #15 patch does create the enclosure_device links on our 84-bay RAID Inc. JBOD, but I hit a general protection fault after rmmod/insmodding ses.ko. Here's the little I could capture:

2017-01-04 18:15:59 [ 3266.580855] ses 0:0:1:0: Attached Enclosure device
2017-01-04 18:15:59 [ 3266.587820] ses 0:0:4:0: Attached Enclosure device
2017-01-04 18:15:59 [ 3266.594001] ses 11:0:3:0: Attached Enclosure device
2017-01-04 18:15:59 [ 3266.600118] ses 11:0:5:0: Attached Enclosure device
2017-01-04 18:17:06 [ 3333.499627] general protection fault: 0000 [#1] SMP
2017-01-04 18:17:06 [ 3333.506745] Modules linked in: ses(E+) nfsv3 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support mlx5_ib intel_powerclamp coretemp intel_rapl iosf_mbi kvm ib_core irqbypass pcspkr mlx5_core sb_edac edac_core mei_me lpc_ich mei i2c_i801 ioatdma enclosure ipmi_devintf zfs(POE) zunicode(POE) zavl(POE) icp(POE) sg shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq binfmt_misc zcommon(POE) znvpair(POE) spl(OE) zlib_deflate nfsd nfs_acl ip_tables rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache dm_round_robin sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel scsi_transport_iscsi ghash_clmulni_intel 8021q garp stp llc mrp mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ttm aesni_intel libahci lrw gf128mul ixgbe mxm_wmi dm_multipath dca glue_helper drm mpt3sas ablk_helper ptp libata i2c_core cryptd
raid_class pps_core scsi_transport_sas mdio fjes wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ses]
2017-01-04 18:17:06 [ 3333.619809] CPU: 15 PID: 150658 Comm: insmod Tainted: P OE ------------ 3.10.0-514.0.0.1chaos.ch6.x86_64 #1
<reboot>

I've hit the GPF after one to two rmmod/insmod cycles. If I run without the patch, I'm able to rmmod/insmod ses.ko without issue.

(In reply to Tony Hutter from comment #17)
> The comment #15 patch does create the enclosure_device links on our 84-bay
> RAID Inc. JBOD, but I hit a general protection fault after rmmod/insmodding
> ses.ko. Here's the little I could capture:
>
> ...
>
> I've hit the GPF after one to two rmmod/insmodding cycles. If I run without
> the patch, I'm able to rmmod/insmod ses.ko without issue.

OK, thanks, I will look into this some more on the machine I have.

I'm able to reproduce the issue at will with:

rmmod ses                    # remove stock ses module
insmod drivers/scsi/ses.ko   # load ses.ko with the comment #15 patch
rmmod ses
insmod drivers/scsi/ses.ko
< crash >

I managed to capture a full call trace below. The root FS on the node is NFS mounted, so seeing the backtrace in there isn't unexpected if ses.ko is corrupting something.

general protection fault: 0000 [#1] SMP
Modules linked in: ses(E+) dm_round_robin enclosure sd_mod crc_t10dif crct10dif_generic sg nfsv3 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp intel_rapl iosf_mbi ipmi_devintf iTCO_wdt iTCO_vendor_support kvm irqbypass mlx5_core pcspkr zfs(POE) zunicode(POE) zavl(POE) icp(POE) mei_me sb_edac edac_core mei ioatdma i2c_i801 lpc_ich shpchp ipmi_si ipmi_msghandler acpi_power_meter acpi_cpufreq binfmt_misc zcommon(POE) znvpair(POE) spl(OE) zlib_deflate nfsd nfs_acl ip_tables rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache 8021q garp stp llc mrp scsi_transport_iscsi mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul syscopyarea crct10dif_common sysfillrect crc32_pclmul sysimgblt fb_sys_fops crc32c_intel ttm ghash_clmulni_intel ixgbe drm ahci aesni_intel dm_multipath lrw mxm_wmi libahci gf128mul dca mpt3sas glue_helper ptp ablk_helper i2c_core libata cryptd raid_class pps_core scsi_transport_sas mdio fjes wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ses]
CPU: 4 PID: 101008 Comm: awk Tainted: P OE ------------ 3.10.0-514.0.0.1chaos.ch6.x86_64 #1
Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
task: ffff880fb8f28000 ti: ffff880fad024000 task.ti: ffff880fad024000
RIP: 0010:[<ffffffff811e5d8b>] [<ffffffff811e5d8b>] kmem_cache_alloc_trace+0xab/0x250
RSP: 0018:ffff880fad0279e0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff882031c3c860 RCX: 0000000000009cc2
RDX: 0000000000009cc1 RSI: 00000000000003ad RDI: ffff880fad027fd8
RBP: ffff880fad027a18 R08: 0000000000019a60 R09: ffff88018fc07c00
R10: ffffffffa0411ac4 R11: ffff882035f40c00 R12: 7275736f6c636e65
R13: 00000000000000d0 R14: 0000000000000020 R15: ffff88018fc07c00
FS: 0000000000000000(0000) GS:ffff88103e700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff7ff39ad CR3: 0000000fb92ad000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff88018fc07c00 ffffffffa0411ac4 ffff882031c3c860 ffff882031c3c800
 ffff880fb740ac00 0000000000000000 ffff880fad027bf8 ffff880fad027a30
 ffffffffa0411ac4 ffff8820345eb000 ffff880fad027a88 ffffffffa03fb3c6
Call Trace:
 [<ffffffffa0411ac4>] ? nfs_alloc_seqid+0x24/0x60 [nfsv4]
 [<ffffffffa0411ac4>] nfs_alloc_seqid+0x24/0x60 [nfsv4]
 [<ffffffffa03fb3c6>] nfs4_opendata_alloc+0xc6/0x4d0 [nfsv4]
 [<ffffffffa03fe745>] nfs4_do_open+0x185/0x650 [nfsv4]
 [<ffffffffa03fed07>] nfs4_atomic_open+0xf7/0x110 [nfsv4]
 [<ffffffffa0413920>] nfs4_file_open+0x110/0x2b0 [nfsv4]
 [<ffffffff81204bd7>] do_dentry_open+0x1a7/0x2e0
 [<ffffffff812b4a1c>] ? security_inode_permission+0x1c/0x30
 [<ffffffffa0413810>] ? nfs4_file_flush+0x90/0x90 [nfsv4]
 [<ffffffff81204daf>] vfs_open+0x5f/0xe0
 [<ffffffff81212678>] ? may_open+0x68/0x110
 [<ffffffff81215e2d>] do_last+0x1ed/0x12a0
 [<ffffffff81185fee>] ? __find_get_page+0x2e/0xc0
 [<ffffffff81217226>] path_openat+0x346/0x4d0
 [<ffffffff811865fb>] ? unlock_page+0x2b/0x30
 [<ffffffff8121900b>] do_filp_open+0x4b/0xb0
 [<ffffffff81226367>] ? __alloc_fd+0xa7/0x130
 [<ffffffff81206113>] do_sys_open+0xf3/0x1f0
 [<ffffffff816a89e5>] ? do_page_fault+0x35/0x90
 [<ffffffff8120622e>] SyS_open+0x1e/0x20
 [<ffffffff816ad249>] system_call_fastpath+0x16/0x1b
Code: 8b 50 08 83 68 1c 01 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 49 01 00 00 48 85 c0 0f 84 40 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 aa 49 63
RIP [<ffffffff811e5d8b>] kmem_cache_alloc_trace+0xab/0x250

Thanks Tony, I am able to reproduce a crash when executing rmmod/modprobe with the rhel7 kernel (XFS as root filesystem), but the upstream kernel seems unaffected.
It looks like memory corruption. There is a commit in the upstream kernel (not backported to rhel7) called "ses: Fix racy cleanup of /sys in remove_dev()" that fixes a race condition when unregistering a device; in some cases the kernel may use invalid memory pointers. I suspect it will fix this issue, and I am going to test it.

[  193.138024] ses 5:0:39:0: Attached Enclosure device
[  660.600447] general protection fault: 0000 [#1] SMP
[  660.605447] Modules linked in: ses(+) enclosure coretemp kvm_intel cdc_ether kvm mpt2sas usbnet mii irqbypass raid_class iTCO_wdt ipmi_ssif iTCO_vendor_support i2c_i801 scsi_transport_sas ipmi_devintf sg pcspkr ipmi_si lpc_ich ipmi_msghandler ioatdma shpchp i7core_edac dca edac_core acpi_cpufreq ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_generic drm pata_acpi ata_piix libata crc32c_intel megaraid_sas serio_raw i2c_core bnx2 fjes dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ses]
[  660.660427] CPU: 1 PID: 10528 Comm: systemd-udevd Not tainted 3.10.0_ses_fix+ #1
[  660.667816] Hardware name: IBM System x3550 M3 -[7944AC1]-/69Y4438, BIOS -[D6E162AUS-1.20]- 05/07/2014
[  660.677112] task: ffff880078586dd0 ti: ffff880177b48000 task.ti: ffff880177b48000
[  660.684585] RIP: 0010:[<ffffffff811d8f65>] [<ffffffff811d8f65>] __kmalloc+0x95/0x240
[  660.692426] RSP: 0018:ffff880177b4bbf8 EFLAGS: 00010282
[  660.697735] RAX: 0000000000000000 RBX: 000000000000001e RCX: 000000000000e460
[  660.704861] RDX: 000000000000e45f RSI: 0000000000000000 RDI: 0000000000000003
[  660.711988] RBP: ffff880177b4bc28 R08: 0000000000019ae0 R09: ffffffff812babf1
[  660.719114] R10: ffff88017ac03c00 R11: 0000000000000004 R12: 00000000000000d0
[  660.726240] R13: 7974697275636573 R14: 000000000000001f R15: ffff88017ac03c00
[  660.733366] FS: 00007f631b1b78c0(0000) GS:ffff88017b040000(0000) knlGS:0000000000000000
[  660.741448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  660.747187] CR2: 00007f6318e1954c CR3: 000000017798e000 CR4: 00000000000007e0
[  660.754313] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  660.761440] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  660.768566] Stack:
[  660.770580] ffffffff812babf1 000000000000001e ffff88007ec39aa0 ffff880177b4bcec
[  660.778044] ffff88017261d320 00000000000000d0 ffff880177b4bcc8 ffffffff812babf1
[  660.785507] ffff88017ac03b00 00000000811dbd15 0000000000000001 ffffffff81220add
[  660.792970] Call Trace:
[  660.795418] [<ffffffff812babf1>] ? security_context_to_sid_core+0x61/0x260
[  660.802373] [<ffffffff812babf1>] security_context_to_sid_core+0x61/0x260
[  660.809156] [<ffffffff81220add>] ? __simple_xattr_set+0x4d/0x190
[  660.815243] [<ffffffff81220b78>] ? __simple_xattr_set+0xe8/0x190
[  660.821330] [<ffffffff811d8a73>] ? kfree+0x103/0x140
[  660.826378] [<ffffffff812bc85c>] security_context_to_sid_force+0x1c/0x20
[  660.833159] [<ffffffff812acba2>] selinux_inode_post_setxattr+0x72/0x120
[  660.839857] [<ffffffff812a3b83>] security_inode_post_setxattr+0x33/0x50
[  660.846552] [<ffffffff8121fbf0>] __vfs_setxattr_noperm+0x180/0x1b0
[  660.852813] [<ffffffff8121fcd5>] vfs_setxattr+0xb5/0xc0
[  660.858122] [<ffffffff8121fe0e>] setxattr+0x12e/0x1c0
[  660.863258] [<ffffffff8120a8fd>] ? putname+0x3d/0x60
[  660.868305] [<ffffffff8120baa2>] ? user_path_at_empty+0x72/0xc0
[  660.874308] [<ffffffff811fc998>] ? __sb_start_write+0x58/0x110
[  660.880224] [<ffffffff8168cc71>] ? __do_page_fault+0x171/0x450
[  660.886137] [<ffffffff812201ef>] SyS_lsetxattr+0xaf/0xf0
[  660.891531] [<ffffffff81691789>] system_call_fastpath+0x16/0x1b
[  660.897532] Code: d0 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 30 01 00 00 48 85 c0 0f 84 27 01 00 00 49 63 42 20 48 8d 4a 01 4d 8b 02 <49> 8b 5c 05 00 4c 89 e8 65 49 0f c7 08 0f 94 c0 84 c0 74 b8 49
[  660.917662] RIP [<ffffffff811d8f65>] __kmalloc+0x95/0x240
[  660.923162] RSP <ffff880177b4bbf8>

Unfortunately the "ses: Fix racy cleanup of /sys in remove_dev()" patch doesn't fix the issue; I will continue to look into it.

I tested the comment #15 patch on our large filesystem and saw that it was not creating all the sysfs entries all of the time. In fact, I was trying it out because we had noticed that an older version of the SES driver was also not creating all the links all the time. With both the old driver and the new patch, I'm seeing ~50% of the nodes create all their symlinks, while the others only created some of them. For example, here's one of the nodes that didn't have all its links (below), sorted by device mapper (dm-*) device number:

# for i in `ls /sys/block/ | grep dm-` ; do echo -n "$i enclosure: " && if ls /sys/block/$i/slaves/*/device/enclosure*/fault &> /dev/null ; then echo exists ; else echo missing ; fi ; done | sed 's/dm-//g' | sort -n
0 enclosure: exists
1 enclosure: exists
2 enclosure: exists
3 enclosure: exists
4 enclosure: exists
5 enclosure: exists
6 enclosure: exists
7 enclosure: exists
8 enclosure: exists
9 enclosure: exists
10 enclosure: exists
11 enclosure: exists
12 enclosure: exists
13 enclosure: exists
14 enclosure: exists
15 enclosure: exists
16 enclosure: exists
17 enclosure: exists
18 enclosure: exists
19 enclosure: missing
20 enclosure: missing
...
77 enclosure: missing
78 enclosure: missing
79 enclosure: exists
80 enclosure: missing
81 enclosure: missing
...
157 enclosure: missing
158 enclosure: missing
159 enclosure: exists

This is with two 84-bay enclosures with 80 drives populated in each. You can see it creates links for the first 18 drives, and also for the last drive in each enclosure (oddly enough).

Some debugging later, I saw that in the cases where it was failing, enclosure_add_device() was returning -2:

static int ses_enclosure_find_by_addr(struct enclosure_device *edev,
				      void *data)
{
	struct efd *efd = data;
	int i;
	struct ses_component *scomp;
	int rc;

	if (!edev->component[0].scratch) {
		return 0;
	}

	for (i = 0; i < edev->components; i++) {
		scomp = edev->component[i].scratch;
		if (scomp->addr != efd->addr) {
			continue;
		}

		rc = enclosure_add_device(edev, i, efd->dev);
		if (rc == 0) {
			kobject_uevent(&efd->dev->kobj, KOBJ_CHANGE);
		}
		printk("debug: rc=%d\n", rc);
		return 1;
	}
	return 0;
}

One question would be whether the problem with not all the enclosures creating sysfs entries is a 7.3 regression. When you said "with the old driver" (ses), did you mean an earlier RHEL7 ses driver? Are all the enclosures the same, or are some of them different hardware? There have been some problems in the past with probing certain enclosures. What I mean is: is it just the sysfs links that are missing, or is it the enclosure devices themselves that do not show up? This is what I was referring to earlier when I mentioned that I was unsure whether it would work on all configurations. Are the missing enclosures always the same ones? Do they have a different connection topology? You could see this in the sysfs device path (i.e. not the link).

(In reply to Tony Hutter from comment #19)
> I'm able to reproduce the issue at will with:
>
> rmmod ses # remove stock ses modules
> insmod drivers/scsi/ses.ko # load ses.ko with comment #15 patch
> rmmod ses
> insmod drivers/scsi/ses.ko
> < crash >

This is a bug in another layer of the kernel; Ewan's patch is triggering it. I am going to open a BZ to track it.
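The control flow of ses_enclosure_find_by_addr() above can be restated in user space to show where the -2 (-ENOENT) from enclosure_add_device() disappears: the function returns 1 on an address match whether or not the link was actually created, so the failure is silent apart from the debug printk. This is a sketch; find_by_addr and the add_device callable are invented stand-ins, not kernel APIs:

```python
def find_by_addr(component_addrs, addr, add_device):
    """User-space mirror of ses_enclosure_find_by_addr(): return 1 if a
    component matched the SAS address (regardless of whether the sysfs
    link was created), 0 if no component has this address.

    add_device(i) stands in for enclosure_add_device(edev, i, dev) and
    returns 0 on success or a negative errno (e.g. -2 for -ENOENT)."""
    for i, comp_addr in enumerate(component_addrs):
        if comp_addr != addr:
            continue
        rc = add_device(i)
        if rc == 0:
            print("uevent: KOBJ_CHANGE")  # real code: kobject_uevent(...)
        print("debug: rc=%d" % rc)
        return 1  # matched, even when the link creation failed
    return 0
```

Whether add_device returns 0 or -2, the caller sees the same result (1), which is why the missing links in comment 23 only show up via the debug printk.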
> This is a bug in another layer of the kernel, Ewan's patch is triggering it.
> I am going to open BZ to track it.
introduced in kernel version -464
Hi Tony, please answer comment 25, thanks.

We were running the latest RH 7.3 kernel, but with these patches reverted so that we could get the enclosure_device sysfs links:

Revert "[scsi] sas: provide stub implementation for scsi_is_sas_rphy"
    This reverts commit 6721acd6a5ec8791e9afc3b3a3b62332fe038307.

Revert "[scsi] ses: use scsi_is_sas_rphy instead of is_sas_attached"
    This reverts commit 6de748fe3354b4ef4bf0988af27ba5d72af1cc9c.

Revert "[scsi] sas: remove is_sas_attached()"
    This reverts commit 6cf8131f01bdbce599928a10a0f9a929127677b1.

So here's the RH kernel commit we're running:

commit 4274bf5bb87f37e0899f20574f70114ce71d4285
Author: Paolo Abeni <pabeni>
Date:   Wed Oct 12 16:30:30 2016 +0200

    IB/ipoib: move back IB LL address into the hard header

And the ses.c we're running (aka the "old SES driver"):

commit b4265f8439389c7e77336fc9a0d443f58f10c6e5
Author: Maurizio Lombardi <mlombard>
Date:   Thu Mar 24 14:17:12 2016 -0400

    [scsi] ses: fix discovery of SATA devices in SAS enclosures

Our nodes are connected to two 84-bay enclosures in a multipath configuration, so each node sees "four" enclosure devices. 80 of the 84 slots are populated with drives. The enclosures are the same hardware (84-bay RAID Inc JBODs).
# ls -l /sys/class/enclosure/
0:0:0:0 -> ../../devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:0/end_device-0:0:0/target0:0:0/0:0:0:0/enclosure/0:0:0:0
0:0:81:0 -> ../../devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:3/port-0:3:0/end_device-0:3:0/target0:0:81/0:0:81:0/enclosure/0:0:81:0
11:0:1:0 -> ../../devices/pci0000:00/0000:00:03.2/0000:06:00.0/host11/port-11:0/expander-11:0/port-11:0:2/end_device-11:0:2/target11:0:1/11:0:1:0/enclosure/11:0:1:0
11:0:81:0 -> ../../devices/pci0000:00/0000:00:03.2/0000:06:00.0/host11/port-11:1/expander-11:3/port-11:3:2/end_device-11:3:2/target11:0:81/11:0:81:0/enclosure/11:0:81:0

We do have cases where an enclosure doesn't show up, but that doesn't account for the majority of the missing links. In most cases the node can see all four enclosures but isn't creating links for all drives. For example, the node in comment 23 saw all four enclosures but didn't create all the links. To give you a better idea, here's the number of drives with enclosure_device* present, taken from a sampling of nodes (each node should see 160 drives):

Node   drives with enclosure_device* sysfs link
----   ----------------------------------------
1      159
2      159
3      22
4      159
5      77
6      160
7      91
8      160
9      160
10     160
11     21
12     160
13     160
14     160
15     41
16     30
17     157
18     35
19     0
...
I dunno if it's related, but I saw another case where we have two enclosures with two multipath links, but only see three /sys/class/enclosure devices:

# ls /sys/class/enclosure
0:0:0:0  0:0:81:0  11:0:81:0

# lsscsi -g | grep EBOD
[0:0:0:0]   enclosu RAIDINC 84BAY EBOD 0204 -  /dev/sg0
[0:0:81:0]  enclosu RAIDINC 84BAY EBOD 0204 -  /dev/sg81
[11:0:1:0]  enclosu RAIDINC 84BAY EBOD 0204 -  /dev/sg163
[11:0:81:0] enclosu RAIDINC 84BAY EBOD 0204 -  /dev/sg243

(In reply to Tony Hutter from comment #30)
> I dunno if it's related, but I saw another case where we have two enclosures
> with two multipath links, but only see three /sys/class/enclosure devices:

Ignore comment 30, the device was actually down:

sg_ses --verbose --page=ed /dev/sg163
    inquiry cdb: 12 00 00 00 24 00
  RAIDINC   84BAY EBOD   0204
  enclosure services device
    Receive diagnostic results cmd: 1c 01 07 ff ff 00
receive diagnostic results: Fixed format, current; Sense key: Not Ready
 Additional sense: Enclosure services unavailable
Attempt to fetch Element Descriptor (SES) diagnostic page failed
device no ready

(In reply to Tony Hutter from comment #29)
> We were running the latest RH 7.3 kernel, but with these patches reverted so
> that we could get the enclosure_device sysfs links:
>
> Revert "[scsi] sas: provide stub implementation for scsi_is_sas_rphy"
> This reverts commit 6721acd6a5ec8791e9afc3b3a3b62332fe038307.
>
> Revert "[scsi] ses: use scsi_is_sas_rphy instead of is_sas_attached"
> This reverts commit 6de748fe3354b4ef4bf0988af27ba5d72af1cc9c.
>
> Revert "[scsi] sas: remove is_sas_attached()"
> This reverts commit 6cf8131f01bdbce599928a10a0f9a929127677b1.

Thanks. Unfortunately this is not sufficient to determine whether or not it's a RHEL 7.3 regression; is it possible for you to test the RHEL 7.2 kernel (version 3.10.0-327) on your machine?
Unfortunately for us, installing an older kernel into our netboot image can't be done easily, since we have a bunch of packages in our image that are kernel-version specific (like zfs, nvidia, hyperv, etc.). Also, the system is in high demand, so getting time on it is difficult.

Just to summarize where we're at:

1. RH7.3 ses.ko

   No symlinks at all, but is stable (can rmmod/insmod).

2. RH7.3 ses.ko + comment #15 patch

   Symlinks created some of the time as described in comment 29, but can't rmmod without GPFs.

3. RH7.3 ses.ko + revert the three patches from comment #29

   Symlinks created some of the time, but get GPFs when removing/reinserting a number of disks (haven't tried rmmod/insmod yet).

I did more digging and think I could be on to something with the symlink issue. I tested with RH7.3 ses.ko + the comment #15 patch, added some crude printks (patch attached), and noticed that the disks were still being detected while SES started to create the enclosure_device symlinks. The symlinks started failing with -ENOENT (No such file or directory) on the first disk that hadn't been discovered yet. For example, notice in the boot log that 0:0:26:0 is the highest-numbered disk we've detected when SES's symlink creation starts to fail at 0:0:27:0:
...
[ 56.703415] sd 0:0:18:0: [sdr] Attached SCSI disk
[ 56.703507] sd 0:0:12:0: [sdl] Attached SCSI disk
[ 56.703770] sd 0:0:23:0: [sdw] Attached SCSI disk
[ 56.709272] sd 0:0:17:0: [sdq] Attached SCSI disk
[ 56.710065] sd 0:0:8:0: [sdh] Attached SCSI disk
[ 56.711357] sd 0:0:10:0: [sdj] Attached SCSI disk
[ 56.712004] sd 0:0:19:0: [sds] Attached SCSI disk
[ 56.712912] sd 0:0:14:0: [sdn] Attached SCSI disk
[ 56.718549] sd 0:0:16:0: [sdp] Attached SCSI disk
[ 56.719839] sd 0:0:5:0: [sde] Attached SCSI disk
[ 57.685196] sd 0:0:12:0: Attached scsi generic sg12 type 0
[ 57.691961] sd 0:0:13:0: Attached scsi generic sg13 type 0
[ 57.698555] sd 0:0:14:0: Attached scsi generic sg14 type 0
[ 57.705129] sd 0:0:15:0: Attached scsi generic sg15 type 0
[ 57.711686] sd 0:0:16:0: Attached scsi generic sg16 type 0
[ 57.718216] sd 0:0:17:0: Attached scsi generic sg17 type 0
[ 57.724780] sd 0:0:18:0: Attached scsi generic sg18 type 0
[ 57.731304] sd 0:0:19:0: Attached scsi generic sg19 type 0
[ 57.737858] sd 0:0:20:0: Attached scsi generic sg20 type 0
[ 57.744369] sd 0:0:21:0: Attached scsi generic sg21 type 0
[ 57.750884] sd 0:0:22:0: Attached scsi generic sg22 type 0
[ 57.757418] sd 0:0:23:0: Attached scsi generic sg23 type 0
[ 57.764694] sd 0:0:24:0: Attached scsi generic sg24 type 0
[ 57.769491] sd 0:0:24:0: [sdx] 15628053168 512-byte logical blocks: (8.00 TB/7.27 TiB)
[ 57.769493] sd 0:0:24:0: [sdx] 4096-byte physical blocks
[ 57.771676] sd 0:0:24:0: [sdx] Write Protect is off
[ 57.771678] sd 0:0:24:0: [sdx] Mode Sense: db 00 10 08
[ 57.772994] sd 0:0:24:0: [sdx] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 57.786098] sd 0:0:24:0: [sdx] Attached SCSI disk
[ 57.815417] sd 0:0:25:0: Attached scsi generic sg25 type 0
[ 57.822119] sd 0:0:25:0: [sdy] 15628053168 512-byte logical blocks: (8.00 TB/7.27 TiB)
[ 57.831434] sd 0:0:25:0: [sdy] 4096-byte physical blocks
[ 57.831451] sd 0:0:26:0: [sdz] 15628053168 512-byte logical blocks: (8.00 TB/7.27 TiB)
[ 57.831452] sd 0:0:26:0: [sdz] 4096-byte physical blocks
[ 57.833630] sd 0:0:26:0: [sdz] Write Protect is off
[ 57.833631] sd 0:0:26:0: [sdz] Mode Sense: db 00 10 08
[ 57.834931] sd 0:0:26:0: [sdz] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 57.847084] sd 0:0:26:0: [sdz] Attached SCSI disk
[ 57.882587] sd 0:0:25:0: [sdy] Write Protect is off
[ 57.888340] sd 0:0:25:0: [sdy] Mode Sense: db 00 10 08
[ 57.895738] sd 0:0:25:0: [sdy] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 57.938246] device-mapper: multipath round-robin: version 1.1.0 loaded
[ 57.938494] sd 0:0:25:0: [sdy] Attached SCSI disk
[ 57.980800] ses_debug: 0:0:1:0, 5764824129064842522
[ 57.986801] ses_debug: rc=0
[ 58.058734] ses_debug: 0:0:2:0, 5764824129064880282
[ 58.064743] ses_debug: rc=0
[ 58.140634] ses_debug: 0:0:3:0, 5764824129064829498
[ 58.146583] ses_debug: rc=0
[ 58.221247] ses_debug: 0:0:4:0, 5764824129064882026
[ 58.227163] ses_debug: rc=0
[ 58.299347] ses_debug: 0:0:5:0, 5764824129064824566
[ 58.299366] ses_debug: rc=0
[ 58.375040] ses_debug: 0:0:6:0, 5764824129061792470
[ 58.380975] ses_debug: rc=0
[ 58.454510] ses_debug: 0:0:7:0, 5764824129049451830
[ 58.460616] ses_debug: rc=0
[ 58.536676] ses_debug: 0:0:8:0, 5764824129064879846
[ 58.542600] ses_debug: rc=0
[ 58.617497] ses_debug: 0:0:9:0, 5764824129064884190
[ 58.623429] ses_debug: rc=0
[ 58.699507] ses_debug: 0:0:10:0, 5764824129064262202
[ 58.699532] ses_debug: rc=0
[ 58.771484] ses_debug: 0:0:11:0, 5764824129064828746
[ 58.777576] ses_debug: rc=0
[ 58.850101] ses_debug: 0:0:12:0, 5764824129064878078
[ 58.856309] ses_debug: rc=0
[ 58.932471] ses_debug: 0:0:13:0, 5764824129064883994
[ 58.938896] ses_debug: rc=0
[ 59.013410] ses_debug: 0:0:14:0, 5764824129064884042
[ 59.019619] ses_debug: rc=0
[ 59.093899] ses_debug: 0:0:15:0, 5764824129064855742
[ 59.100222] ses_debug: rc=0
[ 59.174590] ses_debug: 0:0:16:0, 5764824129064844334
[ 59.180636] ses_debug: rc=0
[ 59.256396] ses_debug: 0:0:17:0, 5764824129064839942
[ 59.262561] ses_debug: rc=0
[ 59.340327] ses_debug: 0:0:18:0, 5764824129064883454
[ 59.346288] ses_debug: rc=0
[ 59.422159] ses_debug: 0:0:19:0, 5764824129064830418
[ 59.428123] ses_debug: rc=0
[ 59.504551] ses_debug: 0:0:20:0, 5764824129064841454
[ 59.504562] ses_debug: rc=0
[ 59.578479] ses_debug: 0:0:21:0, 5764824129064844882
[ 59.584427] ses_debug: rc=0
[ 59.660366] ses_debug: 0:0:22:0, 5764824129064852242
[ 59.666303] ses_debug: rc=0
[ 59.743882] ses_debug: 0:0:23:0, 5764824129064243850
[ 59.749811] ses_debug: rc=0
[ 59.826284] ses_debug: 0:0:24:0, 5764824129064876354
[ 59.832183] ses_debug: rc=0
[ 59.911356] ses_debug: 0:0:25:0, 5764824129064883990
[ 59.911366] ses_debug: rc=0
[ 59.984543] ses_debug: 0:0:26:0, 5764824129064841914
[ 59.990460] ses_debug: rc=0
[ 60.066524] ses_debug: 0:0:27:0, 5764824129064824278
[ 60.072440] ses_debug: createlink1 -2
[ 60.076869] ses_debug: cdev ffff88101ff8a318
[ 60.081992] ses_debug: cdev devname SLOT 58 52
[ 60.087546] ses_debug: cdev device SLOT 58 52
[ 60.092955] ses_debug: cdev dev 0:0:27:0
[ 60.097697] ses_debug: type 23
[ 60.101474] ses_debug: number 57
[ 60.105426] ses_debug: fault 0
[ 60.109155] ses_debug: active 0
[ 60.112989] ses_debug: status 0
[ 60.116832] ses_debug: rc=-2
[ 60.193019] ses_debug: 0:0:28:0, 5764824129064821270
[ 60.198872] ses_debug: createlink1 -2
[ 60.203347] ses_debug: cdev ffff88101ff8a5e8
[ 60.208463] ses_debug: cdev devname SLOT 59 53
[ 60.213955] ses_debug: cdev device SLOT 59 53
[ 60.219335] ses_debug: cdev dev 0:0:28:0
[ 60.224037] ses_debug: type 23
[ 60.227765] ses_debug: number 58
[ 60.231680] ses_debug: fault 0
[ 60.235400] ses_debug: active 0
[ 60.239208] ses_debug: status 0
[ 60.243033] ses_debug: rc=-2
[ 60.321518] ses_debug: 0:0:29:0, 5764824129064825026
[ 60.327370] ses_debug: createlink1 -2
[ 60.331750] ses_debug: cdev ffff88101ff8a8b8
[ 60.336797] ses_debug: cdev devname SLOT 60 54
[ 60.342257] ses_debug: cdev device SLOT 60 54
[ 60.347608] ses_debug: cdev dev 0:0:29:0
...
Created attachment 1242640 [details]
debug printks
(In reply to Tony Hutter from comment #34)
> Unfortunately for us, installing an older kernel into our netboot image
> can't be done easily, since we have a bunch of packages in our image that
> are kernel version specific (like zfs, nvidia, hyperv, etc). Also, the
> system is in high demand, so getting time on it is difficult.
>
> Just to summarize where we're at:
>
> 1. RH7.3 ses.ko
>
> No symlinks at all, but is stable (can rmmod/insmod).
>
> 2. RH7.3 ses.ko + comment #15 patch
>
> symlinks created some of the time as described in comment 29, but can't
> rmmod without GPFs.
>
> 3. RH7.3 ses.ko + revert the three patches from comment #29
>
> symlinks created some of the time, but get GPFs when removing/reinserting a
> number of disks (haven't tried rmmod/insmod yet)

We are working on fixing the GPF right now.

> I did more digging and think I could be on to something with the symlink
> issue. I tested with RH7.3 ses.ko + comment #15 patch, and added some crude
> printks (patch attached) and noticed that the disks were still being
> detected while SES started to create the enclosure_device symlinks.

Ah! This helps a lot, I understand the problem.

(In reply to Tony Hutter from comment #34)
> 1. RH7.3 ses.ko
>
> No symlinks at all, but is stable (can rmmod/insmod).
>
> 2. RH7.3 ses.ko + comment #15 patch
>
> symlinks created some of the time as described in comment 29, but can't
> rmmod without GPFs.

The GPF bug is going to be fixed in RHEL7.4, the following patch fixes it:

diff --git a/drivers/base/core.c b/drivers/base/core.c
index cb4115569ad8..16442c5e7157 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1341,6 +1341,7 @@ void device_del(struct device *dev)
 	kobject_del(&dev->kobj);
 	/* This free's the allocation done in device_add() */
 	kfree(dev->device_rh);
+	dev->device_rh = NULL;
 	put_device(parent);
 }

Tony, about comment 34, can you post the entire dmesg please?

Created attachment 1245968 [details]
full dmesg log with debug printks
Tony, can you please try this patch? This should fix the sysfs links creation bug you described in comment 23 when using multipath. You have to apply it on top of the comment 15 and comment 37 patches.

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 3d4ae2f..91c35d3 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -381,8 +381,10 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
 	cdev = &edev->component[component];
 
-	if (cdev->dev == dev)
+	if (cdev->dev == dev) {
+		enclosure_add_links(cdev);
 		return -EEXIST;
+	}
 
 	if (cdev->dev)
 		enclosure_remove_links(cdev);

Tony, wait please, I made a mistake in this patch, I will send you a new one soon.

Tony, I fixed the patch in comment 42, please let me know if it solves the problem with the symlinks.

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 3d4ae2f..585a7ee 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -375,21 +375,33 @@ int enclosure_add_device(struct enclosure_device *edev, int component,
 			 struct device *dev)
 {
 	struct enclosure_component *cdev;
+	int r;
 
 	if (!edev || component >= edev->components)
 		return -EINVAL;
 
 	cdev = &edev->component[component];
 
-	if (cdev->dev == dev)
+	if (cdev->dev == dev) {
+		if (!cdev->links_created) {
+			r = enclosure_add_links(cdev);
+			if (!r)
+				cdev->links_created = 1;
+		}
 		return -EEXIST;
+	}
 
 	if (cdev->dev)
 		enclosure_remove_links(cdev);
 	put_device(cdev->dev);
 	cdev->dev = get_device(dev);
-	return enclosure_add_links(cdev);
+	r = enclosure_add_links(cdev);
+	if (!r)
+		cdev->links_created = 1;
+	else
+		cdev->links_created = 0;
+	return r;
 }
 EXPORT_SYMBOL_GPL(enclosure_add_device);

diff --git a/include/linux/enclosure.h b/include/linux/enclosure.h
index a4cf57c..9826a81 100644
--- a/include/linux/enclosure.h
+++ b/include/linux/enclosure.h
@@ -102,6 +102,7 @@ struct enclosure_component {
 	int active;
 	int locate;
 	int slot;
+	int links_created;
 	enum enclosure_status status;
 	int power_status;
 };

Thanks, I'll give it a test.

Nice, that seems to fix it. Great work Maurizio!

Thanks Tony. This requires a fix in the upstream Linux kernel, I am going to prepare a patch and submit it.

I tested the patches in comments 15, 37 and 44 on 3.10.0-514.2.2 in a system with 120 multipath SAS disks and it works.

# ls /sys/block/dm-8/slaves/sdei/device/enclosure_device\:DISK\ 18/
active  device  fault  locate  power  power_status  slot  status  type  uevent

Created attachment 1256925 [details]
Consolidated patches from comments 15, 37, and 44

Just to make it simpler for other testers and to make sure we are all testing the same thing, I consolidated the patches from comments 15, 37, and 44 into a single patch.

I tested the consolidated patch on a Hewlett Packard Enterprise Apollo 4520 with 46 multi-path SAS drives. The Apollo 4520 enclosure contains 2 server nodes, each with an H240 Smart HBA connected to 2 boxes of 23 drives each (such that each node can see both boxes and thus can access all 46 drives). With the patched kernel, all symlinks were created properly and no GPF was generated when the ses module was removed or inserted.

Thanks for testing it. The patch in comment 15 has been accepted into the upstream Linux kernel and is ready to be backported to RHEL. The patch in comment 37 has been merged into RHEL. The patch in comment 44 is still under review and not yet accepted by the upstream enclosure driver maintainer.

Hi Tony,

I am trying to have the latest patch (comment 44) accepted by the upstream Linux kernel community; they have however suggested an alternative patch which is a bit different from mine.

I prepared a test kernel, is it possible for you to test it? This test kernel also prints a few debug messages, can you please send me the dmesg output?

Thanks.

http://people.redhat.com/~mlombard/.bz1394089/kernel-3.10.0-620.el7_bz1394089_v2.x86_64.rpm

Does the test kernel include the other patches (comments 15 and 37) as well? If so, I can try and test it.
yes, it includes all the necessary patches

I tested the new kernel with multipath enabled on:

* HPE Apollo 4520 Gen 9 with H244br, H240, and H241 HBAs and a D6020 JBOD attached (2 + 46 + 69 disks)
* HPE DL360 Gen 9 with H240ar and H241 HBAs and a D3700 JBOD attached (2 + 25 disks)

The patched kernel worked:

* /sys/block/*/device/enclosure* symlinks created for all drives
* /sys/class/enclosure symlinks created for all enclosures
* ses could be removed and inserted repeatedly without error
* drives could be removed and reinserted without error

Thanks Christopher for testing it. Can you also post the output of the dmesg command after booting with the test kernel?

Created attachment 1266047 [details]
dmesg output of kernel-3.10.0-620.el7_bz1394089_v2.x86_64 on DL360+D3700
Created attachment 1266048 [details]
dmesg output of kernel-3.10.0-620.el7_bz1394089_v2.x86_64 on 4520+D6020
As requested, I have attached the full dmesg logs for both test configurations.

Christopher, thanks for testing it. The dmesg however is not really useful; is it possible to test the kernel on the same machine used for comment 23?

(In reply to Maurizio Lombardi from comment #59)
> Hi Tony,
>
> I am trying to have the latest patch (comment 44) accepted by the upstream
> Linux kernel community, they however suggested an alternative patch which is
> a bit different from mine.
>
> I prepared a test kernel, is it possible for you to test it?
> This test kernel also prints a few debug messages, can you please send me
> the dmesg output?
>
> Thanks.
>
> http://people.redhat.com/~mlombard/.bz1394089/kernel-3.10.0-620.el7_bz1394089_v2.x86_64.rpm

Maurizio, can you point me to a patch with the changes you want me to test? For us the patch is a lot easier to test than an RPM.
symlinks debug patch
Hi Tony,
You can find the patch attached to this message.
Please test it, thanks!
(In reply to Maurizio Lombardi from comment #67)
> Christopher,
>
> thanks for testing it.
> The dmesg however is not really useful, is it possible to test the kernel on
> the same machine used for comment 23?

Sorry, but I don't have access to the machine used in comment 23 (different company and different geographical location).

Thanks for the patch Maurizio. I'll see if I can get some time on our machines to test it.

We are running out of time for RHEL7.4. The patch in comment 15 is the most important one because it fixes a regression, so I will proceed to merge it in RHEL ASAP. Comment 37's patch has been merged in RHEL already. The patch in comment 44 has not been approved upstream yet, so I will defer it and open a separate BZ to track it.

Maurizio, I finally got around to testing the new patch (https://bugzilla.redhat.com/attachment.cgi?id=1268352&action=diff) and it worked fine. Symlinks were all created correctly.

Tony, do you have the dmesg output?
dmesg from newest patch
Created attachment 1270892 [details]
dmesg from newest patch - one bad drive example
Here's a (partial) dmesg output from an enclosure with a drive acting up (the drive is physically present, but missing from /sys/class/block/*). The bad drive is unrelated to your patch; it's probably a problem with the enclosure slot. I just thought it might be interesting output since it exercises some of your printks.
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Patch(es) available on kernel-3.10.0-650.el7

Hello, the bug has been fixed in kernel-3.10.0-650.el7, so I will move it to verified.

3.10.0-555.el7.x86_64
[root@storageqe-07 ~]# modprobe ses
[root@storageqe-07 ~]# ls -l /sys/block/sd*/device/en*
ls: cannot access /sys/block/sd*/device/en*: No such file or directory

3.10.0-650.el7.x86_64
[root@storageqe-07 ~]# ls -l /sys/block/sd*/device/en*
lrwxrwxrwx. 1 root root 0 Apr 17 22:09 /sys/block/sdaa/device/enclosure_device:SLOT 7 -> ../../../../port-1:1:19/end_device-1:1:19/target1:0:39/1:0:39:0/enclosure/1:0:39:0/SLOT 7
lrwxrwxrwx. 1 root root 0 Apr 17 22:09 /sys/block/sdab/device/enclosure_device:SLOT 8 -> ../../../../port-1:1:19/end_device-1:1:19/target1:0:39/1:0:39:0/enclosure/1:0:39:0/SLOT 8
lrwxrwxrwx. 1 root root 0 Apr 17 22:09 /sys/block/sdac/device/enclosure_dev

Requesting a z-stream fix for this issue. Customer impact (from https://bugzilla.redhat.com/show_bug.cgi?id=1455358#c3):

"Without this patch, the symlink in sysfs which binds a SAS device to an enclosure slot does not get created. This makes disk hotplug near impossible on large JBOD disk drawers."

*** Bug 1455358 has been marked as a duplicate of this bug.
***

------- Comment From cdeadmin.com 2017-05-26 00:41 EDT-------
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/dmesg_newest_patch.txt on 2017-05-25 23:32:56
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/r19-monitor3-dmesg on 2017-05-25 23:32:56
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/r19-osd2-dmesg on 2017-05-25 23:32:56
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/debug_printks.patch on 2017-05-25 23:32:56
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/dmesg_newest_patch_one_drive_missing.txt on 2017-05-25 23:32:56
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/console.jet21.gz on 2017-05-25 23:32:56
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/0001-fix-symlinks.patch on 2017-05-25 23:32:56
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/fix-enclosure.patch on 2017-05-25 23:32:56
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS06838293/dmesg-with-debug.txt on 2017-05-25 23:32:56

Created attachment 1282478 [details]
sosreport
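The before/after ls check used for verification in this thread can also be expressed as a pass/fail helper for regression testing. This is a sketch under the assumption that the link layout matches the ls output shown (on a live system the root is /sys/block); the function name and the root argument are mine.

```shell
#!/bin/sh
# all_links_present: exit 0 only if every sd device under the given root
# has at least one enclosure_device* symlink - the condition that held on
# the fixed kernel (3.10.0-650.el7) and failed on the broken one.
all_links_present() {
    root=$1
    seen=0
    for dev in "$root"/sd*; do
        [ -d "$dev" ] || continue
        seen=1
        ls "$dev"/device/enclosure_device* >/dev/null 2>&1 || return 1
    done
    # fail if there were no sd devices at all
    [ "$seen" -eq 1 ]
}

# On a live system, after 'modprobe ses':
#   all_links_present /sys/block && echo "verified"
```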
(In reply to John Jarvis from comment #83)
> Requesting a z-stream fix for this issue. Customer impact: (from
> https://bugzilla.redhat.com/show_bug.cgi?id=1455358#c3)
>
> "Without this patch, the symlink in sysfs which binds a SAS device to an
> enclosure slot does not get created. This makes disk hotplug near impossible
> on large JBOD disk drawers."

Is it accurate to say that this issue is not fixed for customers running multipath, or did I put the pieces together incorrectly here? Is this sufficient for LLNL, without multipath support?

The patch is working for us at LLNL. The bug itself is unrelated to multipath. I think the comment about making "hotplug near impossible" has to do with the fact that you need the sysfs links to tell you the mappings between disks and slot numbers. Without the links, if, say, disk /dev/sdac failed, you wouldn't know which slot number to yank the drive from. The sysfs links provide that:
> $ readlink /sys/class/block/sdac/device/enclosure_device*
> ../../../../../../port-0:0:0/end_device-0:0:0/target0:0:0/0:0:0:0/enclosure/0:0:0:0/SLOT 60 54
> $
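That disk-to-slot lookup can be generalized into a small helper that prints the mapping for every disk at once. This is a sketch: the function name and its root argument are mine (the argument lets the logic be tested against a mock tree; on a live system you would pass /sys/class/block), and the path layout is assumed to match the readlink output above.

```shell
#!/bin/sh
# map_slots: for every sd device under the given root, print
# "<device> <slot name>", where the slot name is the last component of the
# enclosure_device:<name> symlink target (e.g. "SLOT 60 54" above).
map_slots() {
    root=$1
    for link in "$root"/sd*/device/enclosure_device*; do
        [ -L "$link" ] || continue
        dev=${link%/device/*}     # strip "/device/enclosure_device:..."
        dev=${dev##*/}            # keep just the sdX name
        slot=$(readlink "$link")
        slot=${slot##*/}          # slot name is the link target's basename
        printf '%s %s\n' "$dev" "$slot"
    done
}

# On a live system: map_slots /sys/class/block
```

With the mapping in hand, a failed /dev/sdac can be matched to its physical slot without guessing.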
------- Comment From dougmill.com 2017-06-08 08:06 EDT-------
FYI, I have been running a 3.10.0-663.el7.ppc64le kernel (7.4 Beta) and not seen any issues - all symlinks for locate LEDs are always present from the /sys/block/*/device/enclosure*/locate path.

Hello,

This bug has been copied as 7.3 z-stream (EUS) bug #1460204.

Thank You
Joe Kachuck

------- Comment From cdeadmin.com 2017-06-28 06:45 EDT-------
This CMVC defect is being cancelled by the CDE Bridge because the corresponding CQ Defect [SW388722] was transferred out of the bridge domain. Here are the additional details:

New Subsystem = ppc_triage
New Release = unspecified
New Component = redhat_linux
New OwnerInfo = Chavez, Luciano (chavez.com)

To continue tracking this issue, please follow CQ defect [SW388722].

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842