Bug 596496 - [RHEL6] mpt2sas returns ENXIO for enclosure_identifier
[RHEL6] mpt2sas returns ENXIO for enclosure_identifier
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: udev (Show other bugs)
6.0
All Linux
medium Severity medium
: rc
: ---
Assigned To: Harald Hoyer
Karel Volný
: OtherQA
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2010-05-26 16:05 EDT by Issue Tracker
Modified: 2011-04-18 14:34 EDT (History)
8 users (show)

See Also:
Fixed In Version: udev-147-2.21.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-11-15 09:50:43 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
path_id patch to handle sas (5.79 KB, patch)
2010-07-08 08:46 EDT, Harald Hoyer
no flags Details | Diff

  None (edit)
Description Issue Tracker 2010-05-26 16:05:55 EDT
Escalated to Bugzilla from IssueTracker
Comment 1 Issue Tracker 2010-05-26 16:05:56 EDT
Event posted on 05-26-2010 04:02pm EDT by kbaxley

Identical to an issue reported by Fujitsu with RHEL6:

https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=885313

From: Jim Garlick <garlick@llnl.gov>
To: bwoodard@llnl.gov
Cc: behlendorf1@llnl.gov, morrone2@llnl.gov
Subject: Re: mpt2sas update
Date: 05/25/2010 06:29:04 PM


With the following patch in udev's path_id.c, I can get /dev/disk/by-path
populated (with "unknown" in the enclosure part of the path):

--- path_id.c.bak       2010-03-10 11:30:01.000000000 -0800
+++ path_id.c   2010-05-25 16:25:47.000000000 -0700
@@ -148,17 +148,13 @@
               return NULL;
       }
       enc = udev_device_get_sysattr_value(sasdev, "enclosure_identifier");
-       if (enc == NULL) {
-               parent = NULL;
-               goto out;
-       }
       bay = udev_device_get_sysattr_value(sasdev, "bay_identifier");
       if (bay == NULL) {
               parent = NULL;
               goto out;
       }

-       path_prepend(path, "sas-%s-%s", enc, bay);
+       path_prepend(path, "sas-%s-%s", enc ? enc : "unknown", bay);
out:
       udev_device_unref(sasdev);
       return parent;

The problem seems to be that udev gets ENXIO when it tries to read the
enclosure_identifier path from sys.  An example path is:

/sys/devices/pci0000:00/0000:00:0f.0/0000:07:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/sas_device//end_device-0:0:1/enclosure_identifier

The relevant code in the mpt2sas driver is in mpt2sas_transport.c:

static int
_transport_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
{
       struct MPT2SAS_ADAPTER *ioc = rphy_to_ioc(rphy);
       struct _sas_node *sas_expander;
       unsigned long flags;

       spin_lock_irqsave(&ioc->sas_node_lock, flags);
       sas_expander = mpt2sas_scsih_expander_find_by_sas_address(ioc,
           rphy->identify.sas_address);
       spin_unlock_irqrestore(&ioc->sas_node_lock, flags);

       if (!sas_expander)
               return -ENXIO;

       *identifier = sas_expander->enclosure_logical_id;
       return 0;
}

So that needs fixing...

Jim
This event sent from IssueTracker by kbaxley  [LLNL (HPC)]
 issue 943543
Comment 2 Issue Tracker 2010-05-26 16:05:57 EDT
Event posted on 05-26-2010 04:04pm EDT by kbaxley

The fact that udev has this problem seems to point to a deeper problem in
the kernel. At first I thought that the bug might be addressed by moving
to the 5.xx. mpt2sas driver. see: IT#886383 & BZ#591971 but looking at the
code it simply looks like a simple search through all of the enclosure IDs
and I don't think that code is wrong. However, when looking around at the
patches for the mpt2sas driver posted to the LKML I happened to notice this
the following patch:
http://kerneltrap.org/mailarchive/linux-scsi/2010/3/12/6850653/thread the
latest 2.6.32-29 kernel doesn't have that patch:


  284          cdev->groups = enclosure_groups;
  285  
  286          err = device_register(cdev);
  287          if (err)
  288                  ERR_PTR(err);
  289  
  290          return ecomp;
  291  }


However, the more that I look at it the more that I become convinced that
the patch that I referenced is not going to help this problem.


This event sent from IssueTracker by kbaxley  [LLNL (HPC)]
 issue 943543
Comment 3 Issue Tracker 2010-05-26 16:06:21 EDT
Event posted on 05-26-2010 04:06pm EDT by kbaxley


150 hard drives behind mptsas fabric (10 ports, 2 per card, each with 15
drives).


This event sent from IssueTracker by kbaxley 
 issue 943543
Comment 4 Issue Tracker 2010-05-26 16:10:47 EDT
Event posted on 05-26-2010 04:10pm EDT by kbaxley

udev-147-2.15.el6.x86_64 is what we're testing with here.


This event sent from IssueTracker by kbaxley 
 issue 943543
Comment 5 RHEL Product and Program Management 2010-05-26 16:26:22 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 6 Issue Tracker 2010-06-02 11:01:46 EDT
Event posted on 2010-06-02 08:01 PDT by woodard

mpt2sas driver version 05.100.00.02 doesn't seem to help.

So can we get some assistance on the enclosure handling so that it
doesn't pass ENXIO up to udev.





This event sent from IssueTracker by woodard 
 issue 943543
Comment 7 Phil Knirsch 2010-06-04 10:59:27 EDT
Isn't this one a duplicate of #591133?

Thanks & regards, Phil
Comment 8 Ben Woodard 2010-06-04 11:20:09 EDT
They are supposed to be distinct.

This one is not supposed to be associated with udev. One is supposed to be associated with getting the kernel, in particular the mpt2sas driver, fixed so that it doesn't pass ENXIO up to udev which doesn't know how to handle it. One is supposed to be to teach udev how to handle the corner cases when some driver doesn't pass the enclosure information up properly. Belt & suspenders. Somehow, the cases got muddied. Maybe it is clearer if you go back to the ITs.
Comment 15 Harald Hoyer 2010-07-08 08:41:00 EDT
Btw, upstream has now a better sas patch for path_id.
They also rejected  enc ? enc : "unknown", because nobody should rely on a link, which disappears, if the kernel driver is fixed.
Comment 16 Harald Hoyer 2010-07-08 08:46:13 EDT
Created attachment 430334 [details]
path_id patch to handle sas
Comment 18 Harald Hoyer 2010-07-09 06:40:28 EDT
Here are test packages:
http://people.redhat.com/harald/downloads/udev/udev-147-2.20.0.sastest.1.el6/
Comment 20 Issue Tracker 2010-07-09 14:16:54 EDT
Event posted on 07-09-2010 02:16pm EDT by tgummels

+       end_dev_wwn = udev_device_get_sysattr_value(sas_dev,
"sas_address");
+
+       path_prepend(path, "sas-%s", end_dev_wwn);

Just having the wwn in the name is too little information for our
purposes.  We need the name to contain components that uniquely identify
an enclosure connection (channel) and a drive bay.  This is so we can
easily and consistently map the by-path names to drive locations on all
of our server nodes.

For example, see our latest patch submitted against issue 364360
(attached).  The naming convention it implements lets us create the
desired mapping using patterns like:

a1      pci-0000:83:00.0-sas-phy-3:0-0x[a-f0-9]{16}-0
a2      pci-0000:83:00.0-sas-phy-3:0-0x[a-f0-9]{16}-1
..
b1      pci-0000:83:00.0-sas-phy-3:4-0x[a-f0-9]{16}-0
b2      pci-0000:83:00.0-sas-phy-3:4-0x[a-f0-9]{16}-1

Here the phy-x:y component identifies a channel and the trailing number
identifies a drive bay within that channel.

Thanks,
Ned


This event sent from IssueTracker by tgummels 
 issue 943543
Comment 39 Karel Volný 2010-10-19 07:39:45 EDT
I have to say I'm pretty confused about the patches related to this bug ...

in recent udev-147-2.29.el6, there are the patches:

udev-147-sas.patch
udev-147-patch_id-sas.patch
udev-147-patch_id-sas2.patch
udev-147-patch_id-sas3.patch
udev-147-virtio-blk-patch_id.patch

after applying everything within the %prep phase, I get the following differences against the original sources:

--- path_id.c~  2009-08-08 15:42:05.000000000 +0200
+++ path_id.c   2010-10-19 13:01:09.425046791 +0200
@@ -121,7 +121,74 @@
 
 static struct udev_device *handle_scsi_sas(struct udev_device *parent, char **path)
 {
-       return NULL;
+       struct udev *udev  = udev_device_get_udev(parent);
+       struct udev_device *end_dev;
+       struct udev_device *sasdev;
+       struct udev_device *phydev;
+       struct udev_device *parent_dev;
+       struct udev_device *child_dev;
+       char syspath[UTIL_PATH_SIZE], *base;
+       char phy_path[UTIL_PATH_SIZE];
+       const char *end_name, *name, *enc, *bay;
+       const char *port_path;
+       int scsi_host;
+       int num_phys = 8;
+       int i;
+
+       parent_dev = parent;
+       while (1) {
+               child_dev = parent_dev;
+               parent_dev = udev_device_get_parent(child_dev);
+               if (parent_dev == NULL)
+                       return NULL;
+               name = udev_device_get_sysname(parent_dev);
+               if (strstr(name, "end_device-")) {
+                       end_dev = parent_dev;
+                       end_name = name;
+               } else if (sscanf(name, "host%d", &scsi_host)) {
+                       port_path = udev_device_get_syspath(child_dev);
+                       break;
+               }
+       }
+
+       base = strdup(udev_device_get_syspath(end_dev));
+       if (!base)
+               return NULL;
+       snprintf(syspath, sizeof(syspath), "%s/sas_device/%s", base, end_name);
+       free(base);
+
+       for (i = 0 ; i < num_phys ; i++) {
+               snprintf(phy_path, sizeof(phy_path), "%s/phy-%d:%d",
+                               port_path, scsi_host, i);
+               if (phydev = udev_device_new_from_syspath(udev, phy_path)) {
+                       udev_device_unref(phydev);
+                       break;
+               }
+       }
+       if (i == num_phys)
+               return NULL;
+
+       sasdev = udev_device_new_from_syspath(udev, syspath);
+       if (sasdev == NULL) {
+               fprintf(stderr, "unable to access '%s'\n", syspath);
+               return NULL;
+       }
+       enc = udev_device_get_sysattr_value(sasdev, "enclosure_identifier");
+        if (enc == NULL) {
+                parent = NULL;
+                goto out;
+        }
+
+       bay = udev_device_get_sysattr_value(sasdev, "bay_identifier");
+       if (bay == NULL) {
+               parent = NULL;
+               goto out;
+       }
+
+       path_prepend(path, "sas-%s-%s-%s", strrchr(phy_path, '/') + 1, enc, bay);
+out:
+       udev_device_unref(sasdev);
+       return parent;
 }
 
 static struct udev_device *handle_scsi_iscsi(struct udev_device *parent, char **path)
@@ -448,6 +515,9 @@
                } else if (strcmp(subsys, "xen") == 0) {
                        path_prepend(&path, "xen-%s", udev_device_get_sysname(parent));
                        parent = skip_subsystem(parent, "xen");
+               } else if (strcmp(subsys, "virtio") == 0) {
+                       path_prepend(&path, "virtio-pci-%s", udev_device_get_sysname(parent));
+                       parent = skip_subsystem(parent, "virtio");
                }
 
                parent = udev_device_get_parent(parent);


but looking at the comments, I see a few suspicious places:

1) comment #1:

it removes the code:

-       if (enc == NULL) {
-               parent = NULL;
-               goto out;
-       }

which stays present in the latest version

2) comment #16/attachment #430334 [details]:

this patch doesn't seem to be applied at all, none of its hunks is present in the actual code

3) comment #20:

no mention of "wwn" in the sources/patches

4) comment #22:

this seems to correspond to the changes that actually got applied, except for the part adding:

+               } else if (strcmp(subsys, "virtio") == 0) {
+                       path_prepend(&path, "virtio-pci-%s", udev_device_get_sysname(parent));
+                       parent = skip_subsystem(parent, "virtio");

however, if I get it right, this is NOT the patch applied to the testing package used in comment #25

also, guessing from comment #26, the current patch is not compatible with upstream, which is undesirable and may lead to regressions in the future

*****

in addition, the Changelog for udev does not mention fixing this bug

please clarify the points above - which code is actually supposed to be included?
Comment 40 Harald Hoyer 2010-10-19 08:29:40 EDT
(In reply to comment #39)
> but looking at the comments, I see a few suspicious places:
> 
> 1) comment #1:
> 
> it removes the code:
> 
> -       if (enc == NULL) {
> -               parent = NULL;
> -               goto out;
> -       }
> 
> which stays present in the latest version

comment #15

> 
> 2) comment #16/attachment #430334 [details]:
> 
> this patch doesn't seem to be applied at all, none of its hunks is present in
> the actual code

this was the upstream fix, which did not satisfy the customers needs.

> 
> 3) comment #20:
> 
> no mention of "wwn" in the sources/patches

this was the upstream fix, which did not satisfy the customers needs.

> 
> 4) comment #22:
> 
> this seems to correspond to the changes that actually got applied, except for
> the part adding:
> 
> +               } else if (strcmp(subsys, "virtio") == 0) {
> +                       path_prepend(&path, "virtio-pci-%s",
> udev_device_get_sysname(parent));
> +                       parent = skip_subsystem(parent, "virtio");
> 
> however, if I get it right, this is NOT the patch applied to the testing
> package used in comment #25

This was added for virtio devices (another bug)

> 
> also, guessing from comment #26, the current patch is not compatible with
> upstream, which is undesirable and may lead to regressions in the future

yes

> 
> *****
> 
> in addition, the Changelog for udev does not mention fixing this bug
> 
> please clarify the points above - which code is actually supposed to be
> included?

https://bugzilla.redhat.com/show_bug.cgi?id=537185 was the main path_id for sas bug, together with this one. I might have forgotten to mention this one.
Comment 41 Karel Volný 2010-10-19 11:25:15 EDT
okay, thanks for the information

so, in the end, the code looks like what is desired, so I can confirm with SanityOnly

(although, in my personal opinion, I do not like those obfuscating patches over patches, and the fact we diverge from upstream ...)
Comment 42 releng-rhel@redhat.com 2010-11-15 09:50:43 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

Note You need to log in before you can comment on or make changes to this bug.