624985 – multipathd fails to run "getuid_callout" if that is a script

Summary: multipathd fails to run "getuid_callout" if that is a script

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	device-mapper-multipath
Sub Component:
Version:	5.5
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Ben Marzinski
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-08-18 10:07 UTC by Bernd Schubert
Modified:	2018-11-14 11:27 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-01-18 18:13:20 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Bernd Schubert 2010-08-18 10:07:56 UTC

multipathd fails to run the "getuid_callout". For example, we have defined:

    device {
        vendor "DDN"
        product "S2A"
        getuid_callout          "/opt/ddn/bin/ddn_multipath_alias %n"
        # Use the following directive to utilize DDN S2A getuid callout script
        # which uses a predefined lookup table to generate uid name based
        # on S2A LUN name.
        # getuid_callout          "/sbin/scsi_id -g -p 0x80 -s /block/%n"
        prio_callout            "/sbin/mpath_prio_alua %d"
        path_grouping_policy    group_by_prio
        # Review DDN Firmware Release notes and match polling_interval to
        # host timeout value
        polling_interval        70
        path_checker            tur
        failback                immediate
        no_path_retry           fail
    }

/opt/ddn/bin/ddn_multipath_alias is a shell script that returns the serial number or an alias of the device. However, multipathd fails to run it; syslog shows messages like this:

Aug 18 11:57:05 localhost multipathd: cannot get the the wwid for sdx
Aug 18 11:57:05 localhost multipathd: /opt/ddn/bin/ddn_multipath_alias exitted with 255

If I run it manually, or if it is run by "multipath" (note the missing 'd'), it works fine:

root@rhel5-nfs@phys-mds0:~# /opt/ddn/bin/ddn_multipath_alias sdx
ost_crashfs_5

I have straced it and looked through the code, and noticed that multipathd goes into a chroot. It seems multipathd bind mounts all the binaries and libs it needs into that chroot, but that does not work for shell scripts. As our alias script has become rather complex, we might switch to python, but I don't expect multipathd to handle that any better.
While it is certainly good for a daemon to go into a chroot, it does not really seem suitable for multipathd to do that.

Comment 1 Ben Marzinski 2010-08-18 14:19:27 UTC

Multipathd has its own namespace on a ramfs, and it copies any callouts into that ramfs, so that it can execute them even if it loses access to the underlying device. However, it isn't smart enough to copy the script interpreter for scripts. To make it do that, you need a dummy device configuration like this in the devices section of /etc/multipath.conf:

  device {
    vendor "dummy"
    product "pull_in_bash"
    prio_callout "/bin/bash"
  }


You can replace bash with whatever script interpreter you need for multipathd to copy into its ramfs. Does this fix the problem for you?

Comment 2 Ben Marzinski 2010-08-18 14:21:06 UTC

The vendor and product fields don't matter, as long as they are something that no real device is going to use.
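
As a rough illustration of the workaround above, the following Python sketch renders a dummy device stanza for an arbitrary binary that multipathd should copy into its ramfs. The function name dummy_device_stanza and its default product string are hypothetical, made up for this example; only the stanza layout (vendor, product, prio_callout) is taken from comment 1.

```python
def dummy_device_stanza(binary, product="pull_in_binary"):
    """Render a dummy multipath.conf device stanza naming `binary`.

    The vendor/product values are placeholders that no real device
    should match; the prio_callout entry is what names the binary.
    """
    lines = [
        "device {",
        '    vendor "dummy"',
        '    product "%s"' % product,
        '    prio_callout "%s"' % binary,
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the /bin/bash stanza from comment 1.
    print(dummy_device_stanza("/bin/bash", product="pull_in_bash"))
```

The generated text would still have to be pasted into the devices section of /etc/multipath.conf by hand; the sketch does not edit the file.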

Comment 3 Bernd Schubert 2010-08-19 12:26:45 UTC

Thanks for your help Ben. So all the trouble only for the rare case that the root filesystem is located on a multipath device itself? Shouldn't that be an option then? The only use case I can see for the root filesystem on a multipath device, would still work better by using drbd...

Unfortunately, simply adding /bin/bash did not help. I also tried adding several other binaries called by our script, but it still failed. I didn't try to add '[' and ']', though.

So we have the choice to 

a) Rewrite multipathd.  I have looked into the sources, and it won't be a fast solution.

b) Rewrite our script in C or C++. That still leaves the problem of other callouts made by that program. For example, our script itself calls scsi_id.

c) Use the bindings file, but see bug 624987.


Thanks,
Bernd

Comment 4 Jason Rappleye 2010-09-18 00:08:02 UTC

Hi Bernd,

We just ran into the same problem. It turns out that multipathd doesn't chroot - it just has its own namespace for /sbin, /bin, and /tmp. One of our scripts is written in perl and works just fine, since the interpreter is in /usr/bin/perl. The other is in bash, and execve fails with ENOENT due to the lack of /bin/bash in multipathd's private namespace.

The ticket is a little old and I imagine that you've already addressed this, but I suspect your idea about rewriting in python will work just fine. Assuming, of course, it doesn't reference any files - executable or otherwise - in /sbin, /bin, and /tmp that multipathd doesn't pull in by default. Luckily, /sbin/scsi_id is one of those.
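
A quick way to check a callout script against the behaviour described above is to read its #! line and see whether the interpreter sits under one of the three directories. The sketch below assumes exactly the directory list given in this comment (/sbin, /bin, /tmp); the function names are hypothetical and it does not inspect multipathd itself or follow symlinks.

```python
import os

# Directories that, according to the comment above, multipathd
# replaces with its own private copies (an assumption of this sketch).
PRIVATE_DIRS = ("/sbin", "/bin", "/tmp")


def shebang_interpreter(script_path):
    """Return the interpreter path from the script's #! line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].decode("utf-8", "replace").split()
    return parts[0] if parts else None


def in_private_namespace(path):
    """True if `path` lies inside one of PRIVATE_DIRS."""
    norm = os.path.normpath(path)
    return any(norm == d or norm.startswith(d + "/") for d in PRIVATE_DIRS)
```

For a script starting with #!/bin/bash, in_private_namespace(shebang_interpreter(path)) is true, which matches the ENOENT case above; for /usr/bin/perl it is false. Any other files the script touches under those directories would need the same check.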

> a) Rewrite multipathd.  I have looked into the sources and won't be a fast
> solution.

If you're really considering that, why not just switch to the upstream multipathd? The offending code has disappeared.

Jason

Comment 6 Chris Williams 2012-01-18 18:13:20 UTC

This BZ has been around for a while with no updates. Closing NOTABUG. If this is still an issue, please open a case with Red Hat Support via the Customer Portal.
