Bug 999915 - RHEL 6.4 ioctl() hangs on multipath devices with no paths after volume unmap
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.4
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Lin Li
Depends On:
Blocks:
 
Reported: 2013-08-22 07:18 EDT by eliranz
Modified: 2016-08-11 15:23 EDT
CC List: 13 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-11 15:23:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Comment (590.19 KB, text/plain), 2013-09-01 10:51 EDT, eliranz

Description eliranz 2013-08-22 07:18:09 EDT
Description of problem:

### Research:

## Actions: I mapped -> rescanned -> unmapped -> rescanned 13 volumes on a host running RHEL 6.3 and on a host running RHEL 6.4. On both platforms, dead multipath devices remain after the unmap and rescan.


When executing the natively implemented sg_inq on the device on RHEL 6.4, the command hangs until the process is killed from another session:

## RHEL 6.4 - Executing sg_inq on the same device ##
root@royr-rhel64-x64:/ $ sg_inq /dev/mapper/mpathr
^C
root@royr-rhel64-x64:/ $

## RHEL 6.4 - strace on sg_inq ##
root@royr-rhel64-x64:/ $ strace sg_inq /dev/mapper/mpathr
execve("/usr/bin/sg_inq", ["sg_inq", "/dev/mapper/mpathr"], [/* 43 vars */]) = 0
brk(0)  = 0x2240000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730ce000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=82595, ...}) = 0
mmap(NULL, 82595, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f06730b9000
close(3)= 0
open("/usr/lib64/libsgutils2.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \223\240x3\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=159288, ...}) = 0
mmap(0x3378a00000, 2252096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3378a00000
mprotect(0x3378a22000, 2093056, PROT_NONE) = 0
mmap(0x3378c21000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x3378c21000
close(3)= 0
open("/lib64/libc.so.6", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355\341x3\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1922152, ...}) = 0
mmap(0x3378e00000, 3745960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3378e00000
mprotect(0x3378f8a000, 2093056, PROT_NONE) = 0
mmap(0x3379189000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x189000) = 0x3379189000
mmap(0x337918e000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x337918e000
close(3)= 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730b8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730b7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730b6000
arch_prctl(ARCH_SET_FS, 0x7f06730b7700) = 0
mprotect(0x3379189000, 16384, PROT_READ) = 0
mprotect(0x337881f000, 4096, PROT_READ) = 0
munmap(0x7f06730b9000, 82595)   = 0
brk(0)  = 0x2240000
brk(0x2261000)  = 0x2261000
open("/proc/devices", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730cd000
read(3, "Character devices:\n  1 mem\n  4 /"..., 1024) = 500
close(3)= 0
munmap(0x7f06730cd000, 4096)= 0
open("/dev/mapper/mpathr", O_RDONLY|O_NONBLOCK) = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 6), ...}) = 0
ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[6]=[12, 00, 00, 00, 24, 00], mx_sb_len=32, iovec_count=0, dxfer_len=36, timeout=60000, flags=0

On RHEL 6.3, on the other hand, the exact same scenario returns an error immediately:

## RHEL 6.3 - Executing sg_inq on the same device ##
root@royr-rhel63-x64:/ $ sg_inq /dev/mapper/mpathe
Both SCSI INQUIRY and fetching ATA information failed on /dev/mapper/mpathe
root@royr-rhel63-x64:/ $

## RHEL 6.3 - strace on sg_inq ##
execve("/usr/bin/sg_inq", ["sg_inq", "/dev/mapper/mpathe"], [/* 43 vars */]) = 0
brk(0)  = 0x20bd000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5f8000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=36222, ...}) = 0
mmap(NULL, 36222, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbbab5ef000
close(3)= 0
open("/usr/lib64/libsgutils2.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \223\340\2467\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=159288, ...}) = 0
mmap(0x37a6e00000, 2252096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37a6e00000
mprotect(0x37a6e22000, 2093056, PROT_NONE) = 0
mmap(0x37a7021000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x37a7021000
close(3)= 0
open("/lib64/libc.so.6", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355!\2477\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1922152, ...}) = 0
mmap(0x37a7200000, 3745960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37a7200000
mprotect(0x37a738a000, 2093056, PROT_NONE) = 0
mmap(0x37a7589000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x189000) = 0x37a7589000
mmap(0x37a758e000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x37a758e000
close(3)= 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5ee000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5ed000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5ec000
arch_prctl(ARCH_SET_FS, 0x7fbbab5ed700) = 0
mprotect(0x37a7589000, 16384, PROT_READ) = 0
mprotect(0x37a6c1f000, 4096, PROT_READ) = 0
munmap(0x7fbbab5ef000, 36222)   = 0
brk(0)  = 0x20bd000
brk(0x20de000)  = 0x20de000
open("/proc/devices", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5f7000
read(3, "Character devices:\n  1 mem\n  4 /"..., 1024) = 484
close(3)= 0
munmap(0x7fbbab5f7000, 4096)= 0
open("/dev/mapper/mpathe", O_RDONLY|O_NONBLOCK) = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 6), ...}) = 0
ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[6]=[12, 00, 00, 00, 24, 00], mx_sb_len=32, iovec_count=0, dxfer_len=36, timeout=60000, flags=0}) = -1 EAGAIN (Resource temporarily unavailable)
ioctl(3, 0x30d, 0x7fffa25d2ff0) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Both SCSI INQUIRY and fetching A"..., 76) = 76
close(3)= 0
exit_group(99)  = ?


## Conclusions:
On RHEL 6.4, dead unmapped volume devices do not respond to ioctl() with an error; instead they hang the calling process (both Python and native).
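
As an aside for anyone reproducing this: the hang can be bounded during testing with coreutils timeout (a sketch only, using the mpathr device from the trace above):

# Bound the otherwise-indefinite hang while testing; timeout(1) ships with
# coreutils on RHEL 6. Exit status 124 means the command was killed after
# the 10-second limit, i.e. the ioctl never returned.
timeout 10 sg_inq /dev/mapper/mpathr
echo "sg_inq exit status: $?"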


Version-Release number of selected component (if applicable):
RHEL 6.4 with device-mapper-multipath 0.4.9-64

How reproducible:
Consistent

Steps to Reproduce:
1. Map some volumes to the host
2. Run a SCSI rescan
3. Unmap the volumes from the host
4. Run a SCSI rescan again
5. Run sg_inq against the leftover multipath devices (a sketch of these steps follows below)
Actual results:
ioctl hangs

Expected results:
sg_inq should return an error/warning message
Comment 2 Ben Marzinski 2013-08-22 16:45:12 EDT
Can you give me the

# multipath -ll

output from both RHEL-6.3 and RHEL-6.4, as well as the versions of the device-mapper-multipath and kernel packages you used on both (I see you're using device-mapper-multipath 0.4.9-64).

I'm not currently able to recreate your issue.
Comment 3 eliranz 2013-09-01 10:51:21 EDT
Created attachment 915760 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
Comment 4 eliranz 2013-09-29 02:54:32 EDT
Any updates???
Comment 5 Ben Marzinski 2013-10-01 17:56:34 EDT
This issue appears to have been introduced by a change in the 2.6.32-319.el6 kernel.  I'm looking into it now.
Comment 6 Ben Marzinski 2013-10-02 16:34:24 EDT
The multipath kernel code was intentionally changed to queue ioctls when queue_if_no_path is set, instead of returning an error.  If you change your device configuration so that it does not queue indefinitely, then once queueing is disabled sg_inq will return like it did in RHEL-6.3.
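
As an illustration only (not necessarily the configuration change Ben has in mind, and mpathr is just an example device name): besides editing multipath.conf, queueing can also be toggled per map at runtime with a dm-mpath target message, which lets an already-queued sg_inq fail instead of waiting:

# Turn off queue_if_no_path on one map; queued requests, including a hung
# SG_IO inquiry, are then failed.
dmsetup message mpathr 0 "fail_if_no_path"

# Turn queueing back on afterwards if desired. Note that multipathd may
# reapply whatever policy multipath.conf configures when the map is reloaded.
dmsetup message mpathr 0 "queue_if_no_path"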
Comment 7 eliranz 2013-10-10 10:24:53 EDT
Hi Ben,

Can you please explain the reasons for changing the behavior of the driver to queue requests instead of returning an immediate error when queue_if_no_path is set?

This change is a real problem for our applications. Instead of clearly returning an error and displaying the situation to the user, the application will just get stuck. 

There was a long process where the default multipath settings were set for XIV arrays and this change breaks this process.
Comment 9 Mike Snitzer 2013-10-11 16:25:19 EDT
(In reply to eliranz from comment #7)
> Hi Ben,
> 
> Can you please explain the reasons for changing the behavior of the
> driver to queue requests instead of returning an immediate error when
> queue_if_no_path is set?
> 
> This change is a real problem for our applications. Instead of clearly
> returning an error and displaying the situation to the user, the application
> will just get stuck. 
> 
> There was a long process where the default multipath settings were set for
> XIV arrays and this change breaks this process.

The upstream patch in question is here:
http://git.kernel.org/linus/7ba10aa6fba

It is a patch that just fell out of testing a different mpath change
proposed by David Jeffery.

The related dm-devel thread starts here (this is midpoint in response to
David's initial patch proposal):
http://www.redhat.com/archives/dm-devel/2012-September/msg00020.html

And ultimately I posted the patch in question to dm-devel here:
http://www.redhat.com/archives/dm-devel/2012-September/msg00205.html

So anyway, taking a step back. This bug is all about the ioctl
hanging now if queue_if_no_path is set.  I'm surprised RHEL6.3 didn't
respond that way before my change.

The patch makes it so that if queue_if_no_path is _not_ set the ioctl
will fail immediately.  If queue_if_no_path is set, it'll queue the
ioctl, as is evidenced by the header of commit 7ba10aa6fba.
Comment 11 Mike Snitzer 2013-10-12 10:36:41 EDT
(In reply to Mike Snitzer from comment #9)
> 
> So anyway, taking a step back. This bug is all about the ioctl
> hanging now if queue_if_no_path is set.  I'm surprised RHEL6.3 didn't
> respond that way before my change.

After looking closer, I'm not surprised.  IBM's case doesn't have m->queue_io set, whereas the scenario where an mpath device has no paths at mpath table load time does have m->queue_io set.

> The patch makes it so that if queue_if_no_path is _not_ set the ioctl
> will fail immediately.  If queue_if_no_path is set, it'll queue the
> ioctl, as is evidenced by the header of commit 7ba10aa6fba.

My broader point is that an mpath ioctl, which is destined for an underlying path, should honor queue_if_no_path just like the IO path does.  That is why this fix was sent upstream and tagged for inclusion in all upstream stable Linux kernel trees.

IBM's application should _not_ be sending ioctls to an mpath device if it doesn't have any paths.  I can appreciate that the change in question creates problems for them because they never had to worry about the ioctl hanging.  But the previous queue_if_no_path inconsistency, where the IO path queued but the ioctl path did _not_ queue when there are no paths, was never something an application should rely on.

One possibility for a workaround for IBM is to add a new RHEL6-only dm-mpath configuration option that allows them to set 'fail_ioctl_if_no_path'.  I'd _really_ rather avoid doing that, but if IBM cannot see a way forward we can consider it.  That said, RHEL7 also queues the ioctl when there are no paths and queue_if_no_path is set, so unless IBM changes their application to only issue ioctls when paths are available, it will run into the same behavior in RHEL7.
Comment 12 eliranz 2013-10-13 10:49:40 EDT
The way we test that an mpath is valid (and has valid paths) is by sending a request. What best practice does Red Hat recommend for identifying whether a device has a valid path?
Comment 13 Mike Snitzer 2013-10-13 20:59:38 EDT
(In reply to eliranz from comment #12)
> The way we test that an mpath is valid (and has valid paths) is by sending a
> request. What best practice does Red Hat recommend for identifying whether a
> device has a valid path?

What do you mean by "sending a request"?  You configure queue_if_no_path so the request will never complete if there aren't any valid paths available.

You can evaluate the mpath device's state with 'multipath -ll'.  Alternatively, you can issue dmsetup commands to see if the mpath device has configured paths, e.g.: dmsetup table <mpath device name>

Or, to get finer grained path status info via dmsetup, use: dmsetup status <mpath device name>

Paths in a path group that are active are denoted with 'A', paths that are failed with 'F'.
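
A minimal sketch of such a check in shell, assuming the usual multipath status format where each path appears as a "major:minor state" pair (mpathr is again just an example device name):

# Issue the SCSI inquiry only if the map currently reports at least one path
# in the active ('A') state; path entries in the multipath target status
# line look like "8:16 A".
DEV=mpathr
if dmsetup status "$DEV" | grep -Eq '[0-9]+:[0-9]+ A'; then
    sg_inq "/dev/mapper/$DEV"
else
    echo "$DEV has no active paths, skipping inquiry" >&2
fi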
Comment 14 eliranz 2013-10-21 09:37:22 EDT
We can see that dmsetup status <device> identifies whether there is a valid path. However, it takes a few seconds to discover that no path is available, and if we fall within this window and issue our SCSI inquiry we will remain stuck forever (or until a path comes back). This reduces the chances of being stuck, but does not fully solve the problem.

We want to emphasize that this problem is not just with our application. You can see exactly the same behavior with the sg3_utils package provided by Red Hat.

From what we understand, if a device loses all its paths permanently, any inquiry will just remain stuck forever. Did we get it wrong? Are we missing something? Maybe the solution should be some timeout that causes the inquiry to return with an error.
Comment 15 Mike Snitzer 2013-10-21 10:22:30 EDT
(In reply to eliranz from comment #14)
> We can see that dmsetup status <device> identifies whether there is a valid
> path. However, it takes a few seconds to discover that no path is available,
> and if we fall within this window and issue our SCSI inquiry we will remain
> stuck forever (or until a path comes back). This reduces the chances of being
> stuck, but does not fully solve the problem.
> 
> We want to emphasize that this problem is not just with our application. You
> can see exactly the same behavior with the sg3_utils package provided by Red Hat.

Sure, but the point is the user asked for IO to be queue_if_no_path.  Path recovery is required to get that queued IO to complete.

The fix in question just applies the same policy to ioctls too -- which should've been the case from the start.

> From what we understand, if a device loses all its paths permanently, any
> inquiry will just remain stuck forever. Did we get it wrong? Are we missing
> something? Maybe the solution should be some timeout that causes the inquiry
> to return with an error.

We are exploring the possibility of a timeout for queue_if_no_path as part of another thread on dm-devel; the latest RFC patch for this is here (and still requires testing):
https://patchwork.kernel.org/patch/3070391/

NOTE: this patch focuses on timing out outstanding IO requests, not on ioctls.  If/when an IO request times out, it'll disable m->queue_if_no_path, so the next retry of the ioctl will fail with -EIO.
Comment 16 Ben Marzinski 2013-10-21 12:30:03 EDT
If you set

no_path_retry <some_number>

Then you won't queue your ioctls forever.  After the last path fails, multipathd will retry for a set number of times, and then fail the ioctls along with the IO.

The only reason this wouldn't work is if you needed your IO to continue to be queued while your ioctls failed.

What would be more useful would probably be a stable library interface to get information from multipath about the devices.
Comment 17 eliranz 2013-10-28 09:40:16 EDT
We will investigate your recommendations.
Thanks a lot for your help.
Comment 19 Ben Marzinski 2014-03-26 16:36:11 EDT
Have you tried setting no_path_retry, to see if it resolved your issue?
Comment 20 eliranz 2014-03-30 09:15:41 EDT
Hi Ben,

The proposed change requires changing the multipath.conf file. 
Our standards are to use the default OS settings, especially on Red Hat, which adjusted its defaults for XIV not long ago. Therefore, this proposed solution is not a fit for us.

Currently, we're looking for a workaround to solve this issue.

- Eliran
Comment 21 Ben Marzinski 2014-04-04 12:58:13 EDT
I'm not sure what solution there will be in RHEL6, short of changing that configuration.  I understand your issue.  Our default configs are simply what the vendors have given us, but IBM is not very proactive about updating these configurations.  Would it be possible for you to contact IBM to see if this is an allowable change?  I know a large number of customers do change this.  While you are at it, you should ask them if you can change the path_selector to service-time, since this can result in higher performance on some setups.

I would recommend a configuration of

devices {
    device {
        vendor "IBM"
        product "2810XIV"
        path_selector "service-time 0"
        features "0
        no_path_retry "12"
    }
}


Setting no_path_retry to 12 will do 12 retries (which at 5 seconds apart will take one minute) and then stop queueing and fail the IO on the device.

If your issue with changing the defaults is not due to vendor or OS support, but to making installations more standard, then you may still consider talking to IBM. We simply use the configuration that IBM gives us, and if people complain to them, they will be more likely to update their configuration.
