| Summary: | RHEL 6.4 ioctl() hangs on multipath devices with no paths after volume unmap | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | eliranz |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED NOTABUG | QA Contact: | Lin Li <lilin> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.4 | CC: | agk, alonma, bmarzins, coughlan, dwysocha, eliranz, heinzm, lilin, msnitzer, prajnoha, prockai, rbalakri, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-08-11 19:23:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Can you give me the multipath -ll output from both RHEL-6.3 and RHEL-6.4, as well as the versions of the device-mapper-multipath and kernel packages you used on both (I see you're using device-mapper-multipath 0.4.9-64)? I'm not currently able to recreate your issue.

Created attachment 915760 [details]
Comment
(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla.)
Any updates???

This issue appears to have been introduced by a change in the 2.6.32-319.el6 kernel. I'm looking into it now.

The multipath kernel code was intentionally changed to queue ioctls when queue_if_no_path is set, instead of returning an error. If you change your device configuration to not queue indefinitely, then once queueing is disabled, sg_inq will return like it did in RHEL-6.3.

Hi Ben,

Can you please explain the reasons for changing the driver's behavior to queue requests, instead of returning an immediate error, when queue_if_no_path is set?

This change is a real problem for our applications. Instead of clearly returning an error and presenting the situation to the user, the application just gets stuck.

There was a long process in which the default multipath settings were established for XIV arrays, and this change breaks that process.

(In reply to eliranz from comment #7)
> Hi Ben,
>
> Can you please explain the reasons for changing the driver's behavior to
> queue requests, instead of returning an immediate error, when
> queue_if_no_path is set?
>
> This change is a real problem for our applications. Instead of clearly
> returning an error and presenting the situation to the user, the
> application just gets stuck.
>
> There was a long process in which the default multipath settings were
> established for XIV arrays, and this change breaks that process.

The upstream patch in question is here:
http://git.kernel.org/linus/7ba10aa6fba

It is a patch that just fell out of testing a different mpath change proposed by David Jeffery. The related dm-devel thread starts here (this is the midpoint, in response to David's initial patch proposal):
http://www.redhat.com/archives/dm-devel/2012-September/msg00020.html

And ultimately I posted the patch in question to dm-devel here:
http://www.redhat.com/archives/dm-devel/2012-September/msg00205.html

So anyway, taking a step back: this bug is all about the ioctl hanging now if queue_if_no_path is set.
I'm surprised RHEL6.3 didn't respond that way before my change. The patch makes it so that if queue_if_no_path is _not_ set, the ioctl will fail immediately. If queue_if_no_path is set, it'll queue the ioctl, as is evidenced by the header of commit 7ba10aa6fba.

(In reply to Mike Snitzer from comment #9)
> So anyway, taking a step back. This bug is all about the ioctl
> hanging now if queue_if_no_path is set. I'm surprised RHEL6.3 didn't
> respond that way before my change.

After looking closer, I'm not surprised. IBM's case doesn't have m->queue_io set, whereas the scenario where an mpath device doesn't have any paths on mpath table load does have m->queue_io.

> The patch makes it so that if queue_if_no_path is _not_ set the ioctl
> will fail immediately. If queue_if_no_path is set, it'll queue the
> ioctl. As is evidenced from header from commit 7ba10aa6fba

My broader point is that an mpath ioctl, which is destined for an underlying path, should queue_if_no_path just like the IO path does. That is why this fix was sent upstream and tagged for inclusion in all upstream stable linux kernel trees.

IBM's application should _not_ be sending an ioctl to an mpath device if it doesn't have any paths. I can appreciate that the change in question creates problems for them, because they never had to worry about the ioctl hanging. But the previous queue_if_no_path inconsistency, where the IO path queued but the ioctl path did _not_ queue when there were no paths, was never something an application should rely on.

One possibility for a workaround for IBM is to add a new RHEL6-only dm-mpath configuration option that allows them to set 'fail_ioctl_if_no_path'. I'd _really_ rather avoid doing that, but if IBM cannot see a way forward we can consider it.

That said, RHEL7 also queues the ioctl if there are no paths and queue_if_no_path is set, so unless IBM changes their application to only issue ioctls when there are paths available, it'll just fail for them in RHEL7.
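The behavior described above can be summarized in a small decision model. This is an illustration only, not the actual kernel code of commit 7ba10aa6fba; the function name and return strings are invented for clarity:

```python
def multipath_ioctl_disposition(has_valid_path, queue_if_no_path):
    """Illustrative model of how dm-multipath handles an ioctl after the
    change: with a usable path the ioctl is forwarded; with no paths it is
    queued when queue_if_no_path is set, and fails immediately otherwise."""
    if has_valid_path:
        return "forward"   # sent down to an underlying path device
    if queue_if_no_path:
        return "queue"     # caller blocks until a path returns (the reported hang)
    return "fail"          # immediate error, as sg_inq saw on RHEL 6.3

# The RHEL 6.4 hang reported in this bug corresponds to:
print(multipath_ioctl_disposition(False, True))   # queue
```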
The way we test that an mpath device is valid (and has valid paths) is by sending a request. What best practice does Red Hat recommend for identifying that a device has a valid path?

(In reply to eliranz from comment #12)
> The way we test that an mpath device is valid (and has valid paths) is by
> sending a request. What best practice does Red Hat recommend for
> identifying that a device has a valid path?

What do you mean by "sending a request"? You configured queue_if_no_path, so the request will never complete if there aren't any valid paths available.

You can evaluate the mpath device's state with 'multipath -ll'. Alternatively, you can issue dmsetup commands to see if the mpath device has configured paths, e.g.:

dmsetup table <mpath device name>

Or, to get finer-grained path status info via dmsetup, use:

dmsetup status <mpath device name>

Paths in a path group that are active are denoted with 'A', paths that have failed with 'F'.

We can see that dmsetup status <device> identifies whether there is a valid path. However, it takes a few seconds to discover that there is no path available, and if we fall inside this window and issue our SCSI inquiry, we will remain stuck forever (or until the path is back). This reduces the chances of being stuck, but does not fully solve the problem.

We want to emphasize that this problem is not just with our application. You can see exactly the same behavior with the sg3_utils package provided by Red Hat. From what we understand, if a device has lost all its paths forever, any inquiry will just remain stuck forever. Did we get it wrong? Are we missing something? Maybe the solution should be some timeout that causes the inquiry to return with an error.

(In reply to eliranz from comment #14)
> We can see that dmsetup status <device> identifies whether there is a
> valid path. However, it takes a few seconds to discover that there is no
> path available, and if we fall inside this window and issue our SCSI
> inquiry, we will remain stuck forever (or until the path is back). This
> reduces the chances of being stuck, but does not fully solve the problem.
>
> We want to emphasize that this problem is not just with our application.
> You can see exactly the same behavior with the sg3_utils package provided
> by Red Hat.

Sure, but the point is that the user asked for IO to be queue_if_no_path. Path recovery is required to get that queued IO to complete. The fix in question just applies the same policy to ioctls too -- which should've been the case from the start.

> From what we understand, if a device has lost all its paths forever, any
> inquiry will just remain stuck forever. Did we get it wrong? Are we
> missing something? Maybe the solution should be some timeout that causes
> the inquiry to return with an error.

We are exploring the possibility of a timeout for queue_if_no_path as part of another thread on dm-devel; the latest RFC patch for this is here (and still requires testing):
https://patchwork.kernel.org/patch/3070391/

NOTE: this patch focuses on timing out outstanding IO requests, not on ioctls. If/when an IO request times out, it'll disable m->queue_if_no_path, so the next retry of the ioctl will fail with -EIO.

If you set

no_path_retry <some_number>

then you won't queue your ioctls forever. After the last path fails, multipathd will retry for a set number of times, and then fail the ioctls along with the IO. The only reason this wouldn't work is if you needed your IO to continue to be queued while your ioctls failed. What would probably be more useful is a stable library interface to get information from multipath about the devices.

We will investigate your recommendations. Thanks a lot for your help.

Have you tried setting no_path_retry, to see if it resolved your issue?
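As a sketch of the 'dmsetup status' check suggested above: for a multipath target, each path appears in the status line as '<major>:<minor> <A|F> <fail_count>', so an application could look for an active path before issuing an inquiry. A minimal parser, with a hypothetical status line for illustration:

```python
import re

def path_states(dm_status_line):
    """Extract (device, state) pairs from a multipath 'dmsetup status' line.
    Each path is reported as '<major>:<minor> <A|F> <fail_count>', where
    'A' means active and 'F' means failed."""
    toks = dm_status_line.split()
    return [(t, toks[i + 1]) for i, t in enumerate(toks[:-1])
            if re.fullmatch(r"\d+:\d+", t)]

def has_active_path(dm_status_line):
    """True if at least one path is currently active."""
    return any(state == "A" for _, state in path_states(dm_status_line))

# Hypothetical status line for a two-path device with one failed path:
status = "0 2097152 multipath 2 0 0 0 1 1 A 0 2 2 8:16 A 0 0 1 8:32 F 1 0 1"
print(has_active_path(status))   # True
```

As the comment above notes, this only narrows the race: a path can still drop between the status check and the inquiry, so it is a mitigation rather than a fix.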
Hi Ben,

The proposed change requires changing the multipath.conf file. Our standard is to use the default OS settings, especially on Red Hat, which was adjusted for XIV not long ago. Therefore, this proposed solution does not fit us. Currently, we're looking for a workaround to solve this issue.

- Eliran

I'm not sure what solution there will be in RHEL6, short of changing that configuration. I understand your issue. Our default configs are simply what the vendors have given us, but IBM is not very proactive about updating these configurations. Would it be possible for you to contact IBM to see if this is an allowable change? I know a large number of customers do change this. While you are at it, you should ask them if you can change the path_selector to service-time, since this can result in higher performance on some setups.

I would recommend a configuration of:
devices {
    device {
        vendor "IBM"
        product "2810XIV"
        path_selector "service-time 0"
        features "0"
        no_path_retry 12
    }
}
Setting no_path_retry to 12 will do 12 retries (which, at 5 seconds apart, will take one minute) and then fail I/O on the device.
If your issue with changing the defaults is not due to vendor or OS support, but to make installations more standard, then you still may consider talking to IBM. We simply use the configuration that IBM gives us, and if people complain to them, they will be more likely to update their configuration.
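The no_path_retry arithmetic in the recommendation above can be sketched as follows (polling_interval defaults to 5 seconds here to match the comment; the helper name is illustrative):

```python
def queue_time_seconds(no_path_retry, polling_interval=5):
    """Roughly how long multipathd keeps queueing I/O (and, after the kernel
    change, ioctls) once the last path has failed: no_path_retry path checks,
    one per polling_interval seconds, before queueing is disabled."""
    return no_path_retry * polling_interval

print(queue_time_seconds(12))   # 60 -> about one minute, as stated above
```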
Description of problem:

### Research:

## Actions:

I mapped -> rescanned -> unmapped -> rescanned 13 volumes to a host running RHEL 6.3 and a host running RHEL 6.4. On both platforms, dead multipath devices remain after the unmap+rescan. When executing the natively implemented sg_inq on such a device, on RHEL 6.4 the command hangs until the process is killed from another session:

## RHEL 6.4 - Executing sg_inq on the same device ##

root@royr-rhel64-x64:/ $ sg_inq /dev/mapper/mpathr
^C
root@royr-rhel64-x64:/ $

## RHEL 6.4 - strace on sg_inq ##

root@royr-rhel64-x64:/ $ strace sg_inq /dev/mapper/mpathr
execve("/usr/bin/sg_inq", ["sg_inq", "/dev/mapper/mpathr"], [/* 43 vars */]) = 0
brk(0) = 0x2240000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730ce000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=82595, ...}) = 0
mmap(NULL, 82595, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f06730b9000
close(3) = 0
open("/usr/lib64/libsgutils2.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \223\240x3\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=159288, ...}) = 0
mmap(0x3378a00000, 2252096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3378a00000
mprotect(0x3378a22000, 2093056, PROT_NONE) = 0
mmap(0x3378c21000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x3378c21000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355\341x3\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1922152, ...}) = 0
mmap(0x3378e00000, 3745960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3378e00000
mprotect(0x3378f8a000, 2093056, PROT_NONE) = 0
mmap(0x3379189000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x189000) = 0x3379189000
mmap(0x337918e000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x337918e000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730b8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730b7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730b6000
arch_prctl(ARCH_SET_FS, 0x7f06730b7700) = 0
mprotect(0x3379189000, 16384, PROT_READ) = 0
mprotect(0x337881f000, 4096, PROT_READ) = 0
munmap(0x7f06730b9000, 82595) = 0
brk(0) = 0x2240000
brk(0x2261000) = 0x2261000
open("/proc/devices", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06730cd000
read(3, "Character devices:\n 1 mem\n 4 /"..., 1024) = 500
close(3) = 0
munmap(0x7f06730cd000, 4096) = 0
open("/dev/mapper/mpathr", O_RDONLY|O_NONBLOCK) = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 6), ...}) = 0
ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[6]=[12, 00, 00, 00, 24, 00], mx_sb_len=32, iovec_count=0, dxfer_len=36, timeout=60000, flags=0

(The SG_IO ioctl never returns; the process hangs here until killed.)

On RHEL 6.3, on the other hand, running the exact same scenario returns an error immediately:

## RHEL 6.3 - Executing sg_inq on the same device ##

root@royr-rhel63-x64:/ $ sg_inq /dev/mapper/mpathe
Both SCSI INQUIRY and fetching ATA information failed on /dev/mapper/mpathe
root@royr-rhel63-x64:/ $

## RHEL 6.3 - strace on sg_inq ##

execve("/usr/bin/sg_inq", ["sg_inq", "/dev/mapper/mpathe"], [/* 43 vars */]) = 0
brk(0) = 0x20bd000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5f8000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=36222, ...}) = 0
mmap(NULL, 36222, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbbab5ef000
close(3) = 0
open("/usr/lib64/libsgutils2.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \223\340\2467\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=159288, ...}) = 0
mmap(0x37a6e00000, 2252096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37a6e00000
mprotect(0x37a6e22000, 2093056, PROT_NONE) = 0
mmap(0x37a7021000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x37a7021000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355!\2477\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1922152, ...}) = 0
mmap(0x37a7200000, 3745960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37a7200000
mprotect(0x37a738a000, 2093056, PROT_NONE) = 0
mmap(0x37a7589000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x189000) = 0x37a7589000
mmap(0x37a758e000, 18600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x37a758e000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5ee000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5ed000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5ec000
arch_prctl(ARCH_SET_FS, 0x7fbbab5ed700) = 0
mprotect(0x37a7589000, 16384, PROT_READ) = 0
mprotect(0x37a6c1f000, 4096, PROT_READ) = 0
munmap(0x7fbbab5ef000, 36222) = 0
brk(0) = 0x20bd000
brk(0x20de000) = 0x20de000
open("/proc/devices", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbbab5f7000
read(3, "Character devices:\n 1 mem\n 4 /"..., 1024) = 484
close(3) = 0
munmap(0x7fbbab5f7000, 4096) = 0
open("/dev/mapper/mpathe", O_RDONLY|O_NONBLOCK) = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 6), ...}) = 0
ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[6]=[12, 00, 00, 00, 24, 00], mx_sb_len=32, iovec_count=0, dxfer_len=36, timeout=60000, flags=0}) = -1 EAGAIN (Resource temporarily unavailable)
ioctl(3, 0x30d, 0x7fffa25d2ff0) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Both SCSI INQUIRY and fetching A"..., 76) = 76
close(3) = 0
exit_group(99) = ?

## Conclusions:

Dead unmapped volume devices in RHEL 6.4 don't respond to ioctl() with an error; instead, the calling process hangs (both Python and native).

Version-Release number of selected component (if applicable):
RHEL 6.4 with device-mapper-multipath 0.4.9-64

How reproducible:
Consistent

Steps to Reproduce:
1. Map some volumes to the host
2. Run a rescan
3. Unmap the volumes from the host
4. Run a rescan
5. Run sg_inq

Actual results:
ioctl() hangs

Expected results:
sg_inq should return an error/warning message
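The cmd[6]=[12, 00, 00, 00, 24, 00] shown in both straces is a standard 6-byte SCSI INQUIRY CDB. A small helper reproducing it, for illustration (byte 1 carries the EVPD bit and bytes 3-4 the allocation length, per the SPC command layout):

```python
def inquiry_cdb(alloc_len=0x24, evpd=False, page_code=0):
    """Build the 6-byte SCSI INQUIRY CDB: opcode 0x12, EVPD flag, page code,
    16-bit allocation length, control byte. With the defaults this is the
    [12 00 00 00 24 00] command sg_inq sends via the SG_IO ioctl."""
    return bytes([
        0x12,                     # INQUIRY opcode
        0x01 if evpd else 0x00,   # EVPD: request a vital product data page
        page_code,                # which VPD page (only meaningful with EVPD)
        (alloc_len >> 8) & 0xFF,  # allocation length, high byte
        alloc_len & 0xFF,         # allocation length, low byte (0x24 = 36)
        0x00,                     # control byte
    ])

print(inquiry_cdb().hex())   # 120000002400
```

It is this command, wrapped in an SG_IO ioctl, that queues indefinitely on the RHEL 6.4 device above while returning EAGAIN immediately on RHEL 6.3.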