Bug 1133064
| Summary: | Device mapper multipath will not create some maps even though block devices are available from NetApp FAS6240 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jason Czech <jason.czech> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Lin Li <lilin> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.4 | CC: | agk, bdonahue, bmarzins, ctatman, dwysocha, heinzm, jason.czech, jbrassow, lilin, msnitzer, prajnoha, prockai, rbalakri, salmy, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-29 21:48:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

It looks like your problem in the run recorded in coeus_multipath_-v6.txt is that the dm device hpwnetapp02_000 does exist, but it's not a multipath device. Looking here:

Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table 360001ff00c68030f8d0108000000fa00 NF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table 360001ff00c68030f8d0108000000fa00 NF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm status 360001ff00c68030f8d0108000000fa00 NF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info 360001ff00c68030f8d0108000000fa00 OF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info 360001ff00c68030f8d0108000000fa00 NF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table hpwnetapp02_000 NF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table ddn9900_001_a_sata NF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table ddn9900_001_a_sata NF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm status ddn9900_001_a_sata NF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info ddn9900_001_a_sata OF [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info ddn9900_001_a_sata NF [16384] (*1)

There are 2 calls to dm table, a call to dm status, and 2 calls to dm info for each of 360001ff00c68030f8d0108000000fa00 and ddn9900_001_a_sata, but only a single call to dm table for hpwnetapp02_000. That first call is to grab the device type. If the device isn't a multipath device, it's ignored, so it appears that hpwnetapp02_000 must exist but not be a multipath device. That makes sense with what appears later.

Aug 22 11:00:37 | hpwnetapp02_000: set ACT_CREATE (map does not exist)

Means that there is no multipath device with this name or wwid.

Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info hpwnetapp02_000 NF [16384] (*1)
Aug 22 11:00:37 | hpwnetapp02_000: map already present

Happens when multipath tries to actually create the device and finds that the device name is already in use. I'm not sure how this is happening, or what hpwnetapp02_000 actually is. If you could show me the output of

# dmsetup table

I could answer that better.

There have been some advancements, but no root cause has been identified. By updating 19 packages and adding rd_NO_MULTIPATH to the kernel boot options, I am able to get the system to boot properly. These are the updated packages from the otherwise base RHEL 6.4 install:

bfa-firmware-3.2.21.1-2.el6.noarch.rpm
device-mapper-multipath-0.4.9-72.el6_5.3.x86_64.rpm
device-mapper-multipath-libs-0.4.9-72.el6_5.3.x86_64.rpm
dracut-004-336.el6_5.2.noarch.rpm
dracut-fips-004-336.el6_5.2.noarch.rpm
dracut-kernel-004-336.el6_5.2.noarch.rpm
dracut-network-004-336.el6_5.2.noarch.rpm
initscripts-9.03.40-2.el6_5.3.x86_64.rpm
kernel-2.6.32-431.23.3.el6.x86_64.rpm
kernel-devel-2.6.32-431.23.3.el6.x86_64.rpm
kernel-doc-2.6.32-431.23.3.el6.noarch.rpm
kernel-firmware-2.6.32-431.23.3.el6.noarch.rpm
kernel-headers-2.6.32-431.23.3.el6.x86_64.rpm
kpartx-0.4.9-72.el6_5.3.x86_64.rpm
libgudev1-147-2.51.el6.x86_64.rpm
libgudev1-devel-147-2.51.el6.x86_64.rpm
libudev-147-2.51.el6.x86_64.rpm
libudev-devel-147-2.51.el6.x86_64.rpm
udev-147-2.51.el6.x86_64.rpm

I have an older system without these updates that is currently booted and has one inaccessible multipath device. It does not show up in dmsetup output. I'll attach output of multipath -v6 and dmsetup table from that system.

Created attachment 935899 [details]
Kraken "multipath -v6" and "dmsetup table" output
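
The call pattern described in the first analysis above (two dm table calls, a dm status, and two dm info calls for real multipath maps, but a lone dm table for hpwnetapp02_000) can be checked mechanically across a whole log. The following is a minimal Python sketch and not part of multipath itself: it counts the libdevmapper ioctl lines per map name in `multipath -v6` output and lists the names that received a `dm table` call but never a `dm status` call. Treating a missing `dm status` as "skipped because it is not a multipath device" is an assumption taken from the reading of the log in this thread, and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches libdevmapper debug lines such as:
# "... libdevmapper: ioctl/libdm-iface.c(1724): dm table hpwnetapp02_000 NF [16384] (*1)"
DM_CALL = re.compile(r"libdevmapper: \S+: dm (\w+) (\S+)")


def count_dm_calls(log_text):
    """Return {map_name: Counter({ioctl_name: count})} for -v6 log text."""
    calls = defaultdict(Counter)
    for op, name in DM_CALL.findall(log_text):
        calls[name][op] += 1
    return calls


def maps_without_status(calls):
    """Names that got a 'dm table' call but no 'dm status' call.

    Assumption from this thread: multipath drops a device after the first
    'dm table' call when its target type is not multipath.
    """
    return sorted(
        name for name, ops in calls.items()
        if ops["table"] >= 1 and ops["status"] == 0
    )


if __name__ == "__main__":
    # Lines copied from the log excerpt quoted earlier in this bug.
    prefix = "Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm "
    sample = "\n".join(prefix + rest for rest in [
        "table hpwnetapp02_000 NF [16384] (*1)",
        "table ddn9900_001_a_sata NF [16384] (*1)",
        "table ddn9900_001_a_sata NF [16384] (*1)",
        "status ddn9900_001_a_sata NF [16384] (*1)",
        "info ddn9900_001_a_sata OF [16384] (*1)",
        "info ddn9900_001_a_sata NF [16384] (*1)",
    ])
    print(maps_without_status(count_dm_calls(sample)))  # ['hpwnetapp02_000']
```

Checking for a missing `dm status` rather than for "exactly one call" keeps the result stable when the later `dm info` call from the create attempt is also present in the log.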
I know this one sort of fell through the cracks, but are you still seeing this? Looking at your last attachment, it doesn't appear that hpwnetapp02_006 already exists in this case. Still, messages like

device-mapper: create ioctl on hpwnetapp02_006 failed: Device or resource busy

almost always happen because one of the path devices IS currently in use.

I have not been able to test a more recent version of RHEL6 on these systems. The problem as I described it continued to exist until the workaround I stated in comment #3 fixed the issue.

Do you still have a system without the updates from comment #3? If so, could you run

# dmsetup info

as well as

# multipath -v6

if it's changed since what you posted in comment #4.

Unfortunately, I don't have any systems without the updates to test with at this point, since we've applied the patches/workaround to all our impacted systems.

Since you fixed this issue by updating to RHEL-6.5 packages, it does seem very possible that whatever your issue is has already been fixed in a previous release. I haven't been able to reproduce this, and I'm leaning towards closing this with INSUFFICIENT_DATA.

Even after patching per comment #3, the bug still occurs if the rd_NO_MULTIPATH kernel boot option is not set. The problem only occurred with LUNs from a pair of NetApp FAS6240 controllers running clustered OnTap. LUNs from other vendors' storage controllers weren't impacted, so unless you have that specific hardware/storage OS to troubleshoot, you might have trouble recreating the issue. Unfortunately, I'm not able to reboot these production hosts at will to troubleshoot, I don't have a viable test host, and I can't afford to spend time on it since we've already found a workaround.

Hi Jason,
Because there is no NetApp FAS6240 in our lab, could you help by reporting test results on RHEL-6.8 once the patch is available? Thanks a lot!

Hi Lin,
When will the patch be available? Is it also available in RHEL7.x?
Will I be able to apply a subset of packages (per comment #3), or will I be required to upgrade all packages to RHEL6.8 revisions? These systems operate under strict change control, and applying a full set of upgraded packages could potentially be very disruptive.

There currently is no patch in the works. Without the ability to recreate this and get information off a system while it is happening, it is going to be tricky to figure out what's going wrong. Looking through the code around this area, I don't see anything that could cause this without one of the path devices actually being in use, and if a path device is in use, there's nothing that multipath can do. The kernel simply won't allow two systems to exclusively grab a device.

I agree and understand that it will be difficult to fix without being able to recreate the problem. The systems that exhibit the behavior are in production, so I don't have the liberty of using them for testing. It is difficult to imagine that something is using the device, since the problem happens during boot before any of the devices are mounted. We are using host-based zoning, so only one system has access to any given device. The startup process aborts because the system can't find the path to one of the devices that is defined in /etc/fstab. The particular device would change from one reboot to the next, but it would consistently be only one device that was unavailable.

Is there any more information on what is happening here? Are you still able to see this, and if so, are the devices that multipath is ignoring in use?

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

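
The open question above, whether a path device that multipath is ignoring is in use, can be partly answered from sysfs and the mount table. This is a minimal Python sketch under stated assumptions: `path_device_users` is a hypothetical helper, it only looks at `/sys/block/<dev>/holders` (other device-mapper or md devices stacked on the disk) and at `/proc/mounts`, and it does not inspect holders of individual partitions. A process holding the device open in some other way would not show up here, so an empty result does not prove the device is free.

```python
import os
import re


def path_device_users(dev, sysfs_root="/sys", mounts_file="/proc/mounts"):
    """Report what is stacked on or mounted from block device `dev` (e.g. "sdc").

    Returns {"holders": [...], "mounts": [...]}. The sysfs_root and
    mounts_file parameters exist only so the function can be pointed at
    saved copies or test fixtures.
    """
    holders_dir = os.path.join(sysfs_root, "block", dev, "holders")
    try:
        # Each entry names a device (e.g. dm-3) built on top of `dev`.
        holders = sorted(os.listdir(holders_dir))
    except OSError:
        holders = []

    # Match the whole disk or one of its numbered partitions.
    dev_pattern = re.compile(r"/dev/{0}\d*$".format(re.escape(dev)))
    mounts = []
    try:
        with open(mounts_file) as fh:
            for line in fh:
                fields = line.split()
                if len(fields) >= 2 and dev_pattern.match(fields[0]):
                    mounts.append(fields[1])
    except (IOError, OSError):
        pass

    return {"holders": holders, "mounts": mounts}


if __name__ == "__main__":
    # "sdc" is a placeholder; substitute one of the paths of the failing map.
    print(path_device_users("sdc"))
```

A non-empty `holders` list for a path of the map that failed to create would point at whichever dm device grabbed the path first, which is the situation the "map already present" analysis suggests.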
Created attachment 929682 [details]
coeus messages and config files

Description of problem:
Device mapper multipath will not create maps even though block devices are available from NetApp FAS6240. It errors with "device or resource busy". Removing the block devices, flushing unused maps, and rescanning results in the same error. All maps except one are successfully created.

Version-Release number of selected component (if applicable):
Stock 6.4 installation with the following upgraded packages:
bfa-firmware-3.2.21.1-2.el6.noarch
device-mapper-multipath-0.4.9-72.el6_5.3.x86_64
device-mapper-multipath-libs-0.4.9-72.el6_5.3.x86_64
kernel-2.6.32-431.23.3.el6.x86_64
kernel-devel-2.6.32-431.23.3.el6.x86_64
kernel-doc-2.6.32-431.23.3.el6.noarch
kernel-firmware-2.6.32-431.23.3.el6.noarch
kernel-headers-2.6.32-431.23.3.el6.x86_64
kpartx-0.4.9-72.el6_5.3.x86_64

How reproducible:
Every reboot results in one of the NetApp maps refusing to be created. The specific map changes at random.

Steps to Reproduce:
1. Reboot server

Actual results:
One NetApp device mapper path does not get created.

Expected results:
/dev/mapper/hpwnetapp02_000 and /dev/mapper/hpwnetapp02_001 should be created.

Additional info:
Kernel boot options:
title Red Hat Enterprise Linux Server (2.6.32-431.23.3.el6.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.32-431.23.3.el6.x86_64 ro root=UUID=4249a09d-0804-4199-aa18-6630a98f8ab7 intel_iommu=on rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb vga=0x303 noresume transparent_hugepage=never log_buf_len=4M
initrd /initramfs-2.6.32-431.23.3.el6.x86_64.img

Attached dmesg output, /etc/multipath.conf, /etc/multipath/bindings, /etc/multipath/wwids, /etc/lvm/lvm.conf, /var/log/messages, multipath -v6 output
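
The `dmsetup table` output requested in the discussion is what shows whether an expected name such as hpwnetapp02_000 exists as something other than a multipath map. The following is a minimal Python sketch of that check, assuming the usual `name: start length target_type ...` line layout of `dmsetup table`. The helper names are hypothetical, and the sample table lines in the usage example are made up for illustration; they are not taken from the attachments on this bug.

```python
def dm_target_types(dmsetup_table_output):
    """Map each dm device name to the set of target types in its table."""
    types = {}
    for line in dmsetup_table_output.splitlines():
        name, sep, rest = line.partition(": ")
        fields = rest.split()
        # Skip blank lines, "No devices found", and empty tables.
        if not sep or len(fields) < 3:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        types.setdefault(name, set()).add(fields[2])
    return types


def name_conflicts(types, expected_names):
    """Expected multipath names that exist but are not multipath maps."""
    return sorted(
        name for name in expected_names
        if name in types and types[name] != {"multipath"}
    )


if __name__ == "__main__":
    # Made-up sample, for illustration only.
    sample = (
        "hpwnetapp02_000: 0 2097152 linear 8:16 0\n"
        "ddn9900_001_a_sata: 0 4194304 multipath 0 0 1 1 round-robin 0 1 1 8:32 1\n"
    )
    types = dm_target_types(sample)
    print(name_conflicts(types, ["hpwnetapp02_000", "hpwnetapp02_001"]))
```

A name that is expected but missing from the output altogether, as with the inaccessible device on the older system, is not reported as a conflict by this sketch; that case points toward a busy path device instead of a name collision.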