Bug 1133064

Summary: Device mapper multipath will not create some maps even though block devices are available from NetApp FAS6240
Product: Red Hat Enterprise Linux 6
Reporter: Jason Czech <jason.czech>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Lin Li <lilin>
Severity: high
Docs Contact:
Priority: high
Version: 6.4
CC: agk, bdonahue, bmarzins, ctatman, dwysocha, heinzm, jason.czech, jbrassow, lilin, msnitzer, prajnoha, prockai, rbalakri, salmy, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-29 21:48:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
coeus messages and config files (flags: none)
Kraken "multipath -v6" and "dmsetup table" output (flags: none)

Description Jason Czech 2014-08-22 15:09:59 UTC
Created attachment 929682 [details]
coeus messages and config files

Description of problem: Device mapper multipath will not create some maps even though block devices are available from NetApp FAS6240. It errors with "Device or resource busy". Removing the block devs, flushing unused maps, and rescanning results in the same error. All maps except one are successfully created.


Version-Release number of selected component (if applicable): Stock 6.4 installation with the following upgraded packages

bfa-firmware-3.2.21.1-2.el6.noarch
device-mapper-multipath-0.4.9-72.el6_5.3.x86_64
device-mapper-multipath-libs-0.4.9-72.el6_5.3.x86_64
kernel-2.6.32-431.23.3.el6.x86_64
kernel-devel-2.6.32-431.23.3.el6.x86_64
kernel-doc-2.6.32-431.23.3.el6.noarch
kernel-firmware-2.6.32-431.23.3.el6.noarch
kernel-headers-2.6.32-431.23.3.el6.x86_64
kpartx-0.4.9-72.el6_5.3.x86_64

How reproducible: Every reboot results in one of the NetApp maps refusing to be created.  The specific map changes at random.

Steps to Reproduce:
1. Reboot server

Actual results:

One NetApp device mapper path does not get created.

Expected results:

/dev/mapper/hpwnetapp02_000 and /dev/mapper/hpwnetapp02_001 should be created.

Additional info:

Kernel boot options:
title Red Hat Enterprise Linux Server (2.6.32-431.23.3.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-431.23.3.el6.x86_64 ro root=UUID=4249a09d-0804-4199-aa18-6630a98f8ab7 intel_iommu=on rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb vga=0x303 noresume transparent_hugepage=never log_buf_len=4M
        initrd /initramfs-2.6.32-431.23.3.el6.x86_64.img

Attached dmesg output, /etc/multipath.conf, /etc/multipath/bindings, /etc/multipath/wwids, /etc/lvm/lvm.conf, /var/log/messages, multipath -v6 output

Comment 2 Ben Marzinski 2014-08-28 15:19:06 UTC
It looks like your problem in the run recorded in coeus_multipath_-v6.txt is that the dm device hpwnetapp02_000 does exist, but it's not a multipath device.

Looking here:
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table 360001ff00c68030f8d0108000000fa00  NF   [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table 360001ff00c68030f8d0108000000fa00  NF   [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm status 360001ff00c68030f8d0108000000fa00  NF   [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info 360001ff00c68030f8d0108000000fa00  OF   [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info 360001ff00c68030f8d0108000000fa00  NF   [16384] (*1)

Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table hpwnetapp02_000  NF   [16384] (*1)

Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table ddn9900_001_a_sata  NF   [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm table ddn9900_001_a_sata  NF   [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm status ddn9900_001_a_sata  NF   [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info ddn9900_001_a_sata  OF   [16384] (*1)
Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info ddn9900_001_a_sata  NF   [16384] (*1)

There are 2 calls to dm table, a call to dm status and 2 calls to dm info for 360001ff00c68030f8d0108000000fa00 and ddn9900_001_a_sata, but only a single call to dm table for hpwnetapp02_000. That first call is to grab the device type.  If the device isn't a multipath device, it's ignored, so it appears that hpwnetapp02_000 must exist but not be a multipath device. That makes sense with what appears later.
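The call-counting argument above can be sketched as a small log filter. This is only a heuristic under stated assumptions, not part of multipath itself: the function names and the regular expression are hypothetical, and "one dm table call with no dm status call" is simply the pattern described in this comment. The sample lines are copied from the log excerpt above.

```python
import re
from collections import Counter, defaultdict

# Matches the libdevmapper debug lines seen in "multipath -v6" output,
# e.g. "... | libdevmapper: ioctl/libdm-iface.c(1724): dm table NAME  NF ..."
DM_CALL = re.compile(r"libdevmapper: \S+: dm (\w+) (\S+)")


def count_dm_calls(lines):
    """Return {map name: Counter of dm commands issued for that name}."""
    calls = defaultdict(Counter)
    for line in lines:
        match = DM_CALL.search(line)
        if match:
            command, name = match.groups()
            calls[name][command] += 1
    return calls


def suspect_non_multipath(lines):
    """Names queried with a single 'dm table' and never with 'dm status'.

    Assumption: this mirrors the pattern described in the comment, where
    the first table call fetches the device type and a non-multipath
    device is then ignored.
    """
    calls = count_dm_calls(lines)
    return sorted(
        name for name, c in calls.items()
        if c["table"] == 1 and c["status"] == 0
    )


PREFIX = "Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm "
SAMPLE = [
    PREFIX + "table 360001ff00c68030f8d0108000000fa00  NF   [16384] (*1)",
    PREFIX + "table 360001ff00c68030f8d0108000000fa00  NF   [16384] (*1)",
    PREFIX + "status 360001ff00c68030f8d0108000000fa00  NF   [16384] (*1)",
    PREFIX + "info 360001ff00c68030f8d0108000000fa00  OF   [16384] (*1)",
    PREFIX + "info 360001ff00c68030f8d0108000000fa00  NF   [16384] (*1)",
    PREFIX + "table hpwnetapp02_000  NF   [16384] (*1)",
    PREFIX + "table ddn9900_001_a_sata  NF   [16384] (*1)",
    PREFIX + "table ddn9900_001_a_sata  NF   [16384] (*1)",
    PREFIX + "status ddn9900_001_a_sata  NF   [16384] (*1)",
    PREFIX + "info ddn9900_001_a_sata  OF   [16384] (*1)",
    PREFIX + "info ddn9900_001_a_sata  NF   [16384] (*1)",
]

if __name__ == "__main__":
    print(suspect_non_multipath(SAMPLE))
```

On the excerpt above, only hpwnetapp02_000 fits the pattern; the other two names each show the two table calls and the status call.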

Aug 22 11:00:37 | hpwnetapp02_000: set ACT_CREATE (map does not exist)

Means that there is no multipath device with this name or wwid.

Aug 22 11:00:37 | libdevmapper: ioctl/libdm-iface.c(1724): dm info hpwnetapp02_000  NF   [16384] (*1)
Aug 22 11:00:37 | hpwnetapp02_000: map already present

Happens when multipath tries to actually create the device, and finds that the device name is already in use.

I'm not sure how this is happening, or what hpwnetapp02_000 actually is. If you could show me the output of

# dmsetup table

I could answer that better.
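For readers following along, here is a minimal sketch of how the requested "dmsetup table" output could be checked for a map name held by a non-multipath target. It assumes the usual "name: start length target_type args" line layout; the function name is hypothetical, and the sample text is invented for illustration and is not taken from the reporter's system.

```python
def target_types(dmsetup_table_output):
    """Map each device-mapper name to the set of target types it uses.

    Assumes lines shaped like "name: start length target_type args...".
    Lines that do not fit (for example "No devices found") are skipped.
    """
    types = {}
    for line in dmsetup_table_output.splitlines():
        name, sep, rest = line.partition(": ")
        fields = rest.split()
        if not sep or len(fields) < 3:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        types.setdefault(name, set()).add(fields[2])
    return types


# Invented sample output, for illustration only.
SAMPLE_TABLE = """\
ddn9900_001_a_sata: 0 4194304 multipath 0 0 1 1 round-robin 0 1 1 8:16 1
hpwnetapp02_000: 0 2097152 linear 8:32 0
"""

if __name__ == "__main__":
    for name, kinds in sorted(target_types(SAMPLE_TABLE).items()):
        note = "" if kinds == {"multipath"} else "  <-- not a multipath map"
        print(f"{name}: {', '.join(sorted(kinds))}{note}")
```

A name that appears with any target type other than multipath would explain the "map already present" message, since the name is taken by something multipath ignores.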

Comment 3 Jason Czech 2014-09-09 19:08:14 UTC
There have been some advancements, but no root cause identified.  By updating 19 packages and adding rd_NO_MULTIPATH to the kernel boot options I am able to get the system to boot properly.  These are the updated packages from the otherwise base RHEL6.4 install:

bfa-firmware-3.2.21.1-2.el6.noarch.rpm
device-mapper-multipath-0.4.9-72.el6_5.3.x86_64.rpm
device-mapper-multipath-libs-0.4.9-72.el6_5.3.x86_64.rpm
dracut-004-336.el6_5.2.noarch.rpm
dracut-fips-004-336.el6_5.2.noarch.rpm
dracut-kernel-004-336.el6_5.2.noarch.rpm
dracut-network-004-336.el6_5.2.noarch.rpm
initscripts-9.03.40-2.el6_5.3.x86_64.rpm
kernel-2.6.32-431.23.3.el6.x86_64.rpm
kernel-devel-2.6.32-431.23.3.el6.x86_64.rpm
kernel-doc-2.6.32-431.23.3.el6.noarch.rpm
kernel-firmware-2.6.32-431.23.3.el6.noarch.rpm
kernel-headers-2.6.32-431.23.3.el6.x86_64.rpm
kpartx-0.4.9-72.el6_5.3.x86_64.rpm
libgudev1-147-2.51.el6.x86_64.rpm
libgudev1-devel-147-2.51.el6.x86_64.rpm
libudev-147-2.51.el6.x86_64.rpm
libudev-devel-147-2.51.el6.x86_64.rpm
udev-147-2.51.el6.x86_64.rpm

I have an older system without these updates that is currently booted and has one inaccessible multipath device.  It does not show up in dmsetup output.  I'll attach output of multipath -v6 and dmsetup table from that system.

Comment 4 Jason Czech 2014-09-09 19:09:39 UTC
Created attachment 935899 [details]
Kraken "multipath -v6" and "dmsetup table" output

Comment 5 Ben Marzinski 2015-10-16 23:38:14 UTC
I know this one sort of fell through the cracks, but are you still seeing this? Looking at your last attachment, it doesn't appear that in this case hpwnetapp02_006 already exists.  Still, messages like

 device-mapper: create ioctl on hpwnetapp02_006 failed: Device or resource busy

almost always happen because one of the path devices IS currently in use.
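One way to look for a path device that is already claimed is to list its sysfs holders directory, which names the stacked block devices (such as dm-N) built on top of it. The sketch below assumes the standard /sys/block/DEVICE/holders layout; the function name and the sysfs_root parameter are hypothetical, the latter added only so the code can be exercised against a fake tree.

```python
import os


def path_holders(device, sysfs_root="/sys"):
    """Return the names of block devices stacked on top of `device`.

    `device` is a kernel name such as "sdb". An empty list means no
    stacked device holds it; it does NOT prove the device is free,
    because a mounted filesystem or a process holding it open does not
    appear under holders.
    """
    holders_dir = os.path.join(sysfs_root, "block", device, "holders")
    try:
        return sorted(os.listdir(holders_dir))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    # "sdb" is a placeholder; substitute one of the path devices.
    print(path_holders("sdb"))
```

If a path device reports a holder other than the expected multipath map, that holder is a likely candidate for whatever is keeping the device busy.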

Comment 7 Jason Czech 2015-10-20 20:56:05 UTC
I have not been able to test a more recent version of RHEL6 on these systems.  The problem as I described continued to exist until the workaround I stated in comment #3 fixed this issue.

Comment 9 Ben Marzinski 2015-10-21 21:46:50 UTC
Do you still have a system without the updates from comment #3? If so, could you run

# dmsetup info

as well as

# multipath -v6

if it's changed since what you posted in comment #4.

Comment 10 Jason Czech 2015-11-04 20:01:09 UTC
Unfortunately I don't have any systems without the updates to test with at this point since we've applied the patches/workaround to all our impacted systems.

Comment 12 Ben Marzinski 2015-11-05 16:10:13 UTC
Since you fixed this issue by updating to the RHEL-6.5 packages, it does seem very possible that whatever your issue is has already been fixed in a previous release. I haven't been able to reproduce this, and I'm leaning towards closing this with INSUFFICIENT_DATA.

Comment 13 Jason Czech 2015-11-05 21:08:46 UTC
Since patching per comment #3 the bug still exists if the rd_NO_MULTIPATH kernel boot option is not set.  The problem only occurred with LUNs from a pair of NetApp FAS6240 controllers running clustered OnTap.  LUNs from other vendors' storage controllers weren't impacted, so unless you have that specific hardware/storage OS to troubleshoot you might have trouble recreating the issue.  Unfortunately, I'm not able to reboot these production hosts at will to troubleshoot, I don't have a viable test host, and I can't afford to spend time on it since we've already found a workaround.

Comment 20 Lin Li 2015-12-22 08:06:08 UTC
Hi Jason,
Because there is no NetApp FAS6240 in our lab, could you help by providing test results on RHEL-6.8 once the patch is available?
thanks a lot!

Comment 21 Jason Czech 2016-01-19 14:05:41 UTC
Hi Lin,

When will the patch be available?  Is it also available in RHEL7.x?  Will I be able to apply a subset of packages (per Comment #3) or will I be required to upgrade all packages to RHEL6.8 revisions?  These systems operate under strict change control and applying a full set of upgraded packages could potentially be very disruptive.

Comment 22 Ben Marzinski 2016-01-19 15:54:48 UTC
There currently is no patch in the works. Without the ability to recreate this, and get information off a system when this is happening, this is going to be tricky to figure out what's going wrong.  Looking through the code around this area, I don't see anything that could cause this without one of the path devices actually being in use, and if a path device is in use, there's nothing that multipath can do. The kernel simply won't allow two systems to exclusively grab a device.

Comment 23 Jason Czech 2016-01-21 20:01:20 UTC
I agree and understand that it will be difficult to fix without being able to recreate the problem.  The systems that exhibit the behavior are in production so I don't have the liberty of using them for testing.

It would be difficult to imagine something is using the device since the problem happens during boot up before any of the devices are mounted.  We are using host based zoning so only one system has access to any given device.  The startup process aborts because the system can't find the path to one of the devices that is defined in /etc/fstab.  The particular device would change from one reboot to the next, but it would consistently only be one device that was unavailable.

Comment 25 Ben Marzinski 2016-08-12 16:15:52 UTC
Is there any more information on what's happening here? Are you still able to see this, and if so, are the devices that multipath is ignoring in use?

Comment 27 Red Hat Bugzilla 2023-09-14 02:46:12 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days