
Bug 907360

Summary: dracut does not create the mpath node after attaching the FCoE Luns
Product: Red Hat Enterprise Linux 7
Reporter: Xiaowei Li <xiaoli>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.0
CC: agk, bmarzins, dracut-maint-list, harald, heinzm, jcastillo, msnitzer, prajnoha, qcai, ruyang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: device-mapper-multipath-0.4.9-46.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-13 11:50:21 UTC
Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 874073    
Attachments:
Description Flags
dracut.log none

Description Xiaowei Li 2013-02-04 08:17:26 UTC
Description of problem:

Test environment:
1. Emulex One Connect 10Gb CNA (kernel module: lpfc)
2. FCoE SAN boot

After installing the OS to the lpfc FCoE LUNs and rebooting, dracut dropped to the shell because the multipath node was not created.

Here is some investigation information:
==== The FCoE LUNs are attached, but the multipath device is not created.
dracut> Could not boot
dracut:/# ls /dev/mapper/
control
dracut:/# multipath -ll
dracut:/# ls /dev/sd*
/dev/sda  /dev/sdb  /dev/sdc  /dev/sdd	/dev/sde  /dev/sdf  /dev/sdg  /dev/sdh
dracut:/# ps -ef|grep multi
root       183     1  0 02:38 ?        00:00:00 multipathd -B
root       639   630  0 02:46 ttyS1    00:00:00 grep multi

==== According to the dracut debug log, 'multipathd -B' was started before the lpfc LUNs were scanned. I therefore reproduced the issue in the dracut environment with the following steps.

>> multipathd -B is running >>
1. modprobe -r lpfc
2. modprobe lpfc
3. multipath -ll
>>> the multipath node is not created >>>

I repeated the above steps many times, but the multipath nodes were never created after running 'modprobe lpfc'.

I only saw this issue with FCoE SAN boot, not with FC SAN boot. So I guess it may be caused by some asynchronous mechanism. If you need the server to reproduce this issue, please let me know.

Version-Release number of selected component (if applicable):
dracut-024-18.git20130102.el7.x86_64.rpm   
device-mapper-multipath-0.4.9-42.el7.x86_64.rpm  
kernel-3.7.0-0.33.el7.x86_64.rpm

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Fails to boot from the lpfc FCoE LUN.

Expected results:
lpfc FCoE SAN boot works well


Additional info:

Comment 1 Xiaowei Li 2013-02-04 08:20:10 UTC
Created attachment 692652
dracut.log

Comment 3 Harald Hoyer 2013-02-11 12:35:21 UTC
reassigning to device-mapper-multipath

Comment 13 Xiaowei Li 2013-03-15 06:27:06 UTC
The server storageqe-17.rhts.eng.bos.redhat.com is configured to boot from an lpfc-fcoe EMC cx400 array.

Reproduced the issue with kernel-3.7.0-0.36.el7.x86_64,
but didn't see the issue with kernel-3.8.0-0.40.el7.x86_64.

I have loaned the server to you.

Comment 17 Ben Marzinski 2013-03-27 22:24:17 UTC
So there is nothing in the code that forces the pthread condition structure to get initialized before the waiter waits on it, other than the fact that the thread that initializes it gets started a number of blocking calls before the thread that waits on it is started. Also, the thread that initializes it does so pretty much right away, while the other thread doesn't wait on it till a while in.

However, since I changed the pthread condition structure and the pthread mutex structure to be statically initialized, I haven't been able to reproduce the issue after about an hour of repeated reboots.  Previously, I have always reproduced it within half an hour.  If it doesn't reproduce by tomorrow morning, I'm pretty sure that I've found the issue.

To test, I wrote a small program that waited on a pthread_cond_t before initializing it and sending the signal, and it behaves identically to multipathd when it's experiencing this issue.  The signaler calls pthread_cond_signal(), which returns without error, but the waiter is never awoken.

I suppose I could dig into the pthread code and figure out how to check if a condition structure has been initialized or not, but multipath needs to force this structure to get initialized before it's used at any rate.
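
For illustration, here is a minimal sketch of the static-initialization approach described above. It is not the actual multipathd patch: the names ready_lock, ready_cond, ready, waiter, signaler, and run_demo are hypothetical, and the predicate flag is an assumption added so the example is correct regardless of which thread runs first. Because the mutex and condition variable are initialized at compile time, no thread can ever wait on an uninitialized structure.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names; none of these come from the multipathd source. */
/* Static initializers make both objects valid before any thread starts. */
static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int ready = 0; /* predicate protected by ready_lock */

/* Waits until the signaler has set the predicate. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ready_lock);
    while (!ready) /* loop guards against spurious wakeups */
        pthread_cond_wait(&ready_cond, &ready_lock);
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}

/* Sets the predicate and wakes the waiter. */
static void *signaler(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ready_lock);
    ready = 1;
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}

/* Starts both threads, joins them, and returns the final predicate value. */
int run_demo(void)
{
    pthread_t w, s;
    int rc;

    rc = pthread_create(&w, NULL, waiter, NULL);
    assert(rc == 0);
    rc = pthread_create(&s, NULL, signaler, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(s, NULL);
    pthread_join(w, NULL);
    return ready;
}
```

Calling run_demo() from a main() and linking with -pthread should return once both threads have finished. The thread start order does not matter here, since the waiter checks the predicate under the lock before sleeping.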

Comment 18 Ben Marzinski 2013-03-28 15:20:01 UTC
something went wrong with my rebooter script in the middle of the night, but it wasn't that the node hung in dracut. The node booted fine. For some reason, I just wasn't able to ping it for half an hour, so my script quit.  You can have the machine back. I'll be making a build with the patch today.

Comment 19 Ben Marzinski 2013-03-28 15:43:10 UTC
*** Bug 895800 has been marked as a duplicate of this bug. ***

Comment 20 Xiaowei Li 2013-03-29 02:04:43 UTC
(In reply to comment #18)
> something went wrong with my rebooter script in the middle of the night, but
> it wasn't that the node hung in dracut. The node booted fine. For some
> reason, I just wasn't able to ping it for half an hour, so my script quit. 
> You can have the machine back. I'll be making a build with the patch today.

thanks a lot.

Comment 21 Xiaowei Li 2013-03-29 11:06:14 UTC
verified with device-mapper-multipath-0.4.9-46.el7

1. install RHEL-7.0-20130313.n.2 
2. update to device-mapper-multipath-0.4.9-46.el7
3. reboot, need to manually scan the multipath after dropping to dracut shell.
4. dracut --force --add multipath --include /etc/multipath
5. reboot
OS can reboot successfully without manual multipath scanning

Comment 23 Xiaowei Li 2013-04-03 03:32:54 UTC
(In reply to comment #22)
> I'm confused.  Why did this get moved back to assigned?  Comment #21 makes
> it sound like it works once you updated the initramfs.

I set the wrong status by mistake. sorry for this. I have changed it to verified.

Comment 24 Ludek Smid 2014-06-13 11:50:21 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.