Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1445958

Summary:

Unable to boot from iSCSI SAN boot volume after enabling multipath in initramfs

Product:

Red Hat Enterprise Linux 6

Reporter:

shivamerla1 <shiva.krishna>

Component:

device-mapper-multipath

Assignee:

Ben Marzinski <bmarzins>

Status:

CLOSED WONTFIX

QA Contact:

Lin Li <lilin>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

6.8

CC:

agk, bmarzins, heinzm, lilin, msnitzer, prajnoha, rbalakri, rhandlin, shiva.krishna, zkabelac

Target Milestone:

Keywords:

OtherQA

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Clones:

1451852

Environment:

Last Closed:

2017-12-06 10:51:25 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1451852

Attachments:

Description	Flags
Screen shot after boot is hung.	none
sosreport before enabling multipath and rebuilding initramfs	none

Description shivamerla1 2017-04-26 22:10:59 UTC

Created attachment 1274424
Screen shot after boot is hung.

Description of problem:
We have a SAN-boot system that was installed without multipath enabled. After the system came up, we enabled multipathd and configured /etc/multipath.conf for Nimble storage devices. We then rebuilt the initramfs and used lsinitrd to verify that all the files multipathd needs are included:

[root@rtp-lenovo-centos68 ~]# lsinitrd | grep multipath
drwxr-xr-x   2 root     root            0 Apr 26 13:45 etc/multipath
-rw-------   1 root     root          282 Mar 24 13:59 etc/multipath/bindings
-rw-r--r--   1 root     root          877 Apr 26 13:43 etc/multipath.conf
-rw-r--r--   1 root     root         1012 Mar 22 19:48 etc/udev/rules.d/40-multipath.rules
-rwxr-xr-x   1 root     root       303288 Apr 26 13:45 lib64/libmultipath.so
drwxr-xr-x   2 root     root            0 Apr 26 13:45 lib64/multipath
-rwxr-xr-x   1 root     root         6592 Apr 26 13:45 lib64/multipath/libcheckcciss_tur.so
-rwxr-xr-x   1 root     root         8680 Apr 26 13:45 lib64/multipath/libcheckdirectio.so
-rwxr-xr-x   1 root     root         7904 Apr 26 13:45 lib64/multipath/libcheckemc_clariion.so
-rwxr-xr-x   1 root     root         6560 Apr 26 13:45 lib64/multipath/libcheckhp_sw.so
-rwxr-xr-x   1 root     root        13496 Apr 26 13:45 lib64/multipath/libcheckhp_tur.so
-rwxr-xr-x   1 root     root         8056 Apr 26 13:45 lib64/multipath/libcheckrdac.so
-rwxr-xr-x   1 root     root         5608 Apr 26 13:45 lib64/multipath/libcheckreadsector0.so
-rwxr-xr-x   1 root     root        12384 Apr 26 13:45 lib64/multipath/libchecktur.so
-rwxr-xr-x   1 root     root         9416 Apr 26 13:45 lib64/multipath/libprioalua.so
-rwxr-xr-x   1 root     root         4016 Apr 26 13:45 lib64/multipath/libprioconst.so
-rwxr-xr-x   1 root     root         5600 Apr 26 13:45 lib64/multipath/libprioemc.so
-rwxr-xr-x   1 root     root         6576 Apr 26 13:45 lib64/multipath/libpriohds.so
-rwxr-xr-x   1 root     root         5376 Apr 26 13:45 lib64/multipath/libpriohp_sw.so
-rwxr-xr-x   1 root     root         7632 Apr 26 13:45 lib64/multipath/libprioontap.so
-rwxr-xr-x   1 root     root         4400 Apr 26 13:45 lib64/multipath/libpriorandom.so
-rwxr-xr-x   1 root     root         5416 Apr 26 13:45 lib64/multipath/libpriordac.so
lrwxrwxrwx   1 root     root           14 Apr 26 13:45 lib64/multipath/libpriotpg_pref.so -> libprioalua.so
-rwxr-xr-x   1 root     root         7264 Apr 26 13:45 lib64/multipath/libprioweighted.so
-rwxr--r--   1 root     root        42192 Apr 26 13:45 lib/modules/2.6.32-573.el6.x86_64/kernel/drivers/md/dm-multipath.ko
-rwxr-xr-x   1 root     root          238 Jan 15  2010 pre-pivot/02multipathd-stop.sh
-rwxr-xr-x   1 root     root          202 Jul 24  2015 pre-trigger/02multipathd.sh
-rwxr-xr-x   1 root     root        19576 Apr 26 13:45 sbin/multipath
-rwxr-xr-x   1 root     root        74424 Apr 26 13:45 sbin/multipathd

But on reboot, soon after multipathd starts, the iSCSI sessions go into recovery, the path fails, and the boot hangs from that point on. A screenshot of the error is attached.

Before enabling multipath, we were able to successfully reboot the system multiple times without issues.
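A minimal bash sketch of the lsinitrd check described above: it reads an lsinitrd listing on standard input and reports which multipath files are absent. The function name check_mpath_initrd and the list of expected paths are assumptions taken from the listing in this report, not an official list from the dracut multipath module.

```shell
#!/bin/bash
# Hypothetical helper: read an lsinitrd listing on stdin, print each expected
# multipath file that is not present, and return 1 if any are missing.
# The path list is an assumption based on the listing in this report.
check_mpath_initrd() {
    local listing path missing=0
    listing=$(cat)
    for path in etc/multipath.conf \
                sbin/multipath \
                sbin/multipathd \
                lib64/libmultipath.so \
                drivers/md/dm-multipath.ko \
                pre-trigger/02multipathd.sh; do
        # Match when the last field (the file path) ends with $path, so that
        # "sbin/multipathd" is not mistaken for "sbin/multipath".
        if ! awk -v p="$path" '{ n = length($NF) - length(p); if (n >= 0 && substr($NF, n + 1) == p) found = 1 } END { exit !found }' <<<"$listing"; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the affected host (lsinitrd defaults to the running kernel's image):
#   lsinitrd | check_mpath_initrd && echo "multipath files present"
```

Run against the listing in Comment 7 (built without --add multipath), it would report every path except dm-multipath.ko as missing.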

Version-Release number of selected component (if applicable):
RHEL 6.8
2.6.32-573.el6.x86_64

How reproducible:
Consistently

Steps to Reproduce:
1. Install RHEL 6.8 on a SAN boot volume from a Nimble storage array (iSCSI).
2. Don't enable multipath during the install.
3. Once the system is up, enable the multipathd service and configure /etc/multipath.conf.
4. Rebuild the initramfs with: dracut -f --add multipath
5. Reboot the system. It hangs during boot, and the failed path is reported as offline.
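Steps 3 and 4 can be sketched in bash as follows. This is a hedged variant, not the reporter's exact procedure: it writes the multipath image to a separate file, as the reporter did in Comment 3, instead of overwriting the default initramfs with dracut -f, so the original image remains available as a fallback. The run and enable_mpath_initrd helpers, the DRY_RUN switch, and the -mpath file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: with DRY_RUN=1 (the default) only print each command;
# with DRY_RUN=0 actually execute it (as root on the RHEL 6 host).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 3-4 for kernel version $1 (defaults to the running kernel).
# Assumes /etc/multipath.conf has already been edited (see Comment 5).
enable_mpath_initrd() {
    local kver=${1:-$(uname -r)}
    local img="/boot/initramfs-${kver}-mpath.img"
    run chkconfig multipathd on
    run service multipathd start
    # Separate output image: the default initramfs is left untouched.
    run dracut --add multipath "$img" "$kver"
}

# Usage: review the printed commands first, then run them for real:
#   enable_mpath_initrd
#   DRY_RUN=0 enable_mpath_initrd
```

Booting the new image still requires an entry in /boot/grub/grub.conf whose initrd line points at the -mpath file; that step is not shown.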

Actual results:
A screenshot of the failure is attached. We are not able to collect any additional logs because the system won't boot, and the original initramfs was overwritten when we rebuilt it with multipath enabled.

Expected results:
The system should reboot normally and mount the root disk from the multipath device instead of /dev/sda.

Additional info:

Comment 2 Ben Marzinski 2017-04-27 20:34:20 UTC

Would it be possible for you to capture the boot messages from the serial console? The screenshot by itself doesn't give me much information.

Could you also create an sosreport of the system before you try to reboot it, and upload that?

Comment 3 shivamerla1 2017-04-28 15:38:00 UTC

Unfortunately, I am not able to collect the console logs because it's a remote system, but I have attached an sosreport taken before creating the new initramfs image.

We brought the system back up by mounting a snapshot of the original volume, but this time, when I enable multipath and rebuild the initramfs, dracut segfaults:

[root@~]# dracut --add multipath /boot/initramfs-2.6.32-573.el6.x86_64-mpath.img
/sbin/dracut: line 281: 30158 Segmentation fault      (core dumped) depmod -a -b "$initdir" $kernel
E: "depmod -a 2.6.32-573.el6.x86_64" failed.


Verbose mode output:

I: Installing /usr/share/dracut/modules.d/96insmodpost/insmodpost.sh
I: Installing /sbin/biosdevname
I: Installing /lib/udev/rules.d/71-biosdevname.rules
I: Installing /usr/share/dracut/modules.d/97biosdevname/parse-biosdevname.sh
I: Installing /bin/mount
I: Installing /bin/mknod
I: Installing /bin/mkdir
I: Installing /sbin/killall5
I: Installing /bin/sleep
I: Installing /usr/sbin/chroot
I: Installing /lib64/libacl.so.1.1.0
I: Installing /bin/ls
I: Installing /usr/bin/flock
I: Installing /bin/cp
I: Installing /bin/mv
I: Installing /bin/dmesg
I: Installing /bin/rm
I: Installing /bin/ln
I: Installing /usr/bin/mkfifo
I: Installing /lib64/libnih.so.1.0.0
I: Installing /lib64/libaudit.so.1.0.0
I: Installing /sbin/reboot
I: Installing /usr/bin/less
I: Installing /usr/share/dracut/modules.d/99base/init
I: Installing /usr/share/dracut/modules.d/99base/initqueue
I: Installing /usr/share/dracut/modules.d/99base/loginit
I: Installing /sbin/switch_root
I: Installing /usr/share/dracut/modules.d/99base/dracut-lib.sh
I: Installing /usr/share/dracut/modules.d/99base/parse-hostname.sh
I: Installing /usr/share/dracut/modules.d/99base/parse-root-opts.sh
I: Installing /usr/share/dracut/modules.d/99base/parse-blacklist.sh
I: Installing /usr/share/dracut/modules.d/99base/selinux-loadpolicy.sh
/sbin/dracut: line 281: 23349 Segmentation fault      (core dumped) depmod -a -b "$initdir" $kernel
E: "depmod -a 2.6.32-573.el6.x86_64" failed.

Comment 4 shivamerla1 2017-04-28 15:38:43 UTC

Created attachment 1274968
sosreport before enabling multipath and rebuilding initramfs

Comment 5 shivamerla1 2017-04-28 15:39:14 UTC

Here are the multipath.conf settings we use:

defaults {
    user_friendly_names yes
    find_multipaths     no
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    device {
        vendor  ".*"
        product ".*"
    }
}
blacklist_exceptions {
    device {
        vendor  "Nimble"
        product "Server"
    }
}
devices {
    device {
        vendor               "Nimble"
        product              "Server"
        path_grouping_policy group_by_prio
        prio                 "alua"
        hardware_handler     "1 alua"
        path_selector        "service-time 0"
        path_checker         tur
        features             "1 queue_if_no_path"
        failback             immediate
        rr_weight            uniform
        rr_min_io_rq         1
        dev_loss_tmo         infinity
        fast_io_fail_tmo     1
        no_path_retry        30
    }
}

Comment 6 Ben Marzinski 2017-05-02 23:24:41 UTC

Looking at the screenshot, this appears to be happening in late boot, so you are running off of the actual filesystem.

If you run

# chkconfig multipathd off

before rebooting after remaking your initramfs, multipathd should not start up during late boot.  Does this fix your problem?  I have no idea what's causing the dracut segfault. Have you tried it without the "--add multipath" to see if that works?

Comment 7 shivamerla1 2017-05-03 19:47:35 UTC

I tried disabling multipathd at boot and rebuilding the initrd, but I see the same issue: the system doesn't boot.

[root@rtp-lenovo-centos68 ~]# multipath -ll
May 03 15:16:33 | DM multipath kernel driver not loaded
May 03 15:16:33 | /etc/multipath.conf does not exist, blacklisting all devices.
May 03 15:16:33 | A sample multipath.conf file is located at
May 03 15:16:33 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
May 03 15:16:33 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
May 03 15:16:33 | DM multipath kernel driver not loaded

Edit multipath.conf and start multipathd.

[root@rtp-lenovo-centos68 ~]# vim /etc/multipath.conf
[root@rtp-lenovo-centos68 ~]# 

[root@rtp-lenovo-centos68 ~]# service multipathd start
Starting multipathd daemon:                                [  OK  ]
[root@rtp-lenovo-centos68 ~]# 

Rebuilt the initramfs without the --add multipath option and verified that multipathd is not in the initramfs:

[root@rtp-lenovo-centos68 ~]# dracut -f
[root@rtp-lenovo-centos68 ~]# 
[root@rtp-lenovo-centos68 ~]# lsinitrd | grep multipath
-rwxr--r--   1 root     root        42192 May  3 15:18 lib/modules/2.6.32-573.el6.x86_64/kernel/drivers/md/dm-multipath.ko


[root@rtp-lenovo-centos68 ~]# ls -l /dev/mapper/
total 0
crw-rw----. 1 root root 10, 58 May  3 15:10 control
lrwxrwxrwx. 1 root root      7 May  3 15:10 vg_rtplenovocentos68-lv_home -> ../dm-2
lrwxrwxrwx. 1 root root      7 May  3 15:10 vg_rtplenovocentos68-lv_root -> ../dm-0
lrwxrwxrwx. 1 root root      7 May  3 15:10 vg_rtplenovocentos68-lv_swap -> ../dm-1
[root@rtp-lenovo-centos68 ~]# 
[root@rtp-lenovo-centos68 ~]# 
[root@rtp-lenovo-centos68 ~]# chkconfig --list | grep multipath
multipathd     	0:off	1:off	2:off	3:off	4:off	5:off	6:off
[root@rtp-lenovo-centos68 ~]# 
[root@rtp-lenovo-centos68 ~]# reboot


Even with this, the system does not boot. I will try to drop into the dracut shell and debug.

Comment 8 Ben Marzinski 2017-05-08 23:40:02 UTC

Can you possibly increase fast_io_fail_tmo? I'm worried that the SCSI layer isn't retrying long enough before failing the I/O to a temporarily down path. If this is your root filesystem, it may be that iSCSI isn't able to run whatever needs to be run to restore access to the device (since that may require accessing the root filesystem, which is down).

Hopefully, if it can just retry for longer, whatever is causing the device to be temporarily down will resolve.

So, could you see if

fast_io_fail_tmo 15

resolves the issue. That might be overkill, but I'd rather it waited too long than not long enough, for debugging purposes.
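One way to apply the change suggested here is an in-place edit of /etc/multipath.conf. The bash function below is a hypothetical helper (set_fast_io_fail_tmo is not part of the multipath tools); it rewrites the numeric value on every fast_io_fail_tmo line and keeps a .bak copy. It assumes the one-setting-per-line layout shown in Comment 5.

```shell
#!/bin/bash
# Hypothetical helper: set every "fast_io_fail_tmo N" line in a
# multipath.conf-style file ($1) to a new number of seconds ($2),
# keeping the original as $1.bak. Uses basic regular expressions only.
set_fast_io_fail_tmo() {
    local conf=$1 value=$2
    # Accept only a plain non-negative integer as the new value.
    case $value in
        ''|*[!0-9]*) echo "value must be a number of seconds" >&2; return 2 ;;
    esac
    # \1 keeps the original indentation, keyword and spacing.
    sed -i.bak "s/^\([[:space:]]*fast_io_fail_tmo[[:space:]]\{1,\}\)[0-9]\{1,\}/\1${value}/" "$conf"
}

# Usage (as root):
#   set_fast_io_fail_tmo /etc/multipath.conf 15
```

Because /etc/multipath.conf is copied into the initramfs (it appears in the lsinitrd listing in the description), the image has to be rebuilt after the edit for early boot to see the new value; multipathd -k"reconfigure" applies it to the running daemon.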

Comment 9 shivamerla1 2017-05-17 00:59:30 UTC

Thanks. The system boots fine with fast_io_fail_tmo set to 5. Does this value affect iSCSI as well? We already have replacement_timeout set to 5 for iSCSI in iscsid.conf.

Comment 10 Ben Marzinski 2017-05-17 15:22:14 UTC

fast_io_fail_tmo affects the recovery_tmo sysfs value, which is, I'm pretty sure, also set by the replacement_timeout option. So setting it to 5 in both places certainly can't hurt, but if fast_io_fail_tmo is set in multipath.conf, it should not be necessary to set replacement_timeout in iscsid.conf (although iscsid may overwrite the sysfs value later on, so setting both to the same value is clearly the safest option).

I trust you would like me to change the fast_io_fail_tmo value in the builtin configuration for Nimble storage devices, correct?
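Since fast_io_fail_tmo and iscsid's replacement_timeout both end up in the recovery_tmo sysfs value, and iscsid may overwrite it later, it can help to check what actually took effect. A bash sketch, assuming the usual locations /sys/class/iscsi_session/sessionN/recovery_tmo and /etc/iscsi/iscsid.conf: it compares each session's recovery_tmo against node.session.timeo.replacement_timeout. The function name is hypothetical, and both paths are parameters so the check can be run against copies.

```shell
#!/bin/bash
# Hypothetical check: compare each iSCSI session's recovery_tmo in sysfs with
# the replacement_timeout set in iscsid.conf. Returns 1 on any mismatch.
check_iscsi_timeouts() {
    local sysfs=${1:-/sys/class/iscsi_session}
    local conf=${2:-/etc/iscsi/iscsid.conf}
    local want f got rc=0
    # Last uncommented "node.session.timeo.replacement_timeout = N" line wins.
    want=$(sed -n 's/^[[:space:]]*node\.session\.timeo\.replacement_timeout[[:space:]]*=[[:space:]]*\([0-9]*\).*/\1/p' "$conf" | tail -n 1)
    for f in "$sysfs"/session*/recovery_tmo; do
        [ -r "$f" ] || continue   # no sessions: the glob stays unexpanded
        got=$(cat "$f")
        if [ "$got" = "$want" ]; then
            echo "ok: $f = $got"
        else
            echo "MISMATCH: $f = $got (iscsid.conf: ${want:-unset})"
            rc=1
        fi
    done
    return "$rc"
}

# Usage on the host:
#   check_iscsi_timeouts
```

Note that iscsid.conf only supplies defaults for node records created after it is changed; existing records typically need iscsiadm -m node -o update -n node.session.timeo.replacement_timeout -v 5 before the next login.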

Comment 11 shivamerla1 2017-05-17 16:13:31 UTC

Ben, our QA team wants to run more tests with this value, including on FC. I will update the bug once testing is done, and then we can update the default hardware table. Thanks again for your help.

Comment 12 Ben Marzinski 2017-05-18 14:35:07 UTC

There is plenty of time before the next RHEL-6 release, but I'd just like to point out that the window for getting this change into RHEL-7.4 (where the same config exists) is pretty small now. So, if it is important to you to get this into RHEL-7.4 (I assume that the same problem is possible there, although changes to iscsi may make it less of an issue) it would be helpful if you could QA this quickly.

Comment 13 shivamerla1 2017-05-23 22:28:39 UTC

Hi Ben, if there is still chance, can you push this change for RHEL 7.4?

Comment 14 Ben Marzinski 2017-05-24 18:03:01 UTC

Maybe. I'll try to line up all the ACKs.

Comment 15 shivamerla1 2017-05-24 19:22:14 UTC

Thanks.

Comment 16 Lin Li 2017-07-12 09:23:18 UTC

Hello shivamerla1,
Because we don't have Nimble storage, could you provide test results once the fixed package is available?
Thanks.

Comment 17 Jan Kurik 2017-12-06 10:51:25 UTC

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

Comment 18 Red Hat Bugzilla 2023-09-14 03:57:02 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days