Bug 1445958
| Summary: | Unable to boot from iSCSI SAN boot volume after enabling multipath in initramfs | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | shivamerla1 <shiva.krishna> | ||||||
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> | ||||||
| Status: | CLOSED WONTFIX | QA Contact: | Lin Li <lilin> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 6.8 | CC: | agk, bmarzins, heinzm, lilin, msnitzer, prajnoha, rbalakri, rhandlin, shiva.krishna, zkabelac | ||||||
| Target Milestone: | rc | Keywords: | OtherQA | ||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | |||||||||
| : | 1451852 (view as bug list) | Environment: | |||||||
| Last Closed: | 2017-12-06 10:51:25 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1451852 | ||||||||
| Attachments: |
Would it be possible for you to capture the boot messages from the serial console? The screenshot by itself doesn't give me much information. Could you also create an sosreport of the system before you try to reboot it, and upload that?

Unfortunately, I am not able to collect the console logs because it is a remote system, but I have attached the sosreport taken before creating the new initramfs image. We brought the system up by mounting a snapshot of the original volume, but this time, when I enable multipath and rebuild the initramfs, I get a segfault from dracut.

[root@~]# dracut --add multipath /boot/initramfs-2.6.32-573.el6.x86_64-mpath.img
/sbin/dracut: line 281: 30158 Segmentation fault (core dumped) depmod -a -b "$initdir" $kernel
E: "depmod -a 2.6.32-573.el6.x86_64" failed.

In verbose mode:

I: Installing /usr/share/dracut/modules.d/96insmodpost/insmodpost.sh
I: Installing /sbin/biosdevname
I: Installing /lib/udev/rules.d/71-biosdevname.rules
I: Installing /usr/share/dracut/modules.d/97biosdevname/parse-biosdevname.sh
I: Installing /bin/mount
I: Installing /bin/mknod
I: Installing /bin/mkdir
I: Installing /sbin/killall5
I: Installing /bin/sleep
I: Installing /usr/sbin/chroot
I: Installing /lib64/libacl.so.1.1.0
I: Installing /bin/ls
I: Installing /usr/bin/flock
I: Installing /bin/cp
I: Installing /bin/mv
I: Installing /bin/dmesg
I: Installing /bin/rm
I: Installing /bin/ln
I: Installing /usr/bin/mkfifo
I: Installing /lib64/libnih.so.1.0.0
I: Installing /lib64/libaudit.so.1.0.0
I: Installing /sbin/reboot
I: Installing /usr/bin/less
I: Installing /usr/share/dracut/modules.d/99base/init
I: Installing /usr/share/dracut/modules.d/99base/initqueue
I: Installing /usr/share/dracut/modules.d/99base/loginit
I: Installing /sbin/switch_root
I: Installing /usr/share/dracut/modules.d/99base/dracut-lib.sh
I: Installing /usr/share/dracut/modules.d/99base/parse-hostname.sh
I: Installing /usr/share/dracut/modules.d/99base/parse-root-opts.sh
I: Installing /usr/share/dracut/modules.d/99base/parse-blacklist.sh
I: Installing /usr/share/dracut/modules.d/99base/selinux-loadpolicy.sh
/sbin/dracut: line 281: 23349 Segmentation fault (core dumped) depmod -a -b "$initdir" $kernel
E: "depmod -a 2.6.32-573.el6.x86_64" failed.

Created attachment 1274968 [details]
sosreport before enabling multipath and rebuilding initramfs
Here are the multipath.conf settings we use.
defaults {
    user_friendly_names yes
    find_multipaths no
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    device {
        vendor ".*"
        product ".*"
    }
}
blacklist_exceptions {
    device {
        vendor "Nimble"
        product "Server"
    }
}
devices {
    device {
        vendor "Nimble"
        product "Server"
        path_grouping_policy group_by_prio
        prio "alua"
        hardware_handler "1 alua"
        path_selector "service-time 0"
        path_checker tur
        features "1 queue_if_no_path"
        failback immediate
        rr_weight uniform
        rr_min_io_rq 1
        dev_loss_tmo infinity
        fast_io_fail_tmo 1
        no_path_retry 30
    }
}
Looking at the screenshot, this appears to be happening in late boot, so you are running off of the actual filesystem. If you run

# chkconfig multipathd off

before rebooting after remaking your initramfs, multipathd should not start up during late boot. Does this fix your problem? I have no idea what's causing the dracut segfault. Have you tried it without the "--add multipath" to see if that works?

I tried disabling multipathd on boot and rebuilding the initrd, but the issue is the same: the system doesn't boot.

[root@rtp-lenovo-centos68 ~]# multipath -ll
May 03 15:16:33 | DM multipath kernel driver not loaded
May 03 15:16:33 | /etc/multipath.conf does not exist, blacklisting all devices.
May 03 15:16:33 | A sample multipath.conf file is located at
May 03 15:16:33 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
May 03 15:16:33 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
May 03 15:16:33 | DM multipath kernel driver not loaded

Edit multipath.conf and start multipathd.

[root@rtp-lenovo-centos68 ~]# vim /etc/multipath.conf
[root@rtp-lenovo-centos68 ~]#
[root@rtp-lenovo-centos68 ~]# service multipathd start
Starting multipathd daemon: [ OK ]
[root@rtp-lenovo-centos68 ~]#

Rebuilt the initramfs without the "--add multipath" option and verified that multipathd is not in the initramfs.

[root@rtp-lenovo-centos68 ~]# dracut -f
[root@rtp-lenovo-centos68 ~]#
[root@rtp-lenovo-centos68 ~]# lsinitrd | grep multipath
-rwxr--r-- 1 root root 42192 May 3 15:18 lib/modules/2.6.32-573.el6.x86_64/kernel/drivers/md/dm-multipath.ko
[root@rtp-lenovo-centos68 ~]# ls -l /dev/mapper/
total 0
crw-rw----. 1 root root 10, 58 May 3 15:10 control
lrwxrwxrwx. 1 root root 7 May 3 15:10 vg_rtplenovocentos68-lv_home -> ../dm-2
lrwxrwxrwx. 1 root root 7 May 3 15:10 vg_rtplenovocentos68-lv_root -> ../dm-0
lrwxrwxrwx. 1 root root 7 May 3 15:10 vg_rtplenovocentos68-lv_swap -> ../dm-1
[root@rtp-lenovo-centos68 ~]#
[root@rtp-lenovo-centos68 ~]#
[root@rtp-lenovo-centos68 ~]# chkconfig --list | grep multipath
multipathd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@rtp-lenovo-centos68 ~]#
[root@rtp-lenovo-centos68 ~]# reboot

Even with this, the system is not booting. I will try to drop into the dracut shell and debug.

Can you possibly increase fast_io_fail_tmo? I'm worried that what's happening is that the scsi layer isn't trying long enough before failing the IO to a temporarily down path. If this is your root filesystem, it may be that iscsi isn't able to run whatever needs to be run to restore access to the device (since that may require accessing the root filesystem, which is down). Hopefully, if it can just retry for longer, whatever is causing the device to be temporarily down will resolve. So, could you see if

fast_io_fail_tmo 15

resolves the issue? That might be overkill, but I'd rather it waited too long than not long enough, for debugging purposes. Thanks.

The system is booting fine with a fast_io_fail_tmo of 5. Does this value affect iSCSI as well? We already have replacement_timeout set to 5 for iSCSI in iscsid.conf.

fast_io_fail_tmo affects the recovery_tmo sysfs value, which is, I'm pretty sure, also set by the replacement_timeout option. So, setting it to 5 in both places certainly can't hurt, but if fast_io_fail_tmo is set in multipath.conf, it should not be necessary to set replacement_timeout in iscsid.conf (although it is possible that iscsid may overwrite the sysfs value later on, so setting both to the same value is clearly the safest option). I trust you would like me to change the fast_io_fail_tmo value in the builtin configuration for Nimble storage devices, correct?

Ben, the QA teams want to perform more tests with this value with FC as well. I will update the bug once the testing is done and we can update the default hardware table. Thanks again for your help.
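
For anyone who wants to confirm that the two timeouts discussed above agree, here is a minimal shell sketch. It is not taken from this bug: the iscsid.conf key name (node.session.timeo.replacement_timeout) and the sysfs location (/sys/class/iscsi_session/session*/recovery_tmo) are assumptions based on open-iscsi, the helper names are hypothetical, and the multipath.conf parser only picks up the first fast_io_fail_tmo line it finds rather than resolving per-device overrides.

```shell
#!/bin/sh
# Sketch only: helpers for comparing the two timeouts discussed above.
# The key name and sysfs path are assumptions, not taken from this bug.

# Print the first fast_io_fail_tmo value in a multipath.conf-style file.
mpath_fast_io_fail() {
    awk '$1 == "fast_io_fail_tmo" { print $2; exit }' "$1"
}

# Print node.session.timeo.replacement_timeout from an iscsid.conf-style
# file, ignoring commented-out lines.
iscsi_replacement_timeout() {
    sed -n 's/^[[:space:]]*node\.session\.timeo\.replacement_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only when both files set the same (non-empty) value.
timeouts_match() {
    m=$(mpath_fast_io_fail "$1")
    i=$(iscsi_replacement_timeout "$2")
    [ -n "$m" ] && [ "$m" = "$i" ]
}

# Show the value the kernel currently holds for each iSCSI session.
show_recovery_tmo() {
    for f in /sys/class/iscsi_session/session*/recovery_tmo; do
        if [ -r "$f" ]; then
            echo "$f: $(cat "$f")"
        fi
    done
}
```

A typical call would be timeouts_match /etc/multipath.conf /etc/iscsi/iscsid.conf followed by show_recovery_tmo, assuming those default file locations apply on the system being checked.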

There is plenty of time before the next RHEL-6 release, but I'd just like to point out that the window for getting this change into RHEL-7.4 (where the same config exists) is pretty small now. So, if it is important to you to get this into RHEL-7.4 (I assume that the same problem is possible there, although changes to iscsi may make it less of an issue) it would be helpful if you could QA this quickly.

Hi Ben, if there is still a chance, can you push this change for RHEL 7.4?

Maybe. I'll try to line up all the ACKs. Thanks.

Hello shivamerla1,
Because we don't have Nimble storage, could you provide test results once the fixed package is available? Thanks.

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

Created attachment 1274424 [details]
Screen shot after boot is hung.

Description of problem:
We had a SAN boot system, installed without multipath enabled. After the system was up, we enabled multipathd and configured settings in /etc/multipath.conf for Nimble storage devices. We rebuilt the initramfs and used lsinitrd to verify that all files needed by multipathd were included.

[root@rtp-lenovo-centos68 ~]# lsinitrd | grep multipath
drwxr-xr-x 2 root root 0 Apr 26 13:45 etc/multipath
-rw------- 1 root root 282 Mar 24 13:59 etc/multipath/bindings
-rw-r--r-- 1 root root 877 Apr 26 13:43 etc/multipath.conf
-rw-r--r-- 1 root root 1012 Mar 22 19:48 etc/udev/rules.d/40-multipath.rules
-rwxr-xr-x 1 root root 303288 Apr 26 13:45 lib64/libmultipath.so
drwxr-xr-x 2 root root 0 Apr 26 13:45 lib64/multipath
-rwxr-xr-x 1 root root 6592 Apr 26 13:45 lib64/multipath/libcheckcciss_tur.so
-rwxr-xr-x 1 root root 8680 Apr 26 13:45 lib64/multipath/libcheckdirectio.so
-rwxr-xr-x 1 root root 7904 Apr 26 13:45 lib64/multipath/libcheckemc_clariion.so
-rwxr-xr-x 1 root root 6560 Apr 26 13:45 lib64/multipath/libcheckhp_sw.so
-rwxr-xr-x 1 root root 13496 Apr 26 13:45 lib64/multipath/libcheckhp_tur.so
-rwxr-xr-x 1 root root 8056 Apr 26 13:45 lib64/multipath/libcheckrdac.so
-rwxr-xr-x 1 root root 5608 Apr 26 13:45 lib64/multipath/libcheckreadsector0.so
-rwxr-xr-x 1 root root 12384 Apr 26 13:45 lib64/multipath/libchecktur.so
-rwxr-xr-x 1 root root 9416 Apr 26 13:45 lib64/multipath/libprioalua.so
-rwxr-xr-x 1 root root 4016 Apr 26 13:45 lib64/multipath/libprioconst.so
-rwxr-xr-x 1 root root 5600 Apr 26 13:45 lib64/multipath/libprioemc.so
-rwxr-xr-x 1 root root 6576 Apr 26 13:45 lib64/multipath/libpriohds.so
-rwxr-xr-x 1 root root 5376 Apr 26 13:45 lib64/multipath/libpriohp_sw.so
-rwxr-xr-x 1 root root 7632 Apr 26 13:45 lib64/multipath/libprioontap.so
-rwxr-xr-x 1 root root 4400 Apr 26 13:45 lib64/multipath/libpriorandom.so
-rwxr-xr-x 1 root root 5416 Apr 26 13:45 lib64/multipath/libpriordac.so
lrwxrwxrwx 1 root root 14 Apr 26 13:45 lib64/multipath/libpriotpg_pref.so -> libprioalua.so
-rwxr-xr-x 1 root root 7264 Apr 26 13:45 lib64/multipath/libprioweighted.so
-rwxr--r-- 1 root root 42192 Apr 26 13:45 lib/modules/2.6.32-573.el6.x86_64/kernel/drivers/md/dm-multipath.ko
-rwxr-xr-x 1 root root 238 Jan 15 2010 pre-pivot/02multipathd-stop.sh
-rwxr-xr-x 1 root root 202 Jul 24 2015 pre-trigger/02multipathd.sh
-rwxr-xr-x 1 root root 19576 Apr 26 13:45 sbin/multipath
-rwxr-xr-x 1 root root 74424 Apr 26 13:45 sbin/multipathd

But on reboot, soon after multipathd starts, the iSCSI sessions go into recovery, the path is failed, and the boot hangs from that point on. The screenshot with the error is attached. Before enabling multipath, we were able to reboot the system successfully multiple times without issues.

Version-Release number of selected component (if applicable):
RHEL 6.8
2.6.32-573.el6.x86_64

How reproducible:
Consistently

Steps to Reproduce:
1. Install RHEL 6.8 on a SAN boot volume from Nimble storage arrays (iSCSI).
2. Don't enable multipath during install.
3. Once the system is up, enable the multipathd service and add the settings to multipath.conf.
4. Rebuild the initramfs using the command ( dracut -f --add multipath ).
5. Reboot the system. It hangs during boot, and a path failure is reported with the path in offline state.

Actual results:
Screenshot attached with the failure. We are not able to collect any additional logs, as the system won't boot. We overwrote the original initramfs after enabling multipath.

Expected results:
The system should reboot fine and mount the root disk on the multipath device instead of /dev/sda.

Additional info:
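
Because the original initramfs was overwritten in step 4, recovery required mounting a snapshot of the volume. A hedged shell sketch of a safer variant of that step follows: it keeps a copy of the existing image and builds the multipath-enabled image under a separate name, as was done later in this report. The helper names and the ".orig" suffix are hypothetical, and booting the new image would still need a boot loader entry that points at it, which is not shown.

```shell
#!/bin/sh
# Hypothetical helper: keep a copy of an initramfs image before rebuilding.
# Prints the backup path and never overwrites an existing backup.
backup_initramfs() {
    img=$1
    bak="${img}.orig"
    [ -f "$img" ] || { echo "no such image: $img" >&2; return 1; }
    if [ ! -e "$bak" ]; then
        cp -p "$img" "$bak" || return 1
    fi
    echo "$bak"
}

# Hypothetical helper: build a multipath-enabled image under a new name,
# leaving the image that currently boots untouched.
# Assumes the /boot/initramfs-<kernel version>.img naming seen in this report.
build_mpath_initramfs() {
    kver=${1:-$(uname -r)}
    dracut --add multipath "/boot/initramfs-${kver}-mpath.img" "$kver"
}
```

For example, one might run backup_initramfs "/boot/initramfs-$(uname -r).img" and then build_mpath_initramfs, and only replace the default image after the new one has booted successfully.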