Red Hat Bugzilla – Bug 1351430
System fails to boot and drops into emergency shell after failing to mount /boot on multipath device
Last modified: 2016-11-04 04:20:11 EDT
Description of problem:
A UCS SAN-boot system fails to boot and drops into the emergency shell. We believe this is caused by the patch linked below. In our case, the standby path is added first, and all further I/O is blocked until an active path is added to the multipath map. Unfortunately, multipath delays adding active paths to the map until udev processing is complete. By that time, a /dev/sd path has been mounted as root, so any further multipath reloads fail, leaving the map stuck with only the standby path.

Version-Release number of selected component (if applicable):
7.2
[root@artemis ~]# uname -r
3.10.0-327.22.2.el7.x86_64

How reproducible:
Consistently, every time.

Steps to Reproduce:
1. Configure the UCS system to boot from a Nimble volume.
2. On reboot, all paths are discovered, but the multipath map is created with only the standby path and all other paths are orphaned.
3. The system fails to boot.

Actual results:
The system fails to boot.

Expected results:
All active paths are added to the map quickly after discovery, and the system boots.
Additional info:
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: Direct-Access Nimble Server 1.0 PQ: 0 ANSI: 5
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: supports implicit TPGS
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: port group 02 rel port 07
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: rtpg failed with 8000002
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: port group 02 state S non-preferred supports tolusna
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: Attached
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: sd 1:0:1:0: [sdb] 524288000 512-byte logical blocks: (268 GB/250 GiB)
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb: add path (uevent)
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb: spurious uevent, path already in pathvec
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: load table [0 524288000 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 1 1 8:16 20]
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: event checker started
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb [8:16]: path added to devmap mpatha
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sda: add path (uevent)
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdf: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdh: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdh: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sde: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sde: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdc: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdc: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdd: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdd: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdg: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdg: spurious uevent, path already in pathvec
Jun 29 23:05:43 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Job dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device/start timed out.
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device.
-- Subject: Unit dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device has failed.
--
-- The result is timeout.
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Dependency failed for /boot.
-- The start-up result is done.
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Starting Emergency Shell...
-- Subject: Unit emergency.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Jun 29 23:07:13 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev
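The orphaned paths described in step 2 can be picked out of a journal excerpt like the one above. The following is a minimal Python sketch, not a tool shipped with device-mapper-multipath: it assumes the multipathd message wording seen in this report ("add path (uevent)" and "path added to devmap"), uses a small subset of the lines above as input, and the function name `orphaned_paths` is hypothetical. It reflects only what the log shows, not multipathd's internal state.

```python
import re

# A subset of the multipathd lines from the log above, copied verbatim.
LOG = """\
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb: add path (uevent)
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb [8:16]: path added to devmap mpatha
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sda: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdh: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sde: add path (uevent)
Jun 29 23:05:43 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev
"""

# Assumption: message formats match the ones quoted in this bug report.
ADD_PATH = re.compile(r"multipathd\[\d+\]: (sd[a-z]+): add path \(uevent\)")
IN_MAP = re.compile(
    r"multipathd\[\d+\]: (sd[a-z]+) \[\d+:\d+\]: path added to devmap (\S+)"
)


def orphaned_paths(log_text):
    """Return paths that were announced by uevent but never added to a map."""
    seen = []        # paths in the order multipathd first reported them
    mapped = set()   # paths that were later added to a devmap
    for line in log_text.splitlines():
        match = ADD_PATH.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
        match = IN_MAP.search(line)
        if match:
            mapped.add(match.group(1))
    return [path for path in seen if path not in mapped]


print(orphaned_paths(LOG))
```

On this excerpt only sdb, the standby path, reaches the map; every other path that was announced stays outside it while multipathd keeps waiting on udev.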
The patch that seems to have caused this: https://www.redhat.com/archives/dm-devel/2016-March/msg00146.html
I'm fairly certain that the delay can safely be removed in the initramfs. Are you able to boot the machine up at all? If so, I can give you a test package that will disable this behavior during the initramfs.
Yes, we are able to boot into the rescue kernel. Please provide the test package so that we can verify it.
Hello shivamerla1, we have no Nimble storage in our lab, so could you report the test results once the fixed version is available? Thanks.
Sure, we can test the private packages and provide feedback.
Could you please try the packages available at http://people.redhat.com/~bmarzins/device-mapper-multipath/rpms/RHEL7/bz1350931/ and see if they resolve your issue? You will need to rebuild the initramfs after installing these packages. These packages should also fix Bug 1350931.
Thanks Ben. These packages seem to fix the issue we encountered; the system boots without problems. Please let us know when updates containing the fix will be available.
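For readers following along, the initramfs rebuild mentioned above is normally done with dracut on RHEL 7. The following Python sketch only builds and prints the command rather than running it. It assumes the default RHEL 7 image path `/boot/initramfs-<kernel version>.img`, and the helper name `dracut_rebuild_command` is hypothetical; the kernel version is the one reported in the description.

```python
import shlex


def dracut_rebuild_command(kernel_version):
    """Build the dracut command that regenerates the initramfs for one kernel."""
    # Assumption: the image lives at the default RHEL 7 location.
    image = "/boot/initramfs-{}.img".format(kernel_version)
    # --force overwrites the existing image instead of refusing to replace it.
    return ["dracut", "--force", image, kernel_version]


cmd = dracut_rebuild_command("3.10.0-327.22.2.el7.x86_64")
# Print a copy-and-paste form; run it as root after installing the packages.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Naming the kernel version explicitly matters when the machine was booted from a different entry than the one that needs the new image.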
Do you know if this also fixes Bug 1350931?
Yes, Ben. I ran a basic sanity test, and it does fix Bug 1350931. Thanks.
Hello shivamerla1,
Would you please help test this bug and BZ1291406 with the fixed package? Thanks in advance.
Yi
We have already verified the fix with the private package. It looks good from our side.
(In reply to shivamerla1 from comment #14)
> We have already verified the fix with private package. It looks good from
> our side.

Hi shivamerla1,
Thanks for your confirmation. Changing the status to VERIFIED.
Yi
Is it possible to provide this fix as part of a 7.2 errata release for device-mapper-multipath? Currently it is only available through the Beta repos, to which our customers might not be willing to upgrade.
Ben:

The doc text for the release note description is nearly identical to the doc text that was provided for the release note description in BZ#1350931. Is this the same fix? It seems as though we only need to describe this once in the release notes (while referencing both bugs).

Steven
(In reply to Steven J. Levine from comment #17)
> Ben:
>
> The doc text for the release note description is nearly identical to the doc
> text that was provided for the release note description in BZ#1350931. Is
> this the same fix? It seems as though we only need to describe this once in
> the release notes (while referencing both bugs).
>
> Steven

Yep. These are the same issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2536.html