Bug 1351430 - System fails to boot and drops into emergency shell after failing to mount /boot on multipath device
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: device-mapper-multipath
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Zhang Yi
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Keywords: OtherQA
Depends On:
Blocks:
 
Reported: 2016-06-30 04:57 UTC by shivamerla1
Modified: 2016-11-04 08:20 UTC

Fixed In Version: device-mapper-multipath-0.4.9-95.el7
Doc Type: Bug Fix
Doc Text:
~~~Included in the Release Notes as a description for BZ#1350931~~~
Cause: When multipathd created a new multipath device, it did not allow any more paths to be added until it saw the udev change event for the newly created device, even if it had created the device with no usable paths.
Consequence: If a multipath device was created with no usable paths, udev hung trying to get information on the device, and bootup could time out.
Fix: multipathd now allows paths to be added to a newly created multipath device if it currently has no usable paths.
Result: Usable paths are immediately added to new devices that have none, and udev doesn't hang.
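
The Cause and Fix described in the Doc Text can be illustrated with a small Python model. This is a toy sketch only, not multipathd's code: the class `Map`, the field `waiting_for_udev`, and the function `may_add_path` are hypothetical names chosen for illustration, and a map created with only a standby path is modeled here as having no usable paths.

```python
from dataclasses import dataclass, field


@dataclass
class Map:
    """Toy stand-in for a multipath map (hypothetical, not multipathd's struct)."""
    name: str
    usable_paths: list = field(default_factory=list)
    waiting_for_udev: bool = True  # set when the map has just been created


def may_add_path(mp: Map, fixed: bool) -> bool:
    """Decide whether a newly discovered path may be added to the map.

    fixed=False models the behavior described under "Cause":
    nothing is added until the udev change event has been seen.
    fixed=True models the behavior described under "Fix":
    a map with no usable paths accepts new paths immediately.
    """
    if not mp.waiting_for_udev:
        return True
    if fixed and not mp.usable_paths:
        return True
    return False


# Assumption: a map holding only a standby path has no usable paths.
mpatha = Map("mpatha")
old_behavior = may_add_path(mpatha, fixed=False)  # active paths stay orphaned
new_behavior = may_add_path(mpatha, fixed=True)   # active paths are accepted
print(old_behavior, new_behavior)
```

In this model the old policy refuses every new path while the map waits for udev, which is the state udev itself cannot get out of when the map has no usable path to read from.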
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 08:20:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2536 normal SHIPPED_LIVE device-mapper-multipath bug fix and enhancement update 2016-11-03 14:18:10 UTC

Description shivamerla1 2016-06-30 04:57:31 UTC
Description of problem:
A UCS SAN-boot system fails to boot and drops into the emergency shell. We believe this is caused by the patch linked in comment 1. In our case, the standby path is added first, and all further I/O is blocked until an active path is added to the multipath map. Unfortunately, multipathd delays adding the active paths to the map until udev processing is complete. By that time a /dev/sd* path has been mounted as root, and any further multipath reloads fail, leaving the map stuck with only the standby path.

Version-Release number of selected component (if applicable):
7.2 
[root@artemis ~]# uname -r
3.10.0-327.22.2.el7.x86_64



How reproducible:
Consistently, every time.

Steps to Reproduce:
1. Configure UCS system to boot from Nimble volume.
2. On reboot, all paths are discovered, but the multipath map is created with only the standby path, and all other paths are orphaned.
3. System fails to boot.

Actual results:
System fails to boot up.

Expected results:
All active paths are added promptly after discovery, and the system boots up.

Additional info:

Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: Direct-Access     Nimble   Server           1.0  PQ: 0 ANSI: 5
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: supports implicit TPGS
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: port group 02 rel port 07
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: rtpg failed with 8000002
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: port group 02 state S non-preferred supports tolusna
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: scsi 1:0:1:0: alua: Attached
Jun 29 23:04:41 artemis.rtpspt.nimblestorage.com kernel: sd 1:0:1:0: [sdb] 524288000 512-byte logical blocks: (268 GB/250 GiB)

Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb: add path (uevent)
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb: spurious uevent, path already in pathvec
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: load table [0 524288000 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 1 1 8:16 20]
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: event checker started
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb [8:16]: path added to devmap mpatha
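
The `load table` line above can be decoded field by field. The following is a minimal sketch that assumes the usual dm-multipath table layout (counted feature list, counted hardware-handler list, then path groups, each with a selector and its paths). The function name `parse_multipath_table` is a hypothetical helper and is not part of multipath-tools.

```python
def parse_multipath_table(table: str) -> dict:
    """Parse a dm-multipath table string into a dictionary (illustrative only)."""
    tok = table.split()
    start, length, target = int(tok[0]), int(tok[1]), tok[2]
    if target != "multipath":
        raise ValueError(f"not a multipath target: {target}")
    pos = 3

    def take_counted() -> list:
        """Read '<n> <item> ... <item>' starting at pos and advance pos."""
        nonlocal pos
        n = int(tok[pos])
        items = tok[pos + 1:pos + 1 + n]
        pos += 1 + n
        return items

    features = take_counted()
    hw_handler = take_counted()
    num_groups, next_group = int(tok[pos]), int(tok[pos + 1])
    pos += 2

    groups = []
    for _ in range(num_groups):
        selector = tok[pos]
        pos += 1
        selector_args = take_counted()
        num_paths, num_path_args = int(tok[pos]), int(tok[pos + 1])
        pos += 2
        paths = []
        for _ in range(num_paths):
            dev = tok[pos]
            args = tok[pos + 1:pos + 1 + num_path_args]
            pos += 1 + num_path_args
            paths.append({"dev": dev, "args": args})
        groups.append({"selector": selector,
                       "selector_args": selector_args,
                       "paths": paths})

    return {"start": start, "length": length, "features": features,
            "hw_handler": hw_handler, "next_group": next_group,
            "groups": groups}


# Table string copied from the multipathd log line above.
table = "0 524288000 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 1 1 8:16 20"
parsed = parse_multipath_table(table)
all_paths = [p["dev"] for g in parsed["groups"] for p in g["paths"]]
print(parsed["features"], parsed["hw_handler"], all_paths)
```

Read this way, the map was loaded with `queue_if_no_path`, the `alua` hardware handler, and a single path group containing only `8:16` (sdb), which is the path the kernel reported in ALUA state S (standby).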



Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sda: add path (uevent)
Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdf: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdh: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdh: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sde: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sde: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdc: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdc: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdd: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdd: spurious uevent, path already in pathvec
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdg: add path (uevent)
Jun 29 23:04:44 artemis.rtpspt.nimblestorage.com multipathd[720]: sdg: spurious uevent, path already in pathvec


Jun 29 23:05:43 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev


Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Job dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device/start timed out.
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device.
-- Subject: Unit dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device has failed.
--
-- The result is timeout.
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Dependency failed for /boot.
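
The escaped unit name in the timeout messages above can be turned back into a device path to see which device systemd was waiting for. This is a small sketch of systemd's path unescaping rules as I understand them (`-` separates path components, `\xNN` is a hex-escaped byte); `unit_to_path` is a hypothetical helper name, and the `systemd-escape` tool is the authoritative implementation.

```python
import re


def unit_to_path(unit: str) -> str:
    """Convert an escaped systemd device unit name into a filesystem path."""
    # Drop the unit type suffix (".device").
    name = unit.rsplit(".", 1)[0]
    # Unescaped '-' separates path components; convert these first so that
    # literal dashes encoded as \x2d are not mistaken for separators.
    with_slashes = name.replace("-", "/")
    # Decode \xNN escapes back into the characters they stand for.
    decoded = re.sub(r"\\x([0-9a-fA-F]{2})",
                     lambda m: chr(int(m.group(1), 16)),
                     with_slashes)
    return "/" + decoded


# Unit name copied from the systemd messages above.
unit = r"dev-disk-by\x2duuid-e6cecf6a\x2de56f\x2d4a31\x2dbc8f\x2dd43e1ae2e071.device"
path = unit_to_path(unit)
print(path)
```

The result is the by-uuid symlink of the filesystem that the `/boot` mount depends on, which never appeared because udev processing for mpatha did not complete.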

-- The start-up result is done.
Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Starting Emergency Shell...
-- Subject: Unit emergency.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Jun 29 23:07:13 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev
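
The length of the stall can be read off the timestamps in the excerpt. The sketch below uses a handful of lines copied from the log above and assumes an English locale for month names; the variable names are illustrative.

```python
from datetime import datetime

# Lines copied from the journal excerpt above.
LOG = [
    "Jun 29 23:04:43 artemis.rtpspt.nimblestorage.com multipathd[720]: sdb [8:16]: path added to devmap mpatha",
    "Jun 29 23:05:43 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev",
    "Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev",
    "Jun 29 23:06:13 artemis.rtpspt.nimblestorage.com systemd[1]: Dependency failed for /boot.",
    "Jun 29 23:07:13 artemis.rtpspt.nimblestorage.com multipathd[720]: mpatha: startup incomplete. Still waiting on udev",
]


def stamp(line: str) -> datetime:
    """Parse the 15-character syslog timestamp (the year is not logged)."""
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


events = [(stamp(line), line) for line in LOG]
created = next(t for t, line in events if "path added to devmap" in line)
failed = next(t for t, line in events if "Dependency failed" in line)
waits = [t for t, line in events if "Still waiting on udev" in line]

stall_seconds = (failed - created).total_seconds()
print(f"map created at {created:%H:%M:%S}, /boot failed {stall_seconds:.0f} s later, "
      f"{len(waits)} 'waiting on udev' messages")
```

The gap between the map being created with only sdb and the `/boot` dependency failing is 90 seconds, which would be consistent with systemd's default 90-second device timeout, although that link is an inference and not something stated in this report.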

Comment 1 shivamerla1 2016-06-30 04:58:33 UTC
Patch which seems to have caused this.

https://www.redhat.com/archives/dm-devel/2016-March/msg00146.html

Comment 3 Ben Marzinski 2016-06-30 16:29:02 UTC
I'm fairly certain that the delay can safely be removed in the initramfs. Are you able to boot the machine up at all? If so, I can give you a test package that will disable this behavior during the initramfs.

Comment 4 shivamerla1 2016-06-30 21:06:07 UTC
Yes, we are able to boot into the rescue kernel. Please provide the test package so we can verify it.

Comment 5 Lin Li 2016-07-01 05:40:43 UTC
Hello shivamerla1,
There is no Nimble storage in our lab, so could you report the test results once the fixed version is available?
Thanks

Comment 6 shivamerla1 2016-07-01 15:48:54 UTC
Sure, we can test the private packages and provide feedback.

Comment 7 Ben Marzinski 2016-07-14 19:02:34 UTC
Could you please try the packages available at

http://people.redhat.com/~bmarzins/device-mapper-multipath/rpms/RHEL7/bz1350931/

and see if this resolves your issue. You will need to remake the initramfs after installing these packages.

These packages should also fix Bug 1350931

Comment 8 shivamerla1 2016-07-15 16:53:44 UTC
Thanks Ben. These packages seem to fix the issue we encountered. The system boots fine without issues. Please let us know when updates with the fix will be available.

Comment 9 Ben Marzinski 2016-07-20 14:49:08 UTC
Do you know if this also fixes Bug 1350931?

Comment 10 Raunak Kumar 2016-07-20 17:37:54 UTC
Yes Ben, I did a basic sanity test and it does fix Bug 1350931. Thanks

Comment 12 Zhang Yi 2016-08-16 12:33:51 UTC
Hello shivamerla1

Would you please help test this bug and BZ1291406 with the fixed package?
Thanks in advance.

Yi

Comment 14 shivamerla1 2016-08-16 16:44:35 UTC
We have already verified the fix with the private package. It looks good from our side.

Comment 15 Zhang Yi 2016-08-17 08:24:52 UTC
(In reply to shivamerla1 from comment #14)
> We have already verified the fix with private package. It looks good from
> our side.

Hi shivamerla1 
Thanks for your confirmation. Changing the status to VERIFIED.

Yi

Comment 16 shivamerla1 2016-09-09 21:21:27 UTC
Is it possible to provide this fix as part of a 7.2 errata release for device-mapper-multipath? Currently it's only available through the Beta repos, which our customers might not be willing to upgrade to.

Comment 17 Steven J. Levine 2016-10-14 16:03:09 UTC
Ben:

The doc text for the release note description is nearly identical to the doc text that was provided for the release note description in BZ#1350931.  Is this the same fix?  It seems as though we only need to describe this once in the release notes (while referencing both bugs).

Steven

Comment 18 Ben Marzinski 2016-10-24 17:14:21 UTC
(In reply to Steven J. Levine from comment #17)
> Ben:
> 
> The doc text for the release note description is nearly identical to the doc
> text that was provided for the release note description in BZ#1350931.  Is
> this the same fix?  It seems as though we only need to describe this once in
> the release notes (while referencing both bugs).
> 
> Steven

Yep. These are the same issue.

Comment 20 errata-xmlrpc 2016-11-04 08:20:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2536.html

