Bug 1385184 - System fails to boot and drops into emergency shell, even with updated multipath-tools package of 7.3 Beta
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: device-mapper-multipath
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ben Marzinski
QA Contact: Lin Li
Reported: 2016-10-15 00:25 UTC by shivamerla1
Modified: 2021-09-03 12:09 UTC (History)

Last Closed: 2016-12-20 14:25:15 UTC


Attachments
Journal, multipath and lvm logs. (492.52 KB, application/x-gzip)
2016-10-15 00:25 UTC, shivamerla1

Description shivamerla1 2016-10-15 00:25:15 UTC
Created attachment 1210683
Journal, multipath and lvm logs.

Description of problem:
The system fails to boot and enters emergency mode. We updated multipath.conf after installation and rebuilt the initrd. Sometimes a /dev/sd device gets mounted as / and multipath device creation fails on the root device; at other times, LVM activation fails for the /home partition.

Version-Release number of selected component (if applicable):
3.10.0-327.el7.x86_64

How reproducible:
Consistently on every reboot.

Steps to Reproduce:
1. Install the RHEL 7.2 update.
2. Update /etc/multipath.conf with the Nimble device settings.
3. Rebuild the initramfs.
4. Reboot.

Actual results:
The system drops into emergency mode during boot, reporting that the /home and /boot partitions were not found. A /dev/sd device gets mounted as /.

Expected results:
The system boots with root mounted on a multipath device.

Additional info:

After updating multipath.conf and rebuilding the initramfs, every boot drops into the emergency shell.



Oct 14 11:45:40 localhost.localdomain systemd[1]: Job dev-mapper-rhel\x2dhome.device/start timed out.
Oct 14 11:45:40 localhost.localdomain systemd[1]: Job dev-mapper-rhel\x2dhome.device/start finished, result=timeout
Oct 14 11:45:40 localhost.localdomain systemd[1]: Timed out waiting for device dev-mapper-rhel\x2dhome.device.
Oct 14 11:45:40 localhost.localdomain systemd[1]: Job home.mount/start finished, result=dependency
Oct 14 11:45:40 localhost.localdomain systemd[1]: Dependency failed for /home


Oct 14 11:45:40 localhost.localdomain systemd[1]: Job dev-disk-by\x2duuid-db3f03d7\x2dbcfb\x2d436d\x2d9fa8\x2dc3246e87558e.device/start timed out.
Oct 14 11:45:40 localhost.localdomain systemd[1]: Job dev-disk-by\x2duuid-db3f03d7\x2dbcfb\x2d436d\x2d9fa8\x2dc3246e87558e.device/start finished, result=timeout
Oct 14 11:45:40 localhost.localdomain systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-db3f03d7\x2dbcfb\x2d436d\x2d9fa8\x2dc3246e87558e.device.
Oct 14 11:45:40 localhost.localdomain systemd[1]: Job boot.mount/start finished, result=dependency
Oct 14 11:45:40 localhost.localdomain systemd[1]: Dependency failed for /boot.
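
The unit names in these messages use systemd's path escaping, where "-" stands for "/" and "\x2d" stands for a literal "-". A minimal Python 3 sketch for turning them back into device paths follows. The helper name unit_to_path is made up for illustration, and the sketch assumes ASCII-only names that end in a unit suffix such as ".device"; it does not handle the root unit "-.mount".

```python
import re

def unit_to_path(unit):
    """Turn an escaped systemd unit name back into the path it refers to.

    Hypothetical helper; assumes an ASCII name ending in a unit suffix.
    """
    # Drop the unit type suffix (".device", ".mount", ...).
    name = unit.rsplit(".", 1)[0]
    # In systemd's path escaping, "-" stands for "/" ...
    name = name.replace("-", "/")
    # ... and "\xNN" stands for the character with hex code NN.
    name = re.sub(r"\\x([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), name)
    return "/" + name
```

With this, dev-mapper-rhel\x2dhome.device decodes to /dev/mapper/rhel-home, and the second unit decodes to /dev/disk/by-uuid/db3f03d7-bcfb-436d-9fa8-c3246e87558e, which are the devices behind /home and /boot.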

Comment 2 Ben Marzinski 2016-11-29 23:13:26 UTC
So, something clearly seems off here. Looking at the journal, you can see that the boot sequence makes it past switching from the initramfs to the regular filesystem.

Here:
Oct 14 11:44:09 localhost.localdomain systemd[1]: Starting Switch Root.
Oct 14 11:44:10 localhost.localdomain systemd[1]: Stopped Switch Root.

Before this point, multipath is running in the initramfs, and after it, multipath is running on the regular root filesystem.

While running in the initramfs, I see:
Oct 14 11:44:06 localhost.localdomain systemd-udevd[476]: '/sbin/multipath -c /dev/sda' [566] exit with return code 1
Oct 14 11:44:06 localhost.localdomain systemd-udevd[482]: '/sbin/multipath -c /dev/sdb' [575] exit with return code 1
Oct 14 11:44:06 localhost.localdomain systemd-udevd[470]: '/sbin/multipath -c /dev/sdc' [610] exit with return code 1
Oct 14 11:44:06 localhost.localdomain systemd-udevd[471]: '/sbin/multipath -c /dev/sdd' [619] exit with return code 1

While running in the regular root filesystem, I see:
Oct 14 11:44:12 localhost.localdomain systemd-udevd[1040]: '/sbin/multipath -c /dev/sdc' [1118] exit with return code 0
Oct 14 11:44:12 localhost.localdomain systemd-udevd[975]: '/sbin/multipath -c /dev/sda' [1119] exit with return code 0
Oct 14 11:44:12 localhost.localdomain systemd-udevd[1036]: '/sbin/multipath -c /dev/sdb' [1159] exit with return code 0
Oct 14 11:44:12 localhost.localdomain systemd-udevd[1046]: '/sbin/multipath -c /dev/sdd' [1164] exit with return code 0

I can also see:
Oct 14 11:44:15 localhost.localdomain multipathd[1320]: mpatha: ignoring map
Oct 14 11:44:15 localhost.localdomain multipathd[1320]: mpatha: ignoring map
Oct 14 11:44:15 localhost.localdomain multipathd[1320]: mpatha: ignoring map
Oct 14 11:44:15 localhost.localdomain multipathd[1320]: mpatha: ignoring map

What this means is that when multipath was running in the initramfs, it was not correctly identifying the devices as multipath paths, and so it was not creating multipath devices on top of them. In fact, it didn't correctly identify the devices as paths until after systemd had already started trying to mount /home and /boot. By this time multipathd was unable to create the devices, presumably because something already had them open. I assume multipathd was racing to grab those devices and failed, but the udev rules claimed them anyway.
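
To repeat this comparison on another journal, a rough Python 3 sketch like the following could group the "multipath -c" exit codes by boot phase. It assumes the message formats quoted above and treats the "Starting Switch Root" line as the boundary between the initramfs and the regular root filesystem; the function name and the phase labels are made up for illustration.

```python
import re

# Matches the udev messages quoted above, e.g.
# '/sbin/multipath -c /dev/sda' [566] exit with return code 1
CHECK_RE = re.compile(
    r"'/sbin/multipath -c (/dev/[^']+)' \[\d+\] exit with return code (\d+)"
)

def multipath_checks_by_phase(journal_lines):
    """Map boot phase -> {device: last exit code of `multipath -c`}."""
    phase = "initramfs"
    results = {"initramfs": {}, "root": {}}
    for line in journal_lines:
        # Assumption: everything logged after this message comes from
        # the regular root filesystem.
        if "Starting Switch Root" in line:
            phase = "root"
            continue
        match = CHECK_RE.search(line)
        if match:
            results[phase][match.group(1)] = int(match.group(2))
    return results
```

A device whose code is 1 in the "initramfs" phase and 0 in the "root" phase was only recognized as a multipath path after the switch, which is the pattern seen here for sda through sdd.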

Can you make sure that /etc/multipath.conf, /etc/multipath/bindings and /etc/multipath/wwids match up between what is on the regular filesystem and what is in the initramfs?

# lsinitrd -f <file> <initrd_image>

will show you the contents of <file> on <initrd_image>.
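
If it helps, the check for all three files could be scripted along these lines. This is only a sketch, assuming Python 3 and lsinitrd are available on the host; the helper names are made up, and the leading "/" is stripped before calling lsinitrd because paths inside the image are stored relative to its root.

```python
import subprocess
from pathlib import Path

FILES = [
    "/etc/multipath.conf",
    "/etc/multipath/bindings",
    "/etc/multipath/wwids",
]

def read_from_initrd(path, image):
    """Contents of `path` inside `image`, as printed by `lsinitrd -f`."""
    proc = subprocess.run(
        ["lsinitrd", "-f", path.lstrip("/"), image],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=False,
    )
    return proc.stdout

def read_from_root(path):
    """Contents of `path` on the running system (empty if missing)."""
    p = Path(path)
    return p.read_bytes() if p.is_file() else b""

def mismatched_files(image, files=FILES,
                     initrd_reader=read_from_initrd,
                     root_reader=read_from_root):
    """Return the files whose initramfs copy differs from the root copy."""
    return [f for f in files if initrd_reader(f, image) != root_reader(f)]
```

Calling mismatched_files() with the path of the initramfs image in use (on this kernel, probably /boot/initramfs-3.10.0-327.el7.x86_64.img, but that path is an assumption) returns the files that differ. A file missing from both places counts as matching, so an empty result does not prove the files exist.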

Comment 3 shivamerla1 2016-12-05 05:25:32 UTC
I will ask QA to reproduce this issue again and provide the necessary information this week.

Comment 4 shivamerla1 2016-12-20 01:45:20 UTC
Hi Ben, I have not received any update internally on this. Please close the case for now. I will re-open the bug if we see this again.

Comment 5 Ben Marzinski 2016-12-20 14:25:15 UTC
Sure.

