Bug 1152948

Summary: [3.5-7.0][dracut] Failed to boot and report "Dracut: FATAL:Failed to mount block device of live image, System halted"
Product: Red Hat Enterprise Virtualization Manager
Reporter: cshao <cshao>
Component: ovirt-node
Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 3.5.0
CC: aberezin, aburden, bmarzins, cdupuis, cshao, ecohen, fdeutsch, gklein, hadong, harald, huiwa, iheim, juwu, leiwang, lsurette, rbarry, yaniwang, ycui
Target Milestone: ---
Keywords: TestOnly
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: node
Fixed In Version: rhev-hypervisor6-6.6-20141218.0.iso rhev-hypervisor7-7.0-20141218.0.iso
Doc Type: Known Issue
Doc Text:
A multipath regression caused multipath to incorrectly claim devices, which prevented the Red Hat Enterprise Virtualization Hypervisor from booting because it could not locate the root file system. Multipath has been adjusted to claim only multipath devices, so the Hypervisor boots as expected.
Story Points: ---
Clone Of:
Clones: 1167620 (view as bug list)
Environment:
Last Closed: 2015-02-11 21:04:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1094719, 1164308, 1164311, 1167620
Attachments (all flags: none):
dracut.png
rdsosreport.txt
rdsosreport(1030build).txt
console_output log
rdsosreport_rhevh720141107.txt
rdsosreport-1113.txt
rdsosreport-1114.txt
rdsosreport-1118.txt
rdsosreport-1119.txt
hardware-info.tar.gz
rdsosreport_20141120.txt
fc-lun
rdsosreport_20141201.txt for comment 67
iscsi-vs-fc.png
mulitpath.conf
wwids
/var/log
journalctl -b for comment 83
dmesg from failed boot
1212.tar.gz
1204-vs-1212.png

Description cshao 2014-10-15 09:10:18 UTC
Created attachment 947164 [details]
dracut.png

Description of problem:
RHEVH7 fails to boot and reports "Dracut: FATAL:Failed to mount block device of live image, System halted"

Only the information below can be obtained from the monitor; I can't provide more log info for debugging because I can't log in to the hypervisor.
=============================================================
Dracut-initqueue[424]: mount:/dev/sda3 is already mounted or /run/initramfs/live busy
Dracut: FATAL:Failed to mount block device of live image
Dracut: Refusing to continue
System halted.

This issue often happens on the machines below:
Dell OPTIPLEX 9010
Dell OPTIPLEX 790
hp-z220(8G CORE4 ENT01 EPT INTEL QEPDU14 SINGLE VMX)

No such issue on VM.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20141006.0.el7ev
ovirt-node-3.1.0-0.20.20141006gitc421e04.el7.noarch.rpm 
dracut-033-161.el7.x86_64
kernel-3.10.0-123.8.1.el7.x86_64

How reproducible:
70%

Steps to Reproduce:
1. Install RHEVH7.0 build to above machine.
2. Reboot RHEVH.
3.

Actual results:
RHEVH7 fails to boot and reports "Dracut: FATAL:Failed to mount block device of live image, System halted"

Expected results:
RHEV-H boots successfully on all machines.

Additional info:

Comment 1 Ying Cui 2014-10-15 10:00:49 UTC
We can reproduce this issue on 70%+ of the test machines, so I am raising the priority to Urgent to draw more attention to it.

Comment 2 Fabian Deutsch 2014-10-15 12:33:31 UTC
Could you please try to gather more logs from dracut? Please use the rdsosreport tool.

Comment 3 cshao 2014-10-16 03:21:08 UTC
Created attachment 947399 [details]
rdsosreport.txt

Adding "rd.debug single" to the grub command line and booting the hypervisor generates rdsosreport.txt at /run/initramfs/rdsosreport.txt.

Comment 4 Fabian Deutsch 2014-10-23 12:36:30 UTC
Harald, do you have an idea why this could happen? It seems that this is happening inside dracut, without any of our code being involved.

Comment 5 Harald Hoyer 2014-10-24 09:16:32 UTC
Ah, multipath!

/sbin/dmsquash-live-root /dev/sda3
is called. So somehow one of the udev rules:

KERNEL=="disk/by-label/Root", RUN+="/sbin/initqueue --settled --onetime --unique /sbin/dmsquash-live-root $env{DEVNAME}" 
SYMLINK=="disk/by-label/Root", RUN+="/sbin/initqueue --settled --onetime --unique /sbin/dmsquash-live-root $env{DEVNAME}"

was triggered by sda3 before it was added to the multipath device.

/dev/mapper/ST500DM002-1BD142_W2ACE2N6p3: LABEL="Root" UUID="5b9d68a2-2125-4a81-9684-652967d945e6" TYPE="ext2" PARTLABEL="primary" PARTUUID="8a0b5636-8293-492d-93fe-27ee1b70f87e" 

/dev/disk/by-label:
lrwxrwxrwx 1 root 0 10 Oct 16 03:06 Root -> ../../dm-3

Comment 8 Fabian Deutsch 2014-10-30 10:28:13 UTC
Sorted out on IRC, now the request:

1. boot into grub
2. add rdshell to the cmdline and remove the quiet arg
3. boot
4. get rdsosreport and attach it to this bug

Comment 9 Fabian Deutsch 2014-10-30 10:29:23 UTC
Oops. The last comment was actually for a different bug.

Comment 14 cshao 2014-11-05 06:03:51 UTC
Created attachment 953922 [details]
rdsosreport(1030build).txt

Please refer to the new attachment rdsosreport(1030build).txt.
Thanks!

Comment 19 Ying Cui 2014-11-11 12:12:37 UTC
Still encountering this issue.

Test version:
rhev-hypervisor7-7.0-20141107.0
ovirt-node-3.1.0-0.25.20141107gitf6dc7b9.el7.noarch
device-mapper-1.02.84-14.el7.x86_64
dracut-033-161.el7_0.173.x86_64

Command line: BOOT_IMAGE=/vmlinuz0 root=live:LABEL=Root ro rootfstype=auto rootflags=ro ksdevice=bootif rd.lvm=0 rd.dm=0 elevator=deadline lang= max_loop=256 rd.md=0 rd.live.check rd.luks=0 rd.live.image crashkernel=128M rd.debug single console=ttyS0,115200n8

dracut:/#blkid
/dev/sda3: LABEL="Root" UUID="34a23678-013c-41cc-8b15-d741ab4beb33" TYPE="ext2" PARTLABEL="primary" PARTUUID="c8be010a-f7bd-4439-80a7-fa129fd6dcae"
/dev/sda2: LABEL="RootBackup" UUID="11f00a8e-7977-45fe-9c66-37543738ad27" TYPE="ext2" PARTLABEL="primary" PARTUUID="51e2c4b0-4efa-455d-a762-dea891d9d2a9"
/dev/sda4: UUID="UVqdV1-141O-yQUE-tsc0-ruhh-DcFp-jhTsXc" TYPE="LVM2_member" PARTLABEL="primary" PARTUUID="405c1e02-8b0e-41c5-886c-3e17df1c88c7"
/dev/mapper/TOSHIBA_DT01ACA100_33A7TVBMS2: LABEL="RootBackup" UUID="11f00a8e-7977-45fe-9c66-37543738ad27" TYPE="ext2" PARTLABEL="primary" PARTUUID="51e2c4b0-4efa-455d-a762-dea891d9d2a9"
/dev/mapper/TOSHIBA_DT01ACA100_33A7TVBMS3: LABEL="Root" UUID="34a23678-013c-41cc-8b15-d741ab4beb33" TYPE="ext2" PARTLABEL="primary" PARTUUID="c8be010a-f7bd-4439-80a7-fa129fd6dcae"
/dev/mapper/TOSHIBA_DT01ACA100_33A7TVBMS4: UUID="UVqdV1-141O-yQUE-tsc0-ruhh-DcFp-jhTsXc" TYPE="LVM2_member" PARTLABEL="primary" PARTUUID="405c1e02-8b0e-41c5-886c-3e17df1c88c7"
/dev/mapper/TOSHIBA_DT01ACA100_33A7TVBMS: PTTYPE="gpt"

Comment 20 Ying Cui 2014-11-11 12:13:58 UTC
Created attachment 956265 [details]
console_output log

Comment 21 Ying Cui 2014-11-11 12:14:41 UTC
Created attachment 956266 [details]
rdsosreport_rhevh720141107.txt

Comment 23 cshao 2014-11-18 05:26:42 UTC
Created attachment 958417 [details]
rdsosreport-1113.txt

(In reply to Fabian Deutsch from comment #22)
> Hey, could you please check if this build fixes the problem:
> 
> https://brewweb.devel.redhat.com/buildinfo?buildID=398693
> 
> This build contains the complete udev fix.

Test version:
rhev-hypervisor7-7.0-20141113.0

Test machine:
Dell OPTIPLEX 9010

Test result:
Still hit the kernel panic issue.
Please refer to the new attachment "rdsosreport-1113.txt" for more info.

Thanks!

Comment 25 cshao 2014-11-18 09:31:59 UTC
Created attachment 958479 [details]
rdsosreport-1114.txt

(In reply to Fabian Deutsch from comment #24)
> I apologize, I used the incorrect brew link:-/
> 
> Please use:
> 
> rhev-hypervisor7-7.0-20141114.0.iso 
> https://brewweb.devel.redhat.com/taskinfo?taskID=8250792

Test version:
rhev-hypervisor7-7.0-20141114.0

Test machine:
Dell OPTIPLEX 9010
Dell 790

Test result:
Still hit the kernel panic issue.
Please refer to the new attachment "rdsosreport-1114.txt" for more info.

Thanks!

Comment 28 cshao 2014-11-19 06:10:07 UTC
Created attachment 958891 [details]
rdsosreport-1118.txt

Test version:
rhev-hypervisor7-7.0-20141118.0.iso

How reproducible:
30%

Test machine:
Dell 9010

Test result:
Still hit the kernel panic issue, with a 30% reproduction rate.
Please refer to the attachment "rdsosreport-1118" for more details.

Thanks!

Comment 29 Fabian Deutsch 2014-11-19 09:10:21 UTC
Thanks Chen Shao.
Could you please check whether this bug also exists with a stock RHEL 7 livecd, i.e. the boot ISO?

Comment 30 Fabian Deutsch 2014-11-19 09:49:52 UTC
Harald, do you see anything suspicious in the rdsosreport in comment 2?

Comment 34 Fabian Deutsch 2014-11-20 02:51:42 UTC
If this is still reproducible with the build in comment 33, please provide details on the hosts:

1. Exact name/manufacturer/type
2. Storage hardware

Comment 35 cshao 2014-11-20 06:15:10 UTC
(In reply to Fabian Deutsch from comment #34)
> If this is still reproducable with the build in comment 33, please provide
> details on the hosts:
Yes, it can still be reproduced, with a low reproduction rate (<20%).

> 1. Exact name/manufacturer/typee

> 2. Storage hardware

Test machine:
Dell 9010

Test version:
rhev-hypervisor7-7.0-20141119.0.iso
ovirt-node-3.1.0-0.27.20141119git24e087e.el7.noarch

DMI: Dell Inc. OptiPlex 9010/0H5XPC, BIOS A14 06/11/2013
SMBIOS 2.7 present.
TOSHIBA_DT01ACA100_33A7TVBMS dm-0 ATA     ,TOSHIBA DT01ACA1
size=932G features='0' hwhandler='0' wp=rw

Please see the attachment "dmesg+cpu+lspci" for more hardware info.

Thanks!

Comment 36 cshao 2014-11-20 06:17:01 UTC
Created attachment 959245 [details]
rdsosreport-1119.txt

Comment 37 cshao 2014-11-20 06:17:41 UTC
Created attachment 959246 [details]
hardware-info.tar.gz

Comment 45 cshao 2014-11-21 08:42:22 UTC
Test version:
rhev-hypervisor7-7.0-20141120.0.iso
ovirt-node-3.1.991-0.0.master.el7.centos.noarch

Download link:
http://download.devel.redhat.com/brewroot/work/tasks/6476/8276476/rhev-hypervisor7-7.0-20141120.0.iso

Test machine:
dell-per515-01

I tested several times and didn't hit the system halted issue.


@ycui,

How about dell 9010?

Comment 46 Fabian Deutsch 2014-11-21 09:08:02 UTC
A build with the final fix is not yet available. So this bug can probably still be reproduced.

Comment 47 Ying Cui 2014-11-21 09:49:54 UTC
> @ycui,
> 
> How about dell 9010?

Tested the build (rhev-hypervisor7-7.0-20141120.0) Ryan provided this morning; out of 3 tests, the system halted once on the Dell 9010 machine. So the bug is still here.

------screen output/display----
Dracut-initqueue[416]: mount:/dev/sda3 is already mounted or /run/initramfs/live busy
Dracut: FATAL:Failed to mount block device of live image
Dracut: Refusing to continue
System halted.

Comment 49 Ryan Barry 2014-11-21 14:14:17 UTC
(In reply to Ying Cui from comment #47)
> > @ycui,
> > 
> > How about dell 9010?
> 
> Tested the build(rhev-hypervisor7-7.0-20141120.0) Ryan provided it this
> morning, test 3 times, 1 time system halted on dell 9010 machine. So the bug
> is still here.
> 
> ------screen output/display----
> Dracut-initqueue[416]: mount:/dev/sda3 is already mounted or
> /run/initramfs/live busy
> Dracut: FATAL:Failed to mount block device of live image
> Dracut: Refusing to continue
> System halted.

Can we grab a new rdsosreport?

Comment 52 Ying Cui 2014-11-21 15:36:58 UTC
Additional info:
1. After communicating with cshao about this bug today: we cannot reproduce this bug on QE's Dell servers (r515 and r510), so we suspect this bug only occurs on desktops and workstations.

2. After RHEV-H 7 is installed, sometimes I can boot into it correctly with no issue; rebooting the host again and again, the bug sometimes appears.
So the workaround is to reboot the RHEV-H 7 host repeatedly until it boots successfully to the login page. (In today's testing, rebooting the host 3 times hit this bug once.)

Just for reference.

Comment 53 Fabian Deutsch 2014-11-21 18:59:29 UTC
Please provide all the rdsosreports from all failures.
Otherwise we will not be able to nail this bug down.

Comment 54 Ying Cui 2014-11-22 04:00:55 UTC
Created attachment 960054 [details]
rdsosreport_20141120.txt

Comment 56 Fabian Deutsch 2014-11-24 14:32:38 UTC
Ying, could you please try the following:

Please install RHEV-H with the nompath keyword.

Please make sure that it's used during installation and on regular boots.

Comment 57 Harald Hoyer 2014-11-24 14:35:44 UTC
Workaround for now: boot with "nompath" on the kernel command line.

This has to be fixed in the multipath/udev integration. At no point in time should udev be settled while the original device still has the /dev/disk symlink.

Asynchronously claiming devices for multipath only leads to race conditions, and I will never add a "sleep 10" to make sure multipathd has kicked in and the device symlinks are rerouted.
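Harald's workaround can be applied from the bootloader prompt; a sketch reusing a trimmed version of the kernel command line shown in comment 19 (the exact bootloader syntax depends on the build):

```
# Edit the kernel line and append "nompath" to keep multipath out of
# the initramfs (workaround only; the rootfs will then not be
# multipathed):
linux /vmlinuz0 root=live:LABEL=Root ro rootfstype=auto rd.live.image nompath
```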

Comment 60 cshao 2014-11-25 07:44:19 UTC
Created attachment 961044 [details]
fc-lun

Comment 61 Fabian Deutsch 2014-11-25 07:50:01 UTC
Comment 59 confirms that the workaround works, thus it seems to be an issue with device-mapper-multipath.

Comment 62 Fabian Deutsch 2014-11-25 08:00:19 UTC
According to our current understanding the problem is as follows:

During boot, a race occurs between simple udev devices (vd*, sd*, hd*) and assembled mpath devices (/dev/mapper/mpath*).

The assembly of the multipath devices happens asynchronously - as noted in comment 57 - thus, sometimes they get picked up by the boot logic, but sometimes the mpath devices are not ready in time, and the simple udev devices are used.
If a simple udev device is used, the described bug appears once the multipath device gets assembled, because from that point on the simple udev device is no longer accessible.

The workaround, until the race has been fixed, is to disable multipath in the initrd using nompath.
The side effect is that no multipath is available for the rootfs.

For bug verification: please verify this bug by verifying the workaround (appending nompath to the kernel cmdline).
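The ordering dependence described above can be modeled abstractly; a minimal Python sketch (all names hypothetical, not RHEV-H code) replaying the two possible event orders:

```python
def boot_outcome(events):
    """Replay boot events in order. 'mount' models dracut mounting the
    raw device (/dev/sda3); 'claim' models multipathd assembling the
    mpath device, which makes the raw device inaccessible."""
    mounted_raw = False
    for event in events:
        if event == "mount":
            mounted_raw = True          # dracut used the simple udev device
        elif event == "claim" and mounted_raw:
            # the raw device is pulled out from under the live mount
            return "FATAL: Failed to mount block device of live image"
    return "boot ok"

# mpath device assembled before dracut mounts: boot succeeds
print(boot_outcome(["claim", "mount"]))   # boot ok
# dracut wins the race; the later claim breaks the boot
print(boot_outcome(["mount", "claim"]))
```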

Comment 64 Ying Cui 2014-11-26 03:27:00 UTC
Changing this bug back to MODIFIED; we are waiting for the platform bug 1167620 fix first, then a new RHEV-H image will be built to check this bug.

Comment 65 Julie 2014-11-26 08:11:21 UTC
Doc text is updated for 3.5 Beta 5. Please make sure you update the doc text for the GA release note once the bug fix lands, or set the 'requires_release_note' flag to -.

Cheers,
Julie

Comment 68 Ying Cui 2014-12-02 12:09:41 UTC
Created attachment 963681 [details]
rdsosreport_20141201.txt for comment 67

Comment 69 cshao 2014-12-03 02:54:14 UTC
(In reply to Ying Cui from comment #68)
> Created attachment 963681 [details]
> rdsosreport_20141201.txt for comment 67

Hi ycui,

Thank you, I appreciate your help. I am cancelling the needinfo flag.
Thanks!

Comment 73 cshao 2014-12-05 07:45:47 UTC
Created attachment 965011 [details]
iscsi-vs-fc.png

Comment 76 Ryan Barry 2014-12-05 16:26:25 UTC
Regarding comment 71, this should have been a very small change which didn't go anywhere near the installer code, but I'll grab the image and see if I can reproduce.

Ben -

After booting an image with "find_multipaths yes" in multipath.conf in the initrd, there appears to be a behavior change on images.

dmsetup shows multipath grabbing an iscsi device:

HostVG-Logging: 0 4194304 linear 8:20 8128512
HostVG-Swap: 0 8110080 linear 8:20 2048
HostVG-Data: 0 11517952 linear 8:20 12322816
WDC_WD2502ABYS-18B7A0_WD-WCAT19558392: 0 488281250 multipath 0 0 1 1 service-time 0 1 2 8:0 1 1 
live-base: 0 3145728 linear 7:1 0
36090a038d0f731381e035566b2497f85: 0 62914560 multipath 0 0 1 1 service-time 0 1 2 8:32 1 1 
HostVG-Config: 0 16384 linear 8:20 8112128
live-rw: 0 3145728 snapshot 7:1 7:2 P 8

And multipath -ll shows the same:

WDC_WD2502ABYS-18B7A0_WD-WCAT19558392 dm-6 ATA     ,WDC WD2502ABYS-1
size=233G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:0:0:0 sda 8:0  active ready running
36090a038d0f731381e035566b2497f85 dm-7 EQLOGIC ,100E-00         
size=30G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 4:0:1:0 sdc 8:32 active ready running

But there's no matching device in /dev/mapper:
crw-------. 1 root root 10, 236 Dec  5 07:40 control
lrwxrwxrwx. 1 root root       7 Dec  5 07:40 HostVG-Config -> ../dm-3
lrwxrwxrwx. 1 root root       7 Dec  5 07:40 HostVG-Data -> ../dm-5
lrwxrwxrwx. 1 root root       7 Dec  5 07:40 HostVG-Logging -> ../dm-4
lrwxrwxrwx. 1 root root       7 Dec  5 07:40 HostVG-Swap -> ../dm-2
lrwxrwxrwx. 1 root root       7 Dec  5 07:40 live-base -> ../dm-1
lrwxrwxrwx. 1 root root       7 Dec  5 07:40 live-rw -> ../dm-0
lrwxrwxrwx. 1 root root       7 Dec  5 07:41 WDC_WD2502ABYS-18B7A0_WD-WCAT19558392 -> ../dm-6

Any idea why this would be?

Comment 77 Ben Marzinski 2014-12-05 17:23:44 UTC
I don't understand why multipath would grab those devices in the first place, if find_multipaths was enabled. With find_multipaths enabled, multipath should only create a device if one of three things is true:

1. It sees two different paths with the same wwids. Looking at the multipath -l
output:

WDC_WD2502ABYS-18B7A0_WD-WCAT19558392 dm-6 ATA     ,WDC WD2502ABYS-1
size=233G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:0:0:0 sda 8:0  active ready running
36090a038d0f731381e035566b2497f85 dm-7 EQLOGIC ,100E-00         
size=30G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 4:0:1:0 sdc 8:32 active ready running

Each of these devices only has one path.

2. multipath is specifically told to run on the device by running (for instance)

# multipath /dev/sda

This overrides find_multipaths.  Do you know if this is happening?  multipath is never called like this during the regular RHEL boot process.  I'm not sure if things work differently in RHEV.

3. the wwid for the device already exists in /etc/multipath/wwids.  If this is a stock initramfs, and mpath.wwid wasn't on the kernel command line, I don't see how this would be possible.

As to why no device got created, I have no idea.  udev is responsible for that. If you remove the devices and recreate them (this should work since they should now be in the wwids file), do the symlinks get created correctly?

Could you post the initramfs file somewhere I can look at it?
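The three conditions Ben lists hang off the find_multipaths setting; a minimal multipath.conf sketch (not the shipped config):

```
# /etc/multipath.conf (sketch): with find_multipaths enabled,
# multipath claims a device only if a second path with the same wwid
# shows up, if multipath is run on the device explicitly, or if the
# wwid is already recorded in /etc/multipath/wwids.
defaults {
    find_multipaths yes
}
```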

Comment 78 Ryan Barry 2014-12-05 18:57:53 UTC
Created attachment 965205 [details]
mulitpath.conf

Comment 79 Ryan Barry 2014-12-05 18:58:31 UTC
Created attachment 965207 [details]
wwids

Comment 80 Ryan Barry 2014-12-05 19:05:07 UTC
I should have clarified -- this behavior is on a running system, not in the initrd.

I attached multipath.conf and wwids from the running system.  find_multipaths is not enabled there, but never has been, to my knowledge. Should it be?

Unfortunately, I don't know what steps QE took to get them to their current state. The only references I see to multipath in the code are "multipath -r" and various incantations of "multipath -ll", never directly "multipath /dev/foo".

The initrd is at:

http://rbarry.org/initrd0.img

Comment 82 Ben Marzinski 2014-12-08 16:02:53 UTC
(In reply to Ryan Barry from comment #80)
> I should have clarified -- this behavior is on a running system, not in the
> initrd.
> 
> I attached multipath.conf and wwids from the running system. 
> find_multipaths is not enabled there, but never has been, to my knowledge.
> Should it be?

With find_multipaths enabled (and the wwids removed from /etc/multipath/wwids) you won't be multipathing these single-path devices at all. Do you want them multipathed? At any rate, the /etc/multipath.conf used in the regular filesystem should match the one used in the initramfs. In comment 76, you said that the initramfs used find_multipaths, so it should be enabled in the regular file system as well.

> Unfortunately, I don't know what steps QE took to get them to their current
> state. The only references I see to multipath in the code are "multipath -r"
> and various incantation so "multipath -ll", never directly "multipath
> /dev/foo"

Without find_multipaths enabled, it makes sense that the multipath devices exist. So the only question is, "why didn't the device symlink get created?" Do you know if /dev/dm-7 exists? Like I said before, this is all handled by udev, and I don't know why it would fail to create the symlink.

> The initrd is at:
> 
> http://rbarry.org/initrd0.img

Comment 84 Ying Cui 2014-12-09 08:03:18 UTC
Created attachment 966113 [details]
/var/log

> Ying, it would be important to get a logfile from such a failed attempt, at
> best:
> all of /var/log and journalctl -b

attachment for comment 83 and bug 1136300.

Comment 85 Ying Cui 2014-12-09 08:05:37 UTC
Created attachment 966114 [details]
journalctl -b for comment 83

Comment 86 Ying Cui 2014-12-09 08:16:49 UTC
/tmp/ovirt.log in attachment 966113 [details] as ovirt-tmp.log

Comment 87 Fabian Deutsch 2014-12-09 14:59:30 UTC
While playing with the device which does not show up as an mpath device, I noticed:

[root@dell-pet105-02 ~]# multipath /dev/sdb
Dec 09 14:57:57 | multipath.conf +5, invalid keyword: getuid_callout
Dec 09 14:57:57 | multipath.conf +37, invalid keyword: getuid_callout
Dec 09 14:57:57 | 36090a038d0f721901d033566b2493f23: ignoring map

[root@dell-pet105-02 ~]# dmesg | tail
[31413.413945] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[66997.450122] systemd-journald[1984]: Vacuuming done, freed 0 bytes
[120186.991804] systemd-journald[1984]: Vacuuming done, freed 0 bytes
[173307.255451] systemd-journald[1984]: Vacuuming done, freed 0 bytes
[226529.492222] systemd-journald[1984]: Vacuuming done, freed 0 bytes
[279556.885257] systemd-journald[1984]: Vacuuming done, freed 0 bytes
[332666.181429] systemd-journald[1984]: Vacuuming done, freed 0 bytes
[360952.614330] NFSD: starting 90-second grace period (net ffffffff819a29c0)
[371632.245637] device-mapper: table: 253:8: multipath: error getting device
[371632.245643] device-mapper: ioctl: error adding target to table
[root@dell-pet105-02 ~]#


Ben - can you tell what these errors are about?

(The first invalid keyword problem is tracked here: bug 1172186)

Comment 89 Ben Marzinski 2014-12-09 18:21:04 UTC
This all makes sense:

In RHEL7, multipath no longer uses "getuid_callout" to grab the wwid.  It gets the wwid from the udev database or the udev environment variables (if the value is not in the database yet).  By default it uses ID_SERIAL.  You can change that with the "uid_attribute" option.  Having "getuid_callout" in your config file will trigger a warning, but doesn't harm anything.

Also, running multipath will try to multipath /dev/sdb, since find_multipaths is not enabled in /etc/multipath.conf, and the device is not blacklisted. When you run

# multipath /dev/sdb

That will attempt to multipath /dev/sdb, even if find_multipaths is enabled, since you explicitly specified the device.

Multipath will fail on /dev/sdb because the device is already in use:

/dev/sdb3 is currently mounted as /run/initramfs/live

and

/dev/sdb4 is PV for a currently active Volume group.

This is all working as designed.
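The RHEL 7 behavior Ben describes maps to a one-line config setting; a sketch (uid_attribute shown with its default value):

```
# /etc/multipath.conf (sketch): RHEL 7 ignores getuid_callout and
# takes the wwid from the udev database; ID_SERIAL is already the
# default, so set uid_attribute only if another udev property should
# supply the wwid.
defaults {
    uid_attribute ID_SERIAL
}
```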

Comment 90 Ryan Barry 2014-12-09 19:50:18 UTC
I'm most curious why it's /dev/sdb and not /dev/mapper/36090a038d0f721901d033566b2493f23

From an earlier spin of EL7-based RHEV-H with an iSCSI root:

# dmsetup table
HostVG-Logging: 0 4194304 linear 253:7 5105664
HostVG-Swap: 0 5087232 linear 253:7 2048
HostVG-Data: 0 6471680 linear 253:7 9299968
QEMU_HARDDISK_QM00001p3: 0 499712 linear 253:0 999424
live-base: 0 3145728 linear 7:1 0
QEMU_HARDDISK_QM00001p2: 0 499712 linear 253:0 499712
QEMU_HARDDISK_QM00001p1: 0 497664 linear 253:0 2048
360000000000000000e00000000040001p3: 0 15775744 linear 253:4 999424
360000000000000000e00000000040001p2: 0 499712 linear 253:4 499712
360000000000000000e00000000040001p1: 0 497664 linear 253:4 2048
HostVG-Config: 0 16384 linear 253:7 5089280
live-rw: 0 3145728 snapshot 7:1 7:2 P 8
QEMU_HARDDISK_QM00001: 0 16777216 multipath 0 0 1 1 service-time 0 1 2 8:0 1 1 
360000000000000000e00000000040001: 0 16777216 multipath 0 0 1 1 service-time 0 1 2 8:16 1 1 

There are dm devices for the partitions. Our fear at this point is that we saw a similar problem which required a udev patch.

Namely, our installer doesn't use Anaconda, and we pull a weighted list of disks. But when multipath claims the disk and we try to partition /dev/sd[ab..], it fails, since the disk is already claimed.

With the disk claimed but no device-mapper devices created that we can use for installing or partitioning, a number of operations fail...

So we're basically hoping to either:

Have device-mapper claim all the devices, including partitions. This would be ideal, since it wouldn't be a behavior change (and while a behavior change is ok in EL7, we're seeing the same things in 6.6).

Use /dev/sd* devices, but not have multipath claim them. Another team handles the multipath.conf on running images (synced from the management engine), but we can probably get them to add find_multipaths...

Comment 91 Fabian Deutsch 2014-12-09 20:48:04 UTC
Summarizing the good and bad from comment 70 onwards:

Progress:
1. Comment 72: THIS bug is fixed with local disk storage
2. Comment 72: THIS bug is fixed with FC disk storage
3. Comment 72: THIS bug is fixed with iSCSI disk storage

The progress shows that the fix really solved the multipath race we saw before.
The race was: before multipathd claimed a raw device, dracut decided to use the raw device (/dev/sdb) for booting.
When the final switchroot/mounts took place, the raw device was assembled into a dm mpath device (which meant the raw device was claimed by dm).

By using the find_multipaths directive, multipath will not assemble the mpath device, and dm will thus not claim it; that is why we can boot.

A note on the specific machine used for testing: the rootfs resides on a non-multipathed device. This means the OS sees only one path to the iSCSI target, so no multipath device should be (and none is) created.

Ying, what is intended here? Should the iSCSI target be reachable by multiple paths or a single path?
We need to clarify this to know if everything is discovered correctly.


Regressions:
1. Comment 71: bug 1136300 appears again
   Partitioning fails, disks claimed?

2. Comment 72:
   iSCSI devices seem to not use multiple paths

3. Comment 72:
   iSCSI device is missing symlinks in mapper (looks like a regular disks)

Possible explanations:
* Regression 1:
  This can be a race of claiming disks or partitions during the installation.

* Regression 2:
  This can happen because the target is only accessible via one path.
  Please check that there are really several paths to the target.

  Looking at the machine, it looks like there is only a single path,
  this means that there is nothing wrong.

* Regression 3:
  It seems that the symlink to the iSCSI device is missing; that is why it
  appears as a regular disk.
  We need to check with the udev team why this could be happening.

Ying/Chen, can you please double check what kind of targets are exported from the EQLOGIC?

Harald, can you tell why the symlinks into /dev/mapper/ were not created for the iSCSI device in use (/dev/sdb)? Please look at comment 88 for how to access the machine.

Comment 92 Ying Cui 2014-12-10 03:33:28 UTC
adding back needinfo for harald.

Comment 93 Ying Cui 2014-12-10 03:34:31 UTC
Chen, for comment 91, thanks.

Comment 94 cshao 2014-12-10 07:51:06 UTC
> 2. Comment 72:
>    iSCSI devices seem to not use multiple paths

This iSCSI machine uses a hardware iSCSI device, not software iSCSI.
The iSCSI devices seem to use multiple paths; the output is as follows.
But I don't know why the boot LUN (36090a038d0f721901d033566b2493f23) can't be listed by running multipath -ll; it could be listed with other RHEV-H builds.

# multipath -ll
Dec 10 07:43:26 | multipath.conf +5, invalid keyword: getuid_callout
Dec 10 07:43:26 | multipath.conf +18, invalid keyword: getuid_callout
Dec 10 07:43:26 | multipath.conf +37, invalid keyword: getuid_callout
WDC_WD2502ABYS-18B7A0_WD-WCAT19558392 dm-6 ATA     ,WDC WD2502ABYS-1
size=233G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:0:0:0 sda 8:0  active ready running
36090a038d0f731381e035566b2497f85 dm-7 EQLOGIC ,100E-00         
size=30G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 4:0:1:0 sdc 8:32 active ready running


> Ying/Chen, can you please double check what kind of targets are exported
> from the EQLOGIC?

qla4xxx: 

# iscsiadm -m session
qla4xxx: [1] 10.66.90.100:3260,1 iqn.2001-05.com.equallogic:0-8a0906-9021f7d03-233f49b26635031d-s1-gouyang-165404-01 (flash)
qla4xxx: [2] 10.66.90.100:3260,1 iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-165404-02 (flash)

Comment 95 Fabian Deutsch 2014-12-10 09:52:51 UTC
(In reply to shaochen from comment #94)
> > 2. Comment 72:
> >    iSCSI devices seem to not use multiple paths
> 
> This iscsi machine using hardware iscsi device but not software iscsi.
> Seem iscsi devices is use multiple paths, the output as follows:
> But I don't known why the boot lun(36090a038d0f721901d033566b2493f23) can't
> be listed by run multipath -ll. actually it could be listed with other rhevh
> build.
> 
> # multipath -ll
> Dec 10 07:43:26 | multipath.conf +5, invalid keyword: getuid_callout
> Dec 10 07:43:26 | multipath.conf +18, invalid keyword: getuid_callout
> Dec 10 07:43:26 | multipath.conf +37, invalid keyword: getuid_callout
> WDC_WD2502ABYS-18B7A0_WD-WCAT19558392 dm-6 ATA     ,WDC WD2502ABYS-1
> size=233G features='0' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=active
>   `- 0:0:0:0 sda 8:0  active ready running
> 36090a038d0f731381e035566b2497f85 dm-7 EQLOGIC ,100E-00         
> size=30G features='0' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=active
>   `- 4:0:1:0 sdc 8:32 active ready running

Yes, but there was something wrong with the other RHEV-H 7.0 builds. They _always_ tried to use multipath - which is wrong.
With the build used here, multipath devices are only created if the same disk (based on the serial) appears at least twice.
And that is not the case here. The two disks exposed by the HBA use different serials, so they are different disks; see below.

> > Ying/Chen, can you please double check what kind of targets are exported
> > from the EQLOGIC?
> 
> qla4xxx: 
> 
> # iscsiadm -m session
> qla4xxx: [1] 10.66.90.100:3260,1
> iqn.2001-05.com.equallogic:0-8a0906-9021f7d03-233f49b26635031d-s1-gouyang-
> 165404-01 (flash)
> qla4xxx: [2] 10.66.90.100:3260,1
> iqn.2001-05.com.equallogic:0-8a0906-3831f7d03-857f49b26655031e-s1-gouyang-
> 165404-02 (flash)

Yes, I see two sessions, but the devices these sessions point to seem to be different ones. The serials are different:

[root@dell-pet105-02 ~]# lsblk --nodeps -o name,serial
lsblk: dm-7: failed to get device path
NAME  SERIAL
sda   WD-WCAT19558392
sdb   6090a038d0f721901d033566b2493f23
sdc   6090a038d0f731381e035566b2497f85
…

I wonder whether the EQLOGIC is configured correctly.

Comment 96 Harald Hoyer 2014-12-10 12:20:48 UTC
(In reply to Fabian Deutsch from comment #91)
> Harald, can you tell why the symlinks into /dev/mapper/ were not created for
> the iSCSI device in use (/dev/sdb)? Please look at comment 88 of how to
> access the machine.

Dec 05 07:40:11 localhost systemd-udevd[367]: timeout '/sbin/multipath -c /dev/sdc'
Dec 05 07:40:11 localhost multipathd[142]: sdc: add path (uevent)
Dec 05 07:40:11 localhost systemd-udevd[368]: timeout '/sbin/multipath -c /dev/sdb'
Dec 05 07:40:11 localhost multipathd[142]: sdb: add path (uevent)


most likely because:

Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi4:0:0: Abort command issued cmd=ffff8800cf1f0700, cdb=0x28
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi4: qla4xxx_mailbox_command: FAILED, MBOX CMD = 00000015, MBOX STS = 00004005 00000001 00000000 00
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi4:0:0: Abort command - failed
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi4:1:0: Abort command issued cmd=ffff8801232cd500, cdb=0x28
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi4: qla4xxx_mailbox_command: FAILED, MBOX CMD = 00000015, MBOX STS = 00004005 00000001 00000000 01
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi4:1:0: Abort command - failed
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi4:0:0:0: DEVICE RESET ISSUED.
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi(4:0:0:0): DEVICE RESET SUCCEEDED.
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi4:0:1:0: DEVICE RESET ISSUED.
Dec 05 07:40:11 localhost kernel: qla4xxx 0000:06:01.1: scsi(4:0:1:0): DEVICE RESET SUCCEEDED.

Comment 99 Chad Dupuis (Cavium) 2014-12-10 13:24:52 UTC
(In reply to Ying Cui from comment #93)
> Chen, for comment 91, thanks.

Can you make comment 91 non-private? I can't see it.

Comment 100 Chad Dupuis (Cavium) 2014-12-10 21:36:36 UTC
Is the whole kernel log from a bad boot available?

Comment 101 Ben Marzinski 2014-12-10 22:34:34 UTC
(In reply to Ryan Barry from comment #90)
> Have device-mapper claim all the devices, including partitions. This would
> be ideal, since it wouldn't be a behavior change (and while a behavior
> change is ok in EL7, we're seeing the same things in 6.6).

If you'd like multipath to claim every device, I can backport the "-i" option from rhel7.  That tells multipath to ignore the wwids file and just look at the blacklist.  So, if you run "multipath -i -c <dev>" in the udev rules, multipath will immediately claim all non-blacklisted block devices. There won't be a race here, because multipath will claim the device the moment it sees it.
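For reference, a minimal sketch of what such a udev rule could look like (the file name and match keys here are illustrative, not taken from this bug; the real rule ships with device-mapper-multipath):

```
# 62-multipath.rules (illustrative sketch)
# Run "multipath -i -c" on every newly added SCSI disk; exit status 0
# means multipath claims the device, so udev marks it as a multipath
# path and other consumers (LVM, mount) leave the raw path alone.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd*[!0-9]", \
    PROGRAM=="/sbin/multipath -i -c $tempnode", \
    ENV{DM_MULTIPATH_DEVICE_PATH}="1"
```

Because the check runs synchronously from the rule, the device is claimed the moment udev processes the add event, which is what avoids the race.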

Comment 102 Fabian Deutsch 2014-12-11 05:55:41 UTC
Created attachment 967091 [details]
dmesg from failed boot

Chad, here is a journalctl -k from a failed boot.

Comment 107 Fabian Deutsch 2014-12-15 13:21:40 UTC
Wrapping up once again:

We made some multipath changes (using find_multipaths yes) and included some dracut fixes, which solved this bug.
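For context, the relevant multipath.conf setting looks like this (a minimal sketch showing only the option named above):

```
# /etc/multipath.conf (sketch; only the relevant option shown)
defaults {
    # Only set up a multipath device for WWIDs that are already known
    # (wwids file) or that show up on more than one path, instead of
    # greedily claiming every non-blacklisted block device.
    find_multipaths yes
}
```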

QE: does the regression mentioned in comment 71 (bug 1136300) still exist?
And what other problems do we have with the 1212 builds, which contain all the necessary fixes for this bug?

Comment 108 cshao 2014-12-16 06:34:28 UTC
Hi fabiand,

Can we request the iSCSI machine (dell-pet105-02) back for testing our new build (1212)?

Thanks!

Comment 109 Fabian Deutsch 2014-12-16 08:34:11 UTC
Yes, please do so.

Comment 110 Ying Cui 2014-12-16 11:46:22 UTC
At first we only suspected that the issue was not present on the Dell _server_ (comment 51), but as testing progressed in the limited time, QE covered more servers with the previous 1204 builds (hp-dl388g7-01 and hp-dl385pg8-11, six boots each), and none of the servers hit this system-halted issue. So we are about 90% confident this issue only happens on desktops and workstations.
Based on the present testing, and since this bug only reproduces probabilistically, I am lowering the priority of this bug to High.

Comment 112 cshao 2014-12-16 14:28:32 UTC
Created attachment 969587 [details]
1212.tar.gz

Comment 113 cshao 2014-12-16 23:55:42 UTC
Created attachment 969801 [details]
1204-vs-1212.png

Comment 114 Ying Cui 2014-12-17 11:46:39 UTC
For this _probabilistic_ bug, based on the previous testing (before 1204) and discussion with shaochen, desktops and workstations are affected, not servers. Summarizing the QE reproduction rates and test machines here:

server:
    dell-per510-01 - tested about 10 times, never encountered.
    dell-per515-01 - tested about 10 times, never encountered.
    hp-dl388g7-01 - tested about 6 times, never encountered.
    hp-dl385pg8-11 - tested about 6 times, never encountered.

workstation:
    hp-z220 - tested about 3 times, encountered this bug (1152948) 2 times.

desktop:
    dell 9010 - tested 20+ times, encountered this bug 10+ times.
    dell 790  - tested about 10 times, encountered this bug 3 times.

Comment 115 Fabian Deutsch 2014-12-18 14:15:27 UTC
The new patch addresses this issue by explicitly naming the wwid of the root device (if it's a multipath device) on the kernel commandline, to tell multipath to claim it right away.
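For illustration, the WWID that gets placed on the commandline can be read with the standard udev helper (the device node below is illustrative; the exact kernel-commandline parameter name added by the patch is not quoted in this comment):

```
# Print the WWID of a SCSI disk (sketch; /dev/sdb is illustrative):
/lib/udev/scsi_id --whitelisted --device=/dev/sdb
```

With the root device's WWID known at boot time, multipath can claim that device immediately instead of waiting to see a second path.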

Comment 116 cshao 2014-12-19 10:32:12 UTC
Test version:
rhev-hypervisor7-7.0-20141218.0.el7ev
ovirt-node-3.1.0-0.37.20141218gitcf277e1.el7.noarch

Test result:
server:
dell-per515-02 - tested about 2 times, did not hit the system-halted issue.
dell-pet105-02 - tested about 10 times, did not hit the system-halted issue.

workstation:
hp-z800-02 - tested about 2 times, did not hit the system-halted issue.

desktop:
dell-790 - tested 3+ times, did not hit the system-halted issue.
dell 9010 - tested 8+ times, did not hit the system-halted issue.
hp-5850 - tested about 2 times, did not hit the system-halted issue.

The bug seems to be fixed with the above build.

Thanks!

Comment 117 Fabian Deutsch 2014-12-19 12:07:01 UTC
Moving to VERIFIED according to comment 116.

Comment 119 errata-xmlrpc 2015-02-11 21:04:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0160.html