Bug 1170062 - Multipath with virtio-scsi fails in Fedora 21 Final RC2/4
Summary: Multipath with virtio-scsi fails in Fedora 21 Final RC2/4
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: device-mapper-multipath
Version: 21
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-12-03 07:16 UTC by Adam Williamson
Modified: 2015-12-02 16:37 UTC
CC List: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-02 05:24:37 UTC
Type: Bug
Embargoed:


Attachments
journal from a Server RC2 run, including a 'systemctl restart multipathd.service' which makes the device appear (89.30 KB, text/plain)
2014-12-03 07:18 UTC, Adam Williamson
storage.log (122.32 KB, text/plain)
2014-12-03 07:32 UTC, Adam Williamson
anaconda.log (12.29 KB, text/plain)
2014-12-03 07:33 UTC, Adam Williamson
program.log (28.53 KB, text/plain)
2014-12-03 07:34 UTC, Adam Williamson

Description Adam Williamson 2014-12-03 07:16:00 UTC
I tried testing a multipath install using two VirtIO-SCSI disks attached to the same VM and pointing to the same disk image, with the same serial number, as per https://sharkcz.livejournal.com/12846.html .

On boot (tried RC4 Workstation x64 live and RC2 Server x64 DVD, also tried Alpha RC1 just to see and it did the same), it's not seen as a multipath drive; I see both disks separately.

If I go to ctrl-alt-f2 and do 'systemctl restart multipathd.service' it seems to pick it up correctly.

For the record, multipath is listed as Final in the installation matrix, but I think that's a mistake on my part and it's actually intended to be Optional. The criterion it's allegedly 'associated' with is "The installer must be able to detect (if possible) and install to supported network-attached storage devices.", which, as I now read the criteria history, isn't really supposed to cover multipath. So, not proposing as a blocker.

Comment 1 Adam Williamson 2014-12-03 07:18:58 UTC
Created attachment 964009
journal from a Server RC2 run, including a 'systemctl restart multipathd.service' which makes the device appear

Comment 2 Adam Williamson 2014-12-03 07:19:47 UTC
Forgot to note, F20 x86-64 DVD image sees the same multipath 'device' correctly.

Comment 3 Adam Williamson 2014-12-03 07:32:39 UTC
Created attachment 964012
storage.log

Comment 4 Adam Williamson 2014-12-03 07:33:44 UTC
Created attachment 964013
anaconda.log

Comment 5 Adam Williamson 2014-12-03 07:34:43 UTC
Created attachment 964014
program.log

Comment 6 David Shea 2014-12-09 15:40:24 UTC
07:02:46,811 INFO program: Running... multipath -c /dev/sda
07:02:46,818 INFO program: /dev/sda is not a valid multipath device path
07:02:46,818 DEBUG program: Return code: 1

is all we have to go on, so if multipath isn't getting it right the first time that's multipath's fault.

Comment 7 Mike Snitzer 2014-12-09 15:58:47 UTC
Does virtio-scsi in Fedora 21 make use of scsi-mq (and blk-mq)?

AFAIK Fedora 21's virtio-blk is using blk-mq.  The kernel's request-based DM (DM core and dm-multipath) target doesn't support blk-mq devices yet.

There is an upstream patchset that is under development but it is unclear when all the issues will be resolved, see:

https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-for-3.19-blk-mq

and: https://www.redhat.com/archives/dm-devel/2014-December/msg00028.html

For now, as a workaround, all dm-multipath'ing must be performed in the host kernel.
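Whether a given disk is driven by blk-mq can be checked from sysfs. The sketch below is a heuristic only, assuming that blk-mq disks expose an 'mq' directory under /sys/block/<disk>/ and that legacy request-queue disks do not; the function name is hypothetical, and the sysfs root is a parameter so it can be pointed at a test directory.

```python
from pathlib import Path


def uses_blk_mq(disk: str, sysfs_block: Path = Path("/sys/block")) -> bool:
    """Heuristic (assumption): a blk-mq disk has a per-disk 'mq' directory,
    e.g. /sys/block/vda/mq, while a legacy request-queue disk has none."""
    return (sysfs_block / disk / "mq").is_dir()


if __name__ == "__main__":
    # Example: check every block device on the running system.
    for dev in sorted(p.name for p in Path("/sys/block").iterdir()):
        print(dev, "blk-mq" if uses_blk_mq(dev) else "legacy request queue")
```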

Comment 8 Adam Williamson 2014-12-09 16:13:51 UTC
good lord, I haven't got a clue, I just press the magic buttons that make the box go. ;)

is this something that's changed since f20?

Comment 9 Jeff Moyer 2014-12-09 18:10:35 UTC
virtio-scsi hasn't been converted to use the scsi-mq infrastructure yet.

For anyone else looking into this, the kernel version is 3.16.1-301.fc21.x86_64.

Comment 10 Adam Williamson 2014-12-09 18:56:53 UTC
Uh, are you sure? I'm pretty sure it's 3.17, not 3.16.

Comment 11 Ben Marzinski 2014-12-09 18:59:21 UTC
Multipath will always say that when the devices appear on install, and anaconda must deal with it. Anaconda is using find_multipaths. When that is set up, multipath won't claim devices in udev the first time it sees them, because it can't know beforehand if the device is supposed to be multipathed.

When find_multipaths is enabled, a device will only be multipathed if

1. you specifically tell multipath to do so, by running

# multipath <devname>

2. multipathd knows of two devices with the same wwid

3. The device has its wwid listed in /etc/multipath/wwids. This is the list of devices that have been multipathed before.  Multipath then knows it can claim the device as soon as it sees it.
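The three conditions above can be modelled as a single predicate. This is a minimal Python sketch, not multipath's actual code: the class and function names are hypothetical, and the state (known wwids, paths multipathd has seen, devices named explicitly on the command line) is passed in rather than read from /etc/multipath/wwids or from multipathd.

```python
from dataclasses import dataclass, field


@dataclass
class MultipathState:
    """Hypothetical snapshot of what multipath knows with find_multipaths on."""
    known_wwids: set[str] = field(default_factory=set)             # stands in for /etc/multipath/wwids
    seen_paths: dict[str, set[str]] = field(default_factory=dict)  # wwid -> path devices multipathd knows
    explicit: set[str] = field(default_factory=set)                # devices given to `multipath <devname>`


def should_multipath(state: MultipathState, dev: str, wwid: str) -> bool:
    # 1. The admin explicitly asked for this device.
    if dev in state.explicit:
        return True
    # 2. multipathd knows of at least two devices with the same wwid
    #    (the device being evaluated counts as one of them).
    if len(state.seen_paths.get(wwid, set()) | {dev}) >= 2:
        return True
    # 3. The wwid was multipathed before and is listed in the wwids file.
    return wwid in state.known_wwids
```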

When the first path device appears on installation, multipath has no way to know if there will be another device, so it can't claim the device. When the second path device appears, multipath could in theory see that another path with the same wwid has already appeared. However, this takes too long to do in a udev rule, and if a large number of block devices get uevents at the same time, doing this caused udev to miss events. Also, if there were a race to claim devices, multipath would already have lost: because it didn't claim the first device, whatever it was racing with could have claimed it. So multipath just checks whether the device is blacklisted and, if not, whether it is in the list of known wwids.

Seeing "multipath -c" in the messages doesn't mean that the device won't be multipathed. It will, assuming that two path devices appear and something else doesn't grab one of the devices first. It just means that the device wasn't claimed by multipath as soon as it appeared in the uevent processing.
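The difference between the quick udev-time check and what multipathd can do once both paths exist can be sketched as follows. Both functions are hypothetical simplifications: the udev-time check only consults a blacklist flag and the set of known wwids, and the later grouping skips any path already held by something else, which stands in for another subsystem grabbing a device first.

```python
from collections import defaultdict


def claim_at_uevent(wwid: str, blacklisted: bool, known_wwids: set[str]) -> bool:
    """Simplified udev-time check: it deliberately does NOT look for
    other paths with the same wwid, only the blacklist and known wwids."""
    if blacklisted:
        return False
    return wwid in known_wwids


def multipathd_groups(paths: dict[str, str], in_use: set[str]) -> dict[str, list[str]]:
    """Group path devices by wwid once they all exist; devices already
    held by something else are skipped, so they cannot join a map."""
    groups: dict[str, list[str]] = defaultdict(list)
    for dev, wwid in sorted(paths.items()):
        if dev not in in_use:
            groups[wwid].append(dev)
    # Only wwids with two or more usable paths become multipath devices.
    return {wwid: devs for wwid, devs in groups.items() if len(devs) >= 2}
```

With an empty wwids list, both /dev/sda and /dev/sdb fail the udev-time check, yet the two paths can still be grouped later, unless one of them is already in use.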

This is how it has worked since anaconda switched to using find_multipaths in RHEL-7.0.  There is a known anaconda issue where if the device is already labelled as a raid or lvm device before installation starts, those will autoassemble, and the device won't be multipathed.  Anaconda needs some way to disassemble the virtual device stack so that multipath can grab the device first, and have it re-assemble correctly.

The workaround for that is to add the device wwid to the kernel command line when you install with

mpath.wwid=<wwid>

This will place the wwid in /etc/multipath/wwids, and make sure multipath claims the devices as soon as they appear.
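What the mpath.wwid= workaround amounts to can be sketched in Python: collect every mpath.wwid= argument from the kernel command line and render it as wwids-file entries. The function names are hypothetical, and the one-entry-per-line '/<wwid>/' layout is an assumption about the wwids file format that should be checked against the installed multipath-tools version; on a real system this is handled by the boot and multipath tooling, not by code like this.

```python
PREFIX = "mpath.wwid="


def wwids_from_cmdline(cmdline: str) -> list[str]:
    # Keep the value of every non-empty mpath.wwid=<wwid> argument, in order.
    return [arg[len(PREFIX):] for arg in cmdline.split()
            if arg.startswith(PREFIX) and len(arg) > len(PREFIX)]


def render_wwids_entries(wwids: list[str]) -> str:
    # ASSUMED format: one '/<wwid>/' entry per line; duplicates are dropped.
    seen: set[str] = set()
    lines = []
    for wwid in wwids:
        if wwid not in seen:
            seen.add(wwid)
            lines.append(f"/{wwid}/")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    cmdline = "BOOT_IMAGE=vmlinuz quiet mpath.wwid=0QEMU_QEMU_HARDDISK_0001"
    print(render_wwids_entries(wwids_from_cmdline(cmdline)), end="")
    # prints /0QEMU_QEMU_HARDDISK_0001/
```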

Now, I'm not saying that this definitely isn't a multipath issue. I'll look into the logs more. But anaconda should be expecting to see those messages from Comment 6, and they don't mean that multipath won't correctly create a multipath device.

Comment 12 Jeff Moyer 2014-12-09 19:18:28 UTC
(In reply to Adam Williamson (Red Hat) from comment #10)
> Uh, are you sure? I'm pretty sure it's 3.17, not 3.16.

If the journal you attached is from the failed boot, then yes, I am sure.  Have a look at the logs for yourself if you don't believe me.  :)

Comment 13 Adam Williamson 2014-12-09 19:47:34 UTC
Oh. I think I see what happened. From the label it looks like I attached the log from the Alpha RC1 run:

LABEL=Fedora-S-21_A-x86_64

which would've had 3.16, yeah.

Comment 14 Ben Marzinski 2014-12-10 18:57:54 UTC
So looking at program.log

I can see this:

**The first path is discovered. Multipath has no way to know there will be
  a second path here**
07:02:46,811 INFO program: Running... multipath -c /dev/sda
07:02:46,818 INFO program: /dev/sda is not a valid multipath device path


**An lvm PV label is found on /dev/sda2 for the "fedora" volume group.**
07:02:47,436 INFO program: Running... lvm pvs --unit=k --nosuffix --nameprefixes --unquoted --noheadings -opv_name,pv_uuid,pe_start,vg_name,vg_uuid,vg_size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,pv_count --config  devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } global {locking_type=4}
07:02:47,450 INFO program:   LVM2_PV_NAME=/dev/sda2 LVM2_PV_UUID=uqm20h-JofL-qjUn-ZsHF-m9Ml-zwJY-jfUb02 LVM2_PE_START=1024.00 LVM2_VG_NAME=fedora LVM2_VG_UUID=gImNjm-LSY2-eLIg-WrVZ-qPKo-9sJ8-j4JGzM LVM2_VG_SIZE=20455424.00 LVM2_VG_FREE=40960.00 LVM2_VG_EXTENT_SIZE=4096.00 LVM2_VG_EXTENT_COUNT=4994 LVM2_VG_FREE_COUNT=10 LVM2_PV_COUNT=1


**The fedora volume group is clearly activated.  /dev/sda is now in use, and it will be impossible for multipath to use it**
07:02:48,194 INFO program: Running... dumpe2fs -h /dev/mapper/fedora-root


**The second path is discovered here, but multipath will not be able to
  create a multipath device with it and sda**
07:02:48,299 INFO program: Running... multipath -c /dev/sdb
07:02:48,312 INFO program: /dev/sdb is not a valid multipath device path
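The ordering argument in this annotated excerpt can be checked mechanically. This is a heuristic Python sketch (the function name is hypothetical) over program.log lines of the form shown above: it reports whether a command touching a concrete /dev/mapper/<name> device ran after the first path's 'multipath -c' check and before the second path's. It only looks at the order of "Running..." lines; it cannot tell whether a device was actually held open.

```python
import re

# The command part of program.log "Running..." lines.
RUNNING = re.compile(r"INFO program: Running\.\.\. (.+)$")
# A concrete device-mapper node such as /dev/mapper/fedora-root; the bare
# "^/dev/mapper/" filter pattern in the lvm --config string does not match.
MAPPER_DEV = re.compile(r"/dev/mapper/[\w.+-]+")


def lvm_active_before_second_path(log: str, first: str, second: str) -> bool:
    commands = [m.group(1).strip()
                for m in (RUNNING.search(line) for line in log.splitlines()) if m]

    def index_of(pred):
        # Position of the first command satisfying pred, or None.
        return next((i for i, cmd in enumerate(commands) if pred(cmd)), None)

    i_first = index_of(lambda cmd: cmd == f"multipath -c {first}")
    i_mapper = index_of(lambda cmd: MAPPER_DEV.search(cmd) is not None)
    i_second = index_of(lambda cmd: cmd == f"multipath -c {second}")
    if i_first is None or i_mapper is None or i_second is None:
        return False
    return i_first < i_mapper < i_second
```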


So it appears that /dev/sda already has lvm metadata on it.  This will cause the known issue I mentioned in Comment 11 (It was originally discovered in Bug 1054806). You can deal with this issue by adding the device wwid to the kernel command line with "mpath.wwid=<WWID>".  Multipath uses the ID_SERIAL udev attribute as the wwid for scsi devices.

Looking at storage.log, that appears to be:

 'ID_SERIAL': '0QEMU_QEMU_HARDDISK_0001'

So if you add

mpath.wwid=0QEMU_QEMU_HARDDISK_0001

to the kernel command line, does this work? The other option is to erase the lvm metadata on the disk. Like I mentioned, the long term solution will be to enable anaconda to detect these situations, and disassemble the virtual device stack so that multipath can grab the devices first.
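Per Comment 14, multipath uses the ID_SERIAL udev property as the wwid for SCSI devices, so the kernel argument can be derived from a device's udev properties. A minimal sketch, assuming KEY=VALUE input such as the output of 'udevadm info --query=property --name=/dev/sda'; the helper names are hypothetical.

```python
def parse_udev_properties(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict; lines without '=' are ignored."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def mpath_kernel_arg(props: dict[str, str]) -> str:
    """Build the mpath.wwid= argument from ID_SERIAL (the SCSI wwid)."""
    serial = props.get("ID_SERIAL")
    if not serial:
        raise ValueError("no ID_SERIAL property; cannot derive a wwid")
    return f"mpath.wwid={serial}"


if __name__ == "__main__":
    sample = "DEVNAME=/dev/sda\nID_SERIAL=0QEMU_QEMU_HARDDISK_0001\n"
    print(mpath_kernel_arg(parse_udev_properties(sample)))
    # prints mpath.wwid=0QEMU_QEMU_HARDDISK_0001
```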

Comment 15 Adam Williamson 2014-12-10 18:59:18 UTC
oh, I'll have written over the disks many times since then. I'll try and find a minute to test it again with both empty disks and disks with previous F21 installs on them, though.

Comment 16 Paolo Bonzini 2014-12-17 11:12:04 UTC
I think scsi-mq has to be enabled manually.

Comment 17 Fedora End Of Life 2015-11-04 14:39:17 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 18 Fedora End Of Life 2015-12-02 05:24:45 UTC
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

