Bug 1042857

Summary: pvs command hangs (or takes tens of minutes) when a system has a lot of SAN LUNs - LVM claims paths before multipath has a chance to claim them
Product: Red Hat Enterprise Linux 7
Reporter: Barry Donahue <bdonahue>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: Default / Unclassified
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED DUPLICATE
Severity: high
Priority: urgent
CC: agk, bdonahue, bmarzins, dlehman, harald, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, xiaoli, zkabelac
Version: 7.0
Target Milestone: rc
Target Release: 7.0
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-02-20 07:52:10 UTC
Type: Bug
Attachments:
anaconda log
console log
storage log
sys log
dmesg
messages
anaconda log
console log
storage log
sys log

Description Barry Donahue 2013-12-13 14:27:54 UTC
Description of problem: When I try to install to a system with 40 SAN LUNs as well as a local disk, anaconda never resolves the storage target.


Version-Release number of selected component (if applicable):
RHEL-7.0-20131210.n.0 Server x86_64

How reproducible: every time


Steps to Reproduce:

   See: https://beaker.engineering.redhat.com/jobs/559686

Comment 2 Barry Donahue 2014-01-14 17:58:19 UTC
I tried this again on snapshot 2 and I still can't install unless I disconnect the SAN storage.

Comment 3 David Lehman 2014-01-14 19:16:04 UTC
The job you referenced has been deleted and therefore is not accessible. Please reproduce again and leave the job so we can see the logs. Thanks.

Comment 4 Barry Donahue 2014-01-15 18:02:32 UTC
I reran this on the 20140110.0 build. It's hung in a loop now. https://beaker.engineering.redhat.com/jobs/575642

I'm also attaching the logs.

Comment 5 Barry Donahue 2014-01-15 18:03:31 UTC
Created attachment 850604 [details]
anaconda log

Comment 6 Barry Donahue 2014-01-15 18:04:00 UTC
Created attachment 850606 [details]
console log

Comment 7 Barry Donahue 2014-01-15 18:04:29 UTC
Created attachment 850608 [details]
storage log

Comment 8 Barry Donahue 2014-01-15 18:04:54 UTC
Created attachment 850609 [details]
sys log

Comment 9 Barry Donahue 2014-01-15 18:05:15 UTC
Created attachment 850610 [details]
dmesg

Comment 10 Barry Donahue 2014-01-15 18:05:38 UTC
Created attachment 850611 [details]
messages

Comment 11 David Lehman 2014-01-15 18:15:42 UTC
The logs you attached seem to capture a successful installation. What was the problem?

Comment 12 Barry Donahue 2014-01-15 18:20:08 UTC
Right now on the console, all I see is "........." This will continue until Beaker times out the install and kills it.
11:47:23 Running pre-installation scripts
11:47:24 Not asking for VNC because of an automated install
11:47:24 Not asking for VNC because text mode was explicitly asked for in kickstart
Starting automated install.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................forever

Comment 13 David Lehman 2014-01-15 19:00:56 UTC
One of your kickstart %post scripts must be hung. The last thing in anaconda.log is this:

16:39:39,928 INFO anaconda: Running kickstart %%post script(s)

Once that finishes, you would see something like this:

16:39:39,928 INFO anaconda: All kickstart %%post script(s) have been run

But the second line above is absent.

I see that 'set -x' is in your %post, and that the logging should go to /dev/console. Maybe something there will provide some insight as to what's hanging.

Comment 14 Barry Donahue 2014-01-15 19:24:34 UTC
It appears to be in the pre-install phase.

Starting installer, one moment...
anaconda 19.31.42-1 for Red Hat Enterprise Linux 7.0 (pre-release) started.
11:47:23 Running pre-installation scripts
11:47:24 Not asking for VNC because of an automated install
11:47:24 Not asking for VNC because text mode was explicitly asked for in kickstart
Starting automated install...................................................................................forever

Comment 15 Barry Donahue 2014-01-15 19:29:11 UTC
Complete screwup on my part. Wrong install and wrong logs. sigh....

Off to get the correct info.

Comment 16 Barry Donahue 2014-01-15 20:08:02 UTC
The correct job (still running). https://beaker.engineering.redhat.com/jobs/576459

Will add logs.

Comment 17 Barry Donahue 2014-01-15 20:08:33 UTC
Created attachment 850668 [details]
anaconda log

Comment 18 Barry Donahue 2014-01-15 20:09:05 UTC
Created attachment 850669 [details]
console log

Comment 19 Barry Donahue 2014-01-15 20:09:32 UTC
Created attachment 850670 [details]
storage log

Comment 20 Barry Donahue 2014-01-15 20:09:56 UTC
Created attachment 850671 [details]
sys log

Comment 21 David Lehman 2014-01-15 20:28:47 UTC
The last thing in program.log is this:

14:39:22,193 INFO program: Running... lvm pvs --unit=k --nosuffix --nameprefixes --rows --unquoted --noheadings -opv_uuid,pe_start,vg_name,vg_uuid,vg_size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,pv_count --config global {locking_type=4}  /dev/sda2

Had this command completed, it would be followed by a line like this:

14:39:22,350 DEBUG program: Return code: 0


So it looks like this lvm command is hanging.

Comment 25 Peter Rajnoha 2014-01-30 10:03:04 UTC
OK, I've tried the installation on the machine myself (thanks Barry for the access to the machine!) and I've found this:

The problem is that the devices are multipath members and we fire PV scans on those, even though these should be ignored!

We normally ignore devices (for pvscan) which have

  'DM_MULTIPATH_DEVICE_PATH': '1'

...set in udev (this is set by mpath udev rules to indicate that this is a multipath device path).
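The skip logic described here can be sketched in shell; the property name DM_MULTIPATH_DEVICE_PATH is real, but the helper function below is illustrative and not part of any LVM tool:

```shell
#!/bin/sh
# Decide whether a pvscan should be fired for a device, given its udev
# properties as KEY=VALUE lines on stdin (the format printed by
# "udevadm info --query=property --name=<dev>"). Illustrative sketch.
should_pvscan() {
    while IFS='=' read -r key value; do
        # mpath udev rules set this to mark a multipath device path;
        # such paths must be left to multipath, not scanned by LVM.
        if [ "$key" = "DM_MULTIPATH_DEVICE_PATH" ] && [ "$value" = "1" ]; then
            echo "skip"
            return 0
        fi
    done
    echo "scan"
}

# Simulated property dump for a multipath member path:
printf 'ID_FS_TYPE=mpath_member\nDM_MULTIPATH_DEVICE_PATH=1\n' | should_pvscan
# prints "skip"
```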

However, we still fire a scan because I can see:

  'SYSTEMD_WANTS': 'lvm2-pvscan@8:16.service'

...that instantiates a service to run a scan on the device. This is because the LVM label was detected as lost since the previous state, as indicated by this property recorded in udev:

  'LVM_PV_GONE': '1'

This happens if and only if a PV label was previously detected and recorded in the udev db ('ID_FS_TYPE': 'LVM2_member') and now there's something else ('ID_FS_TYPE': 'mpath_member' in this exact case).

I still have to investigate how this happens; it looks like a problem with device claiming - if this is an mpath device, it should be claimed by mpath as the very first thing on boot to prevent this problem. It looks like an ordering problem when scanning devices/running subsystem services (mpath should run first so any mpath member is properly marked before it's considered by lvm2).

Also, another part of this problem is excessive event processing (for which we already have a patch - bz #1056976, plus the original and related bz #1049387 - the mpath part was built just yesterday and I'll do an lvm2 update as well) - this should also help here...

The extra problem in this report is the LVM_PV_GONE=1 flag set in the udev db, which fires the useless extra pvscans...
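The transition behind LVM_PV_GONE can be modeled in shell; this is a purely illustrative sketch (the property names are real, but the function is not the actual logic in the lvm2 udev rules), assuming the old value comes from the udev db and the new one from the current event:

```shell
#!/bin/sh
# Sketch of the check behind LVM_PV_GONE: compare the ID_FS_TYPE
# recorded in the udev db on the previous event with the freshly
# detected one. Illustrative only - not the real rules logic.
pv_gone() {
    old_fs_type=$1   # value recorded in the udev db previously
    new_fs_type=$2   # value detected on the current event
    if [ "$old_fs_type" = "LVM2_member" ] && [ "$new_fs_type" != "LVM2_member" ]; then
        # The PV label "disappeared" - a pvscan is fired so lvmetad
        # drops the PV, even if the device became a multipath member.
        echo "LVM_PV_GONE=1"
    else
        echo "LVM_PV_GONE="
    fi
}

pv_gone LVM2_member mpath_member   # prints "LVM_PV_GONE=1"
```

This makes visible why an LVM2_member-to-mpath_member flip in the initramfs is enough to trigger the extra pvscans described above.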

Comment 27 Peter Rajnoha 2014-02-03 10:05:43 UTC
This is where it goes wrong - during the initramfs stage (I've chosen one path as an example - /dev/sddq here). In dracut's "pre-trigger" stage, the device is marked as ID_FS_TYPE=LVM2_member (incorrect), while at "switch-root" it's ID_FS_TYPE=mpath_member (correct). This transition causes LVM_PV_GONE=1 to be set, which then fires a pvscan later on. It's odd that ID_FS_TYPE=LVM2_member is not corrected to ID_FS_TYPE=mpath_member, as "multipath -c /dev/sddq" already shows that this is a multipath device path even in the pre-trigger stage (where ID_FS_TYPE has the incorrect value in the udev database).

/lib/udev/rules.d/62-multipath.rules should have rewritten this to the correct value.

pre-trigger:/# multipath -c /dev/sddq
/dev/sddq is a valid multipath device path

pre-trigger:/# udevadm info --name=/dev/sddq
P: /devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-3/target2:0:1/2:0:1:39/block/sddq
N: sddq
S: disk/by-id/scsi-360a9800044306a31715d436c58423270
S: disk/by-id/wwn-0x60a9800044306a31715d436c58423270
S: disk/by-path/pci-0000:05:00.1-fc-0x500a0983881bc1db-lun-39
E: DEVLINKS=/dev/disk/by-id/scsi-360a9800044306a31715d436c58423270 /dev/disk/by-id/wwn-0x60a9800044306a31715d436c58423270 /dev/disk/by-path/pci-0000:05:00.1-fc-0x500a0983881bc1db-lun-39
E: DEVNAME=/dev/sddq
E: DEVPATH=/devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-3/target2:0:1/2:0:1:39/block/sddq
E: DEVTYPE=disk
E: DM_MULTIPATH_TIMESTAMP=1391420704
E: ID_BUS=scsi
E: ID_FS_TYPE=LVM2_member
E: ID_FS_USAGE=raid
E: ID_FS_UUID=3seHM5-sQzk-N1AH-wewl-3zeP-hBfL-mEpzFw
E: ID_FS_UUID_ENC=3seHM5-sQzk-N1AH-wewl-3zeP-hBfL-mEpzFw
E: ID_FS_VERSION=LVM2 001
E: ID_MODEL=LUN
E: ID_MODEL_ENC=LUN\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_PATH=pci-0000:05:00.1-fc-0x500a0983881bc1db-lun-39
E: ID_PATH_TAG=pci-0000_05_00_1-fc-0x500a0983881bc1db-lun-39
E: ID_REVISION=820a
E: ID_SCSI=1
E: ID_SCSI_SERIAL=D0j1q]ClXB2p
E: ID_SERIAL=360a9800044306a31715d436c58423270
E: ID_SERIAL_SHORT=60a9800044306a31715d436c58423270
E: ID_TARGET_PORT=1
E: ID_TYPE=disk
E: ID_VENDOR=NETAPP
E: ID_VENDOR_ENC=NETAPP\x20\x20
E: ID_WWN=0x60a9800044306a31
E: ID_WWN_VENDOR_EXTENSION=0x715d436c58423270
E: ID_WWN_WITH_EXTENSION=0x60a9800044306a31715d436c58423270
E: MAJOR=71
E: MINOR=128
E: MPATH_SBIN_PATH=/sbin
E: SUBSYSTEM=block
E: SYSTEMD_READY=1
E: TAGS=:systemd:
E: USEC_INITIALIZED=449294

switch_root:/# udevadm info --name=/dev/sddq
P: /devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-3/target2:0:1/2:0:1:39/block/sddq
N: sddq
S: disk/by-id/scsi-360a9800044306a31715d436c58423270
S: disk/by-id/wwn-0x60a9800044306a31715d436c58423270
S: disk/by-path/pci-0000:05:00.1-fc-0x500a0983881bc1db-lun-39
E: DEVLINKS=/dev/disk/by-id/scsi-360a9800044306a31715d436c58423270 /dev/disk/by-id/wwn-0x60a9800044306a31715d436c58423270 /dev/disk/by-path/pci-0000:05:00.1-fc-0x500a0983881bc1db-lun-39
E: DEVNAME=/dev/sddq
E: DEVPATH=/devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-3/target2:0:1/2:0:1:39/block/sddq
E: DEVTYPE=disk
E: DM_MULTIPATH_DEVICE_PATH=1
E: DM_MULTIPATH_TIMESTAMP=1391420760
E: DM_MULTIPATH_WIPE_PARTS=1
E: ID_BUS=scsi
E: ID_FS_TYPE=mpath_member
E: ID_FS_USAGE=raid
E: ID_FS_UUID=3seHM5-sQzk-N1AH-wewl-3zeP-hBfL-mEpzFw
E: ID_FS_UUID_ENC=3seHM5-sQzk-N1AH-wewl-3zeP-hBfL-mEpzFw
E: ID_FS_VERSION=LVM2 001
E: ID_MODEL=LUN
E: ID_MODEL_ENC=LUN\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_PATH=pci-0000:05:00.1-fc-0x500a0983881bc1db-lun-39
E: ID_PATH_TAG=pci-0000_05_00_1-fc-0x500a0983881bc1db-lun-39
E: ID_REVISION=820a
E: ID_SCSI=1
E: ID_SCSI_SERIAL=D0j1q]ClXB2p
E: ID_SERIAL=360a9800044306a31715d436c58423270
E: ID_SERIAL_SHORT=60a9800044306a31715d436c58423270
E: ID_TARGET_PORT=1
E: ID_TYPE=disk
E: ID_VENDOR=NETAPP
E: ID_VENDOR_ENC=NETAPP\x20\x20
E: ID_WWN=0x60a9800044306a31
E: ID_WWN_VENDOR_EXTENSION=0x715d436c58423270
E: ID_WWN_WITH_EXTENSION=0x60a9800044306a31715d436c58423270
E: LVM_PV_GONE=1
E: MAJOR=71
E: MINOR=128
E: MPATH_SBIN_PATH=/sbin
E: SUBSYSTEM=block
E: SYSTEMD_READY=1
E: TAGS=:systemd:
E: USEC_INITIALIZED=449294
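The relevant delta between the two dumps can be isolated with a plain diff; here is that technique applied to abbreviated property lists (only a few representative lines from the dumps are reproduced):

```shell
#!/bin/sh
# Diff abbreviated pre-trigger vs switch-root udev property sets for
# /dev/sddq to surface exactly what changed between the two stages.
pre=$(mktemp); post=$(mktemp)
cat > "$pre" <<'EOF'
ID_FS_TYPE=LVM2_member
ID_SERIAL=360a9800044306a31715d436c58423270
SUBSYSTEM=block
EOF
cat > "$post" <<'EOF'
DM_MULTIPATH_DEVICE_PATH=1
ID_FS_TYPE=mpath_member
ID_SERIAL=360a9800044306a31715d436c58423270
LVM_PV_GONE=1
SUBSYSTEM=block
EOF
# diff exits 1 when the files differ; capture the output instead of failing
delta=$(diff "$pre" "$post" || true)
echo "$delta"
rm -f "$pre" "$post"
```

On a live system the inputs would come from `udevadm info --query=property` captured at each stage rather than from here-docs.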

Comment 28 Peter Rajnoha 2014-02-03 10:12:25 UTC
Ben, can you have a look, please - it seems 62-multipath.rules did not detect a valid multipath device path in dracut when it should have? (...and so did not set DM_MULTIPATH_DEVICE_PATH=1, and did not rewrite ID_FS_TYPE to the correct value during this early stage?)

Comment 29 Peter Rajnoha 2014-02-03 10:14:21 UTC
(This "E: LVM_PV_GONE=1" is set whenever we change from ID_FS_TYPE="LVM2_member" to anything else - we fire a pvscan to inform lvmetad that the device is no longer a PV...)

Comment 30 Peter Rajnoha 2014-02-03 11:11:08 UTC
I tried rd.udev.debug and I see tons of these:

...
Feb 03 10:47:27 localhost systemd-udevd[653]: '/sbin/multipath -c /dev/sdag'(err) 'libudev: udev_device_read_db: no db file to read /run/udev/data/b66:0: No such file or directory'
Feb 03 10:47:27 localhost systemd-udevd[591]: '/sbin/multipath -c /dev/sdac'(err) 'libudev: udev_device_new_from_syspath: device 0x1982250 has devpath '/devices/pci0000:00/0000:00:09.0/0000:05:00.1/host2/rport-2:0-3/fc_remote_ports/rport-2:0-3''
Feb 03 10:47:27 localhost systemd-udevd[613]: '/sbin/multipath -c /dev/sdl' [881] exit with return code 1
...

So that's probably the reason why the device is not marked as a multipath device path in the udev db!

Ben, just to make sure - "multipath -c" uses libudev, right? If so, then I think we can hand this over to udev, as it seems udev can't handle so many devices at a time; maybe there are some races inside udev when numerous devices are processed in a short time.

Comment 31 Ben Marzinski 2014-02-05 18:16:00 UTC
Yes, multipath -c does use udev. It doesn't call udev_device_new_from_syspath() directly in this code path, but it does call udev_device_new_from_subsystem_sysname(), which may build the syspath from the subsystem name and device name inside libudev and call udev_device_new_from_syspath() internally, but I haven't looked.

There's another thing you might want to check though. To keep from doing so much work on change events, the rules only check whether the device is a multipath device if the timestamp has been updated since the last uevent. Otherwise, they just copy the value from the udev database. It's possible that something is going wrong here. You could try removing all of this from 62-multipath.rules:

*******
ACTION!="change", GOTO="update_timestamp"
IMPORT{db}="DM_MULTIPATH_TIMESTAMP"
IMPORT{db}="DM_MULTIPATH_DEVICE_PATH"
IMPORT{db}="DM_MULTIPATH_WIPE_PARTS"
# Check if the device is part of a multipath device. the -T option just keeps
# the old result if the timestamp hasn't changed.
PROGRAM=="$env{MPATH_SBIN_PATH}/multipath -T $env{DM_MULTIPATH_TIMESTAMP}:$env{DM_MULTIPATH_DEVICE_PATH} -c $env{DEVNAME}", \
        ENV{DM_MULTIPATH_DEVICE_PATH}="1", ENV{ID_FS_TYPE}="mpath_member", \
        GOTO="update_timestamp"

# If the device isn't part of a multipath device, clear this
ENV{DM_MULTIPATH_DEVICE_PATH}=""
ENV{DM_MULTIPATH_WIPE_PARTS}=""

LABEL="update_timestamp"
*******

and change

ACTION=="add", ENV{DM_MULTIPATH_DEVICE_PATH}!="1", \
        PROGRAM=="$env{MPATH_SBIN_PATH}/multipath -c $tempnode", \
        ENV{DM_MULTIPATH_DEVICE_PATH}="1", ENV{ID_FS_TYPE}="mpath_member"

to

ACTION!="remove", ENV{DM_MULTIPATH_DEVICE_PATH}!="1", \
        PROGRAM=="$env{MPATH_SBIN_PATH}/multipath -c $tempnode", \
        ENV{DM_MULTIPATH_DEVICE_PATH}="1", ENV{ID_FS_TYPE}="mpath_member"

and then remake the initramfs. This will make multipath always actually do the check. However, given the errors, it seems more likely that this part is working fine, and the problem is that udev can't give me a udev device to use.
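For reference, my reading of the two suggested edits combined - the timestamp block dropped and the ACTION match relaxed - would leave the relevant part of 62-multipath.rules looking roughly like this (an untested sketch, not a verified fragment of the shipped rules file):

```
# Sketch after the suggested edits: always run the real check on any
# event except remove, instead of reusing the timestamp-cached result.
ACTION!="remove", ENV{DM_MULTIPATH_DEVICE_PATH}!="1", \
        PROGRAM=="$env{MPATH_SBIN_PATH}/multipath -c $tempnode", \
        ENV{DM_MULTIPATH_DEVICE_PATH}="1", ENV{ID_FS_TYPE}="mpath_member"
```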

Comment 34 Peter Rajnoha 2014-02-19 09:06:18 UTC
It seems this is an instance of bug #1054806, where the problem is in the way devices are claimed: multipath doesn't know whether a device is a multipath path unless we have another device with the same WWID, or the device is directly defined in the multipath configuration (which is of course not the case during installation). This ends up with LVM taking over the device (which later turns out to be a multipath path). This problem is just being resolved at the moment (bug #1054806).

In addition to that, I'm also a bit worried about these messages:

Feb 03 10:47:27 localhost systemd-udevd[653]: '/sbin/multipath -c /dev/sdag'(err) 'libudev: udev_device_read_db: no db file to read /run/udev/data/b66:0: No such file or directory'

Since multipath uses libudev to enumerate block devices in the system, I assume this can end up with multipath not seeing a block device and hence concluding that there's no other device in the system with the same WWID.

Harald - can you please confirm? Is it possible that the error above causes the block device to be "invisible" to the libudev user?

Comment 35 Harald Hoyer 2014-02-19 09:28:59 UTC
PROGRAM="" (not ==, btw) runs at rule processing time. So, if a device appears and has no records yet, of course there is no database file for it _yet_.

PROGRAM
     Execute a program to determine whether there is a match; the key is true
     if the program returns successfully. The device properties are made
     available to the executed program in the environment. The program's stdout
     is available in the RESULT key.

     This can only be used for very short-running foreground tasks.
     For details see RUN.
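To make the semantics concrete, here is a minimal hypothetical rule using PROGRAM as a match; the helper path /usr/local/bin/my-check and the MY_CHECKED property are invented purely for illustration:

```
# Hypothetical rule: run the helper at rule processing time; the rule
# matches only if the helper exits 0, and its stdout is then matchable
# via the RESULT key.
KERNEL=="sd*", PROGRAM=="/usr/local/bin/my-check %k", RESULT=="ok", \
        ENV{MY_CHECKED}="1"
```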

Comment 36 Harald Hoyer 2014-02-19 09:33:56 UTC
Also, please check that the custom initrd udev rules dracut adds are correct and do not interfere with yours.

namely:

11-dm.rules
59-persistent-storage-dm.rules

http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/90dm/11-dm.rules
http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/90dm/59-persistent-storage-dm.rules

Comment 37 Ben Marzinski 2014-02-19 19:17:58 UTC
Peter, I'm pretty sure that these

Feb 03 10:47:27 localhost systemd-udevd[653]: '/sbin/multipath -c /dev/sdag'(err) 'libudev: udev_device_read_db: no db file to read /run/udev/data/b66:0: No such file or directory'

messages aren't a problem. These are most likely coming from this chunk of code in multipath's get_uid() function.

        value = udev_device_get_property_value(pp->udev, pp->uid_attribute);
        if ((!value || strlen(value) == 0) && conf->dry_run == 2)
                value = getenv(pp->uid_attribute);

pp->uid_attribute is "ID_SERIAL" for most devices, and "conf->dry_run == 2" is checking whether we are running "multipath -c". If we are, we just read the value from the udev environment variable when we can't grab it from the udev database. So you will see the error message, but multipath will deal with it.
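The fallback Ben describes can be sketched in shell (get_uid() itself is C inside multipath; this just mirrors its logic under that assumption):

```shell
#!/bin/sh
# Sketch of the get_uid() fallback: prefer the value from the udev db,
# and if it is missing (e.g. no db file exists yet), fall back to the
# uevent environment - which is why the libudev "no db file" error is
# harmless for "multipath -c".
uid_from() {
    db_value=$1   # stands in for udev_device_get_property_value()
    if [ -n "$db_value" ]; then
        echo "$db_value"
    else
        # fallback: read ID_SERIAL from the environment, as the
        # dry_run == 2 branch does via getenv()
        printf '%s\n' "${ID_SERIAL}"
    fi
}

ID_SERIAL=360a9800044306a31715d436c58423270
uid_from ""            # db lookup failed: falls back to $ID_SERIAL
uid_from 0xdeadbeef    # db lookup succeeded: uses the db value
```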

Comment 38 Peter Rajnoha 2014-02-20 07:52:10 UTC
OK, thanks for the confirmation. I think we can mark this one as a dup of bug #1054806 then - if the other bug is fixed, the devices should be claimed properly and I believe we should no longer hit the bug reported here.

*** This bug has been marked as a duplicate of bug 1054806 ***