Bug 1933038
Summary:          systemd: Timed out waiting for Device/LV during boot on rhel-8.3

Product:          Red Hat Enterprise Linux 8
Component:        dracut
Version:          8.3
Hardware:         Unspecified
OS:               Linux
Status:           CLOSED DUPLICATE
Severity:         high
Priority:         urgent
Target Milestone: rc
Target Release:   ---
Reporter:         Jon Magrini <jmagrini>
Assignee:         Lukáš Nykrýn <lnykryn>
QA Contact:       qe-baseos-daemons
CC:               agk, akgaikwa, alex.wang, bmarzins, bubrown, dracut-maint-list, dtardon, fsilva, hannsj_uhl, heinzm, jbrassow, jhouston, mgandhi, msnitzer, nweddle, pmahale, prajnoha, rmadhuso, teigland, ysudarev, zkabelac
Doc Type:         If docs needed, set a value
Type:             Bug
Regression:       ---
Last Closed:      2021-05-25 20:31:38 UTC
Bug Blocks:       1951002
Description (Jon Magrini, 2021-02-25 14:43:51 UTC)
The device table is created prior to switch root:
---
Feb 24 12:30:25 hostname multipathd[847]: mpathcb: load table [0 419430400 multipath 1 queue_if_no_path 1 alua 2 1 service-time 0 2 1 65:80 1 67:112 1 service-time 0 2 1 8:64 1 66:96 1]
[..]
Feb 24 12:30:25 hostname multipathd[847]: exit (signal)
Feb 24 12:30:25 hostname multipathd[847]: --------shut down-------
Feb 24 12:30:25 hostname systemd[1]: multipathd.service: Succeeded.
Feb 24 12:30:25 hostname systemd[1]: Starting Switch Root...
Feb 24 12:30:25 hostname systemd[1]: Switching root.
[..]

LVM seems to attempt activation prior to multipathd starting:
---
Feb 24 12:30:28 hostname systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Feb 24 12:30:28 hostname systemd[1]: Starting LVM event activation on device 253:10...
Feb 24 12:30:28 hostname systemd[1]: Starting LVM event activation on device 253:7...
Feb 24 12:30:28 hostname systemd[1]: Starting LVM event activation on device 253:1...
Feb 24 12:30:28 hostname systemd[1]: Starting LVM event activation on device 253:4...
Feb 24 12:30:28 hostname systemd[1]: Starting LVM event activation on device 253:5...
Feb 24 12:30:28 hostname systemd[1]: Starting LVM event activation on device 253:13...
Feb 24 12:30:28 hostname systemd[1]: Starting LVM event activation on device 253:9...
Feb 24 12:30:28 hostname systemd[1]: Starting LVM event activation on device 253:12...
Feb 24 12:30:28 hostname lvm[2584]: pvscan[2584] PV /dev/mapper/mpatha3 online, VG rootvg is complete.
Feb 24 12:30:28 hostname lvm[2584]: pvscan[2584] VG rootvg run autoactivation.
Feb 24 12:30:28 hostname kernel: ch 3:0:0:1: [ch0] dt 0x2:
Feb 24 12:30:28 hostname kernel: ch 3:0:0:1: [ch0] ID/LUN unknown
Feb 24 12:30:28 hostname lvm[2581]: pvscan[2581] PV /dev/mapper/mpathcl online, VG vg_hanadata incomplete (need 3).
Feb 24 12:30:28 hostname lvm[2583]: pvscan[2583] PV /dev/mapper/mpathb online, VG vg_paging is complete.
Feb 24 12:30:28 hostname lvm[2583]: pvscan[2583] VG vg_paging run autoactivation.
Feb 24 12:30:28 hostname lvm[2585]: pvscan[2585] PV /dev/mapper/mpathcg online, VG vg_hanadata incomplete (need 2).
Feb 24 12:30:28 hostname lvm[2587]: pvscan[2587] PV /dev/mapper/mpathcj online, VG vg_hanadata incomplete (need 1).
Feb 24 12:30:28 hostname lvm[2582]: pvscan[2582] PV /dev/mapper/mpathci online, VG vg_hanadata is complete.
Feb 24 12:30:28 hostname lvm[2582]: pvscan[2582] VG vg_hanadata run autoactivation.
Feb 24 12:30:28 hostname lvm[2586]: pvscan[2586] PV /dev/mapper/mpathcm online, VG vg_hanalog incomplete (need 2).
Feb 24 12:30:28 hostname lvm[2588]: pvscan[2588] PV /dev/mapper/mpathck online, VG vg_hanalog incomplete (need 1).
[..]
Feb 24 12:30:32 hostname systemd[1]: Started udev Wait for Complete Device Initialization.
Feb 24 12:30:32 hostname systemd[1]: Starting Device-Mapper Multipath Device Controller...
Feb 24 12:30:32 hostname multipathd[2675]: --------start up--------
Feb 24 12:30:32 hostname multipathd[2675]: read /etc/multipath.conf
Feb 24 12:30:32 hostname multipathd[2675]: path checkers start up
[..]
Feb 24 12:30:35 hostname multipathd[2675]: mpathcb: load table [0 419430400 multipath 1 queue_if_no_path 1 alua 2 1 service-time 0 2 1 65:80 1 67:112 1 service-time 0 2 1 8:64 1 66:96 1]
[..]
Feb 24 12:30:35 hostname systemd[1]: Started Device-Mapper Multipath Device Controller.
Feb 24 12:35:26 hostname systemd[1]: dev-mapper-vg_hana\x2dlv_hana02.device: Job dev-mapper-vg_hana\x2dlv_hana02.device/start timed out.
Feb 24 12:35:26 hostname systemd[1]: Timed out waiting for device dev-mapper-vg_hana\x2dlv_hana02.device.
Feb 24 12:35:26 hostname systemd[1]: dev-mapper-vg_hana\x2dlv_hana02.device: Job dev-mapper-vg_hana\x2dlv_hana02.device/start timed out.
Feb 24 12:35:26 hostname systemd[1]: Timed out waiting for device dev-mapper-vg_hana\x2dlv_hana02.device.
LV        VG      Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert LV Tags Devices
lv_hana03 vg_hana -wi------- 300.00g /dev/mapper/mpathcb(0) -1 auto 1 0
lv_hana03 vg_hana -wi------- 300.00g /dev/mapper/mpathcc(0) -1 auto 1 0

./etc/lvm/lvm.conf
[..]
filter = [ "a|/dev/mapper/mpath.*|" "r|.*|" ]

mpathcb (3600507638081053c1800000000000729) dm-17 IBM,2145
size=200G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:4 sdv  65:80  active ready running
| `- 2:0:1:4 sdbd 67:112 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:0:4 sde  8:64   active ready running
  `- 2:0:0:4 sdam 66:96  active ready running

udevadm_info_.dev.sdv
[..]
E: DM_MULTIPATH_DEVICE_PATH=1
E: ID_FS_TYPE=mpath_member

Hello Team,

I have a similar case where an LV fails to activate during boot.

Environment:
Red Hat Enterprise Linux release 8.1 (Ootpa)
kernel: 4.18.0-147.5.1.el8_1.ppc64le
lvm2-2.03.05-5.el8.ppc64le                    Thu Jan 9 15:59:57 2020
device-mapper-multipath-0.8.0-5.el8.ppc64le   Thu Jan 9 15:59:55 2020

Logs:
----
Feb 24 16:28:46 hostname systemd[1]: dev-mapper-datavg\x2dimpervalv.device: Job dev-mapper-datavg\x2dimpervalv.device/start timed out.
Feb 24 16:28:46 hostname systemd[1]: Timed out waiting for device dev-mapper-datavg\x2dimpervalv.device.
-- Subject: Unit dev-mapper-datavg\x2dimpervalv.device has failed
-- Defined-By: systemd
Feb 24 16:28:46 hostname systemd[1]: dev-mapper-datavg\x2dsapmntlv.device: Job dev-mapper-datavg\x2dsapmntlv.device/start timed out.
Feb 24 16:28:46 hostname systemd[1]: Timed out waiting for device dev-mapper-datavg\x2dsapmntlv.device.
Feb 24 16:28:46 hostname lxhn013 systemd[1]: dev-mapper-datavg\x2dimpervalv.device: Job dev-mapper-datavg\x2dimpervalv.device/start timed out.
Feb 24 16:28:46 hostname lxhn013 systemd[1]: Timed out waiting for device dev-mapper-datavg\x2dimpervalv.device.
Feb 24 16:28:46 hostname lxhn013 systemd[1]: dev-mapper-sapbackupvg\x2dsapbackuplv.device: Job dev-mapper-sapbackupvg\x2dsapbackuplv.device/start timed out.
Feb 24 16:28:46 hostname lxhn013 systemd[1]: Timed out waiting for device dev-mapper-sapbackupvg\x2dsapbackuplv.device.
Feb 24 16:28:46 hostname lxhn013 systemd[1]: dev-mapper-datavg\x2dsaplv.device: Job dev-mapper-datavg\x2dsaplv.device/start timed out.
Feb 24 16:28:46 hostname lxhn013 systemd[1]: Timed out waiting for device dev-mapper-datavg\x2dsaplv.device.

Regards,
Pratapsingh

There are only 8 or 9 lvm2-pvscan services, but there are about 17 PVs, so it seems that lvm2-pvscan services are not being generated for around half of the mpath devices. A VG will not be activated unless there's an lvm2-pvscan service generated for each PV in the VG.

(In reply to David Teigland from comment #5)
> There are only 8 or 9 lvm2-pvscan services, but there are about 17 PVs, so
> it seems that lvm2-pvscan services are not being generated for around half
> of the mpath devices. A VG will not be activated unless there's an
> lvm2-pvscan service generated for each PV in the VG.

Seems appropriate here.
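The check described in comment 5 (compare the number of lvm2-pvscan@ units against the number of multipath devices that are LVM2 PVs) can be sketched as a small script. The sample here-doc data is hypothetical and stands in for the sosreport files; on a live system you would feed in the output of "systemctl list-units" and "blkid" instead:

```shell
#!/bin/sh
# Count running lvm2-pvscan@ units vs. mpath devices that blkid reports
# as LVM2 PVs. Sample data below is illustrative, not from this bug.
units=$(cat <<'EOF'
lvm2-pvscan@253:1.service  loaded active exited LVM event activation on device 253:1
lvm2-pvscan@253:10.service loaded active exited LVM event activation on device 253:10
EOF
)

blkid_out=$(cat <<'EOF'
/dev/mapper/mpathb: UUID="..." TYPE="LVM2_member"
/dev/mapper/mpathcb: UUID="..." TYPE="LVM2_member"
/dev/mapper/mpathcc: UUID="..." TYPE="LVM2_member"
EOF
)

n_units=$(printf '%s\n' "$units" | grep -c 'lvm2-pvscan@')
n_pvs=$(printf '%s\n' "$blkid_out" | grep mpath | grep -c LVM2)

echo "pvscan units: $n_units, LVM2 PVs on mpath: $n_pvs"
if [ "$n_units" -lt "$n_pvs" ]; then
    # Per comment 5: any VG containing a PV without a pvscan unit
    # will never reach autoactivation.
    echo "MISSING: some PVs never got a pvscan unit; their VGs cannot autoactivate"
fi
```

With the sample data this reports 2 units against 3 PVs and flags the mismatch, mirroring the 8-or-9 vs. 17 discrepancy seen in the sosreport.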
For the case mentioned in the BZ opening that seems true, as there are 8 units present:

$ grep lvm2-pvscan sos_commands/systemd/systemctl_list-units
lvm2-pvscan@253:1.service  loaded active exited LVM event activation on device 253:1
lvm2-pvscan@253:10.service loaded active exited LVM event activation on device 253:10
lvm2-pvscan@253:12.service loaded active exited LVM event activation on device 253:12
lvm2-pvscan@253:13.service loaded active exited LVM event activation on device 253:13
lvm2-pvscan@253:4.service  loaded active exited LVM event activation on device 253:4
lvm2-pvscan@253:5.service  loaded active exited LVM event activation on device 253:5
lvm2-pvscan@253:7.service  loaded active exited LVM event activation on device 253:7
lvm2-pvscan@253:9.service  loaded active exited LVM event activation on device 253:9

There are 20 mpath devices total, though only 17 are used by LVM:

$ less sos_commands/devicemapper/dmsetup_info_-c | sed -n '1p;/mpath/p'
Name     Maj Min Stat Open Targ Event  UUID
mpathch  253  22 L--w    0    1      0 mpath-3600507638081053c180000000000072e
mpathcg  253   5 L--w    1    1      0 mpath-3600507638081053c180000000000072f
mpathcf  253  21 L--w    0    1      0 mpath-3600507638081053c180000000000072d
mpathbz  253  15 L--w    0    1      0 mpath-3600507638081053c1800000000000727
mpathce  253  20 L--w    0    1      0 mpath-3600507638081053c180000000000072c
mpathb   253   1 L--w    1    1      0 mpath-3600507638081053c18000000000006db
mpathcd  253  19 L--w    0    1      0 mpath-3600507638081053c180000000000072b
mpatha   253   0 L--w    3    1      0 mpath-3600507638081053c18000000000006da
mpatha3  253   4 L--w    7    1      0 part3-mpath-3600507638081053c18000000000006da
mpathcc  253  18 L--w    0    1      0 mpath-3600507638081053c180000000000072a
mpatha2  253   3 L--w    1    1      0 part2-mpath-3600507638081053c18000000000006da
mpathcb  253  17 L--w    0    1      0 mpath-3600507638081053c1800000000000729
mpatha1  253   2 L--w    0    1      0 part1-mpath-3600507638081053c18000000000006da
mpathcn  253  14 L--w    0    1      0 mpath-3600507638081053c1800000000000735
mpathca  253  16 L--w    0    1      0 mpath-3600507638081053c1800000000000728
mpathcm  253  13 L--w    0    1      0 mpath-3600507638081053c1800000000000734
mpathcl  253  10 L--w    1    1      0 mpath-3600507638081053c1800000000000732
mpathck  253  12 L--w    0    1      0 mpath-3600507638081053c1800000000000733
mpathcj  253   9 L--w    1    1      0 mpath-3600507638081053c1800000000000731
mpathci  253   7 L--w    1    1      0 mpath-3600507638081053c1800000000000730

$ less sos_commands/block/blkid_-c_.dev.null | grep mpath | grep -c LVM2
17

The system appears to be configured for event-based activation:

$ grep event_activation etc/lvm/lvm.conf
# Configuration option global/event_activation.
# When event_activation is disabled, the system will generally run
event_activation = 1

Shouldn't one of the lvm2-activation*.service units generate the missing units? Or is the fix as simple as re-running the lvm2-activation-generator?

The lvm2-activation-generator looks at lvm.conf event_activation; if it's 0, the generator will create the lvm2-activation* services, which call vgchange -aay directly. When event_activation=1, the generator does not create the lvm2-activation services, and the system depends entirely on lvm2-pvscan services to activate everything.

Created attachment 1759904 [details]
Service to collect udev monitor log

(In reply to Peter Rajnoha from comment #8)
> I've had a set of helper rules/systemd units which could collect the whole
> uevent sequence which could help us in debugging this, I'll try to find them
> and attach it here...
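As an aside, the lvm2-activation-generator decision described a few comments up can be sketched roughly as follows. The parsing is deliberately simplified, and the here-doc sample stands in for /etc/lvm/lvm.conf:

```shell
#!/bin/sh
# Rough sketch of the lvm2-activation-generator decision: with
# event_activation=1 no static lvm2-activation* units are written, so
# activation relies entirely on per-PV lvm2-pvscan@ units.
conf=$(cat <<'EOF'
global {
    event_activation = 1
}
EOF
)

# Extract the event_activation value (0 or 1) from the sample config.
ea=$(printf '%s\n' "$conf" | \
     sed -n 's/^[[:space:]]*event_activation[[:space:]]*=[[:space:]]*\([01]\).*/\1/p')

if [ "$ea" = "0" ]; then
    echo "generator: would create lvm2-activation* services (direct vgchange -aay)"
else
    echo "generator: no static units; activation depends on lvm2-pvscan@ events"
fi
```

This is why, on the affected system (event_activation = 1), re-running the generator cannot supply the missing units: with event activation enabled it intentionally generates nothing.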
The steps to take are these (7, 8, and 11 are extra, but the more context we have, the better):

1) Place the systemd-udev-monitor.service file in /etc/systemd/system.
2) Call "systemctl daemon-reload".
3) Call "systemctl enable systemd-udev-monitor.service".
4) Reboot.
5) Append "systemd.log_level=debug systemd.log_target=kmsg udev.log-priority=debug log_buf_len=8M" to the kernel cmd line at boot.
6) Make sure systemd-udev-monitor.service is running by checking "systemctl status systemd-udev-monitor.service".
7) After booting, grab the journal with "journalctl -b".
8) Generate an lvmdump by calling "lvmdump -u -l -s" and take the generated dump file.
9) Stop the service with "systemctl stop systemd-udev-monitor.service".
10) Take the /run/udev_monitor.log file that got generated.
11) Do the sosreport, just in case we need some extra info not already gathered by the commands above.
12) Call "systemctl disable systemd-udev-monitor.service" and "rm /etc/systemd/system/systemd-udev-monitor.service" for cleanup.

(Copied from here: https://bugzilla.redhat.com/show_bug.cgi?id=1324028#c10)

Bug 1949076 addresses the issue initially reported in this BZ (comments 1 and 2). Four other cases are linked as well; the additions to multipathd.service have resolved three of them. The remaining fourth case is chasing an additional issue not related to this BZ, so we should be good to close this BZ and defer to 1949076.

OK, thanks for the confirmation. I'm closing this one then.

*** This bug has been marked as a duplicate of bug 1949076 ***
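For readers hitting the same symptom before the fix tracked in bug 1949076 is available, the general shape of an ordering workaround can be sketched as a systemd drop-in. This is an illustrative assumption only, not the actual patch shipped for 1949076; the file path and the choice of ordering are hypothetical:

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/lvm2-pvscan@.service.d/after-multipathd.conf
# Delay LVM event activation until multipathd has loaded its maps, so
# pvscan does not race the multipath table setup seen in the logs above.
# Run "systemctl daemon-reload" after creating the file.
[Unit]
After=multipathd.service
```

Because lvm2-pvscan@ is a template unit, a drop-in on the template applies to every instance systemd starts for the individual PVs.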