Bug 1670209
| Summary: | [HPE 8.0 Bug] RHEL8 snapshot4 lvm activation errors | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Randy Wright <rwright> |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | Activating existing Logical Volumes | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | urgent | CC: | agk, cmarthal, heinzm, jbrassow, jkachuck, joseph.szczypek, karen.skweres, mcsontos, mknutson, msnitzer, prajnoha, rhandlin, rwright, teigland, tom.vaden, trinh.dao, zkabelac |
| Version: | 8.0 | ||
| Target Milestone: | rc | ||
| Target Release: | 8.0 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | lvm2-2.03.02-6.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-06-14 01:13:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1472458, 1478674 | ||
| Attachments: | | | |
Description
Randy Wright
2019-01-28 22:46:13 UTC
Sosreport has been uploaded to the Red Hat dropbox incoming directory:

ftp> bin
200 Switching to Binary mode.
ftp> cd incoming
250 Directory successfully changed.
ftp> put sosreport-tanerite02-2019-01-28-zmpdylf.tar.xz bz1670209-sosreport-tanerite02-2019-01-28-zmpdylf.tar.xz
local: sosreport-tanerite02-2019-01-28-zmpdylf.tar.xz remote: bz1670209-sosreport-tanerite02-2019-01-28-zmpdylf.tar.xz
200 PORT command successful. Consider using PASV.
150 Ok to send data.
226 File receive OK.
31415856 bytes sent in 3.73 secs (8.0238 MB/s)
ftp> quit
221 Goodbye.

I've also placed a copy inside HPE where Joe Szczypek could retrieve it for you: https://hpsl.ftc.rdlabs.hpecorp.net/~wrightr1/tanerite02/bz1670209-sosreport-tanerite02-2019-01-28-zmpdylf.tar.xz

Hello HPE, please confirm as soon as you are able whether this worked correctly in any previous RHEL 8 Alpha or Beta release, and confirm whether you have been able to recreate this issue on more than one physical system. Thank You, Joe Kachuck

Comment 2 is not simple to answer, and I have to split it up between different systems. I should note this is not an error for which I run a specific test case on all my test systems; rather, it's something I just noticed as the system booted, since I keep an eye open for things that say "failed" or "error", so some detective work is required.

One simple case: systems with no LVM volumes - the majority of my test systems - will occasionally show some or all of these messages:

vsp-20190125-1436.txt:[ 24.958202] lvm2-activation-generator: lvmconfig failed
vsp-20190125-1436.txt:[ 24.983113] lvm2-activation-generator: Activation generator failed.
vsp-20190125-1436.txt:[ 25.385836] systemd[1052]: /usr/lib/systemd/system-generators/lvm2-activation-generator failed with exit status 1.

For my test systems, I generally keep 3 months of serial console logs, and searching the old ones, I only see this beginning January 25, 2019; but looking at the kernel version in that log, I see 4.18.0-56.el8.x86_64, which I think was snapshot 3... so I may have overlooked an initial occurrence of this issue in snapshot3. Most of my systems have no LVM volumes, so this error message is not preventing boot to the default systemd target; on those systems, one must be alert to notice the issue. In this case, the basic question is whether an lvm activation generator failure matters if there are no LVM volumes on the SUT.

Now, on my only system that actually has some LVM volumes, because that system is also multipath, it's hard for me to distinguish whether the failure cause is the current bug or related to symptoms described in bug 1642728. One or both of these bugs are likely to interfere with booting successfully to the default systemd target. That system is the one from which I provided the sosreport in the original filing of the current bug. I have worked around bug 1642728 by substituting actual /dev/mapper device names in /etc/fstab, rather than LABEL= designations. I can make that edit again so as to prevent bug 1642728, and see if the current lvm2-activation-generator bug still occurs and prevents achieving the systemd default target. I'll post another update with that result tomorrow.

I saw this in /var/log/messages on the first boot after editing /etc/fstab to reference /dev/mapper device names only:

Feb 5 18:22:20 tanerite02 lvm2-activation-generator: lvmconfig failed
Feb 5 18:22:20 tanerite02 lvm2-activation-generator: Activation generator failed.
In that boot, bug 1642728 was not seen, so the current bug can be observed without 1642728. This boot instance went on to attain the systemd default run level, so even on a system that has LVM volumes mounted from fstab, the logged failure messages do not necessarily prevent a successful boot. I have a reboot loop set up to run overnight, so I will update again tomorrow with the result of that test.

This could be caused by upgrading from previous lvm versions - lvmetad was removed, and global/use_lvmetad in lvm.conf was replaced by global/event_activation.
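A quick way to tell whether an upgraded host is still carrying stale lvmetad-era settings is to look for the old and new option names side by side. A minimal sketch, assuming the stock file locations (the grep pattern is illustrative, not an official check):

```sh
# Show whether the removed use_lvmetad option is still set, and whether
# the replacement event_activation option is present at all.
grep -nE '^[[:space:]]*(use_lvmetad|event_activation)[[:space:]]*=' /etc/lvm/lvm.conf

# If the upgrade left an lvm.conf.rpmnew beside a locally modified lvm.conf,
# the diff shows exactly which options were added, removed, or renamed.
diff -u /etc/lvm/lvm.conf /etc/lvm/lvm.conf.rpmnew
```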
Randy, I do not have access to the sosfile. Could you please post output of `diff /etc/lvm/{lvm.conf,lvm.conf.rpmnew}`?
Now we need to find the best way to fix this:
- use `lvmconfig --type full` in the activation generator, to take default values into account,
- use the use_lvmetad value if event_activation is missing,
- rewrite the lvm.conf (not something I would recommend without the user's consent),
- other options?
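For the first option above, the point is that plain `lvmconfig` reports only values actually present in the config file, while `--type full` merges in the compiled-in defaults. A minimal sketch of querying the effective value even when the option is absent from lvm.conf (treat the exact invocation as an assumption against this lvm2 version):

```sh
# Reports the effective setting, falling back to the built-in default
# when global/event_activation is not set in /etc/lvm/lvm.conf.
lvmconfig --type full global/event_activation

# For comparison, the default output only reflects values actually
# present in the config file.
lvmconfig global/event_activation
```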
Commit 6298eaeca50e32cdff3adefdb57b09c3250547a2 - add `--type full` option to lvmconfig, Commit fdd612b8242281ac599c220726155202c71549a8 - avoid writing to syslog during boot - to prevent blocking Responding to comment 7: I believe I did update this sytem from the previous snapshot, rather than do a cold install, because the multipath setup is fairly specialized on this SUT as I use it for multipath failover testing - it's one of the systems used in bug 1601647 and bug 1625414. More generally, the idea that this issue might affect only updated systems rather than cold installed systems might explain why the message is seen on some but not all systems we have here testing RHEL8 I'll see if I can correlate the results. diff /etc/lvm/{lvm.conf,lvm.conf.rpmnew} shows many differences, so I will state by way of introduction that the change to scan /dev/mapper is one of the intentional customizations I make on my multipath test systems: scan = [ "/dev/mapper" ] I believe that was my only edit to the lvm.conf as previously installed, but likely explained the upgrade not overwriting the prior version, as is fairly evident from just the file modification times: [root@tanerite02 ~]# ll /etc/lvm/{lvm.conf,lvm.conf.rpmnew} -rw-r--r--. 1 root root 98112 Nov 29 13:48 /etc/lvm/lvm.conf -rw-r--r--. 1 root root 96843 Jan 4 02:49 /etc/lvm/lvm.conf.rpmnew [root@tanerite02 ~]# diff /etc/lvm/{lvm.conf,lvm.conf.rpmnew} 60c60 < scan = [ "/dev/mapper" ] --- > scan = [ "/dev" ] 152,163d151 < # Configuration option devices/cache_dir. < # This setting is no longer used. < cache_dir = "/etc/lvm/cache" < < # Configuration option devices/cache_file_prefix. < # This setting is no longer used. < cache_file_prefix = "" < < # Configuration option devices/write_cache_state. < # This setting is no longer used. < write_cache_state = 1 < 268,271d255 < # Configuration option devices/disable_after_error_count. < # This setting is no longer used. < disable_after_error_count = 0 < 840,853d823 < # Configuration option global/fallback_to_lvm1. < # This setting is no longer used. < # This configuration option has an automatic default value. < # fallback_to_lvm1 = 0 < < # Configuration option global/format. < # This setting is no longer used. < # This configuration option has an automatic default value. < # format = "lvm2" < < # Configuration option global/format_libraries. < # This setting is no longer used. < # This configuration option does not have a default value defined. < 866,869d835 < # Configuration option global/locking_type. < # This setting is no longer used. < locking_type = 1 < 874,881d839 < # Configuration option global/fallback_to_clustered_locking. < # This setting is no longer used. < fallback_to_clustered_locking = 1 < < # Configuration option global/fallback_to_local_locking. < # This setting is no longer used. < fallback_to_local_locking = 1 < 902,906d859 < # Configuration option global/locking_library. < # This setting is no longer used. < # This configuration option has an automatic default value. < # locking_library = "liblvm2clusterlock.so" < 995,997c948,956 < # Configuration option global/use_lvmetad. < # This setting is no longer used. < use_lvmetad = 0 --- > # Configuration option global/event_activation. > # Activate LVs based on system-generated device events. > # When a device appears on the system, a system-generated event runs > # the pvscan command to activate LVs if the new PV completes the VG. 
> # Use auto_activation_volume_list to select which LVs should be > # activated from these events (the default is all.) > # When event_activation is disabled, the system will generally run > # a direct activation command to activate LVs in complete VGs. > event_activation = 1 999,1000c958,959 < # Configuration option global/lvmetad_update_wait_time. < # This setting is no longer used. --- > # Configuration option global/use_aio. > # Use async I/O when reading and writing devices. 1002c961 < # lvmetad_update_wait_time = 0 --- > # use_aio = 1 1705,1709d1663 < < # Configuration option metadata/dirs. < # This setting is no longer used. < # This configuration option is advanced. < # This configuration option does not have a default value defined. As to options for addressing the possible issues during upgrade, I am accustomed to the solution that debian-based distros often apply during an upgrade: if a config file on the system being updated differs from the originally distributed version, stop and prompt the user for options on handling it, with options such as preserve the current one but provide the rpmnew version, or move the original one aside and replace with the new one so the user can go back and reconcile differences after completing the update. But I understand your users may not want to have to interact with each system individually during a mass deployment. However, I don't recall anything in the yum/dnf update process bringing to my attention the fact it didn't update lvm.conf. The solution of leaving the current config file in place is not a bad choice, but it would be useful to bring that to the attention of the user as the yum operation completes. Created attachment 1527705 [details] journalctl output after updating lvm.conf There still seems to be an LVM problem on this SUT after replacing lvm.conf with the lvm.conf.rpmnew and applying my local customization: [root@tanerite02 lvm]# diff lvm.conf.rpmnew lvm.conf 60c60 < scan = [ "/dev" ] --- > scan = [ "/dev/mapper" ] On reboot, I am seeing one or both systemctl units fail, with the serial console notation like this: [FAILED] Failed to start LVM event activation on device 253:24. See 'systemctl status lvm2-pvscan@253:24.service' for details. [FAILED] Failed to start LVM event activation on device 253:27. See 'systemctl status lvm2-pvscan@253:27.service' for details. Looking with systemctl as suggested: [root@tanerite02 lvm]# systemctl status --full --no-pager lvm2-pvscan@253:24.service * lvm2-pvscan@253:24.service - LVM event activation on device 253:24 Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2019-02-06 14:05:52 MST; 5min ago Docs: man:pvscan(8) Main PID: 3159 (code=exited, status=5) Feb 06 14:05:52 tanerite02 systemd[1]: Starting LVM event activation on device 253:24... Feb 06 14:05:52 tanerite02 lvm[3159]: Couldn't find device with uuid IvqLhI-vhJa-OTrF-P0OH-eZzj-QlGZ-xn9PWG. Feb 06 14:05:52 tanerite02 lvm[3159]: Cannot change VG VG16MB while PVs are missing. Feb 06 14:05:52 tanerite02 lvm[3159]: Consider vgreduce --removemissing. Feb 06 14:05:52 tanerite02 lvm[3159]: Cannot process volume group VG16MB Feb 06 14:05:52 tanerite02 lvm[3159]: 2 logical volume(s) in volume group "VG04MB" now active Feb 06 14:05:52 tanerite02 systemd[1]: lvm2-pvscan@253:24.service: Main process exited, code=exited, status=5/NOTINSTALLED Feb 06 14:05:52 tanerite02 systemd[1]: lvm2-pvscan@253:24.service: Failed with result 'exit-code'. 
Feb 06 14:05:52 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:24. [root@tanerite02 lvm]# systemctl status --full --no-pager lvm2-pvscan@253:27.service * lvm2-pvscan@253:27.service - LVM event activation on device 253:27 Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2019-02-06 14:05:52 MST; 6min ago Docs: man:pvscan(8) Main PID: 3164 (code=exited, status=5) Feb 06 14:05:52 tanerite02 systemd[1]: Starting LVM event activation on device 253:27... Feb 06 14:05:52 tanerite02 lvm[3164]: Couldn't find device with uuid IvqLhI-vhJa-OTrF-P0OH-eZzj-QlGZ-xn9PWG. Feb 06 14:05:52 tanerite02 lvm[3164]: Cannot change VG VG16MB while PVs are missing. Feb 06 14:05:52 tanerite02 lvm[3164]: Consider vgreduce --removemissing. Feb 06 14:05:52 tanerite02 lvm[3164]: Cannot process volume group VG16MB Feb 06 14:05:52 tanerite02 lvm[3164]: 2 logical volume(s) in volume group "VG04MB" now active Feb 06 14:05:52 tanerite02 systemd[1]: lvm2-pvscan@253:27.service: Main process exited, code=exited, status=5/NOTINSTALLED Feb 06 14:05:52 tanerite02 systemd[1]: lvm2-pvscan@253:27.service: Failed with result 'exit-code'. Feb 06 14:05:52 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:27. However, the expected LV's and PV's are getting mounted: [root@tanerite02 lvm]# mount|grep VG /dev/mapper/VG16MB-VG16LV01 on /VG16/LV01 type ext3 (rw,relatime,seclabel) /dev/mapper/VG16MB-VG16LV02 on /VG16/LV02 type ext3 (rw,relatime,seclabel) /dev/mapper/VG04MB-VG04LV01 on /VG04/LV01 type ext3 (rw,relatime,seclabel) /dev/mapper/VG04MB-VG04LV02 on /VG04/LV02 type ext3 (rw,relatime,seclabel) I should point out clearly that the volume groups are composed of multipath devices. That is the reason for my customization to scan /dev/mapper: so there is not a race in trying to scan the simple scsi disks before they are captured by device mapper. [root@tanerite02 ~]# vgdisplay -v /dev/VG16MB --- Volume group --- VG Name VG16MB System ID Format lvm2 Metadata Areas 2 Metadata Sequence No 4 VG Access read/write VG Status resizable MAX LV 0 Cur LV 2 Open LV 2 Max PV 0 Cur PV 2 Act PV 2 VG Size 93.12 GiB PE Size 16.00 MiB Total PE 5960 Alloc PE / Size 5958 / 93.09 GiB Free PE / Size 2 / 32.00 MiB VG UUID jXLe9r-nom8-hnUk-lgFF-0E0w-sJyy-iE2x5g --- Logical volume --- ... --- Physical volumes --- PV Name /dev/mapper/3600c0ff00014fdb28089225b01000000p1 PV UUID eTAWLD-7Zk0-PGOu-sVcf-2A2U-u1LP-V0NEFG PV Status allocatable Total PE / Free PE 2980 / 2 PV Name /dev/mapper/3600c0ff00014fdb2b889225b01000000p1 PV UUID IvqLhI-vhJa-OTrF-P0OH-eZzj-QlGZ-xn9PWG PV Status allocatable Total PE / Free PE 2980 / 0 Is the lvm initialization perhaps proclaiming failure too soon, before device mapper has a chance to create the multipath devices? Because by the time I can login and enter shell commands, I don't really see any problem. I'll attach journalctl output, this particular boot showed only one of the lvm services failing: [root@tanerite02 ~]# journalctl --failed journalctl: unrecognized option '--failed' [root@tanerite02 ~]# systemctl --failed UNIT LOAD ACTIVE SUB DESCRIPTION * lvm2-pvscan@253:24.service loaded failed failed LVM event activation on device 253:24 LOAD = Reflects whether the unit definition was properly loaded. ACTIVE = The high-level unit activation state, i.e. generalization of SUB. SUB = The low-level unit activation state, values depend on unit type. 1 loaded units listed. 
Pass --all to see loaded but inactive units, too. To show all installed unit files use 'systemctl list-unit-files'.

[root@tanerite02 ~]# journalctl -ab > journalctl-20190206.txt

We have fixed the missing options in lvm.conf that were causing issues. Modifying the scan option is not a good idea. If you want only mpath devices, use the filter option as a whitelist to accept only mpath devices:

filter = [ "a|/dev/mapper/mpath|", "r|*|" ]

I'm noticing this in the BZ description:

Cannot change VG VG04MB while PVs are missing.
Consider vgreduce --removemissing.

It does look like a 'missing' PV in this VG is preventing 'activation' of an LV. To activate LVs from a VG with a missing PV, the extra option 'lvchange --partial' has to be used (so eventually missing portions of the LV can be replaced with error or zero segments). For working autoactivation the VG has to be complete/consistent (no missing PV). If the PV should be removed - try the suggested vgreduce --removemissing. If the PV somehow came back and should be restored into the VG - try 'vgextend --restoremissing'. Both operations need manual admin steps - and of course - all associated LVs should be fscked.

I acknowledge comment 14 and will make the recommended change.

In response to comment 15: The message is misleading, as I am not missing any PVs. That is the point I was trying to make in comment 13, about whether LVM might be looking too early, before device mapper creates the /dev/mapper devices. You will see the PVs associated with the VG are /dev/mapper multipath devices, and they show up initially as two distinct /dev/sd* scsi devices, which device mapper consolidates into the multipath dm device /dev/mapper/* ... so the message may have been correct for a few seconds of transient state early in boot, as devices are being discovered and before device mapper captures them. But the fact that it leads to showing the service persistently in the failed state via "systemctl --failed" seems incorrect.

Created attachment 1527973 [details] journalctl with misleading failures reported from lvm2-pvscan

Actually, the comment 14 change doesn't work for me, for a couple of reasons. It assumes multipath is configured with friendly names, which I am not using. And I believe the regex in the reject pattern should be a real regex, rather than a shell glob pattern. So for my SUT, I modified it to become:

filter = [ "a|/dev/mapper/3600|", "r|.*|" ]

With this change in /etc/lvm/lvm.conf -- and also built into the initramfs -- I still see intermittent failures reported:

[root@tanerite02 ~]# systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
* lvm2-pvscan@253:25.service loaded failed failed LVM event activation on device 253:25
* lvm2-pvscan@253:27.service loaded failed failed LVM event activation on device 253:27

This is not reported on every boot, just randomly. And even on boots where it is reported, the system has come up perfectly happy. But you are leaving a diagnostic that erroneously tells the administrator he may want to run vgreduce on a perfectly functional volume group.
[root@tanerite02 ~]# systemctl status --full --no-pager lvm2-pvscan@253:27.service * lvm2-pvscan@253:27.service - LVM event activation on device 253:27 Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2019-02-07 17:46:47 MST; 8min ago Docs: man:pvscan(8) Process: 3189 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay 253:27 (code=exited, status=5) Main PID: 3189 (code=exited, status=5) Feb 07 17:46:47 tanerite02 systemd[1]: Starting LVM event activation on device 253:27... Feb 07 17:46:47 tanerite02 lvm[3189]: Couldn't find device with uuid IvqLhI-vhJa-OTrF-P0OH-eZzj-QlGZ-xn9PWG. Feb 07 17:46:47 tanerite02 lvm[3189]: Cannot change VG VG16MB while PVs are missing. Feb 07 17:46:47 tanerite02 lvm[3189]: Consider vgreduce --removemissing. Feb 07 17:46:47 tanerite02 lvm[3189]: Cannot process volume group VG16MB Feb 07 17:46:47 tanerite02 lvm[3189]: 2 logical volume(s) in volume group "VG04MB" now active Feb 07 17:46:47 tanerite02 systemd[1]: lvm2-pvscan@253:27.service: Main process exited, code=exited, status=5/NOTINSTALLED Feb 07 17:46:47 tanerite02 systemd[1]: lvm2-pvscan@253:27.service: Failed with result 'exit-code'. Feb 07 17:46:47 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:27. But here's what I see when I run vgdisplay on VG16MB: [root@tanerite02 ~]# vgdisplay -v VG16MB --- Volume group --- VG Name VG16MB System ID Format lvm2 Metadata Areas 2 Metadata Sequence No 4 VG Access read/write VG Status resizable MAX LV 0 Cur LV 2 Open LV 2 Max PV 0 Cur PV 2 Act PV 2 VG Size 93.12 GiB PE Size 16.00 MiB Total PE 5960 Alloc PE / Size 5958 / 93.09 GiB Free PE / Size 2 / 32.00 MiB VG UUID jXLe9r-nom8-hnUk-lgFF-0E0w-sJyy-iE2x5g --- Logical volume --- LV Path /dev/VG16MB/VG16LV01 LV Name VG16LV01 VG Name VG16MB LV UUID wNbe1h-VsTY-fFPb-Zmxe-dkr2-eqBY-CBbD3G LV Write Access read/write LV Creation host, time tanerite02, 2018-06-14 17:22:24 -0600 LV Status available # open 1 LV Size 46.00 GiB Current LE 2944 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:32 --- Logical volume --- LV Path /dev/VG16MB/VG16LV02 LV Name VG16LV02 VG Name VG16MB LV UUID 34KkOc-M9wE-Piqf-RSXV-yPgl-jzyA-5E784c LV Write Access read/write LV Creation host, time tanerite02, 2018-06-14 17:22:24 -0600 LV Status available # open 1 LV Size 47.09 GiB Current LE 3014 Segments 2 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:33 --- Physical volumes --- PV Name /dev/mapper/3600c0ff00014fdb28089225b01000000p1 PV UUID eTAWLD-7Zk0-PGOu-sVcf-2A2U-u1LP-V0NEFG PV Status allocatable Total PE / Free PE 2980 / 2 PV Name /dev/mapper/3600c0ff00014fdb2b889225b01000000p1 PV UUID IvqLhI-vhJa-OTrF-P0OH-eZzj-QlGZ-xn9PWG PV Status allocatable Total PE / Free PE 2980 / 0 There are two PV's, and they are both present and functioning perfectly well: [root@tanerite02 ~]# blkid /dev/mapper/3600c0ff00014fdb2b889225b01000000p1 /dev/mapper/3600c0ff00014fdb2b889225b01000000p1: UUID="IvqLhI-vhJa-OTrF-P0OH-eZzj-QlGZ-xn9PWG" TYPE="LVM2_member" PARTLABEL="primary" PARTUUID="ea0442bc-74be-4060-8700-f5ed7cdd99ac" [root@tanerite02 ~]# blkid /dev/mapper/3600c0ff00014fdb28089225b01000000p1 /dev/mapper/3600c0ff00014fdb28089225b01000000p1: UUID="eTAWLD-7Zk0-PGOu-sVcf-2A2U-u1LP-V0NEFG" TYPE="LVM2_member" PARTLABEL="primary" PARTUUID="5403feb8-f628-45f0-bda6-795f4874a230" I'm attaching journalctl output from this boot instance in hope that the misleading diagnostic can be 
avoided.

Hmm, looking at comment 18 it does look like activation of 2 VGs was processed: VG16 and VG04 - which does look like a bug - since pvscan should be scanning only a single arriving device, and thus should complete exactly 1 VG - but the trace seems to show more than a single VG has been processed. The other point is - the missing PV seems to be reported for VG16 - while VG04 has all its PVs. So can we get an attached 'vgcfgbackup' of both these VGs? (So we can check whether the VG has a PV marked as missing.)

Created attachment 1528193 [details] vgcfgbackup VG04MB

As requested in comment 19, attached is vgcfgbackup of VG04MB.

Created attachment 1528194 [details] vgcfgbackup VG16MB

As requested in comment 19, vgcfgbackup of VG16MB.

Created attachment 1528223 [details]
Tar image containing journalctl output from multiple boots
To provide an illustration of the random nature of LVM initialization failures, I put the SUT in a reboot loop for a couple of hours, collecting journalctl output from each iteration. The attached tar image of 27 journalctl files is the result. I just inspected this collection in a fairly crude way:
[root@tanerite02 tmp]# grep -i lvm journalctl-??.txt|grep -i -e failed -e vgreduce
journalctl-00.txt:Feb 08 13:40:09 tanerite02 lvm[3207]: Consider vgreduce --removemissing.
journalctl-00.txt:Feb 08 13:40:09 tanerite02 systemd[1]: lvm2-pvscan@253:21.service: Failed with result 'exit-code'.
journalctl-00.txt:Feb 08 13:40:09 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:21.
journalctl-01.txt:Feb 08 11:46:48 tanerite02 lvm[2878]: Consider vgreduce --removemissing.
journalctl-01.txt:Feb 08 11:46:48 tanerite02 systemd[1]: lvm2-pvscan@253:2.service: Failed with result 'exit-code'.
journalctl-01.txt:Feb 08 11:46:48 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:2.
journalctl-02.txt:Feb 08 11:51:31 tanerite02 lvm[3164]: Consider vgreduce --removemissing.
journalctl-02.txt:Feb 08 11:51:31 tanerite02 systemd[1]: lvm2-pvscan@253:20.service: Failed with result 'exit-code'.
journalctl-02.txt:Feb 08 11:51:31 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:20.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 lvm[3181]: Consider vgreduce --removemissing.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 lvm[3179]: Consider vgreduce --removemissing.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 lvm[3198]: Consider vgreduce --removemissing.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 systemd[1]: lvm2-pvscan@253:23.service: Failed with result 'exit-code'.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:23.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 systemd[1]: lvm2-pvscan@253:21.service: Failed with result 'exit-code'.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:21.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 systemd[1]: lvm2-pvscan@253:25.service: Failed with result 'exit-code'.
journalctl-04.txt:Feb 08 12:00:19 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:25.
...
You will observe that at least one service is found in a failed state in most boot instances, though a few iterations - such as 3, 12, 13, and 14 - showed no problem by my simple grep test.
But all of these boots successfully attained multi-user state, so all of the mount units, including the LVM volumes, completed okay.
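For reference, a collection like the 27-file tar above can be summarized with a small loop rather than eyeballing the grep output; a minimal sketch, assuming the per-boot logs are named journalctl-NN.txt as in the listing above:

```sh
# Count, per boot log, how many pvscan units failed and how many times the
# misleading vgreduce hint was printed. The file-name pattern follows the
# attachment above; adjust the glob to match your own capture scheme.
for f in journalctl-??.txt; do
    failed=$(grep -c 'lvm2-pvscan@.*Failed with result' "$f")
    hints=$(grep -c 'Consider vgreduce --removemissing' "$f")
    printf '%s: %s failed pvscan unit(s), %s vgreduce hint(s)\n' "$f" "$failed" "$hints"
done
```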
Yep - this needs analysis of why the system tries to activate a VG when not all components are present. There is now new code evaluating the presence of all needed PVs before autoactivation of a particular VG. In this case it seems it possibly tries to activate one VG that has all its PVs - and at the same time it also tries to activate a second VG that is not yet completely present - and you can observe the MISSING message from this. We can probably safely assume that PVs are not disappearing shortly after they appear in the system.

We need to find a way of collecting full debug output from the pvscan commands (this has proven to be very difficult so far in all the other cases like this.)

RH, will fix version lvm2-2.03.02-4.el8 be included in the next RHEL8 RC build? thanks, trinh

Regarding comment 24: can I collect the desired pvscan debug output by modifying /usr/lib/systemd/system/lvm2-pvscan@.service on the reproducing system? I'll give it a try and report again on results. BTW, prior to making any modification of the configuration, I ran a quick test on the reproducing system after updating to snapshot 5. I still find intermittent failures reported in journalctl. In a test limited to 20 reboots, 3 iterations reported errors. One iteration reported only 1 pvscan failure, two other iterations reported two pvscan failures:

[root@tanerite02 SI.1]# grep -i lvm jou* | grep -i -e failed -e vgreduce
journalctl.out.16:Feb 12 12:25:18 tanerite02 lvm[3112]: Consider vgreduce --removemissing.
journalctl.out.16:Feb 12 12:25:18 tanerite02 lvm[3116]: Consider vgreduce --removemissing.
journalctl.out.16:Feb 12 12:25:19 tanerite02 systemd[1]: lvm2-pvscan@253:24.service: Failed with result 'exit-code'.
journalctl.out.16:Feb 12 12:25:19 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:24.
journalctl.out.16:Feb 12 12:25:19 tanerite02 systemd[1]: lvm2-pvscan@253:27.service: Failed with result 'exit-code'.
journalctl.out.16:Feb 12 12:25:19 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:27.
journalctl.out.17:Feb 12 12:27:27 tanerite02 lvm[3053]: Consider vgreduce --removemissing.
journalctl.out.17:Feb 12 12:27:27 tanerite02 systemd[1]: lvm2-pvscan@253:24.service: Failed with result 'exit-code'.
journalctl.out.17:Feb 12 12:27:27 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:24.
journalctl.out.19:Feb 12 12:31:44 tanerite02 lvm[3113]: Consider vgreduce --removemissing.
journalctl.out.19:Feb 12 12:31:44 tanerite02 lvm[3109]: Consider vgreduce --removemissing.
journalctl.out.19:Feb 12 12:31:44 tanerite02 systemd[1]: lvm2-pvscan@253:24.service: Failed with result 'exit-code'.
journalctl.out.19:Feb 12 12:31:44 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:24.
journalctl.out.19:Feb 12 12:31:44 tanerite02 systemd[1]: lvm2-pvscan@253:27.service: Failed with result 'exit-code'.
journalctl.out.19:Feb 12 12:31:44 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:27.

Created attachment 1534263 [details] Tar image of journalctl logs with pvscan -vv verbosity

Following up on my intent in comment 26, I added a couple of v's to this line in lvm2-pvscan@.service:

ExecStart=/usr/sbin/lvm pvscan -vv --cache --activate ay %i

I ran another 20-reboot experiment with that change in place. Iterations 7 and 8 reproduced pvscan failure results:

[root@tanerite02 SI-vv]# grep -i lvm jour*|grep -i fail
journalctl.out.07:Feb 12 15:01:47 tanerite02 systemd[1]: lvm2-pvscan@253:26.service: Failed with result 'exit-code'.
journalctl.out.07:Feb 12 15:01:47 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:26.
journalctl.out.08:Feb 12 15:04:10 tanerite02 systemd[1]: lvm2-pvscan@253:24.service: Failed with result 'exit-code'.
journalctl.out.08:Feb 12 15:04:10 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:24.

I'm attaching a tar image containing journalctl output from all iterations so you have both failure and success cases. I'll try again with -vvvv verbosity.

Yes, sorry about that, anything less than -vvvv is not very helpful.

Created attachment 1534265 [details]
journalctl logs with pvscan -vvvv
Another set of 20 journalctl output files. This time we have:
ExecStart=/usr/sbin/lvm pvscan -vvvv --cache --activate ay %i
This tar image contains 3 iterations reporting failure:
journalctl.out.01:Feb 12 16:00:50 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:25.
journalctl.out.08:Feb 12 16:17:01 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:27.
journalctl.out.19:Feb 12 16:54:21 tanerite02 systemd[1]: Failed to start LVM event activation on device 253:24.
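Editing the packaged unit file under /usr/lib/systemd/system works for a one-off experiment like this, but the same verbosity change can also be carried in a systemd drop-in that survives package updates; a hedged sketch (the override file name is arbitrary, and the ExecStart line simply mirrors the one quoted above):

```sh
# Create a drop-in that replaces ExecStart with the verbose variant.
mkdir -p /etc/systemd/system/lvm2-pvscan@.service.d
cat > /etc/systemd/system/lvm2-pvscan@.service.d/verbose.conf <<'EOF'
[Service]
# The empty ExecStart= clears the packaged command before redefining it.
ExecStart=
ExecStart=/usr/sbin/lvm pvscan -vvvv --cache --activate ay %i
EOF
systemctl daemon-reload
```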
In bug 1672062 we have been looking into similar issues, but when using lvmetad. The problem here may have a similar aspect in that multiple (often concurrent) pvscans believe they are doing initialization. We need to ensure that the first pvscan (doing initialization) completes before any other pvscans run (e.g. a simple file lock would probably suffice). Thanks for the full debugging, I think that has identified the problem. The initial pvscan is trying to activate all VGs, even when those VGs are not yet complete. The activation finds the VG is incomplete and prints the error (about removemissing). I believe this faulty logic is due to some old lvmetad logic that has carried over and no longer applies. We should only try to activate VGs that are complete (which it already does except in the initialization case). The fix for that looks simple and I'll test it tomorrow. There might also be a need to strengthen the serialization of concurrent pvscans with the initial pvscan as mentioned above. Thanks for the comment 30 and 31 updates. Since I have a couple of these multipath device mapper test systems and can easily reproduce the issue, I am willing to test a candidate fix as soon as you make one available. I have two commits on the branch below for this issue; I've not tested them thoroughly yet. The first commit is the main bug fix. The second commit is related but probably not required (and may be expanded as mentioned in the description.) One or both would need to be backported for 8.0. https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-pvscan-init-1 Hi David, will the commits make it in the next RHEL8 snapshot-6 build? thanks, trinh This is not yet fully tested, and we have not yet determined how extensive the fix(es) need to be to make everything work (we don't want to add more change than necessary). There are between one and three parts that could be included in this fix. > am willing to test a candidate fix as soon as you make one available. Hi Randy, here is an initial test build that you could try: http://people.redhat.com/teigland/.bz1670209/lvm2-2.03.02-4test1.el8.x86_64.rpm Randy, can you please verify this test build and provide your test result? thanks, trinh I tried to install the comment 38 rpm and found it is missing a dependency: [root@tanerite02 rpms]# rpm -i lvm2-2.03.02-4test1.el8.x86_64.rpm error: Failed dependencies: lvm2-libs = 8:2.03.02-4test1.el8 is needed by lvm2-8:2.03.02-4test1.el8.x86_64 A query shows it might be missing at least one more: [root@tanerite02 rpms]# rpm -q -R lvm2-2.03.02-4test1.el8.x86_64.rpm |grep test config(lvm2) = 8:2.03.02-4test1.el8 lvm2-libs = 8:2.03.02-4test1.el8 I uploaded the lvm2-libs, I'm not sure why that's needed, I don't really understand rpms... there are a couple dozen rpms produced by every lvm build. Maybe you can force install or something like that if it still complains. I attempted rpm -i adding the new lvm2-libs rpm:
[root@tanerite02 rpms]# ls -l lvm2*rpm
-rw-r--r--. 1 root root 1541544 Feb 18 13:29 lvm2-2.03.02-4test1.el8.x86_64.rpm
-rw-r--r--. 1 root root 1103888 Feb 20 09:57 lvm2-libs-2.03.02-4test1.el8.x86_64.rpm
[root@tanerite02 rpms]# rpm -i lvm2*rpm
error: Failed dependencies:
device-mapper-event = 8:1.02.155-4test1.el8 is needed by lvm2-libs-8:2.03.02-4test1.el8.x86_64
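When a scratch build is split across sub-packages like this, handing all the local files to dnf in one transaction usually avoids chasing the dependency chain by hand; a hedged sketch, assuming the matching device-mapper packages from the same test build are also available in the current directory:

```sh
# Install the test build's sub-packages together so their mutual
# dependencies (lvm2-libs, device-mapper-event, ...) resolve locally.
dnf install ./lvm2-2.03.02-4test1.el8.x86_64.rpm \
            ./lvm2-libs-2.03.02-4test1.el8.x86_64.rpm \
            ./device-mapper-*-4test1.el8.x86_64.rpm
```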
Per suggestion in comment 41, I tried adding --nodeps:
[root@tanerite02 rpms]# rpm -i --nodeps lvm2*rpm
file /usr/lib64/device-mapper/libdevmapper-event-lvm2mirror.so from install of lvm2-libs-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-libs-8:2.03.02-2.el8.x86_64
file /usr/lib64/device-mapper/libdevmapper-event-lvm2raid.so from install of lvm2-libs-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-libs-8:2.03.02-2.el8.x86_64
file /usr/lib64/device-mapper/libdevmapper-event-lvm2snapshot.so from install of lvm2-libs-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-libs-8:2.03.02-2.el8.x86_64
file /usr/lib64/device-mapper/libdevmapper-event-lvm2thin.so from install of lvm2-libs-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-libs-8:2.03.02-2.el8.x86_64
file /usr/lib64/device-mapper/libdevmapper-event-lvm2vdo.so from install of lvm2-libs-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-libs-8:2.03.02-2.el8.x86_64
file /usr/lib64/libdevmapper-event-lvm2.so.2.03 from install of lvm2-libs-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-libs-8:2.03.02-2.el8.x86_64
file /usr/lib64/liblvm2cmd.so.2.03 from install of lvm2-libs-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-libs-8:2.03.02-2.el8.x86_64
file /etc/lvm/lvm.conf from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
file /etc/lvm/profile/vdo-small.profile from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
file /usr/lib/systemd/system-generators/lvm2-activation-generator from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
file /usr/sbin/lvm from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
file /usr/sbin/lvmpolld from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
file /usr/share/doc/lvm2/WHATS_NEW from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
file /usr/share/man/man8/lvcreate.8.gz from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
file /usr/share/man/man8/lvm.8.gz from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
file /usr/share/man/man8/lvs.8.gz from install of lvm2-8:2.03.02-4test1.el8.x86_64 conflicts with file from package lvm2-8:2.03.02-2.el8.x86_64
The appearance of conflicts suggested -U might be more appropriate, so:
[root@tanerite02 rpms]# rpm -U --nodeps lvm2*rpm
warning: /etc/lvm/lvm.conf created as /etc/lvm/lvm.conf.rpmnew
[/usr/lib/tmpfiles.d/libgpod.conf:1] Line references path below legacy directory /var/run/, updating /var/run/libgpod → /run/libgpod; please update the tmpfiles.d/ drop-in file accordingly.
[/usr/lib/tmpfiles.d/libstoragemgmt.conf:1] Line references path below legacy directory /var/run/, updating /var/run/lsm → /run/lsm; please update the tmpfiles.d/ drop-in file accordingly.
[/usr/lib/tmpfiles.d/libstoragemgmt.conf:2] Line references path below legacy directory /var/run/, updating /var/run/lsm/ipc → /run/lsm/ipc; please update the tmpfiles.d/ drop-in file accordingly.
[/usr/lib/tmpfiles.d/mdadm.conf:1] Line references path below legacy directory /var/run/, updating /var/run/mdadm → /run/mdadm; please update the tmpfiles.d/ drop-in file accordingly.
[/usr/lib/tmpfiles.d/pesign.conf:1] Line references path below legacy directory /var/run/, updating /var/run/pesign → /run/pesign; please update the tmpfiles.d/ drop-in file accordingly.
[/usr/lib/tmpfiles.d/radvd.conf:1] Line references path below legacy directory /var/run/, updating /var/run/radvd → /run/radvd; please update the tmpfiles.d/ drop-in file accordingly.
[/usr/lib/tmpfiles.d/spice-vdagentd.conf:2] Line references path below legacy directory /var/run/, updating /var/run/spice-vdagentd → /run/spice-vdagentd; please update the tmpfiles.d/ drop-in file accordingly.
[/usr/lib/tmpfiles.d/subscription-manager.conf:1] Line references path below legacy directory /var/run/, updating /var/run/rhsm → /run/rhsm; please update the tmpfiles.d/ drop-in file accordingly.
Since the rpm -U did in fact replace the lvm2 rpms, I felt I was committed to this change, so I edited the tmpfiles.d files as suggested. I also adjusted /etc/lvm/lvm.conf to match /etc/lvm/lvm.conf.rpmnew.
I then rebuilt the boot and kdump initrd images and rebooted.
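The rebuild step matters because the boot initramfs carries its own copy of lvm.conf, so a filter change under /etc only takes effect at early boot after regeneration; a minimal sketch, assuming the stock dracut and kdump tooling:

```sh
# Rebuild the initramfs for the running kernel so the updated
# /etc/lvm/lvm.conf (including the device filter) is used at early boot.
dracut -f

# The kdump service keeps a separate initramfs; restarting it lets the
# service regenerate that image if it detects a rebuild is needed.
systemctl restart kdump
```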
The first boot succeeded; the journalctl log shows a 'Couldn't find device' message, but the boot as a whole succeeds:
[root@tanerite02 ~]# systemctl --failed --all
0 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
[root@tanerite02 ~]# journalctl -ab | grep -i lvm
Feb 20 11:22:53 tanerite02 systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Feb 20 11:22:58 tanerite02 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Feb 20 11:22:58 tanerite02 systemd[1]: Starting LVM event activation on device 253:25...
Feb 20 11:22:58 tanerite02 lvm[2950]: Couldn't find device with uuid L9805A-mwjR-cYyo-i7e6-Q3Lk-tOqE-YduR02.
Feb 20 11:22:58 tanerite02 systemd[1]: Starting LVM event activation on device 253:27...
Feb 20 11:22:58 tanerite02 lvm[2964]: 2 logical volume(s) in volume group "VG04MB" now active
Feb 20 11:22:58 tanerite02 systemd[1]: Started LVM event activation on device 253:25.
Feb 20 11:22:58 tanerite02 systemd[1]: Started LVM event activation on device 253:27.
Feb 20 11:22:58 tanerite02 systemd[1]: Starting LVM event activation on device 253:31...
Feb 20 11:22:58 tanerite02 systemd[1]: Starting LVM event activation on device 253:28...
Feb 20 11:22:58 tanerite02 lvm[3126]: 2 logical volume(s) in volume group "VG16MB" now active
Feb 20 11:22:58 tanerite02 systemd[1]: Started LVM event activation on device 253:31.
Feb 20 11:22:59 tanerite02 systemd[1]: Started LVM event activation on device 253:28.
Since the results have previously differed run to run, I will put the system in a reboot loop for a while and report again later this afternoon on results.
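For completeness, a reboot loop like the one described can be driven by a small boot-time hook that saves the journal for each iteration and then reboots; a rough sketch only, assuming a throwaway test system and a hypothetical /root/reboot-loop.sh run at boot (for example from a oneshot unit or cron @reboot):

```sh
#!/bin/bash
# /root/reboot-loop.sh (hypothetical): save this boot's journal,
# then reboot until the desired number of iterations is collected.
dir=/root/reboot-logs
mkdir -p "$dir"
n=$(ls "$dir" | wc -l)                      # iterations collected so far
journalctl -b > "$dir/journalctl-$(printf '%02d' "$n").txt"
if [ "$n" -lt 20 ]; then
    systemctl reboot
fi
```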
Regarding comment 42 line: Feb 20 11:22:58 tanerite02 lvm[2950]: Couldn't find device with uuid L9805A-mwjR-cYyo-i7e6-Q3Lk-tOqE-YduR02. udevadm info -e shows that is the lvm-pv-uuid for one of the multipaths: P: /devices/virtual/block/dm-27 N: dm-27 L: 50 S: disk/by-id/dm-name-3600c0ff00014fdb25d89225b01000000p1 S: disk/by-id/dm-uuid-part1-mpath-3600c0ff00014fdb25d89225b01000000 S: disk/by-id/lvm-pv-uuid-L9805A-mwjR-cYyo-i7e6-Q3Lk-tOqE-YduR02 S: disk/by-id/scsi-3600c0ff00014fdb25d89225b01000000-part1 S: disk/by-id/wwn-0x600c0ff00014fdb25d89225b01000000-part1 S: disk/by-partlabel/primary S: disk/by-partuuid/4a695ff4-6a60-4367-9fa1-5048e5bce619 S: mapper/3600c0ff00014fdb25d89225b01000000p1 E: DEVLINKS=/dev/mapper/3600c0ff00014fdb25d89225b01000000p1 /dev/disk/by-id/dm-uuid-part1-mpath-3600c0ff00014fdb25d89225b01000000 /dev/disk/by-partlabel/primary /dev/disk/by-partuuid/4a695ff4-6a60-4367-9fa1-5048e5bce619 /dev/disk/by-id/dm-name-3600c0ff00014fdb25d89225b01000000p1 /dev/disk/by-id/lvm-pv-uuid-L9805A-mwjR-cYyo-i7e6-Q3Lk-tOqE-YduR02 /dev/disk/by-id/wwn-0x600c0ff00014fdb25d89225b01000000-part1 /dev/disk/by-id/scsi-3600c0ff00014fdb25d89225b01000000-part1 E: DEVNAME=/dev/dm-27 E: DEVPATH=/devices/virtual/block/dm-27 E: DEVTYPE=disk E: DM_ACTIVATION=1 E: DM_MPATH=3600c0ff00014fdb25d89225b01000000 E: DM_NAME=3600c0ff00014fdb25d89225b01000000p1 E: DM_PART=1 E: DM_SERIAL=3600c0ff00014fdb25d89225b01000000 E: DM_SUSPENDED=0 E: DM_TYPE=scsi E: DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1 E: DM_UDEV_PRIMARY_SOURCE_FLAG=1 E: DM_UDEV_RULES_VSN=2 E: DM_UUID=part1-mpath-3600c0ff00014fdb25d89225b01000000 E: DM_WWN=0x600c0ff00014fdb25d89225b01000000 E: ID_FS_TYPE=LVM2_member E: ID_FS_USAGE=raid E: ID_FS_UUID=L9805A-mwjR-cYyo-i7e6-Q3Lk-tOqE-YduR02 E: ID_FS_UUID_ENC=L9805A-mwjR-cYyo-i7e6-Q3Lk-tOqE-YduR02 E: ID_FS_VERSION=LVM2 001 E: ID_MODEL=LVM PV L9805A-mwjR-cYyo-i7e6-Q3Lk-tOqE-YduR02 on /dev/dm-27 E: ID_PART_ENTRY_DISK=253:23 E: ID_PART_ENTRY_NAME=primary E: ID_PART_ENTRY_NUMBER=1 E: ID_PART_ENTRY_OFFSET=2048 E: ID_PART_ENTRY_SCHEME=gpt E: ID_PART_ENTRY_SIZE=97652736 E: ID_PART_ENTRY_TYPE=e6d6d379-f507-44c2-a23c-238f2a3df928 E: ID_PART_ENTRY_UUID=4a695ff4-6a60-4367-9fa1-5048e5bce619 E: MAJOR=253 E: MINOR=27 E: SUBSYSTEM=block E: SYSTEMD_ALIAS=/dev/block/253:27 E: SYSTEMD_READY=1 E: SYSTEMD_WANTS=lvm2-pvscan@253:27.service E: TAGS=:systemd: E: UDISKS_IGNORE=1 E: USEC_INITIALIZED=31444084 So evidence supports that is a device that hasn't yet been assembled when lvm first starts to look for it, but appears shortly thereafter so the volume groups become fully provisioned. Thanks for testing. The "Couldn't find device with uuid" is expected, but that message is meant to be lowered to the debug level instead of the error level so you don't see it. I've fixed that here. Otherwise the results look good, please let me know if any problems start appearing from other tests. (I may create another build and upload all the rpms so you don't have to work around the silly rpm complaints.) Created attachment 1536878 [details] tar image of journalctl logs I ran a reboot loop using kexec for a while, it hung on iteration 19. 
I can't identify the problem source on the final iteration beyond it hung printing A start job is running for File Sys5801000000p But the shortened name displayed on the serial console is ambiguous, it matches any of 6 devices known to udev S: mapper/3600c0ff00014fc394e35335801000000p1 S: mapper/3600c0ff00014fc394f35335801000000p1 S: mapper/3600c0ff00014fc395035335801000000p1 S: mapper/3600c0ff00014fdb204a62d5801000000p1 S: mapper/3600c0ff00014fc39f534335801000000p1 S: mapper/3600c0ff00014fc39f634335801000000p1 However, kexec failures on these multipath system are known to occur due to hpsa controller issues so let's skip over that for now, and I'll run a reboot test overnight booting without kexec, that is a complete firmware boot each iteration. I will report again tomorrow on that test. Looking at journalctl output from iterations completed as attached in the tar image, there are not any of the failure messages previously seen suggesting a vgreduce should be undertaken, so it seems promising that the intent of the comment 38 patch was successful. There are many udevd failures logged trying to operate on the simple scsi devices that will become part of multipaths. Let me know if you think those will actually result in failures, if so I can submit a distinct bugzilla. Or perhaps they are related to the udev symlink issue in bug 1642728? Marking verified based on comment #45, as well as by a cursory check of an fs made up of multiple PVs being auto-activated and mounted at boot time. kernel-4.18.0-75.el8 BUILT: Fri Mar 1 11:37:34 CST 2019 lvm2-2.03.02-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019 lvm2-libs-2.03.02-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019 lvm2-dbusd-2.03.02-6.el8 BUILT: Fri Feb 22 04:50:28 CST 2019 lvm2-lockd-2.03.02-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019 device-mapper-1.02.155-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019 device-mapper-libs-1.02.155-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019 device-mapper-event-1.02.155-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019 device-mapper-event-libs-1.02.155-6.el8 BUILT: Fri Feb 22 04:47:54 CST 2019 device-mapper-persistent-data-0.7.6-1.el8 BUILT: Sun Aug 12 04:21:55 CDT 2018 Mar 11 15:03:01 hayes-01 systemd[1]: Starting LVM event activation on device 8:33... Mar 11 15:03:01 hayes-01 systemd[1]: Starting LVM event activation on device 8:49... Mar 11 15:03:01 hayes-01 systemd[1]: Starting LVM event activation on device 8:17... Mar 11 15:03:01 hayes-01 systemd[1]: Starting LVM event activation on device 8:65... Mar 11 15:03:01 hayes-01 kernel: ipmi_si dmi-ipmi-si.0: The BMC does not support setting the recv irq bit, compensating, but the BMC needs to be fixed. Mar 11 15:03:01 hayes-01 kernel: ipmi_si dmi-ipmi-si.0: Using irq 10 Mar 11 15:03:01 hayes-01 systemd-udevd[1016]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. Mar 11 15:03:01 hayes-01 kernel: ipmi_si dmi-ipmi-si.0: Found new BMC (man_id: 0x0002a2, prod_id: 0x0100, dev_id: 0x20) Mar 11 15:03:01 hayes-01 kernel: ipmi_si dmi-ipmi-si.0: IPMI kcs interface initialized Mar 11 15:03:01 hayes-01 kernel: IPMI SSIF Interface driver Mar 11 15:03:01 hayes-01 systemd-udevd[1043]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. Mar 11 15:03:01 hayes-01 systemd-udevd[1057]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. Mar 11 15:03:01 hayes-01 systemd-udevd[1056]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. 
Mar 11 15:03:02 hayes-01 lvm[1118]: pvscan[1118] PV /dev/sdb1 online, VG TEST incomplete (need 3). Mar 11 15:03:02 hayes-01 lvm[1118]: pvscan[1118] PV /dev/sdc1 online, VG TEST incomplete (need 2). Mar 11 15:03:02 hayes-01 lvm[1118]: pvscan[1118] PV /dev/sdd1 online, VG TEST incomplete (need 1). Mar 11 15:03:02 hayes-01 lvm[1118]: pvscan[1118] PV /dev/sde1 online, VG TEST is complete. Mar 11 15:03:02 hayes-01 lvm[1118]: pvscan[1118] VG TEST run autoactivation. Mar 11 15:03:02 hayes-01 lvm[1119]: pvscan[1119] PV /dev/sde1 online, VG TEST is complete. Mar 11 15:03:02 hayes-01 lvm[1119]: pvscan[1119] VG TEST skip autoactivation. Mar 11 15:03:02 hayes-01 lvm[1117]: pvscan[1117] PV /dev/sdd1 online, VG TEST is complete. Mar 11 15:03:02 hayes-01 lvm[1117]: pvscan[1117] VG TEST skip autoactivation. Mar 11 15:03:02 hayes-01 lvm[1116]: pvscan[1116] PV /dev/sdc1 online, VG TEST is complete. Mar 11 15:03:02 hayes-01 lvm[1116]: pvscan[1116] VG TEST skip autoactivation. Mar 11 15:03:02 hayes-01 systemd[1]: Started LVM event activation on device 8:49. Mar 11 15:03:02 hayes-01 systemd[1]: Started LVM event activation on device 8:33. Mar 11 15:03:02 hayes-01 systemd[1]: Started LVM event activation on device 8:65. Mar 11 15:03:02 hayes-01 systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling. Mar 11 15:03:02 hayes-01 kernel: EDAC MC0: Giving out device to module sb_edac controller Broadwell SrcID#0_Ha#0: DEV 0000:7f:12.0 (INTERRUPT) Mar 11 15:03:02 hayes-01 kernel: EDAC MC1: Giving out device to module sb_edac controller Broadwell SrcID#1_Ha#0: DEV 0000:ff:12.0 (INTERRUPT) Mar 11 15:03:02 hayes-01 kernel: EDAC MC2: Giving out device to module sb_edac controller Broadwell SrcID#0_Ha#1: DEV 0000:7f:12.4 (INTERRUPT) Mar 11 15:03:02 hayes-01 kernel: EDAC MC3: Giving out device to module sb_edac controller Broadwell SrcID#1_Ha#1: DEV 0000:ff:12.4 (INTERRUPT) Mar 11 15:03:02 hayes-01 kernel: EDAC sbridge: Ver: 1.1.2 Mar 11 15:03:02 hayes-01 lvm[1118]: 1 logical volume(s) in volume group "TEST" now active Mar 11 15:03:02 hayes-01 systemd[1]: Found device /dev/TEST/test. Mar 11 15:03:02 hayes-01 systemd[1]: Started LVM event activation on device 8:17. Mar 11 15:03:02 hayes-01 systemd[1]: Started udev Wait for Complete Device Initialization. Mar 11 15:03:02 hayes-01 systemd[1]: Reached target Local File Systems (Pre). Mar 11 15:03:02 hayes-01 systemd[1]: Mounting /mnt/test... Mar 11 15:03:02 hayes-01 systemd[1]: Mounting /boot... I ran 20 iterations of a crash and reboot loop in which there were no occurrences of the lvm failures as reported in this bug. Marking HPE verified. |