Bug 1672336
| Summary: | lvm2 commands do not show logical volumes when lvmetad is used | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Andrew Schorr <ajschorr> |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | Other | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | medium | | |
| Priority: | high | CC: | agk, cmarthal, heinzm, jbrassow, mcsontos, msnitzer, orion, pasik, prajnoha, rhandlin, teigland, thornber, toneata, villapla, zkabelac |
| Version: | 7.6 | Keywords: | ZStream |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.184-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 1698750 (view as bug list) | Environment: | |
| Last Closed: | 2019-08-21 18:29:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1698750 | | |
Description
Andrew Schorr
2019-02-04 15:56:14 UTC
I just rebooted into kernel 3.10.0-957.10.1.el7.x86_64, and things are working a bit better now. I still see some concerning boot messages:

Mar 19 21:59:49 ajserver lvm[1665]: WARNING: lvmetad is being updated by another command (pid 3113).
Mar 19 21:59:49 ajserver lvm[1665]: WARNING: Not using lvmetad because cache update failed.
Mar 19 21:59:49 ajserver lvm[3113]: 1 logical volume(s) in volume group "vg_data" now active

But the pvs, vgs, and lvs commands are producing useful output now. I'm not sure what's going on here.

This commit may help (stable-2.02 branch):
https://sourceware.org/git/?p=lvm2.git;a=commit;h=dcf8f3111aea1179be1083ace772f3108b06c508
I will try to find a test rpm with this fix for you to try.

This looks like another ZStream candidate. I am preparing a 7.6 scratch build.

Could you please test with the following 7.6 build with the patch included:
http://people.redhat.com/mcsontos/repos/bz1672336/

I installed the repo file, upgraded the rpms, and rebooted. After rebooting:

[schorr@ajserver ~]$ sudo journalctl -b | grep lvm
Mar 22 16:35:30 ajserver lvm[3109]: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
Mar 22 16:35:31 ajserver lvm[3109]: 1 logical volume(s) in volume group "vg_data" now active

And pvs, vgs, and lvs seem to be working properly.

Regards,
Andy

*** Bug 1691826 has been marked as a duplicate of this bug. ***

IIUC this is a race, so it may not be 100% reproducible. David, you fixed this, so you may have an idea how to test. Are multiple VGs on MD a requirement? Or just multiple VGs?

Right, this bug is not directly reproducible (and doesn't require MD). What we probably want to do is something like:
- kill and restart lvmetad (and don't run any lvm commands, so it remains uninitialized)
- barrage lvmetad with a number of concurrent 'pvscan --cache -aay <device>' commands

In real life this comes about during system startup, where multiple devices appear to the system all at once and systemd/udev fires many pvscan services. But we might be able to simulate something similar by running 'udevadm trigger' (we'll need to check whether that produces serial or parallel pvscans). What we want to do is set up a potential race among many pvscans trying to initialize lvmetad. udevadm trigger doesn't do that... we might have to cut out systemd/udev/services and test this by directly running a bunch of parallel 'pvscan --cache -aay <device>' commands against the uninitialized lvmetad. (Part of the problem we've had in this area is that our testing and simulation have been rather different from what the system actually runs, but systemd/udev do not make it easy to reproduce their actions outside of an actual system startup.)
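A minimal sketch of that kind of direct stress test, assuming a test machine with use_lvmetad = 1; the device paths and log locations are placeholders, and the script is illustrative rather than anything attached to this bug:

```bash
#!/bin/bash
# Rough sketch of the stress test described above (NOT from the bug itself).
# Device paths below are placeholders; run only on a disposable test machine
# that has use_lvmetad = 1 in lvm.conf.

# Restart lvmetad so its cache starts out empty, and avoid running any other
# lvm command that would initialize it first.
systemctl restart lvm2-lvmetad.service

# Fire one 'pvscan --cache -aay <device>' per PV in parallel, imitating the
# burst of pvscan services that systemd/udev starts during boot.
for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    pvscan --cache -aay "$dev" >"/tmp/pvscan-${dev##*/}.log" 2>&1 &
done
wait

# If the initialization race is hit, the warnings appear in the command output.
grep -l "cache update failed" /tmp/pvscan-*.log
```

If the race occurs, some of those per-device logs should contain the same "Not using lvmetad because cache update failed." warnings reported at boot.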
We still see the "WARNING: Not updating lvmetad because cache update failed." messages at boot from time to time on a variety of test machines running the latest rpms. Does that warrant not marking this bug verified? We do see the proper pvs/lvs listed after boot, however, and we tried the quick check listed in comment #9 and saw no messages. A couple of test machines just last week:

[root@hayes-01 ~]# grep "cache update failed" /var/log/message*
/var/log/messages-20190630:Jun 26 14:21:42 hayes-01 lvm: WARNING: Not using lvmetad because cache update failed.
/var/log/messages-20190630:Jun 26 14:21:42 hayes-01 lvm: WARNING: Not updating lvmetad because cache update failed.
/var/log/messages-20190630:Jun 26 16:18:17 hayes-01 lvm: WARNING: Not updating lvmetad because cache update failed.
/var/log/messages-20190630:Jun 26 16:18:17 hayes-01 lvm: WARNING: Not updating lvmetad because cache update failed.
/var/log/messages-20190630:Jun 26 16:18:17 hayes-01 lvm: WARNING: Not using lvmetad because cache update failed.
/var/log/messages-20190630:Jun 26 17:16:45 hayes-01 lvm: WARNING: Not using lvmetad because cache update failed.
/var/log/messages-20190630:Jun 26 17:16:45 hayes-01 lvm: WARNING: Not updating lvmetad because cache update failed.
/var/log/messages-20190630:Jun 26 17:16:45 hayes-01 lvm: WARNING: Not updating lvmetad because cache update failed.

[root@hayes-03 ~]# grep "cache update failed" /var/log/message*
/var/log/messages-20190630:Jun 24 12:02:06 hayes-03 lvm: WARNING: Not using lvmetad because cache update failed.
/var/log/messages-20190630:Jun 24 12:02:06 hayes-03 lvm: WARNING: Not updating lvmetad because cache update failed.
/var/log/messages-20190630:Jun 24 14:01:54 hayes-03 lvm: WARNING: Not using lvmetad because cache update failed.
/var/log/messages-20190630:Jun 26 13:47:33 hayes-03 lvm: WARNING: Not using lvmetad because cache update failed.

Jun 26 14:21:41 hayes-01 systemd: Starting LVM2 PV scan on device 8:113...
Jun 26 14:21:41 hayes-01 lvm: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
Jun 26 14:21:41 hayes-01 lvm: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
Jun 26 14:21:41 hayes-01 lvm: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
Jun 26 14:21:41 hayes-01 multipathd: sdg: add path (uevent)
Jun 26 14:21:41 hayes-01 lvm: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
Jun 26 14:21:41 hayes-01 lvm: WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
Jun 26 14:21:41 hayes-01 multipathd: sdg: spurious uevent, path already in pathvec
Jun 26 14:21:42 hayes-01 kernel: cryptd: max_cpu_qlen set to 1000
Jun 26 14:21:42 hayes-01 multipathd: sdf: add path (uevent)
Jun 26 14:21:42 hayes-01 multipathd: sdf: spurious uevent, path already in pathvec
Jun 26 14:21:42 hayes-01 multipathd: sdi: add path (uevent)
Jun 26 14:21:42 hayes-01 multipathd: sdi: spurious uevent, path already in pathvec
Jun 26 14:21:42 hayes-01 multipathd: sdk: add path (uevent)
Jun 26 14:21:42 hayes-01 multipathd: sdk: spurious uevent, path already in pathvec
Jun 26 14:21:42 hayes-01 multipathd: sdc: add path (uevent)
Jun 26 14:21:42 hayes-01 multipathd: sdc: spurious uevent, path already in pathvec
Jun 26 14:21:42 hayes-01 multipathd: sdb: add path (uevent)
Jun 26 14:21:42 hayes-01 multipathd: sdb: spurious uevent, path already in pathvec
Jun 26 14:21:42 hayes-01 kernel: AVX2 version of gcm_enc/dec engaged.
Jun 26 14:21:42 hayes-01 kernel: AES CTR mode by8 optimization enabled
Jun 26 14:21:42 hayes-01 kernel: alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
Jun 26 14:21:42 hayes-01 kernel: alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni)
Jun 26 14:21:42 hayes-01 multipathd: sda: add path (uevent)
Jun 26 14:21:42 hayes-01 multipathd: sda: spurious uevent, path already in pathvec
Jun 26 14:21:42 hayes-01 kernel: dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.3)
Jun 26 14:21:42 hayes-01 systemd: Found device PERC_H330_Adp 2.
Jun 26 14:21:42 hayes-01 systemd: Activating swap /dev/disk/by-uuid/a65a2321-6664-44b9-9786-cdabc080c63a...
Jun 26 14:21:42 hayes-01 systemd: Found device PERC_H330_Adp 1.
Jun 26 14:21:42 hayes-01 lvm: WARNING: lvmetad is being updated by another command (pid 1190).
Jun 26 14:21:42 hayes-01 lvm: WARNING: lvmetad is being updated by another command (pid 1190).
Jun 26 14:21:42 hayes-01 lvm: WARNING: Not using lvmetad because cache update failed.
Jun 26 14:21:42 hayes-01 kernel: Adding 4194300k swap on /dev/sda2. Priority:-2 extents:1 across:4194300k FS
Jun 26 14:21:42 hayes-01 systemd: Activated swap /dev/disk/by-uuid/a65a2321-6664-44b9-9786-cdabc080c63a.
Jun 26 14:21:42 hayes-01 systemd: Reached target Swap.
Jun 26 14:21:42 hayes-01 kernel: iTCO_vendor_support: vendor-support=0
Jun 26 14:21:42 hayes-01 lvm: WARNING: lvmetad is being updated by another command (pid 1190).
Jun 26 14:21:42 hayes-01 kernel: iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
Jun 26 14:21:42 hayes-01 kernel: iTCO_wdt: Found a Wellsburg TCO device (Version=2, TCOBASE=0x0460)
Jun 26 14:21:42 hayes-01 lvm: WARNING: lvmetad is being updated by another command (pid 1190).
Jun 26 14:21:42 hayes-01 lvm: WARNING: Not updating lvmetad because cache update failed.
Jun 26 14:21:42 hayes-01 kernel: iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
Jun 26 14:21:42 hayes-01 lvm: Command failed with status code 5.

We should consider the fix incomplete unless/until we find the reason for those lvmetad update failures. There is one known cause for those messages, which is that scanning all the disks legitimately takes more than 10 seconds (and there is a workaround for that unusual situation). But that doesn't seem likely here.

We need to collect vvvv debugging from all the pvscan commands run by the lvm2-pvscan services. (It might be as simple as setting log/file and log/level in lvm.conf.)
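As a point of reference, a possible lvm.conf fragment for capturing that debugging, based on the log/file and log/level settings mentioned above; the log file path is arbitrary and should be adjusted for the test machine:

```bash
# Illustrative lvm.conf settings for verbose per-command debug logging.
# Only the log section is shown; all other defaults are left untouched.
log {
	# Write full command debug output to a file (path is just an example).
	file = "/var/log/lvm2-debug.log"
	# 7 is the most verbose debug level for the log file.
	level = 7
}
```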
Moving this back to assigned based on comment #15. We can target this for rhel7.8 once we have the needed additional debugging.

If we want one bz per fix, then this bz is fixed with the patch mentioned above, and comment 14 should be a new bz. If we want this bz to be for all fixes related to this issue, then we can keep this open for further debugging.

Let's call this bz fixed by the commit we know improves it, and open a new bug to collect debugging from any similar problems.