Bug 2023213
| Summary: | Autoactivation of thin pool during boot fails | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Community] LVM and device-mapper | Reporter: | Fiona Ebner <f.ebner> | ||||
| Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team> | ||||
| lvm2 sub component: | Activating existing Logical Volumes | QA Contact: | cluster-qe <cluster-qe> | ||||
| Status: | CLOSED NOTABUG | Docs Contact: | |||||
| Severity: | unspecified | ||||||
| Priority: | unspecified | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, teigland, thornber, zkabelac | ||||
| Version: | unspecified | Flags: | pm-rhel: lvm-technical-solution?, pm-rhel: lvm-test-coverage? | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2021-11-17 11:18:00 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Description
Fiona Ebner
2021-11-15 08:42:16 UTC
In the lvm group we don't test event-based autoactivation from the initrd as described in the forum, so there may be some issues that we've not run across. Comment 14 in the forum thread is correct. The file /run/lvm/vgs_online/<vgname> prevents the VG from being activated a second time. If the VG was not activated the first time, then that's the problem to look at. One possibility is that the VG contains thin pools, and lvm uses external tools to check thin pools prior to activating them (i.e. thin_check). If the initrd does not contain that command, then the lvm autoactivation command will likely fail to autoactivate thin pools.

Thank you for taking a look. Yes, in both cases it's a thin pool that doesn't activate properly. The initrd should contain the thin_check command (as a symlink to pdata-tools). If the binary were simply not present, I would expect a "WARNING: Check is skipped, please install recommended missing binary <path/to/binary>!" message in the log. I suspect that there is a crash, because there is no "<VG>: autoactivation failed." message in the log either. The last log entries from the pvscan command are right before executing the thin_check command. But if the check command itself were problematic, it should still be handled gracefully, right?

After these messages pvscan just calls wait(), so it's probably just waiting for thin_check to finish running (I think it can run for a long time).

15:04:45.986728 pvscan[474] activate/dev_manager.c:2297 Running check command on /dev/mapper/pve-data_tmeta
15:04:46.5731 pvscan[474] config/config.c:1474 global/thin_check_options not found in config: defaulting to thin_check_options = [ "-q" ]
15:04:46.5756 pvscan[474] misc/lvm-exec.c:71 Executing: /usr/sbin/thin_check -q /dev/mapper/pve-data_tmeta
15:04:46.5966 pvscan[474] misc/lvm-flock.c:37 _drop_shared_flock /run/lock/lvm/V_pve.
15:04:46.6054 pvscan[474] mm/memlock.c:694 memlock reset.

So a few points, since Debian is mentioned. Upstream rules on e.g. Fedora or RHEL use systemd services for the actual autoactivation. The old style was to execute 'pvscan' within a udev rule (see the end of the 69-dm-lvm-metad.rules file). When this happens, udev enforces an upper time limit, so if checking the thin-pool metadata takes longer than that limit, the udev rule is killed and activation is broken, likely during 'thin_check'.

Since it's not clear what the machine state is, or which rules and what overall logic are in use, I can suggest a couple of tricks:

1. There is a relatively easy way to dramatically speed up thin_check: the '--skip-mappings' option. You can add this option to the thin_check_options list in lvm.conf (just make sure lvm.conf is also propagated to your ramdisk).
2. Switch the system to use a systemd service for autoactivation. This probably requires some cooperation with the Debian developers.
3. Add your own startup rule at the end of the boot process and simply call 'vgchange -ay' from there.

Thank you very much for the suggestions! This (i.e. thin_check taking too long and pvscan getting killed after the udev time limit) does indeed seem to be what's happening. To test it, I replaced my thin_check with a binary that sleeps for 6 minutes, and now I get the same behavior as our users reported. I'll suggest 1. to the affected users as a quick fix, and will discuss 2./3. with my co-workers for the long term.
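
For suggestion 1, a minimal lvm.conf sketch could look like the following. It assumes the system currently runs with the default thin_check_options = [ "-q" ] reported in the pvscan log, and only adds '--skip-mappings' to it. Setting the list replaces the default instead of appending to it, so "-q" is repeated here. Keep in mind that with this option thin_check no longer walks the block mappings, so damage there would not be noticed at activation time.

```
# /etc/lvm/lvm.conf (fragment, sketch only)
global {
    # Keep the quiet flag from the default and skip the expensive
    # mapping check so thin_check finishes well within the udev time limit.
    thin_check_options = [ "-q", "--skip-mappings" ]
}
```

After changing the file, the initrd has to be regenerated so that the copy of lvm.conf inside the ramdisk carries the same setting.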