Bug 1671036
| Summary: | Timed out waiting for devices/LV's during boot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | jhouston |
| Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> |
| Status: | CLOSED NOTABUG | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.6 | CC: | agk, akarvi, alex.wang, apanagio, bubrown, heinzm, igreen, jbrassow, jhouston, jmagrini, loberman, mgandhi, msnitzer, nkshirsa, nweddle, pdwyer, prajnoha, rsunog, spanjikk, systemd-maint-list, teigland, thornber, zkabelac |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-01 16:04:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 13
loberman
2019-02-05 15:43:42 UTC
(In reply to loberman from comment #13)
> Folks, this is becoming very urgent for support.
>
> The churn on cases coming in, and review for this issue with only best
> attempts at a workaround is expanding at a rapid rate.
> Please investigate this with as much urgency as possible.

Let's summarize then:

- the other BZ you mention is bz #1588032, right? Are there any others? (I'd probably create a tracking BZ for all these bugs with "timeout waiting on LVM during boot")

- there are differences we need to take into account first whenever we deal with these:
  - is this RHEL6 or RHEL7+ (RHEL6 is a non-systemd environment, RHEL7+ is a systemd environment), but I think they're mostly RHEL7+
  - is this with lvmetad or not (use_lvmetad=0/1 in lvm.conf)
    - if with lvmetad (default):
      - we rely on udev event-based activation where lvm2-pvscan@<major>:<minor>.service is important (then there could be some logs there as well - systemctl status lvm2-pvscan@<major>:<minor>.service)
      - since this is event-based activation, LVM is activated as soon as the VG is complete (it has all PVs present in the system), no matter if this is during boot or when the system is up and running
      - it also works with stacked LVM (an LV used as PV in a VG where another LV is, then this LV used as PV...)
      - the lvm2-activation-generator is a NOP, it exits immediately, doing nothing (it just sees that use_lvmetad is set to 1)
      - it can't activate partial VGs or even VGs that contain degraded raid LVs, it always needs all PVs for a VG to be present (this is because "vgchange -aay" is called inside pvscan on the last PV for a VG that we see in the system based on uevents from lvm2-pvscan@<major>:<minor>.service)
    - if without lvmetad (non-default):
      - we DON'T rely on udev event-based activation, so lvm2-pvscan@<major>:<minor>.service, even if executed and seen in the logs, is a NOP, it exits immediately, doing nothing (we haven't found an effective way of making this conditional; just running it and letting it see that use_lvmetad=0 is much quicker than any other method of trying to make this service conditional)
      - the lvm2-activation-generator generates 3 services which are placed at certain points in time during boot:
        1) lvm2-activation-early.service (executed before cryptsetup.target, so there's a chance to have LVM with an encryption layer on top of it)
        2) lvm2-activation.service (executed after cryptsetup.target and before local-fs-pre.target, this is the usual time of activation where we need to activate LVM on top of local base devices before filesystems are mounted)
        3) lvm2-activation-net.service (executed after iscsi/fcoe/rbdmap.service and before remote-fs-pre.target, this is the time of activation where we have to wait for network-attached base devices)
        IMPORTANT NOTE: WITH ASYNCHRONOUS SCSI SCANS, THE "AFTER: ISCSI FCOE..." DOESN'T WORK ANYMORE - this is because when iscsi/fcoe.service's main executable finishes, the device might not be fully set up in the system and visible for LVM to scan!
      - without event-based activation, if the devices needed to make the VG activatable are not all present at the point in time during boot when any of the 3 activation services above runs, the LVM VG is not activated
        - "activatable" here means we either have all the PVs making up the VG in place OR we don't, but the LVs within the VG can be activated in degraded mode (...the RAID LVs)
      - without event-based activation, stacked LVM is not automatically activated, just the three scenarios caught by any of the 3 activation services mentioned above
  - RHEL8 has dropped lvmetad and uses a different scheme for tracking the VG completeness - this should work as with lvmetad... (there's "event_activation=0/1" with default "1" there in RHEL8 instead of the use_lvmetad setting)

- there are timeouts posed by systemd when waiting for a device (e.g. it waits a certain amount of time for an LV to appear), if the device is not activated by that time, the timeout happens and if this was a mount point, we're usually dropped to a rescue/debug shell and the boot sequence is paused
  - systemd waits on a device only if it's referenced by another systemd unit (e.g. mount units), otherwise systemd has no reason to wait on that device

- whenever we debug these bootup problems, we need a systemd+udev debug log with proper timing, that is:
  - adding "debug" to the kernel command line
  - then collecting debug logs with "journalctl -b -o short-precise" for the current boot
  - it's good to collect all lvm-related service statuses as well (that is all lvm2-pvscan@<major>:<minor>.service and others)
  - when using "lvmdump -u -l -s", we get both journalctl as well as all service statuses within the dump, so it's always good to call "lvmdump -u -l -s" (instead of relying purely on sosreport, which doesn't have all we need...)

- if there are numerous devices on the system, it's possible the udev+systemd machinery is getting into timeouts because the devices are not present in time:
  - the default udev timeout is 180 seconds (may be different in various versions though, some older versions had 30 seconds as well), if this timeout happens, udev kills the udevd worker and this is visible in the logs as "udev ... returned with error code 0x100" or "systemd-udevd[pid]: <device name>: Worker [pid] processing SEQNUM=<udev seqnum> killed"
  - there's also a systemd timeout for various systemd units (including mount units)
  - if there are delays somewhere, they need to be inspected and considered
  - if delays are inevitable (very high number of devices, access to devices is slow and can't be made faster, or there are custom udev rules installed reacting to the event), then we need to consider increasing the timeouts

So this is, in a nutshell, what we're looking for when debugging these problems. The most important is:

- to have "debug" added to the kernel command line
- to have "lvmdump -u -l -s" collected (...but right after the problem, not after any manual activation or further touching of the system, if that's possible)
- the sosreport is a plus
- not changing any of the configuration before trying to get the logs (mainly not switching from use_lvmetad=0 to use_lvmetad=1, just providing the debug for the case that fails and making this clear)

As for this report, we've noticed there's a problem with partition signatures mixed with PV signatures which confuses blkid and hence the event-based LVM activation.

As for the other reports, let's go through them (please remind me which they are exactly - let's put that in a tracking BZ), but I think they mostly ended up with us waiting on proper debug logs...

I think the work we are doing in fedora bug 1672062 may apply here also.
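As a rough illustration of the udev timeout check described above, the following Python sketch scans journal text (for example saved from "journalctl -b -o short-precise") for the "Worker ... killed" message. The regular expression is modeled only on the message format quoted in the comment; other udev versions may word it differently. The function name and the sample log lines are made up for this example.

```python
import re

# Modeled on the message format quoted in the comment:
#   systemd-udevd[pid]: <device name>: Worker [pid] processing SEQNUM=<seqnum> killed
# Assumption: the exact wording can differ between udev versions.
KILLED_WORKER = re.compile(
    r"systemd-udevd\[(?P<udevd_pid>\d+)\]: (?P<device>\S+): "
    r"Worker \[(?P<worker_pid>\d+)\] processing SEQNUM=(?P<seqnum>\d+) killed"
)


def find_killed_workers(journal_text):
    """Return (device, worker_pid, seqnum) for every killed-worker line found."""
    hits = []
    for line in journal_text.splitlines():
        match = KILLED_WORKER.search(line)
        if match:
            hits.append(
                (
                    match.group("device"),
                    int(match.group("worker_pid")),
                    int(match.group("seqnum")),
                )
            )
    return hits


# Hypothetical sample lines, not taken from any real system.
SAMPLE = """\
Feb 05 15:40:01.123456 host1 systemd-udevd[612]: sdb: Worker [845] processing SEQNUM=2291 killed
Feb 05 15:40:01.223456 host1 systemd[1]: Started LVM2 PV scan on device 8:3.
"""

for device, pid, seqnum in find_killed_workers(SAMPLE):
    print(f"{device}: worker {pid} killed while processing SEQNUM={seqnum}")
```

Any hit only says that a udev worker ran into the timeout; the surrounding debug log is still needed to see why the event took that long.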
Unless we're talking about the root file system, failing to activate an LV and mount the fs is IMO not a good reason to boot into emergency mode. I wonder if this is a new behavior that systemd introduced by default? If so, maybe a systemd bug should be opened to handle these failures more gracefully. I notice that systemd.mount(5) mentions something similar in relation to the nofail fstab option:
nofail
With nofail this mount will be only wanted, not required, by local-fs.target or
remote-fs.target. This means that the boot will continue even if this mount point
is not mounted successfully.
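To make the nofail point concrete, here is a small Python sketch that lists fstab entries which would still be required at boot because they lack nofail. It is only an illustration: the function name and the sample fstab content are hypothetical, and treating "/" as the only mount that must stay required is an assumption taken from the comment above.

```python
def mounts_without_nofail(fstab_text, always_required=("/",)):
    """Return mount points from fstab-style text that do not carry 'nofail'.

    Entries in always_required (assumption: only the root file system)
    and swap entries are skipped.
    """
    missing = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        if fs_type == "swap" or mount_point in always_required:
            continue
        if "nofail" not in options:
            missing.append(mount_point)
    return missing


# Hypothetical fstab content for illustration only.
SAMPLE_FSTAB = """\
# device               mount  type options          dump pass
/dev/mapper/vg00-root  /      xfs  defaults         0 0
/dev/mapper/vg00-home  /home  xfs  defaults         0 0
/dev/mapper/vg00-data  /data  xfs  defaults,nofail  0 0
/dev/mapper/vg00-swap  swap   swap defaults         0 0
"""

print(mounts_without_nofail(SAMPLE_FSTAB))
```

With the sample above, only /home is reported, since /data already has nofail and the root and swap entries are skipped.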
Looking over SOS - why are there separate 'root' & 'usr' volumes? Ramdisk is 'tuned' to switch to a single volume '/' which contains /usr.

(In reply to loberman from comment #13)
> Folks, this is becoming very urgent for support.
>
> The churn on cases coming in, and review for this issue with only best
> attempts at a workaround is expanding at a rapid rate.
> Please investigate this with as much urgency as possible.

Two things might help make more progress here:

1. More debugging from the timeout cases (e.g. debug on kernel command line, and -vvvv from lvm2-pvscan services). AFAICT we only have enough debugging to narrow down a cause from the fedora case in bug 1672062.
2. Test results from the scratch build in comment 16 in cases where the problems were reproducible.

Has the customer fixed the partition signature as mentioned in comment 10? It's crucially important that blkid is *NOT* reporting /dev/sda3 as a DOS partition type device. The problem is that 'pvscan' is not executed on /dev/sda3 - because it's not seen as an lvm2 member device. So then /home in vg00 cannot be activated and thus there is a timeout. So please make sure first that /dev/sda3 is *NOT* DOS (follow the guide in comment 10 very carefully so that you delete *ONLY* the DOS signature).

Since this BZ has become somewhat of a mixture of several different reports - let's restate what we know ATM.

The original bug, where the device had a PV signature with a DOS signature as well, required the fix from comment 10.

The next report experienced a timeout of systemd mount units - this requires a fix in udev rules, found in this patch:

https://github.com/lnykryn/systemd-rhel/pull/280/commits/de8c8c0f5a39fb113894779e3afcd1f8c10f41fb

Which is effectively a duplicate of bug 1666612.

So if there is yet *ANOTHER* case mixed in these 31 comments which is not resolved by either of the suggested approaches - please open a *NEW* BZ with a full description of the state and an SOS report of the machine.
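The signature check discussed above could be sketched along these lines in Python. The sketch assumes KEY=value text of the kind "blkid -o export" prints, with TYPE and PTTYPE as the tag names of interest; the helper names and both sample outputs are hypothetical and not taken from the affected system.

```python
def parse_blkid_export(text):
    """Parse KEY=value lines (assumed 'blkid -o export' style) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def pv_hidden_by_dos_signature(props):
    """True if the device is reported as DOS and not as an LVM2 member.

    This mirrors the situation described in the comment: pvscan is not run
    on such a device, so the VG never becomes complete.
    """
    return props.get("PTTYPE") == "dos" and props.get("TYPE") != "LVM2_member"


# Hypothetical outputs for illustration only.
BROKEN = "DEVNAME=/dev/sda3\nPTTYPE=dos\n"
FIXED = "DEVNAME=/dev/sda3\nTYPE=LVM2_member\n"

for label, sample in (("before fix", BROKEN), ("after fix", FIXED)):
    props = parse_blkid_export(sample)
    print(label, props.get("DEVNAME"), pv_hidden_by_dos_signature(props))
```

This only classifies what blkid reports; removing the stray DOS signature itself should follow the guide in comment 10.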
There's another related issue mixed in here which is fedora bug 1672062.

(In reply to David Teigland from comment #33)
> There's another related issue mixed in here which is fedora bug 1672062.

created bug 1691826 for this lvmetad initialization timing issue.

So closing this BZ - as the original bug is resolved. The timeout issue happens mostly because of udev missing a bugfix. The remaining lvmetad issue is tracked with bug 1691826.

Please follow the advice - and open a NEW BZ with new logs - do not reopen this bug, which is considered 'fixed' (regarding the original problem).

Note - also make sure the user has fixed the UDEV rules - which are otherwise a major cause of big system slowdown. Once this is all fixed - please include new logs from such systems (possibly with an sos report and an archive of udev rules from the system) and open a new BZ.