Bug 1812801
| Summary: | udev_enumerate_scan_devices fails during stress test | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | David Vaanunu <dvaanunu> |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | LVM Metadata / lvmetad | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED WONTFIX | Severity: | unspecified |
| Priority: | medium | CC: | agk, heinzm, igor.raits, jbrassow, msekleta, msnitzer, nsoffer, pasik, prajnoha, teigland, zkabelac |
| Version: | 8.2 | Keywords: | Reopened |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.0 | Hardware: | Unspecified |
| OS: | Unspecified | Type: | Bug |
| Last Closed: | 2021-09-12 07:26:58 UTC | Bug Blocks: | 1842053 |
Description - David Vaanunu - 2020-03-12 08:33:07 UTC
My best guess here is that the error message "Failed to udev_enumerate_scan_devices." most likely comes from the lvm2 command trying to talk to udev, which fails. So lvm2 called the udev library - which told us there is udev (so /run/udev/XXXX is present) - but udev itself is not there.

So you most likely try to run the lvm2 command within a 'container' - which is in general unsupported and requires special configuration: lvm.conf udev_rules=0 and mainly disabling udev interaction completely (export DM_DISABLE_UDEV=1).

Usage of lvm2 inside a container is highly *unrecommended* - users should not manipulate device resources from a container. If you need to do it, you need to deal with all the associated issues on your system in your very own way. ATM there is no upstream generic support for such usage on the lvm2 side (not even in devel). We are aware there are some users trying to use lvm2 this way - but these are most typically 'specific' environments with their own rules - so in these cases the thing can 'somehow' partially work, but there will still be lots of problems with the presence of devices on hosting systems.

So I'd suggest to 'pass through' the operation from a container (with some messaging mechanism) to your host, run the lvm2 command on your host, and report the result in some 'smart way' back to the container.

I'm closing this bug - if my guess was wrong and you get the error with the command on a bare-metal host with normally running udev, please feel free to reopen the bug and provide/attach the whole 'lvchange -vvvv .... ' command log.

Zdenek, we run this on a RHEL 8.2 host configured as a RHV hypervisor. No containers are involved.

The interesting thing about this failure is that it happens only when running a stress test under very high load.
The stress test uses 2 hosts:
- manager host - does all the metadata operations
- regular host - does only read-only operations
We run 50 threads, each doing the following (see the code sketch below):
- create LV (on manager host)
- activate LV
- repeat 20 times:
  - extend LV (on manager host)
  - refresh LV (on regular host)
  - sleep 0-2 seconds
- deactivate LV (on manager host)
- remove LV (on regular host)
During the test, we run about 25 lvextend per second. Here are stats from
the test that had these failures:
$ python ~/src/vdsm/tests/storage/stress/extend.py log-stats RHEL82_LongRun_run-regular.log
{
"activating": 31696,
"creating": 31696,
"deactivating": 31694,
"errors": 30,
"extend-rate": 25.077410163052082,
"extending": 633656,
"read-only": 697046,
"refreshing": 633656,
"removing": 31692,
"retries": 0,
"retry-rate": 0.0,
"total-time": 25268,
"warnings": 8024
}
This test simulates the way RHV uses LVM in a cluster, but with 10-100 times higher load.

We are concerned about hitting these errors randomly in RHV and failing real flows.
We can try to run again with -vvvv, but you can also run the stress test
to reproduce this issue:
https://github.com/oVirt/vdsm/blob/master/tests/storage/stress/extend.py
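For orientation, one worker's loop amounts to roughly the following. This is only an illustrative sketch: the real test is the multi-threaded Python script extend.py linked above, which runs 50 such loops in parallel, and the create/extend steps run on a different host than the refresh steps. The VG/LV names and the extend increment below are hypothetical.

```c
/* Illustrative sketch of one worker of the stress test described above.
 * Not the real extend.py; VG/LV names and sizes are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void run(const char *cmd)
{
        /* in the real test, failures such as "Failed to
         * udev_enumerate_scan_devices." are counted as errors */
        if (system(cmd) != 0)
                fprintf(stderr, "command failed: %s\n", cmd);
}

int main(void)
{
        /* create and activate the LV (manager host) */
        run("lvcreate --name stress_lv --size 1g stress_vg");

        for (int i = 0; i < 20; i++) {
                /* extend the LV (manager host) */
                run("lvextend --size +128m stress_vg/stress_lv");
                /* refresh the LV to pick up the new size (regular host) */
                run("lvchange --refresh stress_vg/stress_lv");
                /* sleep 0-2 seconds */
                sleep(rand() % 3);
        }

        run("lvchange -an stress_vg/stress_lv"); /* deactivate (manager host) */
        run("lvremove -y stress_vg/stress_lv");  /* remove (regular host) */
        return 0;
}
```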
It would be interesting to set lvm.conf obtain_device_list_from_udev=0, which will avoid asking udev for a list of devs and will avoid the code that reports that error. It's possible that udev is not able to handle the volume of changes you are producing. Of course, we could end up seeing different errors in the lvm code which is used when not using udev.

Yep - it does look like a fix on the udev side might be needed. Dropping obtaining the device list from udev might 'mask' the issue for lvm2 as such - but will not fix the udev side. So while I'd probably advise this as a temporary 'workaround' on a local user's system - it's not a solution for a major deployment - udev needs to be bug-fixed if it can't work on a loaded system.

The question could be whether locking is working correctly in all scenarios here on the lvm2 side. There are locking changes which make it impossible now to prevent reading lvm2 metadata while some other nodes are modifying lvm2 metadata (lockless scanning). What happens when the device list is 'changed' while udev is iterating over it is a question for udev devel.

I'd assume a workaround that prevents parallel execution of lvm2 commands may now need to be handled on the caller side (it used to be handled on the lvm2 side in the past).

(In reply to David Teigland from comment #3)
> It would be interesting to set lvm.conf obtain_device_list_from_udev=0

About 6 years ago we had another stress test on RHEL 6, exposing a similar issue. The advice we got from LVM folks was to use obtain_device_list_from_udev=0:
https://github.com/oVirt/vdsm/commit/23ce1d87fa98f35d83005e2958ff4065b78ff9d8

In RHEL 7 the issue was fixed and we enabled it again:
https://github.com/oVirt/vdsm/commit/e820cc58d0f89480ed4ce701c492501539a0b7ba

We can try this again.

(In reply to Zdenek Kabelac from comment #4)
> The question could be whether locking is working correctly in all
> scenarios here on the lvm2 side.

How do you suggest to debug this?

> What happens when the device list is 'changed' while udev is iterating
> over it is a question for udev devel.

Should we move this bug to udev?

> I'd assume a workaround that prevents parallel execution of lvm2 commands
> may now need to be handled on the caller side
> (it used to be handled on the lvm2 side in the past)

lvm is not using local locks now?

We will try first with obtain_device_list_from_udev=0, and also enable -vvvv to get more data when the lvm command fails. Anything else we should do to collect enough info for this issue?

(In reply to David Teigland from comment #3)

I tried obtain_device_list_from_udev=0 and it solves the issue. This was my change:
https://gerrit.ovirt.org/c/109343/1/tests/storage/stress/reload.py

Tested by running a 6-hour stress test:
- There is only a single host, simulating SPM flows.
- Running 10 concurrent workers, each modifying a single vg.
- Every worker simulates the volume life cycle in oVirt, modifying 500 lvs in each vg:
  - create lv
  - change lv tags (mark lv ready)
  - deactivate lv
  - activate lv
  - do some I/O
  - extend lv
  - deactivate lv
  - change lv tag (mark lv for removal)
  - activate lv
  - discard lv (using blkdiscard)
  - deactivate lv
  - remove lv

For more info on the test see:
https://gerrit.ovirt.org/c/109342/2/tests/storage/stress/reload.py

Before disabling obtain_device_list_from_udev I had 2 failing runs, one with one error (run-01.log.xz) and one with 3 errors (run-02.log.xz). After disabling it I had a longer run (4 iterations) with no errors (run-04.log.xz).
Errors seen in failing runs:

# xzgrep -E 'Error:.+rc=5' run-01.log.xz
Error: Command ['lvcreate', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] hints="none"} global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} backup { retain_min=50 retain_days=0}', '--autobackup', 'n', '--contiguous', 'n', '--size', '1g', '--addtag', 'OVIRT_VOL_INITIALIZING', '--activate', 'y', '--name', 'lv-0000000000000000000000000000-0359', 'bz1837199-000000000000000000000-0000'] failed rc=5 out='' err='Failed to udev_enumerate_scan_devices.\n Volume group "bz1837199-000000000000000000000-0000" not found.\n Cannot process volume group bz1837199-000000000000000000000-0000'

# xzgrep -E 'Error:.+rc=5' run-02.log.xz
Error: Command ['lvcreate', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] hints="none"} global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} backup { retain_min=50 retain_days=0}', '--autobackup', 'n', '--contiguous', 'n', '--size', '1g', '--addtag', 'OVIRT_VOL_INITIALIZING', '--activate', 'y', '--name', 'lv-0000000000000000000000000000-0133', 'bz1837199-000000000000000000000-0004'] failed rc=5 out='' err='Failed to udev_enumerate_scan_devices.\n Volume group "bz1837199-000000000000000000000-0004" not found.\n Cannot process volume group bz1837199-000000000000000000000-0004'
Error: Command ['lvchange', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] hints="none"} global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} backup { retain_min=50 retain_days=0}', '--autobackup', 'n', '--activate', 'n', 'bz1837199-000000000000000000000-0006/lv-0000000000000000000000000000-0317'] failed rc=5 out='' err='Failed to udev_enumerate_scan_devices.\n Volume group "bz1837199-000000000000000000000-0006" not found.\n Cannot process volume group bz1837199-000000000000000000000-0006'

Tested on RHEL 8.2.1 nightly using: lvm2-2.03.08-3.el8.x86_64

Should RHV use obtain_device_list_from_udev=0 in RHEL 8 until this issue is fixed? Do we need more info for debugging this?

Created attachment 1693651 [details]
Failing run 1
Created attachment 1693652 [details]
Failing run 2
Created attachment 1693653 [details]
Successful run using obtain_device_list_from_udev=0
More info, testing on CentOS 7.8. The version with --no-udev uses obtain_device_list_from_udev=0. The run without udev had fewer errors in pvs, vgs, and lvs (3.7% vs 4.1%) and almost identical performance.

# time ../reload.py run
2020-05-31 21:51:17,783 INFO (reload/pv) Stats: reloads=5263 errors=200 error_rate=3.80% avg_time=0.851 med_time=0.824 min_time=0.140 max_time=2.422
2020-05-31 21:51:17,902 INFO (reload/vg) Stats: reloads=4196 errors=172 error_rate=4.10% avg_time=1.068 med_time=1.037 min_time=0.135 max_time=3.533
2020-05-31 21:51:17,780 INFO (reload/lv) Stats: reloads=4789 errors=198 error_rate=4.13% avg_time=0.935 med_time=0.942 min_time=0.136 max_time=2.633

real 74m41.284s
user 72m12.264s
sys 27m3.581s

# time ../reload.py run --no-udev
...
2020-05-31 20:09:15,257 INFO (reload/pv) Stats: reloads=5192 errors=192 error_rate=3.70% avg_time=0.862 med_time=0.829 min_time=0.132 max_time=2.407
2020-05-31 20:09:15,461 INFO (reload/vg) Stats: reloads=4178 errors=157 error_rate=3.76% avg_time=1.071 med_time=1.020 min_time=0.132 max_time=5.709
2020-05-31 20:09:15,262 INFO (reload/lv) Stats: reloads=4736 errors=177 error_rate=3.74% avg_time=0.945 med_time=0.953 min_time=0.133 max_time=2.393

real 74m37.137s
user 73m5.567s
sys 25m14.252s

David, do we have a reason to disable obtain_device_list_from_udev on RHEL 7.8?

Primarily, the obtain_device_list_from_udev setting controls whether lvm will iterate through /dev nodes to get a list of devices to scan, or will iterate through a libudev API to get a list of devices to scan. In theory using the udev API is probably more efficient, but perhaps it makes no real difference for you.

There's another behavior that is linked with that config setting: when disabled, lvm will not use udev to check if a device is an mpath or md component. It's important that lvm not scan mpath or md components, and disabling this setting makes it somewhat more likely (in unknown cases) that lvm may not detect that a dev is an mpath/md component. The mpath/md detection may not be relevant to RHV.

We probably can't change the default in RHEL7 because of unknown fallout, but if the non-default works better for RHV it would be fine for you to change it.

(In reply to David Teigland from comment #11)
> There's another behavior that is linked with that config setting: when
> disabled, lvm will not use udev to check if a device is an mpath or
> md component. It's important that lvm not scan mpath or md components,
> and disabling this setting makes it somewhat more likely
> (in unknown cases) that lvm may not detect that a dev is an mpath/md
> component. The mpath/md detection may not be relevant to RHV.
We use only multipath devices in lvm commands, for example:

2020-06-11 12:12:08,703-0400 DEBUG (jsonrpc/0) [common.commands] /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600a098038304437415d4b6a59684968$|^/dev/mapper/3600a098038304437415d4b6a59684969$|^/dev/mapper/3600a098038304437415d4b6a5968496a$|^/dev/mapper/3600a098038304437415d4b6a5968496b$|^/dev/mapper/3600a098038304437415d4b6a5968496c$|^/dev/mapper/3600a098038304437415d4b6a5968496d$|^/dev/mapper/3600a098038304437415d4b6a5968496e$|^/dev/mapper/3600a098038304437415d4b6a5968496f$|^/dev/mapper/3600a098038304437415d4b6a59684970$|^/dev/mapper/3600a098038304437415d4b6a59684971$|^/dev/mapper/3600a098038304437415d4b6a59684972$|^/dev/mapper/3600a098038304437415d4b6a59684979$|", "r|.*|"] hints="none" } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name --select 'vg_name = 2e7d7e21-917b-4202-b4bc-ba5e6fffa558' (cwd None) (commands:153)

All commands issued by vdsm use this filter, typically using only the multipath devices relevant for the command.

Looks like the issue about not detecting multipath components is not relevant to RHV.

David, did I get this right?

> All commands issued by vdsm use this filter, typically using only the
> multipath devices relevant for the command.
>
> Looks like the issue about not detecting multipath components
> is not relevant to RHV.
>
> David, did I get this right?
Right, since you are not exposing the mpath component devs to lvm, we don't need the code that tries to distinguish the component from the mpath dev.
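To make the two enumeration modes described above concrete: with obtain_device_list_from_udev=0, the candidate device list comes from walking /dev itself rather than from a libudev query. Below is a minimal sketch of such a walk, restricted to /dev/mapper to match the filter RHV uses; it is illustrative only and is not LVM's actual dev-cache code.

```c
/* Illustrative only: build a candidate device list by walking /dev/mapper
 * directly, i.e. the kind of scan implied by obtain_device_list_from_udev=0.
 * This is not LVM's actual dev-cache code. */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
        DIR *dir = opendir("/dev/mapper");
        struct dirent *de;
        struct stat st;
        char path[PATH_MAX];

        if (!dir) {
                perror("opendir /dev/mapper");
                return 1;
        }

        while ((de = readdir(dir)) != NULL) {
                snprintf(path, sizeof(path), "/dev/mapper/%s", de->d_name);
                /* keep only block device nodes as scan candidates */
                if (stat(path, &st) == 0 && S_ISBLK(st.st_mode))
                        printf("%s\n", path);
        }

        closedir(dir);
        return 0;
}
```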
This is another udev bug. It seems to me there are three options:

1. If we thought that udev could fix this we could reassign this to udev, but I doubt that would happen.
2. We could leave this as it is and, when people hit this (or other udev issues), have them change lvm.conf obtain_device_list_from_udev=0. This is the option we've chosen thus far, and what we get by just closing this bug.
3. We could change the lvm default obtain_device_list_from_udev to 0 and avoid these udev issues by default.

My suggestion is 3, given how often I'm telling people to change this, and given that I don't see any value in using udev for obtaining the device list. I'm not sure if it makes more sense to reassign this bz to udev for a fix, or to change the lvm default to get the list of devs from /dev instead of from a udev query. Any suggestions?

Have we ever had this problem reported in a real environment (at customers), or is it just this stress test where we hit this failure? Honestly, I haven't seen an error from this part of the code reported before.
Anyway, I'd really like to know why this fails, because when I look at the libudev code, it just walks through the sysfs content and filters it, and I'm not quite clear where it could fail due to the high number of devices or operations happening at once. There are log_debugs in that code, so we could try to enable those logs, though I'm not sure where the knob is to enable them for libudev (the /etc/udev/udev.conf setting seems to apply to udevd only?).
There's one more detail - looking into the man page, it also states that:
"On success, udev_enumerate_scan_devices(), udev_enumerate_scan_subsystems() and udev_enumerate_add_syspath() return an integer greater than, or equal to, 0"
...while LVM considers only 0 as success:

    if (udev_enumerate_scan_devices(udev_enum)) {
            log_error("Failed to udev_enumerate_scan_devices.");
            goto out;
    }
I don't think it's caused by this, but we can try for sure...
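For comparison, a self-contained libudev enumeration along the lines of what LVM performs might look like the sketch below (build with `gcc -o udev_enum udev_enum.c -ludev`). It is not LVM's code; the error check follows the man-page wording quoted above, treating only negative returns as failure, whereas the LVM snippet treats any non-zero return as failure.

```c
/* Minimal sketch of a libudev block-device enumeration (not LVM's code).
 * Per the man page quoted above, udev_enumerate_scan_devices() returns
 * >= 0 on success, so only negative values are treated as errors here. */
#include <libudev.h>
#include <stdio.h>

int main(void)
{
        struct udev *udev = udev_new();
        struct udev_enumerate *en;
        struct udev_list_entry *first, *entry;
        int r = -1;

        if (!udev)
                return 1;

        en = udev_enumerate_new(udev);
        if (!en) {
                udev_unref(udev);
                return 1;
        }

        /* only enumerate block devices, as a device scan would */
        udev_enumerate_add_match_subsystem(en, "block");

        r = udev_enumerate_scan_devices(en);
        if (r < 0) {
                fprintf(stderr, "udev_enumerate_scan_devices failed: %d\n", r);
                goto out;
        }

        first = udev_enumerate_get_list_entry(en);
        udev_list_entry_foreach(entry, first) {
                struct udev_device *dev = udev_device_new_from_syspath(
                        udev, udev_list_entry_get_name(entry));
                if (dev) {
                        const char *node = udev_device_get_devnode(dev);
                        if (node)
                                printf("%s\n", node);
                        udev_device_unref(dev);
                }
        }
out:
        udev_enumerate_unref(en);
        udev_unref(udev);
        return r < 0 ? 1 : 0;
}
```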
FTR, we've hit this issue on CentOS Stream 8. As the bug mentions, it happens only under heavy load, so it's not a big deal, but it probably should be fixed one day :)

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.