Bug 1986158
| Summary: | udev enumerate interface is slow | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | David Teigland <teigland> |
| Component: | systemd | Assignee: | Michal Sekletar <msekleta> |
| Status: | CLOSED WORKSFORME | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.3 | CC: | dtardon, jbrassow, msekleta, prajnoha, systemd-maint-list, zkabelac |
| Target Milestone: | beta | Keywords: | Performance, Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-10 15:59:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
David Teigland
2021-07-26 19:22:43 UTC
udev_enumerate_scan_devices() crawls /sys under the hood and you're calling it multiple times. I assume that's the cause of the slowness.

---

We call udev_enumerate_scan_devices only once.

I tried this in a VM with over 1400 devices attached through virtio-scsi, comparing the latest RHEL7 (systemd-219-78.el7_9.3.x86_64) and the latest RHEL8 (systemd-239-50.el8.2.x86_64) versions of systemd/systemd-udevd, both with and without libudev being used to get the list of block devices (devices/obtain_device_list_from_udev in lvm.conf). I noticed a performance drop while using the libudev interface. Looking at strace/ltrace/callgrind:

- the number of certain syscalls (open/openat/fstat...) is much higher on the RHEL8 version (e.g. taking 'open' + 'openat', it is 13037 + 4315 in RHEL7 vs 79143 in RHEL8)
- 'udev_enumerate_scan_devices' takes 41643 usecs/call in RHEL7 vs 1455003 usecs in RHEL8 (other functions have slowed down a bit too, like udev_device_new_from_syspath...)
- comparing the callgrind logs from both, the performance drop seems to be caused by the internal hashmap use inside libudev and its siphash backend (probably because the systemd code in RHEL8 is now more shared with udev in recent versions? Just a guess...)

Here are all the logs I've collected: https://prajnoha.fedorapeople.org/bz1986158/ (I'll also attach them to this bz)

Also, is it really necessary to parse sysfs when the only thing we need is the list of block devices and/or a few properties that udev already knows? This information is already in the udev database, so why the need to parse sysfs? Would it be possible to provide an extension to the libudev API for getting this information directly from the udev db, bypassing any sysfs access/parsing?

---

Created attachment 1820775 [details]
strace, ltrace, callgrind logs while using and not using libudev on RHEL7 and RHEL8

The logs from strace, ltrace and callgrind on RHEL7 vs RHEL8, both with and without using libudev in lvm to get the full list of block devices.
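For reference, here is a minimal sketch of the single-scan enumeration pattern being measured, using only public libudev calls; this is not the actual lvm2 code, just an illustration of the interface under discussion:

```c
/* Minimal sketch of single-scan block device enumeration via libudev;
 * not the actual lvm2 code. Build with: gcc enum.c -ludev */
#include <libudev.h>
#include <stdio.h>

int main(void)
{
        struct udev *udev = udev_new();
        struct udev_enumerate *e;
        struct udev_list_entry *entry;

        if (!udev)
                return 1;

        e = udev_enumerate_new(udev);
        udev_enumerate_add_match_subsystem(e, "block");

        /* The call whose cost regressed: it walks /sys internally. */
        udev_enumerate_scan_devices(e);

        udev_list_entry_foreach(entry, udev_enumerate_get_list_entry(e)) {
                const char *syspath = udev_list_entry_get_name(entry);
                struct udev_device *dev =
                        udev_device_new_from_syspath(udev, syspath);
                if (dev) {
                        const char *node = udev_device_get_devnode(dev);
                        printf("%s\n", node ? node : syspath);
                        udev_device_unref(dev);
                }
        }

        udev_enumerate_unref(e);
        udev_unref(udev);
        return 0;
}
```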
Does udev_enumerate_scan_devices change the list of devices it gets from sysfs before giving it to the caller? If so, what change is it making, and if not, why are we not going to sysfs directly?

---

Comment 7, Zdenek Kabelac:

I assume the basic idea is: udev is supposed to give us the list of block devices we should see as usable/potential PV devices. So the so-called private devices (i.e. mpath legs, raid legs...) are not passed to us. If we are not using all the knowledge the udev DB has, lvm2 would need to reproduce every type of hook currently used by udev to recognize these devices (which lvm2 already does for numerous types).

---

(In reply to Zdenek Kabelac from comment #7)
> I assume the basic idea is: udev is supposed to give us the list of block
> devices we should see as usable/potential PV devices.
>
> So the so-called private devices (i.e. mpath legs, raid legs...) are not
> passed to us.

It doesn't do any filtering AFAICT. One thing that udev_enumerate_scan_devices does is add link names to the basic set of devices in sysfs.

---

Comment 9, Zdenek Kabelac:

I assume the function 'udev_device_get_is_initialized()' is provided by udev to recognize whether we are accessing a device that is meant to be accessed.

I believe there is chaos here, since udev block device maintenance is a mess on all sides, but IMHO the fix should be system-wide: all the 'disk' utilities should understand the privacy logic, so fixing it 'just for lvm2' still leaves us in trouble when other tools randomly access our devices unexpectedly. I.e., the existing issue is that while we do implement a 'retry' operation for public LVs, there is no such 'retry' for subLVs, so if there is a parallel racing access/open on those, lvm2 may currently leak them in the table.

---

(In reply to Zdenek Kabelac from comment #9)
> I believe there is chaos here, since udev block device maintenance is a
> mess on all sides, but IMHO the fix should be system-wide: all the 'disk'
> utilities should understand the privacy logic, so fixing it 'just for
> lvm2' still leaves us in trouble when other tools randomly access our
> devices unexpectedly.

Yes, the way it works now with udev is that each block subsystem marks "private" or "unusable" devices in its own way in udev rules, leaving any udev db user to check all the possible variables these subsystems may set. There's no coordination nor any standard defined here for how such devices should be marked in a single way. However, I don't think this is really a problem for udev itself, because it only concerns block devices. That is actually the part SID is trying to cover within one of its primary goals (to define a standard for marking block device state for sharing among other users/subsystems).

The primary, clear issue here is that udev should provide an easy and quick way of simply enumerating devices with basic filtering like "all block devices" etc. And this is where the regression is seen right now: the traces point to some internal hash usage which wasn't the bottleneck before (my guess is this is due to the code being merged/shared with systemd code).

Also, if we're not interested in any sysfs info and we just need the list of devices as the udev db has it in its records, we should have a way to simply bypass the sysfs scans and get the list from the udev db directly, as it's all already there.
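To illustrate the per-subsystem marking problem, a udev db consumer today ends up with checks along these lines. This is a hedged sketch: DM_UDEV_DISABLE_OTHER_RULES_FLAG is the marker set by the device-mapper udev rules, and other subsystems set their own, differently named variables, which is exactly the lack of standardization described above:

```c
/* Sketch of the per-subsystem "private device" checks a udev db consumer
 * has to do today. Only the device-mapper flag is shown as an example;
 * other block subsystems set their own variables in their udev rules. */
#include <libudev.h>
#include <stdbool.h>
#include <string.h>

static bool device_is_usable(struct udev_device *dev)
{
        const char *dm_flag;

        /* Skip devices whose udev rules have not finished running yet;
         * their db records may be incomplete. */
        if (udev_device_get_is_initialized(dev) <= 0)
                return false;

        /* Properties are answered from the udev db record, not sysfs. */
        dm_flag = udev_device_get_property_value(dev,
                        "DM_UDEV_DISABLE_OTHER_RULES_FLAG");
        if (dm_flag && strcmp(dm_flag, "1") == 0)
                return false;   /* a device-mapper "private" device */

        return true;
}
```

Since udev_device_get_property_value() already answers from the udev database, a fast db-only enumeration entry point, as requested above, would be enough to avoid the sysfs walk entirely.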
---

I was trying to reproduce this on the latest RHEL-8 and I can see that lvm commands are much slower compared to RHEL-9 (on a system with the same number of block devices); however, I don't observe the slowness mentioned in the bug description (i.e. pvscan taking 100 seconds to complete). Hence I am closing the BZ for now (based on my offline discussion with Peter Rajnoha). Feel free to double-check my findings and reopen if needed.
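For anyone re-checking the numbers before reopening, a small hand-rolled harness like the following (a sketch, not part of the attached tests) reproduces the usecs/call measurement quoted above without needing ltrace:

```c
/* Hypothetical micro-benchmark: time one udev_enumerate_scan_devices()
 * call, comparable to the ltrace usecs/call figures quoted earlier.
 * Build with: gcc bench.c -ludev */
#include <libudev.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
        struct udev *udev = udev_new();
        struct udev_enumerate *e = udev ? udev_enumerate_new(udev) : NULL;
        struct timespec t0, t1;
        long usec;

        if (!e)
                return 1;

        udev_enumerate_add_match_subsystem(e, "block");

        clock_gettime(CLOCK_MONOTONIC, &t0);
        udev_enumerate_scan_devices(e);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        usec = (t1.tv_sec - t0.tv_sec) * 1000000L +
               (t1.tv_nsec - t0.tv_nsec) / 1000L;
        printf("udev_enumerate_scan_devices: %ld usec\n", usec);

        udev_enumerate_unref(e);
        udev_unref(udev);
        return 0;
}
```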