Bug 1812801
| Summary: | udev_enumerate_scan_devices fails during stress test | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | David Vaanunu <dvaanunu> |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | LVM Metadata / lvmetad | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED WONTFIX | Severity: | unspecified |
| Priority: | medium | CC: | agk, heinzm, igor.raits, jbrassow, msekleta, msnitzer, nsoffer, pasik, prajnoha, teigland, zkabelac |
| Version: | 8.2 | Keywords: | Reopened |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.0 | Hardware: | Unspecified |
| OS: | Unspecified | Type: | Bug |
| Last Closed: | 2021-09-12 07:26:58 UTC | Bug Blocks: | 1842053 |
Description - David Vaanunu - 2020-03-12 08:33:07 UTC
My best guess here is that the error message "Failed to udev_enumerate_scan_devices." most likely comes from the lvm2 command trying to talk to udev, which fails. So lvm2 called the udev library - which told us there is udev (so /run/udev/XXXX is present) - but udev itself is not there.

So you most likely try to run the lvm2 command within a 'container' - which is in general unsupported and requires special configuration: lvm.conf udev_rules=0 and mainly disabling udev interaction completely (export DM_DISABLE_UDEV=1).

Usage of lvm2 inside a container is highly *unrecommended* - users should not manipulate device resources from a container. If you need to do it, you need to deal with all the associated issues on your system in your very own way. ATM there is no upstream generic support for such usage on the lvm2 side (not even in devel). We are aware there are some users trying to use lvm2 this way - but these are most typically 'specific' environments with their own rules - so in these cases the thing can 'somehow' partially work, but there will still be lots of problems with the presence of devices on hosting systems.

So I'd suggest to 'pass through' the operation from a container (with some messaging mechanism) to your host, run the lvm2 command on your host, and report the result in some 'smart way' back to the container.

I'm closing this bug - if my guess was wrong and you get the error with the command on a bare-metal host with normally running udev, please feel free to reopen the bug and provide/attach the whole 'lvchange -vvvv .... ' command log.

Zdenek, we run this on a RHEL 8.2 host configured as a RHV hypervisor. No containers are involved.

The interesting thing about this failure is that it happens only when running a stress test under very high load.
The stress test uses 2 hosts:
- manager host - does all the metadata operations
- regular host - does only read-only operations
We run 50 threads, each doing the following (see the code sketch below):
- create LV (on manager host)
- activate LV
- repeat 20 times:
  - extend LV (on manager host)
  - refresh LV (on regular host)
  - sleep 0-2 seconds
- deactivate LV (on manager host)
- remove LV (on regular host)
During the test, we run about 25 lvextend per second. Here are stats from
the test that had these failures:
$ python ~/src/vdsm/tests/storage/stress/extend.py log-stats RHEL82_LongRun_run-regular.log
{
"activating": 31696,
"creating": 31696,
"deactivating": 31694,
"errors": 30,
"extend-rate": 25.077410163052082,
"extending": 633656,
"read-only": 697046,
"refreshing": 633656,
"removing": 31692,
"retries": 0,
"retry-rate": 0.0,
"total-time": 25268,
"warnings": 8024
}
This test simulates the way RHV uses LVM in a cluster, but with 10-100 times higher load.

We are concerned about hitting these errors randomly in RHV and failing real flows.
We can try to run again with -vvvv, but you can also run the stress test
to reproduce this issue:
https://github.com/oVirt/vdsm/blob/master/tests/storage/stress/extend.py
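For orientation, one worker's loop amounts to roughly the following. This is only an illustrative sketch: the real test is the multi-threaded Python script extend.py linked above, which runs 50 such loops in parallel, and the create/extend steps run on a different host than the refresh steps. The VG/LV names and the extend increment below are hypothetical.

```c
/* Illustrative sketch of one worker of the stress test described above.
 * Not the real extend.py; VG/LV names and sizes are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void run(const char *cmd)
{
        /* in the real test, failures such as "Failed to
         * udev_enumerate_scan_devices." are counted as errors */
        if (system(cmd) != 0)
                fprintf(stderr, "command failed: %s\n", cmd);
}

int main(void)
{
        /* create and activate the LV (manager host) */
        run("lvcreate --name stress_lv --size 1g stress_vg");

        for (int i = 0; i < 20; i++) {
                /* extend the LV (manager host) */
                run("lvextend --size +128m stress_vg/stress_lv");
                /* refresh the LV to pick up the new size (regular host) */
                run("lvchange --refresh stress_vg/stress_lv");
                /* sleep 0-2 seconds */
                sleep(rand() % 3);
        }

        run("lvchange -an stress_vg/stress_lv"); /* deactivate (manager host) */
        run("lvremove -y stress_vg/stress_lv");  /* remove (regular host) */
        return 0;
}
```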
It would be interesting to set lvm.conf obtain_device_list_from_udev=0, which will avoid asking udev for a list of devs and will avoid the code that reports that error. It's possible that udev is not able to handle the volume of changes you are producing. Of course, we could end up seeing different errors in the lvm code which is used when not using udev.

Yep - it does look like a fix on the udev side might be needed. Dropping obtaining the device list from udev might 'mask' the issue for lvm2 as such - but will not fix the udev side. So while I'd probably advise this as a temporary 'workaround' on a local user's system - it's not a solution for a major deployment - udev needs to be bug-fixed if it can't work on a loaded system.

The question could be whether locking is working correctly in all scenarios here on the lvm2 side. There are locking changes which make it impossible now to prevent reading lvm2 metadata while some other nodes are modifying lvm2 metadata (lockless scanning). What happens when the device list is 'changed' while udev is iterating over it is a question for udev devel.

I'd assume a workaround that prevents parallel execution of lvm2 commands may now need to be handled on the caller side (it used to be handled on the lvm2 side in the past).

(In reply to David Teigland from comment #3)
> It would be interesting to set lvm.conf obtain_device_list_from_udev=0

About 6 years ago we had another stress test on RHEL 6, exposing a similar issue. The advice we got from LVM folks was to use obtain_device_list_from_udev=0:
https://github.com/oVirt/vdsm/commit/23ce1d87fa98f35d83005e2958ff4065b78ff9d8

In RHEL 7 the issue was fixed and we enabled it again:
https://github.com/oVirt/vdsm/commit/e820cc58d0f89480ed4ce701c492501539a0b7ba

We can try this again.

(In reply to Zdenek Kabelac from comment #4)
> The question could be whether locking is working correctly in all
> scenarios here on the lvm2 side.

How do you suggest to debug this?

> What happens when the device list is 'changed' while udev is iterating
> over it is a question for udev devel.

Should we move this bug to udev?

> I'd assume a workaround that prevents parallel execution of lvm2 commands
> may now need to be handled on the caller side
> (it used to be handled on the lvm2 side in the past)

lvm is not using local locks now?

We will try first with obtain_device_list_from_udev=0, and also enable -vvvv to get more data when the lvm command fails. Anything else we should do to collect enough info for this issue?

(In reply to David Teigland from comment #3)

I tried obtain_device_list_from_udev=0 and it solves the issue. This was my change:
https://gerrit.ovirt.org/c/109343/1/tests/storage/stress/reload.py

Tested by running a 6-hour stress test:
- There is only a single host, simulating SPM flows.
- Running 10 concurrent workers, each modifying a single vg.
- Every worker simulates the volume life cycle in oVirt, modifying 500 lvs in each vg:
  - create lv
  - change lv tags (mark lv ready)
  - deactivate lv
  - activate lv
  - do some I/O
  - extend lv
  - deactivate lv
  - change lv tag (mark lv for removal)
  - activate lv
  - discard lv (using blkdiscard)
  - deactivate lv
  - remove lv

For more info on the test see:
https://gerrit.ovirt.org/c/109342/2/tests/storage/stress/reload.py

Before disabling obtain_device_list_from_udev I had 2 failing runs, one with one error (run-01.log.xz) and one with 3 errors (run-02.log.xz). After disabling it I had a longer run (4 iterations) with no errors (run-04.log.xz).
Errors seen in failing runs:

# xzgrep -E 'Error:.+rc=5' run-01.log.xz
Error: Command ['lvcreate', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] hints="none"} global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} backup { retain_min=50 retain_days=0}', '--autobackup', 'n', '--contiguous', 'n', '--size', '1g', '--addtag', 'OVIRT_VOL_INITIALIZING', '--activate', 'y', '--name', 'lv-0000000000000000000000000000-0359', 'bz1837199-000000000000000000000-0000'] failed rc=5 out='' err='Failed to udev_enumerate_scan_devices.\n Volume group "bz1837199-000000000000000000000-0000" not found.\n Cannot process volume group bz1837199-000000000000000000000-0000'

# xzgrep -E 'Error:.+rc=5' run-02.log.xz
Error: Command ['lvcreate', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] hints="none"} global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} backup { retain_min=50 retain_days=0}', '--autobackup', 'n', '--contiguous', 'n', '--size', '1g', '--addtag', 'OVIRT_VOL_INITIALIZING', '--activate', 'y', '--name', 'lv-0000000000000000000000000000-0133', 'bz1837199-000000000000000000000-0004'] failed rc=5 out='' err='Failed to udev_enumerate_scan_devices.\n Volume group "bz1837199-000000000000000000000-0004" not found.\n Cannot process volume group bz1837199-000000000000000000000-0004'
Error: Command ['lvchange', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] hints="none"} global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} backup { retain_min=50 retain_days=0}', '--autobackup', 'n', '--activate', 'n', 'bz1837199-000000000000000000000-0006/lv-0000000000000000000000000000-0317'] failed rc=5 out='' err='Failed to udev_enumerate_scan_devices.\n Volume group "bz1837199-000000000000000000000-0006" not found.\n Cannot process volume group bz1837199-000000000000000000000-0006'

Tested on RHEL 8.2.1 nightly using: lvm2-2.03.08-3.el8.x86_64

Should RHV use obtain_device_list_from_udev=0 in RHEL 8 until this issue is fixed? Do we need more info for debugging this?

Created attachment 1693651 [details]
Failing run 1
Created attachment 1693652 [details]
Failing run 2
Created attachment 1693653 [details]
Successful run using obtain_device_list_from_udev=0
More info, testing on CentOS 7.8. The version with --no-udev uses obtain_device_list_from_udev=0. The run without udev had fewer errors in pvs, vgs, and lvs (3.7% vs 4.1%) and almost identical performance.

# time ../reload.py run
2020-05-31 21:51:17,783 INFO (reload/pv) Stats: reloads=5263 errors=200 error_rate=3.80% avg_time=0.851 med_time=0.824 min_time=0.140 max_time=2.422
2020-05-31 21:51:17,902 INFO (reload/vg) Stats: reloads=4196 errors=172 error_rate=4.10% avg_time=1.068 med_time=1.037 min_time=0.135 max_time=3.533
2020-05-31 21:51:17,780 INFO (reload/lv) Stats: reloads=4789 errors=198 error_rate=4.13% avg_time=0.935 med_time=0.942 min_time=0.136 max_time=2.633

real 74m41.284s
user 72m12.264s
sys 27m3.581s

# time ../reload.py run --no-udev
...
2020-05-31 20:09:15,257 INFO (reload/pv) Stats: reloads=5192 errors=192 error_rate=3.70% avg_time=0.862 med_time=0.829 min_time=0.132 max_time=2.407
2020-05-31 20:09:15,461 INFO (reload/vg) Stats: reloads=4178 errors=157 error_rate=3.76% avg_time=1.071 med_time=1.020 min_time=0.132 max_time=5.709
2020-05-31 20:09:15,262 INFO (reload/lv) Stats: reloads=4736 errors=177 error_rate=3.74% avg_time=0.945 med_time=0.953 min_time=0.133 max_time=2.393

real 74m37.137s
user 73m5.567s
sys 25m14.252s

David, do we have a reason to disable obtain_device_list_from_udev on RHEL 7.8?

Primarily, the obtain_device_list_from_udev setting controls whether lvm will iterate through /dev nodes to get a list of devices to scan, or will iterate through a libudev API to get a list of devices to scan. In theory using the udev API is probably more efficient, but perhaps it makes no real difference for you.

There's another behavior that is linked with that config setting: when disabled, lvm will not use udev to check if a device is an mpath or md component. It's important that lvm not scan mpath or md components, and disabling this setting makes it somewhat more likely (in unknown cases) that lvm may not detect that a dev is an mpath/md component. The mpath/md detection may not be relevant to RHV.

We probably can't change the default in RHEL7 because of unknown fallout, but if the non-default works better for RHV it would be fine for you to change it.

(In reply to David Teigland from comment #11)
> There's another behavior that is linked with that config setting: when
> disabled, lvm will not use udev to check if a device is an mpath or
> md component. It's important that lvm not scan mpath or md components,
> and disabling this setting makes it somewhat more likely
> (in unknown cases) that lvm may not detect that a dev is an mpath/md
> component. The mpath/md detection may not be relevant to RHV.
We use only multipath devices in lvm commands, for example:

2020-06-11 12:12:08,703-0400 DEBUG (jsonrpc/0) [common.commands] /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/3600a098038304437415d4b6a59684968$|^/dev/mapper/3600a098038304437415d4b6a59684969$|^/dev/mapper/3600a098038304437415d4b6a5968496a$|^/dev/mapper/3600a098038304437415d4b6a5968496b$|^/dev/mapper/3600a098038304437415d4b6a5968496c$|^/dev/mapper/3600a098038304437415d4b6a5968496d$|^/dev/mapper/3600a098038304437415d4b6a5968496e$|^/dev/mapper/3600a098038304437415d4b6a5968496f$|^/dev/mapper/3600a098038304437415d4b6a59684970$|^/dev/mapper/3600a098038304437415d4b6a59684971$|^/dev/mapper/3600a098038304437415d4b6a59684972$|^/dev/mapper/3600a098038304437415d4b6a59684979$|", "r|.*|"] hints="none" } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name --select 'vg_name = 2e7d7e21-917b-4202-b4bc-ba5e6fffa558' (cwd None) (commands:153)

All commands issued by vdsm use this filter, typically using only the multipath devices relevant for the command.

Looks like the issue about not detecting multipath components is not relevant to RHV.

David, did I get this right?

> All commands issued by vdsm use this filter, typically using only the
> multipath devices relevant for the command.
>
> Looks like the issue about not detecting multipath components
> is not relevant to RHV.
>
> David, did I get this right?
Right, since you are not exposing the mpath component devs to lvm, we don't need the code that tries to distinguish the component from the mpath dev.
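To make the two enumeration modes described above concrete: with obtain_device_list_from_udev=0, the candidate device list comes from walking /dev itself rather than from a libudev query. Below is a minimal sketch of such a walk, restricted to /dev/mapper to match the filter RHV uses; it is illustrative only and is not LVM's actual dev-cache code.

```c
/* Illustrative only: build a candidate device list by walking /dev/mapper
 * directly, i.e. the kind of scan implied by obtain_device_list_from_udev=0.
 * This is not LVM's actual dev-cache code. */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
        DIR *dir = opendir("/dev/mapper");
        struct dirent *de;
        struct stat st;
        char path[PATH_MAX];

        if (!dir) {
                perror("opendir /dev/mapper");
                return 1;
        }

        while ((de = readdir(dir)) != NULL) {
                snprintf(path, sizeof(path), "/dev/mapper/%s", de->d_name);
                /* keep only block device nodes as scan candidates */
                if (stat(path, &st) == 0 && S_ISBLK(st.st_mode))
                        printf("%s\n", path);
        }

        closedir(dir);
        return 0;
}
```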
This is another udev bug. It seems to me there are three options:

1. If we thought that udev could fix this we could reassign this to udev, but I doubt that would happen.
2. We could leave this as it is and, when people hit this (or other udev issues), have them change lvm.conf obtain_device_list_from_udev=0. This is the option we've chosen thus far, and what we get by just closing this bug.
3. We could change the lvm default obtain_device_list_from_udev to 0 and avoid these udev issues by default.

My suggestion is 3, given how often I'm telling people to change this, and given that I don't see any value in using udev for obtaining the device list. I'm not sure if it makes more sense to reassign this bz to udev for a fix, or to change the lvm default to get the list of devs from /dev instead of from a udev query. Any suggestions?

Have we ever had this problem reported in a real environment (at customers), or is it just this stress test where we hit this failure? Honestly, I haven't seen an error from this part of the code reported before.
Anyway, I'd really like to know why this fails, because when I look at the libudev code, it just walks through the sysfs content and filters it, and I'm not quite clear where it could fail due to the high number of devices or operations happening at once. There are log_debugs in that code, so we could try to enable those logs, though I'm not sure where the knob is to enable them for libudev (the /etc/udev/udev.conf setting seems to apply to udevd only?).
There's one more detail - looking into the man page, it also states that:
"On success, udev_enumerate_scan_devices(), udev_enumerate_scan_subsystems() and udev_enumerate_add_syspath() return an integer greater than, or equal to, 0"
...while LVM considers only 0 as success:

    if (udev_enumerate_scan_devices(udev_enum)) {
            log_error("Failed to udev_enumerate_scan_devices.");
            goto out;
    }
I don't think it's caused by this, but we can try for sure...
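For comparison, a self-contained libudev enumeration along the lines of what LVM performs might look like the sketch below (build with `gcc -o udev_enum udev_enum.c -ludev`). It is not LVM's code; the error check follows the man-page wording quoted above, treating only negative returns as failure, whereas the LVM snippet treats any non-zero return as failure.

```c
/* Minimal sketch of a libudev block-device enumeration (not LVM's code).
 * Per the man page quoted above, udev_enumerate_scan_devices() returns
 * >= 0 on success, so only negative values are treated as errors here. */
#include <libudev.h>
#include <stdio.h>

int main(void)
{
        struct udev *udev = udev_new();
        struct udev_enumerate *en;
        struct udev_list_entry *first, *entry;
        int r = -1;

        if (!udev)
                return 1;

        en = udev_enumerate_new(udev);
        if (!en) {
                udev_unref(udev);
                return 1;
        }

        /* only enumerate block devices, as a device scan would */
        udev_enumerate_add_match_subsystem(en, "block");

        r = udev_enumerate_scan_devices(en);
        if (r < 0) {
                fprintf(stderr, "udev_enumerate_scan_devices failed: %d\n", r);
                goto out;
        }

        first = udev_enumerate_get_list_entry(en);
        udev_list_entry_foreach(entry, first) {
                struct udev_device *dev = udev_device_new_from_syspath(
                        udev, udev_list_entry_get_name(entry));
                if (dev) {
                        const char *node = udev_device_get_devnode(dev);
                        if (node)
                                printf("%s\n", node);
                        udev_device_unref(dev);
                }
        }
out:
        udev_enumerate_unref(en);
        udev_unref(udev);
        return r < 0 ? 1 : 0;
}
```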
FTR, we've hit this issue on CentOS Stream 8. As the bug mentions, it happens only under heavy load, so it's not a big deal, but it probably should be fixed one day :)

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.