Bug 1592960
| Summary: | disappearing partitioned devices causing an array of test failures "Device excluded by a filter" |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Component: | lvm2 |
| lvm2 sub component: | Devices, Filtering and Stacking |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | unspecified |
| Version: | 7.6 |
| Keywords: | Regression, TestBlocker |
| Target Milestone: | rc |
| Hardware: | x86_64 |
| OS: | Linux |
| Fixed In Version: | lvm2-2.02.179-2.el7 |
| Reporter: | Corey Marthaler <cmarthal> |
| Assignee: | David Teigland <teigland> |
| QA Contact: | cluster-qe <cluster-qe> |
| CC: | agk, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, teigland, zkabelac |
| Last Closed: | 2018-10-30 11:03:47 UTC |
| Type: | Bug |
Description
Corey Marthaler
2018-06-19 16:44:20 UTC

Created attachment 1453005: verbose pvcreate attempt
I've reproduced this on my own test machine. It only seems to happen when using partitions. The device nodes for the partitions in /dev actually disappear, so it's not just an lvm issue.

In one terminal I run:

```
# while true; do ls /dev/sdb1 > /dev/null; ls /dev/sdd1 > /dev/null; ls /dev/sde1 > /dev/null; ls /dev/sdf1 > /dev/null; ls /dev/sdg1 > /dev/null; done
```

In another terminal I repeatedly run:

```
# pvcreate /dev/sd[bdefg]1; pvremove /dev/sd[bdefg]1
```

The first terminal will report a stream of:

```
ls: cannot access /dev/sdd1: No such file or directory
ls: cannot access /dev/sdd1: No such file or directory
ls: cannot access /dev/sdd1: No such file or directory
ls: cannot access /dev/sde1: No such file or directory
ls: cannot access /dev/sde1: No such file or directory
ls: cannot access /dev/sdg1: No such file or directory
ls: cannot access /dev/sdg1: No such file or directory
ls: cannot access /dev/sdd1: No such file or directory
ls: cannot access /dev/sdf1: No such file or directory
ls: cannot access /dev/sdg1: No such file or directory
ls: cannot access /dev/sdd1: No such file or directory
```

udev is the only thing I know of that mucks with device nodes, so the suspicion is that udev is doing something wrong or unexpected. Perhaps run the command under:

```
strace -e trace=open,close
```

and confirm it is not opening /dev/sdb (and the others it should only be reading) with O_RDWR. (I'm guessing here, but if a device is partitioned, udev assumes nothing would open the underlying device for writing unless it is changing the partition table; other things should write only directly to the partitions themselves. If such an open+close does happen, udev may assume the partition table was changed and recreate the partitioned devices. It's not clever enough to look at the delta between the old and new tables and work out what actually changed.)

Probably not running the exact same version here, but on the branch 2018-06-01-stable and a simple 'pvs' command I'm seeing lots of lines like:
```
open("/dev/sda1", O_RDWR|O_DIRECT|O_NOATIME) = 9
```

which is a regression; it needs to be O_RDONLY, as in older versions:

```
open("/dev/sda1", O_RDONLY|O_DIRECT|O_NOATIME) = 6
```
Easy test: run 'udevadm monitor' alongside. If you use the 'old' lvm and run 'pvs -a', you see no udev activity. If you do the same with the 'new' lvm, you see lots of udev activity due to opening devices O_RDWR when that wasn't needed. Because of this udev behaviour, it's important that the code doesn't open devices O_RDWR unless it actually needs to write to them.

Until we can stamp out this udev brokenness, this workaround works for me: https://sourceware.org/git/?p=lvm2.git;a=commit;h=a30e6222799409ab6e6151683c95eb13f4abaefb

(So this is unrelated to partitions in comment #7; rather, it's a general problem: lvm is losing track of which devices need opening read-only and which read/write, and if you open a device read/write, that triggers udev operations after you close the device, which have to be either waited for or suppressed before you make further use of the same device.)

There is definitely a problem with device nodes for partitions going missing from /dev. This doesn't happen with non-partitioned devices.

A quick check shows lvm2-2.02.179-2 fixes this issue. Our test scenarios are once again working fine on top of partitioned devices. Marking verified in the latest rpms.
```
3.10.0-931.el7.x86_64

lvm2-2.02.180-2.el7 BUILT: Wed Aug 1 11:22:48 CDT 2018
lvm2-libs-2.02.180-2.el7 BUILT: Wed Aug 1 11:22:48 CDT 2018
lvm2-cluster-2.02.180-2.el7 BUILT: Wed Aug 1 11:22:48 CDT 2018
device-mapper-1.02.149-2.el7 BUILT: Wed Aug 1 11:22:48 CDT 2018
device-mapper-libs-1.02.149-2.el7 BUILT: Wed Aug 1 11:22:48 CDT 2018
device-mapper-event-1.02.149-2.el7 BUILT: Wed Aug 1 11:22:48 CDT 2018
device-mapper-event-libs-1.02.149-2.el7 BUILT: Wed Aug 1 11:22:48 CDT 2018
device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 05:07:18 CST 2017
```
```
9 disk(s) to be used:
host-093=/dev/sdg /dev/sdd /dev/sdh /dev/sda /dev/sdi /dev/sdc /dev/sdf /dev/sdb /dev/sde

on host-093...
dicing /dev/sdg into 2...
dicing /dev/sdd into 2...
dicing /dev/sdh into 2...
dicing /dev/sda into 2...
dicing /dev/sdi into 2...
dicing /dev/sdc into 2...
dicing /dev/sdf into 2...
dicing /dev/sdb into 2...
dicing /dev/sde into 2...
re-reading disks on host-093...
Zeroing out the new partitions.../dev/sdg1.../dev/sdg2.../dev/sdd1.../dev/sdd2.../dev/sdh1.../dev/sdh2.../dev/sda1.../dev/sda2.../dev/sdi1.../dev/sdi2.../dev/sdc1.../dev/sdc2.../dev/sdf1.../dev/sdf2.../dev/sdb1.../dev/sdb2.../dev/sde1.../dev/sde2...
creating lvm devices...
host-093: pvcreate /dev/sde2 /dev/sde1 /dev/sdb2 /dev/sdb1 /dev/sdf2 /dev/sdf1 /dev/sdc2 /dev/sdc1 /dev/sdi2 /dev/sdi1
host-093: vgcreate raid_sanity /dev/sde2 /dev/sde1 /dev/sdb2 /dev/sdb1 /dev/sdf2 /dev/sdf1 /dev/sdc2 /dev/sdc1 /dev/sdi2 /dev/sdi1

============================================================
Iteration 1 of 1 started at Mon Aug 13 17:14:55 CDT 2018
============================================================
SCENARIO (raid1) - [display_raid]
Create a raid and then display it a couple ways
host-093: lvcreate --nosync --type raid1 -m 1 -n display_raid -L 300M raid_sanity
[...]
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3193