Bug 1544409 - left over devfs entries after force lvremove
Summary: left over devfs entries after force lvremove
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Michal Sekletar
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-02-12 12:05 UTC by Roman Bednář
Modified: 2023-09-14 04:16 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-06 15:48:13 UTC
Target Upstream Version:
Embargoed:


Attachments
test.log (57.37 KB, text/plain)
2018-02-21 08:08 UTC, Roman Bednář

Description Roman Bednář 2018-02-12 12:05:07 UTC
Force removing all LVs in a VG leaves a cache pool devfs entry behind. The VG contained just a single cache pool and one origin on top of raid1 when the force remove was attempted, as shown in the reproducer. So far I have not been able to reproduce this manually; it is possibly a race.


Reproducer:

# lvcreate --activate ey --type raid1 -m 1 -L 100M -n corigin cache_sanity /dev/sda1 /dev/sdb1

# lvcreate --activate ey --type raid1 -m 1 -L 2G -n force_remove cache_sanity /dev/sdc1 /dev/sdd1

# lvcreate --activate ey --type raid1 -m 1 -L 2G -n force_remove_meta cache_sanity /dev/sdc1 /dev/sdd1

# lvconvert --yes --type cache-pool --cachepolicy mq --cachemode writeback -c 32 --poolmetadata cache_sanity/force_remove_meta cache_sanity/force_remove

# lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/force_remove cache_sanity/corigin

# lvremove -ff cache_sanity
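
A minimal check of the symptom, assuming the VG name cache_sanity from the commands above:

# dmsetup table | grep cache_sanity    # no output expected once all LVs are removed
# ls -la /dev/cache_sanity/            # any symlink still listed here is the leftover entry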


3.10.0-843.el7.x86_64

lvm2-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-libs-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-cluster-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
package lvm2-lockd is not installed
date: invalid date ‘1970-01-01 UTC package lvm2-lockd is not installed seconds’
package lvm2-python-boom is not installed
date: invalid date ‘1970-01-01 UTC package lvm2-python-boom is not installed seconds’
cmirror-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-libs-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-event-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-event-libs-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 12:07:18 CET 2017



Test run:

SCENARIO - [force_vg_remove_cache_on_raid]
Create a cache volume, then force remove it while active ensuring no invalid error messages

*** Cache info for this scenario ***
*  origin (slow):  /dev/sdb1 /dev/sdd1
*  pool (fast):    /dev/sdf1 /dev/sda1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --activate ey --type raid1 -m 1 -L 4G -n corigin cache_sanity @slow
Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 42.62% )
   0/1 mirror(s) are fully synced: ( 92.90% )
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Create cache data and cache metadata (fast) volumes
lvcreate --activate ey --type raid1 -m 1 -L 2G -n force_remove cache_sanity @fast
lvcreate --activate ey --type raid1 -m 1 -L 12M -n force_remove_meta cache_sanity @fast
Waiting until all mirror|raid volumes become fully syncd...
   2/2 mirror(s) are fully synced: ( 100.00% 100.00% )
Sleeping 15 sec
Sleeping 15 sec

Create cache pool volume by combining the cache data and cache metadata (fast) volumes with policy: mq  mode: writeback
lvconvert --yes --type cache-pool --cachepolicy mq --cachemode writeback -c 32 --poolmetadata cache_sanity/force_remove_meta cache_sanity/force_remove
  WARNING: Converting cache_sanity/force_remove and cache_sanity/force_remove_meta to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool cache_sanity/force_remove cache_sanity/corigin
dmsetup status | grep cache_sanity-corigin | grep writeback | grep -w mq

Force remove all LVs associated with cache_sanity
lvremove -ff cache_sanity
There shouldn't be any left over devfs entries for this lv on cache_sanity/n at /usr/tests/sts-rhel7.5/lvm2/lib/cache_sanity/Cache_sanity.pm line 4868.

Comment 2 Roman Bednář 2018-02-21 08:08:33 UTC
Created attachment 1398564 [details]
test.log

I hit this again with a different scenario today:

pvcreate --dataalignment 136192k /dev/sd{a..j}
vgcreate --physicalextentsize 34048k vg /dev/sd{a..j}
lvcreate --activate ey --type raid1 -m 1 -L 4G -n corigin vg /dev/sda /dev/sdb
lvcreate --activate ey --type raid1 -m 1 -L 2G -n 34048 vg /dev/sdc /dev/sdd
lvcreate --activate ey --type raid1 -m 1 -L 12M -n 34048_meta vg /dev/sde /dev/sdf
lvconvert --yes --type cache-pool --cachepolicy mq --cachemode writeback -c 64 --poolmetadata vg/34048_meta vg/34048
lvconvert --yes --type cache --cachemetadataformat 2 --cachepool vg/34048 vg/corigin
lvchange --syncaction repair vg/34048_cdata
lvchange --syncaction repair vg/34048_cmeta
lvconvert --splitcache vg/corigin
lvchange --syncaction check vg/corigin
lvremove -f /dev/vg/corigin
vgremove --yes vg
pvremove --yes /dev/sd{a..j}
pvcreate /dev/sd{a..j}
vgcreate vg /dev/sd{a..j}  <<<<  "already exists in filesystem" error should appear here


Leftover device on the node that this sequence ran from:

# ls -la /dev/cache_sanity/34048 
lrwxrwxrwx. 1 root root 8 Feb 20 10:51 /dev/cache_sanity/34048 -> ../dm-11
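
A possible manual cleanup for such a stale entry, sketched here as an illustration only (confirm first that no device-mapper mapping actually remains):

dmsetup table | grep cache_sanity     # confirm no mapping remains for the removed LV
rm /dev/cache_sanity/34048            # then drop the stale symlink by hand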


Attaching the full log of the run as well. Adding the testblocker flag since this is preventing us from getting a reliable pass on the cache regression suite.

Comment 4 Roman Bednář 2018-02-21 08:12:51 UTC
So far I have not been able to reproduce the scenario in Comment 2 manually.

Comment 5 Zdenek Kabelac 2018-02-23 22:37:39 UTC
I'd expect this to be a bug in the 'older' systemd-udevd leaking a symlink.


To confirm this, you can enable "verify_udev_operations" in lvm.conf;
in that case lvm2 should 'spot' the missing symlink removal.

The primary check is whether the device is still present in the dm table
(dmsetup table).

If the device is NOT there while the symlink is still in the /dev directory, it's a udev bug.

I also believe this is a case of the older udev version in RHEL, since recent upstream seems to be working fine here.
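
A sketch of that check, assuming the VG name cache_sanity from the reproducer (verify_udev_operations lives in the activation section of /etc/lvm/lvm.conf):

# /etc/lvm/lvm.conf:
#   activation {
#       verify_udev_operations = 1
#   }
dmsetup table | grep cache_sanity   # empty output means the DM mapping is gone
ls -la /dev/cache_sanity/           # a symlink surviving here points at udev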

Comment 6 Roman Bednář 2018-02-26 12:00:49 UTC
The dm table did not contain any entry related to the removed LV. Running ~30 iterations of the same scenario with verify_udev_operations enabled did not reproduce the bug. Reassigning to udev.

Comment 7 Lukáš Nykrýn 2018-02-26 12:07:40 UTC
We don't have a separate udev package in RHEL 7. Also, this does not look like an issue that should block the RHEL 7.5 RC; moving to 7.6.

Comment 8 Corey Marthaler 2018-02-27 16:45:44 UTC
Isn't this another version of raid scrub bug 1549272? 

Check the log for "Failed to lock logical volume" messages during the scrubbing actions. I don't believe this requires a cluster to reproduce.
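
For example, against the attached test.log from comment 2 (the filename is just where the log happens to be saved):

grep -n "Failed to lock logical volume" test.log   # hits during scrubbing would suggest the same issue as bug 1549272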

Comment 15 Michal Sekletar 2019-02-06 15:48:13 UTC
Quite frankly, I have no idea how to move this forward. I can't reproduce this locally, and without a reproducer it is close to impossible to say what exactly happened and why the symlink wasn't removed. At the very least I would need a dump of the udev database (before and after lvremove), the part of the udev debug log from the time of the VG removal, and the corresponding udevadm monitor output. According to the last few comments the bug was not observed with the latest LVM builds, so I am closing this as INSUFFICIENT_DATA.

In case someone is able to reproduce this, please reopen and attach the relevant debug information.
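
One way to capture the requested data, sketched with standard udev tooling (exact option spelling should be double-checked against the installed systemd/udev version; the VG name is the reproducer's):

udevadm info --export-db > udev-db-before.txt           # udev database before removal
udevadm control --log-priority=debug                    # raise udev logging to debug
udevadm monitor --udev --kernel --property > udevadm-monitor.log 2>&1 &
lvremove -ff cache_sanity                               # the removal that leaves the symlink
udevadm info --export-db > udev-db-after.txt            # udev database after removal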

Comment 16 Red Hat Bugzilla 2023-09-14 04:16:31 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

