Bug 2095525

Summary: detect, remove, and don't use incorrect device names
Product: Red Hat Enterprise Linux 8
Reporter: David Teigland <teigland>
Component: lvm2
Assignee: David Teigland <teigland>
lvm2 sub component: Devices, Filtering and Stacking
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: agk, cmarthal, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, zkabelac
Version: 8.0
Keywords: Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: lvm2-2.03.14-4.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-08 10:55:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Teigland 2022-06-09 20:50:18 UTC
Description of problem:

A list of paths to all devices on the system is created at the start of a command (saved in "dev-cache"), but while the command is running, some of those device paths may go away.  lvm was not detecting and dropping invalid paths in many cases, which could lead to segfaults or other undefined behavior.  Also, many places in the lvm code assumed that "dev_name()" always returned an actual device path name, and would attempt to use that path.  In fact, this function has always been meant only for printing messages, and may return "unknown" instead of an actual device path.
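
To illustrate the distinction described above, here is a minimal sketch in C. It is not the lvm code: struct cached_dev, path_matches_dev(), drop_stale_aliases(), display_name() and usable_path() are hypothetical names made up for this example. It assumes a cached path can be revalidated by calling stat() on it and comparing the block device number with the one recorded at scan time. Stale paths are dropped, a display-only name is kept separate from a path that is safe to open, and the latter can be NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_ALIASES 8

/* Hypothetical stand-in for a cached device entry (not lvm's struct device). */
struct cached_dev {
    dev_t devno;                       /* device number recorded at scan time */
    const char *aliases[MAX_ALIASES];  /* path names collected at scan time */
    size_t alias_count;
};

/* A path is valid only if it still exists and still names this block device. */
static int path_matches_dev(const char *path, dev_t devno)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;                      /* the path went away */
    if (!S_ISBLK(st.st_mode))
        return 0;                      /* no longer a block device node */
    return st.st_rdev == devno;        /* must still refer to the same device */
}

/* Remove aliases that are no longer valid; returns how many were dropped. */
static size_t drop_stale_aliases(struct cached_dev *dev)
{
    size_t kept = 0, dropped, i;

    assert(dev != NULL);
    for (i = 0; i < dev->alias_count; i++) {
        if (path_matches_dev(dev->aliases[i], dev->devno))
            dev->aliases[kept++] = dev->aliases[i];
    }
    dropped = dev->alias_count - kept;
    dev->alias_count = kept;
    return dropped;
}

/* For log messages only: always returns something printable. */
static const char *display_name(const struct cached_dev *dev)
{
    return dev->alias_count ? dev->aliases[0] : "unknown";
}

/* For open() and similar calls: NULL means there is no valid path to use. */
static const char *usable_path(const struct cached_dev *dev)
{
    return dev->alias_count ? dev->aliases[0] : NULL;
}
```

A caller in this sketch would run drop_stale_aliases() before using a device, pass usable_path() to open() only when it is not NULL, and use display_name() purely in messages.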

An upstream user reported repeatable crashes due to these issues, and reports that the problems are resolved with these fixes:

https://listman.redhat.com/archives/linux-lvm/2022-February/026122.html
https://listman.redhat.com/archives/linux-lvm/2022-April/026151.html

lvm commits in main branch:
cc73d99886df devices: only close PVs on LVs when scan_lvs is enabled
7b1a857d5ac4 devices: use dev-cache aliases handling from label scan functions
4eb04c8c05e5 devices: fix dev_name assumptions
00c3069872ab devices: initial use of existing option
7e70041e324e devices: drop incorrect paths from aliases list
1126be8f8dbc devices: simplify dev_cache_get_by_devt

(cc73d99886df is rhel9-only)

I have not been able to reproduce any of these issues, but based on the code that crashed, and the known problems with that code, we can say that it would be triggered by path names on the system being removed (e.g. devices being removed) while lvm commands were running.



Comment 4 Corey Marthaler 2022-07-16 21:16:51 UTC
Marking Verified:Tested (SanityOnly) with the latest rpms. All devicesfile regression scenarios passed.

kernel-4.18.0-398.g366e.el8.kpq1    BUILT: Tue Jun  7 04:56:38 CDT 2022
lvm2-2.03.14-4.el8    BUILT: Wed Jun 15 17:14:34 CDT 2022
lvm2-libs-2.03.14-4.el8    BUILT: Wed Jun 15 17:14:34 CDT 2022

Comment 7 Corey Marthaler 2022-07-28 15:58:08 UTC
All devicesfile regression scenarios passed on latest kernel/lvm2 as well.

kernel-4.18.0-411.el8    BUILT: Wed Jul 20 18:42:42 CDT 2022
lvm2-2.03.14-5.el8    BUILT: Thu Jul 14 09:23:13 CDT 2022
lvm2-libs-2.03.14-5.el8    BUILT: Thu Jul 14 09:23:13 CDT 2022

Comment 9 errata-xmlrpc 2022-11-08 10:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7792