Bug 1260194 - Reproducible LVM errors on a few systems when .cache file contains a device that is no longer on the system
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: pre-dev-freeze
Target Release: 6.8
Assigned To: Peter Rajnoha
QA Contact: cluster-qe@redhat.com
Keywords: ZStream
Depends On:
Blocks: 1261070 1261071 1268411
Reported: 2015-09-04 13:18 EDT by Alex Wang
Modified: 2016-05-10 21:18 EDT
CC List: 11 users

See Also:
Fixed In Version: lvm2-2.02.140-1.el6
Doc Type: Bug Fix
Doc Text:
When the /etc/lvm/cache/.cache file contained an entry that did not exist, the code processed an uninitialized structure which led to unreliable behavior. As a consequence, an error message referencing an undefined (major, minor) pair was returned. With this update, non-existent devices are handled correctly while processing the .cache file, and the aforementioned error message is no longer generated.
Clones: 1261070 1261071
Last Closed: 2016-05-10 21:18:12 EDT
Type: Bug


Description Alex Wang 2015-09-04 13:18:02 EDT
Created attachment 1070364
Verbose console output of vgchange from rc.sysinit

Description of problem:

After installation of the lvm2 package lvm2-libs-2.02.118-3.el6_7.2.s390x, the following errors show up on the console during boot:

Setting up Logical Volume Management:   /dev/VolGroup00/TmpVol00: stat failed: No such file or directory
  Path /dev/VolGroup00/TmpVol00 no longer valid for device(0,1)

[Continues..]

In the customer's case, the package was installed during a system update using YUM. After the customer downgraded the following packages, the error messages disappeared:

lvm2-2.02.118-3.el6_7.2.s390x
device-mapper-1.02.95-3.el6_7.2.s390x
lvm2-libs-2.02.118-3.el6_7.2.s390x
device-mapper-persistent-data-0.3.2-1.el6.s390x
device-mapper-libs-1.02.95-3.el6_7.2.s390x
device-mapper-event-1.02.95-3.el6_7.2.s390x
kpartx-0.4.9-87.el6.s390x
device-mapper-multipath-0.4.9-87.el6.s390x
device-mapper-event-libs-1.02.95-3.el6_7.2.s390x
device-mapper-multipath-libs-0.4.9-87.el6.s390x

Version-Release number of selected component (if applicable):

lvm2-libs-2.02.118-3.el6_7.2.s390x

How reproducible:

Every time

Steps to Reproduce:
1. Run yum update to update the lvm2 package.
2. Reboot the server.

Actual results:

A large number of "stat failed" errors.

Expected results:

No errors should be displayed
Comment 3 Alasdair Kergon 2015-09-04 19:58:07 EDT
When the .cache file contains an entry that no longer exists, the code reads an uninitialised structure which leads to unreliable behaviour and the message with undefined major and minor numbers like "(0,1)".

Fixed in 2.02.130 upstream.
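To make the failure mode concrete, here is a minimal C sketch of the general pattern Comment 3 describes. It is a hypothetical simplification, not the actual lvm2 dev-cache.c code or its upstream fix, and the helper name `lookup_cached_path` is invented for illustration. A `struct stat` is only filled in when `stat()` succeeds, so code that keeps going after a failed `stat()` on a stale .cache entry reads whatever happens to be in that memory, which would be consistent with an undefined pair such as "(0,1)" appearing on the console.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, NOT the actual lvm2 code: resolve a path read from
 * the .cache file to its device number.
 * Returns 1 and sets *rdev on success; returns 0 if the path is gone.
 */
static int lookup_cached_path(const char *path, dev_t *rdev)
{
    struct stat buf; /* left uninitialised; only valid if stat() succeeds */

    if (stat(path, &buf) < 0) {
        fprintf(stderr, "%s: stat failed: %s\n", path, strerror(errno));
        /*
         * Buggy pattern: continuing here and reading buf.st_rdev would use
         * garbage memory as the (major, minor) pair.
         * Fixed pattern: report the entry as nonexistent so the caller skips it.
         */
        return 0;
    }

    *rdev = buf.st_rdev;
    return 1;
}
```

Called on a path such as /dev/vg/missing, the helper prints a "stat failed" message and returns 0; the caller is then expected to skip the entry rather than register a device for it.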
Comment 4 Alasdair Kergon 2015-09-04 20:01:52 EDT
This is seen on all architectures and can happen at other times, not just at boot.

To reproduce, add an entry for a device that does not exist on the system to the .cache file then run any normal LVM command.  If it doesn't fail (it often works OK), then try adding different entries in different places in the file, till you hit a case that is reproducible on the specific system you are running on.
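Because the trigger is simply an entry for a device that is no longer present, one way to check whether an existing .cache file already contains such entries is to `stat()` every path it lists. The sketch below assumes the one-entry-per-line layout shown in Comment 11 (a quoted path followed by a comma); the function `count_stale_cache_entries` is hypothetical and not part of lvm2.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical diagnostic sketch: count the quoted device paths in a
 * .cache-style file that no longer exist on this system.
 * Assumes one entry per line, e.g.:   "/dev/vg/missing",
 * Returns the number of stale entries, or -1 if the file cannot be opened.
 * If report is non-NULL, each stale path is written to it.
 */
static int count_stale_cache_entries(const char *cache_path, FILE *report)
{
    char line[4096];
    int stale = 0;
    FILE *f = fopen(cache_path, "r");

    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        char *p = line + strspn(line, " \t"); /* skip indentation */
        char *end;
        struct stat buf;

        if (*p != '"')
            continue;                         /* comment or non-entry line */
        end = strchr(p + 1, '"');
        if (!end)
            continue;                         /* no closing quote: ignore */
        *end = '\0';                          /* p + 1 is now the bare path */

        if (stat(p + 1, &buf) < 0) {
            stale++;
            if (report)
                fprintf(report, "stale entry: %s\n", p + 1);
        }
    }

    fclose(f);
    return stale;
}
```

For example, `count_stale_cache_entries("/etc/lvm/cache/.cache", stderr)` would list each stale path on stderr and return how many were found.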
Comment 5 Peter Rajnoha 2015-09-07 04:04:08 EDT
This is needed for both 6.6.EUS (to complete bug #1248030) and 6.7.z (to complete bug #1248032).
Comment 8 Peter Rajnoha 2015-09-09 09:25:54 EDT
You need to be quite lucky to reproduce this in exactly the way that prints the message. If you don't hit the error after trying several places in the .cache file for the nonexistent device, you can use valgrind instead.

Valgrind displays the error all the time:

$ valgrind pvs
...
==2450== Conditional jump or move depends on uninitialised value(s)
==2450==    at 0x185AFA: _insert (dev-cache.c:617)
==2450==    by 0x186AC1: dev_cache_get (dev-cache.c:955)
==2450==    by 0x18FA6A: _read_array (filter-persistent.c:85)
==2450==    by 0x18FD16: persistent_filter_load (filter-persistent.c:122)
==2450==    by 0x17B550: _init_filters (toolcontext.c:1220)
==2450==    by 0x17CE66: create_toolcontext (toolcontext.c:1769)
==2450==    by 0x14CBD3: init_lvm (lvmcmdline.c:1770)
==2450==    by 0x14D556: lvm2_main (lvmcmdline.c:1938)
==2450==    by 0x16BB37: main (lvm.c:21)

This is exactly the place: the dev_cache_get/_insert functions, where the uninitialised value is detected. Also install lvm2-debuginfo so that the function names are printed.
Comment 11 Roman Bednář 2016-02-22 06:56:09 EST
Verified. Reproduced as suggested in Comment #8

Before fix:
lvm2-2.02.118-3.el6_7.2.x86_64

# grep missing /etc/lvm/cache/.cache
		"/dev/vg/missing",

# valgrind pvs
...
==1739== Conditional jump or move depends on uninitialised value(s)
==1739==    at 0x171AE2: ??? (in /sbin/lvm)
==1739==    by 0x17296F: dev_cache_get (in /sbin/lvm)
==1739==    by 0x179BDB: persistent_filter_load (in /sbin/lvm)
==1739==    by 0x168C2B: ??? (in /sbin/lvm)
==1739==    by 0x16B456: create_toolcontext (in /sbin/lvm)
==1739==    by 0x13E5F0: init_lvm (in /sbin/lvm)
==1739==    by 0x143E5C: lvm2_main (in /sbin/lvm)
==1739==    by 0x56DFD5C: (below main) (in /lib64/libc-2.12.so)
...
==1739== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 10 from 6)

==========================================================================

After fix:
# grep missing /etc/lvm/cache/.cache
		"/dev/vg/missing",

# valgrind pvs
...
==24830== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 10 from 6)


Tested on:
2.6.32-615.el6.x86_64

lvm2-2.02.141-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
lvm2-libs-2.02.141-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
lvm2-cluster-2.02.141-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
udev-147-2.71.el6    BUILT: Wed Feb 10 14:07:17 CET 2016
device-mapper-1.02.115-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
device-mapper-libs-1.02.115-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
device-mapper-event-1.02.115-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
device-mapper-event-libs-1.02.115-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
device-mapper-persistent-data-0.6.2-0.1.rc1.el6    BUILT: Wed Feb 10 16:52:15 CET 2016
cmirror-2.02.141-2.el6    BUILT: Wed Feb 10 14:49:03 CET 2016
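The before/after comparison above comes down to the count on valgrind's ERROR SUMMARY line (1 error before the fix, 0 after). A small, hypothetical C helper for reading that count from saved valgrind output might look like this; as an alternative, valgrind's own `--error-exitcode=<number>` option makes the exit status reflect detected errors directly.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract N from valgrind's
 * "ERROR SUMMARY: N errors from M contexts" line.
 * Works on a single line or a whole log buffer (first summary wins).
 * Returns N, or -1 if no summary is found.
 */
static int valgrind_error_count(const char *text)
{
    static const char marker[] = "ERROR SUMMARY: ";
    const char *p = strstr(text, marker);
    int errors;

    if (!p || sscanf(p + strlen(marker), "%d", &errors) != 1)
        return -1;
    return errors;
}
```

Applied to the two summaries in this comment, it would return 1 for the unfixed lvm2-2.02.118-3.el6_7.2 run and 0 for the fixed build.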
Comment 13 errata-xmlrpc 2016-05-10 21:18:12 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html
