Bug 1260194
| Summary: | Reproducible LVM errors on a few systems when .cache file contains a device that is no longer on the system | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Alex Wang <alex.wang> | |
| Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> | |
| lvm2 sub component: | Devices, Filtering and Stacking (RHEL6) | QA Contact: | cluster-qe <cluster-qe> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | high | |||
| Priority: | high | CC: | agk, cmarthal, heinzm, jbrassow, jsvarova, msnitzer, prajnoha, prockai, rbednar, tlavigne, zkabelac | |
| Version: | 6.6 | Keywords: | ZStream | |
| Target Milestone: | pre-dev-freeze | |||
| Target Release: | 6.8 | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | lvm2-2.02.140-1.el6 | Doc Type: | Bug Fix | |
| Doc Text: | When the /etc/lvm/cache/.cache file contained an entry that did not exist, the code processed an uninitialized structure which led to unreliable behavior. As a consequence, an error message referencing an undefined (major, minor) pair was returned. With this update, non-existent devices are handled correctly while processing the .cache file, and the aforementioned error message is no longer generated. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1261070 1261071 (view as bug list) | Environment: | ||
| Last Closed: | 2016-05-11 01:18:12 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1261070, 1261071, 1268411 | |||
https://www.redhat.com/archives/lvm-devel/2015-September/msg00039.html
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=55c13f3de4a8b709b950297369182cd81a6c738f

When the .cache file contains an entry for a device that no longer exists, the code reads an uninitialised structure, which leads to unreliable behaviour and to messages with undefined major and minor numbers such as "(0,1)". Fixed in 2.02.130 upstream.

This is seen on all architectures and can happen at other times, not just at boot.

To reproduce, add an entry for a device that does not exist on the system to the .cache file, then run any normal LVM command. If it doesn't fail (it often works OK), try adding different entries in different places in the file until you hit a case that is reproducible on the specific system you are running on.

This is needed for both 6.6.EUS (to complete bug #1248030) and 6.7.z (to complete bug #1248032).

One needs to be quite lucky to reproduce this in exactly this way so that the message is printed. If you don't hit the error after trying several places in the .cache file for the nonexistent device, you can use valgrind instead, which reports the error every time:

    $ valgrind pvs
    ...
    ==2450== Conditional jump or move depends on uninitialised value(s)
    ==2450== at 0x185AFA: _insert (dev-cache.c:617)
    ==2450== by 0x186AC1: dev_cache_get (dev-cache.c:955)
    ==2450== by 0x18FA6A: _read_array (filter-persistent.c:85)
    ==2450== by 0x18FD16: persistent_filter_load (filter-persistent.c:122)
    ==2450== by 0x17B550: _init_filters (toolcontext.c:1220)
    ==2450== by 0x17CE66: create_toolcontext (toolcontext.c:1769)
    ==2450== by 0x14CBD3: init_lvm (lvmcmdline.c:1770)
    ==2450== by 0x14D556: lvm2_main (lvmcmdline.c:1938)
    ==2450== by 0x16BB37: main (lvm.c:21)

This is exactly the place: the uninitialised value is detected in the dev_cache_get/_insert functions. Also install lvm2-debuginfo so that the function names are printed.

Verified.
Reproduced as suggested in Comment #8.

Before fix: lvm2-2.02.118-3.el6_7.2.x86_64

    # grep missing /etc/lvm/cache/.cache
    "/dev/vg/missing",
    # valgrind pvs
    ...
    ==1739== Conditional jump or move depends on uninitialised value(s)
    ==1739== at 0x171AE2: ??? (in /sbin/lvm)
    ==1739== by 0x17296F: dev_cache_get (in /sbin/lvm)
    ==1739== by 0x179BDB: persistent_filter_load (in /sbin/lvm)
    ==1739== by 0x168C2B: ??? (in /sbin/lvm)
    ==1739== by 0x16B456: create_toolcontext (in /sbin/lvm)
    ==1739== by 0x13E5F0: init_lvm (in /sbin/lvm)
    ==1739== by 0x143E5C: lvm2_main (in /sbin/lvm)
    ==1739== by 0x56DFD5C: (below main) (in /lib64/libc-2.12.so)
    ...
    ==1739== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 10 from 6)

After fix:

    # grep missing /etc/lvm/cache/.cache
    "/dev/vg/missing",
    # valgrind pvs
    ...
    ==24830== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 10 from 6)

Tested on:

    2.6.32-615.el6.x86_64
    lvm2-2.02.141-2.el6 BUILT: Wed Feb 10 14:49:03 CET 2016
    lvm2-libs-2.02.141-2.el6 BUILT: Wed Feb 10 14:49:03 CET 2016
    lvm2-cluster-2.02.141-2.el6 BUILT: Wed Feb 10 14:49:03 CET 2016
    udev-147-2.71.el6 BUILT: Wed Feb 10 14:07:17 CET 2016
    device-mapper-1.02.115-2.el6 BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-libs-1.02.115-2.el6 BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-event-1.02.115-2.el6 BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-event-libs-1.02.115-2.el6 BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-persistent-data-0.6.2-0.1.rc1.el6 BUILT: Wed Feb 10 16:52:15 CET 2016
    cmirror-2.02.141-2.el6 BUILT: Wed Feb 10 14:49:03 CET 2016

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html
 | 
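The class of defect discussed in this report, a structure that is read even though the lookup that should have filled it failed, can be illustrated with a small standalone C sketch. This is not the lvm2 source and does not claim to show the actual fix: `struct dev_info` and `lookup_safe` are hypothetical names, and the use of POSIX `stat()` with the glibc `major()`/`minor()` macros is an assumption made only so that the example has a concrete (major, minor) pair to work with.

```c
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical record for illustration; not an lvm2 type. */
struct dev_info {
    int valid;          /* 1 only if the path was found */
    unsigned int major; /* device major number, 0 if not found */
    unsigned int minor; /* device minor number, 0 if not found */
};

/*
 * The unsafe pattern (shown only as a comment, because running it is
 * undefined behaviour) would be:
 *
 *     struct stat st;
 *     stat(path, &st);              // return value ignored
 *     out->major = major(st.st_rdev); // st never filled if stat() failed
 *
 * The safe variant below zeroes the output first and reads the stat
 * buffer only after stat() has reported success.
 */
int lookup_safe(const char *path, struct dev_info *out)
{
    struct stat st;

    memset(out, 0, sizeof(*out));   /* defined contents on every path */

    if (stat(path, &st) != 0)
        return -1;                  /* missing path: st must not be read */

    out->valid = 1;
    out->major = major(st.st_rdev);
    out->minor = minor(st.st_rdev);
    return 0;
}
```

With this shape, a caller that passes a path which does not exist (for example a stale entry such as the `/dev/vg/missing` path used in the verification above) gets `-1` and a zeroed record instead of whatever happened to be on the stack, which is the kind of access valgrind flags as depending on an uninitialised value.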
Created attachment 1070364 [details]
Verbose console output of vgchange from rc.sysinit

Description of problem:
After installation of the lvm2 package lvm2-libs-2.02.118-3.el6_7.2.s390x, the following errors show up on the console during boot:

    Setting up Logical Volume Management: /dev/VolGroup00/TmpVol00: stat failed: No such file or directory
    Path /dev/VolGroup00/TmpVol00 no longer valid for device(0,1)
    [Continues..]

In the customer's case, the package was installed during a system update using YUM. The customer downgraded the following packages and the error messages disappeared:

    lvm2-2.02.118-3.el6_7.2.s390x
    device-mapper-1.02.95-3.el6_7.2.s390x
    lvm2-libs-2.02.118-3.el6_7.2.s390x
    device-mapper-persistent-data-0.3.2-1.el6.s390x
    device-mapper-libs-1.02.95-3.el6_7.2.s390x
    device-mapper-event-1.02.95-3.el6_7.2.s390x
    kpartx-0.4.9-87.el6.s390x
    device-mapper-multipath-0.4.9-87.el6.s390x
    device-mapper-event-libs-1.02.95-3.el6_7.2.s390x
    device-mapper-multipath-libs-0.4.9-87.el6.s390x

Version-Release number of selected component (if applicable):
lvm2-libs-2.02.118-3.el6_7.2.s390x

How reproducible:
Every time

Steps to Reproduce:
1. Run yum update to update the lvm2 package
2. Reboot the server

Actual results:
A large number of "stat failed" errors

Expected results:
No errors should be displayed
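To check by hand which paths would produce a "stat failed" message like the one quoted in the description, a small helper along these lines may be useful. It is only a sketch: `report_missing` is a hypothetical name, the message format merely imitates the console output above, and the helper does not parse /etc/lvm/cache/.cache itself, so the caller is assumed to supply the paths taken from that file.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: calls stat() on each path and reports the ones
 * that fail. Returns the number of paths for which stat() failed.
 */
size_t report_missing(const char *const *paths, size_t n)
{
    size_t missing = 0;

    for (size_t i = 0; i < n; i++) {
        struct stat st;

        if (stat(paths[i], &st) != 0) {
            /* errno is set by stat(); print it in readable form */
            fprintf(stderr, "%s: stat failed: %s\n",
                    paths[i], strerror(errno));
            missing++;
        }
    }
    return missing;
}
```

Called with the device paths listed in the .cache file, a non-zero return value indicates at least one stale entry of the kind that triggers this bug on affected lvm2 versions.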