# Bug 1240507 - bad .cache file prevents RHEV host from functioning after reboot
| Summary: | bad .cache file prevents RHEV host from functioning after reboot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Marcus West <mwest> |
| Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> |
| lvm2 sub component: | Other (RHEL6) | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | agk, heinzm, jbrassow, jherrman, mjuricek, msnitzer, mwest, prajnoha, prockai, rbednar, tlavigne, zkabelac |
| Version: | 6.6 | Keywords: | ZStream |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.140-1.el6 | Doc Type: | Bug Fix |
| Doc Text: | Prior to this update, using the lvm utility when the persistent cache file was outdated caused devices that were stored in the persistent cache to unintentionally bypass logical volume manager (LVM) filters set in the LVM configuration. As a consequence, Red Hat Enterprise Virtualization hosts in some cases failed to start with an outdated cache file. This update fixes LVM's internal cache handling so that the filters are applied properly, and the described problem no longer occurs. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| | 1248030 1248032 (view as bug list) | Environment: | |
| Last Closed: | 2016-05-11 01:17:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1248030, 1248032, 1268411 | | |
| Attachments: | | | |
Marcus, I'll need more info, specifically output from the following (when the failure situation happens):

    lsblk
    ls -l -R /dev
    dmsetup info -c
    cat /etc/lvm/cache/.cache
    pvs -vvvv

Now, remove /etc/lvm/cache/.cache and again try to get the output from:

    pvs -vvvv

Thanks.

Might be the same cause as bug #1232054.

Created attachment 1051280 [details]
requested info from c#5
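For reference, a minimal shell sketch of gathering the diagnostics requested above in one pass. This is not from the original report; the output directory and file names are illustrative choices only:

```shell
# Hypothetical helper, not from the bug report: collect the requested
# diagnostics into one directory (the path is an arbitrary choice).
OUT=/tmp/bz1240507-diag
mkdir -p "$OUT"

lsblk                      > "$OUT/lsblk.txt"          2>&1
ls -l -R /dev              > "$OUT/dev-listing.txt"    2>&1
dmsetup info -c            > "$OUT/dmsetup-info.txt"   2>&1
cat /etc/lvm/cache/.cache  > "$OUT/lvm-cache.txt"      2>&1
pvs -vvvv                  > "$OUT/pvs-with-cache.txt" 2>&1

# Then remove the persistent cache and capture pvs again, as requested above.
rm -f /etc/lvm/cache/.cache
pvs -vvvv                  > "$OUT/pvs-after-cache-removal.txt" 2>&1
```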
(In reply to Marcus West from comment #9)
> Created attachment 1051280 [details]
> requested info from c#5

The broken .cache seems to be incomplete - it contains c0d0p2, but it contains neither the c0d0p1 nor the c0d0p3 device. However, it does contain the symlinks for those other c0d0* devices: /dev/disk/by-id/lvm-pv-uuid-Bq0C91-eVjH-ui2V-2ofH-qdwZ-wCBN-YC6G5w, which is a symlink for c0d0p3, and also /dev/disk/by-id/lvm-pv-uuid-myBYhX-vfM6-LYO8-AFdU-EXMe-uFcj-e2xMeE, which is a symlink for c0d1. I suspect the LVM scan was done in the middle of an operation where the /dev content for those devices was changing (possibly by udev).

Is the reproducer from comment #0 really all that is used to reproduce this? What does the .cache look like just before issuing the panic?

Marcus, how are you reproducing this exactly? Are you calling any extra LVM commands, or are you just rebooting the machine? I mean, we need to find the LVM command which produces the incorrect .cache file.

(In reply to Peter Rajnoha from comment #14)
> Marcus, how are you reproducing this exactly?
> Are you calling any extra LVM commands?

Isn't there anything running more LVM commands in parallel - what about the RHEV environment? We know that it uses "--config" to change configuration often - is this exactly that case? If yes, it shouldn't use the persistent cache (devices/write_cache_state=0).

Having a fresh look at this right now... One more question for now - when the customer reproduces the problem (after deleting any old /etc/lvm/cache/.cache file), is the incorrect .cache file generated during a usual system run, or does it appear after rebooting the machine? (I'm asking to make sure this is not caused by any init or shutdown script which may see an incomplete state of /dev while devices are initializing or shutting down.) How often does the invalid .cache file appear again after its removal?

...even if the device is removed just in the middle of scanning, where we'd have a situation in which the symlink is scanned but its device node is already removed, we should still end up with a correct .cache file. I tried to induce the error by using this scenario:

    $ ls -lR /dev/test
    /dev/test:
    total 0
    drwxr-xr-x. 2 root root    80 Jul 27 11:37 alinks
    brw-r--r--. 1 root root 8,  0 Jul 27 12:05 sda
    brw-r--r--. 1 root root 8, 16 Jul 27 11:35 sdb

    /dev/test/alinks:
    total 0
    lrwxrwxrwx. 1 root root 6 Jul 27 11:37 a -> ../sda
    lrwxrwxrwx. 1 root root 6 Jul 27 11:27 b -> ../sdb

The "alinks" subdir is added first to the cache, with symlinks pointing to sda. Then, when the target node sda is removed just in the middle of this /dev/test traversal, lvm will find /dev/test/sda missing. During scanning, each device name is confirmed - and this is the place where sda and all its symlinks get removed from the cache, so we end up with a correct .cache (not including the removed sda and all its symlinks).

So whatever is causing the error reported here, it doesn't seem to be caused by an LVM scan done just in the middle of the /dev content being changed. We need to find WHEN exactly the incorrect .cache appears, and then we can try adding more logging for the LVM commands which are executed during this exact period. But first, we need to identify the point at which the correct .cache changes into the incorrect one - we're not sure about that yet.

To sum up the error situation as reported here: for some devices, the .cache file contains symlinks, but not their targets (device nodes) - we don't know how this could happen yet.
This leads to a failure where LVM does not filter out devices properly, because it doesn't have a complete picture of the /dev content. When devices are not properly filtered, they're claimed by LVM before tgtd has a chance to claim them, hence causing the error as reported here.

Looking at the .cache file again, it contains devices allowed by the filter and also any DM device that exists in the system... Based on the logs already attached here, the proper layout is:

- c0d0p3 and c0d1 are used as backing devices for tgtd
- the devices exported by tgtd are then connected locally as sda and sdb
- sda is then a backing device for multipath device 1000500007
- sdb is then a backing device for multipath device 1000500008

...so the chain is: block dev --> tgtd --> iscsi attached device --> multipath --> LVM PVs.

When we call pvs with the state above, the /dev/disk/by-id/lvm-pv-uuid-* symlinks point to those multipath devices, and that is what gets recorded. When we're booting, we see only the "block dev", but since it contains the PV label, the /dev/disk/by-id/lvm-pv-uuid-* symlink is created based on the pure "block dev" appearance *before* any other initialization happens in the chain, so those symlinks now point to the base "block dev" instead of the multipath device as they did before... Then there's the vgchange -aay call which activates volumes at boot, and it sees the .cache file which contains the lvm-pv-uuid-* symlinks and all the not-yet-existing DM devices. OK, so finally I can probably see a path leading to this problem now. I'll try a few tests to prove this is exactly the case...

Great, Marcus, I think I don't need any of the logs from you now - I have finally managed to reproduce this locally with a simple set of commands! So it's caused by the combination of multipath (or any other DM mapping) over a block dev containing a PV signature. The fact that it's multipath is not that important here - any DM mapping over a block device that contains a PV signature would end up with the same result.

My reproducer:

- /dev/sda contains a PV header with UUID y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M
- lvm.conf: filter = [ "r|.*|" ]
- rm -f .cache (to start with a clean state)
- dmsetup create test --table "0 8388608 linear /dev/sda 0" (8388608 is just the size of the /dev/sda device I use in the reproducer - simply put, we have a DM mapping over the base device /dev/sda; this simulates the multipath)
- pvs (this creates a .cache file which contains "/dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M" as well as "/dev/mapper/test" and the target node "/dev/dm-2" - all the DM mappings get into the .cache file even though the filter is set to "ignore all"! This might be the actual source of the problem, but I need to check the code further here...)
- dmsetup remove test (this simulates the state at boot)
- pvs, and we get:

        $ pvs
          PV                                                                  VG Fmt  Attr PSize PFree
          /dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M vg lvm2 a--  4.00g    0

Even though we have filter = [ "r|.*|" ] in the lvm.conf file!

(So the .cache file is not incomplete in the end; I hadn't realized the lvm-pv-uuid-* symlink was pointing to the DM mapping, which is the state before reboot, while the actual state after reboot is that lvm-pv-uuid-* points to the underlying device - so that was misleading...)

Reproducible with RHEL 6.6.z lvm2-2.02.111-2.el6_6.3, not reproducible with 6.7 lvm2-2.02.118-3.el6. So it seems it's already fixed, possibly as part of some other fix. Looking for the exact change/patch now...
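For convenience, the reproducer above collected into a single script-style sketch. It assumes the same preconditions Peter describes (a PV label already on /dev/sda and a reject-all filter in lvm.conf); the device size comes from his test machine and would differ elsewhere:

```shell
# Sketch of the reproducer from the comment above (assumptions: /dev/sda already
# carries a PV label, and /etc/lvm/lvm.conf contains: filter = [ "r|.*|" ]).

rm -f /etc/lvm/cache/.cache          # start with a clean persistent cache

# Put a DM mapping over the PV-bearing device to simulate multipath.
# 8388608 sectors matches Peter's test device; on another machine use the
# value reported by `blockdev --getsize /dev/sda`.
dmsetup create test --table "0 8388608 linear /dev/sda 0"

pvs                                  # writes a .cache that records the DM mapping and the
                                     # lvm-pv-uuid-* symlink despite the reject-all filter

dmsetup remove test                  # simulate the state at boot, before multipath exists

pvs                                  # bug: the PV is still reported via the stale
                                     # lvm-pv-uuid-* symlink, bypassing the filter
```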
lvm2 upstream commit 6c0e44d5a2e82aa160d48e83992e7ca342bc4bdf caused this regression ("Optimize double call of stat() for cached devices.").
...and "fixed" by upstream commit f1a000a477558e157532d5f2cd2f9c9139d4f87c ("toollib: Rewrite process_each_pv"), which I don't think actually fixes it, just hides the problem.
The problematic part as well as the exact patch has been identified. I can do 6.6.z if needed, but will need an ack...

Flagging for 6.6.z and 6.7.z (this should also go together with bug #1232054, which is also aimed for 6.6.z).

Verified. Reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1248030#c4:

    # rm -f /etc/lvm/cache/.cache
    # lvm dumpconfig /devices/filter
      Configuration node /devices/filter not found
    # pvcreate /dev/sda
      Physical volume "/dev/sda" successfully created
    # pvs -o name,uuid /dev/sda
      PV         PV UUID
      /dev/sda   jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd
    # vi /etc/lvm/lvm.conf
    # lvm dumpconfig devices/filter
      filter="r|.*|"
    # blockdev --getsize /dev/sda
    83886080
    # dmsetup create test --table "0 83886080 linear /dev/sda 0"
    # dmsetup info -c -o name,blkdevname test
    Name  BlkDevName
    test  dm-2
    # ls -l /dev/disk/by-id/lvm-pv-uuid-jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd
    lrwxrwxrwx. 1 root root 10 Feb 22 15:23 /dev/disk/by-id/lvm-pv-uuid-jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd -> ../../dm-2
    # pvs
    # dmsetup remove test
    # echo add > /sys/block/sda/uevent
    # ls -l /dev/disk/by-id/lvm-pv-uuid-jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd
    lrwxrwxrwx. 1 root root 9 Feb 22 15:24 /dev/disk/by-id/lvm-pv-uuid-jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd -> ../../sda
    # pvs
    # echo $?
    0

Tested on:

    2.6.32-615.el6.x86_64
    lvm2-2.02.141-2.el6                              BUILT: Wed Feb 10 14:49:03 CET 2016
    lvm2-libs-2.02.141-2.el6                         BUILT: Wed Feb 10 14:49:03 CET 2016
    lvm2-cluster-2.02.141-2.el6                      BUILT: Wed Feb 10 14:49:03 CET 2016
    udev-147-2.71.el6                                BUILT: Wed Feb 10 14:07:17 CET 2016
    device-mapper-1.02.115-2.el6                     BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-libs-1.02.115-2.el6                BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-event-1.02.115-2.el6               BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-event-libs-1.02.115-2.el6          BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-persistent-data-0.6.2-0.1.rc1.el6  BUILT: Wed Feb 10 16:52:15 CET 2016
    cmirror-2.02.141-2.el6                           BUILT: Wed Feb 10 14:49:03 CET 2016

Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html
## Description of problem:

When a RHEV host has a bad .cache file (/etc/lvm/cache), tgtd cannot present disks as expected, which breaks the RHEV environment. For this setup, the customer is using iSCSI storage that lives on the host itself (self-hosted iSCSI on 127.0.0.1). This makes creating an LVM filter difficult, because the host does need to see the LVM data on the device, but not until the device is presented correctly (as an iSCSI device).

## Version-Release number of selected component (if applicable):

- RHEL 6.6
- kernel-2.6.32-504.16.2.el6.x86_64
- kpartx-0.4.9-80.el6_6.3.x86_64
- lvm2-2.02.111-2.el6_6.3.x86_64

## How reproducible:

Always

## Steps to Reproduce:

1. Panic the system (echo c > /proc/sysrq-trigger).
2. Check whether tgtd is presenting the two configured disks ('tgtadm --lld iscsi --mode target --op show').

To fix, either remove the .cache file and reboot, or copy in one that 'works'. I still don't fully understand the difference between the 'working' and 'bad' .cache file, and how the bad one is created in the first place.

## Actual results:

tgtadm doesn't display any disks:

    $ tgtadm --lld iscsi --mode target --op show
    ...
    LUN: 0
        Type: controller

(no LUN: 1 or LUN: 2)

pvs shows devices even though they are filtered out:

    $ pvs
      PV                                                                  VG                                   Fmt  Attr PSize   PFree
      /dev/cciss/c0d0p2                                                   myvg                                 lvm2 a--   62.50g       0
      /dev/disk/by-id/lvm-pv-uuid-Bq0C91-eVjH-ui2V-2ofH-qdwZ-wCBN-YC6G5w  e0ffa0fc-29ca-4d44-a9fb-bbe38724a5e5 lvm2 a--   73.38g  67.25g
      /dev/disk/by-id/lvm-pv-uuid-myBYhX-vfM6-LYO8-AFdU-EXMe-uFcj-e2xMeE  041f9edd-e28a-4228-b221-ef81e1ba2c6d lvm2 a--  136.38g 132.25g

    $ grep filter /etc/lvm/lvm.conf
    filter = [ "a|/dev/cciss/c0d0p2|", "a|/dev/mapper/36.*|", "r|.*|" ]

## Expected results:

tgtadm shows the expected disks (edited):

    # tgtadm --lld iscsi --mode target --op show
    ...
    LUN: 0
        Type: controller
    LUN: 1
        Backing store path: /dev/cciss/c0d0p3
    LUN: 2
        Backing store path: /dev/cciss/c0d1

pvs output looks as expected:

    $ pvs
      PV                VG   Fmt  Attr PSize  PFree
      /dev/cciss/c0d0p2 myvg lvm2 a--  62.50g     0

## Additional info:

I have a reproducer configured.
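As a closing note on the interim workaround mentioned above (remove the stale .cache file) and the devices/write_cache_state setting suggested earlier in the comments, here is a minimal hedged sketch. Whether disabling the persistent cache is appropriate for a particular RHEV deployment is not established in this report:

```shell
# Workaround sketch, not an official fix: drop the stale persistent cache so
# the next LVM command regenerates it, then re-check what LVM sees.
rm -f /etc/lvm/cache/.cache
pvs
grep filter /etc/lvm/lvm.conf

# Optionally, stop LVM from writing the persistent cache at all (the
# devices/write_cache_state=0 setting mentioned in the comments above) by
# putting this in /etc/lvm/lvm.conf:
#
#   devices {
#       write_cache_state = 0
#   }
```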