# Bug 1240507 - bad .cache file prevents RHEV host from functioning after reboot
| Summary: | bad .cache file prevents RHEV host from functioning after reboot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Marcus West <mwest> |
| Component: | lvm2 | Assignee: | Peter Rajnoha <prajnoha> |
| lvm2 sub component: | Other (RHEL6) | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | agk, heinzm, jbrassow, jherrman, mjuricek, msnitzer, mwest, prajnoha, prockai, rbednar, tlavigne, zkabelac |
| Version: | 6.6 | Keywords: | ZStream |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.140-1.el6 | Doc Type: | Bug Fix |
| Doc Text: | Prior to this update, using the lvm utility when the persistent cache file was outdated caused devices that were stored in the persistent cache to unintentionally bypass logical volume manager (LVM) filters set in the LVM configuration. As a consequence, Red Hat Enterprise Virtualization hosts in some cases failed to start with an outdated cache file. This update fixes LVM's internal cache handling so that the filters are applied properly, and the described problem no longer occurs. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| | 1248030 1248032 (view as bug list) | Environment: | |
| Last Closed: | 2016-05-11 01:17:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1248030, 1248032, 1268411 | | |
| Attachments: | | | |
Marcus, I'll need more info, specifically output from the following (when the failure situation happens):

    lsblk
    ls -l -R /dev
    dmsetup info -c
    cat /etc/lvm/cache/.cache
    pvs -vvvv

Now, remove /etc/lvm/cache/.cache and again try to get the output from:

    pvs -vvvv

Thanks.

Might be the same cause as bug #1232054.

Created attachment 1051280 [details]
requested info from c#5
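For reference, a minimal shell sketch of gathering the diagnostics requested above in one pass. This is not from the original report; the output directory and file names are illustrative choices only:

```shell
# Hypothetical helper, not from the bug report: collect the requested
# diagnostics into one directory (the path is an arbitrary choice).
OUT=/tmp/bz1240507-diag
mkdir -p "$OUT"

lsblk                      > "$OUT/lsblk.txt"          2>&1
ls -l -R /dev              > "$OUT/dev-listing.txt"    2>&1
dmsetup info -c            > "$OUT/dmsetup-info.txt"   2>&1
cat /etc/lvm/cache/.cache  > "$OUT/lvm-cache.txt"      2>&1
pvs -vvvv                  > "$OUT/pvs-with-cache.txt" 2>&1

# Then remove the persistent cache and capture pvs again, as requested above.
rm -f /etc/lvm/cache/.cache
pvs -vvvv                  > "$OUT/pvs-after-cache-removal.txt" 2>&1
```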
(In reply to Marcus West from comment #9)
> Created attachment 1051280 [details]
> requested info from c#5

The broken .cache seems to be incomplete - it contains c0d0p2, but it contains neither the c0d0p1 nor the c0d0p3 device. However, it does contain the symlinks for those other c0d0* devices: /dev/disk/by-id/lvm-pv-uuid-Bq0C91-eVjH-ui2V-2ofH-qdwZ-wCBN-YC6G5w, which is a symlink for c0d0p3, and also /dev/disk/by-id/lvm-pv-uuid-myBYhX-vfM6-LYO8-AFdU-EXMe-uFcj-e2xMeE, which is a symlink for c0d1. I suspect the LVM scan was done in the middle of an operation where the /dev content for those devices was changing (possibly by udev).

Is the reproducer from comment #0 really all that is used to reproduce this? What does the .cache look like just before issuing the panic?

Marcus, how are you reproducing this exactly? Are you calling any extra LVM commands, or are you just rebooting the machine? I mean, we need to find the LVM command which produces the incorrect .cache file.

(In reply to Peter Rajnoha from comment #14)
> Marcus, how are you reproducing this exactly?
> Are you calling any extra LVM commands?

Isn't there anything running more LVM commands in parallel - what about the RHEV environment? We know that it uses "--config" to change configuration often - is this exactly that case? If yes, it shouldn't use the persistent cache (devices/write_cache_state=0).

Having a fresh look at this right now... One more question for now - when the customer reproduces the problem (after deleting any old /etc/lvm/cache/.cache file), is the incorrect .cache file generated during a usual system run, or does it appear after rebooting the machine? (I'm asking to make sure this is not caused by any init or shutdown script which may see an incomplete state of /dev while devices are initializing or shutting down.) How often does the invalid .cache file appear again after its removal?

...even if the device is removed just in the middle of scanning, where we'd have a situation in which the symlink is scanned but its device node is already removed, we should still end up with a correct .cache file. I tried to induce the error by using this scenario:

    $ ls -lR /dev/test
    /dev/test:
    total 0
    drwxr-xr-x. 2 root root    80 Jul 27 11:37 alinks
    brw-r--r--. 1 root root 8,  0 Jul 27 12:05 sda
    brw-r--r--. 1 root root 8, 16 Jul 27 11:35 sdb

    /dev/test/alinks:
    total 0
    lrwxrwxrwx. 1 root root 6 Jul 27 11:37 a -> ../sda
    lrwxrwxrwx. 1 root root 6 Jul 27 11:27 b -> ../sdb

The "alinks" subdir is added first to the cache, with symlinks pointing to sda. Then, when the target node sda is removed just in the middle of this /dev/test traversal, lvm will find /dev/test/sda missing. During scanning, each device name is confirmed - and this is the place where sda and all its symlinks get removed from the cache, so we end up with a correct .cache (not including the removed sda and all its symlinks).

So whatever is causing the error reported here, it doesn't seem to be caused by an LVM scan done just in the middle of the /dev content being changed. We need to find WHEN exactly the incorrect .cache appears, and then we can try adding more logging for the LVM commands which are executed during this exact period. But first, we need to identify the point at which the correct .cache changes into the incorrect one - we're not sure about that yet.

To sum up the error situation as reported here: for some devices, the .cache file contains symlinks, but not their targets (device nodes) - we don't know how this could happen yet.
This leads to a failure where LVM does not filter out devices properly, because it doesn't have a complete picture of the /dev content. When devices are not properly filtered, they're claimed by LVM before tgtd has a chance to claim them, hence causing the error as reported here.

Looking at the .cache file again, it contains devices allowed by the filter and also any DM device that exists in the system... Based on the logs already attached here, the proper layout is:

- c0d0p3 and c0d1 are used as backing devices for tgtd
- the devices exported by tgtd are then connected locally as sda and sdb
- sda is then a backing device for multipath device 1000500007
- sdb is then a backing device for multipath device 1000500008

...so the chain is: block dev --> tgtd --> iscsi attached device --> multipath --> LVM PVs.

When we call pvs with the state above, the /dev/disk/by-id/lvm-pv-uuid-* symlinks point to those multipath devices, and that is what gets recorded. When we're booting, we see only the "block dev", but since it contains the PV label, the /dev/disk/by-id/lvm-pv-uuid-* symlink is created based on the pure "block dev" appearance *before* any other initialization happens in the chain, so those symlinks now point to the base "block dev" instead of the multipath device as they did before... Then there's the vgchange -aay call which activates volumes at boot, and it sees the .cache file which contains the lvm-pv-uuid-* symlinks and all the not-yet-existing DM devices. OK, so finally I can probably see a path leading to this problem now. I'll try a few tests to prove this is exactly the case...

Great, Marcus, I think I don't need any of the logs from you now - I have finally managed to reproduce this locally with a simple set of commands! So it's caused by the combination of multipath (or any other DM mapping) over a block dev containing a PV signature. The fact that it's multipath is not that important here - any DM mapping over a block device that contains a PV signature would end up with the same result.

My reproducer:

- /dev/sda contains a PV header with UUID y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M
- lvm.conf: filter = [ "r|.*|" ]
- rm -f .cache (to start with a clean state)
- dmsetup create test --table "0 8388608 linear /dev/sda 0" (8388608 is just the size of the /dev/sda device I use in the reproducer - simply put, we have a DM mapping over the base device /dev/sda; this simulates the multipath)
- pvs (this creates a .cache file which contains "/dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M" as well as "/dev/mapper/test" and the target node "/dev/dm-2" - all the DM mappings get into the .cache file even though the filter is set to "ignore all"! This might be the actual source of the problem, but I need to check the code further here...)
- dmsetup remove test (this simulates the state at boot)
- pvs, and we get:

        $ pvs
          PV                                                                  VG Fmt  Attr PSize PFree
          /dev/disk/by-id/lvm-pv-uuid-y5PzRD-RBAv-7sBx-V3SP-vDmy-DeSq-GUh65M vg lvm2 a--  4.00g    0

Even though we have filter = [ "r|.*|" ] in the lvm.conf file!

(So the .cache file is not incomplete in the end; I hadn't realized the lvm-pv-uuid-* symlink was pointing to the DM mapping, which is the state before reboot, while the actual state after reboot is that lvm-pv-uuid-* points to the underlying device - so that was misleading...)

Reproducible with RHEL 6.6.z lvm2-2.02.111-2.el6_6.3, not reproducible with 6.7 lvm2-2.02.118-3.el6. So it seems it's already fixed, possibly as part of some other fix. Looking for the exact change/patch now...
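For convenience, the reproducer above collected into a single script-style sketch. It assumes the same preconditions Peter describes (a PV label already on /dev/sda and a reject-all filter in lvm.conf); the device size comes from his test machine and would differ elsewhere:

```shell
# Sketch of the reproducer from the comment above (assumptions: /dev/sda already
# carries a PV label, and /etc/lvm/lvm.conf contains: filter = [ "r|.*|" ]).

rm -f /etc/lvm/cache/.cache          # start with a clean persistent cache

# Put a DM mapping over the PV-bearing device to simulate multipath.
# 8388608 sectors matches Peter's test device; on another machine use the
# value reported by `blockdev --getsize /dev/sda`.
dmsetup create test --table "0 8388608 linear /dev/sda 0"

pvs                                  # writes a .cache that records the DM mapping and the
                                     # lvm-pv-uuid-* symlink despite the reject-all filter

dmsetup remove test                  # simulate the state at boot, before multipath exists

pvs                                  # bug: the PV is still reported via the stale
                                     # lvm-pv-uuid-* symlink, bypassing the filter
```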
lvm2 upstream commit 6c0e44d5a2e82aa160d48e83992e7ca342bc4bdf caused this regression ("Optimize double call of stat() for cached devices.").
...and "fixed" by upstream commit f1a000a477558e157532d5f2cd2f9c9139d4f87c ("toollib: Rewrite process_each_pv"), which I don't think actually fixes it, just hides the problem.
The problematic part as well as the exact patch has been identified. I can do 6.6.z if needed, but will need an ack...

Flagging for 6.6.z and 6.7.z (this should also go together with bug #1232054, which is also aimed for 6.6.z).

Verified. Reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1248030#c4:

    # rm -f /etc/lvm/cache/.cache
    # lvm dumpconfig /devices/filter
      Configuration node /devices/filter not found
    # pvcreate /dev/sda
      Physical volume "/dev/sda" successfully created
    # pvs -o name,uuid /dev/sda
      PV         PV UUID
      /dev/sda   jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd
    # vi /etc/lvm/lvm.conf
    # lvm dumpconfig devices/filter
      filter="r|.*|"
    # blockdev --getsize /dev/sda
    83886080
    # dmsetup create test --table "0 83886080 linear /dev/sda 0"
    # dmsetup info -c -o name,blkdevname test
    Name  BlkDevName
    test  dm-2
    # ls -l /dev/disk/by-id/lvm-pv-uuid-jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd
    lrwxrwxrwx. 1 root root 10 Feb 22 15:23 /dev/disk/by-id/lvm-pv-uuid-jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd -> ../../dm-2
    # pvs
    # dmsetup remove test
    # echo add > /sys/block/sda/uevent
    # ls -l /dev/disk/by-id/lvm-pv-uuid-jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd
    lrwxrwxrwx. 1 root root 9 Feb 22 15:24 /dev/disk/by-id/lvm-pv-uuid-jSNhlo-EcRU-l4I8-xMQX-Xkes-mSKm-qwbWRd -> ../../sda
    # pvs
    # echo $?
    0

Tested on:

    2.6.32-615.el6.x86_64
    lvm2-2.02.141-2.el6                              BUILT: Wed Feb 10 14:49:03 CET 2016
    lvm2-libs-2.02.141-2.el6                         BUILT: Wed Feb 10 14:49:03 CET 2016
    lvm2-cluster-2.02.141-2.el6                      BUILT: Wed Feb 10 14:49:03 CET 2016
    udev-147-2.71.el6                                BUILT: Wed Feb 10 14:07:17 CET 2016
    device-mapper-1.02.115-2.el6                     BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-libs-1.02.115-2.el6                BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-event-1.02.115-2.el6               BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-event-libs-1.02.115-2.el6          BUILT: Wed Feb 10 14:49:03 CET 2016
    device-mapper-persistent-data-0.6.2-0.1.rc1.el6  BUILT: Wed Feb 10 16:52:15 CET 2016
    cmirror-2.02.141-2.el6                           BUILT: Wed Feb 10 14:49:03 CET 2016

Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html
## Description of problem:

When a RHEV host has a bad .cache file (/etc/lvm/cache), tgtd cannot present disks as expected, which breaks the RHEV environment. For this setup, the customer is using iSCSI storage that lives on the host itself (self-hosted iSCSI on 127.0.0.1). This makes creating an LVM filter difficult, because the host does need to see the LVM data on the device, but not until the device is presented correctly (as an iSCSI device).

## Version-Release number of selected component (if applicable):

- RHEL 6.6
- kernel-2.6.32-504.16.2.el6.x86_64
- kpartx-0.4.9-80.el6_6.3.x86_64
- lvm2-2.02.111-2.el6_6.3.x86_64

## How reproducible:

Always

## Steps to Reproduce:

1. Panic the system (echo c > /proc/sysrq-trigger).
2. Check whether tgtd is presenting the two configured disks ('tgtadm --lld iscsi --mode target --op show').

To fix, either remove the .cache file and reboot, or copy in one that 'works'. I still don't fully understand the difference between the 'working' and 'bad' .cache file, and how the bad one is created in the first place.

## Actual results:

tgtadm doesn't display any disks:

    $ tgtadm --lld iscsi --mode target --op show
    ...
    LUN: 0
        Type: controller

(no LUN: 1 or LUN: 2)

pvs shows devices even though they are filtered out:

    $ pvs
      PV                                                                  VG                                   Fmt  Attr PSize   PFree
      /dev/cciss/c0d0p2                                                   myvg                                 lvm2 a--   62.50g       0
      /dev/disk/by-id/lvm-pv-uuid-Bq0C91-eVjH-ui2V-2ofH-qdwZ-wCBN-YC6G5w  e0ffa0fc-29ca-4d44-a9fb-bbe38724a5e5 lvm2 a--   73.38g  67.25g
      /dev/disk/by-id/lvm-pv-uuid-myBYhX-vfM6-LYO8-AFdU-EXMe-uFcj-e2xMeE  041f9edd-e28a-4228-b221-ef81e1ba2c6d lvm2 a--  136.38g 132.25g

    $ grep filter /etc/lvm/lvm.conf
    filter = [ "a|/dev/cciss/c0d0p2|", "a|/dev/mapper/36.*|", "r|.*|" ]

## Expected results:

tgtadm shows the expected disks (edited):

    # tgtadm --lld iscsi --mode target --op show
    ...
    LUN: 0
        Type: controller
    LUN: 1
        Backing store path: /dev/cciss/c0d0p3
    LUN: 2
        Backing store path: /dev/cciss/c0d1

pvs output looks as expected:

    $ pvs
      PV                VG   Fmt  Attr PSize  PFree
      /dev/cciss/c0d0p2 myvg lvm2 a--  62.50g     0

## Additional info:

I have a reproducer configured.
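As a closing note on the interim workaround mentioned above (remove the stale .cache file) and the devices/write_cache_state setting suggested earlier in the comments, here is a minimal hedged sketch. Whether disabling the persistent cache is appropriate for a particular RHEV deployment is not established in this report:

```shell
# Workaround sketch, not an official fix: drop the stale persistent cache so
# the next LVM command regenerates it, then re-check what LVM sees.
rm -f /etc/lvm/cache/.cache
pvs
grep filter /etc/lvm/lvm.conf

# Optionally, stop LVM from writing the persistent cache at all (the
# devices/write_cache_state=0 setting mentioned in the comments above) by
# putting this in /etc/lvm/lvm.conf:
#
#   devices {
#       write_cache_state = 0
#   }
```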