Bug 1290874 - --test cache|cachepool conversion appears to lead to corrupted locks on disk
Summary: --test cache|cachepool conversion appears to lead to corrupted locks on disk
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1313485 1213541 1295577 1364088
 
Reported: 2015-12-11 18:16 UTC by Corey Marthaler
Modified: 2016-11-04 04:13 UTC (History)

Fixed In Version: lvm2-2.02.156-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 04:13:30 UTC
Target Upstream Version:


Attachments
first lvconvert attempt (57.25 KB, text/plain)
2015-12-11 18:19 UTC, Corey Marthaler
second lvconvert attempt (49.83 KB, text/plain)
2015-12-11 18:19 UTC, Corey Marthaler


Links
System ID: Red Hat Product Errata RHBA-2016:1445
Priority: normal
Status: SHIPPED_LIVE
Summary: lvm2 bug fix and enhancement update
Last Updated: 2016-11-03 13:46:41 UTC

Description Corey Marthaler 2015-12-11 18:16:58 UTC
Description of problem:
If two test-mode cache or cache pool conversions are attempted one after the other, the locks on disk appear to get corrupted. I've tried the same operations without test mode and they work fine. A single test-mode conversion reports no errors; the errors first appear during the second attempt.

Easiest way to reproduce this:

lvcreate --activate ey -L 4G -n corigin cache_sanity @slow
lvcreate --activate ey -L 2G -n test_cache cache_sanity /dev/mapper/mpathh1
lvcreate --activate ey -L 8M -n test_cache_meta cache_sanity /dev/mapper/mpathh1

[root@mckinley-03 ~]# lvs -a -o +devices
  LV              VG               Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                 
  corigin         cache_sanity     -wi-a-----   4.00g                                                     /dev/mapper/mpatha1(0)  
  [lvmlock]       cache_sanity     -wi-ao---- 256.00m                                                     /dev/mapper/mpathh1(0)  
  test_cache      cache_sanity     -wi-a-----   2.00g                                                     /dev/mapper/mpathh1(64) 
  test_cache_meta cache_sanity     -wi-a-----   8.00m                                                     /dev/mapper/mpathh1(576)
  [lvmlock]       global           -wi-ao---- 256.00m                                                     /dev/mapper/mpathb1(0)  

[root@mckinley-03 ~]# lvconvert --yes --test --type cache-pool --poolmetadata cache_sanity/test_cache_meta cache_sanity/test_cache
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  WARNING: Converting logical volume cache_sanity/test_cache and cache_sanity/test_cache_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted cache_sanity/test_cache to cache pool.

[root@mckinley-03 ~]# lvs -a -o +devices
  LV              VG               Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                 
  corigin         cache_sanity     -wi-a-----   4.00g                                                     /dev/mapper/mpatha1(0)  
  [lvmlock]       cache_sanity     -wi-ao---- 256.00m                                                     /dev/mapper/mpathh1(0)  
  test_cache      cache_sanity     -wi-a-----   2.00g                                                     /dev/mapper/mpathh1(64) 
  test_cache_meta cache_sanity     -wi-a-----   8.00m                                                     /dev/mapper/mpathh1(576)
  [lvmlock]       global           -wi-ao---- 256.00m                                                     /dev/mapper/mpathb1(0)  

[root@mckinley-03 ~]# lvconvert --yes --test --type cache --cachepool cache_sanity/test_cache cache_sanity/corigin
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  WARNING: Converting logical volume cache_sanity/test_cache to pool's data volume.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  LV cache_sanity/test_cache lock failed: error -227
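
For scripted runs, the failure above can be recognised from the command output. The following is a minimal sketch that assumes the message format shown in this report; the function name has_missing_lock_error is hypothetical and not part of lvm2.

```shell
#!/bin/sh
# Hypothetical helper (not part of lvm2): succeeds if the text on stdin
# contains the "lock failed: error -227" message shown in this report.
has_missing_lock_error() {
    grep -qF -- 'lock failed: error -227'
}
```

It could be used as `lvconvert ... 2>&1 | has_missing_lock_error && echo "LV lock record is missing"`, with the lvconvert arguments taken from the steps above.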


Version-Release number of selected component (if applicable):
3.10.0-327.el7.x86_64

lvm2-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015


How reproducible:
Every time

Comment 1 Corey Marthaler 2015-12-11 18:19:16 UTC
Created attachment 1104755
first lvconvert attempt

Comment 2 Corey Marthaler 2015-12-11 18:19:49 UTC
Created attachment 1104756
second lvconvert attempt

Comment 3 David Teigland 2015-12-11 21:02:53 UTC
It's not as terrible as it seemed at first.  The lvmlockd steps in lvconvert are ignoring the '--test' option, so they are doing the normal steps of freeing the locks for the two original LVs being converted.  The lvconvert then quits before creating the actual cache pool, which leaves the two LVs without locks.  Any subsequent command that tries to use the lock on the LV will get the error -227 which indicates that no lock was found where it was expected.  I just need to add awareness of the --test option to short-circuit the lvmlockd actions.

I'm not sure how to clean up from this state at the moment.  (I once had special lock options to override specific locking calls, which would allow us to easily clean up and correct unexpected failures.)
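
The sequence described in this comment can be illustrated with a toy model. This is only a sketch in shell, not lvm2 or lvmlockd code: the LOCKS variable and all function names are hypothetical, and the "guarded" branch merely stands in for the --test awareness that the comment says is missing.

```shell
#!/bin/sh
# Toy model, not lvm2 code. LOCKS lists the LVs that currently have a
# lock record. All names below are hypothetical.
LOCKS="test_cache test_cache_meta corigin"

# Succeeds if the named LV still has a lock record.
has_lock() {
    case " $LOCKS " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Removes the lock record of the named LV.
free_lock() {
    new=""
    for lv in $LOCKS; do
        [ "$lv" = "$1" ] || new="$new $lv"
    done
    LOCKS="${new# }"
}

# test_convert GUARDED DATA_LV META_LV
# Models "lvconvert --test". GUARDED=1 skips the lock steps (the intended
# behaviour); GUARDED=0 frees both locks and then stops (the bug).
test_convert() {
    if [ "$1" = 1 ]; then
        return 0
    fi
    free_lock "$2"
    free_lock "$3"
}

# Models a later command trying to take the LV lock.
lock_lv() {
    if has_lock "$1"; then
        return 0
    fi
    echo "LV $1 lock failed: error -227"
    return 1
}
```

In this model, `test_convert 1 test_cache test_cache_meta` leaves LOCKS unchanged, while `test_convert 0 test_cache test_cache_meta` removes both records, so a following `lock_lv test_cache` fails with the -227 message.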

Comment 5 David Teigland 2016-01-18 22:56:31 UTC
Until I can find all the locations that need to check for test mode, I've disabled test mode in shared VGs.
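
On builds without this change, a script could apply the same policy itself before calling lvm. The sketch below is a hypothetical user-side guard, not part of lvm2. It assumes that sanlock and dlm are the shared lock types in use, and that the caller has already looked up the VG's lock type. The message text for dlm is assumed by analogy with the sanlock message.

```shell
#!/bin/sh
# Hypothetical user-side guard, not part of lvm2.

# Succeeds if any argument is exactly --test or -t.
# Combined short options (for example -ty) are not recognised.
has_test_flag() {
    for arg in "$@"; do
        case "$arg" in
            --test|-t) return 0 ;;
        esac
    done
    return 1
}

# test_mode_allowed LOCK_TYPE ARGS...
# Fails when ARGS request test mode and LOCK_TYPE is a shared lock type.
test_mode_allowed() {
    lock_type=$1
    shift
    has_test_flag "$@" || return 0
    case "$lock_type" in
        sanlock|dlm)
            echo "Test mode is not yet supported with lock type $lock_type" >&2
            return 1
            ;;
    esac
    return 0
}
```

The lock type could be obtained with something like `vgs --noheadings -o locktype cache_sanity | tr -d ' '`, assuming the locktype report field described in lvmlockd(8) is available in the installed version.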

Comment 6 David Teigland 2016-06-10 19:22:29 UTC
Disabled test mode in commit 48f270970fc526f9f0ac7d074639e8ed90346586.

Comment 8 Corey Marthaler 2016-06-17 19:03:55 UTC
Verified that the --test flag is now rejected for VGs with lock type sanlock in the latest rpms.


3.10.0-418.el7.x86_64
lvm2-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
lvm2-libs-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
lvm2-cluster-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-libs-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-event-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-event-libs-1.02.126-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
device-mapper-persistent-data-0.6.2-0.1.rc8.el7    BUILT: Wed May  4 02:56:34 CDT 2016
cmirror-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016
sanlock-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
sanlock-lib-3.3.0-1.el7    BUILT: Wed Feb 24 09:52:30 CST 2016
lvm2-lockd-2.02.156-1.el7    BUILT: Mon Jun 13 03:05:51 CDT 2016




SCENARIO - [test_cache_create]
Test cache pool volume creation and combining cache data and cache metadata volumes

*** Cache info for this scenario ***
*  origin (slow):  /dev/mapper/mpathc1
*  pool (fast):    /dev/mapper/mpatha1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --activate ey -L 4G -n corigin cache_sanity @slow

lvcreate --activate ey -L 2G -n test_cache cache_sanity /dev/mapper/mpatha1
lvcreate --activate ey -L 8M -n test_cache_meta cache_sanity /dev/mapper/mpatha1

1A. Test that cache pool volume creation works by combining the cache data and cache metadata (fast) volumes
lvconvert --yes --test --type cache-pool --poolmetadata cache_sanity/test_cache_meta cache_sanity/test_cache
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  Test mode is not yet supported with lock type sanlock
couldn't create combined cache pool volume

1B. Test that cache pool volume creation doesn't work by combining a nonexistent cache data and cache metadata (fast) volumes
lvconvert --yes --test --type cache-pool --poolmetadata cache_sanity/test_cache_meta cache_sanity/FAKE
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  Failed to find logical volume "cache_sanity/FAKE"

Check that no cache pool volume was actually created when attempted with '--test'

2A. Test the origin volume can be cached with lvm auto creating a cash pool out of cache_sanity/test_cache
lvconvert --yes --test --type cache --cachepool cache_sanity/test_cache cache_sanity/corigin
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  Test mode is not yet supported with lock type sanlock
couldn't create cache volume and required cache pool

2B. Test the origin volume can't be cached since we are attempting with a non existing cache pool
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  Test mode is not yet supported with lock type sanlock

Now actually create the cache pool by combining the cache data and cache metadata (fast) volumes
lvconvert --yes --type cache-pool --poolmetadata cache_sanity/test_cache_meta cache_sanity/test_cache
  WARNING: Converting logical volume cache_sanity/test_cache and cache_sanity/test_cache_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)

3. Test the origin volume can be cached now since we have a proper cache pool (fast) volumes now
lvconvert --yes --test --type cache --cachepool cache_sanity/test_cache cache_sanity/corigin
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  Test mode is not yet supported with lock type sanlock
couldn't create combined cached volume

Comment 11 errata-xmlrpc 2016-11-04 04:13:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html

