Bug 1111671

Summary: need a more graceful way to fail when attempting to remove cache pool during I/O
Product: Red Hat Enterprise Linux 7 Reporter: Corey Marthaler <cmarthal>
Component: lvm2 Assignee: Zdenek Kabelac <zkabelac>
lvm2 sub component: Cache Logical Volumes QA Contact: Cluster QE <mspqa-list>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: agk, heinzm, jbrassow, msnitzer, nperic, prajnoha, zkabelac
Version: 7.0   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.112-1.el7 Doc Type: Bug Fix
Doc Text:
The initial implementation of cache volume support failed to properly detach caching resources from the cached volume. This has been fixed by correctly ordering the removal operations.
Last Closed: 2015-03-05 13:09:13 UTC Type: Bug
Bug Blocks: 1119326    

Description Corey Marthaler 2014-06-20 17:00:40 UTC
Description of problem:

[root@host-001 ~]# lvs -a -o +devices
  LV                                Attr       LSize   Pool                      Origin          Devices
  corigin                           Cwi-a-C---   4.00g remove_during_origin_io_1 [corigin_corig] corigin_corig(0)
  [corigin_corig]                   -wi-ao----   4.00g                                           /dev/sdg1(0)
  [lvol0_pmspare]                   ewi-------   8.00m                                           /dev/sdf1(0)
  remove_during_origin_io_1         Cwi-a-C---   2.00g                                           remove_during_origin_io_1_cdata(0)
  [remove_during_origin_io_1_cdata] Cwi-aoC---   2.00g                                           /dev/sdb1(0)
  [remove_during_origin_io_1_cmeta] ewi-aoC---   8.00m                                           /dev/sdb1(512)

[root@host-001 ~]# dd if=/dev/zero of=/dev/cache_sanity/corigin bs=1M count=2000 &
[1] 21473

[root@host-001 ~]# lvremove -f /dev/cache_sanity/remove_during_origin_io_1
  Flushing cache for corigin
  device-mapper: resume ioctl on  failed: Invalid argument
  Unable to resume cache_sanity-corigin (253:2)
  libdevmapper exiting with 1 device(s) still suspended.

Jun 20 11:53:38 host-001 kernel: device-mapper: block manager: validator mismatch (old=index vs new=array) for block 74
Jun 20 11:53:38 host-001 kernel: device-mapper: cache: could not load cache mappings
Jun 20 11:53:38 host-001 kernel: device-mapper: table: 253:2: cache: preresume failed, error = -22
Jun 20 11:53:38 host-001 kernel: Buffer I/O error on device dm-3, logical block 524272
Jun 20 11:53:38 host-001 kernel: Buffer I/O error on device dm-3, logical block 524272

# Wait for I/O to finish

[root@host-001 ~]# lvremove -f /dev/cache_sanity/remove_during_origin_io_1
  Flushing cache for corigin
  Attempted to decrement suspended device counter below zero.
  0 blocks must still be flushed.
  Logical volume "remove_during_origin_io_1" successfully removed

[root@host-001 ~]# lvs -a -o +devices
  LV      Attr       LSize   Pool Origin Devices
  corigin -wi-a-----   4.00g             /dev/sdg1(0)
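
The second, successful sequence in the transcript above (let the writer finish, then remove the pool) could be scripted roughly as follows. This is only a sketch: the function name, the `LVREMOVE` override variable, and the optional size argument are hypothetical and introduced for illustration; the `dd` and `lvremove -f` invocations mirror the ones shown in this report.

```shell
#!/bin/sh
# Sketch only: write to the cached origin in the background, wait for the
# writer to exit, and only then remove the cache pool.
# LVREMOVE and remove_pool_after_io are hypothetical names, not part of lvm2.

LVREMOVE=${LVREMOVE:-lvremove}

remove_pool_after_io() {
    origin_dev=$1          # e.g. /dev/cache_sanity/corigin
    pool_lv=$2             # e.g. /dev/cache_sanity/remove_during_origin_io_1
    count_mb=${3:-2000}    # amount of data to write, in MiB (2000 in the report)

    # Same writer as in the report, started in the background.
    dd if=/dev/zero of="$origin_dev" bs=1M count="$count_mb" &
    writer_pid=$!

    # Block until the writer exits; give up if it failed.
    if ! wait "$writer_pid"; then
        echo "writer failed, not removing $pool_lv" >&2
        return 1
    fi

    # No I/O from this script is in flight any more, so remove the pool.
    "$LVREMOVE" -f "$pool_lv"
}
```

For example, `remove_pool_after_io /dev/cache_sanity/corigin /dev/cache_sanity/remove_during_origin_io_1` would reproduce the order of operations that succeeded above. It only waits for its own writer and does not detect I/O issued by other processes.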

Version-Release number of selected component (if applicable):
3.10.0-123.el7.x86_64
lvm2-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
lvm2-libs-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
lvm2-cluster-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-libs-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-event-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-event-libs-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-persistent-data-0.3.2-1.el7    BUILT: Thu Apr  3 09:58:51 CDT 2014
cmirror-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014

Comment 1 Jonathan Earl Brassow 2014-09-30 14:38:11 UTC
Fairly certain this is fixed now.  I checked this as part of looking over bug 1086426.  See https://bugzilla.redhat.com/show_bug.cgi?id=1086426#c3

Comment 3 Nenad Peric 2015-01-16 17:09:18 UTC
The scenario that covered this specific bug passes without errors, and no errors are logged in either dmesg or /var/log/messages.

I will mark this as verified based on 2 consecutive runs of SCENARIO - [remove_pool_during_origin_io]
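
A check of this kind could be automated with a small filter that looks for the kernel messages quoted in the description of this bug. The following is a minimal sketch, not the QE test itself; the function name is hypothetical, and the message strings are copied from the log excerpt in the description.

```shell
#!/bin/sh
# Sketch only: report whether log text on stdin contains any of the kernel
# messages quoted in the description. The function name is hypothetical.

has_cache_removal_errors() {
    # Exit status 0 means at least one signature was found.
    grep -F -q \
        -e 'device-mapper: block manager: validator mismatch' \
        -e 'device-mapper: cache: could not load cache mappings' \
        -e 'cache: preresume failed' \
        -e 'Buffer I/O error on device dm-'
}
```

Typical use would be `dmesg | has_cache_removal_errors && echo "cache removal errors logged"`, and likewise with `/var/log/messages` as input.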


VERIFIED with:

3.10.0-223.el7.x86_64

lvm2-2.02.114-5.el7    BUILT: Wed Jan 14 15:42:28 CET 2015
lvm2-libs-2.02.114-5.el7    BUILT: Wed Jan 14 15:42:28 CET 2015
lvm2-cluster-2.02.114-5.el7    BUILT: Wed Jan 14 15:42:28 CET 2015
device-mapper-1.02.92-5.el7    BUILT: Wed Jan 14 15:42:28 CET 2015
device-mapper-libs-1.02.92-5.el7    BUILT: Wed Jan 14 15:42:28 CET 2015
device-mapper-event-1.02.92-5.el7    BUILT: Wed Jan 14 15:42:28 CET 2015
device-mapper-event-libs-1.02.92-5.el7    BUILT: Wed Jan 14 15:42:28 CET 2015
device-mapper-persistent-data-0.4.1-2.el7    BUILT: Wed Nov 12 19:39:46 CET 2014
cmirror-2.02.114-5.el7    BUILT: Wed Jan 14 15:42:28 CET 2015

Comment 5 errata-xmlrpc 2015-03-05 13:09:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0513.html