Bug 1189051 - LVM Cache: Add failure modes for NEEDSCHECK flag
Summary: LVM Cache: Add failure modes for NEEDSCHECK flag
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Zdenek Kabelac
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1189058
Blocks: 1186924
 
Reported: 2015-02-04 11:12 UTC by Jonathan Earl Brassow
Modified: 2021-09-03 12:54 UTC
CC List: 7 users

Fixed In Version: lvm2-2.02.125-1.el7
Doc Type: Enhancement
Doc Text:
Feature: Repair of cached volumes that have the NEEDS_CHECK flag set. Reason: The kernel cache target can set the NEEDS_CHECK flag, so lvm2 needs to detect it and perform the necessary repair operation, just as it already does for thin provisioning. Result: lvm2 now checks for and repairs this flag before any activation of a cached LV.
Clone Of:
: 1189058 (view as bug list)
Environment:
Last Closed: 2015-11-19 12:46:02 UTC
Target Upstream Version:
Embargoed:


Attachments
kernel dump during the vgchange deadlock (263.01 KB, text/plain)
2015-10-15 16:54 UTC, Corey Marthaler
verbose output of lvs cmd after loaded error target (39.83 KB, text/plain)
2015-10-22 18:45 UTC, Corey Marthaler
verbose output of deadlocked lvchange cmd after loaded error target (51.34 KB, text/plain)
2015-10-22 18:47 UTC, Corey Marthaler


Links
Red Hat Product Errata RHBA-2015:2147 (priority: normal, status: SHIPPED_LIVE): lvm2 bug fix and enhancement update, last updated 2015-11-19 11:11:07 UTC

Description Jonathan Earl Brassow 2015-02-04 11:12:35 UTC
The cache target will be producing a NEEDSCHECK flag. LVM cache needs to look for this and perform the necessary repair operation, similar to what is already done for thin provisioning (thin-p).
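
For reference, a minimal sketch of what that check hook amounts to, modeled on the cache_check invocation lvm2 logs later in this bug (comment 13) and on the error messages seen in the comments; this is an editorial illustration, not an exact transcript of the lvm2 code:

# Before (de)activating a cached LV, lvm2 runs the metadata checker and lets it
# clear the needs_check flag when the metadata is otherwise sound:
/usr/sbin/cache_check -q --clear-needs-check-flag /dev/mapper/<vg>-<pool>_cmeta
# A non-zero exit is treated as a failed check and activation stops with
# "Check of pool <vg>/<pool> failed (status:N). Manual repair required!"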

Comment 4 Zdenek Kabelac 2015-07-08 14:29:14 UTC
Support went upstream with this patch:

https://www.redhat.com/archives/lvm-devel/2015-July/msg00033.html

Comment 6 Corey Marthaler 2015-09-16 16:59:30 UTC
I can't seem to "corrupt" the metadata area in a way that causes a needs_check flag to appear. How exactly is this supposed to work?


# small write far away from the superblock: nothing is detected
lvcreate -L 4G -n corigin cache_sanity /dev/sde1

lvcreate -L 2G -n POOL cache_sanity /dev/sdc1
lvcreate -L 12M -n POOL_meta cache_sanity /dev/sdc1

lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writeback -c 64 --poolmetadata cache_sanity/POOL_meta cache_sanity/POOL
  WARNING: Converting logical volume cache_sanity/POOL and cache_sanity/POOL_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)

lvconvert --yes --type cache --cachepool cache_sanity/POOL cache_sanity/corigin

[root@host-109 ~]# lvs -a -o +devices
  LV              Attr       LSize  Pool   Origin          Data%  Meta% Cpy%Sync Devices
  [POOL]          Cwi---C---  2.00g                        0.00   2.31  100.00   POOL_cdata(0)
  [POOL_cdata]    Cwi-ao----  2.00g                                              /dev/sdc1(0)
  [POOL_cmeta]    ewi-ao---- 12.00m                                              /dev/sdc1(512)
  corigin         Cwi-a-C---  4.00g [POOL] [corigin_corig] 0.00   2.31  100.00   corigin_corig(0)
  [corigin_corig] owi-aoC---  4.00g                                              /dev/sde1(0)
  [lvol0_pmspare] ewi-------  2.00m                                              /dev/sdc1(515)

[root@host-109 ~]# dd if=/dev/urandom of=/dev/mapper/cache_sanity-POOL_cmeta bs=1 count=1 seek=8192 
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.00102703 s, 1.0 kB/s

[root@host-109 ~]# lvchange -an cache_sanity/corigin
[root@host-109 ~]# lvchange -ay cache_sanity/corigin
[root@host-109 ~]# dmsetup status
cache_sanity-corigin: 0 8388608 cache 8 72/3072 128 0/32768 0 642 0 5752 0 0 0 1 writeback 2 migration_threshold 2048 cleaner 0 rw - 
cache_sanity-corigin_corig: 0 8388608 linear 
cache_sanity-POOL_cdata: 0 4194304 linear 
cache_sanity-POOL_cmeta: 0 24576 linear 



# larger write far away from the superblock: nothing is detected
(same exact setup as above)

[root@host-109 ~]# lvs -a -o +devices
  LV              Attr       LSize  Pool   Origin          Data%  Meta%  Cpy%Sync Devices
  [POOL]          Cwi---C---  2.00g                        0.00   2.31   100.00   POOL_cdata(0)
  [POOL_cdata]    Cwi-ao----  2.00g                                               /dev/sde1(0)
  [POOL_cmeta]    ewi-ao---- 12.00m                                               /dev/sde1(512)
  corigin         Cwi-a-C---  4.00g [POOL] [corigin_corig] 0.00   2.31   100.00   corigin_corig(0)
  [corigin_corig] owi-aoC---  4.00g                                               /dev/sda1(0)
  [lvol0_pmspare] ewi------- 12.00m                                               /dev/sdc1(0)

[root@host-109 ~]# dd if=/dev/urandom of=/dev/mapper/cache_sanity-POOL_cmeta bs=1 count=512 seek=4096
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00284691 s, 180 kB/s

[root@host-109 ~]# lvchange -an cache_sanity/corigin
[root@host-109 ~]# lvchange -ay cache_sanity/corigin
[root@host-109 ~]# dmsetup status
cache_sanity-corigin: 0 8388608 cache 8 72/3072 128 0/32768 0 644 0 5743 0 0 0 1 writethrough 2 migration_threshold 2048 cleaner 0 rw - 
cache_sanity-corigin_corig: 0 8388608 linear 
cache_sanity-POOL_cdata: 0 4194304 linear 
cache_sanity-POOL_cmeta: 0 24576 linear 



# small write near the superblock: cache is now corrupt, manual repair required
(same exact setup as above)

[root@host-109 ~]# lvs -a -o +devices
  LV              Attr       LSize  Pool   Origin          Data%  Meta%  Cpy%Sync Devices
  [POOL]          Cwi---C---  2.00g                        0.00   4.39   100.00   POOL_cdata(0)
  [POOL_cdata]    Cwi-ao----  2.00g                                               /dev/sdc1(0)
  [POOL_cmeta]    ewi-ao---- 12.00m                                               /dev/sdc1(512)
  corigin         Cwi-a-C---  4.00g [POOL] [corigin_corig] 0.00   4.39   100.00   corigin_corig(0)
  [corigin_corig] owi-aoC---  4.00g                                               /dev/sda1(0)
  [lvol0_pmspare] ewi------- 12.00m                                               /dev/sdc1(515)

[root@host-109 ~]# dd if=/dev/urandom of=/dev/mapper/cache_sanity-POOL_cmeta bs=1 count=1 seek=2
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.000996451 s, 1.0 kB/s

[root@host-109 ~]# lvchange -an cache_sanity/corigin
  WARNING: Integrity check of metadata for pool cache_sanity/POOL failed.
[root@host-109 ~]# lvchange -ay cache_sanity/corigin
  Check of pool cache_sanity/POOL failed (status:1). Manual repair required!
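
For completeness, one possible shape of that "manual repair" using the device-mapper-persistent-data tools (a sketch only; it assumes the metadata sub-LV is accessible as a device node and that a spare LV at least as large as the metadata exists, and the swap-back step in particular is an assumption, not taken from this bug):

cache_check /dev/mapper/cache_sanity-POOL_cmeta          # confirm what the checker objects to
cache_repair -i /dev/mapper/cache_sanity-POOL_cmeta \
             -o /dev/cache_sanity/repaired_meta          # hypothetical spare LV for the repaired metadata
# then swap the repaired metadata back under the pool (e.g. via lvconvert with
# --poolmetadata) before re-activating cache_sanity/corigin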



3.10.0-313.el7.x86_64
lvm2-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-libs-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-cluster-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015

Comment 8 Corey Marthaler 2015-10-15 16:54:49 UTC
Created attachment 1083314 [details]
kernel dump during the vgchange deadlock

Comment 9 Corey Marthaler 2015-10-19 16:22:53 UTC
Moving this back to ASSIGNED for now. The "needs_check" flag does show up when an error target is loaded under the meta device; however, there currently doesn't appear to be a "failure mode" that performs the "necessary repair operation".

[root@host-082 ~]# lvs -a -o +devices
  LV              VG           Attr       LSize   Pool   Origin         Data%  Meta% Cpy%Sync Devices
  [lvol0_pmspare] cache_sanity ewi-------  12.00m                                             /dev/sda1(1024)
  origin          cache_sanity Cwi-a-C---   4.00g [pool] [origin_corig] 0.00   8.66  100.00   origin_corig(0)
  [origin_corig]  cache_sanity owi-aoC---   4.00g                                             /dev/sda1(0)
  [pool]          cache_sanity Cwi---C---   4.00g                       0.00   8.66  100.00   pool_cdata(0)
  [pool_cdata]    cache_sanity Cwi-ao----   4.00g                                             /dev/sdb1(0)
  [pool_cmeta]    cache_sanity ewi-ao----  12.00m                                             /dev/sdc1(0)

[root@host-082 ~]# dmsetup table
cache_sanity-origin: 0 8388608 cache 253:4 253:3 253:5 64 1 writethrough cleaner 0
rhel_host--082-swap: 0 1679360 linear 252:2 2048
rhel_host--082-root: 0 13983744 linear 252:2 1681408
cache_sanity-origin_corig: 0 8388608 linear 8:1 2048
cache_sanity-pool_cdata: 0 8388608 linear 8:17 2048
cache_sanity-pool_cmeta: 0 24576 linear 8:33 2048

[root@host-082 ~]# dmsetup suspend /dev/mapper/cache_sanity-origin
[root@host-082 ~]# dmsetup suspend /dev/mapper/cache_sanity-pool_cmeta
[root@host-082 ~]# dmsetup load cache_sanity-pool_cmeta  --table " 0 24576 error 8:33 2048"
[root@host-082 ~]# dmsetup resume /dev/mapper/cache_sanity-origin
[root@host-082 ~]# dmsetup resume /dev/mapper/cache_sanity-pool_cmeta
[root@host-082 ~]# lvs -a -o +devices
  Failed to parse cache params: Fail
  Failed to parse cache params: Fail
  Failed to parse cache params: Fail
  Failed to parse cache params: Fail
  Failed to parse cache params: Fail
  Failed to parse cache params: Fail
  LV              VG           Attr       LSize   Pool   Origin         Data%  Meta% Cpy%Sync Devices
  [lvol0_pmspare] cache_sanity ewi-------  12.00m                                             /dev/sda1(1024)
  origin          cache_sanity Cwi-a-C---   4.00g [pool] [origin_corig]                       origin_corig(0)
  [origin_corig]  cache_sanity owi-aoC---   4.00g                                             /dev/sda1(0)
  [pool]          cache_sanity Cwi---C---   4.00g                                             pool_cdata(0)
  [pool_cdata]    cache_sanity Cwi-ao----   4.00g                                             /dev/sdb1(0)
  [pool_cmeta]    cache_sanity ewi-ao----  12.00m                                             /dev/sdc1(0)

Oct 19 11:13:20 host-082 kernel: device-mapper: cache cleaner: version 1.0.0 loaded
Oct 19 11:15:13 host-082 kernel: device-mapper: cache: 253:2: metadata operation 'dm_cache_commit' failed: error = -5
Oct 19 11:15:13 host-082 kernel: device-mapper: cache: 253:2: aborting current metadata transaction
Oct 19 11:15:13 host-082 kernel: device-mapper: cache: 253:2: failed to abort metadata transaction
Oct 19 11:15:13 host-082 kernel: device-mapper: cache: 253:2: switching cache to fail mode

[root@host-082 ~]# vgchange -an cache_sanity
  0 logical volume(s) in volume group "cache_sanity" now active
[root@host-082 ~]# vgchange -ay cache_sanity
[deadlock]
^C  wait4 child process 2701 failed: Interrupted system call
  Check of pool cache_sanity/pool failed (status:-1). Manual repair required!
  Interrupted...
  0 logical volume(s) in volume group "cache_sanity" now active

[root@host-082 ~]# vgchange -ay cache_sanity
[deadlock again]
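
For reference, one way a hang like this can be captured for analysis (presumably how the kernel dump attached in comment 8 was produced, though that is an assumption):

echo t > /proc/sysrq-trigger            # dump all kernel task stacks to the kernel log (needs sysrq enabled)
dmesg | grep -B 2 -A 25 vgchange        # pull out the blocked vgchange call chain
cat /proc/$(pidof vgchange)/stack       # or inspect the stuck process's kernel stack directly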

Comment 10 Corey Marthaler 2015-10-19 16:24:08 UTC
comment #9 was run with the following rpms:

3.10.0-325.el7.x86_64
lvm2-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015

Comment 11 Zdenek Kabelac 2015-10-22 14:35:31 UTC
Failed to parse cache params: Fail

The 'lvm2' command currently does not parse this status properly; it needs to be enhanced together with proper 'status' reporting.
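
As an illustration of what that parsing needs to cope with (a sketch based only on the dmsetup status lines shown in this bug, not on the lvm2 code): the last two fields of a healthy dm-cache status line are the metadata mode ('rw'/'ro') and either '-' or 'needs_check', while a failed cache reports just 'Fail':

dmsetup status cache_sanity-origin | awk '
    $4 == "Fail" { print "cache is in fail mode"; next }
    { print "metadata mode: " $(NF-1) ", needs_check: " ($NF == "needs_check" ? "yes" : "no") }'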


This message looks like some not-easily-repairable damage has happened to the cache?

Check of pool cache_sanity/pool failed (status:-1). Manual repair required!

Probably for Joe.

Comment 12 Corey Marthaler 2015-10-22 18:45:40 UTC
Created attachment 1085641 [details]
verbose output of lvs cmd after loaded error target

Comment 13 Corey Marthaler 2015-10-22 18:47:11 UTC
Created attachment 1085642 [details]
verbose output of deadlocked lvchange cmd after loaded error target

#misc/lvm-exec.c:71     Executing: /usr/sbin/cache_check -q --clear-needs-check-flag /dev/mapper/cache_sanity-pool_cmeta
#misc/lvm-flock.c:38         _drop_shared_flock /run/lock/lvm/V_cache_sanity.
#misc/lvm-flock.c:38         _drop_shared_flock /run/lock/lvm/A_RQ56VOoYlp9vrBCJUHm0Sq4VPsz8tHZvckfLlVIW5yorEnN9ViNu4Kz5iG0kn9f.
#mm/memlock.c:629         memlock reset.

Comment 14 Corey Marthaler 2015-10-22 18:58:14 UTC
[root@host-082 ~]# strace /usr/sbin/cache_check -q --clear-needs-check-flag /dev/mapper/cache_sanity-pool_cmeta
execve("/usr/sbin/cache_check", ["/usr/sbin/cache_check", "-q", "--clear-needs-check-flag", "/dev/mapper/cache_sanity-pool_cm"...], [/* 38 vars */]) = 0
brk(0)                                  = 0xaf2000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdc58e32000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=43496, ...}) = 0
mmap(NULL, 43496, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fdc58e27000
close(3) 

[...]

munmap(0x7fdc58e27000, 43496)           = 0
brk(0)                                  = 0xaf2000
brk(0xb13000)                           = 0xb13000
brk(0)                                  = 0xb13000
stat("/dev/mapper/cache_sanity-pool_cmeta", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 4), ...}) = 0
stat("/dev/mapper/cache_sanity-pool_cmeta", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 4), ...}) = 0
open("/dev/mapper/cache_sanity-pool_cmeta", O_RDONLY) = 3
ioctl(3, BLKGETSIZE64, 12582912)        = 0
close(3)                                = 0
stat("/dev/mapper/cache_sanity-pool_cmeta", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 4), ...}) = 0
open("/dev/mapper/cache_sanity-pool_cmeta", O_RDONLY|O_EXCL|O_DIRECT) = -1 EBUSY (Device or resource busy)
exit_group(1)                           = ?
+++ exited with 1 +++
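
The EBUSY on the O_RDONLY|O_EXCL|O_DIRECT open above means something still holds the metadata device; two stock ways to see the holder (an aside, not part of the original triage):

dmsetup info -c -o name,open cache_sanity-pool_cmeta    # an open count > 0 means the node is still in use
ls /sys/dev/block/253:4/holders/                        # 253:4 from the strace; lists the dm devices stacked on top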

Comment 15 Zdenek Kabelac 2015-10-23 09:29:11 UTC
Yep - here is a simple 'reproducer' for the cache_check 'busy-loop':


dmsetup create --table '0 2000 error'  errdev

cache_check  /dev/mapper/errdev

examining superblock
  superblock is corrupt
    incomplete io for block 0, e.res = 18446744073709551611, e.res2 = 0, offset = 0, nbytes = 4096


--- very slooooowly going here  ---

(gdb) bt
#0  0x00007f6cc456c644 in __io_getevents_0_4 (ctx=0x7f6cc4c9d000, min_nr=1, nr=3949, events=0x555ce9562820, timeout=0x0)
    at io_getevents.c:25
#1  0x00007f6cc456c67d in io_getevents_0_4 (ctx=<optimized out>, min_nr=min_nr@entry=1, nr=<optimized out>, 
    events=<optimized out>, timeout=timeout@entry=0x0) at io_getevents.c:54
#2  0x0000555ce7e3a7f0 in bcache::block_cache::wait_io (this=this@entry=0x555ce955e6c8) at block-cache/block_cache.cc:202
#3  0x0000555ce7e3add0 in bcache::block_cache::wait_all (this=<optimized out>) at block-cache/block_cache.cc:261
#4  bcache::block_cache::flush (this=this@entry=0x555ce955e6c8) at block-cache/block_cache.cc:674
#5  0x0000555ce7e3aecc in bcache::block_cache::~block_cache (this=0x555ce955e6c8, __in_chrg=<optimized out>)
    at block-cache/block_cache.cc:491
#6  0x0000555ce7eb1ef3 in persistent_data::block_manager<4096u>::~block_manager (this=0x555ce955e6c0, 
    __in_chrg=<optimized out>) at persistent-data/block.h:42
#7  boost::checked_delete<persistent_data::block_manager<4096u> > (x=0x555ce955e6c0)
    at /usr/include/boost/core/checked_delete.hpp:34
#8  boost::detail::sp_counted_impl_p<persistent_data::block_manager<4096u> >::dispose (this=<optimized out>)
    at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78
#9  0x0000555ce7e3eaf8 in boost::detail::sp_counted_base::release (this=0x555ce95815d0)
    at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#10 boost::detail::shared_count::~shared_count (this=0x7ffc84793ad8, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/detail/shared_count.hpp:467
#11 boost::shared_ptr<persistent_data::block_manager<4096u> >::~shared_ptr (this=0x7ffc84793ad0, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/shared_ptr.hpp:330
#12 (anonymous namespace)::metadata_check (fs=<synthetic pointer>, path="/dev/mapper/errdev") at caching/cache_check.cc:224
#13 (anonymous namespace)::check (fs=<synthetic pointer>, path="/dev/mapper/errdev") at caching/cache_check.cc:300
#14 (anonymous namespace)::check_with_exception_handling (fs=<synthetic pointer>, path="/dev/mapper/errdev")
    at caching/cache_check.cc:318
#15 cache_check_main (argc=<optimized out>, argv=<optimized out>) at caching/cache_check.cc:410
#16 0x0000555ce7e364d4 in base::command::run (this=0x555ce815d6c0 <caching::cache_check_cmd>, argv=0x7ffc84794798, argc=2)
    at base/application.h:26
#17 base::application::run (this=0x7ffc84794680, argc=2, argv=0x7ffc84794798) at base/application.cc:32
#18 0x0000555ce7e35431 in main (argc=2, argv=0x7ffc84794798) at main.cc:39


---  at some point it will finish ---

but this clearly relates to the size of the errored device.

For the 'reproducer' with 2000 sectors it takes about 40 seconds;
with 20000 sectors it has NOT finished in 10 minutes - so it's not even a linear time increase...

So a new bug should be created for the cache_check tool to address this 'slowness'.


As for actual testing of this NEEDS_CHECK flag - I'd probably suggest restoring the 'previous' table content before 'vgchange -an', since cache_check is executed on the 'deactivation' as well as the 'activation' phase.

So if you want to check that 'needs_check' has been cleared in this test case, you would need to give it back the original device after the cache target itself has switched to 'Fail' mode.

Comment 16 Corey Marthaler 2015-10-23 15:05:27 UTC
Based on comment #15 I'll mark this feature verified, since LVM does try to do the right thing by calling cache_check when the needs_check flag is present. I'll file a new bug for the cache_check "slowness".

[root@host-109 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin         Data%  Meta% Cpy%Sync Devices
  [lvol0_pmspare] cache_sanity  ewi-------  12.00m                                             /dev/sda1(0)
  origin          cache_sanity  Cwi-a-C---   4.00g [pool] [origin_corig] 0.00   8.66  100.00   origin_corig(0)
  [origin_corig]  cache_sanity  owi-aoC---   4.00g                                             /dev/sdd1(0)
  [pool]          cache_sanity  Cwi---C---   4.00g                       0.00   8.66  100.00   pool_cdata(0)
  [pool_cdata]    cache_sanity  Cwi-ao----   4.00g                                             /dev/sde1(0)
  [pool_cmeta]    cache_sanity  ewi-ao----  12.00m                                             /dev/sdf1(0)

[root@host-109 ~]# dmsetup table
cache_sanity-origin: 0 8388608 cache 253:4 253:3 253:5 64 1 writethrough cleaner 0
cache_sanity-origin_corig: 0 8388608 linear 8:49 2048
cache_sanity-pool_cdata: 0 8388608 linear 8:65 2048
cache_sanity-pool_cmeta: 0 24576 linear 8:81 2048

[root@host-109 ~]# dd if=/dev/zero of=/dev/mapper/cache_sanity-origin bs=512 count=128 seek=0
128+0 records in
128+0 records out
65536 bytes (66 kB) copied, 0.0127035 s, 5.2 MB/s

[root@host-109 ~]# dmsetup suspend /dev/mapper/cache_sanity-origin
[root@host-109 ~]# dmsetup suspend /dev/mapper/cache_sanity-pool_cmeta
[root@host-109 ~]# dmsetup load cache_sanity-pool_cmeta  --table " 0 24576 error 8:81 2048"
[root@host-109 ~]# dmsetup resume /dev/mapper/cache_sanity-origin
[root@host-109 ~]# dmsetup resume /dev/mapper/cache_sanity-pool_cmeta
[root@host-109 ~]# dd if=/dev/urandom of=/dev/mapper/cache_sanity-origin bs=512 count=256 seek=0
256+0 records in
256+0 records out
131072 bytes (131 kB) copied, 0.0426256 s, 3.1 MB/s

# needs_check flag is now present
[root@host-109 ~]# dmsetup status cache_sanity-origin
0 8388608 cache 8 397/3072 64 0/131072 0 304 0 48 0 0 0 1 writethrough 2 migration_threshold 2048 cleaner 0 rw needs_check 

# Reload the original non-error target meta device so that cache_check can actually finish
[root@host-109 ~]# dmsetup suspend /dev/mapper/cache_sanity-pool_cmeta
[root@host-109 ~]# dmsetup load cache_sanity-pool_cmeta  --table " 0 24576 linear 8:81 2048"
[root@host-109 ~]# dmsetup resume /dev/mapper/cache_sanity-pool_cmeta

# Origin is still in a Failed mode
[root@host-109 ~]# dmsetup status cache_sanity-origin
0 8388608 cache Fail

[root@host-109 ~]# vgchange -an cache_sanity
  0 logical volume(s) in volume group "cache_sanity" now active
[root@host-109 ~]# vgchange -ay cache_sanity
  1 logical volume(s) in volume group "cache_sanity" now active

# After reactivation the cache now appears cleared and fine
[root@host-109 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool   Origin         Data%  Meta% Cpy%Sync Devices
  [lvol0_pmspare] cache_sanity  ewi-------  12.00m                                             /dev/sda1(0)
  origin          cache_sanity  Cwi-a-C---   4.00g [pool] [origin_corig] 0.00   12.92 100.00   origin_corig(0)
  [origin_corig]  cache_sanity  owi-aoC---   4.00g                                             /dev/sdd1(0)
  [pool]          cache_sanity  Cwi---C---   4.00g                       0.00   12.92 100.00   pool_cdata(0)
  [pool_cdata]    cache_sanity  Cwi-ao----   4.00g                                             /dev/sde1(0)
  [pool_cmeta]    cache_sanity  ewi-ao----  12.00m                                             /dev/sdf1(0)

[root@host-109 ~]#  dmsetup status cache_sanity-origin
0 8388608 cache 8 397/3072 64 0/131072 0 208 0 16 0 0 0 1 writethrough 2 migration_threshold 2048 cleaner 0 rw - 
[root@host-109 ~]#  dmsetup table
cache_sanity-origin: 0 8388608 cache 253:3 253:2 253:4 64 1 writethrough cleaner 0
cache_sanity-origin_corig: 0 8388608 linear 8:49 2048
cache_sanity-pool_cdata: 0 8388608 linear 8:65 2048
cache_sanity-pool_cmeta: 0 24576 linear 8:81 2048

Comment 17 Corey Marthaler 2015-10-23 15:27:21 UTC
FYI bug 1274834 was filed for the issue mentioned in comment #15.

Comment 18 errata-xmlrpc 2015-11-19 12:46:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2147.html

