Bug 1937533 - write cache splitting cmd can spin attempting to split an (E)rror state origin volume who's cache pool is an error target
Summary: write cache splitting cmd can spin attempting to split an (E)rror state origi...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: lvm2
Version: 8.4
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-03-10 21:29 UTC by Corey Marthaler
Modified: 2022-09-10 07:25 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-10 07:25:25 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
verbose lvconvert attempt (5.05 MB, text/plain)
2021-03-10 21:54 UTC, Corey Marthaler
no flags Details

Description Corey Marthaler 2021-03-10 21:29:42 UTC
Description of problem:

This is basically an attempt to clean up the state used to verify the bug 1805893.


SCENARIO - [verify_writecache_error_attr_after_offline_pool_corruption]
Create a snapshot of writecache origin and then verify new (E)rror attribute after OFFLINE pool corruption

*** Writecache info for this scenario ***
*  origin (slow):  /dev/sdk1
*  pool (fast):    /dev/sdc1
************************************

Adding "slow" and "fast" tags to corresponding pvs
Create origin (slow) volume
lvcreate --yes --wipesignatures y  -L 4G -n cworigin writecache_sanity @slow

Create writecache cvol (fast) volumes
lvcreate --yes  -L 4G -n POOL writecache_sanity @fast

Deactivate *ONLY* fast pool before conversion to write cache (SEE bug 1185347)
Create writecached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type writecache --cachesettings 'low_watermark=47 high_watermark=55 writeback_jobs=2119 autocommit_blocks=2041 autocommit_time=1296' --cachevol writecache_sanity/POOL writecache_sanity/cworigin
Placing an XFS filesystem on origin volume
Mounting origin volume

Writing files to /mnt/cworigin

Checking files on /mnt/cworigin

Creating thin snapshot of origin volume
lvcreate --yes  -s /dev/writecache_sanity/cworigin -c 128 -n snap -L 500M
Mounting thin snapshot volume and origin
mkdir -p /mnt/snap
mount -o nouuid /dev/writecache_sanity/snap /mnt/snap
Checking files on /mnt/snap

dmsetup table | grep writecache_sanity-POOL_cvol
Swapping the current pool device table (writecache_sanity-POOL_cvol: 0 8388608 linear 8:33 2048) for an error target
dmsetup suspend writecache_sanity-cworigin
dmsetup suspend writecache_sanity-POOL_cvol
Loading in new pool device table (0 8388608 error 8:33 2048)
dmsetup load writecache_sanity-POOL_cvol --table " 0 8388608 error 8:33 2048"
dmsetup resume writecache_sanity-POOL_cvol
dmsetup resume writecache_sanity-cworigin
Attempt --repair on live pool (should fail)
  Command on LV writecache_sanity/POOL_cvol does not accept LV type linear.
  Command not permitted on LV writecache_sanity/POOL_cvol.

[root@hayes-03 ~]# lvs -a -o +devices
  Internal error: WARNING: Segment type error found does not match expected type striped for writecache_sanity/POOL_cvol.
  LV                VG                Attr       LSize   Pool        Origin            Data%  Meta%  Move Log Cpy%Sync Convert Devices           
  [POOL_cvol]       writecache_sanity Cwi-XXC-X-   4.00g                                                                       /dev/sdc1(0)      
  cworigin          writecache_sanity owi-a-C---   4.00g [POOL_cvol] [cworigin_wcorig] 9.00                                    cworigin_wcorig(0)
  [cworigin_wcorig] writecache_sanity owi-aoC---   4.00g                                                                       /dev/sdk1(0)      
  snap              writecache_sanity swi-a-s--- 500.00m             cworigin          0.52                                    /dev/sdc1(1024)   

[root@hayes-03 ~]# dmsetup table
writecache_sanity-cworigin-real: 0 8388608 writecache s 253:2 253:1 512 10 high_watermark 55 low_watermark 47 writeback_jobs 2119 autocommit_blocks 2041 autocommit_time 1296
writecache_sanity-snap: 0 8388608 snapshot 253:3 253:4 P 256
writecache_sanity-POOL_cvol: 0 8388608 error 
writecache_sanity-snap-cow: 0 1024000 linear 8:33 8390656
writecache_sanity-cworigin_wcorig: 0 8388608 linear 8:161 2048
writecache_sanity-cworigin: 0 8388608 snapshot-origin 253:3

[root@hayes-03 ~]# dmsetup status
writecache_sanity-cworigin-real: 0 8388608 writecache 0 8134407 7402283 0
writecache_sanity-snap: 0 8388608 snapshot 5376/1024000 512
writecache_sanity-POOL_cvol: 0 8388608 error 
writecache_sanity-snap-cow: 0 1024000 linear 
writecache_sanity-cworigin_wcorig: 0 8388608 linear 
writecache_sanity-cworigin: 0 8388608 snapshot-origin 


Attempt to split off write cache pool (should fail and cause the (E)rror state mentioned in bug 1805893)

[root@hayes-03 ~]# lvconvert -vvvv --splitcache writecache_sanity/cworigin > /tmp/lvs 2>&1
  
^C


[root@hayes-03 ~]# lvs -a -o +devices
  Internal error: WARNING: Segment type error found does not match expected type striped for writecache_sanity/POOL_cvol.
  LV                VG                Attr       LSize   Pool        Origin            Data%  Meta%  Move Log Cpy%Sync Convert Devices           
  [POOL_cvol]       writecache_sanity Cwi-XXC-X-   4.00g                                                                       /dev/sdc1(0)      
  cworigin          writecache_sanity owi-a-C-E-   4.00g [POOL_cvol] [cworigin_wcorig] 9.00                                    cworigin_wcorig(0)
  [cworigin_wcorig] writecache_sanity owi-aoC---   4.00g                                                                       /dev/sdk1(0)      
  snap              writecache_sanity swi-a-s--- 500.00m             cworigin          0.52                                    /dev/sdc1(1024)   



The following is looping over and over.

[...] 
14:24:46.658349 lvconvert[3214] format_text/format-text.c:196  Reading mda header sector from /dev/sdc1 at 4096
14:24:46.658358 lvconvert[3214] format_text/format-text.c:334  VG writecache_sanity metadata check /dev/sdc1 mda 4096 slot0 offset 142336 size 3096
14:24:46.658386 lvconvert[3214] format_text/import.c:155  Reading metadata from /dev/sdc1 at 146432 size 3096 (+0)
14:24:46.658458 lvconvert[3214] metadata/vg.c:64  Allocated VG writecache_sanity at 0x564fec300870.
14:24:46.658480 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/cworigin.
14:24:46.658490 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/snap.
14:24:46.658501 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/POOL_cvol.
14:24:46.658510 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/cworigin_wcorig.
14:24:46.658521 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/snapshot0.
14:24:46.658532 lvconvert[3214] metadata/lv_manip.c:1331  Stack writecache_sanity/cworigin:0[0] on LV writecache_sanity/cworigin_wcorig:0.
14:24:46.658542 lvconvert[3214] metadata/lv_manip.c:871  Adding writecache_sanity/cworigin:0 as an user of writecache_sanity/cworigin_wcorig.
14:24:46.658551 lvconvert[3214] metadata/lv_manip.c:871  Adding writecache_sanity/cworigin:0 as an user of writecache_sanity/POOL_cvol.
14:24:46.658572 lvconvert[3214] format_text/format-text.c:499  Found metadata on /dev/sdc1 at 146432 size 3096 for VG writecache_sanity
14:24:46.658581 lvconvert[3214] metadata/metadata.c:4848  Reading VG writecache_sanity metadata from /dev/sdk1 4096
14:24:46.658588 lvconvert[3214] format_text/format-text.c:196  Reading mda header sector from /dev/sdk1 at 4096
14:24:46.658597 lvconvert[3214] format_text/format-text.c:334  VG writecache_sanity metadata check /dev/sdk1 mda 4096 slot0 offset 142336 size 3096
14:24:46.658608 lvconvert[3214] format_text/import.c:155  Reading metadata from /dev/sdk1 at 146432 size 3096 (+0)
14:24:46.658628 lvconvert[3214] format_text/import.c:174  Skipped parsing metadata on /dev/sdk1
14:24:46.658635 lvconvert[3214] format_text/format-text.c:499  Found metadata on /dev/sdk1 at 146432 size 3096 for VG writecache_sanity
14:24:46.658644 lvconvert[3214] metadata/metadata.c:4848  Reading VG writecache_sanity metadata from /dev/sdl1 4096
14:24:46.658652 lvconvert[3214] format_text/format-text.c:196  Reading mda header sector from /dev/sdl1 at 4096
14:24:46.658663 lvconvert[3214] format_text/format-text.c:334  VG writecache_sanity metadata check /dev/sdl1 mda 4096 slot0 offset 142336 size 3096
14:24:46.658671 lvconvert[3214] format_text/import.c:155  Reading metadata from /dev/sdl1 at 146432 size 3096 (+0)
14:24:46.658688 lvconvert[3214] format_text/import.c:174  Skipped parsing metadata on /dev/sdl1
14:24:46.658696 lvconvert[3214] format_text/format-text.c:499  Found metadata on /dev/sdl1 at 146432 size 3096 for VG writecache_sanity
14:24:46.658703 lvconvert[3214] metadata/metadata.c:4848  Reading VG writecache_sanity metadata from /dev/sdm1 4096
14:24:46.658709 lvconvert[3214] format_text/format-text.c:196  Reading mda header sector from /dev/sdm1 at 4096
14:24:46.658718 lvconvert[3214] format_text/format-text.c:334  VG writecache_sanity metadata check /dev/sdm1 mda 4096 slot0 offset 142336 size 3096
14:24:46.658727 lvconvert[3214] format_text/import.c:155  Reading metadata from /dev/sdm1 at 146432 size 3096 (+0)
14:24:46.658746 lvconvert[3214] format_text/import.c:174  Skipped parsing metadata on /dev/sdm1
14:24:46.658753 lvconvert[3214] format_text/format-text.c:499  Found metadata on /dev/sdm1 at 146432 size 3096 for VG writecache_sanity
14:24:46.658762 lvconvert[3214] cache/lvmcache.c:1834  lvmcache_update_vg writecache_sanity for info /dev/sdc1
14:24:46.658772 lvconvert[3214] cache/lvmcache.c:1834  lvmcache_update_vg writecache_sanity for info /dev/sdm1
14:24:46.658779 lvconvert[3214] cache/lvmcache.c:1834  lvmcache_update_vg writecache_sanity for info /dev/sdk1
14:24:46.658792 lvconvert[3214] cache/lvmcache.c:1834  lvmcache_update_vg writecache_sanity for info /dev/sdl1
14:24:46.658804 lvconvert[3214] device_mapper/libdm-config.c:1014  metadata/lvs_history_retention_time not found in config: defaulting to 0
14:24:46.658811 lvconvert[3214] metadata/pv_manip.c:417  /dev/sdc1 0:      0   1024: POOL_cvol(0:0)
14:24:46.658819 lvconvert[3214] metadata/pv_manip.c:417  /dev/sdc1 1:   1024    125: snap(0:0)
14:24:46.658828 lvconvert[3214] metadata/pv_manip.c:417  /dev/sdc1 2:   1149 113186: NULL(0:0)
14:24:46.658836 lvconvert[3214] metadata/pv_manip.c:417  /dev/sdm1 0:      0 476799: NULL(0:0)
14:24:46.658844 lvconvert[3214] metadata/pv_manip.c:417  /dev/sdk1 0:      0   1024: cworigin_wcorig(0:0)
14:24:46.658852 lvconvert[3214] metadata/pv_manip.c:417  /dev/sdk1 1:   1024 475775: NULL(0:0)
14:24:46.658860 lvconvert[3214] metadata/pv_manip.c:417  /dev/sdl1 0:      0 476799: NULL(0:0)
14:24:46.658871 lvconvert[3214] device/dev-io.c:117  /dev/sdc1: size is 936640432 sectors
14:24:46.658880 lvconvert[3214] device/dev-io.c:117  /dev/sdm1: size is 3905945520 sectors
14:24:46.658887 lvconvert[3214] device/dev-io.c:117  /dev/sdk1: size is 3905945520 sectors
14:24:46.658893 lvconvert[3214] device/dev-io.c:117  /dev/sdl1: size is 3905945520 sectors
14:24:46.659032 lvconvert[3214] metadata/vg.c:64  Allocated VG writecache_sanity at 0x564fec304880.
14:24:46.659054 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/cworigin.
14:24:46.659064 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/snap.
14:24:46.659072 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/POOL_cvol.
14:24:46.659082 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/cworigin_wcorig.
14:24:46.659091 lvconvert[3214] format_text/import_vsn1.c:610  Importing logical volume writecache_sanity/snapshot0.
14:24:46.659104 lvconvert[3214] metadata/lv_manip.c:1331  Stack writecache_sanity/cworigin:0[0] on LV writecache_sanity/cworigin_wcorig:0.
14:24:46.659113 lvconvert[3214] metadata/lv_manip.c:871  Adding writecache_sanity/cworigin:0 as an user of writecache_sanity/cworigin_wcorig.
14:24:46.659123 lvconvert[3214] metadata/lv_manip.c:871  Adding writecache_sanity/cworigin:0 as an user of writecache_sanity/POOL_cvol.
14:24:46.659149 lvconvert[3214] activate/dev_manager.c:819  Getting device info for writecache_sanity-cworigin [LVM-Z8hf12w7Ag1hFIUvMTqGfohJpKUW1GB9ypYyXWH2dakcXfM6apIbrw42Rn490yrg].
14:24:46.659170 lvconvert[3214] device_mapper/ioctl/libdm-iface.c:1898  dm info  LVM-Z8hf12w7Ag1hFIUvMTqGfohJpKUW1GB9ypYyXWH2dakcXfM6apIbrw42Rn490yrg [ noopencount flush ]   [16384] (*1)
14:24:46.659189 lvconvert[3214] activate/dev_manager.c:819  Getting device info for writecache_sanity-cworigin [LVM-Z8hf12w7Ag1hFIUvMTqGfohJpKUW1GB9ypYyXWH2dakcXfM6apIbrw42Rn490yrg].
14:24:46.659199 lvconvert[3214] device_mapper/ioctl/libdm-iface.c:1898  dm info  LVM-Z8hf12w7Ag1hFIUvMTqGfohJpKUW1GB9ypYyXWH2dakcXfM6apIbrw42Rn490yrg [ noopencount flush ]   [16384] (*1)
14:24:46.659212 lvconvert[3214] activate/dev_manager.c:819  Getting device info for writecache_sanity-cworigin-real [LVM-Z8hf12w7Ag1hFIUvMTqGfohJpKUW1GB9ypYyXWH2dakcXfM6apIbrw42Rn490yrg-real].
14:24:46.659222 lvconvert[3214] device_mapper/ioctl/libdm-iface.c:1898  dm status  LVM-Z8hf12w7Ag1hFIUvMTqGfohJpKUW1GB9ypYyXWH2dakcXfM6apIbrw42Rn490yrg-real [ noopencount noflush ]   [16384] (*1)
14:24:46.659241 lvconvert[3214] mm/memlock.c:598  Unlock: Memlock counters: prioritized:0 locked:0 critical:0 daemon:0 suspended:0
14:24:46.659250 lvconvert[3214] activate/fs.c:492  Syncing device names
14:24:46.659262 lvconvert[3214] misc/lvm-flock.c:84  Unlocking /run/lock/lvm/V_writecache_sanity
14:24:46.659273 lvconvert[3214] misc/lvm-flock.c:47  _undo_flock /run/lock/lvm/V_writecache_sanity
14:24:46.659294 lvconvert[3214] metadata/vg.c:79  Freeing VG writecache_sanity at 0x564fec304880.
14:24:46.659304 lvconvert[3214] metadata/vg.c:79  Freeing VG writecache_sanity at 0x564fec300870.
14:24:46.659318 lvconvert[3214] lvconvert.c:5767  Detaching writecache cleaning 732124 blocks
14:24:46.659323 lvconvert[3214] lvconvert.c:5768  This command can be cancelled and rerun to complete writecache detach.


Version-Release number of selected component (if applicable):
kernel-4.18.0-295.el8    BUILT: Fri Mar  5 07:26:41 CST 2021

lvm2-2.03.11-5.el8    BUILT: Fri Mar  5 07:13:31 CST 2021
lvm2-libs-2.03.11-5.el8    BUILT: Fri Mar  5 07:13:31 CST 2021
lvm2-dbusd-2.03.11-4.el8    BUILT: Thu Feb 11 04:35:01 CST 2021

device-mapper-1.02.175-5.el8    BUILT: Fri Mar  5 07:13:31 CST 2021
device-mapper-libs-1.02.175-5.el8    BUILT: Fri Mar  5 07:13:31 CST 2021
device-mapper-event-1.02.175-5.el8    BUILT: Fri Mar  5 07:13:31 CST 2021
device-mapper-event-libs-1.02.175-5.el8    BUILT: Fri Mar  5 07:13:31 CST 2021

Comment 1 Corey Marthaler 2021-03-10 21:54:40 UTC
Created attachment 1762474 [details]
verbose lvconvert attempt

Comment 2 David Teigland 2021-03-15 22:36:49 UTC
I think you'd need to do a force detach for this since it won't be possible to write back the content of a missing writecache.  The ability to force detach a cachevol is covered in bug 1915492.  In terms of usage you'd just add the --force option to the command, which will skip any attempt to write back the cache before detaching.

The fix for bug 1915492 is:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=cb54d0801d776205788f8f46b31dd9e487833343

commit cb54d0801d776205788f8f46b31dd9e487833343
Author: David Teigland <teigland>
Date:   Tue Jan 12 11:53:02 2021 -0600

    cachevol: allow forced detaching of damaged or invalid cachevol
    
    A cachevol can be forcibly detached when it's missing devices.
    Also allow this if it's damaged/invalid and unrepairable.
    This would be needed to recover data from the origin LV after
    a cachevol is lost or damaged beyond repair.

Comment 3 David Teigland 2021-04-13 17:15:08 UTC
I think using --force will solve it.

Comment 4 Corey Marthaler 2021-04-13 20:53:17 UTC
Currently in the latest 8.4 rpms, the force doesn't work. 

Like mentioned above, it is intended to work in 8.5 with the fix for bug 1915492. We'll wait to try this again once that bug is verified. Otherwise, we can kill the spinning lvconvert cmd for the purposes of testing the 'E' state if this deadlock is not deemed to be a big deal.

Comment 7 RHEL Program Management 2022-09-10 07:25:25 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.


Note You need to log in before you can comment on or make changes to this bug.