Description of problem:
I've reproduced this twice so far in roughly two dozen attempts. I created the snapshot volumes larger than the origin volumes due to the re-encryption of the origin volume. I'll continue to attempt to boil this down, but for now let me know if anyone sees an operation that is blatantly shooting myself in the foot. This exact scenario always worked with dm-cache and 90%+ of the time with writecache, so I don't think I'm doing anything too crazy.

SCENARIO - [cache_origin_rename_in_between_luks_encryption_operations]
Create a writecache volume with fs data, encrypt the origin, and then rename the cache origin volume (pool rename not supported) in between re-encryption stack operations

*** Cache info for this scenario ***
*  origin (slow):  /dev/sdb1
*  pool (fast):    /dev/sdf1
************************************

Adding "slow" and "fast" tags to corresponding pvs

Create origin (slow) volume
lvcreate --wipesignatures y -L 4G -n rename_orig_A writecache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate -L 4G -n rename_pool_A writecache_sanity @fast

Deactivate both fast and slow volumes before conversion to writecache

Create writecached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type writecache --cachesettings 'low_watermark=42 high_watermark=77 writeback_jobs=2173 autocommit_blocks=1091 autocommit_time=1354' --cachevol writecache_sanity/rename_pool_A writecache_sanity/rename_orig_A

Activating volume: rename_orig_A

Encrypting rename_orig_A volume
cryptsetup luksFormat /dev/writecache_sanity/rename_orig_A
cryptsetup luksOpen /dev/writecache_sanity/rename_orig_A luks_rename_orig_A

Placing an xfs filesystem on origin volume
Mounting origin volume

Writing files to /mnt/rename_orig_A
Checking files on /mnt/rename_orig_A
syncing before snap creation...

Making 1st snapshot of origin volume
lvcreate -s /dev/writecache_sanity/rename_orig_A -c 64 -n snap1 -L 4296704

Renaming cache ORIGIN volume...
lvrename writecache_sanity/rename_orig_A writecache_sanity/rename_orig_B

Mounting 1st snap volume
cryptsetup luksOpen /dev/writecache_sanity/snap1 luks_snap1
Checking files on /mnt/snap1

Writing files to /mnt/rename_orig_A
syncing before snap creation...

(ONLINE) Re-encrypting rename_orig_A volume
cryptsetup reencrypt --resilience none --active-name luks_rename_orig_A

Making 2nd snapshot of origin volume
lvcreate -s /dev/writecache_sanity/rename_orig_B -c 64 -n snap2 -L 4296704

Mounting 2nd snap volume
cryptsetup luksOpen /dev/writecache_sanity/snap2 luks_snap2

Checking files on /mnt/rename_orig_A
Checking files on /mnt/snap2
Can not stat qdiihtdlbbvxycvnyjlwphytajrgfmcyskhagknjnkjpogd: Structure needs cleaning
checkit verify failed

Feb  4 11:45:49 host-083 qarshd[3079]: Running cmdline: /usr/tests/sts-rhel8.2/bin/checkit -w /mnt/snap2 -f /tmp/checkit_origin_1 -v
Feb  4 11:46:11 host-083 restraintd[1461]: *** Current Time: Tue Feb 04 11:46:11 2020  Localwatchdog at: * Disabled! *
Feb  4 11:46:13 host-083 kernel: XFS (dm-16): Metadata CRC error detected at xfs_dir3_data_read_verify+0xc2/0xe0 [xfs], xfs_dir3_data block 0x39ea8
Feb  4 11:46:13 host-083 kernel: XFS (dm-16): Unmount and run xfs_repair
Feb  4 11:46:13 host-083 kernel: XFS (dm-16): First 128 bytes of corrupted metadata buffer:
Feb  4 11:46:13 host-083 kernel: 00000000: df 01 4e 5d 42 83 e8 b9 06 aa 90 4d 3f fb e2 3d  ..N]B......M?..=
Feb  4 11:46:13 host-083 kernel: 00000010: 68 94 57 9b 53 58 7a 13 11 42 06 06 5b a7 64 e2  h.W.SXz..B..[.d.
Feb  4 11:46:13 host-083 kernel: 00000020: 94 e7 01 ec 40 95 5a 84 dc 27 e2 85 b3 63 2d a4  ....@.Z..'...c-.
Feb  4 11:46:13 host-083 kernel: 00000030: 95 58 09 ff f8 ce 72 74 f7 7f b0 45 c1 07 cd b3  .X....rt...E....
Feb  4 11:46:13 host-083 kernel: 00000040: 2b 90 43 60 ce 7a c4 9c 9b c3 fe 32 fa c3 0d 45  +.C`.z.....2...E
Feb  4 11:46:13 host-083 kernel: 00000050: e4 a8 e9 a6 ec c1 8a 4b c1 56 cd 39 fd 87 9d 93  .......K.V.9....
Feb  4 11:46:13 host-083 kernel: 00000060: 27 78 56 ea 7d ab 48 fb 90 de 2c 91 06 4a de 78  'xV.}.H...,..J.x
Feb  4 11:46:13 host-083 kernel: 00000070: da 9a da 82 9d e9 3e 21 ac f9 e6 c9 0d 53 c1 ca  ......>!.....S..
Feb  4 11:46:13 host-083 kernel: XFS (dm-16): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x39ea8 len 8 error 74

[root@host-083 ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/luks_rename_orig_A  4.0G  778M  3.3G  20% /mnt/rename_orig_A
/dev/mapper/luks_snap1          4.0G  421M  3.6G  11% /mnt/snap1
/dev/mapper/luks_snap2          4.0G  778M  3.3G  20% /mnt/snap2
[root@host-083 ~]# ls -l /mnt/rename_orig_A | wc -l
2401
[root@host-083 ~]# ls -l /mnt/snap1 | wc -l
1201
[root@host-083 ~]# ls -l /mnt/snap2 | wc -l
ls: reading directory '/mnt/snap2': Structure needs cleaning
958

[root@host-083 ~]# umount /mnt/snap2
[root@host-083 ~]# xfs_repair /dev/mapper/luks_snap2
[...]
entry "ppyeoxxlwwomxtfqv" at block 12 offset 920 in directory inode 128 references free inode 709403
	clearing inode number in entry at offset 920...
entry "hmanrjtwiwspdtpxuqlucanjjjavwvueiwscgoyrothtpar" at block 12 offset 952 in directory inode 128 references free inode 709404
	clearing inode number in entry at offset 952...
entry "kcsxbukaaudxniyxnfvkawsxxr" at block 12 offset 1016 in directory inode 128 references free inode 709405
	clearing inode number in entry at offset 1016...
entry "bdekkgy" at block 12 offset 1056 in directory inode 128 references free inode 709406
	clearing inode number in entry at offset 1056...
entry "ldbkrbvngmcqidcqocai" at block 12 offset 1080 in directory inode 128 references free inode 709407
	clearing inode number in entry at offset 1080...
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
Metadata CRC error detected at 0x56145067c919, xfs_dir3_data block 0x39ea8/0x1000
corrupt block 10 in directory inode 128: junking block
Metadata CRC error detected at 0x56145067c919, xfs_dir3_data block 0x44330/0x1000
corrupt block 11 in directory inode 128: junking block
rebuilding directory inode 128
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 556862, moving to lost+found
disconnected inode 556863, moving to lost+found
disconnected inode 596673, moving to lost+found
[...]
disconnected inode 633594, moving to lost+found
disconnected inode 633595, moving to lost+found
disconnected inode 633596, moving to lost+found
disconnected inode 633597, moving to lost+found
disconnected inode 633598, moving to lost+found
disconnected inode 633599, moving to lost+found
disconnected inode 709417, moving to lost+found
Phase 7 - verify and correct link counts...
done

[root@host-083 ~]# mount /dev/mapper/luks_snap2 /mnt/snap2
mount: /mnt/snap2: wrong fs type, bad option, bad superblock on /dev/mapper/luks_snap2, missing codepage or helper program, or other error.
Version-Release number of selected component (if applicable):
4.18.0-173.el8.x86_64

kernel-4.18.0-173.el8    BUILT: Fri Jan 24 06:02:03 CST 2020
lvm2-2.03.07-1.el8    BUILT: Mon Dec  2 00:09:32 CST 2019
lvm2-libs-2.03.07-1.el8    BUILT: Mon Dec  2 00:09:32 CST 2019
device-mapper-1.02.167-1.el8    BUILT: Mon Dec  2 00:09:32 CST 2019
device-mapper-libs-1.02.167-1.el8    BUILT: Mon Dec  2 00:09:32 CST 2019
device-mapper-event-1.02.167-1.el8    BUILT: Mon Dec  2 00:09:32 CST 2019
device-mapper-event-libs-1.02.167-1.el8    BUILT: Mon Dec  2 00:09:32 CST 2019
device-mapper-persistent-data-0.8.5-3.el8    BUILT: Wed Nov 27 07:05:21 CST 2019
cryptsetup-2.2.2-1.el8    BUILT: Mon Nov 18 06:06:33 CST 2019
cryptsetup-libs-2.2.2-1.el8    BUILT: Mon Nov 18 06:06:33 CST 2019

How reproducible:
seldom
Here are the specific commands being run:

# set up
pvcreate /dev/sd[bc]1
vgcreate writecache_sanity /dev/sd[bc]1
pvchange --addtag slow /dev/sdc1
pvchange --addtag fast /dev/sdb1
lvcreate --wipesignatures y -L 4G -n rename_orig_A writecache_sanity @slow
lvcreate -L 4G -n rename_pool_A writecache_sanity @fast
lvchange -an writecache_sanity/rename_orig_A writecache_sanity/rename_pool_A
lvconvert --yes --type writecache --cachesettings 'low_watermark=42 high_watermark=77 writeback_jobs=2173 autocommit_blocks=1091 autocommit_time=1354' --cachevol writecache_sanity/rename_pool_A writecache_sanity/rename_orig_A
lvchange -vvvv -ay writecache_sanity/rename_orig_A
echo Str0ngP455w0rd### | cryptsetup luksFormat /dev/writecache_sanity/rename_orig_A
sync
echo Str0ngP455w0rd### | cryptsetup luksOpen /dev/writecache_sanity/rename_orig_A luks_rename_orig_A
cryptsetup -v status luks_rename_orig_A
mkfs.xfs -f /dev/mapper/luks_rename_orig_A
mkdir -p /mnt/rename_orig_A
mount /dev/mapper/luks_rename_orig_A /mnt/rename_orig_A

# i/o
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/rename_orig_A -f /tmp/checkit_origin_1 -n 1200
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/rename_orig_A -f /tmp/checkit_origin_1 -v
sync
lvs --noheadings -o lv_size --units k writecache_sanity/rename_orig_A
lvs --noheadings -o lv_attr writecache_sanity/rename_orig_A

# snapshot
lvcreate -s /dev/writecache_sanity/rename_orig_A -c 64 -n snap1 -L 4296704
lvs --noheadings -o lv_attr writecache_sanity/snap1

# origin rename
lvrename writecache_sanity/rename_orig_A writecache_sanity/rename_orig_B

# snap usage
echo Str0ngP455w0rd### | cryptsetup luksOpen /dev/writecache_sanity/snap1 luks_snap1
cryptsetup -v status luks_snap1
mkdir -p /mnt/snap1
mount -o nouuid /dev/mapper/luks_snap1 /mnt/snap1
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/snap1 -f /tmp/checkit_origin_1 -v
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/rename_orig_A -f /tmp/checkit_origin_2 -n 1200
sync

# reencrypt the origin
echo Str0ngP455w0rd### | cryptsetup reencrypt --resilience none --active-name luks_rename_orig_A
cryptsetup -v status luks_rename_orig_A
lvs --noheadings -o lv_attr writecache_sanity/rename_orig_B

# create another snap
lvcreate -s /dev/writecache_sanity/rename_orig_B -c 64 -n snap2 -L 4296704
lvs --noheadings -o lv_attr writecache_sanity/snap2
echo Str0ngP455w0rd### | cryptsetup luksOpen /dev/writecache_sanity/snap2 luks_snap2
cryptsetup -v status luks_snap2
sync
mkdir -p /mnt/snap2
mount -o nouuid /dev/mapper/luks_snap2 /mnt/snap2
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/rename_orig_A -f /tmp/checkit_origin_1 -v
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/snap2 -f /tmp/checkit_origin_1 -v
Actually, is just having snapshots of writecache origin volumes the issue here?

SCENARIO - [simple_writecache_snap_merge]
Create snaps of cache origin with fs data, verify data on snaps, change data on origin, merge data back to origin, verify origin data

*** Cache info for this scenario ***
*  origin (slow):  /dev/sdc1
*  pool (fast):    /dev/sde1
************************************

Adding "slow" and "fast" tags to corresponding pvs

Create origin (slow) volume
lvcreate --wipesignatures y -L 4G -n corigin writecache_sanity @slow

Create cache data and cache metadata (fast) volumes
lvcreate -L 2G -n pool writecache_sanity @fast

Deactivate both fast and slow volumes before conversion to writecache

Create writecached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type writecache --cachesettings 'low_watermark=37 high_watermark=93 writeback_jobs=2567 autocommit_blocks=2257 autocommit_time=2912' --cachevol writecache_sanity/pool writecache_sanity/corigin

Placing an xfs filesystem on origin volume
Mounting origin volume

Writing files to /mnt/corigin
Checking files on /mnt/corigin

Making a snapshot of the origin volume, mounting, and verifying original data
lvcreate -s /dev/writecache_sanity/corigin -c 64 -n merge -L 500M

+++ Mounting and verifying snapshot merge data +++
Checking files on /mnt/merge
Can not stat byqckoqikfbgprmlmkmyvprgug: Structure needs cleaning

Feb  5 14:11:49 hayes-02 qarshd[37391]: Running cmdline: mount -o nouuid /dev/writecache_sanity/merge /mnt/merge
Feb  5 14:11:49 hayes-02 kernel: XFS (dm-5): Mounting V5 Filesystem
Feb  5 14:11:49 hayes-02 kernel: XFS (dm-5): Starting recovery (logdev: internal)
[516010.515258] XFS (dm-5): Ending recovery (logdev: internal)
Feb  5 14:11:49 hayes-02 kernel: XFS (dm-5): Ending recovery (logdev: internal)
Feb  5 14:11:50 hayes-02 systemd[1]: Started qarsh Per-Connection Server (10.15.80.223:41534).
Feb  5 14:11:50 hayes-02 qarshd[37403]: Talking to peer ::ffff:10.15.80.223:41534 (IPv6)
Feb  5 14:11:50 hayes-02 qarshd[37403]: Running cmdline: /usr/tests/sts-rhel8.2/bin/checkit -w /mnt/merge -f /tmp/original.1199439 -v
Feb  5 14:11:50 hayes-02 kernel: XFS (dm-5): Metadata corruption detected at xfs_da3_node_read_verify+0x8b/0x160 [xfs], xfs_da3_node block 0x1b430
Feb  5 14:11:50 hayes-02 kernel: XFS (dm-5): Unmount and run xfs_repair
Feb  5 14:11:50 hayes-02 kernel: XFS (dm-5): First 128 bytes of corrupted metadata buffer:
Feb  5 14:11:50 hayes-02 kernel: 00000000: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  5 14:11:50 hayes-02 kernel: 00000010: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  5 14:11:50 hayes-02 kernel: 00000020: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  5 14:11:50 hayes-02 kernel: 00000030: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  5 14:11:50 hayes-02 kernel: 00000040: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  5 14:11:50 hayes-02 kernel: 00000050: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  5 14:11:50 hayes-02 kernel: 00000060: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  5 14:11:50 hayes-02 kernel: 00000070: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  5 14:11:50 hayes-02 kernel: XFS (dm-5): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x1b430 len 8 error

[root@hayes-02 ~]# lvs -a -o +devices
  LV                VG                Attr       LSize   Pool        Origin           Data%  Meta%  Move Log Cpy%Sync Convert Devices
  corigin           writecache_sanity owi-aoC---   4.00g [pool_cvol] [corigin_wcorig]                                         corigin_wcorig(0)
  [corigin_wcorig]  writecache_sanity owi-aoC---   4.00g                                                                      /dev/sdc1(0)
  merge             writecache_sanity swi-aos--- 500.00m             corigin          0.45                                    /dev/sdb1(0)
  [pool_cvol]       writecache_sanity Cwi-aoC---   2.00g                                                                      /dev/sde1(0)

[root@hayes-02 ~]# df -h
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/writecache_sanity-corigin  4.0G  417M  3.6G  11% /mnt/corigin
/dev/mapper/writecache_sanity-merge    4.0G  417M  3.6G  11% /mnt/merge
It doesn't make sense to do anything with the origin LV while it has a writecache; it's not a valid view of the data. We'll need to find the place to stop that.
This is confusing because of the naming/terms being used. In your example the main/original LV is named "corigin", but the word corigin also implies a role/position of the LV, which is what I thought you meant.

Say we have two linear LVs, "main" and "fast". In that state, neither of them has the role of a cache origin. Once fast is attached to main to start caching, then we have LVs: "main", "fast" and "main_wcorig". In that state, "main_wcorig" is the cache origin.

In your example, you're taking a snapshot of "main" (the top-level LV, which doesn't really have a term), not "main_wcorig" (the writecache origin in lvm terms).

Given all that, it does conceptually make sense to take a snapshot of "main", but technically there could be issues with the way the snapshot target is doing i/o in the kernel that's not compatible with using writecache under it. Will need to investigate that.
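For illustration, a minimal sketch of that naming (using a hypothetical VG "vg"; the lvs output below is abbreviated, and the cachevol is shown with the "_cvol" suffix lvm gives it):

# two linear LVs; neither has a cache role yet
lvcreate -L 4G -n main vg
lvcreate -L 1G -n fast vg
lvchange -an vg/main vg/fast

# attach "fast" to "main" to start caching
lvconvert --yes --type writecache --cachevol vg/fast vg/main

# lvs -a now shows the stack: "main" is the top-level LV,
# "[main_wcorig]" is the hidden writecache origin:
#   main           vg Cwi---C--- 4.00g [fast_cvol] [main_wcorig]
#   [fast_cvol]    vg Cwi---C--- 1.00g
#   [main_wcorig]  vg owi---C--- 4.00g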
> In that state, "main_wcorig" is the cache origin.
> In your example, you're taking a snapshot of "main" (the top level LV which
> doesn't really have a term), not "main_wcorig" (the writecache origin in lvm
> terms).

Now that I think about it, lvm uses the word "origin" everywhere for everything, so I think I'll give up trying to make sense of that word.
I did a similar sort of test but without any cachesettings (so using defaults), and didn't see a problem. When I tried using the same cachesettings you have above I did get an xfs corruption, but after removing the snapshot. Could you try without changing any settings?
> Once fast is attached to main to start caching, then we have LVs: "main",
> "fast" and "main_wcorig". In that state, "main_wcorig" is the cache origin.
> In your example, you're taking a snapshot of "main" (the top level LV which
> doesn't really have a term), not "main_wcorig" (the writecache origin in lvm
> terms).

I'll try and make sure we have tests for both scenarios; however, once an LV is a writecached volume, your only choice is to snap "main".

### origin "slow" snap attempt of *existing* writecache
[root@hayes-02 ~]# lvs -a -o +devices
  LV                VG                Attr       LSize Pool        Origin           Data%  Meta%  Move Log Cpy%Sync Convert Devices
  corigin           writecache_sanity owi-aoC--- 4.00g [pool_cvol] [corigin_wcorig]                                         corigin_wcorig(0)
  [corigin_wcorig]  writecache_sanity owi-aoC--- 4.00g                                                                      /dev/sdc1(0)
  [pool_cvol]       writecache_sanity Cwi-aoC--- 2.00g                                                                      /dev/sde1(0)
[root@hayes-02 ~]# lvcreate -s writecache_sanity/corigin_wcorig -n snap_attempt -L 100M
  Snapshots of hidden volumes are not supported.

### origin "slow" snap attempt *prior* to becoming writecache
[root@hayes-02 ~]# lvcreate --wipesignatures y -L 4G -n corigin writecache_sanity @slow
  Logical volume "corigin" created.
[root@hayes-02 ~]# lvcreate -L 2G -n pool writecache_sanity @fast
  Logical volume "pool" created.
[root@hayes-02 ~]# lvcreate -s writecache_sanity/corigin -n snap_attempt -L 100M
  Logical volume "snap_attempt" created.
[root@hayes-02 ~]# lvchange -an writecache_sanity
[root@hayes-02 ~]# lvconvert --yes --type writecache --cachevol writecache_sanity/pool writecache_sanity/corigin
  Logical volume writecache_sanity/corigin now has write cache.
[root@hayes-02 ~]# lvs -a -o +devices
  LV                VG                Attr       LSize   Pool        Origin           Data%  Meta%  Move Log Cpy%Sync Convert Devices
  corigin           writecache_sanity owi---C---   4.00g [pool_cvol] [corigin_wcorig]                                         corigin_wcorig(0)
  [corigin_wcorig]  writecache_sanity owi---C---   4.00g                                                                      /dev/sdc1(0)
  [pool_cvol]       writecache_sanity Cwi---C---   2.00g                                                                      /dev/sde1(0)
  snap_attempt      writecache_sanity swi---s--- 100.00m             corigin                                                  /dev/sdb1(0)
I can reliably hit this even with default cache settings. Here's a "boiled down" set of cmds:

pvchange --addtag slow /dev/sdf1
pvchange --addtag fast /dev/sde1
lvcreate --wipesignatures y -L 4G -n corigin writecache_sanity @slow
lvcreate -L 2G -n pool writecache_sanity @fast
lvchange -an writecache_sanity/corigin writecache_sanity/pool
lvconvert --yes --type writecache --cachevol writecache_sanity/pool writecache_sanity/corigin
lvchange -ay writecache_sanity/corigin
mkfs.xfs -f /dev/writecache_sanity/corigin
mkdir -p /mnt/corigin
mount /dev/writecache_sanity/corigin /mnt/corigin
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/corigin -f /tmp/original.1217267 -n 1200
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/corigin -f /tmp/original.1217267 -v
lvcreate -s /dev/writecache_sanity/corigin -n merge -L 1G
mkdir -p /mnt/merge
mount -o nouuid /dev/writecache_sanity/merge /mnt/merge
/usr/tests/sts-rhel8.2/bin/checkit -w /mnt/merge -f /tmp/original.1217267 -v

Feb  6 10:33:15 hayes-02 kernel: XFS (dm-5): Metadata corruption detected at xfs_da3_node_read_verify+0x8b/0x160 [xfs], xfs_da3_node block 0x2f5b0
Feb  6 10:33:15 hayes-02 kernel: XFS (dm-5): Unmount and run xfs_repair
Feb  6 10:33:15 hayes-02 kernel: XFS (dm-5): First 128 bytes of corrupted metadata buffer:
Feb  6 10:33:15 hayes-02 kernel: 00000000: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  6 10:33:15 hayes-02 kernel: 00000010: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  6 10:33:15 hayes-02 kernel: 00000020: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  6 10:33:15 hayes-02 kernel: 00000030: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  6 10:33:15 hayes-02 kernel: 00000040: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  6 10:33:15 hayes-02 kernel: 00000050: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  6 10:33:15 hayes-02 kernel: 00000060: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  6 10:33:15 hayes-02 kernel: 00000070: 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 35  5555555555555555
Feb  6 10:33:15 hayes-02 kernel: XFS (dm-5): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x2f5b0 len 8 error 117
Added a check to prevent snapshots of writecache until we figure this out:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=ffea7daec3d09fb4315104bb346cfa9d6db6f9e9

Mikulas, do you have any ideas about possible problems with putting a snapshot on top of writecache?
I don't know how writecache could interfere with a snapshot.
I have found a bug in dm-writecache that could cause a crash or data corruption if the writecache device was suspended. Try this patch:
https://www.redhat.com/archives/dm-devel/2020-February/msg00106.html
The following scratch kernel does *not* fix this snap (with inactive origin and pool conversion) issue.

[root@hayes-02 ~]# uname -ar
Linux hayes-02.lab.msp.redhat.com 4.18.0-184.el8.test.x86_64 #1 SMP Wed Feb 26 14:24:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@hayes-02 ~]# rpm -qa | grep kernel
kernel-4.18.0-184.el8.test.x86_64
kernel-modules-4.18.0-184.el8.test.x86_64
kernel-modules-extra-4.18.0-184.el8.test.x86_64
kernel-tools-libs-4.18.0-184.el8.test.x86_64
kernel-tools-4.18.0-184.el8.test.x86_64
kernel-core-4.18.0-184.el8.test.x86_64

Create writecached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --yes --type writecache --cachesettings 'low_watermark=42 high_watermark=90 writeback_jobs=1637 autocommit_blocks=2916 autocommit_time=1414' --cachevol writecache_sanity/pool writecache_sanity/cworigin

Placing an xfs filesystem on origin volume
Mounting origin volume

Writing files to /mnt/cworigin
Checking files on /mnt/cworigin

Making a snapshot of the origin volume, mounting, and verifying original data
lvcreate -s /dev/writecache_sanity/cworigin -c 64 -n merge -L 500M

+++ Mounting and verifying snapshot merge data +++
Checking files on /mnt/merge
Can not stat bbulnrywkusukrbcdjxkbnwpkfgvbcoffjqueprqmmla: Structure needs cleaning

Feb 27 11:03:35 hayes-02 qarshd[7485]: Running cmdline: /usr/tests/sts-rhel8.2/bin/checkit -w /mnt/merge -f /tmp/origv
Feb 27 11:03:35 hayes-02 kernel: XFS (dm-5): Metadata corruption detected at xfs_da3_node_read_verify+0x8b/0x160 [xfs
Feb 27 11:03:35 hayes-02 kernel: XFS (dm-5): Unmount and run xfs_repair
Feb 27 11:03:35 hayes-02 kernel: XFS (dm-5): First 128 bytes of corrupted metadata buffer:
Feb 27 11:03:35 hayes-02 kernel: 00000000: 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d  MMMMMMMMMMMMMMMM
Feb 27 11:03:35 hayes-02 kernel: 00000010: 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d  MMMMMMMMMMMMMMMM
Feb 27 11:03:35 hayes-02 kernel: 00000020: 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d  MMMMMMMMMMMMMMMM
Feb 27 11:03:35 hayes-02 kernel: 00000030: 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d  MMMMMMMMMMMMMMMM
Feb 27 11:03:35 hayes-02 kernel: 00000040: 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d  MMMMMMMMMMMMMMMM
Feb 27 11:03:35 hayes-02 kernel: 00000050: 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d  MMMMMMMMMMMMMMMM
Feb 27 11:03:35 hayes-02 kernel: 00000060: 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d  MMMMMMMMMMMMMMMM
Feb 27 11:03:35 hayes-02 kernel: 00000070: 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d 4d  MMMMMMMMMMMMMMMM
Feb 27 11:03:35 hayes-02 kernel: XFS (dm-5): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x55138 len 8 er7
These bugs still exist with the latest msnitzer kernel (4.18.0-178.el8.bz1798631.x86_64) as well.
Mikulas, it looks like the patch in comment 11 didn't resolve the issue. We're trying to decide whether attaching writecache to an active LV should be disabled in lvm for 8.2, or whether you think a kernel fix is still possible.
Looks like this can be avoided by providing a non-default block size (512) at conversion time. I'll remove the Testblocker flag since we can work around this.

# Use the default block size
[root@hayes-03 ~]# lvs -a -o +devices
  LV                 VG                Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  cache_during_fs_io writecache_sanity -wi------- 2.00g                                                     /dev/sdn1(0)
  cworigin           writecache_sanity -wi-ao---- 4.00g                                                     /dev/sdk1(0)
[root@hayes-03 ~]# pvscan
  PV /dev/sdb1   VG writecache_sanity   lvm2 [446.62 GiB / 446.62 GiB free]
  PV /dev/sdk1   VG writecache_sanity   lvm2 [<1.82 TiB / 1.81 TiB free]
  PV /dev/sdl1   VG writecache_sanity   lvm2 [<1.82 TiB / <1.82 TiB free]
  PV /dev/sdn1   VG writecache_sanity   lvm2 [<1.82 TiB / <1.82 TiB free]
  Total: 4 [5.89 TiB] / in use: 4 [5.89 TiB] / in no VG: 0 [0   ]
[root@hayes-03 ~]# dmsetup table writecache_sanity/cworigin
0 8388608 linear 8:161 2048
[root@hayes-03 ~]# xfs_info /dev/writecache_sanity/cworigin
meta-data=/dev/mapper/writecache_sanity-cworigin isize=512    agcount=4, agsize=262144 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=1048576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# Use default block_size, cause corruption
[root@hayes-03 ~]# lvconvert --yes --type writecache --cachesettings 'low_watermark=26 high_watermark=69 writeback_jobs=2649 autocommit_blocks=1616 autocommit_time=2443' --cachevol writecache_sanity/cache_during_fs_io writecache_sanity/cworigin
  Logical volume writecache_sanity/cworigin now has write cache.

Mar  5 15:57:06 hayes-03 kernel: device-mapper: writecache: I/O is not aligned, sector 4214122, size 1024, block size 6
Mar  5 15:57:06 hayes-03 kernel: XFS (dm-0): metadata I/O error in "xlog_iodone" at daddr 0x404d6a len 64 error 5
Mar  5 15:57:06 hayes-03 kernel: XFS (dm-0): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c2
Mar  5 15:57:06 hayes-03 kernel: XFS (dm-0): Log I/O Error Detected.
Shutting down filesystem
Mar  5 15:57:06 hayes-03 kernel: XFS (dm-0): Please unmount the filesystem and rectify the problem(s)

[root@hayes-03 ~]# dmsetup table writecache_sanity/cworigin
0 8388608 writecache s 253:2 253:1 4096 10 high_watermark 69 low_watermark 26 writeback_jobs 2649 autocommit_blocks 1616 autocommit_time 2443
[root@hayes-03 ~]# xfs_info /dev/writecache_sanity/cworigin
/mnt/cworigin: Input/output error

# Start over, and provide a non-default block size of 512
[root@hayes-03 ~]# df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/writecache_sanity-cworigin  4.0G  416M  3.6G  11% /mnt/cworigin
[root@hayes-03 ~]# lvs -a -o +devices
  LV                 VG                Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  cache_during_fs_io writecache_sanity -wi------- 2.00g                                                     /dev/sdn1(0)
  cworigin           writecache_sanity -wi-ao---- 4.00g                                                     /dev/sdk1(0)
[root@hayes-03 ~]# pvscan
  PV /dev/sdb1   VG writecache_sanity   lvm2 [446.62 GiB / 446.62 GiB free]
  PV /dev/sdk1   VG writecache_sanity   lvm2 [<1.82 TiB / 1.81 TiB free]
  PV /dev/sdl1   VG writecache_sanity   lvm2 [<1.82 TiB / <1.82 TiB free]
  PV /dev/sdn1   VG writecache_sanity   lvm2 [<1.82 TiB / <1.82 TiB free]
  Total: 4 [5.89 TiB] / in use: 4 [5.89 TiB] / in no VG: 0 [0   ]
[root@hayes-03 ~]# dmsetup table writecache_sanity/cworigin
0 8388608 linear 8:161 2048
[root@hayes-03 ~]# xfs_info /dev/writecache_sanity/cworigin
meta-data=/dev/mapper/writecache_sanity-cworigin isize=512    agcount=4, agsize=262144 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=1048576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# Provide 512 block_size, *no* corruption
[root@hayes-03 ~]# lvconvert --yes --type writecache --cachesettings 'block_size=512 low_watermark=26 high_watermark=69 writeback_jobs=2649 autocommit_blocks=1616 autocommit_time=2443' --cachevol writecache_sanity/cache_during_fs_io writecache_sanity/cworigin
  Logical volume writecache_sanity/cworigin now has write cache.
[root@hayes-03 ~]# dmsetup table writecache_sanity/cworigin
0 8388608 writecache s 253:2 253:1 512 10 high_watermark 69 low_watermark 26 writeback_jobs 2649 autocommit_blocks 1616 autocommit_time 2443
[root@hayes-03 ~]# xfs_info /dev/writecache_sanity/cworigin
meta-data=/dev/mapper/writecache_sanity-cworigin isize=512    agcount=4, agsize=262144 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=1048576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
We only thought of this issue previously as it applies to mounting the file system. In that case xfs won't mount; the user can then detach the writecache and go back to using the fs without the writecache. We didn't recognize that attaching the writecache to an active LV with a mounted fs would not give us that simple error path to catch and avoid this combination. The man page notes regarding this as it relates to mounting are copied below.

The question now is how easily lvm could detect this invalid combination when the user attempts to attach writecache to an active LV so that we can refuse. Without an easy solution for that, we should disable attaching writecache to active LVs.

dm-writecache block size

The dm-writecache block size can be 4096 bytes (the default), or 512 bytes. The default 4096 has better performance and should be used except when 512 is necessary for compatibility. The dm-writecache block size is specified with --cachesettings block_size=4096|512 when caching is started.

When a file system like xfs already exists on the main LV prior to caching, and the file system is using a block size of 512, then the writecache block size should be set to 512. (The file system will likely fail to mount if a writecache block size of 4096 is used in this case.) Check the xfs sector size while the fs is mounted:

$ xfs_info /dev/vg/main
Look for sectsz=512 or sectsz=4096

The writecache block size should be chosen to match the xfs sectsz value.

It is also possible to specify a sector size of 4096 to mkfs.xfs when creating the file system. In this case the writecache block size of 4096 can be used.
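Putting the man page guidance above together, a minimal sketch of the safe sequence (assuming a hypothetical vg/main with an existing xfs filesystem mounted at /mnt/main, and a cachevol vg/fast):

# check the fs sector size while it's still mounted
xfs_info /dev/vg/main | grep sectsz     # e.g. sectsz=512

# deactivate before attaching the cache, then match block_size to sectsz
umount /mnt/main
lvchange -an vg/main vg/fast
lvconvert --yes --type writecache --cachesettings 'block_size=512' --cachevol vg/fast vg/main
lvchange -ay vg/main
mount /dev/vg/main /mnt/main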
> The question now is how easily lvm could detect this invalid combination when
> the user attempts to attach writecache to an active LV so that we can refuse.
> Without an easy solution for that we should disable attaching writecache to
> active LVs.

libblkid (starting with version 2.35) can detect the filesystem block size for the majority of filesystems. So we should backport it to RHEL8 and use it in lvm to select the default block size when attaching writecache or integrity to an existing logical volume.
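For reference, a sketch of what that could look like from the command line (assumes util-linux >= 2.35 and the hypothetical vg/main and vg/fast from above; BLOCK_SIZE is the libblkid superblock tag for the filesystem block size -- treat the exact tag name as an assumption):

# probe the filesystem's block size from its superblock
fsbs=$(blkid --probe --output value --match-tag BLOCK_SIZE /dev/vg/main)

# fall back to the 4096 default when the fs doesn't report one
lvconvert --yes --type writecache --cachesettings "block_size=${fsbs:-4096}" --cachevol vg/fast vg/main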
This bug has become confusing with different bugs/issues:

1. bug when taking a snapshot of a writecache LV. There was a kernel patch for this which probably works, but we disabled writecache snapshots in lvm based on the uncertainty caused by the other points. We could probably reenable writecache snapshots, but I don't know if we want to take the time to revisit that.

2. bug when adding writecache to an active LV. This can be avoided by users doing the right thing, but it's so easy to do the wrong thing, and the consequences are severe enough, that we should disable adding writecache to an active LV for 8.2.

3. bug 1630978 should be used to enable active conversions to writecache in 8.3 by using libblkid to select the correct block size.

Which issues should this bug be used for?
I think this bug should be used for #1. We move this to ON_QA and verify that in 8.2 snapshots of writecache volumes are not allowed. We then create an RFE in 8.3 (or whenever) to enable them again.

For #2, we should be using/commenting on bug 1808012. I wrongly added comment #18 (and then subsequently #19) to this bug when it should have been added to 1808012. If we turn off active writecache conversion, then bug 1808012 can be used to verify that it's no longer allowed, and similar to this bug, we can open a new one in 8.3 to enable it again.
Using this bug to disable snapshots of writecache LVs in 8.2.
Verifying what is mentioned in comments #22 and #23, that snapshots are *not* allowed in the latest rpms. This is *not* a verification that the corruption has been fixed.

kernel-4.18.0-187.el8    BUILT: Fri Mar  6 22:09:03 CST 2020
lvm2-2.03.08-2.el8    BUILT: Mon Feb 24 11:21:38 CST 2020
lvm2-libs-2.03.08-2.el8    BUILT: Mon Feb 24 11:21:38 CST 2020
device-mapper-1.02.169-2.el8    BUILT: Mon Feb 24 11:21:38 CST 2020
device-mapper-libs-1.02.169-2.el8    BUILT: Mon Feb 24 11:21:38 CST 2020
device-mapper-event-1.02.169-2.el8    BUILT: Mon Feb 24 11:21:38 CST 2020
device-mapper-event-libs-1.02.169-2.el8    BUILT: Mon Feb 24 11:21:38 CST 2020

[root@hayes-01 ~]# lvs -a -o +devices
  LV                 VG                Attr       LSize Pool        Origin            Data%  Meta%  Devices
  cworigin           writecache_sanity Cwi-aoC--- 4.00g [pool_cvol] [cworigin_wcorig] 18.81         cworigin_wcorig(0)
  [cworigin_wcorig]  writecache_sanity owi-aoC--- 4.00g                                             /dev/sdb1(0)
  [pool_cvol]        writecache_sanity Cwi-aoC--- 2.00g                                             /dev/sdd1(0)
[root@hayes-01 ~]# lvcreate -s /dev/writecache_sanity/cworigin -c 64 -n merge -L 500M
  Snapshots of writecache are not supported.
Perhaps this bug was already fixed with this patch: https://www.redhat.com/archives/dm-devel/2020-April/msg00159.html Could someone recheck this?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1881