Bug 1719469
Summary: | cmirror gets stuck once failed non primary leg is repaired resync'ed and then written to. | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> |
Component: | lvm2 | Assignee: | Heinz Mauelshagen <heinzm> |
lvm2 sub component: | Clustered Mirror / cmirrord | QA Contact: | cluster-qe <cluster-qe> |
Status: | CLOSED DUPLICATE | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, teigland, zkabelac |
Version: | 7.7 | Keywords: | Regression |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-06-19 21:14:46 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1711427 | ||
Attachments: |
Description
Corey Marthaler
2019-06-11 21:43:03 UTC
This existed in rhel7.6 as well. 3.10.0-957.el7.x86_64 lvm2-2.02.180-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 lvm2-libs-2.02.180-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 lvm2-cluster-2.02.180-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 lvm2-lockd-2.02.180-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 lvm2-python-boom-0.9-11.el7 BUILT: Mon Sep 10 04:49:22 CDT 2018 cmirror-2.02.180-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-1.02.149-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-libs-1.02.149-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-event-1.02.149-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-event-libs-1.02.149-8.el7 BUILT: Mon Sep 10 04:45:22 CDT 2018 device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 05:07:18 CST 2017 Waiting until all mirror|raid volumes become fully syncd... 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) sync percent for mirror_1 hasn't changed in the past minute 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) 0/1 mirror(s) are fully synced: ( 99.51% ) [...] This doesn't require re'enabling the failed device and bringing it back into the VG, the deadlock and gfs withdraw happens before that takes place. This issue does not exist in rhel7.5, so it appears to be a regression from 7.5 to 7.6. I ran 20 iterations of this scenario on 7.5 and saw no issues, adding "Regression" keyword. 3.10.0-862.el7.x86_64 lvm2-2.02.177-4.el7 BUILT: Fri Feb 16 06:22:31 CST 2018 lvm2-libs-2.02.177-4.el7 BUILT: Fri Feb 16 06:22:31 CST 2018 lvm2-cluster-2.02.177-4.el7 BUILT: Fri Feb 16 06:22:31 CST 2018 cmirror-2.02.177-4.el7 BUILT: Fri Feb 16 06:22:31 CST 2018 device-mapper-1.02.146-4.el7 BUILT: Fri Feb 16 06:22:31 CST 2018 device-mapper-libs-1.02.146-4.el7 BUILT: Fri Feb 16 06:22:31 CST 2018 device-mapper-event-1.02.146-4.el7 BUILT: Fri Feb 16 06:22:31 CST 2018 device-mapper-event-libs-1.02.146-4.el7 BUILT: Fri Feb 16 06:22:31 CST 2018 device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 05:07:18 CST 2017 Here's an attempt to reproduce the same kind of test. I don't see any problems below or in /var/log/messages (which contains a lot of cmirror debugging), but I'm not entirely sure what I should be looking for. I can provide lvm or clvmd or cmirror debugging messages if anyone wants to check for anything specific. Create cmirror LV ----------------- [root@null-02 ~]# lvcreate --type mirror --mirrorlog disk -m 2 -n gv2 -L1G cc Logical volume "gv2" created. [root@null-02 ~]# dmsetup status cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_1: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 3 253:4 253:5 253:6 1593/2048 1 AAA 3 clustered-disk 253:3 A cc-gv2_mlog: 0 8192 linear [root@null-02 ~]# dmsetup table cc-gv2_mimage_2: 0 2097152 linear 8:96 384 cc-gv2_mimage_1: 0 2097152 linear 8:64 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtJeYpbns2NcflWcUmtYfOqoZIBceeBRuP clustered-disk 253:3 1024 3 253:4 0 253:5 0 253:6 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:80 384 [root@null-02 ~]# lvs -a -o+devices cc LV VG Attr LSize Log Cpy%Sync Devices gv2 cc mwi-a-m--- 1.00g [gv2_mlog] 100.00 gv2_mimage_0(0),gv2_mimage_1(0),gv2_mimage_2(0) [gv2_mimage_0] cc iwi-aom--- 1.00g /dev/sdb(0) [gv2_mimage_1] cc iwi-aom--- 1.00g /dev/sde(0) [gv2_mimage_2] cc iwi-aom--- 1.00g /dev/sdg(0) [gv2_mlog] cc lwi-aom--- 4.00m /dev/sdf(0) [root@null-03 ~]# dmsetup status cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_1: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 3 253:4 253:5 253:6 2048/2048 1 AAA 3 clustered-disk 253:3 A cc-gv2_mlog: 0 8192 linear [root@null-03 ~]# dmsetup table cc-gv2_mimage_2: 0 2097152 linear 8:96 384 cc-gv2_mimage_1: 0 2097152 linear 8:64 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtJeYpbns2NcflWcUmtYfOqoZIBceeBRuP clustered-disk 253:3 1024 3 253:4 0 253:5 0 253:6 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:80 384 [root@null-03 ~]# lvs -a -o+devices cc LV VG Attr LSize Log Cpy%Sync Devices gv2 cc mwi-a-m--- 1.00g [gv2_mlog] 100.00 gv2_mimage_0(0),gv2_mimage_1(0),gv2_mimage_2(0) [gv2_mimage_0] cc iwi-aom--- 1.00g /dev/sdb(0) [gv2_mimage_1] cc iwi-aom--- 1.00g /dev/sde(0) [gv2_mimage_2] cc iwi-aom--- 1.00g /dev/sdg(0) [gv2_mlog] cc lwi-aom--- 4.00m /dev/sdf(0) Create file system ------------------ [root@null-02 ~]# mkfs.gfs2 -o align=0 -J 32M -j 2 -p lock_dlm -t test:gv2 /dev/cc/gv2 It appears to contain an existing filesystem (gfs2) /dev/cc/gv2 is a symbolic link to /dev/dm-7 This will destroy any data on /dev/dm-7 Are you sure you want to proceed? [y/n] y Discarding device contents (may take a while on large devices): Done Adding journals: Done Building resource groups: Done Creating quota file: Done Writing superblock and syncing: Done Device: /dev/cc/gv2 Block size: 4096 Device size: 1.00 GB (262144 blocks) Filesystem size: 1.00 GB (262142 blocks) Journals: 2 Journal size: 32MB Resource groups: 6 Locking protocol: "lock_dlm" Lock table: "test:gv2" UUID: a17fbd83-72cc-4224-b6d4-a8cfe8a2d41e Mount and write to file system ------------------------------ [root@null-02 ~]# mount /dev/cc/gv2 /mnt [root@null-02 ~]# cd /mnt [root@null-02 mnt]# rsync -a --exclude *.o /root/lvm.git/ lvm-02.git/ [root@null-03 ~]# mount /dev/cc/gv2 /mnt [root@null-03 ~]# cd /mnt [root@null-03 mnt]# rsync -a --exclude *.o /root/lvm.git/ lvm-03.git/ Fail device for one mirror leg ------------------------------ [root@null-02 ~]# echo offline > /sys/block/sde/device/state [root@null-03 ~]# echo offline > /sys/block/sde/device/state Write to FS to trigger IO errors and recovery ---------------------------------------------- [root@null-02 ~]# dd if=/dev/zero of=/mnt/1M.2 bs=1M ^C^C^C168+0 records in 168+0 records out 176160768 bytes (176 MB) copied, 11.7896 s, 14.9 MB/s [root@null-03 ~]# dd if=/dev/zero of=/mnt/1M.3 bs=1M count=1 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.00144624 s, 725 MB/s [root@null-03 ~]# ls -l /mnt total 173428 -rw-r--r--. 1 root root 176160768 Jun 17 06:41 1M.2 -rw-r--r--. 1 root root 1048576 Jun 17 06:47 1M.3 drwxrwxr-x. 25 3890 3890 3864 Jun 17 06:33 lvm-02.git drwxrwxr-x. 25 3890 3890 3864 Jun 17 06:39 lvm-03.git Check for failed mirror leg --------------------------- [root@null-02 ~]# dmsetup table cc-gv2_mimage_2: 0 2097152 linear 8:96 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtJeYpbns2NcflWcUmtYfOqoZIBceeBRuP clustered-disk 253:3 1024 2 253:4 0 253:6 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:80 384 [root@null-02 ~]# dmsetup status cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 2 253:4 253:6 2048/2048 1 AA 3 clustered-disk 253:3 A cc-gv2_mlog: 0 8192 linear [root@null-02 ~]# lvs -a -o+devices cc Couldn't find device with uuid zkOVWF-FQOi-XodY-u2z3-iCAs-Aj3u-PekSvj. LV VG Attr LSize Log Cpy%Sync Devices gv2 cc mwi-aom--- 1.00g [gv2_mlog] 100.00 gv2_mimage_0(0),gv2_mimage_2(0) [gv2_mimage_0] cc iwi-aom--- 1.00g /dev/sdb(0) [gv2_mimage_2] cc iwi-aom--- 1.00g /dev/sdg(0) [gv2_mlog] cc lwi-aom--- 4.00m /dev/sdf(0) [root@null-03 ~]# dmsetup table cc-gv2_mimage_2: 0 2097152 linear 8:96 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtJeYpbns2NcflWcUmtYfOqoZIBceeBRuP clustered-disk 253:3 1024 2 253:4 0 253:6 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:80 384 [root@null-03 ~]# dmsetup status cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 2 253:4 253:6 2048/2048 1 AA 3 clustered-disk 253:3 A cc-gv2_mlog: 0 8192 linear [root@null-03 ~]# lvs -a -o+devices cc Couldn't find device with uuid zkOVWF-FQOi-XodY-u2z3-iCAs-Aj3u-PekSvj. LV VG Attr LSize Log Cpy%Sync Devices gv2 cc mwi-aom--- 1.00g [gv2_mlog] 100.00 gv2_mimage_0(0),gv2_mimage_2(0) [gv2_mimage_0] cc iwi-aom--- 1.00g /dev/sdb(0) [gv2_mimage_2] cc iwi-aom--- 1.00g /dev/sdg(0) [gv2_mlog] cc lwi-aom--- 4.00m /dev/sdf(0) In my previous comment, mirror_image_fault_policy was "remove". I changed it to "allocate" and things no longer work. After failing the device I watched the dm state on both machines and copied the states below. Perhaps the sequence of changes is meaningful to someone. If anyone is interested, I have clvmd debugging (with lvm debugging included) in a terminal that I can search through, but didn't manage to get it saved to a file. DM info from null-02 -------------------- [root@null-02 ~]# echo offline > /sys/block/sdg/device/state [root@null-02 ~]# dmsetup table cc-gv2_mimage_3: 0 2097152 linear 8:64 384 cc-gv2_mimage_2: 0 2097152 linear 8:80 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 3 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtYFmV17rBFakydtKSV5eRzwh4yZx8ePQo clustered-core 1024 2 253:5 0 253:8 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:32 384 cc-gv2_mimagetmp_3: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt21VQncG1QyZhJVpht3gfOAfjd8vZUqdT clustered-disk 253:3 1024 2 253:4 0 253:6 0 1 handle_errors [root@null-02 ~]# dmsetup status cc-gv2_mimage_3: 0 2097152 linear cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 2 253:5 253:8 2048/2048 1 AA 1 clustered-core cc-gv2_mlog: 0 8192 linear cc-gv2_mimagetmp_3: 0 2097152 mirror 2 253:4 253:6 2048/2048 1 AA 3 clustered-disk 253:3 A [root@null-02 ~]# dmsetup status cc-gv2_mimage_3: 0 2097152 linear cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 2 253:5 253:8 2047/2048 1 RA 1 clustered-core cc-gv2_mlog: 0 8192 linear cc-gv2_mimagetmp_3: 0 2097152 error [root@null-02 ~]# dmsetup table cc-gv2_mimage_3: 0 2097152 linear 8:64 384 cc-gv2_mimage_2: 0 2097152 linear 8:80 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 3 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtYFmV17rBFakydtKSV5eRzwh4yZx8ePQo clustered-core 1024 2 253:5 0 253:8 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:32 384 cc-gv2_mimagetmp_3: 0 2097152 error [root@null-02 ~]# dmsetup status cc-gv2_mimage_3: 0 2097152 linear cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 2 253:5 253:8 2047/2048 1 RA 1 clustered-core cc-gv2_mlog: 0 8192 linear cc-gv2_mimagetmp_3: 0 2097152 error [root@null-02 ~]# dmsetup table cc-gv2_mimage_3: 0 2097152 linear 8:64 384 cc-gv2_mimage_2: 0 2097152 linear 8:80 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 3 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtYFmV17rBFakydtKSV5eRzwh4yZx8ePQo clustered-core 1024 2 253:5 0 253:8 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:32 384 cc-gv2_mimagetmp_3: 0 2097152 error DM info from null-03 -------------------- [root@null-03 ~]# echo offline > /sys/block/sdg/device/state [root@null-03 ~]# dmsetup status cc-gv2_mimage_3: 0 2097152 linear cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 2 253:5 253:8 2048/2048 1 AA 1 clustered-core cc-gv2_mlog: 0 8192 linear cc-gv2_mimagetmp_3: 0 2097152 mirror 2 253:4 253:6 2048/2048 1 AA 3 clustered-disk 253:3 A [root@null-03 ~]# dmsetup table cc-gv2_mimage_3: 0 2097152 linear 8:64 384 cc-gv2_mimage_2: 0 2097152 linear 8:80 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt21VQncG1QyZhJVpht3gfOAfjd8vZUqdT clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:32 384 cc-gv2_mimagetmp_3: 0 2097152 error [root@null-03 ~]# dmsetup status cc-gv2_mimage_3: 0 2097152 linear cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 3 253:4 253:6 253:8 2048/2048 1 AAA userspace 1 COM_FAILURE cc-gv2_mlog: 0 8192 linear cc-gv2_mimagetmp_3: 0 2097152 error [root@null-03 ~]# dmsetup table cc-gv2_mimage_3: 0 2097152 linear 8:64 384 cc-gv2_mimage_2: 0 2097152 linear 8:80 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt21VQncG1QyZhJVpht3gfOAfjd8vZUqdT clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:32 384 cc-gv2_mimagetmp_3: 0 2097152 error [root@null-03 ~]# dmsetup status cc-gv2_mimage_3: 0 2097152 linear cc-gv2_mimage_2: 0 2097152 linear cc-gv2_mimage_0: 0 2097152 linear cc-gv2: 0 2097152 mirror 3 253:4 253:6 253:8 2048/2048 1 AAA userspace 1 COM_FAILURE cc-gv2_mlog: 0 8192 linear cc-gv2_mimagetmp_3: 0 2097152 error [root@null-03 ~]# dmsetup table cc-gv2_mimage_3: 0 2097152 linear 8:64 384 cc-gv2_mimage_2: 0 2097152 linear 8:80 384 cc-gv2_mimage_0: 0 2097152 linear 8:16 384 cc-gv2: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt21VQncG1QyZhJVpht3gfOAfjd8vZUqdT clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors cc-gv2_mlog: 0 8192 linear 8:32 384 cc-gv2_mimagetmp_3: 0 2097152 error Repeated this without gfs2 on top of the LV. I did dd to the gv3 to trigger the recovery. Before: gv3 cc mwi-a-m--- 1.00g [gv3_mlog] 100.00 gv3_mimage_0(0),gv3_mimage_1(0),gv3_mimage_2(0) [gv3_mimage_0] cc iwi-aom--- 1.00g /dev/sdb(256) [gv3_mimage_1] cc iwi-aom--- 1.00g /dev/sdf(256) [gv3_mimage_2] cc iwi-aom--- 1.00g /dev/sde(256) [gv3_mlog] cc lwi-aom--- 4.00m /dev/sdg(0) # echo offline > /sys/block/sdf/device/state After: gv3 cc mwi-a-m--- 1.00g [gv3_mlog] 100.00 gv3_mimage_0(0),gv3_mimage_2(0),gv3_mimage_3(0) [gv3_mimage_0] cc iwi-aom--- 1.00g /dev/sdb(256) [gv3_mimage_2] cc iwi-aom--- 1.00g /dev/sde(256) [gv3_mimage_3] cc iwi-aom--- 1.00g /dev/sdc(1) [gv3_mlog] cc lwi-aom--- 4.00m /dev/sdg(0) After: [root@null-02 ~]# dmsetup status cc-gv3_mimage_3: 0 2097152 linear cc-gv3_mimage_2: 0 2097152 linear cc-gv3_mimage_0: 0 2097152 linear cc-gv3: 0 2097152 mirror 3 253:4 253:6 253:8 2048/2048 1 AAA 3 clustered-disk 253:3 A cc-gv3_mlog: 0 8192 linear [root@null-02 ~]# dmsetup table cc-gv3_mimage_3: 0 2097152 linear 8:32 8576 cc-gv3_mimage_2: 0 2097152 linear 8:64 2097536 cc-gv3_mimage_0: 0 2097152 linear 8:16 2097536 cc-gv3: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDturrDY8xC8hBXeD7Obr1m3BywPXVmcCk4 clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors cc-gv3_mlog: 0 8192 linear 8:96 384 [root@null-03 ~]# dmsetup status cc-gv3_mimage_3: 0 2097152 linear cc-gv3_mimage_2: 0 2097152 linear cc-gv3_mimage_0: 0 2097152 linear cc-gv3: 0 2097152 mirror 3 253:4 253:6 253:8 2048/2048 1 AAA 3 clustered-disk 253:3 A cc-gv3_mlog: 0 8192 linear [root@null-03 ~]# dmsetup table cc-gv3_mimage_3: 0 2097152 linear 8:32 8576 cc-gv3_mimage_2: 0 2097152 linear 8:64 2097536 cc-gv3_mimage_0: 0 2097152 linear 8:16 2097536 cc-gv3: 0 2097152 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDturrDY8xC8hBXeD7Obr1m3BywPXVmcCk4 clustered-disk 253:3 1024 3 253:4 0 253:6 0 253:8 0 1 handle_errors cc-gv3_mlog: 0 8192 linear 8:96 384 Dave, 'dmsetup ls -tree' would be helpful to see the temporary deeper device trees which come with 'mirror' layout changes (e.g. during repair). Your examples in c#8 and c#9 show those deeper trees, i.e. clustered-core on top of clustered-disk and the new replacement leg. It can be deducted that mimage_3 is the new leg because the lvm2 'mirror' runtime does not do in-place replacements but rather increments suffixes addigng new legs. 'dmsetup ls --tree' would show those (interim) dependencies better. In c#9, the primary leg of the temporary clustered-core log 'mirror' takes read error(s) which it shouldn't, because that leg is composed of mimage_0 and mimage_2 forming a presumably intact clustered-disk log 'mirror'. So why does it through read error(s)? Solve the read error and you will solve the hang. When the read from the primary fails, the sync for that region fails. The sync process will move on and do the rest of the mirror, but it will never go back and redo the initial region - leaving the mirror in < 100% sync forever. Repeated the steps in comment 8, but I did not write to the file system after failing the device (I noticed that recovery had already been triggered.) In this case things seemed to work. Until now I hadn't noticed that dmeventd is leaving some messages about the event. Here's what I found this time: Jun 18 03:10:22 null-03 dmeventd[18342]: Secondary mirror device 253:5 has failed (D). Jun 18 03:10:22 null-03 dmeventd[18342]: Device failure in cc-gv4. Jun 18 03:10:22 null-03 dmeventd[18342]: WARNING: Not using lvmetad because of repair. Jun 18 03:10:22 null-03 dmeventd[18342]: Couldn't find device with uuid 9DqAOn-76Yc-asP1-cMyK-der4-pI9G-ZVyQGr. Jun 18 03:10:22 null-03 dmeventd[18342]: WARNING: Disabling lvmetad cache for repair command. Jun 18 03:10:27 null-03 dmeventd[18342]: Mirror status: 1 of 3 images failed. Jun 18 03:10:27 null-03 dmeventd[18342]: WARNING: Trying to up-convert to 3 images, 1 logs. Jun 18 03:10:27 null-03 dmeventd[18342]: /run/lvm/lvmpolld.socket: connect failed: No such file or directory Jun 18 03:10:27 null-03 dmeventd[18342]: WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld. Jun 18 03:10:27 null-03 dmeventd[18342]: WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state. Jun 18 03:10:27 null-03 dmeventd[18342]: cc/gv4: Converted: 1.17% Jun 18 03:10:42 null-03 dmeventd[18342]: cc/gv4: Converted: 69.92% Jun 18 03:10:57 null-03 dmeventd[18342]: cc/gv4: Converted: 100.00% Jun 18 03:11:01 null-03 dmeventd[18342]: Logical volume cc/gv4 converted. Jun 18 03:11:01 null-03 dmeventd[18342]: Repair of mirrored device cc-gv4 finished successfully. Jun 18 03:11:01 null-03 dmeventd[18342]: cc-gv4 is now in-sync. Looking back at these messages from previous tests, I do not see the lvmpolld warnnings, which means, I suspect, that previous instances were using lvmpolld. This could be a clue that using lvmpolld is a problem. Created attachment 1581749 [details]
clvmd debugging from null-02
for test in previous comment
Created attachment 1581750 [details]
clvmd debugging from null-03
I've switched over to the simpler test that Jon showed us, of just adding legs to an existing mirror, and looking for dm error messages in the kernel logs. This worked as a much simpler way of reproducing (assuming it's the same root issue .) [root@null-03 ~]# lvcreate --type mirror -n lv1 -L2G -m1 cc Logical volume "lv1" created. [root@null-03 ~]# lvconvert -m2 cc/lv1 Logical volume cc/lv1 being converted. /run/lvm/lvmpolld.socket: connect failed: No such file or directory WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld. WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state. cc/lv1: Converted: 0.59% cc/lv1: Converted: 25.00% cc/lv1: Converted: 49.22% cc/lv1: Converted: 73.24% cc/lv1: Converted: 97.66% cc/lv1: Converted: 100.00% Logical volume cc/lv1 converted. Jun 18 04:55:16 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv1 for events. Jun 18 04:55:16 null-03 kernel: [66824.862710] device-mapper: raid1: Unable to read primary mirror during recovery Jun 18 04:55:16 null-03 kernel: [66824.865747] device-mapper: raid1: Primary mirror (253:12) failed while out-of-sync: Reads may fail. Jun 18 04:55:16 null-03 kernel: [66824.870731] device-mapper: raid1: Primary mirror (253:12) failed while out-of-sync: Reads may fail. Jun 18 04:55:16 null-03 kernel: [66824.880482] device-mapper: raid1: Read failure on mirror device 253:12. Trying alternative device. Jun 18 04:55:16 null-03 kernel: [66824.880586] device-mapper: raid1: Mirror read failed. Jun 18 04:55:16 null-03 kernel: [66824.905576] Buffer I/O error on dev dm-11, logical block 524272, async page read Jun 18 04:55:19 null-03 dmeventd[18342]: Monitoring mirror device cc-lv1_mimagetmp_2 for events. Jun 18 04:55:19 null-03 dmeventd[18342]: Monitoring mirror device cc-lv1 for events. Jun 18 04:55:19 null-03 kernel: [66827.964811] device-mapper: raid1: Mirror read failed. Jun 18 04:55:19 null-03 kernel: [66827.970632] Buffer I/O error on dev dm-11, logical block 524272, async page read Jun 18 04:56:20 null-03 dmeventd[18342]: cc-lv1_mimagetmp_2 is now in-sync. Jun 18 04:56:21 null-03 dmeventd[18342]: cc-lv1 is now in-sync. Jun 18 04:56:39 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv1_mimagetmp_2 for events. Jun 18 04:56:39 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv1 for events. Jun 18 04:56:39 null-03 cmirrord[18281]: Log entry already exists: LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt9ejBHHU9CxFwcyRqzhdZQ0f9oaAOelKI Jun 18 04:56:39 null-03 cmirrord[18281]: clog_resume: Failed to create cluster CPG Jun 18 04:56:39 null-03 kernel: [66907.129116] device-mapper: raid1: log presuspend failed Jun 18 04:56:39 null-03 kernel: [66907.135853] device-mapper: raid1: log presuspend failed Jun 18 04:56:39 null-03 kernel: [66907.144906] device-mapper: raid1: log postsuspend failed Jun 18 04:56:39 null-03 dmeventd[18342]: Monitoring mirror device cc-lv1 for events. Jun 18 04:56:39 null-03 dmeventd[18342]: cc-lv1 is now in-sync. Jun 18 04:56:39 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv1 for events. Jun 18 04:56:42 null-03 dmeventd[18342]: Monitoring mirror device cc-lv1 for events. Jun 18 04:48:58 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1 for events. Jun 18 04:49:25 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv1 for events. Jun 18 04:49:25 null-02 kernel: [66516.889274] device-mapper: raid1: Unable to read primary mirror during recovery Jun 18 04:49:25 null-02 kernel: [66516.897430] device-mapper: raid1: Primary mirror (253:12) failed while out-of-sync: Reads may fail. Jun 18 04:49:25 null-02 kernel: [66516.907741] device-mapper: raid1: Mirror read failed. Jun 18 04:49:28 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1_mimagetmp_2 for events. Jun 18 04:49:28 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1 for events. Jun 18 04:49:28 null-02 kernel: [66520.035985] device-mapper: raid1: Mirror read failed. Jun 18 04:49:28 null-02 kernel: [66520.041901] Buffer I/O error on dev dm-11, logical block 524272, async page read Jun 18 04:49:28 null-02 kernel: [66520.061904] device-mapper: raid1: Mirror read failed. Jun 18 04:49:28 null-02 kernel: [66520.068474] device-mapper: raid1: Mirror read failed. Jun 18 04:49:28 null-02 kernel: [66520.074374] Buffer I/O error on dev dm-11, logical block 524272, async page read Jun 18 04:50:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv1_mimagetmp_2 for events. Jun 18 04:50:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv1 for events. Jun 18 04:50:47 null-02 kernel: [66599.133075] device-mapper: raid1: log presuspend failed Jun 18 04:50:47 null-02 kernel: [66599.151707] device-mapper: raid1: log presuspend failed Jun 18 04:50:47 null-02 kernel: [66599.157967] device-mapper: raid1: log postsuspend failed Jun 18 04:50:47 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1 for events. Jun 18 04:50:48 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv1 for events. Jun 18 04:50:51 null-02 dmeventd[18432]: Monitoring mirror device cc-lv1 for events. I have clvmd and cmirror debug logs that I will attach next. Created attachment 1581780 [details] clvmd debug output from null-02 related to comment 16 Created attachment 1581781 [details] clvmd debug output from null-03 related to comment 16 Created attachment 1581783 [details] cmirrord debug output from null-02 related to comment 16 Repeated comment 16 using lvm 2.02.176 so that working/non-working debug logs can be compared. [root@null-02 ~]# lvcreate --type mirror -n lv3 -L2G -m1 cc Logical volume "lv3" created. [root@null-02 ~]# lvs -a -o+devices cc LV VG Attr LSize Log Cpy%Sync Devices lv3 cc mwi-a-m--- 2.00g [lv3_mlog] 100.00 lv3_mimage_0(0),lv3_mimage_1(0) [lv3_mimage_0] cc iwi-aom--- 2.00g /dev/sdb(0) [lv3_mimage_1] cc iwi-aom--- 2.00g /dev/sdg(0) [lv3_mlog] cc lwi-aom--- 4.00m /dev/sde(0) [root@null-02 ~]# lvconvert -m2 cc/lv3 Logical volume cc/lv3 being converted. /run/lvm/lvmpolld.socket: connect failed: No such file or directory WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld. WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state. cc/lv3: Converted: 1.17% cc/lv3: Converted: 35.94% cc/lv3: Converted: 70.12% cc/lv3: Converted: 100.00% Logical volume cc/lv3 converted. [root@null-02 ~]# lvs -a -o+devices cc LV VG Attr LSize Log Cpy%Sync Devices lv3 cc mwi-a-m--- 2.00g [lv3_mlog] 100.00 lv3_mimage_0(0),lv3_mimage_1(0),lv3_mimage_2(0) [lv3_mimage_0] cc iwi-aom--- 2.00g /dev/sdb(0) [lv3_mimage_1] cc iwi-aom--- 2.00g /dev/sdg(0) [lv3_mimage_2] cc iwi-aom--- 2.00g /dev/sdf(0) [lv3_mlog] cc lwi-aom--- 4.00m /dev/sde(0) Jun 18 05:40:40 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3 for events. Jun 18 05:42:01 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv3 for events. Jun 18 05:42:01 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3_mimagetmp_2 for events. Jun 18 05:42:01 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3 for events. Jun 18 05:42:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv3_mimagetmp_2 for events. Jun 18 05:42:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv3 for events. Jun 18 05:42:47 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3 for events. Jun 18 05:42:47 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv3 for events. Jun 18 05:42:50 null-02 dmeventd[18432]: Monitoring mirror device cc-lv3 for events. Jun 18 05:46:31 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3 for events. Jun 18 05:47:33 null-03 dmeventd[18342]: cc-lv3 is now in-sync. Jun 18 05:47:52 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv3 for events. Jun 18 05:47:52 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3_mimagetmp_2 for events. Jun 18 05:47:53 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3 for events. Jun 18 05:48:36 null-03 dmeventd[18342]: cc-lv3 is now in-sync. Jun 18 05:48:38 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv3_mimagetmp_2 for events. Jun 18 05:48:38 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv3 for events. Jun 18 05:48:38 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3 for events. Jun 18 05:48:38 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv3 for events. Jun 18 05:48:41 null-03 dmeventd[18342]: Monitoring mirror device cc-lv3 for events. Attaching clvmd debug logs next. Created attachment 1581799 [details] clvmd debug output from null-02 related to comment 20 Created attachment 1581800 [details] clvmd debug output from null-03 related to comment 20 Looks like I uploaded the wrong files in comments 17 and 18, I'll have to rerun the test and collect the output again. Repeating the test in comment 16 to recollect clvmd debug logs. It noticed that the kernel dm errors appear immediately after running the lvconvert -m2, so something is going wrong right at the start of the command. [root@null-02 ~]# lvcreate --type mirror -n lv4 -L2G -m1 cc Logical volume "lv4" created. [root@null-02 ~]# lvs -a -o+devices cc LV VG Attr LSize Log Cpy%Sync Devices lv4 cc mwi-a-m--- 2.00g [lv4_mlog] 100.00 lv4_mimage_0(0),lv4_mimage_1(0) [lv4_mimage_0] cc iwi-aom--- 2.00g /dev/sdb(0) [lv4_mimage_1] cc iwi-aom--- 2.00g /dev/sdg(0) [lv4_mlog] cc lwi-aom--- 4.00m /dev/sde(0) [root@null-02 ~]# dmsetup ls --tree cc-lv4 (253:6) ├─cc-lv4_mimage_1 (253:5) │ └─ (8:96) ├─cc-lv4_mimage_0 (253:4) │ └─ (8:16) └─cc-lv4_mlog (253:3) └─ (8:64) [root@null-02 ~]# dmsetup table cc-lv4_mimage_1: 0 4194304 linear 8:96 384 cc-lv4: 0 4194304 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt1oI7Pq0KGDfvmvpS09DS7nop9GbSKP01 clustered-disk 253:3 1024 2 253:4 0 253:5 0 1 handle_errors cc-lv4_mimage_0: 0 4194304 linear 8:16 384 cc-lv4_mlog: 0 8192 linear 8:64 384 [root@null-02 ~]# lvconvert -m2 cc/lv4 Logical volume cc/lv4 being converted. /run/lvm/lvmpolld.socket: connect failed: No such file or directory WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld. WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state. cc/lv4: Converted: 0.59% cc/lv4: Converted: 34.96% cc/lv4: Converted: 68.75% cc/lv4: Converted: 100.00% Logical volume cc/lv4 converted. [root@null-02 ~]# dmsetup status rhel_null--02-swap: 0 16416768 linear rhel_null--02-root: 0 104857600 linear cc-lv4_mimage_2: 0 4194304 linear cc-lv4_mimage_1: 0 4194304 linear cc-lv4: 0 4194304 mirror 3 253:4 253:5 253:8 4096/4096 1 AAA 3 clustered-disk 253:3 A cc-lv4_mimage_0: 0 4194304 linear cc-lv4_mlog: 0 8192 linear rhel_null--02-home: 0 854466560 linear [root@null-02 ~]# dmsetup table rhel_null--02-swap: 0 16416768 linear 8:2 2048 rhel_null--02-root: 0 104857600 linear 8:2 870885376 cc-lv4_mimage_2: 0 4194304 linear 8:80 384 cc-lv4_mimage_1: 0 4194304 linear 8:96 384 cc-lv4: 0 4194304 mirror userspace 4 LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt1oI7Pq0KGDfvmvpS09DS7nop9GbSKP01 clustered-disk 253:3 1024 3 253:4 0 253:5 0 253:8 0 1 handle_errors cc-lv4_mimage_0: 0 4194304 linear 8:16 384 cc-lv4_mlog: 0 8192 linear 8:64 384 rhel_null--02-home: 0 854466560 linear 8:2 16418816 Jun 18 06:06:23 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4 for events. Jun 18 06:08:08 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv4 for events. Jun 18 06:08:08 null-02 kernel: [71239.777158] device-mapper: raid1: Unable to read primary mirror during recovery Jun 18 06:08:08 null-02 kernel: [71239.785568] device-mapper: raid1: Primary mirror (253:7) failed while out-of-sync: Reads may fail. Jun 18 06:08:08 null-02 kernel: [71239.797233] device-mapper: raid1: Mirror read failed. Jun 18 06:08:08 null-02 kernel: [71239.803463] device-mapper: raid1: Mirror read failed. Jun 18 06:08:08 null-02 kernel: [71239.809550] Buffer I/O error on dev dm-6, logical block 524272, async page read Jun 18 06:08:11 null-02 kernel: [71242.905454] device-mapper: raid1: Mirror read failed. Jun 18 06:08:11 null-02 kernel: [71242.912274] device-mapper: raid1: Mirror read failed. Jun 18 06:08:11 null-02 kernel: [71242.918414] Buffer I/O error on dev dm-6, logical block 524272, async page read Jun 18 06:08:11 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4_mimagetmp_2 for events. Jun 18 06:08:11 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4 for events. Jun 18 06:08:57 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv4_mimagetmp_2 for events. Jun 18 06:08:57 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv4 for events. Jun 18 06:08:57 null-02 kernel: [71288.625293] device-mapper: raid1: log presuspend failed Jun 18 06:08:57 null-02 kernel: [71288.632268] device-mapper: raid1: log presuspend failed Jun 18 06:08:57 null-02 kernel: [71288.638662] device-mapper: raid1: log postsuspend failed Jun 18 06:08:57 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4 for events. Jun 18 06:08:58 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv4 for events. Jun 18 06:09:01 null-02 dmeventd[18432]: Monitoring mirror device cc-lv4 for events. Jun 18 06:12:14 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4 for events. Jun 18 06:13:16 null-03 dmeventd[18342]: cc-lv4 is now in-sync. Jun 18 06:14:00 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv4 for events. Jun 18 06:14:00 null-03 kernel: [71548.124980] device-mapper: raid1: Unable to read primary mirror during recovery Jun 18 06:14:00 null-03 kernel: [71548.128941] device-mapper: raid1: Primary mirror (253:7) failed while out-of-sync: Reads may fail. Jun 18 06:14:00 null-03 kernel: [71548.132884] device-mapper: raid1: Primary mirror (253:7) failed while out-of-sync: Reads may fail. Jun 18 06:14:00 null-03 kernel: [71548.142573] device-mapper: raid1: Read failure on mirror device 253:7. Trying alternative device. Jun 18 06:14:00 null-03 kernel: [71548.162409] device-mapper: raid1: Mirror read failed. Jun 18 06:14:00 null-03 kernel: [71548.168098] Buffer I/O error on dev dm-6, logical block 524272, async page read Jun 18 06:14:03 null-03 kernel: [71551.224224] device-mapper: raid1: Mirror read failed. Jun 18 06:14:03 null-03 kernel: [71551.230041] Buffer I/O error on dev dm-6, logical block 524272, async page read Jun 18 06:14:03 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4_mimagetmp_2 for events. Jun 18 06:14:03 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4 for events. Jun 18 06:14:47 null-03 dmeventd[18342]: cc-lv4 is now in-sync. Jun 18 06:14:48 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv4_mimagetmp_2 for events. Jun 18 06:14:48 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv4 for events. Jun 18 06:14:48 null-03 cmirrord[18281]: Log entry already exists: LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDt1oI7Pq0KGDfvmvpS09DS7nop9GbSKP01 Jun 18 06:14:48 null-03 cmirrord[18281]: clog_resume: Failed to create cluster CPG Jun 18 06:14:48 null-03 kernel: [71596.993354] device-mapper: raid1: log presuspend failed Jun 18 06:14:48 null-03 kernel: [71597.007352] device-mapper: raid1: log presuspend failed Jun 18 06:14:49 null-03 kernel: [71597.017959] device-mapper: raid1: log postsuspend failed Jun 18 06:14:49 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4 for events. Jun 18 06:14:49 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv4 for events. Jun 18 06:14:52 null-03 dmeventd[18342]: Monitoring mirror device cc-lv4 for events. Created attachment 1581804 [details] clvmd debug output from null-02 related to comment 24 (using lv4) Created attachment 1581805 [details] clvmd debug output from null-03 related to comment 24 (using lv4) We can now compare clvmd debug from comment 21 (old and working) vs clvmd debug from comment 25 (current unworking). Others will have better insight into the differences than I do. I'm just looking for differences without much understanding of what's happening, and notice what seems to be a significant difference here: old: #activate/activate.c:1986 Monitoring cc/lv3 #activate/activate.c:1809 Monitored LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtUkZ6bljf2derSOBNfYrzL3XfXd52KIF9 for events new: #activate/activate.c:2015 Monitoring cc/lv4 with libdevmapper-event-lvm2mirror.so. #activate/activate.c:1824 Monitored LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtIABo3wqHQ4X2IlVFykgFEh466n32unJc for ev #activate/dev_manager.c:3230 Creating ACTIVATE origin-only tree for cc/lv4. (followed by a significant amount of activation code being run that does not appear in the old code) then later the same kind of thing, old: #activate/activate.c:1986 Monitoring cc/lv3_mimagetmp_2 #activate/activate.c:1809 Monitored LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtBZzuiGmdDcdhmygdrvdq3wnnW99x9t42 for events new: #activate/activate.c:2015 Monitoring cc/lv4_mimagetmp_2 with libdevmapper-event-lvm2mirror.so. #activate/activate.c:1824 Monitored LVM-oDBxuMigltJwjQtrHUMLSoz0XRbYMmDtkWX68H0Li66mMWTc39XtKRGRzwgADs2d for eve #activate/dev_manager.c:3230 Creating ACTIVATE origin-only tree for cc/lv4_mimagetmp_2. (followed by a significant amount of activation code being run that does not appear in the old code) Each time that Monitoring occurs, the old code seems to do nothing, but the new code executes some significant activation-related code. Following instructions from Zdenek, I reverted commit a8921be641afe865c177e11b8859f4b937f76995 and applied the patch from https://bugzilla.redhat.com/show_bug.cgi?id=1628529#c54 The problem appears to be gone. [root@null-02 ~]# lvcreate --type mirror -n lv5 -L2G -m1 cc Logical volume "lv5" created. [root@null-02 ~]# lvs -a -o+devices cc LV VG Attr LSize Log Cpy%Sync Devices lv5 cc mwi-a-m--- 2.00g [lv5_mlog] 100.00 lv5_mimage_0(0),lv5_mimage_1(0) [lv5_mimage_0] cc iwi-aom--- 2.00g /dev/sdb(0) [lv5_mimage_1] cc iwi-aom--- 2.00g /dev/sdg(0) [lv5_mlog] cc lwi-aom--- 4.00m /dev/sde(0) [root@null-02 ~]# lvconvert -m2 cc/lv5 Logical volume cc/lv5 being converted. /run/lvm/lvmpolld.socket: connect failed: No such file or directory WARNING: Failed to connect to lvmpolld. Proceeding with polling without using lvmpolld. WARNING: Check global/use_lvmpolld in lvm.conf or the lvmpolld daemon state. cc/lv5: Converted: 1.56% cc/lv5: Converted: 35.74% cc/lv5: Converted: 69.92% cc/lv5: Converted: 100.00% Logical volume cc/lv5 converted. [root@null-02 ~]# lvs -a -o+devices cc LV VG Attr LSize Log Cpy%Sync Devices lv5 cc mwi-a-m--- 2.00g [lv5_mlog] 100.00 lv5_mimage_0(0),lv5_mimage_1(0),lv5_mimage_2(0) [lv5_mimage_0] cc iwi-aom--- 2.00g /dev/sdb(0) [lv5_mimage_1] cc iwi-aom--- 2.00g /dev/sdg(0) [lv5_mimage_2] cc iwi-aom--- 2.00g /dev/sdf(0) [lv5_mlog] cc lwi-aom--- 4.00m /dev/sde(0) Jun 18 08:13:25 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5 for events. Jun 18 08:16:16 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv5 for events. Jun 18 08:16:16 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5_mimagetmp_2 for events. Jun 18 08:16:16 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5 for events. Jun 18 08:17:03 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv5_mimagetmp_2 for events. Jun 18 08:17:03 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv5 for events. Jun 18 08:17:03 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5 for events. Jun 18 08:17:03 null-02 dmeventd[18432]: No longer monitoring mirror device cc-lv5 for events. Jun 18 08:17:06 null-02 dmeventd[18432]: Monitoring mirror device cc-lv5 for events. Jun 18 08:19:16 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5 for events. Jun 18 08:20:17 null-03 dmeventd[18342]: cc-lv5 is now in-sync. Jun 18 08:22:08 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv5 for events. Jun 18 08:22:08 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5_mimagetmp_2 for events. Jun 18 08:22:08 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5 for events. Jun 18 08:22:52 null-03 dmeventd[18342]: cc-lv5 is now in-sync. Jun 18 08:22:54 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv5_mimagetmp_2 for events. Jun 18 08:22:54 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv5 for events. Jun 18 08:22:54 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5 for events. Jun 18 08:22:54 null-03 dmeventd[18342]: No longer monitoring mirror device cc-lv5 for events. Jun 18 08:22:57 null-03 dmeventd[18342]: Monitoring mirror device cc-lv5 for events. Created attachment 1581876 [details] clvmd debug output from null-02 related to comment 28 (lv5) Created attachment 1581877 [details] clvmd debug output from null-03 related to comment 28 (lv5) Pushed 'small' diff patch upstream: https://www.redhat.com/archives/lvm-devel/2019-June/msg00069.html I've tested this on my systems with the latest stable branch (top commit f3a87a2c2e339328ea4d7448f43d5317806a6b24) and it works again. No more 'primary read failure's. The fix in place for bug 1711427 addresses this bug. Things are getting stuck because of the 'primary read failure'. This isssue is distilled a bit better in bug 1711427. We will fix it there. *** This bug has been marked as a duplicate of bug 1711427 *** |