Bug 2203265 - protection against splitting off mirror legs during up conversion appears to be missing now when sync is close to completion with fast nvme devices
Summary: protection against splitting off mirror legs during up conversion appears to ...
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: lvm2
Version: 9.3
Hardware: x86_64
OS: Linux
high
medium
Target Milestone: rc
: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-05-11 16:49 UTC by Corey Marthaler
Modified: 2023-08-10 15:40 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-156992 0 None None None 2023-05-11 16:52:37 UTC

Description Corey Marthaler 2023-05-11 16:49:40 UTC
Description of problem:
This sort of operation used to be protected against, which makes me believe this is a regression.

[root@grant-03 ~]# lvcreate --yes -n fs_to_mirror -L 1.5G mirror_sanity
  Logical volume "fs_to_mirror" created.
[root@grant-03 ~]# lvs -a -o +devices
  LV           VG            Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  fs_to_mirror mirror_sanity -wi-a-----    1.50g                                                     /dev/nvme0n1p1(0)

[root@grant-03 ~]# lvconvert --type mirror -m 1 mirror_sanity/fs_to_mirror &
[1] 1560199
[root@grant-03 ~]#   
  Logical volume mirror_sanity/fs_to_mirror being converted.
  mirror_sanity/fs_to_mirror: Converted: 3.12%
  ABORTING: Mirror percentage check failed.

[1]+  Exit 5                  lvconvert --type mirror -m 1 mirror_sanity/fs_to_mirror

<WHILE ABOVE CMD IS RUNNING>
[root@grant-03 ~]# lvconvert --splitmirrors 1 --name new mirror_sanity/fs_to_mirror
  Logical volume mirror_sanity/fs_to_mirror converted.

[root@grant-03 ~]# lvs -a -o +devices
  LV           VG            Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  fs_to_mirror mirror_sanity -wi-a-----    1.50g                                                     /dev/nvme0n1p1(0)
  new          mirror_sanity -wi-a-----    1.50g                                                     /dev/nvme1n1p1(0)


May 11 18:31:15 grant-03 systemd[1]: Started LVM2 poll daemon.
May 11 18:31:16 grant-03 dmeventd[1559481]: mirror_sanity-fs_to_mirror is now in-sync.
May 11 18:31:20 grant-03 dmeventd[1559481]: No longer monitoring mirror device mirror_sanity-fs_to_mirror for events.
May 11 18:31:30 grant-03 lvmpolld[1560234]: W: #011LVPOLL: PID 1560237: STDERR: '  ABORTING: Mirror percentage check failed.'
May 11 18:31:30 grant-03 lvmpolld[1560234]: LVMPOLLD: lvm2 cmd (PID 1560237) failed (retcode: 5)
May 11 18:32:30 grant-03 systemd[1]: lvm2-lvmpolld.service: Deactivated successfully.


Version-Release number of selected component (if applicable):
kernel-5.14.0-284.11.1.el9_2    BUILT: Wed Apr 12 04:16:55 PM CEST 2023
lvm2-2.03.21-1.el9    BUILT: Fri Apr 21 02:33:33 PM CEST 2023
lvm2-libs-2.03.21-1.el9    BUILT: Fri Apr 21 02:33:33 PM CEST 2023


How reproducible:
Everytime

Comment 1 Corey Marthaler 2023-05-11 17:05:34 UTC
This appears to only happen with fast nvme devices. I haven't been able to reproduce this with SSDs. Also, if I bump the size of the mirror way up with the nvme devices, then I will see the "Unable to split mirror mirror_sanity/fs_to_mirror that is not in-sync." messages again. 

So this appears to be a timing issue where LVM thinks the sync is complete, or close to complete, but it's not really.

Comment 2 Heinz Mauelshagen 2023-05-22 12:34:01 UTC
(In reply to Corey Marthaler from comment #1)
> This appears to only happen with fast nvme devices. I haven't been able to
> reproduce this with SSDs. Also, if I bump the size of the mirror way up with
> the nvme devices, then I will see the "Unable to split mirror
> mirror_sanity/fs_to_mirror that is not in-sync." messages again. 
> 
> So this appears to be a timing issue where LVM thinks the sync is complete,
> or close to complete, but it's not really.

So it seems.

FYI: we've seen similar issues with dm-raid on (small) LVs backed by fast PVs whch have been fixed a few years ago.

No reproducer in VM here (virtio-scsi -> host thin -> host PCIe4 NVMe):

# lvconvert -y --ty mirror -m1 t/t&lvconvert -y --splitmirrors 1 --name n t/t
[1] 5089
  Logical volume t/t being converted.
  Unable to split mirror t/t that is not in-sync.
  t/t: Converted: 75.00%

# lvs
  LV   VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log      Cpy%Sync Convert
  root rhel_vm45 -wi-ao---- <17.00g                                                         
  swap rhel_vm45 -wi-ao----   2.00g                                                         
  t    t         mwi-a-m--- 128.00m                                [t_mlog] 100.00          

# lvconvert -m0 t  t/t: Converted: 100.00%
/t
  Logical volume t/t converted.
[root@vm45 ~]# lvs
  LV   VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root rhel_vm45 -wi-ao---- <17.00g                                                    
  swap rhel_vm45 -wi-ao----   2.00g                                                    
  t    t         -wi-a----- 128.00m                                                    

# lvconvert -y --ty mirror -m1 t/t&sleep 0.1;lvconvert -y --splitmirrors 1 --name n t/t
[2] 5404
  Logical volume t/t being converted.
[1]   Done                    lvconvert -y --ty mirror -m1 t/t
  Unable to split mirror t/t that is not in-sync.
  t/t: Converted: 71.88%


Note You need to log in before you can comment on or make changes to this bug.