Bug 1295562 - LVM snapshot does not get deleted after merging the snapshot on LVs that could not be unmounted and system needs to be rebooted for the snapshot to be merged.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ondrej Kozina
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1203710 1295577 1313485 1328799
Reported: 2016-01-04 21:07 UTC by Nitin Yewale
Modified: 2021-09-03 12:51 UTC (History)

Fixed In Version: lvm2-2.02.152-1.el7
Doc Type: Bug Fix
Doc Text:
Due to a regression, lvm2 was unable to remove successfully merged snapshot LVs during autoactivation of logical volumes. Typically this occurred on system boot when the lvmetad caching daemon was enabled (which it is by default). With this fix applied, snapshot LVs are again removed correctly and the workaround mentioned in this bug is no longer needed.
Clone Of:
: 1328799 (view as bug list)
Environment:
Last Closed: 2016-11-04 04:13:51 UTC
Target Upstream Version:
Embargoed:




Links
System                             ID              Private  Priority  Status        Summary                              Last Updated
Red Hat Knowledge Base (Solution)  2111861         0        None      None          None                                 2016-01-04 21:31:20 UTC
Red Hat Product Errata             RHBA-2016:1445  0        normal    SHIPPED_LIVE  lvm2 bug fix and enhancement update  2016-11-03 13:46:41 UTC

Description Nitin Yewale 2016-01-04 21:07:51 UTC
Description of problem:
-------------------------

LVM snapshot does not get deleted after merging the snapshot on LVs that could not be mounted and where the system needs to be rebooted for the snapshot to be merged.

For example, the `/var` LV.

We need to restart `lvm2-monitor.service` to remove the snapshot. The merge itself works, though.

Version-Release number of selected component (if applicable):
-------------------------

# uname -a
Linux dhcp223.example.com 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
# rpm -qa |grep lvm2
lvm2-2.02.130-5.el7.x86_64
lvm2-libs-2.02.130-5.el7.x86_64

How reproducible:
-------------------------

Every time

Steps to Reproduce:
-------------------------

# mkdir /var/testdata

# cp /etc/a*  /etc/b* /var/testdata/

# ls -l /var/testdata/
total 32
-rw-r--r--. 1 root root    16 Jan  4 13:33 adjtime
-rw-r--r--. 1 root root  1518 Jan  4 13:33 aliases
-rw-r--r--. 1 root root 12288 Jan  4 13:33 aliases.db
-rw-------. 1 root root   541 Jan  4 13:33 anacrontab
-rw-r--r--. 1 root root    55 Jan  4 13:33 asound.conf
-rw-r--r--. 1 root root  2835 Jan  4 13:34 bashrc
# 


# lvcreate --size 300M --name snap --snapshot rhel/var


Copied some data to /var/testdata


# lvs -a -o +devices
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root rhel -wi-ao----  24.41g                                                     /dev/sda2(0)   
  snap rhel swi-a-s--- 300.00m      var    0.58                                    /dev/sda3(0)   
  swap rhel -wi-ao----   1.00g                                                     /dev/sda2(8250)
  var  rhel owi-aos---   7.81g                                                     /dev/sda2(6250)



# lvconvert --merge rhel/snap
  Logical volume rhel/var contains a filesystem in use.
  Can't merge over open origin volume.
  Merging of snapshot rhel/snap will occur on next activation of rhel/var.


# lvs -a -o +devices
  LV     VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root   rhel -wi-ao----  24.41g                                                     /dev/sda2(0)   
  [snap] rhel Swi-a-s--- 300.00m      var    100.00                                  /dev/sda3(0)   
  swap   rhel -wi-ao----   1.00g                                                     /dev/sda2(8250)
  var    rhel Owi-aos---   7.81g                                                     /dev/sda2(6250)
# 

After reboot

# lvs -a -o +devices
  LV     VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root   rhel -wi-ao----  24.41g                                                     /dev/sda2(0)   
  [snap] rhel Swi-a-s--- 300.00m      var    0.00                                    /dev/sda3(0)   
  swap   rhel -wi-ao----   1.00g                                                     /dev/sda2(8250)
  var    rhel Owi-aos---   7.81g                                                     /dev/sda2(6250)


# ls -l /var/testdata/
total 32
-rw-r--r--. 1 root root    16 Jan  4 13:33 adjtime
-rw-r--r--. 1 root root  1518 Jan  4 13:33 aliases
-rw-r--r--. 1 root root 12288 Jan  4 13:33 aliases.db
-rw-------. 1 root root   541 Jan  4 13:33 anacrontab
-rw-r--r--. 1 root root    55 Jan  4 13:33 asound.conf
-rw-r--r--. 1 root root  2835 Jan  4 13:34 bashrc


# systemctl restart lvm2-monitor.service

# lvs -a -o +devices
  LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root rhel -wi-ao---- 24.41g                                                     /dev/sda2(0)   
  swap rhel -wi-ao----  1.00g                                                     /dev/sda2(8250)
  var  rhel -wi-ao----  7.81g                                                     /dev/sda2(6250)



Actual results:

When we run `lvconvert --merge rhel/snap` and reboot the server, the snapshot LV is not removed, and we have to restart lvm2-monitor.service to remove it.

Expected results:

When we run `lvconvert --merge rhel/snap` and reboot the server, the snapshot LV should be removed.

Additional info:

The issue is not seen on RHEL 7.1.

RHEL7.1 

# rpm -qa |grep lvm2
lvm2-2.02.115-3.el7.x86_64
lvm2-libs-2.02.115-3.el7.x86_64

Linux dhcp162.example.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux


# mkdir /var/testdata

# cp /etc/c* /etc/d* /var/testdata/

# lvs -a -o +devices
  LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root rhel -wi-ao---- 19.53g                                                     /dev/sda2(0)   
  swap rhel -wi-ao----  1.00g                                                     /dev/sda2(6250)
  var  rhel -wi-ao----  4.88g                                                     /dev/sda2(5000)

# ls -l /var/testdata/
total 44
-rw-------. 1 root root     0 Jan  4 15:14 cron.deny
-rw-r--r--. 1 root root   451 Jan  4 15:14 crontab
-rw-------. 1 root root     0 Jan  4 15:14 crypttab
-rw-r--r--. 1 root root  1602 Jan  4 15:14 csh.cshrc
-rw-r--r--. 1 root root   841 Jan  4 15:14 csh.login
-rw-r--r--. 1 root root 25213 Jan  4 15:14 dnsmasq.conf
-rw-r--r--. 1 root root  1285 Jan  4 15:14 dracut.conf



# lvcreate --size 300M --name snap --snapshot rhel/var
  Logical volume "snap" created.

# lvs -a -o +devices
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root rhel -wi-ao----  19.53g                                                     /dev/sda2(0)   
  snap rhel swi-a-s--- 300.00m      var    0.00                                    /dev/sda3(0)   
  swap rhel -wi-ao----   1.00g                                                     /dev/sda2(6250)
  var  rhel owi-aos---   4.88g                                                     /dev/sda2(5000)


# cp -avr /etc/e* /etc/f* /etc/g* /etc/h* /var/testdata/


# ls /var/testdata/
cron.deny  csh.cshrc     dracut.conf  ethertypes   filesystems  gcrypt  group      grub.d    gss        hosts
crontab    csh.login     e2fsck.conf  exports      firewalld    gnupg   group-     gshadow   host.conf  hosts.allow
crypttab   dnsmasq.conf  environment  favicon.png  fstab        groff   grub2.cfg  gshadow-  hostname   hosts.deny


# lvs -a -o +devices
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root rhel -wi-ao----  19.53g                                                     /dev/sda2(0)   
  snap rhel swi-a-s--- 300.00m      var    0.11                                    /dev/sda3(0)   
  swap rhel -wi-ao----   1.00g                                                     /dev/sda2(6250)
  var  rhel owi-aos---   4.88g                                                     /dev/sda2(5000)



# lvconvert --merge rhel/snap
  Logical volume rhel/var contains a filesystem in use.
  Can't merge over open origin volume.
  Merging of snapshot rhel/snap will occur on next activation of rhel/var.


# lvs -a -o +devices
  LV     VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root   rhel -wi-ao----  19.53g                                                     /dev/sda2(0)   
  [snap] rhel Swi-a-s--- 300.00m      var    100.00                                  /dev/sda3(0)   
  swap   rhel -wi-ao----   1.00g                                                     /dev/sda2(6250)
  var    rhel Owi-aos---   4.88g                                                     /dev/sda2(5000)



After reboot


# lvs -a -o +devices
  LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices        
  root rhel -wi-ao---- 19.53g                                                     /dev/sda2(0)   
  swap rhel -wi-ao----  1.00g                                                     /dev/sda2(6250)
  var  rhel -wi-ao----  4.88g                                                     /dev/sda2(5000)


So this looks like a regression.
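To tell quickly whether a host already carries the build named in "Fixed In Version" (lvm2-2.02.152-1.el7), the version-release string can be compared against it. This is only a rough sketch: `has_fix` is a hypothetical helper name, and `sort -V` merely approximates RPM version ordering, so treat the answer as a hint rather than a definitive check.

```shell
#!/bin/sh
# Assumption: the first fixed build is the one listed in "Fixed In Version".
FIXED="2.02.152-1.el7"

# has_fix VERSION-RELEASE
# Succeeds if the argument sorts at or after $FIXED under GNU "sort -V".
has_fix() {
    oldest=$(printf '%s\n%s\n' "$FIXED" "$1" | sort -V | head -n 1)
    [ "$oldest" = "$FIXED" ]
}

# Report on the installed package only when rpm and lvm2 are both present.
if command -v rpm >/dev/null 2>&1 && rpm -q lvm2 >/dev/null 2>&1; then
    installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' lvm2 | head -n 1)
    if has_fix "$installed"; then
        echo "lvm2 $installed: fix expected to be present"
    else
        echo "lvm2 $installed: older than $FIXED, workaround may be needed"
    fi
fi
```

Note that this only answers "is the fix there"; it does not say which older builds are affected (2.02.115-3.el7 above predates the fix but does not show the problem).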

Comment 1 Nitin Yewale 2016-01-04 21:22:56 UTC
Description of problem:
-------------------------

LVM snapshot does not get deleted after merging the snapshot on LVs that could not be ***unmounted*** and where the system needs to be rebooted for the snapshot to be merged.

For example, the `/var` LV.

-------------------
s/mounted/unmounted

Comment 2 Mike Snitzer 2016-01-04 21:57:58 UTC
Could it be that the lvm2-monitor service wasn't running until after the merge completed?

Comment 3 Ondrej Kozina 2016-01-05 15:27:39 UTC
Hi Nitin,

Could you please verify that 'vgchange -ay rhel' or 'lvchange -ay rhel/var' is enough to fix this issue after the reboot (or after unmounting the fs residing on top of the origin volume rhel/var)? It may be that the lvm2-monitor service restart fixes it only as a side effect of rerunning the vgchange/lvchange command internally.

Also could you try to reproduce it (the whole reproducer) with 'use_lvmpolld = 0' in /etc/lvm/lvm.conf file?

(anyway I'm going to try to reproduce it locally myself)

Comment 4 Ondrej Kozina 2016-01-05 17:22:12 UTC
Reproduced locally. The lvm command fails to query the status of the kernel target when the actual snapshot merge had to be postponed until the origin LV was unmounted (that is, until the origin LV open count equals 0).

If you're not comfortable with restarting the lvm2-monitor service, you can trigger the snapshot LV cleanup by deactivating and reactivating the origin LV (with lvchange -an, lvchange -ay). What's not yet clear to me is why this doesn't work after a full system restart.

The bug manifests both with and without lvmpolld.

I'll add full analysis tomorrow.

Comment 5 Corey Marthaler 2016-01-05 18:15:00 UTC
We have this very test case as a part of our snapshot regression suite; however, we are masking/hacking around this problem by performing a refresh to remove the merged snapshot.

[root@host-109 ~]# lvs -a -o +devices
  LV             VG       Attr       LSize   Pool Origin Data% Devices        
  [merge_reboot] snapper  Swi-a-s---   1.00g      origin 0.00  /dev/sde1(1024)
  origin         snapper  Owi-a-s---   4.00g                   /dev/sde1(0)   

[root@host-109 ~]# vgchange --refresh snapper

[root@host-109 ~]# lvs -a -o +devices
  LV     VG       Attr       LSize   Pool Origin Data% Devices       
  origin snapper  -wi-a-----   4.00g                   /dev/sde1(0)  

We'll have to test w/o that once this gets fixed?


3.10.0-327.el7.x86_64
lvm2-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015

Comment 6 Ondrej Kozina 2016-01-06 16:27:39 UTC
Hi,

it's more complicated than I thought in the beginning. First of all, I found the commit responsible for the regression:

-----
commit c26d81d6e6939906729d91fae83cd8bbdd743bb7
Author: Ondrej Kozina <okozina>      <----!!!------
Date:   Wed Apr 8 12:05:14 2015 +0200

    toollib: do not spawn polling in lv_change_activate
    
    spawning a background polling from within the lv_change_activate
    fn went to two problems:
    
    1) vgchange should not spawn any background polling until after
       the whole activation process for a VG is finished. Otherwise
       it could lead to a duplicite request for spawning background
       polling. This statement was alredy true with one exception of
       mirror up-conversion polling (fixed by this commit).
    
    2) due to current conditions in lv_change_activate lvchange cmd
       couldn't start background polling for pvmove LVs if such LV was
       about to get activated by the command in the same time.
    
    This commit however doesn't alter the lvchange cmd so that it works same as
    vgchange with regard to not to spawn duplicate background pollings per
    unique LV.
----

Unfortunately I can't simply revert it, because I would reintroduce the bug it was supposed to fix.

What went wrong: this commit breaks snapshot merge on autoactivation during device discovery on boot. (This is the reason the snapshot does not get removed after reboot.) The autoactivation works only with lvmetad enabled. To test this regression you can simply run the following:

0) have lvmetad enabled in lvm.conf
1) create VG on single device (i.e.: sdx)
2) create origin lv
3) mount lv
4) create snapshot 'snap'
5) write some data to mounted origin lv
6) call lvconvert --merge vg/snap (you'll get the warning about deferred merge until open count == 0)
7) umount origin lv
8) deactivate whole vg
9) call pvscan --cache -aay major:minor (of sdx)

this will simulate the bug on autoactivation the customer has experienced.

expected result: origin lv in a VG is active and snapshot lv is removed after some time.
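The steps above can be sketched as a script. This is an untested outline under stated assumptions: the device, VG name, mount point, LV sizes, xfs as the filesystem and the copied file are all placeholders that do not come from this report, and step 0 (lvmetad enabled in lvm.conf) is left as a manual precondition. By default it only prints the commands; running it with DRY_RUN=0 is destructive for the chosen device.

```shell
#!/bin/sh
# Hypothetical names; point DEV at a disposable scratch device before using DRY_RUN=0.
DEV=${DEV:-/dev/sdx}
VG=${VG:-testvg}
MNT=${MNT:-/mnt/origin}
DRY_RUN=${DRY_RUN:-1}        # 1 = print commands only, 0 = really run them

# Print the command in dry-run mode, otherwise execute it and stop on failure.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

# major:minor of the device; a placeholder string in dry-run mode.
devno() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "MAJOR:MINOR"
    else
        lsblk -dno MAJ:MIN "$1" | tr -d ' '
    fi
}

reproduce() {
    # Step 0 (precondition, not automated): lvmetad enabled in lvm.conf.
    run vgcreate "$VG" "$DEV"                                      # step 1
    run lvcreate --size 1G --name origin "$VG"                     # step 2
    run mkfs.xfs "/dev/$VG/origin"
    run mkdir -p "$MNT"
    run mount "/dev/$VG/origin" "$MNT"                             # step 3
    run lvcreate --size 300M --name snap --snapshot "$VG/origin"   # step 4
    run cp -a /etc/hosts "$MNT/"                                   # step 5
    run lvconvert --merge "$VG/snap"                               # step 6
    run umount "$MNT"                                              # step 7
    run vgchange -an "$VG"                                         # step 8
    run pvscan --cache -aay "$(devno "$DEV")"                      # step 9
    run lvs -a "$VG"     # on an affected build [snap] is still listed
}

reproduce
```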

Now the harder thing. I strongly suspect it's not the only bug related to snapshot merge. For example, when I call vgchange -ay vg while the 'vg' is still active, I receive errors in the lvmpolld log about not being able to query the snapshot merge state.

And yes, lvchange --refresh vg/origin is a much saner workaround for the time being. Thanks Corey!
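As an illustration of the refresh workaround, the leftover snapshots can be picked out of `lvs` output by their attribute string: in the listings in this report the stuck snapshot shows an Attr starting with "S" (merging snapshot) and its origin one starting with "O". The helper below is a sketch only; `pending_merge_refresh_cmds` is a hypothetical name, and it just prints the `lvchange --refresh` commands instead of running them.

```shell
#!/bin/sh
# Reads lines of "VG LV Attr Origin", as printed by
#   lvs -a --noheadings -o vg_name,lv_name,lv_attr,origin
# and prints one refresh command for each origin that still has a
# merging snapshot (Attr starting with "S") attached.
pending_merge_refresh_cmds() {
    awk '$3 ~ /^S/ && $4 != "" { print "lvchange --refresh " $1 "/" $4 }'
}

# Demo on sample rows modelled on the listings in this report.
pending_merge_refresh_cmds <<'EOF'
  rhel root   -wi-ao----
  rhel [snap] Swi-a-s--- var
  rhel swap   -wi-ao----
  rhel var    Owi-aos---
EOF
# prints: lvchange --refresh rhel/var
```

On a live system the input would come from the `lvs` command shown in the comment, and the printed commands can be reviewed before being run.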

Comment 9 Roman Bednář 2016-02-10 13:46:46 UTC
Adding QA ACK for 7.3. 

Once verified the test case might be modified to not use 'vgchange --refresh' as mentioned in Comment #5.

Comment 10 Mike McCune 2016-03-28 23:14:23 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.

Comment 12 Edu Alcaniz 2016-05-03 07:39:42 UTC
It happens if you reboot as well. 

lvm2-2.02.130-5.el7.x86_64
and kernel lvm2-2.02.130-5.el7.x86_64

Comment 17 Roman Bednář 2016-08-03 12:14:33 UTC
Verified with latest rpms. 

Also tested manually with real reboot, since the scenario shown below simulates it by vgchange --sysinit and --refresh.
The fix does not allow us to remove the 'vgchange --refresh' part (mentioned above).


Automated:

SCENARIO - [reboot_before_thin_snap_merge_starts]
Attempt to merge an inuse snapshot, then "reboot" the machine before the merge can take place
Making pool volume
lvcreate  --thinpool POOL -L 4G --profile thin-performance --zero y --poolmetadatasize 4M snapper_thinp

Sanity checking pool device (POOL) metadata
examining superblock
examining devices tree
examining mapping tree
checking space map counts


Making origin volume
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other1
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other2
lvcreate  -V 1G -T snapper_thinp/POOL -n other3
lvcreate  -V 1G -T snapper_thinp/POOL -n other4
  WARNING: Sum of all thin volume sizes (5.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (4.00 GiB)!
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other5
  WARNING: Sum of all thin volume sizes (6.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (4.00 GiB)!
Placing an xfs filesystem on origin volume
Mounting origin volume

Making snapshot of origin volume
lvcreate  -k n -s /dev/snapper_thinp/origin -n merge_reboot
Mounting snap volume

Attempt to merge snapshot snapper_thinp/merge_reboot
lvconvert --merge snapper_thinp/merge_reboot --yes
  Logical volume snapper_thinp/merge_reboot contains a filesystem in use.

umount and deactivate volume group
vgchange --sysinit -ay snapper_thinp
vgchange --refresh snapper_thinp
Check if snapshot merged successfully.
  Failed to find logical volume "snapper_thinp/merge_reboot"
OK. Snapshot is not present.
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/POOL
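Since the merged snapshot is removed only "after some time", a check like the "Check if snapshot merged successfully" step above may need to poll rather than look once. A minimal sketch, assuming that `lvs VG/LV` fails once the LV no longer exists (as the "Failed to find logical volume" message above suggests); `wait_for_removal` and its default timeout are hypothetical.

```shell
#!/bin/sh
# wait_for_removal VG/LV [TIMEOUT_SECONDS]
# Polls once per second until "lvs VG/LV" fails, i.e. the LV is gone.
# Returns 0 when the LV disappeared, 1 when the timeout expired first.
# Caveat: any other lvs failure (for example lvs not installed) is also
# taken as "gone", so this is only suitable for a controlled test setup.
wait_for_removal() {
    lv=$1
    timeout=${2:-60}
    elapsed=0
    while lvs "$lv" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (not executed here):
#   wait_for_removal snapper_thinp/merge_reboot 120 && echo "snapshot merged and removed"
```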

=======================================
Manual:

Continue from point where vg is deactivated during snapshot merge.

# lvs -a
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  ...
  [merge_reboot]  snapper_thinp Swi---t---   1.00g POOL origin                                        
  origin          snapper_thinp Owi---t---   1.00g POOL                                               
  ...
                                              
# reboot
...

# lvs -a
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  ...                                   
  origin          snapper_thinp Vwi-a-t---   1.00g POOL        0.37   

                                
  ... 


Tested with:
3.10.0-475.el7.x86_64

lvm2-2.02.162-1.el7    BUILT: Fri Jul 29 09:26:36 CEST 2016
lvm2-libs-2.02.162-1.el7    BUILT: Fri Jul 29 09:26:36 CEST 2016
lvm2-cluster-2.02.162-1.el7    BUILT: Fri Jul 29 09:26:36 CEST 2016
device-mapper-1.02.132-1.el7    BUILT: Fri Jul 29 09:26:36 CEST 2016
device-mapper-libs-1.02.132-1.el7    BUILT: Fri Jul 29 09:26:36 CEST 2016
device-mapper-event-1.02.132-1.el7    BUILT: Fri Jul 29 09:26:36 CEST 2016
device-mapper-event-libs-1.02.132-1.el7    BUILT: Fri Jul 29 09:26:36 CEST 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 12:29:13 CEST 2016
cmirror-2.02.162-1.el7    BUILT: Fri Jul 29 09:26:36 CEST 2016

Comment 21 errata-xmlrpc 2016-11-04 04:13:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html

