Bug 799158 - merged snapshot volumes may never disappear
Summary: merged snapshot volumes may never disappear
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-03-01 22:27 UTC by Corey Marthaler
Modified: 2012-06-20 15:02 UTC

Fixed In Version: lvm2-2.02.95-1.el6
Doc Type: Bug Fix
Doc Text:
No technical note required.
Clone Of:
Environment:
Last Closed: 2012-06-20 15:02:22 UTC
Target Upstream Version:



Links
System: Red Hat Product Errata
ID: RHBA-2012:0962
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: lvm2 bug fix and enhancement update
Last Updated: 2012-06-19 21:12:11 UTC

Description Corey Marthaler 2012-03-01 22:27:22 UTC
Description of problem:

SCENARIO - [write_to_snap_merge]
Create snaps of origin with fs data, verify data on snaps, change data on snaps, merge data back to origin, verify origin data
Making origin volume
Placing an ext filesystem on origin volume
mke2fs 1.41.12 (17-May-2010)
Mounting origin volume

Writing files to /mnt/origin
Checking files on /mnt/origin

Making 5 snapshots of the origin volume, mounting, and verifying original data
lvcreate -s /dev/snapper/origin -c 128 -n merge1 -L 2G
+++ Mounting and verifying snapshot merge1 data +++
lvcreate -s /dev/snapper/origin -c 128 -n merge2 -L 2G
+++ Mounting and verifying snapshot merge2 data +++
lvcreate -s /dev/snapper/origin -c 128 -n merge3 -L 2G
+++ Mounting and verifying snapshot merge3 data +++
lvcreate -s /dev/snapper/origin -c 128 -n merge4 -L 2G
+++ Mounting and verifying snapshot merge4 data +++
lvcreate -s /dev/snapper/origin -c 128 -n merge5 -L 2G
+++ Mounting and verifying snapshot merge5 data +++

Writing new snapshot data and then merging back each of the snapshot volumes
+++ snapshot snapper/merge1 +++

Deactivating origin/snap volume(s)
Merge snapshot snapper/merge1 back into the origin
lvconvert --merge snapper/merge1
  Merging of volume merge1 started.
  Conversion starts after activation.

Activating origin/snap volume(s)
lvchange -ay snapper/origin

Waiting for the snap merge to complete...
snap merge vol is still around after 1.5 minutes


[root@taft-01 ~]# lvs -a -o +devices
  LV       VG        Attr     LSize  Origin Data%  Devices         
  [merge1] snapper   Swi-a-s-  2.00g origin   1.87 /dev/sdh1(1024) 
  merge2   snapper   swi-a-s-  2.00g origin   5.44 /dev/sdh1(1536) 
  merge3   snapper   swi-a-s-  2.00g origin   5.44 /dev/sdh1(2048) 
  merge4   snapper   swi-a-s-  2.00g origin   5.44 /dev/sdh1(2560) 
  merge5   snapper   swi-a-s-  2.00g origin   5.44 /dev/sdh1(3072) 
  origin   snapper   Owi-a-s-  4.00g               /dev/sdh1(0)    

[root@taft-01 ~]# lvs -a -o +devices
  LV       VG      Attr     LSize  Origin Data%  Devices         
  [merge1] snapper Swi-a-s-  2.00g origin   1.20 /dev/sdh1(1024) 
  merge2   snapper swi-a-s-  2.00g origin   6.10 /dev/sdh1(1536) 
  merge3   snapper swi-a-s-  2.00g origin   6.10 /dev/sdh1(2048) 
  merge4   snapper swi-a-s-  2.00g origin   6.10 /dev/sdh1(2560) 
  merge5   snapper swi-a-s-  2.00g origin   6.10 /dev/sdh1(3072) 
  origin   snapper Owi-a-s-  4.00g               /dev/sdh1(0)    

[root@taft-01 ~]# lvs -a -o +devices
  LV       VG      Attr     LSize  Origin Data%  Devices         
  [merge1] snapper Swi-a-s-  2.00g origin   0.00 /dev/sdh1(1024) 
  merge2   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(1536) 
  merge3   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(2048) 
  merge4   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(2560) 
  merge5   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(3072) 
  origin   snapper Owi-a-s-  4.00g               /dev/sdh1(0)    

# After 20 minutes it still exists

[root@taft-01 ~]# lvs -a -o +devices
  LV       VG      Attr     LSize  Origin Data%  Devices         
  [merge1] snapper Swi-a-s-  2.00g origin   0.00 /dev/sdh1(1024) 
  merge2   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(1536) 
  merge3   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(2048) 
  merge4   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(2560) 
  merge5   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(3072) 
  origin   snapper Owi-a-s-  4.00g               /dev/sdh1(0)    

# I attempted to reactivate the VG and saw this error:

[root@taft-01 ~]# vgchange -an snapper
  0 logical volume(s) in volume group "snapper" now active

[root@taft-01 ~]# vgchange -ay snapper
  redirect: dup2 failed: No such device or address
  5 logical volume(s) in volume group "snapper" now active
  redirect: dup2 failed: No such device or address
  Background polling started for 1 logical volume(s) in volume group "snapper"

[root@taft-01 ~]# lvs -a -o +devices
  LV       VG      Attr     LSize  Origin Data%  Devices         
  [merge1] snapper Swi-a-s-  2.00g origin   0.00 /dev/sdh1(1024) 
  merge2   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(1536) 
  merge3   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(2048) 
  merge4   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(2560) 
  merge5   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(3072) 
  origin   snapper Owi-a-s-  4.00g               /dev/sdh1(0)    
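
In the listings above, the first character of the Attr column is what marks the merge as still pending: per lvs(8), a capital "S" is a merging snapshot and a capital "O" is an origin with a merging snapshot, while lowercase "s" is an ordinary snapshot. A minimal sketch in Python of picking those volumes out of `lvs` text output follows. The helper name `merging_volumes` and the `SAMPLE` text are illustrative assumptions, not part of lvm2 or of the test suite.

```python
# Hypothetical helper for illustration only; not part of lvm2 or the QA harness.

# Trimmed copy of the `lvs -a -o +devices` output shown in this report.
SAMPLE = """\
  LV       VG      Attr     LSize  Origin Data%  Devices
  [merge1] snapper Swi-a-s-  2.00g origin   0.00 /dev/sdh1(1024)
  merge2   snapper swi-a-s-  2.00g origin   7.28 /dev/sdh1(1536)
  origin   snapper Owi-a-s-  4.00g               /dev/sdh1(0)
"""

# First lv_attr character, as documented in lvs(8).
MERGE_ROLES = {
    "S": "merging snapshot",
    "O": "origin with merging snapshot",
}


def merging_volumes(lvs_output):
    """Return (lv_name, role) pairs for LVs that are part of a pending merge.

    Only the first three columns (LV, VG, Attr) are read, because later
    columns such as Origin and Data% are blank for some rows.
    """
    found = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "LV":
            continue  # skip blank lines and the header row
        name, _vg, attr = fields[:3]
        role = MERGE_ROLES.get(attr[0])
        if role:
            # `lvs -a` shows hidden LVs in square brackets, e.g. [merge1].
            found.append((name.strip("[]"), role))
    return found


print(merging_volumes(SAMPLE))
```

An empty result from such a check would indicate that no merge is pending in the listed volume group.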


Version-Release number of selected component (if applicable):
2.6.32-236.el6.x86_64

lvm2-2.02.94-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
lvm2-libs-2.02.94-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
lvm2-cluster-2.02.94-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.73-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
device-mapper-libs-1.02.73-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
device-mapper-event-1.02.73-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
device-mapper-event-libs-1.02.73-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012
cmirror-2.02.94-0.61.el6    BUILT: Thu Mar  1 07:03:29 CST 2012


How reproducible:
Most of the time

Comment 1 Zdenek Kabelac 2012-03-01 23:28:52 UTC
Most probably the result of a wrong check for dup2, fixed by commit:
https://www.redhat.com/archives/lvm-devel/2012-March/msg00032.html

Comment 2 Mike Snitzer 2012-03-01 23:40:41 UTC
(In reply to comment #1)
> Most probably the result of a wrong check for dup2, fixed by commit:
> https://www.redhat.com/archives/lvm-devel/2012-March/msg00032.html

That would explain the failure when Corey took manual action.  BUT there would seem to be a regression on the automatic cleanup of the merged snapshot.  That work is achieved by the polldaemon code.

If the dup2 fix doesn't take care of the automatic cleanup of a merge I'd appreciate it if others could come to terms with what has regressed.

I'm way too far removed from lvm2 code these days to be effective at quickly tracking where we've gone wrong (nor do I have the interest in a context switch to lvm2 right now).

Comment 4 Alasdair Kergon 2012-03-02 02:04:46 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No technical note required.

Comment 5 Marian Csontos 2012-03-02 14:32:50 UTC
I have tried several times with the latest nightly build, and the merged snapshot never stays around forever.

In one run, though, the test failed because the time from starting the merge until the snapshot LV was removed exceeded the hard-coded 90-second time limit (opened STS Bug 799339).
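
The wait step that timed out here can be sketched as a polling loop whose limit is a parameter rather than a fixed 90 seconds. This is only a sketch under assumptions: the test harness's real code is not shown in this report, and the names `wait_until_gone` and `snapshot_present` are hypothetical. The `lvs` options used (`-a`, `--noheadings`, `-o lv_name`) are standard ones.

```python
import subprocess
import time


def wait_until_gone(is_present, timeout_s=90.0, interval_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll is_present() until it returns False or timeout_s elapses.

    Returns True if the object disappeared in time, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested without
    real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if not is_present():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def snapshot_present(vg, lv):
    """Return True if `lv` is still listed in volume group `vg`.

    Assumption: runs with enough privilege to call `lvs`. Hidden LVs such
    as a merging snapshot are reported in brackets, e.g. [merge1].
    """
    result = subprocess.run(
        ["lvs", "-a", "--noheadings", "-o", "lv_name", vg],
        capture_output=True, text=True, check=True,
    )
    names = {line.strip().strip("[]") for line in result.stdout.splitlines()}
    return lv in names


# Example call (requires LVM and root, so it is left commented out):
# ok = wait_until_gone(lambda: snapshot_present("snapper", "merge1"),
#                      timeout_s=300)
```

Passing the limit in as `timeout_s` lets a slow merge be given more time without changing the loop itself.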

Comment 7 Nenad Peric 2012-04-05 11:55:29 UTC
Tested with 6 consecutive runs of the test. 

The merges were automatically cleaned up.



Packages installed:

lvm2-libs-2.02.95-3.el6.x86_64
lvm2-cluster-2.02.95-3.el6.x86_64
lvm2-2.02.95-3.el6.x86_64
cmirror-2.02.95-3.el6.x86_64
device-mapper-1.02.74-3.el6.x86_64
device-mapper-libs-1.02.74-3.el6.x86_64
device-mapper-event-1.02.74-3.el6.x86_64
device-mapper-event-libs-1.02.74-3.el6.x86_64

Comment 9 errata-xmlrpc 2012-06-20 15:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0962.html

