Bug 821372

Summary: no I/O flush when creating a thinp snapshot
Product: Fedora
Reporter: Xiaowei Li <xiaoli>
Component: lvm2
Assignee: Zdenek Kabelac <zkabelac>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 17
CC: agk, bmarzins, bmr, dwysocha, heinzm, jonathan, lvm-team, msnitzer, prajnoha, prockai, qcai, zkabelac
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-04-26 13:47:53 UTC
Attachments:
  snap.log
  snap2.log
  test.sh
  thinp-snap.log
  traditional-snap.log

Description Xiaowei Li 2012-05-14 09:20:07 UTC
Description of problem:
After creating a snapshot of a thin LV and mounting it, the files previously created on the thin LV are not visible in the mounted snapshot.
This is caused by the missing I/O flush when the thinp snapshot is created.

"lvcreate -s vg/thin_lv" should invoke 'dmsetup suspend' to flush IO first then create the snapshot.

This issue can be worked around by invoking 'sync' manually before creating the snapshot.
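A minimal sketch of that workaround, using the volume names from the reproducer below (illustrative only):

sync                            # flush dirty page-cache data to the thin LV first
lvcreate -s tsvg/lv1 -n snap1   # only then take the thin snapshot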

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce: 
# Can be reproduced with the following shell script
vg=tsvg
lv_mnt=/tmp/lv1
snap1_mnt=/tmp/snap1
mkdir -p $lv_mnt
mkdir -p $snap1_mnt

# Create a thin pool and a 200M thin LV in it, then put a filesystem on it
lvcreate -V200m -l120 -T $vg/pool -n lv1
mkfs.ext4 /dev/$vg/lv1
mount /dev/$vg/lv1 $lv_mnt
# Write data that may still sit in the page cache when the snapshot is taken
dd if=/dev/urandom of=$lv_mnt/lv1 bs=1M count=4

# Snapshot the thin LV, mount the snapshot, and compare the directory listings
lvcreate -s $vg/lv1 -n snap1
mount /dev/$vg/snap1 $snap1_mnt
ls $lv_mnt
ls $snap1_mnt
  
Actual results:
The file lv1 cannot be seen under $snap1_mnt.

Expected results:
The file lv1 should be visible under $snap1_mnt.

Additional info:

Comment 1 Zdenek Kabelac 2012-05-14 09:45:58 UTC
Can you attach an 'lvcreate -s -vvvv' trace?
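For reference, one way to capture such a trace (the snapshot name and log file name are illustrative; lvm debug output normally goes to stderr):

lvcreate -s tsvg/lv1 -n snap1 -vvvv 2> lvcreate-trace.log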

On my system, lvcreate -vvvv shows it is running:

"Suspending vg-lvorigin (253:4) with device flush"

before taking a snapshot.

And I also see a consistent filesystem.

Comment 2 Xiaowei Li 2012-05-14 10:32:37 UTC
Created attachment 584318 [details]
snap.log

Comment 3 Xiaowei Li 2012-05-14 10:41:34 UTC
(In reply to comment #1)
It's hard to reproduce manually, but it can easily be reproduced using the script.

attaching the script 'test.sh' and log 'snap2.log'

Comment 4 Xiaowei Li 2012-05-14 10:42:14 UTC
Created attachment 584323 [details]
snap2.log

Comment 5 Xiaowei Li 2012-05-14 10:42:46 UTC
Created attachment 584324 [details]
test.sh

Comment 6 Zdenek Kabelac 2012-05-14 12:15:54 UTC
Hmm - so from your log, the suspend with flush is being issued - but it seems you do not have dmeventd running/available?

Could you provide 'lvs -a' states before & after the failure?

It might be that your pool is exhausted, and if you are not running dmeventd, the pool should stay locked, so any writer would stay delayed.

There is one extra case: your page cache may be big enough to absorb all the data from the command, while the pool is not big enough to provision all the blocks.
In that case 'dd' exits normally (unless you use direct I/O) and some data are left unwritten in the queue. If you take a snapshot at that point, you are in trouble with the current thinp implementation.
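As a side note, a sketch of how to take the page cache out of the picture for the test write; oflag=direct and conv=fsync are standard GNU dd options, and the path comes from the reproducer:

dd if=/dev/urandom of=/tmp/lv1/lv1 bs=1M count=4 oflag=direct conv=fsync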

lvs -a   should show fullness of your pool.

But I admit it doesn't explain why the external dmsetup suspend helps in this case, since if the pool were overfilled, the command would freeze.
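A minimal sketch of the pool-fullness check suggested above; the VG name comes from the reproducer, and data_percent/metadata_percent are assumed to be available as lvs reporting fields in this lvm2 build:

lvs -a -o lv_name,lv_size,data_percent,metadata_percent tsvg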

Comment 7 Xiaowei Li 2012-05-14 14:56:02 UTC
(In reply to comment #6)
> Hmm - so from your log, the suspend with flush is being issued - but it seems
> you do not have dmeventd running/available?
No, dmeventd is running.
> 
> Could you provide 'lvs -a' states before & after the failure?
attaching the more verbose log and scripts
> 
> It might be that your pool is exhausted, and if you are not running dmeventd,
> the pool should stay locked, so any writer would stay delayed.
the pool has free blocks.
> 
> There is one extra case: your page cache may be big enough to absorb all the
> data from the command, while the pool is not big enough to provision all the
> blocks.
> In that case 'dd' exits normally (unless you use direct I/O) and some data are
> left unwritten in the queue. If you take a snapshot at that point, you are in
> trouble with the current thinp implementation.
> 
> lvs -a   should show fullness of your pool.
> 
> But I admit it doesn't explain why the external dmsetup suspend helps in this
> case, since if the pool were overfilled, the command would freeze.

I have tried some other scenarios. In summary:
1. create traditional snapshot of thick LV. --- passed
2. create traditional snapshot of thin LV. --- passed
3. create the new snapshot (thinp snapshot) of thin LV. --- failed
4. create the new snapshot (thinp snapshot) of thin LV after invoking sync/fsfreeze first. --- passed (see the sketch at the end of this comment)

So I suspect the fs I/O flush is not invoked in #3.
The logs of #3 and #2 also prove this point:
#2 :  #libdm-deptree.c:1311 lvcreate    Suspending vg-lv1 (253:7) with filesystem sync with device flush

#3 : #libdm-deptree.c:1311 lvcreate    Suspending vg-lv1 (253:7) with device flush

Please refer to thinp-snap.log & traditional-snap.log for details.
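For reference, a minimal sketch of the fsfreeze variant of the workaround from scenario #4; the mount point and names come from the reproducer and are illustrative only:

fsfreeze -f /tmp/lv1            # freeze the filesystem, forcing dirty data to be flushed
lvcreate -s tsvg/lv1 -n snap1   # the snapshot now captures the flushed state
fsfreeze -u /tmp/lv1            # unfreeze to resume normal I/O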

Comment 8 Xiaowei Li 2012-05-14 14:59:27 UTC
Created attachment 584382 [details]
thinp-snap.log

Comment 9 Xiaowei Li 2012-05-14 15:00:04 UTC
Created attachment 584383 [details]
traditional-snap.log

Comment 10 Alasdair Kergon 2012-05-14 15:12:03 UTC
If it has forgotten the "with filesystem sync", then we probably need to check the lockfs flag setting logic in _lv_suspend().

Comment 11 Alasdair Kergon 2012-05-14 15:15:07 UTC
        if (!laopts->origin_only &&
            (lv_is_origin(lv_pre) || lv_is_cow(lv_pre)))
                lockfs = 1;

So presumably that code is not catching the thin snap case?

Comment 12 Zdenek Kabelac 2012-05-14 15:18:27 UTC
Yep - in the middle of testing fix for this.

Comment 13 Zdenek Kabelac 2012-10-11 09:50:17 UTC
This will be fixed with the next fc17 release.
Fixed by this commit:

https://www.redhat.com/archives/lvm-devel/2012-June/msg00030.html

Comment 14 Peter Rajnoha 2013-04-26 13:47:53 UTC
Fedora 17 is nearing its 'End of Life', approximately two months from now, and I don't expect any new updates for F17. This is fixed in the F18 version of LVM2.