Bug 821372
| Field | Value |
|---|---|
| Summary | not IO flush when creating thinp snapshot |
| Product | [Fedora] Fedora |
| Reporter | Xiaowei Li <xiaoli> |
| Component | lvm2 |
| Assignee | Zdenek Kabelac <zkabelac> |
| Status | CLOSED NEXTRELEASE |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| Severity | high |
| Priority | unspecified |
| Version | 17 |
| CC | agk, bmarzins, bmr, dwysocha, heinzm, jonathan, lvm-team, msnitzer, prajnoha, prockai, qcai, zkabelac |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Story Points | --- |
| Last Closed | 2013-04-26 13:47:53 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |

Description
Xiaowei Li
2012-05-14 09:20:07 UTC
Can you attach an lvcreate -s -vvvv trace? On my system, lvcreate -vvvv shows it running "Suspending vg-lvorigin (253:4) with device flush" before taking a snapshot, and I also see a consistent filesystem.

Created attachment 584318 [details]
snap.log

(In reply to comment #1)
It's hard to reproduce manually, but easy to reproduce using the script. Attaching the script 'test.sh' and the log 'snap2.log'.

Created attachment 584323 [details]
snap2.log
Created attachment 584324 [details]
test.sh

Hmm - so from your log - the suspend with flush is being issued - but it seems like you do not have dmeventd running/available?

Could you provide 'lvs -a' states before & after the failure?

It might be that your pool is being exhausted, and in case you are not running dmeventd, the pool should stay locked, thus any writer would stay delayed.

There is one extra case - if your cache is big enough to absorb all data from the command - but the pool is not big enough to provision all blocks. In this case 'dd' exits normally (unless you use DIRECT) and some data are left unwritten in the queue. If you start to make a snapshot in this case - you are in trouble with the current thinp implementation.

lvs -a should show the fullness of your pool.

But I admit it doesn't explain why the external dmsetup suspend helps in this case, since if the pool were overfilled you would get a frozen command.

(In reply to comment #6)
> Hmm - so from your log - the suspend with flush is being issued - but it
> seems like you do not have dmeventd running/available?

No, dmeventd is running.

> Could you provide 'lvs -a' states before & after the failure?

Attaching the more verbose log and scripts.

> It might be that your pool is being exhausted, and in case you are not running
> dmeventd, the pool should stay locked, thus any writer would stay delayed.

The pool has free blocks.

> There is one extra case - if your cache is big enough to absorb all data
> from the command - but the pool is not big enough to provision all blocks.
> In this case 'dd' exits normally (unless you use DIRECT) and some data are left
> unwritten in the queue. If you start to make a snapshot in this case - you are in
> trouble with the current thinp implementation.
>
> lvs -a should show the fullness of your pool.
>
> But I admit it doesn't explain why the external dmsetup suspend helps in this
> case, since if the pool were overfilled you would get a frozen command.

I have tried some other scenarios. In summary:

1. Create a traditional snapshot of a thick LV. --- passed
2. Create a traditional snapshot of a thin LV. --- passed
3. Create the new snapshot (thinp-snapshot) of a thin LV. --- failed
4. Create the new snapshot (thinp-snapshot) of a thin LV after invoking sync/fsfreeze beforehand. --- passed

So I suspect the filesystem IO flush is not invoked in #3. The logs of #3 and #2 also support this:

#2: #libdm-deptree.c:1311 lvcreate Suspending vg-lv1 (253:7) with filesystem sync with device flush
#3: #libdm-deptree.c:1311 lvcreate Suspending vg-lv1 (253:7) with device flush

Please refer to thinp-snap.log & traditional-snap.log for details.

Created attachment 584382 [details]
thinp-snap.log
Created attachment 584383 [details]
traditional-snap.log
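
The difference between the two attached logs comes down to whether the "Suspending ..." debug line contains the phrase "with filesystem sync". A minimal sketch of that check in C follows. The function name `suspend_line_has_fs_sync` is hypothetical and not part of lvm2 or libdevmapper. It only matches the wording of the two log lines quoted in this report, which may differ in other lvm2 versions.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not an lvm2 API): returns true when a debug line
 * from "lvcreate -vvvv" is a "Suspending ..." message that also reports
 * a filesystem sync. The matched strings are taken from the log lines
 * quoted in this bug report.
 */
bool suspend_line_has_fs_sync(const char *line)
{
    return strstr(line, "Suspending ") != NULL &&
           strstr(line, "with filesystem sync") != NULL;
}
```

Run over each line of thinp-snap.log and traditional-snap.log, a check like this would flag the traditional snapshot line and not the thinp snapshot line, which is the symptom described in the comment above.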

If the "with filesystem sync" is being left out, then we probably need to check the _lv_suspend() lockfs flag setting logic:

    if (!laopts->origin_only &&
        (lv_is_origin(lv_pre) || lv_is_cow(lv_pre)))
            lockfs = 1;

So presumably that code is not catching the thin snap case?

Yep - in the middle of testing a fix for this. This will get fixed with the next fc17 release.

Fixed by this commit:
https://www.redhat.com/archives/lvm-devel/2012-June/msg00030.html

Fedora 17 is nearing its 'End of Life' in approximately two months, and I don't expect any new updates for F17. This is fixed in the F18 version of LVM2.
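
To illustrate why the _lv_suspend() check quoted in this thread misses the thin snapshot case, here is a small self-contained model in C. Everything in it is an assumption made for illustration: `struct lv_model` stands in for lvm2's real logical volume structure, the boolean fields stand in for the lv_is_origin() / lv_is_cow() predicates, and `lockfs_with_thin` is only one plausible shape for a fix. The thread does not show the actual commit, and the sketch does not try to reproduce how the real fix interacts with origin_only.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an lvm2 logical volume. */
struct lv_model {
    bool is_origin;      /* origin of a traditional (COW) snapshot */
    bool is_cow;         /* COW device of a traditional snapshot */
    bool is_thin_volume; /* thin LV living in a thin pool */
};

/* Models the condition quoted in the thread: lockfs is requested only
 * for traditional snapshot origins and COW devices. */
bool lockfs_as_quoted(const struct lv_model *lv_pre, bool origin_only)
{
    return !origin_only && (lv_pre->is_origin || lv_pre->is_cow);
}

/* Assumed extension, not the actual commit: also request lockfs when
 * the volume being suspended is a thin volume. The origin_only flag is
 * deliberately ignored for the thin case because the thread does not
 * say how the real fix treats it. */
bool lockfs_with_thin(const struct lv_model *lv_pre, bool origin_only)
{
    if (lv_pre->is_thin_volume)
        return true;
    return lockfs_as_quoted(lv_pre, origin_only);
}
```

In this model, a plain thin LV with no traditional snapshot (scenario 3 in the earlier comment) gets no lockfs from `lockfs_as_quoted`, while a thin LV that is also the origin of a traditional snapshot (scenario 2) does, which matches the passed/failed results reported above.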