Bug 845048
| Summary: | LVM startup hangs when starting snapshot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Nate Straz <nstraz> |
| Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team> |
| Status: | CLOSED WONTFIX | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.3 | CC: | agk, dwysocha, heinzm, jbrassow, mcsontos, msnitzer, prajnoha, prockai, rpeterso, thornber, webmaster, zkabelac |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-08-24 08:49:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Nate Straz
2012-08-01 15:20:20 UTC
This is one where I think we may need to examine the actual Logical Volume contents. IOW, whichever LV holds the COW area for the snapshot might need to be checked.
- Is the device completely readable?
- Is there any corruption? (A read-only version based on http://people.gnome.org/~markmc/code/merge-dm-snapshot.c perhaps.)

(In reply to comment #2)
> - Is the device completely readable?
I've seen no indications that the device is having issues, since it's being used by another node on the SAN for a test.
> - Is there any corruption?
> (A read-only version based on
> http://people.gnome.org/~markmc/code/merge-dm-snapshot.c perhaps.)
I'm running that now with the call to merge_chunk() commented out. There aren't many checks in the code for corruption. So far the output is looking sane, but the COW device is 500G (2T virtual size).

I reinstalled the hanging system with RHEL 6.2; that system hangs for a while, but eventually continues. Going back to RHEL 6.3, the system hangs for much longer. After a while sysrq-W doesn't produce a backtrace anymore. It shows the lvm command as runnable and using a lot of time.

runnable tasks:
  task   PID        tree-key  switches  prio    exec-runtime        sum-exec       sum-sleep
  ------------------------------------------------------------------------------------------
R lvm    926   200754.808642    118199   102   200754.808642  3170374.196015   620922.885992 /

I am able to recreate this on another system to a certain extent. I think the length of the hang is proportional to the amount of space used in the COW volume.

This is the original volume that causes the hang:

  LV          VG      Attr   LSize   Origin                Snap%
  roth_sparse roth_vg swi-a- 500.00G [roth_sparse_vorigin] 25.45

I think this bug is the same one I encountered.
I detailed what was happening on the linux-lvm mailing list: https://www.redhat.com/archives/linux-lvm/2012-August/msg00018.html. There is much additional information in that thread, such as:
- The stall happens with vgchange -a y
- The last line before the hang is
  #ioctl/libdm-iface.c:1628 dm reload (253:5) NF [16384]
The volume is read, but very slowly.

(In reply to comment #5)
> I think this bug is the same I encountered.
> I detailed what was happening in the linux-lvm mailing list :
> https://www.redhat.com/archives/linux-lvm/2012-August/msg00018.html

lvs shows the volumes correctly:

  LV                    VG     Attr     LSize   Pool Origin       Data%
  backup_filer          backup owi-i-s-   4.00t
  backup_filer_20120611 backup swi-a-   450.00g      backup_filer 63.40
  backup_filer_20120703 backup swi-a-   400.00g      backup_filer 31.87
  backup_filer_20120804 backup swi-a-   400.00g      backup_filer  0.89

Hmmm - using 283GB of exceptions - this is not really going to work well with old snapshots. The exception list must take a lot of your RAM, and reading the list of exception blocks from disk must be very slow (though 1 hour looks weird - unless a lot of swapping is involved; how much memory do you have?). Old-style snapshots were not designed for multi-GB lists of exceptions - and you have three such backups....

My best advice here is to use old snapshots the way they were designed - i.e. as short-term, small, short-lived static snapshots of a filesystem. If you want long-term backups, I suggest experimenting with thin provisioning, which will handle such a workload much more efficiently (on the other hand, it may have numerous problems, especially in recovery scenarios).
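To put the figures above in perspective: with the default 4 KiB snapshot chunk size, each copied-out chunk needs one exception mapping, and on disk each exception is a pair of 64-bit chunk numbers (16 bytes). The in-memory cost per exception is larger; the 48-byte figure below is an assumed ballpark, not a measured value. A rough estimate for the two COW usages discussed in this report:

```python
GiB = 1024 ** 3

def exception_estimate(cow_used_bytes, chunk_size=4 * 1024, mem_per_exception=48):
    """Rough sizing of a dm-snapshot exception table.

    One exception maps one chunk of copied-out data; on disk each is a
    pair of 64-bit chunk numbers (16 bytes). The 48-byte in-memory
    figure is an assumed ballpark for the index entry, not measured.
    """
    n = cow_used_bytes // chunk_size
    return n, n * 16, n * mem_per_exception

for gib in (125, 283):  # the two COW usages mentioned in this report
    n, disk, ram = exception_estimate(gib * GiB)
    print(f"{gib} GiB of exceptions -> {n:,} mappings, "
          f"~{disk // 2**20} MiB on disk, ~{ram // 2**20} MiB RAM (assumed)")
```

Even under these rough assumptions, the 125 GiB case alone implies tens of millions of mappings and on the order of a gigabyte of RAM, which would explain why a 2 GB machine struggles to activate the snapshot.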
> runnable tasks:
>   task   PID        tree-key  switches  prio    exec-runtime        sum-exec       sum-sleep
>   ------------------------------------------------------------------------------------------
> R lvm    926   200754.808642    118199   102   200754.808642  3170374.196015   620922.885992 /
>
> I am able to recreate this on another system to a certain extent. I think
> the length of the hang is proportional to the amount of space used in the
> COW volume.
>
> This is the original volume that causes the hang:
>
>   LV          VG      Attr   LSize   Origin                Snap%
>   roth_sparse roth_vg swi-a- 500.00G [roth_sparse_vorigin] 25.45
Hmm, also a lot of exceptions - 125GB.
How much RAM do you have in your system?
I'm afraid using such a huge snapshot is not going to give good performance.
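(Returning to comment #2's "is the device completely readable?" question: that check can be scripted as a plain sequential read pass over the COW LV. A minimal sketch in Python, assuming you only need to detect and localize I/O errors; `dd if=<cow-device> of=/dev/null` achieves much the same thing:)

```python
import os

def check_readable(path, chunk_size=1024 * 1024):
    """Sequentially read a device (or file), returning the number of bytes
    read successfully and a list of (offset, errno) for failed regions."""
    bytes_ok = 0
    errors = []
    with open(path, "rb", buffering=0) as dev:
        while True:
            try:
                chunk = dev.read(chunk_size)
            except OSError as exc:
                errors.append((dev.tell(), exc.errno))
                dev.seek(chunk_size, os.SEEK_CUR)  # skip past the bad region
                continue
            if not chunk:
                break
            bytes_ok += len(chunk)
    return bytes_ok, errors
```

Run it against the snapshot's COW device node (the exact /dev path depends on your VG/LV names); a non-empty error list localizes unreadable regions.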
(In reply to comment #7)
> Hmm also a lot of exceptions - 125GB
> How much RAM do you have in your system ?
> I'm afraid using such huge snapshot is not going to give good performance.

I think the system we were having trouble with only has 2GB of RAM. We were using a sparse device, which I believe uses the same copy-on-write layer underneath as a snapshot does.

We have 16GB of memory on the server. As I understand it, this hang is not a bug, but the result of how snapshots are implemented today, right? I understand that the exception list is stored in memory, is that correct?

You talk about "old-style snapshots". What is the new style? Using thin provisioning? But the tools are far from ready (one is forced to use dmsetup), right?

(In reply to comment #9)
> We have 16GB of memory on the server. As I understand, this hang is not a
> bug, but the result of how snaps are implemented today, right ?
> I understand that the exception list is stored in memory, is that correct ?

Old-style snapshots were designed to hold GBs, but not hundreds of GBs.

> You talk about "old-style snapshot". What is new style ? Using thin
> provisioning ? But tools are far from ready (forced to use dmsetup), right ?

If you have sufficiently new lvm tools you should be able to create a thin pool, thin volumes and thin snapshots. But yes, for now it is considered experimental (especially since recovery when something goes wrong is not yet addressed well).

Basic usage goes like this - more in the man pages:

  lvcreate -T vg/thinpoolname -Lsize
  lvcreate -Vthinvolsize -n thinvolname -T vg/thinpoolname
  lvcreate -s -n thinsnapname vg/thinvolname

Do we have any regression here in terms of the 'startup hang' compared with some earlier version?
If not, I'm going to close this BZ - we cannot dramatically improve multiple snapshots of XXX GB size. It's annoyingly slow with old-style snapshots, but they were never designed for such operations.

Is this a documentation bug? The memory implications, though "obvious", are not documented anywhere. We should document some sane level for expected maximal snapshot size / chunk size vs. available memory.

Workaround: use lvconvert to increase the chunk size on existing snapshots, and use a much larger chunk size (1024) when creating snapshots. Cons: negatively impacts write performance until the snapshot "stabilizes" and stops growing.

Tried it out, and though the man page suggests it might be possible, it's not working for existing snapshots, only when converting a normal LV into one.

1. To activate an old-style snapshot, a significant fraction of the device has to be read and processed (every Nth chunk) to generate the block index, which is stored entirely in memory.
2. You can't change the chunk size of an existing snapshot because the on-disk layout depends on the chunk size, so the whole device would have to be rewritten.
3. We could add a sentence to the lvcreate man page about the limitations.
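The chunk-size workaround trades activation cost for write amplification: the number of exceptions that must be read and indexed at activation shrinks linearly with chunk size, while each first write to a chunk copies the whole (now larger) chunk into the COW area. A rough comparison, reading the "1024" in the comment above as 1024 KiB (lvcreate's -c value defaults to KiB units) and reusing an assumed 48-byte in-memory cost per exception:

```python
GiB = 1024 ** 3

def exceptions_at_activation(cow_used_bytes, chunk_size):
    """Exceptions that must be read from disk and indexed when the
    snapshot is activated -- the work behind the startup hang."""
    return cow_used_bytes // chunk_size

used = 283 * GiB                      # largest COW usage in this report
for chunk_kib in (4, 1024):           # default vs. 'lvcreate -c 1024'
    n = exceptions_at_activation(used, chunk_kib * 1024)
    print(f"chunk size {chunk_kib:>4} KiB -> {n:>10,} exceptions, "
          f"~{n * 48 // 2**20} MiB RAM (assumed 48 B each)")
```

Under these assumptions a 1 MiB chunk size cuts the activation-time index by a factor of 256, at the cost of copying up to 1 MiB per first write to each origin chunk.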