Bug 725338

Summary: avoid trashing lvm archive files with 1G lvextends
Product: [Retired] oVirt Reporter: Haim <hateya>
Component: vdsm Assignee: Dan Kenigsberg <danken>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: unspecified CC: abaron, acathrow, agk, amureini, bazulay, danken, dwysocha, ewarszaw, fsimonce, heinzm, iheim, jbrassow, jkt, mgoldboi, prajnoha, prockai, pvrabec, thornber, yeylon, zkabelac
Target Milestone: ---   
Target Release: 3.3.4   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-08 06:53:58 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 756082, 773650, 773651, 773665, 773677, 773696    

Description Haim 2011-07-25 08:44:31 UTC
Description of problem:

lvm by default preserves a backup of the metadata for every logical volume and volume group in /etc/lvm/{archive,backup}.

when lots of lvm operations are performed, those files get very big and consume local disk space. in our case (vdsm POV), I think we should handle those dirs and configure some kind of log rotation for them.

Metadata backups and archives are automatically created on every volume group and logical volume configuration change unless disabled in the lvm.conf file. By default, the metadata backup is stored in the /etc/lvm/backup file and the metadata archives are stored in the /etc/lvm/archive file. How long the metadata archives stored in the /etc/lvm/archive file are kept and how many archive files are kept is determined by parameters you can set in the lvm.conf file. A daily system backup should include the contents of the /etc/lvm directory in the backup.

repro steps:

1) 10,000 lv extends lead to roughly 20G of archive files.

[root@nott-vds5 ~]# du -hcs /etc/lvm/archive/
18G	/etc/lvm/archive/

Comment 2 Federico Simoncelli 2011-07-25 15:09:43 UTC
The lvm archive files are extremely important to recover from possible issues and the lvm.conf file already provides the parameters "retain_min" and "retain_days" to limit the number of the archived files.
Those values are per vg and are enforced when an lvm command is run. This means that the files of old non-existent vgs won't be removed.

This is probably an evident issue only on QA machines where a large number of vgs is constantly created and removed.
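
A minimal sketch of how such leftover files could be spotted. It assumes archive files follow the `<vgname>_<seqno>-<random>.vg` naming and that the caller supplies the names of the vgs that still exist (for example, taken from `vgs` output). The function name and the file name pattern are illustrative assumptions, not part of lvm or vdsm.

```python
import os
import re
from collections import defaultdict

# Assumed archive file name layout: <vgname>_<seqno>-<random>.vg
ARCHIVE_NAME = re.compile(r"^(?P<vg>.+)_(?P<seq>\d+)-\d+\.vg$")


def orphaned_archives(archive_dir, live_vgs):
    """Map each VG name that is not in live_vgs to its archive file names."""
    orphans = defaultdict(list)
    for name in sorted(os.listdir(archive_dir)):
        match = ARCHIVE_NAME.match(name)
        if match is None:
            continue  # not a file name we recognise, leave it alone
        vg = match.group("vg")
        if vg not in live_vgs:
            orphans[vg].append(name)
    return dict(orphans)
```

The sketch only reports candidates; whether any of them can be deleted safely is a separate decision.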

Reassigning to lvm component.

Comment 3 Zdenek Kabelac 2011-07-27 20:04:34 UTC
This is not a solution for this BZ - however it's a nice workaround which should significantly speed up some workloads.

Synchronization and backups for each new LV takes a lot of time - so here is my tip:

lvcreate --driverload n --zero n  --autobackup nn

This will not create an activated LV - it will just modify metadata and make a new LV ready there. Of course one must be aware of the fact that the LV header is not zeroed, so it might have some consequences. But it helps if you need to create thousands of LVs quickly and you do not care about having mda backups for this scripted operation. Once LVs are created they can be activated via one vgchange command.
Improvements from bug 658639 help here as well.

Comment 4 Zdenek Kabelac 2011-07-27 20:05:49 UTC
(In reply to comment #3)
> This is not a solution for this BZ - however it's a nice workaround which should
> significantly speed up some workloads.
> 
> Synchronization and backups for each new LV takes a lot of time - so here is my
> tip:
> 
> lvcreate --driverload n --zero n  --autobackup nn
> 
                                                 n  (typo)

Comment 5 Alasdair Kergon 2011-07-27 20:45:02 UTC
If you're doing a high volume of LVM operations, it probably makes sense to manage the backups yourself. I.e. disable them by default, and run 'vgcfgbackup' to create checkpoints at appropriate times.
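
A rough sketch of that approach, assuming the caller passes `--autobackup n` per command rather than changing lvm.conf, and takes a checkpoint with `vgcfgbackup` after a fixed number of extends. The helper names, the `every` interval, and the `+1G` default are illustrative assumptions.

```python
import subprocess


def extend_cmd(vg, lv, size="+1G"):
    """lvextend command line that skips the automatic metadata backup."""
    return ["lvextend", "--autobackup", "n", "--size", size, f"{vg}/{lv}"]


def checkpoint_cmd(vg):
    """vgcfgbackup command line that writes an explicit backup for the VG."""
    return ["vgcfgbackup", vg]


def extend_with_checkpoints(vg, lvs, every=100, run=subprocess.check_call):
    """Extend each LV without autobackup, checkpointing every `every` extends."""
    pending = 0
    for lv in lvs:
        run(extend_cmd(vg, lv))
        pending += 1
        if pending == every:
            run(checkpoint_cmd(vg))
            pending = 0
    if pending:
        run(checkpoint_cmd(vg))  # final checkpoint covers the remaining extends
```

Passing a different `run` callable (for example `print`) shows the command lines without touching any volume group.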

I don't think this is anything we need to change in LVM itself - there are enough features already to hand control of backup files to the user of the tools, I think.

Comment 6 Zdenek Kabelac 2011-08-02 09:15:15 UTC
I think there could be some bug in not respecting lvm.conf settings for keeping defined size of archived files. It might be related to added random number suffix after the vg revision. So some testing could be needed here. Passing it to Peter.

Comment 7 Peter Rajnoha 2011-08-02 13:20:37 UTC
(In reply to comment #6)
> I think there could be some bug in not respecting lvm.conf settings

That's working fine. The only thing is that archive handling is done per VG, not globally, so we end up with old archives being stacked if the VG is removed (so nobody removes old archives after VG removal) as pointed in comment #2.

Anyway, I'd lean to Alasdair's proposal in comment #5. Adding more features to archive handling would make it uselessly complex.

Haim, would it be acceptable to handle the archives manually as suggested in comment #5?

Comment 8 Haim 2011-08-02 15:38:27 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > I think there could be some bug in not respecting lvm.conf settings
> 
> That's working fine. The only thing is that archive handling is done per VG,
> not globally, so we end up with old archives being stacked if the VG is removed
> (so nobody removes old archives after VG removal) as pointed in comment #2.
> 
> Anyway, I'd lean to Alasdair's proposal in comment #5. Adding more features to
> archive handling would make it uselessly complex.
> 
> Haim, would it be acceptable to handle the archives manually as suggested in
> comment #5?

actually, we need to ask someone from the vdsm team (rhev hypervisor management layer) - Federico? could/should vdsm handle this?

Comment 9 Haim 2011-08-03 08:28:35 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > I think there could be some bug in not respecting lvm.conf settings
> > 
> > That's working fine. The only thing is that archive handling is done per VG,
> > not globally, so we end up with old archives being stacked if the VG is removed
> > (so nobody removes old archives after VG removal) as pointed in comment #2.
> > 
> > Anyway, I'd lean to Alasdair's proposal in comment #5. Adding more features to
> > archive handling would make it uselessly complex.
> > 
> > Haim, would it be acceptable to handle the archives manually as suggested in
> > comment #5?
> 
> actually, we need to ask someone from the vdsm team (rhev hypervisor management
> layer) - Federico? could/should vdsm handle this?

after 21000 extends, lvm-archive folder consumed 42G.
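
For scale, a quick back-of-the-envelope calculation of the average archive growth per extend, using the two figures reported in this bug. It assumes the reported "G" values are GiB, as printed by `du -h`.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def archive_bytes_per_extend(total_gib, extends):
    """Average archive growth per lvextend, in bytes."""
    return total_gib * GIB / extends


first = archive_bytes_per_extend(18, 10_000)   # du output in the description
second = archive_bytes_per_extend(42, 21_000)  # figure reported in this comment
print(f"{first / MIB:.2f} MiB, {second / MIB:.2f} MiB per extend")
```

Both measurements come out at roughly 2 MiB of archive data per extend.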

Comment 10 Peter Rajnoha 2011-08-30 11:35:12 UTC
So, do we have a decision? Can you handle those archives manually (as noted in comment #5)? If yes, please close this bug.

Comment 11 Federico Simoncelli 2011-08-30 11:59:03 UTC
Sadly this is really low priority. New deadlines are approaching and this is not a blocker. As I stated before, lvm archives are extremely important to recover from possible issues and this is not the right moment to remove them. Compared to the severity of other open bugs, in my opinion this could become an RFE for future releases.

Comment 12 Dan Kenigsberg 2011-08-30 12:40:59 UTC
I do not know when we (rhev) can be certain that backups can be dropped, or even when is a good time to set a checkpoint (every lvcreate? lvextend? vgextend?)

Anyway, if we have no concrete request from lvm2, let's own this bug.

Comment 13 Ayal Baron 2011-09-05 08:11:11 UTC
(In reply to comment #12)
> I do not know when we (rhev) can be certain that backups can be dropped, or even
> when is a good time to set a checkpoint (every lvcreate? lvextend? vgextend?)

We need the opposite. When running lvextend, we should prevent lvm from writing a backup of the metadata to disk.
This should be easy enough to do even for current version and would solve Haim's issue.

For 6.3 we should review the operations and determine which ones should cause a backup and which shouldn't.

In addition, we should probably delete any *domain* backup which is older than 1 month.  This would either be a heuristic or would require parsing the files for rhev domain tag to make sure it is indeed a VG that used to belong to vdsm (backups of VGs which vdsm does not manage should not be deleted).
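
A minimal sketch of the tag-parsing variant, assuming that a plain substring search of the archive file for the tag is good enough. The actual rhev domain tag is not named in this bug, so `tag` is a caller-supplied placeholder, and the 30-day cutoff stands in for "1 month". The function only lists candidates and deletes nothing.

```python
import os
import time

MONTH_SECONDS = 30 * 24 * 3600


def stale_domain_archives(archive_dir, tag, now=None, max_age=MONTH_SECONDS):
    """Return archive files older than max_age whose text mentions `tag`."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(archive_dir)):
        path = os.path.join(archive_dir, name)
        if not name.endswith(".vg") or not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) <= max_age:
            continue  # recent enough, keep it
        with open(path, errors="replace") as f:
            if tag in f.read():
                stale.append(path)
    return stale
```

Files without the tag are never returned, which keeps backups of VGs that vdsm does not manage out of the result.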

> 
> Anyway, if we have no concrete request from lvm2, let's own this bug.

Moving to vdsm

Comment 15 Dan Kenigsberg 2012-01-05 12:35:12 UTC
Eduardo suggests not to fix anything here. lvextend is a potentially dangerous action, and avoiding the vg metadata backup for all lvextends may cause problems for a customer that somehow killed his vg.

Our current problem, multiple lvextends hiding older backups, is not necessarily frequent on non-testing setups.

We could have an interesting feature of discerning optimal lv extend size from this data, or another heuristic for using more than 1GB per extend.

Either way, this is not ready for downstream.