Bug 1305327 - [RFE] - LVM commands called with autoback
Summary: [RFE] - LVM commands called with autoback
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nir Soffer
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks: CEECIR_RHV43_proposed
 
Reported: 2016-02-07 07:52 UTC by Pavel Zhukov
Modified: 2022-04-21 06:38 UTC
CC List: 22 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-01 22:25:15 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
ratamir: testing_plan_complete-


Attachments
example of lvm backup from the pv (8.82 KB, application/x-xz), 2018-02-01 03:23 UTC, Marina Kalinin


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1513960 0 medium CLOSED [RFE] Increase resilience of disk layout in block based storage domains against accidental wiping 2021-12-10 15:47:03 UTC
Red Hat Issue Tracker RHV-43294 0 None None None 2021-08-30 13:08:19 UTC
Red Hat Knowledge Base (Solution) 2210531 0 None None None 2018-02-01 03:26:31 UTC

Internal Links: 1513960

Description Pavel Zhukov 2016-02-07 07:52:05 UTC
Description of problem:
Most LVM commands are called from within the vdsm code with autobackup disabled. This leads to missing LVs or corrupted VM images if the VG metadata is restored as part of a recovery procedure.

Version-Release number of selected component (if applicable):
vdsm-4.17.18-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create new disk
2. Check whether the LV of the new disk appears in the LVM metadata backup under /etc/lvm

Actual results:
The LV is not in the metadata backup and will not be restored with vgcfgrestore

Additional info:
adcb6c26-2652-4cef-82d5-804e711e8824::DEBUG::2016-02-07 02:41:06,521::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm lvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405ba299a3fb8524452890938e29|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --contiguous n --size 2048m --addtag OVIRT_VOL_INITIALIZING --name a45899e9-016d-4790-b6b3-5336f4b0b7af db10eec1-fd6b-46e6-8bdf-467657e58b91 (cwd None)
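
For reference, a minimal check of the gap on a host (a sketch only; <vg_name> and <lv_name> are placeholders for the storage domain VG and the new image LV, and the paths assume LVM's default backup/archive locations):

# grep <lv_name> /etc/lvm/backup/<vg_name>            <== usually no match, since the LV was created with --autobackup n
# grep -l <lv_name> /etc/lvm/archive/<vg_name>_*.vg   <== archives exist only for the commands that ran with autobackup enabled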

Comment 2 Nir Soffer 2016-02-07 17:10:42 UTC
There is no bug here; we do not expect users to restore our VGs.

I don't think we can use a host-based metadata backup, as there is no way to ensure that this backup is correct.

Comment 7 Marina Kalinin 2016-02-08 18:16:23 UTC
I went back to the 3.0 code and I see it has not changed - it has been that way always.
I can see we use the no-backup option for all LV operations. Thinking back, it was always a concern to restore from too old an LVM backup, if the customer has introduced any changes to his environment since then.

Actually, I must say, this is reasonable behavior from vdsm - there may be hundreds of LVs created (if not more) during a single day. Take, for example, customers that are using big pools. Creating an LVM backup with each such change would be infeasible and would create a lot of overhead - time, I/O and disk space.
Thus, I'll reduce the severity of the bug back to medium.

What is interesting to me is how we get the LV information restored now, when using vgcfgrestore. When does the archive get updated with LV information?
And maybe we can find a mid-term solution, where vdsm would issue an LVM backup, let's say, every 100 [configurable] LVs?
Pavel, would it help in your case?

Comment 9 Pavel Zhukov 2016-02-08 21:44:31 UTC
(In reply to Marina from comment #7)
> I went back to 3.0 code and I see it has no change - it has been that way
> always.
> I can see we use no backup option for all lv operations. Thinking back, it
> was always a concern to restore from too old of an lvm backup, if customer
> has introduced any changes to his environment since then.
Right. That's why we want to have the most recent backup possible.
> 
> Actually, I must say, this is a reasonable behavior from vdsm - there maybe
> hundreds of lvs created (if not more) during a single day. Take, for
> example, customers that are using big pools. Creating lvm backup with each
> such change would be non-feasible and would create a lot of overhead - time,
> i/o and disk space.
I am not sure about the overhead. LVM writes metadata to several places on disk (for backup purposes) on every single change. One more place (a file) should not have a visible impact, especially taking into account that it is another block device (not the SD). Disk space is not an issue here either: it is configurable, and the 50 latest backups are kept in the current implementation. But the LVM developers can say more about this.
> Thus, I'll reduce the severity of the bug back to medium.
> 
> What is interesting to me is how we do get lvs information restored now,
> when using vgcfgrestore. When does the archive get updated with lv
> information? 
vgchange (addtag/deltag) is called without the no-backup option, as is vgs from within _reloadvgs.
Examples:
description = "Created *after* executing 'vgchange ///  <== addtag/deltag
OR
description = "Created *after* executing 'vgdisplay 
OR
description = "Created *after* executing '/sbin/vgs /// <== _reloadVGs

So in the current implementation a backup is created *only* when vgchange/_reloadvgs is called. I don't know the reason (probably because LVM_NOBACKUP was forgotten here), but thanks to this we do have backups for some flows (adding/removing/attaching/detaching an SD, master reconstruction and vdsm restart); see the sketch below.
So if a user added, let's say, 100 VMs, and for whatever reason the LVM metadata was corrupted before the storage layout changed, all 100 VM disks will be lost. If a qcow image was extended, the old size will be restored from the backup (lvresize).
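
A hedged aside: one way to see which commands actually produced the host-local archives (the path assumes LVM's default archive_dir, and <vg_name> is a placeholder for the storage domain VG):

# grep -h '^description' /etc/lvm/archive/<vg_name>_*.vg   <== lists the command each archive file was created after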
> And maybe we can find a mid term solution, when vdsm would issue lvm backup,
> let's say every 100[configurable] lvs? 
If the user has 50 LVs they will lose them all. Not good... And the logic to determine whether a backup should be written or not may take more resources than the write itself (see the explanation above).
> Pavel, would it help in your case?

Comment 10 Marina Kalinin 2016-02-26 19:58:13 UTC
Allon,
What do you think? 
Should we move it into RFE?

I believe this bug definitely deserves at least a discussion of whether and when we can back up the LV information from LVM, since it is definitely beneficial for our customers in case they overwrite their storage by mistake (a pretty common mistake).

Comment 11 Nir Soffer 2016-02-26 20:46:20 UTC
Having LVM backups sounds useful, but we cannot use host-based backups, since
you will have different backups on different hosts (the SPM moves from one host to
another).

This will be worse in 4.x, when there is no SPM - LVs would be created on
any host, so every host would have a different backup.

If we want this, we will have to copy the backups to the engine, or to another place
that can store these backups.

I don't expect a performance issue from creating these backups, since we don't
create a lot of LVs, but I did not measure this.

Comment 12 Marina Kalinin 2016-02-26 21:51:30 UTC
Good points.
We are planning to open an RFE to enable centralized logging to RHEV-M, so this can go there as well.

thank you!

Comment 16 Yaniv Lavi 2016-05-09 11:06:27 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.

Comment 23 Yaniv Kaul 2016-11-21 10:43:03 UTC
Duplicate (sort of) of bug 1380698?

Comment 26 Nir Soffer 2016-12-28 16:30:27 UTC
(In reply to Yaniv Kaul from comment #23)
> Duplicate (sort of) of bug 1380698 ?

Not really; that bug is just a minor cleanup: avoid using the backup option when the command
does not support it.

This bug is about having backups; the main issue is where to keep the backups.
With a single host, local backups are fine; when working with shared storage,
local backups are meaningless. You may have different backups on every host,
depending on where the SPM was running when the LVM command generated a backup.

This can be an RFE for 4.2.

Comment 27 Yaniv Lavi 2016-12-28 16:46:10 UTC
(In reply to Nir Soffer from comment #26)
> (In reply to Yaniv Kaul from comment #23)
> > Duplicate (sort of) of bug 1380698 ?
> 
> Not really, that bug is just a minor cleanup, avoid using backup when the
> command
> does not support it.
> 
> This bug is about having backups, the main issue is where to keep the
> backups.
> With single host, local backups are fine, when working with shared storage, 
> local backups are meaningless. You may have different backups on every host,
> depending where the spm was running when lvm command generated a backup.
> 
> This can be RFE for 4.2.

We can have local backups and recommend using the one from the SPM. It can at least help CEE with the restore process, even if it won't be complete.

Comment 28 Nir Soffer 2016-12-28 16:58:25 UTC
(In reply to Yaniv Dary from comment #27)
> We can have local backups and recommend to use the one from SPM. It at least
> can help CEE with the restore process, even if it won't be complete.

But the SPM may change at any time; the system can pick any host as the SPM.

For example this flow:

1. SPM running on host A
2. Change storage, backup...
3. SPM switch to host B
4. disaster
5. CEE uses backup from host B instead of host A

I think we need a way to send the backups to an external system that does not
depend on the block storage this backup describes.

Comment 29 Yaniv Lavi 2016-12-28 17:06:37 UTC
(In reply to Nir Soffer from comment #28)
> (In reply to Yaniv Dary from comment #27)
> > We can have local backups and recommend to use the one from SPM. It at least
> > can help CEE with the restore process, even if it won't be complete.
> 
> But the spm may change any time, the system can pick any host as the spm.
> 
> For example this flow:
> 
> 1. SPM running on host A
> 2. Change storage, backup...
> 3. SPM switch to host B
> 4. disaster
> 5. CEE uses backup from host B instead of host A
> 
> I think we need a way to send the backups to external system that does not
> depend on block storage this backup describes.

I was suggesting a local backup on each of the hosts and then using the current SPM's backup for the initial restore process. The backup doesn't need to be perfect, but it should allow CEE to start the restore process. We can find a longer-term solution with an external system later.

Comment 30 Yaniv Kaul 2017-06-06 18:55:46 UTC
Does it make sense to externalize it? Have an Ansible script that runs (for example) every hour, goes to the current SPM and saves a backup on the local host? Sounds like a good enough solution to me.

Comment 31 Allon Mureinik 2017-06-06 21:27:02 UTC
(In reply to Yaniv Kaul from comment #30)
> Does it make sense to externalize it? Have an Ansible script that runs (for
> example) every hour, goes to the current SPM and saves a backup on the local
> host? Sounds like a good enough solution to me.

That sounds like a way to automate a really bad idea (see, e.g., comment 28). I wouldn't go that way.

Comment 45 Marina Kalinin 2018-02-01 03:22:43 UTC
Hi Ben,
I would like to get your blessing on closing this bug.
This bug is about backups on the host, used to recover LVM metadata accidentally wiped by some sort of human error. Until now we were using the LVM backup located on a host under /etc/lvm/backup.

However, it seems there is another way to do it: using the LVM metadata backup located at the beginning and the end of the first PV in a VG (aka an RHV SD). Those backups are frequent enough and always present due to the way RHV creates the VG: "--metadatasize 128M --metadatacopies 2". We should use that backup for recovering LVM metadata in the future, and thus this bug can be closed.
I will attach an example of such a backup shortly.
I would appreciate your ack on that.
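
For illustration only, a rough way to look at those on-PV metadata copies from a host (a sketch, not a supported procedure; /dev/mapper/<lun_wwid> stands for the first PV of the storage domain, and the 129M figure assumes the 128M metadata area starts right after the PV label at the beginning of the device, with the second copy at the end):

# dd if=/dev/mapper/<lun_wwid> bs=1M count=129 of=/tmp/<sd_uuid>_md_head.img
# strings /tmp/<sd_uuid>_md_head.img | less   <== older metadata versions appear as plain text; search for the VG name or a missing LV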

Comment 46 Marina Kalinin 2018-02-01 03:23:56 UTC
Created attachment 1389293 [details]
example of lvm backup from the pv

Comment 47 Ben Marzinski 2018-02-01 17:48:43 UTC
I don't do much LVM work, so I can't offer a definitive answer to your question, but I know that certainly in some cases, if you've overwritten a PV, you can restore it using the metadata off a different PV in the VG, with something like

# pvcreate --norestorefile --uuid=<UUID_of_the_device> <device_path>

Dave, do you want to take a crack at giving them a more definitive answer here?

Comment 48 David Teigland 2018-02-01 19:42:06 UTC
The metadata area on the PV will usually hold a number of previous versions of the VG metadata, but it is not guaranteed, e.g. if the VG metadata is large enough you're not guaranteed to find the full previous version.  There are also no tools to automatically use or extract the old copies of metadata from the PV areas.  The backup files stored on the file system are always complete copies, and can be used by lvm tools like vgcfgrestore.  I might suggest that RHV/vdsm use vgcfgbackup directly to capture full copies of the metadata at important points in time.
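
As a hedged illustration of that suggestion (the target directory is hypothetical; vgcfgbackup substitutes %s in the file name with the VG name):

# vgcfgbackup -f /var/lib/vdsm/lvm-backup/%s.vg <vg_name>           <== capture a complete copy of the VG metadata to a chosen file
# vgcfgrestore -f /var/lib/vdsm/lvm-backup/<vg_name>.vg <vg_name>   <== restore the VG metadata from that copy later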

Comment 49 loberman 2018-02-01 19:57:34 UTC
Closing this bug should depend on whether there will be some new method to capture LVM backups in /etc/lvm/backup.
If I read this correctly, vdsm is bypassing autobackup.

Clearly this seems to be important for many situations CEE gets into with customers.

Thanks
Laurence

Comment 50 loberman 2018-02-01 20:01:11 UTC
Hello Germano, perhaps I misunderstood your explanation.
I just know GFW often fights with this, so feel free to correct me and override.
Regards
Laurence

Comment 51 Nir Soffer 2018-02-01 20:06:44 UTC
(In reply to David Teigland from comment #48)
> The backup files stored on the file system are always
> complete copies, and can be used by lvm tools like vgcfgrestore.

There is no "file system" in RHV. The operation that create backups are performed
only the SPM host, and the SPM host can move any time to another host. So the 
complete backup is available only if you combine the backup files from all hosts 
in the cluster that were used as SPM in the past, including hosts which are not
in the cluster any more.

> I might
> suggest that RHV/vdsm use vgcfgbackup directly to capture full copies of the
> metadata at important points in time.

But where do we store the complete backup in the cluster?

I think our best backup is what we have on shared storage in the PV metadata
area.

This was useful to detect a user error in this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1500395#c3

What happens when the VG metadata area becomes full? Does LVM delete old metadata
to make room for new versions?

I don't think we need a complete backup of all versions. What we need is the last X
versions, to allow recovery of the last X states.

Comment 52 Nir Soffer 2018-02-01 20:15:11 UTC
(In reply to loberman from comment #49)
> Closing this bug should depend on whether there will be some new method to
> capture lvm backups in /etc/lvm/backup.

There is no /etc/lvm/backup in a cluster.

> If I read this correctly, vdsm is bypassing autoback.
> 
> Clearly it seems to be important for many situations CEE gets into with
> customers.

I think we should first check how the PV metadata can be used, since it seems to be
the same data that you normally have in /etc/lvm/backup, and it is available
in a known location in the cluster.

If we want a backup of the PV metadata, backing up the metadata areas is easier
than collecting the data from all hosts.
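
A hedged sketch of what backing up the metadata areas directly could look like (offsets are approximate and assume the RHV layout of --metadatasize 128M --metadatacopies 2, with the second copy at the end of the PV; device and output paths are placeholders):

# SIZE_MB=$(( $(blockdev --getsize64 /dev/mapper/<lun_wwid>) / 1048576 ))
# dd if=/dev/mapper/<lun_wwid> bs=1M count=129 of=/backup/<sd_uuid>_md_head.img                   <== first metadata copy, at the start of the PV
# dd if=/dev/mapper/<lun_wwid> bs=1M skip=$(( SIZE_MB - 129 )) of=/backup/<sd_uuid>_md_tail.img   <== second metadata copy, at the end of the PV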

Comment 53 David Teigland 2018-02-01 20:35:44 UTC
> I think our best backup is what we have on shared storage in the PV metadata
> area.

It sounds like it.

> What happen when the vg metadata area become full? does LVM delete old
> metadata to make room for new versions?

Metadata versions are written sequentially into the metadata area; when writing reaches the end, it wraps back to the beginning and overwrites older versions.

> I don't think we need complete backup of all versions. What we need is X
> last versions to allow recovery of X last states.

LVM isn't using the previous copies of metadata in the PV metadata areas, doesn't give you direct access to them, and doesn't guarantee anything about them. But for all practical purposes it should work for what you need to do for now. I think we need to come up with some better solutions for you, though.

Comment 54 Marina Kalinin 2018-02-01 22:25:15 UTC
I will close this bug WONTFIX for now, and I suggest moving this discussion to this RFE:
bz#1541165 - Provide a way to extract lvm metadata backups from a PV

Comment 55 Franta Kust 2019-05-16 12:55:16 UTC
BZ<2>Jira re-sync

