Bug 617972 - vgck does not fail when PV (which contains no mda areas) is missing
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Petr Rockai
QA Contact: Corey Marthaler
Duplicates: 560608
 
Reported: 2010-07-25 11:03 UTC by Daniel Shoshan
Modified: 2011-01-13 22:41 UTC
CC List: 11 users

Fixed In Version: lvm2-2.02.73-1.el5
Doc Type: Bug Fix
Last Closed: 2011-01-13 22:41:57 UTC


Attachments
vgs -vvvv --config... (100.42 KB, application/octet-stream)
2010-07-26 06:12 UTC, Ayal Baron


Links
Red Hat Product Errata RHBA-2011:0052 (normal, SHIPPED_LIVE): lvm2 bug fix and enhancement update, last updated 2011-01-12 17:15:25 UTC

Description Daniel Shoshan 2010-07-25 11:03:34 UTC
Description of problem:

I created a VG on two PVs (each one on a different iscsi server). The metadata is limited to one PV. When there were network problems reaching one of the storage servers, there were no warnings from lvm commands.

How reproducible:
always

Steps to Reproduce:
1. create a VG on two PVs, each one on a different iscsi server (a sketch of this setup follows the expected results below)
2. disconnect one storage server at a time from the host

  
Actual results:
In one of the two cases lvm commands return no warnings: when the PV that does not contain the metadata is disconnected.

Expected results:
In both cases lvm commands should issue warnings.
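
A minimal sketch of the setup in step 1, assuming the metadata layout described in comment 1 below (device paths and the VG name are hypothetical, not taken from the reporter's system):

  # PV that will carry the only copy of the VG metadata
  pvcreate --metadatacopies=1 /dev/sdb1
  # PV with no metadata area at all
  pvcreate --metadatacopies=0 /dev/sdc1
  # Both PVs go into one VG; only /dev/sdb1 holds the metadata
  vgcreate testvg /dev/sdb1 /dev/sdc1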

Comment 1 Milan Broz 2010-07-25 12:05:22 UTC
If I understand it correctly, you created one PV with metadata and another one without (--metadatacopies=0) and used these two in one VG.

If the only PV with metadata disappears, then lvm has no way to detect that there is a VG at all - the remaining PV will appear as not belonging to any VG.
(In this case it is not a bug.)

If only the PV without metadata disappears, you should see warnings.

Are you sure that there is no warning while the PV with metadata is still visible to the system? Even after vgscan? Please attach the output of that command with -vvvv added.

Anyway, in your configuration you should store metadata on both PVs.

Comment 2 Milan Broz 2010-07-25 12:45:34 UTC
> Anyway, in your configuration you should store metadata on both PVs.
(Ah, that does not apply to RHEV, where a single metadata area works around another problem with shared storage without using clvmd :-)

Comment 3 Ayal Baron 2010-07-26 06:12:50 UTC
Created attachment 434342 [details]
vgs -vvvv --config...

Comment 4 Ayal Baron 2010-07-26 06:22:03 UTC
(In reply to comment #1)
> If I understand it correctly, you created one PV with metadata and another one
> without (--metadatacopies=0) and used these two in one VG.
> 
> If the only PV with metadata disappears then lvm have no chance to check that
> there is VG at all - you will see that remaining PV as not belongs to any VG.
> (In this case it is not bug.)
This is the case, and it is really problematic. The fact that the PV does not keep a copy of the VG metadata does not mean it should carry no indication that it is part of a VG (e.g. you could store the vg uuid in the pv metadata).

Comment 5 Ayal Baron 2010-07-26 06:22:30 UTC
(please ignore attachment)

Comment 6 Alasdair Kergon 2010-07-26 10:14:38 UTC
Well we don't.  Rightly or wrongly, it was a basic design decision that metadata does not have to be stored on the actual device itself.  Choosing where to place metadata is always a trade-off.

If you lose your VG metadata, you've lost your device. That's why we write plenty of copies of it for redundancy. If you're doing all your metadata modification always from a single machine, you can even have a copy of your metadata stored in the local filesystem to cope with a situation such as the one you seem to be describing (metadata/dirs in lvm.conf). The trade-off is that this is not guaranteed to avoid locking up if your machine runs out of memory, and of course if you change to a different 'master writer' machine, you'd need to copy the directory across (or regenerate it). Something to experiment with, anyway, if you're particularly concerned to avoid the situation described in this bugzilla.
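
A sketch of the lvm.conf setting referred to above (the directory path is illustrative, not taken from this report):

  # In /etc/lvm/lvm.conf -- keep an additional live text copy of the VG
  # metadata in a local directory, alongside the on-PV metadata areas:
  metadata {
      dirs = [ "/etc/lvm/metadata" ]
  }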

Comment 7 Ayal Baron 2010-07-26 10:22:48 UTC
We do have backups, but that does not solve the issue at hand, which is being alerted that there is a problem to begin with.
Nor does it alert us that the PVs are not eligible to be used for another VG (if they were marked as belonging to the VG, there would be no problem).

The other issue is that rc = 0 when any PV other than the one with the MD is missing (there is a missing-uuid warning, but programmatically we can't rely on that). When running commands which are VG-specific (not general vgs or vgscan) we should receive an error (this should only be allowed if we supply the -p parameter, which we do not).

Comment 8 Alasdair Kergon 2010-07-26 10:26:58 UTC
(By the way, storing the VG uuid on the PV without the VG metadata would create more complexity for the lvm2 tools than if the whole metadata was stored there. That's why we don't do it. I haven't ruled out a 1-bit "belongs to some VG" flag, though that would effectively lock it against change.)

Comment 9 Alasdair Kergon 2010-07-26 10:33:19 UTC
I wasn't talking about backups in comment 6, but a *live* copy of the metadata in the filesystem - try out that lvm.conf setting.

Comment 10 Alasdair Kergon 2010-07-26 10:49:11 UTC
The rc=0 thing - well, it depends on what the command is, as several commands can work fine while a PV is missing.

Comment 12 Ayal Baron 2010-07-26 12:35:56 UTC
(In reply to comment #10)
> The rc=0 thing - well it depends on what the command is, as several commands
> can work fine while a PV is missing.    
Anything that queries the MD can work just fine, but the question is, should it?
What command could we run that is guaranteed to fail but wouldn't try to make changes, so that we could at least monitor the VG state?

Comment 13 Alasdair Kergon 2010-07-26 12:45:44 UTC
I've asked Peter to check this still works, but 'vgck' is supposed to do that.

Comment 14 Ayal Baron 2010-07-26 12:50:00 UTC
it doesn't:

[root@figo ~]# vgck --config " devices { preferred_names = [\"^/dev/mapper/\"] write_cache_state=0 filter = [ \"a%/dev/mapper/3600144f076de060000004c0b98dd000e%\", \"r%.*%\" ] } "
  Couldn't find device with uuid CNArt4-DmB7-xe0L-D5bC-ixKL-WZSF-m8qZ60.
[root@figo ~]# echo $?
0

Comment 15 Alasdair Kergon 2010-07-26 13:48:44 UTC
Then that needs to be fixed; the man page says:

SYNOPSIS
       vgck [-d|--debug] [-h|-?|--help] [-v|--verbose] [VolumeGroupName...]

DESCRIPTION
       vgck checks LVM metadata for each named volume group for consistency.


Arguably that tool might want additional exit codes to distinguish between different sorts of inconsistency.

Comment 16 Petr Rockai 2010-07-27 20:21:59 UTC
Fixed in CVS (vgck now issues an error message and exits with non-0 status upon encountering missing PVs).
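
With that change, a monitoring check along the lines comment 12 asked for becomes possible. A sketch (the VG name is hypothetical):

  # vgck only reads metadata, and with this fix it exits non-zero when a
  # PV is missing, so it can serve as a read-only health probe:
  if ! vgck testvg; then
      echo "testvg: metadata inconsistent or PV missing" >&2
      # raise an alert here
  fi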

Comment 17 Ayal Baron 2010-08-10 04:39:42 UTC
*** Bug 560608 has been marked as a duplicate of this bug. ***

Comment 18 Milan Broz 2010-08-30 10:39:31 UTC
Fix in lvm2-2.02.73-1.el5.

Comment 21 Corey Marthaler 2010-11-08 23:47:53 UTC
I failed the PV without metadata and verified that vgck noticed it was missing and returned a non-zero status. Marking verified in lvm2-2.02.74-1.el5.

[root@grant-01 tmp]#  pvcreate --metadatacopies=1 /dev/sdb1
  Physical volume "/dev/sdb1" successfully created
[root@grant-01 tmp]#  pvcreate --metadatacopies=0 /dev/sdc1
  Physical volume "/dev/sdc1" successfully created

[root@grant-01 tmp]# vgcreate VG /dev/sd[bc]1
  Volume group "VG" successfully created

[root@grant-01 tmp]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  VG           2   0   0 wz--n- 95.36G 95.36G
  VolGroup00   1   2   0 wz--n- 74.38G     0

[root@grant-01 tmp]# echo offline > /sys/block/sdc/device/state

[root@grant-01 tmp]# vgs
  /dev/sdc1: open failed: No such device or address
  /dev/sdc2: open failed: No such device or address
  /dev/sdc3: open failed: No such device or address
  /dev/sdc5: open failed: No such device or address
  /dev/sdc6: open failed: No such device or address
  Couldn't find device with uuid jU1NvL-EEhm-aR4T-9XAs-573y-vEUt-s6i0KW.
  VG         #PV #LV #SN Attr   VSize  VFree
  VG           2   0   0 wz-pn- 95.36G 95.36G
  VolGroup00   1   2   0 wz--n- 74.38G     0
[root@grant-01 tmp]# echo $?
0

[root@grant-01 tmp]# vgck VG
  Couldn't find device with uuid jU1NvL-EEhm-aR4T-9XAs-573y-vEUt-s6i0KW.
  The volume group is missing 1 physical volumes.
[root@grant-01 tmp]# echo $?
5

Comment 23 errata-xmlrpc 2011-01-13 22:41:57 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0052.html

