Bug 957692 - LVM-on-SW-RAID is not activated on boot if using lvmetad and automatic volume activation
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 19
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-04-29 05:21 EDT by Ferry Huberts
Modified: 2013-05-21 23:15 EDT
CC: 19 users

Fixed In Version: lvm2-2.02.98-8.fc19
Doc Type: Bug Fix
Related bug: 1038917
Last Closed: 2013-05-21 23:15:37 EDT
Type: Bug


Attachments
  boot log (12.90 KB, text/x-log) - 2013-04-29 09:16 EDT, Ferry Huberts
  dmesg (72.97 KB, application/octet-stream) - 2013-04-29 09:16 EDT, Ferry Huberts
  lspci (11.76 KB, application/octet-stream) - 2013-04-29 09:17 EDT, Ferry Huberts
  lvm conf (35.59 KB, application/octet-stream) - 2013-04-29 09:17 EDT, Ferry Huberts
  partitions (471 bytes, application/octet-stream) - 2013-04-29 09:17 EDT, Ferry Huberts
  hdparm sda - sde (3.16 KB, application/octet-stream) - 2013-04-29 09:19 EDT, Ferry Huberts
  pvscan (3.04 KB, text/plain) - 2013-04-29 10:07 EDT, Ferry Huberts

Description Ferry Huberts 2013-04-29 05:21:46 EDT
Description of problem:
Since I installed updates on Thu 25 Apr 2013, my LVM metadata is destroyed on boot. This only happens with LVM-on-SW-RAID.

Version-Release number of selected component (if applicable):
kernel 3.9.0-0.rc8.git0.2.fc20.x86_64

How reproducible:
always

Steps to Reproduce:
1. create sw-raid array, raid 5
2. create a pv+vg+lv on it
3. mount the new lv by default
4. reboot
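
A command-level sketch of these steps (illustrative only; the device names and the vg_home/lv_home names are taken from this report, while the exact sizes and options are assumptions):

# step 1: create the RAID5 array from the four member partitions
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
# step 2: create PV + VG + LV on it and put a filesystem on the LV
pvcreate /dev/md0
vgcreate vg_home /dev/md0
lvcreate -l 100%FREE -n lv_home vg_home
mkfs.ext4 /dev/vg_home/lv_home
# step 3: mount the new LV by default, then reboot (step 4)
echo '/dev/mapper/vg_home-lv_home /home ext4 defaults 1 2' >> /etc/fstab
reboot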
  
Actual results:
The LVM metadata on vg_home is destroyed; other LVM volumes (those not on SW-RAID) are unaffected.
I can manually restore the metadata and have a correct VG/LV again (even fsck says it's OK).

Expected results:
The LVM metadata should survive the reboot and the LV on the SW-RAID array should be usable (and mountable) as before.

Additional info:


/etc/fstab (I've switched /home to noauto)
==========================================
/dev/mapper/vg_paul-lv_root /                       ext4    defaults        1 1
UUID=078909b0-991a-4197-ba9e-7dabded585e8 /boot                   ext4    defaults        1 2
/dev/mapper/vg_paul-lv_swap swap                    swap    defaults        0 0
/dev/mapper/vg_home-lv_home /home ext4 defaults,noauto 1 2


/proc/mdstat
============
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[4] sdd1[1] sda1[0] sdc1[2]
      937316352 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/3 pages [0KB], 65536KB chunk

unused devices: <none>


/etc/lvm/backup/vg_home
=======================
# Generated by LVM2 version 2.02.98(2) (2012-10-15): Mon Apr 29 11:04:56 2013

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'vgs'"

creation_host = "paul.internal.Hupie.com"	# Linux paul.internal.Hupie.com 3.9.0-0.rc8.git0.2.fc20.x86_64 #1 SMP Mon Apr 22 21:26:31 UTC 2013 x86_64
creation_time = 1367226296	# Mon Apr 29 11:04:56 2013

vg_home {
	id = "zFrPhC-6TYC-uCaj-28N1-BDHH-M5qx-IQIgif"
	seqno = 4
	format = "lvm2" # informational
	status = ["RESIZEABLE", "READ", "WRITE"]
	flags = []
	extent_size = 8192		# 4 Megabytes
	max_lv = 0
	max_pv = 0
	metadata_copies = 0

	physical_volumes {

		pv0 {
			id = "dsTImS-53ru-LCmJ-SfO9-70PV-QtFN-6gM8N7"
			device = "/dev/md0"	# Hint only

			status = ["ALLOCATABLE"]
			flags = []
			dev_size = 1874632704	# 893.895 Gigabytes
			pe_start = 3072
			pe_count = 228836	# 893.891 Gigabytes
		}
	}

	logical_volumes {

		lv_home {
			id = "yw5W2T-fnpb-jJ99-Zjab-FgSn-JKpO-1xuYbo"
			status = ["READ", "WRITE", "VISIBLE"]
			flags = []
			creation_host = "paul.internal.Hupie.com"
			creation_time = 1352899831	# 2012-11-14 14:30:31 +0100
			segment_count = 1

			segment1 {
				start_extent = 0
				extent_count = 228836	# 893.891 Gigabytes

				type = "striped"
				stripe_count = 1	# linear

				stripes = [
					"pv0", 0
				]
			}
		}
	}
}
Comment 1 Ferry Huberts 2013-04-29 05:28:46 EDT
The metadata is destroyed even when the volume was not mounted at the time of the reboot.


manual restore procedure
========================

pvcreate --uuid "dsTImS-53ru-LCmJ-SfO9-70PV-QtFN-6gM8N7" \
         --restorefile /etc/lvm/backup/vg_home \
         /dev/md0
vgcfgrestore vg_home
vgchange -ay vg_home
fsck.ext4 /dev/vg_home/lv_home -f
Comment 2 Alasdair Kergon 2013-04-29 08:40:38 EDT
What leads you to think it is 'destroyed'?

Please provide the actual boot messages and error messages you are seeing when you try to look for it manually after booting and activate it.

Also give the versions of other packages involved like lvm2, udev, mdadm, dracut etc. and provide a copy of lvm.conf.  Is lvmetad running?
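
For reference, a quick way to check that last point (a sketch; the systemd unit names assume the stock Fedora 19 lvm2 packaging and are not taken from this report):

grep use_lvmetad /etc/lvm/lvm.conf                           # 1 = lvmetad enabled in the config
pidof lvmetad                                                # prints a PID if the daemon is running
systemctl status lvm2-lvmetad.socket lvm2-lvmetad.service    # systemd's view of the daemon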
Comment 3 Ferry Huberts 2013-04-29 09:16:03 EDT
(In reply to comment #2)
> What leads you to think it is 'destroyed'?
> 

Because on _every_ reboot the PV on /dev/md0 is gone
and a pvscan /dev/md0 doesn't bring it back.

> Please provide the actual boot messages and error messages you are seeing
> when you try to look for it manually after booting and activate it.
> 
> Also give the versions of other packages involved like lvm2, udev, mdadm,
> dracut etc. and provide a copy of lvm.conf.  Is lvmetad running?

Going to attach them. The attached files were generated by a boot that had /home mounted at boot time.

dracut.x86_64          027-36.git20130418.fc19
lvm2.x86_64            2.02.98-7.fc19
lvm2-libs.x86_64       2.02.98-7.fc19
mdadm.x86_64           3.2.6-15.fc19
systemd.x86_64         202-3.fc19
systemd-devel.x86_64   202-3.fc19
systemd-libs.i686      202-3.fc19
systemd-libs.x86_64    202-3.fc19
systemd-python.x86_64  202-3.fc19
systemd-sysv.x86_64    202-3.fc19

lvmetad is running
Comment 4 Ferry Huberts 2013-04-29 09:16:34 EDT
Created attachment 741478 [details]
boot log
Comment 5 Ferry Huberts 2013-04-29 09:16:55 EDT
Created attachment 741479 [details]
dmesg
Comment 6 Ferry Huberts 2013-04-29 09:17:12 EDT
Created attachment 741480 [details]
lspci
Comment 7 Ferry Huberts 2013-04-29 09:17:33 EDT
Created attachment 741481 [details]
lvm conf
Comment 8 Ferry Huberts 2013-04-29 09:17:53 EDT
Created attachment 741482 [details]
partitions
Comment 9 Ferry Huberts 2013-04-29 09:19:17 EDT
Created attachment 741483 [details]
hdparm sda - sde
Comment 10 Alasdair Kergon 2013-04-29 09:37:54 EDT
Try the pvscan with --cache and add -vvvv to watch what it is doing.
Comment 11 Alasdair Kergon 2013-04-29 09:38:59 EDT
If still no luck, eliminate lvmetad: set use_lvmetad to 0 instead of 1 in lvm.conf and kill the daemon.
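
In concrete terms, that amounts to roughly (a sketch; the systemd unit names assume the stock Fedora 19 lvm2 packaging):

# in /etc/lvm/lvm.conf change:
#   use_lvmetad = 1
# to:
#   use_lvmetad = 0
systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service   # or simply kill the running lvmetad process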
Comment 12 Ferry Huberts 2013-04-29 10:07:17 EDT
Created attachment 741500 [details]
pvscan

(In reply to comment #10)
> Try the pvscan with --cache and add -vvvv to watch what it is doing.

Ok, that works.

The restore procedure now is:

pvscan --cache -vvv /dev/md0
vgchange -ay vg_home
fsck.ext4 /dev/vg_home/lv_home -f

mount /home
Comment 13 Marian Csontos 2013-04-29 10:32:15 EDT
Pretty likely just another instance of the Bug 952782 bug family.
Comment 14 Marian Csontos 2013-04-29 10:37:51 EDT
Disabling lvmetad is a workaround as confirmed here:

https://bugzilla.redhat.com/show_bug.cgi?id=952782#c3

Please shout if it is not working for you.
Comment 15 Ferry Huberts 2013-04-29 10:38:20 EDT
Well, my /home is not encrypted and I have lvm2 2.02.98-7.fc19,
so my report looks entirely different IMHO.
Comment 16 Ferry Huberts 2013-04-29 10:44:35 EDT
(In reply to comment #14)
> Disabling lvmetad is a workaround as confirmed here:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=952782#c3
> 
> Shout please if not working for you.

It does work :-)
tnx!
Comment 17 Alasdair Kergon 2013-04-30 12:36:01 EDT
So you have a workaround, but you are using the latest package and so we still need to understand why it's not working out-of-the-box.
Comment 18 Ferry Huberts 2013-05-01 03:02:44 EDT
Powered on the system after being away for the weekend and *BOOM*, same problem :-(

Even though I am on the latest lvm2 and have use_lvmetad = 0.

The restore procedure mentioned in comment 12 still works.
Comment 19 Ferry Huberts 2013-05-01 03:13:56 EDT
Then I ran updates, got kernel 3.10.rc0-nodebug, rebooted, and it works OK.
Heisenbug?
Comment 20 Peter Rajnoha 2013-05-02 09:36:46 EDT
OK, we've nailed down the problem: we need to recognize the CHANGE uevent that reports the MD device going from inactive to active state.

LVM2 volumes are now activated automatically as PVs appear in the system (very similar to MD's incremental assembly mode). These volumes are autoactivated with the help of lvmetad, which collects metadata from newly appearing devices. Fedora 19 will use this new activation scheme by default.

With that LVM2 update, the LVM2 udev rules were modified a bit so that autoactivation happens only on ADD events for any devices other than device-mapper ones (before the update it was triggered on both ADD and CHANGE). So for MD we assumed the device would be usable right after the ADD event as well, and that assumption is what needs to be fixed!

That update was necessary, as otherwise the LVM2 volumes would be activated even after the CHANGE event triggered by the WATCH udev rule (which fires on each close of a device that was opened read-write), or after any other spurious CHANGE event in general. So LVM2 volumes were being activated when they actually should not have been; this update fixed that.

However, MD is special and we need to distinguish the exact CHANGE event that makes the device usable/active from all the other CHANGE events. We need to add this hook to the lvm2-lvmetad udev rules... I'll try to provide an update with the fix ASAP (this will probably require some coordination with the MD udev rules).
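
For illustration only (this is a sketch of the mechanism being described, not the actual rule or change that later shipped), the events in question and the action the udev hook ultimately has to trigger can be seen from the command line like this:

udevadm monitor --kernel --udev --subsystem-match=block &   # watch ADD/CHANGE uevents for block devices
mdadm --assemble --scan                                     # ADD comes first, then CHANGE once the array goes active
cat /sys/block/md0/md/array_state                           # e.g. "clean"/"active" once the device is usable
# only at that point should the lvmetad scan + autoactivation be triggered for the MD device:
pvscan --cache --activate ay /dev/md0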
Comment 21 Peter Rajnoha 2013-05-02 09:47:12 EDT
Changing the description of the problem from "metadata loss" to "activation", as the metadata is not destroyed; the LVM volumes are simply not activated when lvmetad is used (and with it the automatic, event-based activation of LVM2 volumes).
Comment 22 Fedora Update System 2013-05-03 08:48:09 EDT
lvm2-2.02.98-8.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/lvm2-2.02.98-8.fc19
Comment 23 Peter Rajnoha 2013-05-03 09:01:38 EDT
I've done an update which should resolve this issue. If you still encounter a problem with MD+LVM with lvmetad, feel free to reopen this bug report. Thanks.
Comment 24 Fedora Update System 2013-05-03 11:19:25 EDT
Package lvm2-2.02.98-8.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing lvm2-2.02.98-8.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-7336/lvm2-2.02.98-8.fc19
then log in and leave karma (feedback).
Comment 25 Ferry Huberts 2013-05-06 14:09:23 EDT
Installed the koji rpms and re-enabled use_lvmetad; a few reboots later, everything looks good.
Fixed for me.

tnx!
Comment 26 Fedora Update System 2013-05-21 23:15:37 EDT
lvm2-2.02.98-8.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.
