Bug 620745 - pvs fails to read vg info on first run on mdraid partition
Summary: pvs fails to read vg info on first run on mdraid partition
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assignee: Peter Rajnoha
QA Contact: Corey Marthaler
URL:
Whiteboard:
Keywords: RHELNAK
Duplicates: 620467
 
Reported: 2010-08-03 11:54 UTC by Hans de Goede
Modified: 2010-11-10 21:08 UTC
CC List: 10 users

Fixed In Version: lvm2-2.02.72-6.el6
Doc Type: Bug Fix
Doc Text:
Last Closed: 2010-11-10 21:08:33 UTC


Attachments
output of first run with -vvvv (80.44 KB, text/plain)
2010-08-03 12:18 UTC, Hans de Goede
output of second run with -vvvv (11.10 KB, text/plain)
2010-08-03 12:18 UTC, Hans de Goede
lvmdump (28.34 KB, application/x-gzip)
2010-08-03 13:50 UTC, Hans de Goede

Description Hans de Goede 2010-08-03 11:54:49 UTC
To reproduce, create a partitioned mdraid mirror (in my case it was an Intel Firmware RAID set) where one of the partitions is a PV of a single-disk VG.
Running lvm pvs on this partition results in pvs not reading the VG info:

  Found duplicate PV 2RoSUZ1tCw9lj5921GIak2KBzfpP13Dd: using /dev/sdb2 not /dev/md127p2
  Found duplicate PV 2RoSUZ1tCw9lj5921GIak2KBzfpP13Dd: using /dev/sdc2 not /dev/sdb2
  get_pv_from_vg_by_id: vg_read_internal failed to read VG
  LVM2_PV_NAME=/dev/sdc2
  LVM2_PV_UUID=2RoSUZ-1tCw-9lj5-921G-Iak2-KBzf-pP13Dd
  LVM2_PV_SIZE=77634560.00
  LVM2_PV_PE_COUNT=0
  LVM2_PV_PE_ALLOC_COUNT=0
  LVM2_PE_START=192.00
  LVM2_VG_NAME=
  LVM2_VG_UUID=
  LVM2_VG_SIZE=0
  LVM2_VG_FREE=0
  LVM2_VG_EXTENT_SIZE=0
  LVM2_VG_EXTENT_COUNT=0
  LVM2_VG_FREE_COUNT=0
  LVM2_PV_COUNT=0

A second run directly after the first one does find the VG info:
  Found duplicate PV 2RoSUZ1tCw9lj5921GIak2KBzfpP13Dd: using /dev/sdb2 not /dev/md127p2
  Found duplicate PV 2RoSUZ1tCw9lj5921GIak2KBzfpP13Dd: using /dev/sdc2 not /dev/sdb2
  LVM2_PV_NAME=/dev/sdc2
  LVM2_PV_UUID=2RoSUZ-1tCw-9lj5-921G-Iak2-KBzf-pP13Dd
  LVM2_PV_SIZE=77631488.00
  LVM2_PV_PE_COUNT=18953
  LVM2_PV_PE_ALLOC_COUNT=18953
  LVM2_PE_START=192.00
  LVM2_VG_NAME=VolGroup
  LVM2_VG_UUID=ibJr4t-w0nr-REJ6-zvSB-C05R-9gcJ-ab4dcF
  LVM2_VG_SIZE=77631488.00
  LVM2_VG_FREE=0
  LVM2_VG_EXTENT_SIZE=4096.00
  LVM2_VG_EXTENT_COUNT=18953
  LVM2_VG_FREE_COUNT=0
  LVM2_PV_COUNT=1

Removing /etc/lvm/cache/.cache makes lvm pvs behave as in the first run again.

This issue is the root cause of:
[Bug 620467] Rescue mode does not find an installation on Intel BIOS RAID

Besides not reading the VG info, I'm also surprised that lvm pvs seems to think it knows best which PV to use, even though I explicitly specified one on the command line!

Comment 1 Hans de Goede 2010-08-03 11:59:21 UTC
Some extra info: the exact command used to generate the two outputs above is:
lvm pvs --ignorelockingfailure --units k --nosuffix --nameprefixes --rows --unquoted --noheadings -opv_name,pv_uuid,pv_size,pv_pe_count,pv_pe_alloc_count,pe_start,vg_name,vg_uuid,vg_size,vg_free,vg_extent_size,vg_extent_count,vg_free_count,pv_count /dev/md127p2

Comment 2 Alasdair Kergon 2010-08-03 12:02:44 UTC
Package N-V-R ?

Comment 3 Alasdair Kergon 2010-08-03 12:06:32 UTC
If it's after 2.02.67, repeat the test with -vvvv and post the output together with a copy of /etc/lvm/lvm.conf (i.e. output from the two runs: the one that fails followed by the one that works).

Comment 4 Hans de Goede 2010-08-03 12:08:46 UTC
lvm2-2.02.72-3.el6.i686.rpm, getting -vvvv output now.

Comment 6 Hans de Goede 2010-08-03 12:18:17 UTC
Created attachment 436259 [details]
output of first run with -vvvv

Comment 7 Hans de Goede 2010-08-03 12:18:41 UTC
Created attachment 436260 [details]
output of second run with -vvvv

Comment 8 RHEL Product and Program Management 2010-08-03 12:27:39 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 9 Alasdair Kergon 2010-08-03 12:48:17 UTC
Thanks - I can see a couple of things going wrong here.

The basic problem is that lvm2 doesn't properly recognise the device as belonging to md - if we can fix that, the problem should go away.

Comment 10 Milan Broz 2010-08-03 12:55:04 UTC
(In reply to comment #9)
> The basic problem is that lvm2 doesn't properly recognise the device as
> belonging to md - if we can fix that, the problem should go away.

Well, I expect that there is no mdraid superblock, but Intel RAID metadata, right?

Comment 11 Hans de Goede 2010-08-03 13:06:50 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > The basic problem is that lvm2 doesn't properly recognise the device as
> > belonging to md - if we can fix that, the problem should go away.
> 
> Well, I expect that there is no mdraid superblock, but Intel RAID metadata,
> right?

Neither is present on the block device which is the PV, as that is a partition on top of mdraid. IOW this has to do with partitioned mdraid. If lvm is smart enough to find the "disk" the partition is on and then looks for superblocks, then yes, the disk(s) have Intel RAID metadata, not mdraid superblocks. I doubt you will be able to see the metadata though, as Intel Firmware RAID has the following layout:

/dev/md0: container device; this has a size of 0 and owns the disks. AFAIK this is because Intel Firmware RAID allows one disk (different parts of it) to be used for multiple sets.

/dev/md127: a RAID set within the container; in my case this happens to span almost the entire disk, but as said I don't think it allows access to the superblock.

/dev/md127p2: a partition on the RAID set, which is a PV in my case and is causing the trouble.

May I say that I find it weird to go looking for mdraid superblocks to identify whether a device is an mdraid device or not. Using major numbers, or in this case device names, seems much more suitable for that and less error prone.
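The name-based check suggested above could look roughly like this. This is a hypothetical sketch, not actual lvm2 code; the `is_md_name` helper and its pattern are made up for illustration:

```shell
# Hypothetical sketch of name-based md detection; not actual lvm2 code.
# It flags any block device whose kernel name starts with "md" followed
# by a digit, which also covers partitions such as md127p2 that sit on
# the blkext major and therefore cannot be recognised by major number.
is_md_name() {
    case "$(basename "$1")" in
        md[0-9]*) return 0 ;;   # md0, md127, md127p2, ...
        *)        return 1 ;;
    esac
}

is_md_name /dev/md127p2 && echo "md device"   # prints "md device"
is_md_name /dev/sdb2    || echo "not md"      # prints "not md"
```

A check like this is of course only as reliable as the naming convention, which is part of why it is less robust than asking the kernel directly.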

Comment 12 Alasdair Kergon 2010-08-03 13:18:52 UTC
That's right - we only detect the original md, dm and drbd major numbers at the moment.

Comment 13 Milan Broz 2010-08-03 13:28:18 UTC
Can you attach lvmdump so we have full view of device number pairs etc?

Comment 14 Hans de Goede 2010-08-03 13:50:55 UTC
Created attachment 436276 [details]
lvmdump

As you can see in the dump, mdraid partitions use the blkext major (259). This major is used when there are not enough minors reserved for the number of partitions on a disk, for example when a SCSI disk has more than 15 partitions, or, in the case of mdraid, when there are any partitions at all.

So you cannot simply use the major number here to figure out the kind of device. Options I can think of are going through sysfs to find the major of the disk the partition is on, or using the device name rather than the major.
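The sysfs option could be sketched roughly as follows. This is a hypothetical illustration, not lvm2's actual implementation: in sysfs a partition's directory lives inside its parent disk's directory, so resolving the partition's /sys/class/block symlink and going one level up yields the parent, whose real major sits in its `dev` file. The helper name and the mock tree (mirroring the md127/md127p2 layout from the dump) are made up:

```shell
#!/bin/sh
# Hypothetical sketch of the sysfs approach; not actual lvm2 code.
resolve_parent_major() {
    # $1 = sysfs root, $2 = partition name (e.g. md127p2)
    # The partition symlink resolves into the parent disk's directory,
    # so dirname of the resolved path is the parent disk itself.
    parent=$(dirname "$(readlink -f "$1/class/block/$2")")
    cut -d: -f1 "$parent/dev"
}

# Build a mock sysfs tree mirroring the layout from the lvmdump:
# md127 has the md major (9); its partition md127p2 sits on blkext (259).
root=$(mktemp -d)
mkdir -p "$root/devices/virtual/block/md127/md127p2" "$root/class/block"
echo "9:127" > "$root/devices/virtual/block/md127/dev"
echo "259:2" > "$root/devices/virtual/block/md127/md127p2/dev"
ln -s "$root/devices/virtual/block/md127/md127p2" "$root/class/block/md127p2"

resolve_parent_major "$root" md127p2   # prints 9: the md major, even though the partition is on blkext
```

On a real system the first argument would simply be /sys, so a tool can classify md127p2 as an md component without relying on the blkext major or on the device name.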

Comment 15 Bill Nottingham 2010-08-04 17:30:21 UTC
(In reply to comment #14)
> So you cannot simply use the major number here to figure out the kind of
> device. Options I can think of are going through sysfs to find the major of the
> disk the partition is on, or using the device name rather then the major.    

Is that actually a viable change at this stage of the release?

Comment 16 Hans de Goede 2010-08-04 18:31:55 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > So you cannot simply use the major number here to figure out the kind of
> > device. Options I can think of are going through sysfs to find the major of the
> > disk the partition is on, or using the device name rather then the major.    
> 
> Is that actually a viable change at this stage of the release?    

A good question, which is best answered by the lvm team. Note I'm on PTO starting tomorrow; I return to work on Tuesday, Aug. 10th.

Comment 19 Peter Rajnoha 2010-08-11 12:16:01 UTC
The patch is in upstream now (2.02.73).

Comment 20 Hans de Goede 2010-08-11 15:36:35 UTC
*** Bug 620467 has been marked as a duplicate of this bug. ***

Comment 22 Hans de Goede 2010-08-13 09:45:08 UTC
Note to QA: to verify this one, do an installation onto an Intel BIOS RAID set, then start the installer again and see if it finds the old installation as an upgrade target (or start it in rescue mode and let it mount the old install).

Comment 23 Corey Marthaler 2010-08-25 20:57:58 UTC
Marking this verified, regression tests run only.

Comment 24 releng-rhel@redhat.com 2010-11-10 21:08:33 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

