Bug 1589444 - Cannot activate LVs in VG xxx while PVs appear on duplicate devices.
Summary: Cannot activate LVs in VG xxx while PVs appear on duplicate devices.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 28
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: David Teigland
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1576830
Depends On:
Blocks:
 
Reported: 2018-06-09 14:14 UTC by Wolfgang Denk
Modified: 2018-12-14 20:41 UTC
CC List: 18 users

Fixed In Version: lvm2-2.02.177-5.fc28
Clone Of:
Environment:
Last Closed: 2018-06-26 17:34:59 UTC
Type: Bug
Embargoed:


Attachments
/etc/lvm/lvm.conf (from lvm2-2.02.177-4.fc28.x86_64) (92.22 KB, text/plain)
2018-06-11 06:43 UTC, Wolfgang Denk
Output of "vgchange -ay -vvvv" running under Fedora 27 (61.56 KB, text/plain)
2018-06-11 06:59 UTC, Wolfgang Denk
Output of "lvm vgchange -ay -vvvv" under Fedora 28 dracut emergency shell (123.12 KB, text/plain)
2018-06-11 07:16 UTC, Wolfgang Denk
Output of "lvm vgchange -ay -vvvv" under Fedora 28 dracut emergency shell with global_filter set (121.65 KB, text/plain)
2018-06-12 08:03 UTC, Wolfgang Denk


Links
Debian BTS 870692 (last updated 2018-06-11 07:22:57 UTC)
Red Hat Bugzilla 1575762: CLOSED - "In Fedora 28, OS is unbootable after using dracut to create new initramfs" (last updated 2023-09-14 04:27:47 UTC)

Internal Links: 1575762

Description Wolfgang Denk 2018-06-09 14:14:22 UTC
Description of problem:

After upgrading a system (using dnf system-upgrade) from F27 to F28, it fails to boot because it cannot find the root file system. The root file system is in a logical volume, and the needed volume group has not been activated. The system drops me into the dracut shell. Trying to enable the VGs manually fails as well.
 

Version-Release number of selected component (if applicable):

current F28

How reproducible:

Always

Steps to Reproduce:
1. Take F27 system with root file system in a VG on a RAID1 array with 2 disks.
2. Update to F28 using dnf system-upgrade (typical commands sketched after this list)
3. Try to reboot
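
For reference, the upgrade path used in step 2 typically amounts to the
following commands (a sketch of the standard dnf system-upgrade procedure;
the release number matches this report):

   dnf upgrade --refresh                          # fully update the running F27 system
   dnf install dnf-plugin-system-upgrade          # install the system-upgrade plugin
   dnf system-upgrade download --releasever=28    # download the F28 package set
   dnf system-upgrade reboot                      # reboot into the offline upgrade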

Actual results:

...
[  325.773834] dracut-initqueue[388]: Warning: dracut-initqueue timeout - starting timeout scripts
[  326.301089] dracut-initqueue[388]: Warning: dracut-initqueue timeout - starting timeout scripts
[  326.827869] dracut-initqueue[388]: Warning: dracut-initqueue timeout - starting timeout scripts
[  326.827980] dracut-initqueue[388]: Warning: Could not boot.
         Starting Setup Virtual Console...
         Starting Dracut Emergency Shell...
Warning: /dev/mapper/atlas1-root does not exist
!!!!! This is the needed/missing root file system !!!!!
dracut:/# ls /dev/mapper
atlas0-home  atlas0-media  atlas0-virtual  control
atlas0-mail  atlas0-mp3    atlas0-work
dracut:/# cat /proc/mdstat 
Personalities : [raid10] [raid1] [raid6] [raid5] [raid4] 
md0 : active raid6 sdk[10] sdh[13] sdg[11] sdi[8] sdl[14] sdm[15] sde[12] sdj[9]
      11720301024 blocks super 1.2 level 6, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
      
md1 : active raid1 sda3[1] sdb3[0]
      484118656 blocks [2/2] [UU]
      
md2 : active raid1 sda1[1] sdb1[0]
      255936 blocks [2/2] [UU]
      
md3 : active raid10 sdd1[1] sdc1[0]
      234878976 blocks 512K chunks 2 far-copies [2/2] [UU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

unused devices: <none>
dracut:/# lvm vgchange -ay
  WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx on /dev/sda3 was already found on /dev/md1.
  WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx on /dev/sdb3 was already found on /dev/md1.
  WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx prefers device /dev/md1 because device size is correct.
  WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx prefers device /dev/md1 because device size is correct.
  Cannot activate LVs in VG atlas1 while PVs appear on duplicate devices.
  Cannot activate LVs in VG atlas1 while PVs appear on duplicate devices.
  Cannot activate LVs in VG atlas1 while PVs appear on duplicate devices.
  Cannot activate LVs in VG atlas1 while PVs appear on duplicate devices.
  0 logical volume(s) in volume group "atlas1" now active
  6 logical volume(s) in volume group "atlas0" now active



Expected results:

Activation of the LVs in VG "atlas1"

Additional info:

Comment 1 Zdenek Kabelac 2018-06-09 17:32:50 UTC
Please provide/attach  '/etc/lvm/lvm.conf'

And a 'vgchange -ay -vvvv' trace.

Where does your mdraid keep its metadata - is it at the end of the device?

It looks like an MD leg is wrongly recognized and misinterpreted as a PV?

Comment 2 Wolfgang Denk 2018-06-11 06:43:27 UTC
Created attachment 1449909 [details]
/etc/lvm/lvm.conf (from lvm2-2.02.177-4.fc28.x86_64)

Comment 3 Wolfgang Denk 2018-06-11 06:59:37 UTC
Created attachment 1449910 [details]
Output of "vgchange -ay -vvvv" running under Fedora 27

Comment 4 Wolfgang Denk 2018-06-11 07:16:57 UTC
Created attachment 1449911 [details]
Output of "lvm vgchange -ay -vvvv" under Fedora 28 dracut emergency shell

Comment 5 Wolfgang Denk 2018-06-11 07:17:39 UTC
1) Re: '/etc/lvm/lvm.conf' please see the attachment "lvm.conf".

   Note: this file is unmodified by me, i.e. it is what has been
   installed by the Fedora 28 package lvm2-2.02.177-4.fc28.x86_64.

2) Running 'vgchange -ay -vvvv' when booting successfully using a
   Fedora 27 kernel (4.16.13-200.fc27.x86_64), I get the output in
   attachment "vgchange-ay-vvvv-F27.txt".

   With the Fedora 28 kernel, I have only the dracut emergency
   shell, so I'm providing the output of the
   "lvm vgchange -ay -vvvv" instead in attachment
   "vgchange-ay-vvvv-F28.txt".

   Note that in both cases the same root file system has been used -
   but the Fedora 27 kernel/ramdisk still use the older 
   (lvm2-2.02.175-1.fc27.x86_64 based) /etc/lvm/lvm.conf - though I
   cannot see any relevant differences:

--- /etc/lvm/lvm.conf.F27   2017-10-26 12:41:54.000000000 +0200
+++ /etc/lvm/lvm.conf.F28 2018-06-11 08:41:28.164312713 +0200
@@ -611,9 +611,9 @@
        # Select log messages by class.
        # Some debugging messages are assigned to a class and only appear in
        # debug output if the class is listed here. Classes currently
-       # available: memory, devices, activation, allocation, lvmetad,
+       # available: memory, devices, io, activation, allocation, lvmetad,
        # metadata, cache, locking, lvmpolld. Use "all" to see everything.
-       debug_classes = [ "memory", "devices", "activation", "allocation", "lvmetad", "metadata", "cache", "locking", "lvmpolld", "dbus" ]
+       debug_classes = [ "memory", "devices", "io", "activation", "allocation", "lvmetad", "metadata", "cache", "locking", "lvmpolld", "dbus" ]
 }
 
 # Configuration section backup.


3) Here is the complete overview of RAID arrays on this system:

# mdadm -v  --detail --scan
ARRAY /dev/md3 level=raid10 num-devices=2 metadata=0.90 UUID=11f8cff2:51f02ed1:2e6b9cd9:22ffab23
   devices=/dev/sdc1,/dev/sdd1
ARRAY /dev/md1 level=raid1 num-devices=2 metadata=0.90 UUID=edd5525b:0bf6f4ab:cb468022:f81162c3
   devices=/dev/sda3,/dev/sdb3
ARRAY /dev/md2 level=raid1 num-devices=2 metadata=0.90 UUID=e1e7a050:1641069a:2e6b9cd9:22ffab23
   devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md0 level=raid6 num-devices=8 metadata=1.2 name=atlas.denx.de:0 UUID=4df90724:87913791:1700bb31:773735d0
   devices=/dev/sde,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm

   /dev/md1, which is causing the trouble here, has metadata=0.90,
   which means it uses the original 0.90 format superblock, so yes, it
   is at the end of the device.
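
For anyone checking a similar setup, the superblock version of an individual
array member can be read directly with mdadm (the device name below is just
an example taken from this report):

   mdadm --examine /dev/sda3   # the "Version" field shows the superblock format

Superblock formats 0.90 and 1.0 live at the end of the member device, so the
array's data (and hence the LVM PV label) starts at the very beginning of the
raw partition; formats 1.1 and 1.2 are stored at or near the start instead.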

Comment 6 Zdenek Kabelac 2018-06-11 08:34:52 UTC
Passing to Dave, as it misidentifies a raid leg as a duplicate PV.

Comment 7 David Teigland 2018-06-11 14:13:18 UTC
There's a bug in 2.02.177 (F28) which causes lvm to scan md components even after lvm's md filter sees they are md components:

#filters/filter-md.c:35            /dev/sdb3: Skipping md component device
#device/dev-cache.c:1536          /dev/sdb3: Using device (8:19)
#label/label.c:286           Reading label from device /dev/sdb3
#device/dev-io.c:599           Opened /dev/sdb3 RO O_DIRECT
#device/dev-io.c:168           /dev/sdb3: Block size is 1024 bytes
#device/dev-io.c:179           /dev/sdb3: Physical block size is 512 bytes
#device/dev-io.c:96            Read  /dev/sdb3:    2048 bytes (sync) at 0 (for PV labels)
#label/label.c:167         /dev/sdb3: lvm2 label detected at sector 1
#cache/lvmcache.c:2171    WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx on /dev/sdb3 was already found on /dev/md1.

This causes lvm to incorrectly think these devices are duplicates, and lvm tries to protect you from using the wrong storage by refusing to activate LVs until the duplicates are resolved.  Until there's an lvm update to fix this, there are a couple ways to avoid this problem:

1. you can add the md components to the global_filter in lvm.conf:
   global_filter = [ "r|/dev/sda3|", "r|/dev/sdb3|" ]

2. you can disable lvm's protection against changes with duplicates (since these are not actual duplicates):
   allow_changes_with_duplicate_pvs=1

I'd start with 1 and see if that works.  With 2 you'd still see the annoying duplicate warnings.
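
For reference, applying workaround 1 end to end usually means editing the
devices section of /etc/lvm/lvm.conf and then regenerating the initramfs so
that the early-boot copy of the file picks up the change. A minimal sketch,
reusing the filter syntax and example device names from this comment (adjust
them to your own layout):

   # /etc/lvm/lvm.conf -- reject the md component partitions
   devices {
       global_filter = [ "r|/dev/sda3|", "r|/dev/sdb3|" ]
   }

   # rebuild the initramfs for the running kernel so dracut sees the change
   dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

Workaround 2 (allow_changes_with_duplicate_pvs = 1) is likewise a
devices-section setting and generally also needs the initramfs rebuilt to
take effect during early boot.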

Comment 8 Wolfgang Denk 2018-06-12 08:03:31 UTC
Created attachment 1450350 [details]
Output of "lvm vgchange -ay -vvvv" under Fedora 28 dracut emergency shell with global_filter set

Comment 9 Wolfgang Denk 2018-06-12 08:07:18 UTC
Thanks, David!

Workaround 1 is not working for me.  I verified that the setting is copied into the ramdisk image, and you can actually see it in the "vgchange -ay -vvvv" output; see attachment 1450350 [details].  However, the end result is still this:

dracut:/# lvm vgchange -ay
  WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx on /dev/sda3 was already found on /dev/md1.
  WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx on /dev/sdb3 was already found on /dev/md1.
  WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx prefers device /dev/md1 because device size is correct.
  WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx prefers device /dev/md1 because device size is correct.
  Cannot activate LVs in VG atlas1 while PVs appear on duplicate devices.
  Cannot activate LVs in VG atlas1 while PVs appear on duplicate devices.
  Cannot activate LVs in VG atlas1 while PVs appear on duplicate devices.
  Cannot activate LVs in VG atlas1 while PVs appear on duplicate devices.
  0 logical volume(s) in volume group "atlas1" now active
  6 logical volume(s) in volume group "atlas0" now active


Workaround 2 works for me.
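
One way to confirm that an lvm.conf change has actually reached the early-boot
environment is to inspect the initramfs with dracut's lsinitrd tool, e.g. (a
sketch; the image path assumes the standard Fedora naming scheme):

   # check that a copy of lvm.conf is packed into the current initramfs
   lsinitrd /boot/initramfs-$(uname -r).img | grep lvm.conf

   # print that copy and look for the relevant settings
   lsinitrd -f etc/lvm/lvm.conf /boot/initramfs-$(uname -r).img \
       | grep -E 'global_filter|allow_changes_with_duplicate_pvs'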

Comment 10 Wolfgang Denk 2018-06-12 08:14:56 UTC
Re workaround 1: this part of the vgchange output looks pretty suspicious to me:
...
#filters/filter-regex.c:172           /dev/sda3: Skipping (regex)
#device/dev-cache.c:1536          /dev/sda3: Using device (8:3)
#label/label.c:286           Reading label from device /dev/sda3
#device/dev-io.c:599           Opened /dev/sda3 RO O_DIRECT
#device/dev-io.c:168           /dev/sda3: Block size is 1024 bytes
#device/dev-io.c:179           /dev/sda3: Physical block size is 512 bytes
#device/dev-io.c:96            Read  /dev/sda3:    2048 bytes (sync) at 0 (for PV labels)
#label/label.c:167         /dev/sda3: lvm2 label detected at sector 1
#cache/lvmcache.c:2171    WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx on /dev/sda3 was already found on /dev/md1.

It says it is skipping /dev/sda3 - but it doesn't!!

Ditto for the other partition:

#filters/filter-regex.c:172           /dev/sdb3: Skipping (regex)
#device/dev-cache.c:1536          /dev/sdb3: Using device (8:19)
#label/label.c:286           Reading label from device /dev/sdb3
#device/dev-io.c:599           Opened /dev/sdb3 RO O_DIRECT
#device/dev-io.c:168           /dev/sdb3: Block size is 1024 bytes
#device/dev-io.c:179           /dev/sdb3: Physical block size is 512 bytes
#device/dev-io.c:96            Read  /dev/sdb3:    2048 bytes (sync) at 0 (for PV labels)
#label/label.c:167         /dev/sdb3: lvm2 label detected at sector 1
#cache/lvmcache.c:2171    WARNING: PV ljJ72U-Nfie-xcXH-cObX-Sh2W-vDLZ-aR05lx on /dev/sdb3 was already found on /dev/md1.

Comment 11 Zdenek Kabelac 2018-06-13 09:31:13 UTC
Hmmm, this mysterious filtering issue rings a bell here - bug 1575762.


Though ATM I'm not sure whether this gcc issue has already been fixed and just rebuilding lvm2 with a more recent version fixes it, or whether something more needs to be done here.

Comment 12 Wolfgang Denk 2018-06-24 14:09:01 UTC
I confirm that lvm2-2.02.177-5.fc28 (currently in the Fedora 28 testing repository) fixes the problem for me - thanks!

Comment 13 Fedora Update System 2018-06-25 08:12:02 UTC
lvm2-2.02.177-5.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-4fe9bf6535

Comment 14 Fedora Update System 2018-06-26 17:34:59 UTC
lvm2-2.02.177-5.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Norbert Jurkeit 2018-07-08 09:44:16 UTC
*** Bug 1576830 has been marked as a duplicate of this bug. ***

Comment 16 Karl Kowallis 2018-11-20 07:32:27 UTC
This bug recurred when upgrading from Fedora 28 to 29.
Workaround 2 allowed the system to boot but I still got the errors about duplicate volumes.

After booting successfully I reverted the workaround 2 change and then rebooted. It failed again. Now the workaround 2 solution doesn't work.

Where should this be reported?

Comment 17 Marian Csontos 2018-11-20 10:31:08 UTC
Are you using md version 1.0 with the header at the end of the device? There is  a known issue affecting filtering of these md devices:

  https://www.redhat.com/archives/linux-lvm/2018-October/msg00001.html

IIUC, at the moment the only solution is to filter out the disks under such MD devices using global_filter, and this is preferred over workaround 2. Does that help?

If global_filter does not work, open a new bug, please. Attach a log from the failing command with `-vvvv` option. Output of lsblk and blkid would be helpful too.
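
For completeness, that information can be gathered along these lines (a
sketch; the output file names are arbitrary):

   # verbose trace of the failing activation (lvm debug output goes to stderr)
   lvm vgchange -ay -vvvv 2> vgchange-vvvv.log

   # block device layout and on-disk signatures
   lsblk -o NAME,KNAME,TYPE,SIZE,FSTYPE,MOUNTPOINT > lsblk.out
   blkid > blkid.out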

Comment 18 Karl Kowallis 2018-12-05 04:56:00 UTC
I was using 0.90 metadata. I had an old raid 1 set that was previously set up to boot even with a failed disk. It isn't my boot device any more, but it is still the same array. 

The Fedora live boot didn't properly detect the volumes. I was able to edit lvm.conf using the global_filter setting to get the volumes recognized, and then I went through the blivet-gui setup.

On reboot everything failed. The lvm.conf did not have a global_filter setting and so the same problem that had occurred with the live CD image happened with my rebooted upgrade. However, because it was a first boot the root account was locked and I had to boot into the live image to edit the lvm.conf and the mdadm.conf files. I kept having problems and finally gave up and just reinstalled but without using the pre-existing /home or /var partitions on my problematic raid1 lvm setup. Once I was up and running I was able to configure them to work by properly editing the setup files.

I can understand the problem with the old 0.90 metadata and the new detection methods. However, a bit of warning would have been nice. If I was less technically savvy I might have been completely stuck.

In hindsight, I eventually abandoned my raid1/lvm setup and am now using a two-disk btrfs raid1 setup. It is less of a headache and will make things easier when I modify my hardware in the future. My existing disks are getting a bit old.

Since this is based on a known issue I don't think there is more I need to provide. However if I need to I can recreate my environment in a vm and collect any needed info.

Comment 19 David Teigland 2018-12-05 16:21:25 UTC
Sorry about the trouble from this; there's a big gap in our testing when it comes to system startup/booting, and that's where bugs can be especially hard to work around and debug.  I've made some more recent improvements for detecting md components which I'm hoping will close the remaining gaps.

Comment 20 Fedora Update System 2018-12-07 15:09:41 UTC
lvm2-2.02.183-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-4f678211c1

Comment 21 Randy Barlow 2018-12-11 17:04:09 UTC
A Fedora update associated with this bug has been pushed to the stable repository.

Comment 22 Randy Barlow 2018-12-14 20:41:19 UTC
A Fedora update associated with this bug has been pushed to the stable repository.

