Description of problem:
I upgraded my system from Fedora 16 to Fedora 19. While running Fedora 16, I created new logical volumes to contain the Fedora 19 /, /usr and /var. I installed Fedora 19 into those new logical volumes and then rebooted into Fedora 19. Under Fedora 19, I get errors from all the lvm2 commands, and they appear not to see some of the logical volumes. However, the system is able to mount those volumes. Here are the errors:

[root@ti124 ~]# pvs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md127 vg_sys lvm2 a--  14.50t 3.63t
[root@ti124 ~]# vgs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_sys   1  10   0 wz--n- 14.50t 3.63t
[root@ti124 ~]# lvs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  LV         VG     Attr      LSize  Pool Origin Data%  Move Log Copy%  Convert
  extra_disk vg_sys -wi-----p 23.38g
  root       vg_sys -wi-----p  1.12g
  usr        vg_sys -wi-----p 20.00g
  var        vg_sys -wi-----p  1.12g

I rebooted back into Fedora 16, and there is no problem when using Fedora 16:

[root@ti124 ~]# uname -r
3.6.11-4.fc16.x86_64
[root@ti124 ~]# rpm -q lvm2
lvm2-2.02.86-6.fc16.x86_64
[root@ti124 ~]# pvs
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md125 vg_sys lvm2 a--  14.50t 3.63t
[root@ti124 ~]# vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_sys   1  10   0 wz--n- 14.50t 3.63t
[root@ti124 ~]# lvs
  LV         VG     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  archive    vg_sys -wi-ao  10.15t
  extra_disk vg_sys -wi-ao  73.38g
  f16_pgsql  vg_sys -wi-ao 411.00g
  f16_root   vg_sys -wi-ao   1.62g
  f16_usr    vg_sys -wi-ao  22.00g
  f16_var    vg_sys -wi-ao   1.62g
  f19_root   vg_sys -wi-a-   1.62g
  f19_usr    vg_sys -wi-a-  22.00g
  f19_var    vg_sys -wi-a-   1.62g
  mirror     vg_sys -wi-ao 200.00g

Version-Release number of selected component (if applicable):
lvm2-2.02.98-12.fc19.x86_64

How reproducible:
I am not certain. I followed the same upgrade procedure on over 10 other machines, and none of them shows this problem. I am not sure what went wrong here.

Steps to Reproduce:
1. While running Fedora 16, add some new logical volumes (see the sketch below).
2. Install Fedora 19 into the new logical volumes.
3. Reboot into Fedora 19 and run pvs, vgs, and lvs.

Actual results:

[root@ti124 ~]# pvs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md127 vg_sys lvm2 a--  14.50t 3.63t
[root@ti124 ~]# vgs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_sys   1  10   0 wz--n- 14.50t 3.63t
[root@ti124 ~]# lvs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  LV         VG     Attr      LSize  Pool Origin Data%  Move Log Copy%  Convert
  extra_disk vg_sys -wi-----p 23.38g
  root       vg_sys -wi-----p  1.12g
  usr        vg_sys -wi-----p 20.00g
  var        vg_sys -wi-----p  1.12g

Expected results:

[root@ti124 ~]# pvs
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md125 vg_sys lvm2 a--  14.50t 3.63t
[root@ti124 ~]# vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_sys   1  10   0 wz--n- 14.50t 3.63t
[root@ti124 ~]# lvs
  LV         VG     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  archive    vg_sys -wi-ao  10.15t
  extra_disk vg_sys -wi-ao  73.38g
  f16_pgsql  vg_sys -wi-ao 411.00g
  f16_root   vg_sys -wi-ao   1.62g
  f16_usr    vg_sys -wi-ao  22.00g
  f16_var    vg_sys -wi-ao   1.62g
  f19_root   vg_sys -wi-a-   1.62g
  f19_usr    vg_sys -wi-a-  22.00g
  f19_var    vg_sys -wi-a-   1.62g
  mirror     vg_sys -wi-ao 200.00g

Additional info:
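For reference, step 1 of the reproduction would look roughly like the following (a hedged sketch only; the LV names and the vg_sys VG are taken from the outputs above, but the exact sizes and commands I used were not recorded, so treat them as illustrative):

  # On the running Fedora 16 system, carve out LVs for the Fedora 19 installation
  lvcreate -L 2G  -n f19_root vg_sys
  lvcreate -L 22G -n f19_usr  vg_sys
  lvcreate -L 2G  -n f19_var  vg_sys
  lvs vg_sys    # confirm the new volumes exist before starting the installer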
F19 uses lvmetad by default (enabled by the global/use_lvmetad=1 option in /etc/lvm/lvm.conf) - please check whether this is also the case on your system. Also, while using lvmetad, LVM switches to event-based activation of the volumes, which means that the VGs/LVs are activated once all the PVs that belong to the VG are present in the system.

A few questions:
- Was the missing PV an MD device? (please include cat /proc/mdstat)
- What is the actual device layout? (please include lsblk output for the *working* scenario, if possible)
- If global/use_lvmetad=1 is used, does calling "pvscan --cache" help?

We already have some bug reports for LVM over MD at the moment for which there should be a new package released this week (for both dracut and lvm2). But first let's see whether this is another instance of the existing problem...
Yes, global/use_lvmetad is set to 1:

bash-4.2$ grep 'use_lvmetad =' /etc/lvm/lvm.conf
    use_lvmetad = 1

Yes, the PV is an MD device:

bash-4.2$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md125 : active raid6 sdg3[9] sdh3[6] sdj3[8] sdi3[7] sdc3[2] sdd3[3] sdb3[1] sda3[0] sdf3[5] sde3[4]
      15570423808 blocks level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md126 : active (auto-read-only) raid6 sdg2[9] sdh2[6] sdj2[8] sdi2[7] sdc2[2] sdb2[1] sdf2[5] sdd2[3] sda2[0] sde2[4]
      49278976 blocks level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md127 : active raid1 sdg1[9] sdh1[6] sdj1[8] sdi1[7] sdc1[2] sde1[4] sda1[0] sdd1[3] sdb1[1] sdf1[5]
      1049536 blocks [10/10] [UUUUUUUUUU]

unused devices: <none>

The lsblk output appears to be correct. As I mentioned, some of the logical volumes that are missing in the "lvs" output were successfully mounted:

[root@ti124 ~]# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                       8:0    0  1.8T  0 disk
├─sda1                    8:1    0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sda2                    8:2    0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sda3                    8:3    0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdb                       8:16   0  1.8T  0 disk
├─sdb1                    8:17   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdb2                    8:18   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdb3                    8:19   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdc                       8:32   0  1.8T  0 disk
├─sdc1                    8:33   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdc2                    8:34   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdc3                    8:35   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdd                       8:48   0  1.8T  0 disk
├─sdd1                    8:49   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdd2                    8:50   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdd3                    8:51   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sde                       8:64   0  1.8T  0 disk
├─sde1                    8:65   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sde2                    8:66   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sde3                    8:67   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdf                       8:80   0  1.8T  0 disk
├─sdf1                    8:81   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdf2                    8:82   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdf3                    8:83   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdg                       8:96   0  1.8T  0 disk
├─sdg1                    8:97   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdg2                    8:98   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdg3                    8:99   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdh                       8:112  0  1.8T  0 disk
├─sdh1                    8:113  0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdh2                    8:114  0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdh3                    8:115  0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdi                       8:128  0  1.8T  0 disk
├─sdi1                    8:129  0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdi2                    8:130  0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdi3                    8:131  0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdj                       8:144  0  1.8T  0 disk
├─sdj1                    8:145  0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdj2                    8:146  0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdj3                    8:147  0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var

I tried running "pvscan --cache":

[root@ti124 ~]# pvscan --cache
  WARNING: Duplicate VG name vg_sys: Existing 8uJjK1-L1fe-DvsY-XsTv-hsq7-SXrl-pqK7Rt (created here) takes precedence over vH9L43-qz3Y-HqXo-nNPv-0IhI-NmMw-1NFtsF

That did not fix the errors.
(In reply to Andrew J. Schorr from comment #2)
> I tried running "pvscan --cache":
> 
> [root@ti124 ~]# pvscan --cache
>   WARNING: Duplicate VG name vg_sys: Existing
>   8uJjK1-L1fe-DvsY-XsTv-hsq7-SXrl-pqK7Rt (created here) takes precedence over
>   vH9L43-qz3Y-HqXo-nNPv-0IhI-NmMw-1NFtsF

I'd say this is the source of the problem here - there seem to be two VGs detected with the same name (though I can't yet tell directly why this shows up on F19 but not on F16 - we'll need to inspect more here).

Please, attach also the debug output for the pvscan --cache command:
  pvscan --cache -vvvv

Also the output of:
  pvs -o+vg_uuid -vvvv

Run these on F16 as well as on the failing F19. We can then compare them and also see exactly where on the disks the metadata is found. Thanks.
> Please, attach also the debug output for the pvscan --cache command:
>   pvscan --cache -vvvv
> 
> Also the output of:
>   pvs -o+vg_uuid -vvvv
> 
> Run these on F16 as well as on the failing F19. We can then compare them and
> also see exactly where on the disks the metadata is found. Thanks.

(Well, "pvscan --cache" is not available on F16, as there's no lvmetad there, so please run both pvscan and pvs on F19, and only the pvs command on F16.)
Created attachment 812563 [details] F16 output of "pvs -o+vg_uuid -vvvv"
Created attachment 812564 [details] F19 output of "pvs -o+vg_uuid -vvvv"
Created attachment 812565 [details] F19 output of "pvscan --cache -vvvv"
I just noticed that this "No device found for PV" error is also occurring on 2 other systems running F19. So it's happening on a total of 3 out of 19 systems running F19 at my site. On the other 2 systems, it doesn't seem to be impacting the output of lvs, so it is not as severe there. Is there any way to fix this corruption? I have no idea where these hidden PVs came from...
It seems there is some stale metadata that the newer version of LVM sees and the old LVM does not. I need to have a closer look (I couldn't today as I was busy with something else), but I'll dig deeper tomorrow.

What I can see from the logs is that the other metadata the new LVM sees is:

#format_text/format-text.c:1190   /dev/sdb3: Found metadata at 329728 size 3426 (in area at 4096 size 4190208) for vg_sys (8uJjK1-L1fe-DvsY-XsTv-hsq7-SXrl-pqK7Rt)
#format_text/format-text.c:1190   /dev/md127: Found metadata at 329728 size 3426 (in area at 4096 size 4190208) for vg_sys (8uJjK1-L1fe-DvsY-XsTv-hsq7-SXrl-pqK7Rt)
#format_text/format-text.c:1190   /dev/sdj3: Found metadata at 9728 size 1663 (in area at 4096 size 4190208) for vg_sys (vH9L43-qz3Y-HqXo-nNPv-0IhI-NmMw-1NFtsF)

The /dev/md127 is the proper one; /dev/sdj3 and /dev/sdb3 are incorrect, as they're MD components and should have been filtered out.

You may try using "global/use_lvmetad=0" to see if that helps in any way (since filtering is processed a bit differently when lvmetad is used). Anyway, this is a bug and needs to be resolved... I'll have a closer look tomorrow and let you know about my findings (I'll try to reproduce).
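In case it helps, a rough sketch of how to try that workaround (hedged - the exact unit names may differ on your installation; this is just what I would try):

  # switch lvm.conf to direct scanning instead of lvmetad
  sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf
  # stop the daemon and its socket so nothing answers on the lvmetad socket
  systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
  # re-scan; pvs/vgs/lvs should now read metadata directly from the devices
  pvscan
  pvs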
FYI, if I follow your suggestion and set "global/use_lvmetad=0", the problem disappears. I hope that helps with troubleshooting. Does that mean it's a bug in lvmetad?
I installed lvm2-2.02.98-13.fc19.x86_64.rpm, but that did not help. Have you had any luck chasing down this problem? I upgraded a total of 32 hosts to Fedora 19, and 3 have this problem. Thanks, Andy
I'm not sure this matters, but after upgrading to lvm2-2.02.98-13.fc19.x86_64, I see a warning message like this on every system when I run pvs for the first time:

  Found duplicate PV 3kiIlUfm9gkIdRYSW8SC2hTJlyZccMCG: using /dev/sdb3 not /dev/sda3

If I run it again, the warning message is not repeated. It appears to be some kind of initialization issue with lvmetad.

Regards,
Andy
I had a similar problem on my machine running F20. The problem in my case appeared when I adjusted the filters. As I'm using this machine to run VMs, LVM was scanning the LVs inside the VMs (/dev/mapper/vg-lv*). Once I adjusted the filters to exclude these, I started to see messages like:

No device found for PV suykEq-TTrk-eH0c-VnEL-BjX3-tWS7-TpQndc.
No device found for PV w255qb-cxYd-QY5l-jdSy-Cih7-6uSy-9oO3Tx.
No device found for PV AfNjWm-X1hk-ZM4R-aag6-WLnz-ja9V-GYfDXH.
No device found for PV EssdTJ-nzJD-dHuF-WHLB-FgBz-76Qf-5JNwqt.
[...]

The solution was to adjust the global_filter to match the one in 'filter'. Once this was set, I no longer got this issue. I hope this helps.
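To make that concrete, a hedged sketch of what the relevant part of my /etc/lvm/lvm.conf ended up looking like (the reject pattern is illustrative only - I have not copied my real filter here):

  devices {
      # reject the guest LVs exposed under /dev/mapper, accept everything else
      filter        = [ "r|^/dev/mapper/vg-lv.*|", "a|.*|" ]
      # keep global_filter in sync, since the scan done for lvmetad does not use 'filter'
      global_filter = [ "r|^/dev/mapper/vg-lv.*|", "a|.*|" ]
  }

After changing the filters, running "pvscan --cache" makes lvmetad re-read the devices.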
Andrew, sorry for the long delay - somehow I didn't get to this... However, I've recently found a problem when using MD metadata version 1.0 in conjunction with lvmetad. What is (or was) the MD metadata version used in your case? (mdadm --detail <md_dev>) Is it 1.0? If it's 1.0, I think it's the same situation as described in bug #1139216.
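For example, something like this should show the metadata version of each array (a quick sketch - the md device names are the ones from your earlier /proc/mdstat and may differ now):

  for md in /dev/md125 /dev/md126 /dev/md127; do
      echo "== $md"
      mdadm --detail "$md" | grep -i 'version'
  done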
Hi Peter,

So far, I have seen this problem on 2 of my systems. Here are the metadata versions on those 2 systems:

  Version : 0.90
  Version : 1.2

Regards,
Andrew
It's actually also a problem on another system, but that one has 2 PVs, and I don't recall which one was causing trouble. One is version 0.90, and the other is version 1.2.
Please, try these packages if possible: https://prajnoha.fedorapeople.org/rpms/bz1018852/x86_64/ It's the latest upstream git head (test build), including fixes for 1139216 which could help here too I think. Please, give it a try and let me know if it helps.
I set "use_lvmetad = 1" on one of the problem hosts. I then ran pvs to confirm that the problem still exists with the current software: [root@ti5 ~]# pvs Found duplicate PV dCufSyINaVhF3FscyKCZZp1yCbgyqO03: using /dev/sdb3 not /dev/sda3 Incorrect metadata area header checksum on /dev/sdc1 at offset 4096 Incorrect metadata area header checksum on /dev/sdc1 at offset 4096 Found duplicate PV ATG7FD8U723qNiYpYrxzb8fzLT2vpUcm: using /dev/sdd1 not /dev/sdc1 Incorrect metadata area header checksum on /dev/sdd1 at offset 4096 Incorrect metadata area header checksum on /dev/sdd1 at offset 4096 No device found for PV ATG7FD-8U72-3qNi-YpYr-xzb8-fzLT-2vpUcm. PV VG Fmt Attr PSize PFree /dev/md127 vg_os lvm2 a-- 144.50g 85.88g /dev/md128 vg_ext lvm2 a-- 1.36t 927.25g Note that /dev/md127 has metadata version 0.90 and contains sda3 and sdb3. But md128 has version 1.2 and devices sdc1 and sdd1. I then reran pvs: [root@ti5 ~]# pvs No device found for PV ATG7FD-8U72-3qNi-YpYr-xzb8-fzLT-2vpUcm. PV VG Fmt Attr PSize PFree /dev/md127 vg_os lvm2 a-- 144.50g 85.88g /dev/md128 vg_ext lvm2 a-- 1.36t 927.25g This is with lvm2-2.02.98-13.fc19.x86_64. I then upgraded to lvm2-2.02.98-15.fc19.x86_64 to see if that helped, since that has already been pushed for F19. It gives the same error: [root@ti5 ~]# pvs Found duplicate PV dCufSyINaVhF3FscyKCZZp1yCbgyqO03: using /dev/sdb3 not /dev/sda3 Incorrect metadata area header checksum on /dev/sdc1 at offset 4096 Incorrect metadata area header checksum on /dev/sdc1 at offset 4096 Found duplicate PV ATG7FD8U723qNiYpYrxzb8fzLT2vpUcm: using /dev/sdd1 not /dev/sdc1 Incorrect metadata area header checksum on /dev/sdd1 at offset 4096 Incorrect metadata area header checksum on /dev/sdd1 at offset 4096 No device found for PV ATG7FD-8U72-3qNi-YpYr-xzb8-fzLT-2vpUcm. PV VG Fmt Attr PSize PFree /dev/md127 vg_os lvm2 a-- 144.50g 85.88g /dev/md128 vg_ext lvm2 a-- 1.36t 927.25g I then installed these rpms from the link you supplied: device-mapper-1.02.90-0.1.fc19.x86_64.rpm device-mapper-devel-1.02.90-0.1.fc19.x86_64.rpm device-mapper-event-1.02.90-0.1.fc19.x86_64.rpm device-mapper-event-libs-1.02.90-0.1.fc19.x86_64.rpm device-mapper-libs-1.02.90-0.1.fc19.x86_64.rpm lvm2-2.02.111-0.1.fc19.x86_64.rpm lvm2-libs-2.02.111-0.1.fc19.x86_64.rpm And now the problem is gone: [root@ti5 ~]# pvs PV VG Fmt Attr PSize PFree /dev/md127 vg_os lvm2 a-- 144.50g 85.88g /dev/md128 vg_ext lvm2 a-- 1.36t 927.25g I then decided to reboot it to make sure it works OK across a reboot. Unfortunately, it did not. I haven't had a chance to troubleshoot the system yet, because I am out of the office, but the boot error messages include these: md128: unknown partition table ... Timed out waiting for device dev-disk-by\x2duuid-85d4ae4d\x2d8ae3\x2d4165\x2db... I will update when I am able to recover the system. -Andy
(...also try regenerating the initramfs image by calling "dracut")
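In case it's useful, this is roughly what I mean (a sketch - dracut picks up the running kernel by default; adjust the kernel version if you need a different image):

  # rebuild the initramfs for the currently running kernel
  dracut -f
  # or explicitly:
  dracut -f /boot/initramfs-$(uname -r).img $(uname -r)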
That was my first idea as well -- to run dracut in a rescue shell and hope that it would fix the problem. Unfortunately, it does not. For some reason, it is unable to find the /var filesystem (which is on an LV), and it cannot find another local filesystem either. In the emergency shell, I edited /etc/lvm/lvm.conf and changed "use_lvmetad" from 1 to 0. I rebooted, and everything worked fine.

FYI, I see these errors in the journal:

Sep 18 10:23:11 ti5 systemd[1]: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 18 10:23:11 ti5 systemd[1]: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'

I don't know if this matters, but I doubt it, since systemctl status for lvm2-lvmetad.socket looks good. I'm not sure how to troubleshoot further, but there's definitely a problem here. When I ran "lvs" in the rescue shell, I think it saw all the LVs, so I'm not sure why they weren't found. The error message was "Timed out waiting for device dev-disk-by\x2duuid-<hex junk>". That appeared twice -- once for /var, and once for my other local scratch filesystem. There did not seem to be any problem with the / and /usr LVs.

Regards,
Andy
(In reply to Andrew J. Schorr from comment #20)
> That was my first idea as well -- to run dracut in a rescue shell and hope
> that it would fix the problem.

Hmm, that's odd. Could you grab some more info for me (with this new version installed)?
- Put back "use_lvmetad=1" and then reboot.
- If the problem/timeout appears and you're dropped to the emergency shell, try running "lvmdump -l -s -u" and attach the lvmdump-....tgz here (or mail it to me directly).

Thanks.
Created attachment 939389 [details] tarball from "lvmdump -l -s -u" in the emergency shell It may be odd, but it is very repeatable. It has happened at least 5 or 6 times. It never boots successfully with "use_lvmetad = 1". I hope the tarball helps.
Ah, sorry - the systemctl in F19 is a bit older, so I had to modify things a bit so that it logs properly. Could you please rerun the lvmdump (comment #21) with these new packages? (Sorry for the wrong packages before.) https://prajnoha.fedorapeople.org/rpms/bz1018852/x86_64/
Created attachment 940883 [details] new tarball from "lvmdump -l -s -u" in the emergency shell

I installed these rpms:

  device-mapper-1.02.90-0.3.fc19.x86_64.rpm
  device-mapper-devel-1.02.90-0.3.fc19.x86_64.rpm
  device-mapper-event-1.02.90-0.3.fc19.x86_64.rpm
  device-mapper-event-libs-1.02.90-0.3.fc19.x86_64.rpm
  device-mapper-libs-1.02.90-0.3.fc19.x86_64.rpm
  lvm2-2.02.111-0.3.fc19.x86_64.rpm
  lvm2-libs-2.02.111-0.3.fc19.x86_64.rpm

I enabled use_lvmetad and then rebooted. It hung as usual. This is the tarball from "lvmdump -l -s -u". I hope this helps.
So, based on the logs - udev database content:

MD array "md128":
=================
P: /devices/virtual/block/md128
N: md128
...
E: ID_FS_TYPE=LVM2_member
...
E: LVM_MD_PV_ACTIVATED=1
E: SYSTEMD_WANTS=lvm2-pvscan@9:128.service

MD array "md126":
=================
P: /devices/virtual/block/md126
N: md126
...
ID_FS_TYPE=LVM2_member
SYSTEMD_READY=0

From the above it's clear that both md128 and md126 are properly identified as PVs. However, while there's a pvscan service instantiated for md128 (which is also visible in the systemd logs), the other one - md126 - is marked as SYSTEMD_READY=0 and there's no "LVM_MD_PV_ACTIVATED=1", which results in no pvscan service being instantiated. That's the reason why the LVs on the md126 PV are not activated.

(With lvmetad, this activation is automatic, based on incoming device events, while without lvmetad there's a rough "vgchange -ay" call during the boot process that activates whatever is visible at the time of that call - that's why it doesn't work with lvmetad and works without it.)

So we need to find out *why* md126 is not marked as an activated MD array. The /lib/udev/rules.d/69-dm-lvm-metad.rules is responsible for this:

# MD device:
LABEL="next"
KERNEL!="md[0-9]*", GOTO="next"
IMPORT{db}="LVM_MD_PV_ACTIVATED"
ACTION=="add", ENV{LVM_MD_PV_ACTIVATED}=="1", GOTO="lvm_scan"
ACTION=="change", ENV{LVM_MD_PV_ACTIVATED}!="1", TEST=="md/array_state", ENV{LVM_MD_PV_ACTIVATED}="1", GOTO="lvm_scan"
ACTION=="add", KERNEL=="md[0-9]*p[0-9]*", GOTO="lvm_scan"
ENV{LVM_MD_PV_ACTIVATED}!="1", ENV{SYSTEMD_READY}="0"
GOTO="lvm_end"

The line that is important for us is exactly this one:

ACTION=="change", ENV{LVM_MD_PV_ACTIVATED}!="1", TEST=="md/array_state", ENV{LVM_MD_PV_ACTIVATED}="1", GOTO="lvm_scan"

...which checks for the existence of /sys/block/<md_name>/md/array_state and, if it exists, marks the MD array as activated. Clearly, that is not happening here for some reason. Hence, this rule is applied instead:

ENV{LVM_MD_PV_ACTIVATED}!="1", ENV{SYSTEMD_READY}="0"

...and then nothing gets activated, of course.

It seems I'll need more debug info from the udev daemon to see why this rule failed, but I'll try a few things first. I'll then ask you for more info later on...
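In the meantime, a quick way to check this from the emergency shell - a hedged sketch; the md126 name is taken from the udev dump above and may differ on a given boot:

  # does the sysfs attribute the rule tests for actually exist?
  cat /sys/block/md126/md/array_state
  # what does the udev database say about the device?
  udevadm info --query=all --name=/dev/md126 | grep -E 'LVM_MD_PV_ACTIVATED|SYSTEMD_READY|SYSTEMD_WANTS'
  # if array_state is there but the flag is missing, a synthesized change event
  # should let the rule above mark the PV as activated:
  udevadm trigger --action=change --sysname-match=md126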
Could you please try this?

(with use_lvmetad=0)
- running "lvmdump -l -s -u" when the machine is booted completely

(with use_lvmetad=1)
- adding "--debug" to the ExecStart line of /lib/systemd/system/systemd-udevd.service:
    ExecStart=/usr/lib/systemd/systemd-udevd --debug
- adding the "debug" option to the kernel command line

Then, when the failure happens, try to get the logs and, please, attach them here:

  systemctl status --full -n 1000000 systemd-udevd.service
  journalctl -b --full -n 1000000

===

Also, it would be great if you could try this in addition (a sketch of the sequence is appended below):

(with use_lvmetad=0 at first)
- boot the system
- umount the mountpoint (if possible) - the one which is on the MD that caused problems with use_lvmetad=0
- deactivate any LVs on top of the MD (vgchange -an ...)
- stop the MD array (mdadm -S)
- change lvm.conf to use_lvmetad=1
- run pvscan --cache
- run mdadm -I <dev> for each dev that makes up the raid array (the one that causes problems)
- check whether the LVs get activated automatically after the last "mdadm -I <dev>"

(Sorry - it's a bit fiddly, I know, but unfortunately I can't reproduce this in any way - so either this is a very tight race and you're "happy" enough to have it on your system, or it's some other problem. I hope to see more from the additional debug logs.)

Thanks!
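For what it's worth, the sequence might look roughly like this on your system (a hedged sketch only - I'm assuming the problematic array is /dev/md128 (vg_ext) built from /dev/sdc1 and /dev/sdd1, as in your earlier output; the mount point is hypothetical and the device names may differ from boot to boot):

  umount /scratch                      # whatever mountpoint sits on the affected VG
  vgchange -an vg_ext                  # deactivate the LVs on top of the MD
  mdadm -S /dev/md128                  # stop the array
  sed -i 's/use_lvmetad = 0/use_lvmetad = 1/' /etc/lvm/lvm.conf
  systemctl start lvm2-lvmetad.socket
  pvscan --cache                       # repopulate lvmetad
  mdadm -I /dev/sdc1                   # incrementally re-assemble the array,
  mdadm -I /dev/sdd1                   # one member at a time
  lvs vg_ext                           # did the LVs come back activated on their own?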
(In reply to Peter Rajnoha from comment #26)
> Also, it would be great if you could try this in addition:
> 
> (with use_lvmetad=0 at first)
> - boot the system
> - umount the mountpoint (if possible) - the one which is on the MD that
>   caused problems with use_lvmetad=0

I meant "...on the MD that caused problems with use_lvmetad=1".
Created attachment 945130 [details] "lvmdump -l -s -u" from running system with use_lvmetad=0

Hi Peter,

Here is the first item you requested (with use_lvmetad=0): running "lvmdump -l -s -u" when the machine is booted completely. I'll see if I can get to the further items. This is unfortunately a desktop system, not a rack-mount server, so it's rather painful to conduct these experiments.

Regards,
Andy
Created attachment 945148 [details] systemctl status --full -n 1000000 systemd-udevd.service I set use_lvmetad=1 and rebooted. This is the output in the emergency shell of "systemctl status --full -n 1000000 systemd-udevd.service"
Created attachment 945149 [details] journalctl -b --full -n 1000000 I rebooted with use_lvmetad=1. Here is the output in the emergency shell from "journalctl -b --full -n 1000000".
Hi Peter, I'm not sure I can follow the rest of your instructions. You asked me to umount the problematic mount point. But one of those is /var. So how will that work? I could maybe do it with the other mount point, but I'm not sure whether that will be of value. Please advise. Regards, Andy
Hi,

I have a similar problem on Fedora 20. My configuration: two identical hard disks, partitioned, with two MD volumes on those disks and two volume groups named raid0 and raid1. Those are OK. There is also one SSD disk, with a volume group named ssd inside a partition (sdb2). This volume group contains an LV named root (the root of the system, of course), which contains /boot.

Problem: the ssd volume is detected by grub and mounted during the boot process. However, the physical volume sdb2 and the volume group ssd aren't accessible, while the logical volume ssd-root remains mounted under /dev/mapper.

sudo pvck /dev/sdb2
  -> Device /dev/sdb2 not found (or ignored by filtering).

If I use rescuecd, the ssd volume is accessible.

Thank you for your help,
Regards,
Shame on me, the problem was my fault. I was modifying lvm.conf to get rid of a duplicate PV, and I filtered sd[abc] instead of sd[ac]. Could you send me a brown bag?

I still have a warning:

  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!

that occurs when the system shuts down. I think that's linked to the fact that a disk remains busy:

Oct 11 02:05:42 pierre umount: umount: /usr/local: target is busy
Oct 11 02:05:42 pierre umount: (In some cases useful info about processes that use
Oct 11 02:05:42 pierre umount: the device is found by lsof(8) or fuser(1).)
Oct 11 02:05:42 pierre lvm: WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!

Regards,
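For anyone who hits the same thing, the mistake was along these lines (a hypothetical reconstruction - I no longer have the exact lines, so the patterns are illustrative):

  # what I had written in /etc/lvm/lvm.conf (rejects sda, sdb AND sdc, hiding the ssd VG on sdb2):
  #   filter = [ "r|^/dev/sd[abc]|", "a|.*|" ]
  # what I actually meant (reject only sda and sdc):
  #   filter = [ "r|^/dev/sd[ac]|", "a|.*|" ]
  # after fixing the filter, refresh lvmetad's view of the devices:
  pvscan --cache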
This message is a notice that Fedora 19 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 19. It is Fedora's policy to close all bug reports from releases that are no longer maintained. Approximately 4 (four) weeks from now this bug will be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 19 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.