Description of problem:
I upgraded my system from Fedora 16 to Fedora 19. While running Fedora 16, I created new logical volumes to contain the Fedora 19 /, /usr and /var. I installed Fedora 19 into those new logical volumes and then rebooted into Fedora 19. Under Fedora 19, I get errors from all the lvm2 commands, and they appear not to see some of the logical volumes. However, the system is able to mount those volumes. Here are the errors:

[root@ti124 ~]# pvs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md127 vg_sys lvm2 a--  14.50t 3.63t
[root@ti124 ~]# vgs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_sys   1  10   0 wz--n- 14.50t 3.63t
[root@ti124 ~]# lvs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  LV         VG     Attr      LSize  Pool Origin Data%  Move Log Copy%  Convert
  extra_disk vg_sys -wi-----p 23.38g
  root       vg_sys -wi-----p  1.12g
  usr        vg_sys -wi-----p 20.00g
  var        vg_sys -wi-----p  1.12g

I rebooted back into Fedora 16, and there is no problem when using Fedora 16:

[root@ti124 ~]# uname -r
3.6.11-4.fc16.x86_64
[root@ti124 ~]# rpm -q lvm2
lvm2-2.02.86-6.fc16.x86_64
[root@ti124 ~]# pvs
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md125 vg_sys lvm2 a--  14.50t 3.63t
[root@ti124 ~]# vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_sys   1  10   0 wz--n- 14.50t 3.63t
[root@ti124 ~]# lvs
  LV         VG     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  archive    vg_sys -wi-ao  10.15t
  extra_disk vg_sys -wi-ao  73.38g
  f16_pgsql  vg_sys -wi-ao 411.00g
  f16_root   vg_sys -wi-ao   1.62g
  f16_usr    vg_sys -wi-ao  22.00g
  f16_var    vg_sys -wi-ao   1.62g
  f19_root   vg_sys -wi-a-   1.62g
  f19_usr    vg_sys -wi-a-  22.00g
  f19_var    vg_sys -wi-a-   1.62g
  mirror     vg_sys -wi-ao 200.00g

Version-Release number of selected component (if applicable):
lvm2-2.02.98-12.fc19.x86_64

How reproducible:
I am not certain. I followed the same upgrade procedure on over 10 other machines, and none of them shows this problem. I am not sure what went wrong here.

Steps to Reproduce:
1. While running Fedora 16, add some new logical volumes (see the sketch below).
2. Install Fedora 19 into the new logical volumes.
3. Reboot into Fedora 19 and run pvs, vgs, and lvs.

Actual results:

[root@ti124 ~]# pvs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md127 vg_sys lvm2 a--  14.50t 3.63t
[root@ti124 ~]# vgs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_sys   1  10   0 wz--n- 14.50t 3.63t
[root@ti124 ~]# lvs
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  No device found for PV 3oDlQ9-VdGf-ZDsg-ADZs-pz0K-rNWx-vnaYmw.
  LV         VG     Attr      LSize  Pool Origin Data%  Move Log Copy%  Convert
  extra_disk vg_sys -wi-----p 23.38g
  root       vg_sys -wi-----p  1.12g
  usr        vg_sys -wi-----p 20.00g
  var        vg_sys -wi-----p  1.12g

Expected results:

[root@ti124 ~]# pvs
  PV         VG     Fmt  Attr PSize  PFree
  /dev/md125 vg_sys lvm2 a--  14.50t 3.63t
[root@ti124 ~]# vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_sys   1  10   0 wz--n- 14.50t 3.63t
[root@ti124 ~]# lvs
  LV         VG     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  archive    vg_sys -wi-ao  10.15t
  extra_disk vg_sys -wi-ao  73.38g
  f16_pgsql  vg_sys -wi-ao 411.00g
  f16_root   vg_sys -wi-ao   1.62g
  f16_usr    vg_sys -wi-ao  22.00g
  f16_var    vg_sys -wi-ao   1.62g
  f19_root   vg_sys -wi-a-   1.62g
  f19_usr    vg_sys -wi-a-  22.00g
  f19_var    vg_sys -wi-a-   1.62g
  mirror     vg_sys -wi-ao 200.00g

Additional info:
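For reference, step 1 of the reproduction would look roughly like the following (a hedged sketch only; the LV names and the vg_sys VG are taken from the outputs above, but the exact sizes and commands I used were not recorded, so treat them as illustrative):

  # On the running Fedora 16 system, carve out LVs for the Fedora 19 installation
  lvcreate -L 2G  -n f19_root vg_sys
  lvcreate -L 22G -n f19_usr  vg_sys
  lvcreate -L 2G  -n f19_var  vg_sys
  lvs vg_sys    # confirm the new volumes exist before starting the installer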
F19 uses lvmetad by default (enabled by the global/use_lvmetad=1 option in /etc/lvm/lvm.conf) - please check whether this is also the case on your system. Also, while using lvmetad, LVM switches to event-based activation of the volumes, which means that the VGs/LVs are activated once all the PVs that belong to the VG are present in the system.

A few questions:
- Was the missing PV an MD device? (please include cat /proc/mdstat)
- What is the actual device layout? (please include lsblk output for the *working* scenario, if possible)
- If global/use_lvmetad=1 is used, does calling "pvscan --cache" help?

We already have some bug reports for LVM over MD at the moment for which there should be a new package released this week (for both dracut and lvm2). But first let's see whether this is another instance of the existing problem...
Yes, global/use_lvmetad is set to 1:

bash-4.2$ grep 'use_lvmetad =' /etc/lvm/lvm.conf
    use_lvmetad = 1

Yes, the PV is an MD device:

bash-4.2$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md125 : active raid6 sdg3[9] sdh3[6] sdj3[8] sdi3[7] sdc3[2] sdd3[3] sdb3[1] sda3[0] sdf3[5] sde3[4]
      15570423808 blocks level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md126 : active (auto-read-only) raid6 sdg2[9] sdh2[6] sdj2[8] sdi2[7] sdc2[2] sdb2[1] sdf2[5] sdd2[3] sda2[0] sde2[4]
      49278976 blocks level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md127 : active raid1 sdg1[9] sdh1[6] sdj1[8] sdi1[7] sdc1[2] sde1[4] sda1[0] sdd1[3] sdb1[1] sdf1[5]
      1049536 blocks [10/10] [UUUUUUUUUU]

unused devices: <none>

The lsblk output appears to be correct. As I mentioned, some of the logical volumes that are missing in the "lvs" output were successfully mounted:

[root@ti124 ~]# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                       8:0    0  1.8T  0 disk
├─sda1                    8:1    0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sda2                    8:2    0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sda3                    8:3    0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdb                       8:16   0  1.8T  0 disk
├─sdb1                    8:17   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdb2                    8:18   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdb3                    8:19   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdc                       8:32   0  1.8T  0 disk
├─sdc1                    8:33   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdc2                    8:34   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdc3                    8:35   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdd                       8:48   0  1.8T  0 disk
├─sdd1                    8:49   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdd2                    8:50   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdd3                    8:51   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sde                       8:64   0  1.8T  0 disk
├─sde1                    8:65   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sde2                    8:66   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sde3                    8:67   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdf                       8:80   0  1.8T  0 disk
├─sdf1                    8:81   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdf2                    8:82   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdf3                    8:83   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdg                       8:96   0  1.8T  0 disk
├─sdg1                    8:97   0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdg2                    8:98   0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdg3                    8:99   0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdh                       8:112  0  1.8T  0 disk
├─sdh1                    8:113  0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdh2                    8:114  0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdh3                    8:115  0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdi                       8:128  0  1.8T  0 disk
├─sdi1                    8:129  0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdi2                    8:130  0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdi3                    8:131  0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var
sdj                       8:144  0  1.8T  0 disk
├─sdj1                    8:145  0    1G  0 part
│ └─md127                 9:127  0    1G  0 raid1 /boot
├─sdj2                    8:146  0  5.9G  0 part
│ └─md126                 9:126  0   47G  0 raid6 [SWAP]
└─sdj3                    8:147  0  1.8T  0 part
  └─md125                 9:125  0 14.5T  0 raid6
    ├─vg_sys-f19_root     253:0  0  1.6G  0 lvm   /
    ├─vg_sys-f19_usr      253:1  0   22G  0 lvm   /usr
    ├─vg_sys-extra_disk   253:2  0 73.4G  0 lvm   /extra_disk
    ├─vg_sys-archive      253:3  0 10.2T  0 lvm   /extra_disk/archive
    ├─vg_sys-mirror       253:4  0  200G  0 lvm   /nfs/.mirror
    ├─vg_sys-f16_root     253:5  0  1.6G  0 lvm
    ├─vg_sys-f16_usr      253:6  0   22G  0 lvm
    ├─vg_sys-f16_var      253:7  0  1.6G  0 lvm
    ├─vg_sys-f16_pgsql    253:8  0  411G  0 lvm   /var/lib/pgsql
    └─vg_sys-f19_var      253:9  0  1.6G  0 lvm   /var

I tried running "pvscan --cache":

[root@ti124 ~]# pvscan --cache
  WARNING: Duplicate VG name vg_sys: Existing 8uJjK1-L1fe-DvsY-XsTv-hsq7-SXrl-pqK7Rt (created here) takes precedence over vH9L43-qz3Y-HqXo-nNPv-0IhI-NmMw-1NFtsF

That did not fix the errors.
(In reply to Andrew J. Schorr from comment #2)
> I tried running "pvscan --cache":
> 
> [root@ti124 ~]# pvscan --cache
>   WARNING: Duplicate VG name vg_sys: Existing
>   8uJjK1-L1fe-DvsY-XsTv-hsq7-SXrl-pqK7Rt (created here) takes precedence over
>   vH9L43-qz3Y-HqXo-nNPv-0IhI-NmMw-1NFtsF

I'd say this is the source of the problem here - there seem to be two VGs detected with the same name (though I can't yet tell directly why this shows up on F19 but not on F16 - we'll need to inspect more here).

Please, attach also the debug output for the pvscan --cache command:
  pvscan --cache -vvvv

Also the output of:
  pvs -o+vg_uuid -vvvv

Run these on F16 as well as on the failing F19. We can then compare them and also see exactly where on the disks the metadata is found. Thanks.
> Please, attach also the debug output for the pvscan --cache command:
>   pvscan --cache -vvvv
> 
> Also the output of:
>   pvs -o+vg_uuid -vvvv
> 
> Run these on F16 as well as on the failing F19. We can then compare them and
> also see exactly where on the disks the metadata is found. Thanks.

(Well, "pvscan --cache" is not available on F16, as there's no lvmetad there, so please run both pvscan and pvs on F19, and only the pvs command on F16.)
Created attachment 812563 [details] F16 output of "pvs -o+vg_uuid -vvvv"
Created attachment 812564 [details] F19 output of "pvs -o+vg_uuid -vvvv"
Created attachment 812565 [details] F19 output of "pvscan --cache -vvvv"
I just noticed that this "No device found for PV" error is also occurring on 2 other systems running F19. So it's happening on a total of 3 out of 19 systems running F19 at my site. On the other 2 systems, it doesn't seem to be impacting the output of lvs, so it is not as severe there. Is there any way to fix this corruption? I have no idea where these hidden PVs came from...
It seems there is some stale metadata that the newer version of LVM sees and the old LVM does not. I need to have a closer look (I couldn't today as I was busy with something else), but I'll dig deeper tomorrow.

What I can see from the logs is that the other metadata the new LVM sees is:

#format_text/format-text.c:1190   /dev/sdb3: Found metadata at 329728 size 3426 (in area at 4096 size 4190208) for vg_sys (8uJjK1-L1fe-DvsY-XsTv-hsq7-SXrl-pqK7Rt)
#format_text/format-text.c:1190   /dev/md127: Found metadata at 329728 size 3426 (in area at 4096 size 4190208) for vg_sys (8uJjK1-L1fe-DvsY-XsTv-hsq7-SXrl-pqK7Rt)
#format_text/format-text.c:1190   /dev/sdj3: Found metadata at 9728 size 1663 (in area at 4096 size 4190208) for vg_sys (vH9L43-qz3Y-HqXo-nNPv-0IhI-NmMw-1NFtsF)

The /dev/md127 is the proper one; /dev/sdj3 and /dev/sdb3 are incorrect, as they're MD components and should have been filtered out.

You may try using "global/use_lvmetad=0" to see if that helps in any way (since filtering is processed a bit differently when lvmetad is used). Anyway, this is a bug and needs to be resolved... I'll have a closer look tomorrow and let you know about my findings (I'll try to reproduce).
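In case it helps, a rough sketch of how to try that workaround (hedged - the exact unit names may differ on your installation; this is just what I would try):

  # switch lvm.conf to direct scanning instead of lvmetad
  sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf
  # stop the daemon and its socket so nothing answers on the lvmetad socket
  systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
  # re-scan; pvs/vgs/lvs should now read metadata directly from the devices
  pvscan
  pvs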
FYI, if I follow your suggestion and set "global/use_lvmetad=0", the problem disappears. I hope that helps with troubleshooting. Does that mean it's a bug in lvmetad?
I installed lvm2-2.02.98-13.fc19.x86_64.rpm, but that did not help. Have you had any luck chasing down this problem? I upgraded a total of 32 hosts to Fedora 19, and 3 have this problem. Thanks, Andy
I'm not sure this matters, but after upgrading to lvm2-2.02.98-13.fc19.x86_64, I see a warning message like this on every system when I run pvs for the first time:

  Found duplicate PV 3kiIlUfm9gkIdRYSW8SC2hTJlyZccMCG: using /dev/sdb3 not /dev/sda3

If I run it again, the warning message is not repeated. It appears to be some kind of initialization issue with lvmetad.

Regards,
Andy
I had a similar problem on my machine running F20. The problem in my case appeared when I adjusted the filters. As I'm using this machine to run VMs, LVM was scanning the LVs inside the VMs (/dev/mapper/vg-lv*). Once I adjusted the filters to exclude these, I started to see messages like:

No device found for PV suykEq-TTrk-eH0c-VnEL-BjX3-tWS7-TpQndc.
No device found for PV w255qb-cxYd-QY5l-jdSy-Cih7-6uSy-9oO3Tx.
No device found for PV AfNjWm-X1hk-ZM4R-aag6-WLnz-ja9V-GYfDXH.
No device found for PV EssdTJ-nzJD-dHuF-WHLB-FgBz-76Qf-5JNwqt.
[...]

The solution was to adjust the global_filter to match the one in 'filter'. Once this was set, I no longer got this issue. I hope this helps.
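To make that concrete, a hedged sketch of what the relevant part of my /etc/lvm/lvm.conf ended up looking like (the reject pattern is illustrative only - I have not copied my real filter here):

  devices {
      # reject the guest LVs exposed under /dev/mapper, accept everything else
      filter        = [ "r|^/dev/mapper/vg-lv.*|", "a|.*|" ]
      # keep global_filter in sync, since the scan done for lvmetad does not use 'filter'
      global_filter = [ "r|^/dev/mapper/vg-lv.*|", "a|.*|" ]
  }

After changing the filters, running "pvscan --cache" makes lvmetad re-read the devices.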
Andrew, sorry for the long delay - somehow I didn't get to this... However, I've recently found a problem when using MD metadata version 1.0 in conjunction with lvmetad. What is (or was) the MD metadata version used in your case? (mdadm --detail <md_dev>) Is it 1.0? If it's 1.0, I think it's the same situation as described in bug #1139216.
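For example, something like this should show the metadata version of each array (a quick sketch - the md device names are the ones from your earlier /proc/mdstat and may differ now):

  for md in /dev/md125 /dev/md126 /dev/md127; do
      echo "== $md"
      mdadm --detail "$md" | grep -i 'version'
  done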
Hi Peter,

So far, I have seen this problem on 2 of my systems. Here are the metadata versions on those 2 systems:

  Version : 0.90
  Version : 1.2

Regards,
Andrew
It's actually also a problem on another system, but that one has 2 PVs, and I don't recall which one was causing trouble. One is version 0.90, and the other is version 1.2.
Please, try these packages if possible: https://prajnoha.fedorapeople.org/rpms/bz1018852/x86_64/ It's the latest upstream git head (test build), including fixes for 1139216 which could help here too I think. Please, give it a try and let me know if it helps.
I set "use_lvmetad = 1" on one of the problem hosts. I then ran pvs to confirm that the problem still exists with the current software: [root@ti5 ~]# pvs Found duplicate PV dCufSyINaVhF3FscyKCZZp1yCbgyqO03: using /dev/sdb3 not /dev/sda3 Incorrect metadata area header checksum on /dev/sdc1 at offset 4096 Incorrect metadata area header checksum on /dev/sdc1 at offset 4096 Found duplicate PV ATG7FD8U723qNiYpYrxzb8fzLT2vpUcm: using /dev/sdd1 not /dev/sdc1 Incorrect metadata area header checksum on /dev/sdd1 at offset 4096 Incorrect metadata area header checksum on /dev/sdd1 at offset 4096 No device found for PV ATG7FD-8U72-3qNi-YpYr-xzb8-fzLT-2vpUcm. PV VG Fmt Attr PSize PFree /dev/md127 vg_os lvm2 a-- 144.50g 85.88g /dev/md128 vg_ext lvm2 a-- 1.36t 927.25g Note that /dev/md127 has metadata version 0.90 and contains sda3 and sdb3. But md128 has version 1.2 and devices sdc1 and sdd1. I then reran pvs: [root@ti5 ~]# pvs No device found for PV ATG7FD-8U72-3qNi-YpYr-xzb8-fzLT-2vpUcm. PV VG Fmt Attr PSize PFree /dev/md127 vg_os lvm2 a-- 144.50g 85.88g /dev/md128 vg_ext lvm2 a-- 1.36t 927.25g This is with lvm2-2.02.98-13.fc19.x86_64. I then upgraded to lvm2-2.02.98-15.fc19.x86_64 to see if that helped, since that has already been pushed for F19. It gives the same error: [root@ti5 ~]# pvs Found duplicate PV dCufSyINaVhF3FscyKCZZp1yCbgyqO03: using /dev/sdb3 not /dev/sda3 Incorrect metadata area header checksum on /dev/sdc1 at offset 4096 Incorrect metadata area header checksum on /dev/sdc1 at offset 4096 Found duplicate PV ATG7FD8U723qNiYpYrxzb8fzLT2vpUcm: using /dev/sdd1 not /dev/sdc1 Incorrect metadata area header checksum on /dev/sdd1 at offset 4096 Incorrect metadata area header checksum on /dev/sdd1 at offset 4096 No device found for PV ATG7FD-8U72-3qNi-YpYr-xzb8-fzLT-2vpUcm. PV VG Fmt Attr PSize PFree /dev/md127 vg_os lvm2 a-- 144.50g 85.88g /dev/md128 vg_ext lvm2 a-- 1.36t 927.25g I then installed these rpms from the link you supplied: device-mapper-1.02.90-0.1.fc19.x86_64.rpm device-mapper-devel-1.02.90-0.1.fc19.x86_64.rpm device-mapper-event-1.02.90-0.1.fc19.x86_64.rpm device-mapper-event-libs-1.02.90-0.1.fc19.x86_64.rpm device-mapper-libs-1.02.90-0.1.fc19.x86_64.rpm lvm2-2.02.111-0.1.fc19.x86_64.rpm lvm2-libs-2.02.111-0.1.fc19.x86_64.rpm And now the problem is gone: [root@ti5 ~]# pvs PV VG Fmt Attr PSize PFree /dev/md127 vg_os lvm2 a-- 144.50g 85.88g /dev/md128 vg_ext lvm2 a-- 1.36t 927.25g I then decided to reboot it to make sure it works OK across a reboot. Unfortunately, it did not. I haven't had a chance to troubleshoot the system yet, because I am out of the office, but the boot error messages include these: md128: unknown partition table ... Timed out waiting for device dev-disk-by\x2duuid-85d4ae4d\x2d8ae3\x2d4165\x2db... I will update when I am able to recover the system. -Andy
(...also try regenerating the initramfs image by calling "dracut")
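In case it's useful, this is roughly what I mean (a sketch - dracut picks up the running kernel by default; adjust the kernel version if you need a different image):

  # rebuild the initramfs for the currently running kernel
  dracut -f
  # or explicitly:
  dracut -f /boot/initramfs-$(uname -r).img $(uname -r)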
That was my first idea as well -- to run dracut in a rescue shell and hope that it would fix the problem. Unfortunately, it does not. For some reason, it is unable to find the /var filesystem (which is on an LV), and it cannot find another local filesystem either. In the emergency shell, I edited /etc/lvm/lvm.conf and changed "use_lvmetad" from 1 to 0. I rebooted, and everything worked fine.

FYI, I see these errors in the journal:

Sep 18 10:23:11 ti5 systemd[1]: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 18 10:23:11 ti5 systemd[1]: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'

I don't know if this matters, but I doubt it, since systemctl status for lvm2-lvmetad.socket looks good. I'm not sure how to troubleshoot further, but there's definitely a problem here. When I ran "lvs" in the rescue shell, I think it saw all the LVs, so I'm not sure why they weren't found. The error message was "Timed out waiting for device dev-disk-by\x2duuid-<hex junk>". That appeared twice -- once for /var, and once for my other local scratch filesystem. There did not seem to be any problem with the / and /usr LVs.

Regards,
Andy
(In reply to Andrew J. Schorr from comment #20)
> That was my first idea as well -- to run dracut in a rescue shell and hope
> that it would fix the problem.

Hmm, that's odd. Could you grab some more info for me (with this new version installed)?
- Put back "use_lvmetad=1" and then reboot.
- If the problem/timeout appears and you're dropped to the emergency shell, try running "lvmdump -l -s -u" and attach the lvmdump-....tgz here (or mail it to me directly).

Thanks.
Created attachment 939389 [details] tarball from "lvmdump -l -s -u" in the emergency shell It may be odd, but it is very repeatable. It has happened at least 5 or 6 times. It never boots successfully with "use_lvmetad = 1". I hope the tarball helps.
Ah, sorry - the systemctl in F19 is a bit older, so I had to modify things a bit so that it logs properly. Could you please rerun the lvmdump (comment #21) with these new packages? (Sorry for the wrong packages before.) https://prajnoha.fedorapeople.org/rpms/bz1018852/x86_64/
Created attachment 940883 [details] new tarball from "lvmdump -l -s -u" in the emergency shell

I installed these rpms:

  device-mapper-1.02.90-0.3.fc19.x86_64.rpm
  device-mapper-devel-1.02.90-0.3.fc19.x86_64.rpm
  device-mapper-event-1.02.90-0.3.fc19.x86_64.rpm
  device-mapper-event-libs-1.02.90-0.3.fc19.x86_64.rpm
  device-mapper-libs-1.02.90-0.3.fc19.x86_64.rpm
  lvm2-2.02.111-0.3.fc19.x86_64.rpm
  lvm2-libs-2.02.111-0.3.fc19.x86_64.rpm

I enabled use_lvmetad and then rebooted. It hung as usual. This is the tarball from "lvmdump -l -s -u". I hope this helps.
So, based on the logs - udev database content:

MD array "md128":
=================
P: /devices/virtual/block/md128
N: md128
...
E: ID_FS_TYPE=LVM2_member
...
E: LVM_MD_PV_ACTIVATED=1
E: SYSTEMD_WANTS=lvm2-pvscan@9:128.service

MD array "md126":
=================
P: /devices/virtual/block/md126
N: md126
...
ID_FS_TYPE=LVM2_member
SYSTEMD_READY=0

From the above it's clear that both md128 and md126 are properly identified as PVs. However, while there's a pvscan service instantiated for md128 (which is also visible in the systemd logs), the other one - md126 - is marked as SYSTEMD_READY=0 and there's no "LVM_MD_PV_ACTIVATED=1", which results in no pvscan service being instantiated. That's the reason why the LVs on the md126 PV are not activated.

(With lvmetad, this activation is automatic, based on incoming device events, while without lvmetad there's a rough "vgchange -ay" call during the boot process that activates whatever is visible at the time of that call - that's why it doesn't work with lvmetad and works without it.)

So we need to find out *why* md126 is not marked as an activated MD array. The /lib/udev/rules.d/69-dm-lvm-metad.rules is responsible for this:

# MD device:
LABEL="next"
KERNEL!="md[0-9]*", GOTO="next"
IMPORT{db}="LVM_MD_PV_ACTIVATED"
ACTION=="add", ENV{LVM_MD_PV_ACTIVATED}=="1", GOTO="lvm_scan"
ACTION=="change", ENV{LVM_MD_PV_ACTIVATED}!="1", TEST=="md/array_state", ENV{LVM_MD_PV_ACTIVATED}="1", GOTO="lvm_scan"
ACTION=="add", KERNEL=="md[0-9]*p[0-9]*", GOTO="lvm_scan"
ENV{LVM_MD_PV_ACTIVATED}!="1", ENV{SYSTEMD_READY}="0"
GOTO="lvm_end"

The line that is important for us is exactly this one:

ACTION=="change", ENV{LVM_MD_PV_ACTIVATED}!="1", TEST=="md/array_state", ENV{LVM_MD_PV_ACTIVATED}="1", GOTO="lvm_scan"

...which checks for the existence of /sys/block/<md_name>/md/array_state and, if it exists, marks the MD array as activated. Clearly, that is not happening here for some reason. Hence, this rule is applied instead:

ENV{LVM_MD_PV_ACTIVATED}!="1", ENV{SYSTEMD_READY}="0"

...and then nothing gets activated, of course.

It seems I'll need more debug info from the udev daemon to see why this rule failed, but I'll try a few things first. I'll then ask you for more info later on...
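In the meantime, a quick way to check this from the emergency shell - a hedged sketch; the md126 name is taken from the udev dump above and may differ on a given boot:

  # does the sysfs attribute the rule tests for actually exist?
  cat /sys/block/md126/md/array_state
  # what does the udev database say about the device?
  udevadm info --query=all --name=/dev/md126 | grep -E 'LVM_MD_PV_ACTIVATED|SYSTEMD_READY|SYSTEMD_WANTS'
  # if array_state is there but the flag is missing, a synthesized change event
  # should let the rule above mark the PV as activated:
  udevadm trigger --action=change --sysname-match=md126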
Could you please try this?

(with use_lvmetad=0)
- running "lvmdump -l -s -u" when the machine is booted completely

(with use_lvmetad=1)
- adding "--debug" to the ExecStart line of /lib/systemd/system/systemd-udevd.service:
    ExecStart=/usr/lib/systemd/systemd-udevd --debug
- adding the "debug" option to the kernel command line

Then, when the failure happens, try to get the logs and, please, attach them here:

  systemctl status --full -n 1000000 systemd-udevd.service
  journalctl -b --full -n 1000000

===

Also, it would be great if you could try this in addition (a sketch of the sequence is appended below):

(with use_lvmetad=0 at first)
- boot the system
- umount the mountpoint (if possible) - the one which is on the MD that caused problems with use_lvmetad=0
- deactivate any LVs on top of the MD (vgchange -an ...)
- stop the MD array (mdadm -S)
- change lvm.conf to use_lvmetad=1
- run pvscan --cache
- run mdadm -I <dev> for each dev that makes up the raid array (the one that causes problems)
- check whether the LVs get activated automatically after the last "mdadm -I <dev>"

(Sorry - it's a bit fiddly, I know, but unfortunately I can't reproduce this in any way - so either this is a very tight race and you're "happy" enough to have it on your system, or it's some other problem. I hope to see more from the additional debug logs.)

Thanks!
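For what it's worth, the sequence might look roughly like this on your system (a hedged sketch only - I'm assuming the problematic array is /dev/md128 (vg_ext) built from /dev/sdc1 and /dev/sdd1, as in your earlier output; the mount point is hypothetical and the device names may differ from boot to boot):

  umount /scratch                      # whatever mountpoint sits on the affected VG
  vgchange -an vg_ext                  # deactivate the LVs on top of the MD
  mdadm -S /dev/md128                  # stop the array
  sed -i 's/use_lvmetad = 0/use_lvmetad = 1/' /etc/lvm/lvm.conf
  systemctl start lvm2-lvmetad.socket
  pvscan --cache                       # repopulate lvmetad
  mdadm -I /dev/sdc1                   # incrementally re-assemble the array,
  mdadm -I /dev/sdd1                   # one member at a time
  lvs vg_ext                           # did the LVs come back activated on their own?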
(In reply to Peter Rajnoha from comment #26)
> Also, it would be great if you could try this in addition:
> 
> (with use_lvmetad=0 at first)
> - boot the system
> - umount the mountpoint (if possible) - the one which is on the MD that
>   caused problems with use_lvmetad=0

I meant "...on the MD that caused problems with use_lvmetad=1".
Created attachment 945130 [details] "lvmdump -l -s -u" from running system with use_lvmetad=0

Hi Peter,

Here is the first item you requested (with use_lvmetad=0): running "lvmdump -l -s -u" when the machine is booted completely. I'll see if I can get to the further items. This is unfortunately a desktop system, not a rack-mount server, so it's rather painful to conduct these experiments.

Regards,
Andy
Created attachment 945148 [details] systemctl status --full -n 1000000 systemd-udevd.service I set use_lvmetad=1 and rebooted. This is the output in the emergency shell of "systemctl status --full -n 1000000 systemd-udevd.service"
Created attachment 945149 [details] journalctl -b --full -n 1000000 I rebooted with use_lvmetad=1. Here is the output in the emergency shell from "journalctl -b --full -n 1000000".
Hi Peter, I'm not sure I can follow the rest of your instructions. You asked me to umount the problematic mount point. But one of those is /var. So how will that work? I could maybe do it with the other mount point, but I'm not sure whether that will be of value. Please advise. Regards, Andy
Hi,

I have a similar problem on Fedora 20. My configuration: two identical hard disks, partitioned, with two MD volumes on those disks and two volume groups named raid0 and raid1. Those are OK. There is also one SSD disk, with a volume group named ssd inside a partition (sdb2). This volume group contains an LV named root (the root of the system, of course), which contains /boot.

Problem: the ssd volume is detected by grub and mounted during the boot process. However, the physical volume sdb2 and the volume group ssd aren't accessible, while the logical volume ssd-root remains mounted under /dev/mapper.

sudo pvck /dev/sdb2
  -> Device /dev/sdb2 not found (or ignored by filtering).

If I use rescuecd, the ssd volume is accessible.

Thank you for your help,
Regards,
Shame on me, the problem was my fault. I was modifying lvm.conf to get rid of a duplicate PV, and I filtered sd[abc] instead of sd[ac]. Could you send me a brown bag?

I still have a warning:

  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!

that occurs when the system shuts down. I think that's linked to the fact that a disk remains busy:

Oct 11 02:05:42 pierre umount: umount: /usr/local: target is busy
Oct 11 02:05:42 pierre umount: (In some cases useful info about processes that use
Oct 11 02:05:42 pierre umount: the device is found by lsof(8) or fuser(1).)
Oct 11 02:05:42 pierre lvm: WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!

Regards,
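For anyone who hits the same thing, the mistake was along these lines (a hypothetical reconstruction - I no longer have the exact lines, so the patterns are illustrative):

  # what I had written in /etc/lvm/lvm.conf (rejects sda, sdb AND sdc, hiding the ssd VG on sdb2):
  #   filter = [ "r|^/dev/sd[abc]|", "a|.*|" ]
  # what I actually meant (reject only sda and sdc):
  #   filter = [ "r|^/dev/sd[ac]|", "a|.*|" ]
  # after fixing the filter, refresh lvmetad's view of the devices:
  pvscan --cache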
This message is a notice that Fedora 19 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 19. It is Fedora's policy to close all bug reports from releases that are no longer maintained. Approximately 4 (four) weeks from now this bug will be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 19 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.