Description of problem:
Create a VG on an iSCSI LUN and reboot the server.
The filesystem will fail to mount.

device-mapper: multipath: version 1.6.0 loaded
Setting up Logical Volume Management:   No volume groups found
[  OK  ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/mapper/vg_ibmx3250m401-lv_root
/dev/mapper/vg_ibmx3250m401-lv_root: clean, 89134/3276800 files, 757585/13107200 blocks
[/sbin/fsck.ext4 (1) -- /boot] fsck.ext4 -a /dev/sda1
/dev/sda1: clean, 39/128016 files, 57553/512000 blocks
[/sbin/fsck.ext4 (1) -- /home] fsck.ext4 -a /dev/mapper/vg_ibmx3250m401-lv_home
fsck.ext4: No such file or directory while trying to open /dev/mapper/vg_ibmx3250m401-lv_home
/dev/mapper/vg_ibmx3250m401-lv_home:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

%G Welcome to Red Hat Enterprise Linux Server
Starting udev: %G[  OK  ]
Setting hostname ibm-x3250m4-01.rhts.eng.bos.redhat.com:  [  OK  ]
Setting up Logical Volume Management:   No volume groups found
[  OK  ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/mapper/vg_ibmx3250m401-lv_root
/dev/mapper/vg_ibmx3250m401-lv_root: clean, 89134/3276800 files, 757585/13107200 blocks
[/sbin/fsck.ext4 (1) -- /boot] fsck.ext4 -a /dev/sda1
/dev/sda1: clean, 39/128016 files, 57553/512000 blocks
[/sbin/fsck.ext4 (1) -- /home] fsck.ext4 -a /dev/mapper/vg_ibmx3250m401-lv_home
fsck.ext4: No such file or directory while trying to open /dev/mapper/vg_ibmx3250m401-lv_home
/dev/mapper/vg_ibmx3250m401-lv_home:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

[FAILED]

*** An error occtype=1404 audit(1405545982.298:4): enforcing=0 old_enforcing=1 auid=4294967295 ses=4294967295
urred during the file system check.
*** Dropping you to a shell; the system will reboot
*** whGive root password for maintenance
(or type Control-D to continue):

Version-Release number of selected component (if applicable):
lvm2-2.02.107-2.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add an iSCSI LUN to the server
multipath -l
mpatha (360a980003246694a412b456733426164) dm-3 NETAPP,LUN
size=10G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 4:0:0:0 sdc 8:32 active undef running
  `- 5:0:0:0 sdb 8:16 active undef running

2. Create a VG group
vgcreate -f test_vg /dev/mapper/mpatha
  Physical volume "/dev/mapper/mpatha" successfully created
  Volume group "test_vg" successfully created

3. Reboot the server
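For orientation, once the box is up and the iSCSI session has been re-established, the VG from the reproducer can be checked by hand; a minimal sketch, assuming the map and VG names from the steps above:

# confirm the iSCSI session and the multipath map are back (names assumed from the reproducer)
iscsiadm -m session
multipath -ll mpatha

# rescan and check whether LVM sees the VG on the LUN at all
vgscan
vgs test_vg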
From the output in comment #0 it appears the lv_home volume is inactive (lv_root was presumably activated in the initramfs). What backs the home LV? Is it actually using the iSCSI device?

> Setting up Logical Volume Management:   No volume groups found

Sounds more like devices are being filtered (ignored) than a missing device preventing activation. Can you post an lvmdump or sosreport from the system so we can see the device configuration in full?
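For reference, a minimal sketch of how the requested diagnostics are usually collected (assuming the lvm2 and sos packages are installed; exact option sets vary by version):

# gather LVM configuration, device state and (with -m) on-disk metadata into a tarball
lvmdump -m

# or a full system report, which also covers multipath and iSCSI configuration
sosreport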
(In reply to Bruno Goncalves from comment #0)
> Description of problem:
> Create a VG on an iSCSI LUN and reboot the server.
> The filesystem will fail to mount.

Just to make sure - it's the root and home LV that's on iSCSI, right?

> device-mapper: multipath: version 1.6.0 loaded
> Setting up Logical Volume Management:   No volume groups found
> [  OK  ]

...the vgchange call is in rc.sysinit, which happens before the iSCSI initscript is executed. However, if the root fs is also on iSCSI (and in the same VG as the "home" LV), the iSCSI should already be set up from the initramfs. So if it shows "No volume groups found", it's a bit odd, since it should at least show the already activated LV "root".

Also, for activating a VG/LV on net-attached storage, there should be a "_netdev" option defined in fstab for the device (in this case the LV "home"). Is it set? (An example fstab entry is sketched right after this comment.)

> Checking filesystems
> Checking all file systems.
> [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/mapper/vg_ibmx3250m401-lv_root
> /dev/mapper/vg_ibmx3250m401-lv_root: clean, 89134/3276800 files, 757585/13107200 blocks
> [/sbin/fsck.ext4 (1) -- /boot] fsck.ext4 -a /dev/sda1
> /dev/sda1: clean, 39/128016 files, 57553/512000 blocks
> [/sbin/fsck.ext4 (1) -- /home] fsck.ext4 -a /dev/mapper/vg_ibmx3250m401-lv_home
> fsck.ext4: No such file or directory while trying to open /dev/mapper/vg_ibmx3250m401-lv_home
> /dev/mapper/vg_ibmx3250m401-lv_home:
> The superblock could not be read or does not describe a correct ext2
> filesystem.  If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
>     e2fsck -b 8193 <device>
>
> %G Welcome to Red Hat Enterprise Linux Server
> Starting udev: %G[  OK  ]
> Setting hostname ibm-x3250m4-01.rhts.eng.bos.redhat.com:  [  OK  ]
> Setting up Logical Volume Management:   No volume groups found
> [  OK  ]
> Checking filesystems
> Checking all file systems.
> [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/mapper/vg_ibmx3250m401-lv_root
> /dev/mapper/vg_ibmx3250m401-lv_root: clean, 89134/3276800 files, 757585/13107200 blocks
> [/sbin/fsck.ext4 (1) -- /boot] fsck.ext4 -a /dev/sda1
> /dev/sda1: clean, 39/128016 files, 57553/512000 blocks
> [/sbin/fsck.ext4 (1) -- /home] fsck.ext4 -a /dev/mapper/vg_ibmx3250m401-lv_home
> fsck.ext4: No such file or directory while trying to open /dev/mapper/vg_ibmx3250m401-lv_home
> /dev/mapper/vg_ibmx3250m401-lv_home:
> The superblock could not be read or does not describe a correct ext2
> filesystem.  If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
>     e2fsck -b 8193 <device>
>
> [FAILED]
>
> *** An error occtype=1404 audit(1405545982.298:4): enforcing=0 old_enforcing=1 auid=4294967295 ses=4294967295
> urred during the file system check.
> *** Dropping you to a shell; the system will reboot
> *** whGive root password for maintenance
> (or type Control-D to continue):
>
> Version-Release number of selected component (if applicable):
> lvm2-2.02.107-2.el6.x86_64
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. Add an iSCSI LUN to the server
> multipath -l
> mpatha (360a980003246694a412b456733426164) dm-3 NETAPP,LUN
> size=10G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 4:0:0:0 sdc 8:32 active undef running
>   `- 5:0:0:0 sdb 8:16 active undef running
>
> 2. Create a VG group
> vgcreate -f test_vg /dev/mapper/mpatha
>   Physical volume "/dev/mapper/mpatha" successfully created
>   Volume group "test_vg" successfully created
>
> 3. Reboot the server
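As an aside, a minimal sketch of the fstab entry the "_netdev" remark refers to, assuming for illustration that the home LV really were backed by iSCSI (device path and field values are illustrative, not taken from this system):

# /etc/fstab - _netdev defers the mount until networking (and thus iSCSI) is up
/dev/mapper/vg_ibmx3250m401-lv_home  /home  ext4  defaults,_netdev  0 2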
(In reply to Peter Rajnoha from comment #5)
> (In reply to Bruno Goncalves from comment #0)
> > Description of problem:
> > Create a VG on an iSCSI LUN and reboot the server.
> > The filesystem will fail to mount.
>
> Just to make sure - it's the root and home LV that's on iSCSI, right?

No, the root and home LVs are on a local disk. The iSCSI session is established once the server is up and running.
I've borrowed the machine from Bruno. So far, this is what I've found:

--> the boot sequence stopped on the fsck failure since it can't find the home LV, which is not activated

--> the "/" is still RO (the remount comes later in rc.sysinit)

--> running "vgchange -a ay --sysinit --ignoreskippedcluster" (the exact line that is called within rc.sysinit) gives:

#lvmcmdline.c:1346       DEGRADED MODE. Incomplete RAID LVs will be processed.
#lvmcmdline.c:1352       Processing: vgchange -a ay --sysinit --ignoreskippedcluster -vvvv
#lvmcmdline.c:1355       O_DIRECT will be used
#libdm-config.c:877      Setting global/locking_type to 1
#libdm-config.c:941      Setting global/wait_for_locks to 1
#locking/locking.c:128   File-based locking selected.
#libdm-config.c:941      Setting global/prioritise_write_locks to 1
#libdm-config.c:846      Setting global/locking_dir to /var/lock/lvm
#libdm-common.c:903      Preparing SELinux context for /var/lock/lvm to system_u:object_r:lvm_lock_t:s0.
#libdm-common.c:906      Resetting SELinux context to default value.
#locking/locking.c:132   File-based locking initialisation failed.
#locking/locking.c:197   Locking disabled - only read operations permitted.
#toollib.c:674           Finding all volume groups
#toollib.c:678           No volume groups found
#lvmcmdline.c:1413       Completed: vgchange -a ay --sysinit --ignoreskippedcluster -vvvv

(so that's exactly the same, incorrect state: "No volume groups found")

--> current device presence:

[root@tyan-gt24-08 shm]# lsblk
NAME                               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                  8:0    0 111.8G  0 disk
├─sda1                               8:1    0   500M  0 part
└─sda2                               8:2    0 111.3G  0 part
  ├─vg_tyangt2408-lv_root (dm-0)   253:0    0    50G  0 lvm  /
  └─vg_tyangt2408-lv_swap (dm-1)   253:1    0   7.9G  0 lvm
sr0

--> the home LV is in the same VG as the root LV and on the same PV (sda2)

--> the .cache state:

[root@tyan-gt24-08 shm]# cat /etc/lvm/cache/.cache
# This file is automatically maintained by lvm.
persistent_filter_cache {
	valid_devices=[
		"/dev/mapper/mpatha",
		"/dev/block/253:3",
		"/dev/disk/by-id/dm-name-mpatha",
		"/dev/dm-3",
		"/dev/disk/by-id/dm-uuid-mpath-360a980003246694a412b456733426164"
	]
}

(there's no sda2 nor any of its aliases!!!)

--> bypassing the .cache file helps (pointing cache_dir to a place where there's no .cache):

[root@tyan-gt24-08 shm]# vgchange -a ay --sysinit --ignoreskippedcluster --config 'devices{cache_dir="/etc/lvm"}'
  3 logical volume(s) in volume group "vg_tyangt2408" now active

--> the lvm.conf used is the default one, unchanged in any way

So the problem is that the .cache is wrong and it can't be updated either (the root fs is still RO). We need to track how this wrong .cache file got generated and when exactly - why sda2 was not visible at the time the .cache was generated (or why it got filtered out)!
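Based on the findings above, a hedged workaround sketch for getting past the maintenance shell until a fixed build is installed (paths are the RHEL 6 defaults; removing .cache only forces LVM to regenerate it, no on-disk metadata is touched):

# root is still read-only at this point, so remount it read-write first
mount -o remount,rw /

# drop the stale persistent filter cache and let LVM rebuild it from a full scan
rm -f /etc/lvm/cache/.cache
vgscan

# activate the remaining LVs and mount everything from fstab
vgchange -ay
mount -a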
OK, I've found the culprit. It should now be fixed with:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=54685c20fc9dfb155a2e5bc9d8cf5f0aad944305

It's actually a regression caused by this recent commit (it appeared in recent 6.6 builds):

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=e80884cd080cad7e10be4588e3493b9000649426

I'm changing the summary of the bug to reflect the problem better...
To QA, to reproduce and check:

- disks sda and sdb (both available to be used as PVs)

# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "fedora" using metadata type lvm2

# grep sda /etc/lvm/cache/.cache
		"/dev/sda",
# grep sdb /etc/lvm/cache/.cache
		"/dev/sdb",

# vgcreate vg /dev/sda
  Physical volume "/dev/sda" successfully created
  Volume group "vg" successfully created

Before this fix (only sda cached - the one that was processed in vgcreate):

# grep sda /etc/lvm/cache/.cache
		"/dev/sda",
# grep sdb /etc/lvm/cache/.cache

(sdb is incorrectly dropped from .cache as an available block device)

After this fix (all available block devices cached):

# grep sda /etc/lvm/cache/.cache
		"/dev/sda",
# grep sdb /etc/lvm/cache/.cache
		"/dev/sdb",

(The solution to bug #1113539 must still work!)
Seems that there are still some slight issues involving .cache. Here's the output from a different test which encountered a problem related to .cache:

[root@virt-064 ~]# mdadm --create md1 -l 1 --raid-devices 2 /dev/sdc1 /dev/sdd1
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/md1 started.

[root@virt-064 ~]# vgs
  Incorrect metadata area header checksum on /dev/sdc1 at offset 4096
  Incorrect metadata area header checksum on /dev/sdd1 at offset 4096
  VG         #PV #LV #SN Attr   VSize VFree
  vg_virt064   1   2   0 wz--n- 7.51g    0

[root@virt-064 ~]# mdadm -S /dev/md/md1
mdadm: stopped /dev/md/md1

[root@virt-064 ~]# vgcreate two /dev/sdc1 /dev/sdd1 /dev/sdf1
  Incorrect metadata area header checksum on /dev/sdc1 at offset 4096
  Incorrect metadata area header checksum on /dev/sdd1 at offset 4096
WARNING: software RAID md superblock detected on /dev/sdc1. Wipe it? [y/n]: y
  Wiping software RAID md superblock on /dev/sdc1.
WARNING: software RAID md superblock detected on /dev/sdd1. Wipe it? [y/n]: y
  Wiping software RAID md superblock on /dev/sdd1.
  Incorrect metadata area header checksum on /dev/sdc1 at offset 4096
  Incorrect metadata area header checksum on /dev/sdc1 at offset 4096
  Incorrect metadata area header checksum on /dev/sdd1 at offset 4096
  Incorrect metadata area header checksum on /dev/sdd1 at offset 4096
  Physical volume "/dev/sdc1" successfully created
  Physical volume "/dev/sdd1" successfully created
  Clustered volume group "two" successfully created

[root@virt-064 ~]# vgs
  VG   #PV #LV #SN Attr   VSize  VFree
  two    3   0   0 wz--nc 44.99g 44.99g
[root@virt-064 ~]# lvs
[root@virt-064 ~]# vgs -a
  VG   #PV #LV #SN Attr   VSize  VFree
  two    3   0   0 wz--nc 44.99g 44.99g

The VG which was there in the beginning has "vanished".

However, if obtain_device_list_from_udev = 1, then:

[root@virt-064 ~]# mdadm -S /dev/md/md1
mdadm: stopped /dev/md/md1

[root@virt-064 ~]# vgs
  VG         #PV #LV #SN Attr   VSize VFree
  vg_virt064   1   2   0 wz--n- 7.51g    0
[root@virt-064 ~]# lvs -a
  LV      VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_root vg_virt064 -wi-ao----   6.71g
  lv_swap vg_virt064 -wi-ao---- 816.00m

[root@virt-064 ~]# vgcreate two /dev/sdd1 /dev/sdc1 /dev/sdf1
WARNING: software RAID md superblock detected on /dev/sdd1. Wipe it? [y/n]: y
  Wiping software RAID md superblock on /dev/sdd1.
WARNING: software RAID md superblock detected on /dev/sdc1. Wipe it? [y/n]: y
  Wiping software RAID md superblock on /dev/sdc1.
  Physical volume "/dev/sdd1" successfully created
  Physical volume "/dev/sdc1" successfully created
  Clustered volume group "two" successfully created

[root@virt-064 ~]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  two          3   0   0 wz--nc 44.99g 44.99g
  vg_virt064   1   2   0 wz--n-  7.51g      0

There is a difference in the messages as well: in the second try, LVM does not complain about an incorrect metadata area header checksum. Maybe that is related.

Tested with:
lvm2-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-libs-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
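For completeness, the knobs involved here live in the devices section of /etc/lvm/lvm.conf; a minimal sketch of the relevant settings (the values shown are assumptions for illustration, not necessarily the shipped defaults):

devices {
    # take the list of block devices from udev instead of scanning /dev directly
    obtain_device_list_from_udev = 1

    # directory holding the persistent filter cache (.cache) discussed in this bug
    cache_dir = "/etc/lvm/cache"

    # set to 0 to stop writing the persistent filter cache altogether
    write_cache_state = 1
}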
Only a clustered environment is affected. The reproducer/check in comment #9 works. So the finding in comment #11 must be another path in the code where the .cache is incorrectly generated... I'll check that.
This happens only if MD is incorporated:

(with swap - problem not hit)

[root@rhel6-b ~]# mkswap /dev/sda
mkswap: /dev/sda: warning: don't erase bootbits sectors
        on whole disk. Use -f to force.
Setting up swapspace version 1, size = 131068 KiB
no label, UUID=a2265867-65d5-461a-b12d-ec4dc9aac8aa

[root@rhel6-b ~]# vgcreate vg /dev/sda
WARNING: swap signature detected on /dev/sda. Wipe it? [y/n]: y
  Wiping swap signature on /dev/sda.
  Physical volume "/dev/sda" successfully created
  Clustered volume group "vg" successfully created

[root@rhel6-b ~]# vgs
  VG       #PV #LV #SN Attr   VSize   VFree
  VolGroup   1   2   0 wz--n-   9.51g      0
  vg         1   0   0 wz--nc 124.00m 124.00m

(with MD - problem hit)

[root@rhel6-b ~]# mdadm --create /dev/md0 -l1 --raid-devices 2 /dev/sda /dev/sdb
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

[root@rhel6-b ~]# mdadm -S /dev/md0
mdadm: stopped /dev/md0

[root@rhel6-b ~]# vgcreate vg /dev/sda
WARNING: software RAID md superblock detected on /dev/sda. Wipe it? [y/n]: y
  Wiping software RAID md superblock on /dev/sda.
  Physical volume "/dev/sda" successfully created
  Clustered volume group "vg" successfully created

[root@rhel6-b ~]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  vg     1   0   0 wz--nc 124.00m 124.00m

(the VolGroup VG is MISSING!!!)

[root@rhel6-b ~]# cat /etc/lvm/cache/.cache
# This file is automatically maintained by lvm.
persistent_filter_cache {
	valid_devices=[
		"/dev/disk/by-id/scsi-360000000000000000e00000000020001",
		"/dev/disk/by-id/wwn-0x60000000000000000e00000000020001",
		"/dev/block/8:0",
		"/dev/disk/by-path/ip-192.168.122.1:3260-iscsi-iqn.2012-07.com.redhat.brq.alatyr.virt:host.target_rhel6-lun-1",
		"/dev/sda"
	]
}

(the .cache contains ONLY sda and its aliases)
We've decided it is a different - and fairly old - problem, and we do have a fix that can go in.
[root@tardis-01 ~]# grep sdc /etc/lvm/cache/.cache
		"/dev/sdc1",
[root@tardis-01 ~]# grep sdb /etc/lvm/cache/.cache
		"/dev/sdb1",
[root@tardis-01 ~]# vgcreate vg /dev/sdb1
  Volume group "vg" successfully created
[root@tardis-01 ~]# grep sdb /etc/lvm/cache/.cache
		"/dev/sdb1",
[root@tardis-01 ~]# grep sdc /etc/lvm/cache/.cache
		"/dev/sdc1",
[root@tardis-01 ~]#

Marking this specific cache bug as VERIFIED with:

lvm2-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-libs-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-cluster-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
udev-147-2.57.el6    BUILT: Thu Jul 24 15:48:47 CEST 2014
device-mapper-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-libs-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-event-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-event-libs-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 15:43:06 CEST 2014
cmirror-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014

A new cache bug (not related to the problem reported here) was filed as Bug 1129311.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1387.html