+++ This bug was initially created as a clone of Bug #2090169 +++

This is a **similar** issue, but we are unsure whether the root of the problem is the same.

Description of problem:

After upgrading a RHV virtualization host from 4.4.10 to 4.5.0, the node didn't boot anymore and ended up in the dracut rescue shell. In the rescue shell it was clear that it didn't boot because LVM did not activate the gluster devices, and it failed to mount the related filesystems:

UUID=24092465-cad0-412b-8ef3-d3abbf9e7b5b /gluster_bricks/engine xfs inode64,noatime,nodiratime 0 0
UUID=67dce97d-13b1-4734-9158-aeafd7e57426 /gluster_bricks/data xfs inode64,noatime,nodiratime 0 0
UUID=d303fdb0-3bf4-4cab-8d66-125035a24899 /gluster_bricks/vmstore xfs inode64,noatime,nodiratime 0 0
UUID=21e8907f-122a-44d8-8f6c-cfc8dc6bc8eb /gluster_bricks/vmstore2 xfs inode64,noatime,nodiratime 0 0

The LVM physical volumes are LUKS-encrypted drives:

/dev/mapper/luks_sdb gluster_vg_luks_sdb lvm2 a-- <3.20t 0 <3.20t yktKae-63L1-GIfU-mWdK-gvXS-1e0f-CLXXDu 505.50k 1020.00k 1 1 1.00m
/dev/mapper/luks_sdc gluster_vg_sdc lvm2 a-- <4.37t 35.16g <4.37t TWNu4B-aaGD-LjNh-W2Y9-FRzW-Fzho-9nfmFA 506.50k 1020.00k 1 1 1.00m
/dev/sda1 --- 0 0 1.00g 0 0 0 0 0
/dev/sda2 --- 0 0 <299.00g 0 0 0 0 0
/dev/sdb --- 0 0 <3.20t 0 0 0 0 0
/dev/sdc --- 0 0 <4.37t 0 0 0 0 0

When trying to manually scan LVM inside rescue/maintenance mode, we see that pvscan skipped over the device due to "deviceid":

17:39:12.088409 pvscan[14920] filters/filter-deviceid.c:40 /dev/mapper/luks_sdb: Skipping (deviceid)

Comparing the customer's /etc/lvm/lvm.conf file prior to the upgrade and after, we see the following is added in 4.5:

use_devicesfile = 1

This is per: https://bugzilla.redhat.com/show_bug.cgi?id=2012830

On the customer system we can see that /etc/lvm/devices/system.devices was not properly populated with the two /dev/mapper/luks* devices, and therefore LVM ignores them.
From maintenance mode, we did the following to work around the problem:

vi /etc/lvm/lvm.conf    ## set use_devicesfile = 0 to temporarily disable it so we can manually activate
vgchange -ay            ## activate all of the volume groups
vi /etc/lvm/lvm.conf    ## set use_devicesfile = 1 to enable it again
vgimportdevices -a      ## run the import and correctly populate /etc/lvm/devices/system.devices

Moving to the second host, it was suggested based on Bug #2090169 to remove the device from the wwids file prior to upgrading a second node. This did not resolve the problem, and we had to apply the above workaround on 2 more hosts.

We have sosreports from the 4.4.10 boot, the 4.5 boot in maintenance mode, and the 4.5 boot after fixing it.
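The use_devicesfile toggling above can also be done with sed instead of vi; a minimal sketch run against a scratch copy of lvm.conf (on a real host the target would be /etc/lvm/lvm.conf, and the activation/import commands are left as comments since they need real devices):

```shell
#!/bin/sh
# Sketch of the use_devicesfile toggle from the workaround above, using
# sed on a scratch copy of lvm.conf instead of editing with vi.
set -eu

conf=$(mktemp)
printf 'devices {\n    use_devicesfile = 1\n}\n' > "$conf"

# Temporarily disable the devices file so all VGs can be activated.
sed -i 's/use_devicesfile = 1/use_devicesfile = 0/' "$conf"
grep -q 'use_devicesfile = 0' "$conf" && echo "devices file disabled"

# vgchange -ay            ## activate all of the volume groups

# Re-enable it before repopulating system.devices.
sed -i 's/use_devicesfile = 0/use_devicesfile = 1/' "$conf"
grep -q 'use_devicesfile = 1' "$conf" && echo "devices file enabled"

# vgimportdevices -a      ## repopulate /etc/lvm/devices/system.devices

rm -f "$conf"
```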
Based on the info in the description, this is not the same issue as bug 2090169. In that case, removing the wwid from /etc/multipath/wwids should avoid the issue.

This may be an issue with LUKS-encrypted devices; I don't think this was tested or even considered in the "vdsm-tool config-lvm-filter" tool. We need to reproduce it by building such a RHV-H system. It would help if we could get the output of "vdsm-tool config-lvm-filter" when running on such a host before the upgrade.

As for the way to fix such a system, we can simplify it by disabling the devices file temporarily and importing only the required vgs. The example given is very risky if the host has FC storage connected - it can import RHV storage domain vgs, and even guest vgs from active lvs for raw disks.

Fixing instructions:

1. Activate the needed vgs, disabling the devices file temporarily:

   vgchange --devicesfile= -ay gluster_vg_luks_sdb gluster_vg_sdc

2. Import the devices into the system devices file:

   vgimportdevices gluster_vg_luks_sdb gluster_vg_sdc
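A quick way to confirm step 2 worked is to check that both LUKS mapper devices now appear in the devices file. A minimal sketch against a scratch file with illustrative entries (the IDNAME/PVID values below are made up, not from the customer system; on a real host inspect /etc/lvm/devices/system.devices instead):

```shell
#!/bin/sh
# Hedged sketch: verify that both LUKS mapper devices ended up in the
# devices file after vgimportdevices. The entries are illustrative.
set -eu

devfile=$(mktemp)
cat > "$devfile" <<'EOF'
VERSION=1.1.2
IDTYPE=crypt_uuid IDNAME=CRYPT-LUKS2-example1-luks_sdb DEVNAME=/dev/mapper/luks_sdb PVID=example1
IDTYPE=crypt_uuid IDNAME=CRYPT-LUKS2-example2-luks_sdc DEVNAME=/dev/mapper/luks_sdc PVID=example2
EOF

# Both devices must be listed, otherwise pvscan keeps skipping them
# with "Skipping (deviceid)" during boot.
grep -c 'DEVNAME=/dev/mapper/luks_' "$devfile"
# prints: 2

rm -f "$devfile"
```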
We're still looking for an environment that this happens on
Lev, please check comment 22 and comment 23. I think this should be fixed in imagebased (bind mount /gluster_bricks in the chroot?).
(In reply to Sean Haselden from comment #0)
> In the rescue shell it was clear that it didn't boot because LVM did not
> activate the gluster devices, and it failed to mount the related filsystems:

This is explained by comment 22 and comment 23. So this is a new issue and not related to bug 2090169.
(In reply to Nir Soffer from comment #24)
> Lev, please check comment 22 and comment 23. I think this should be fixed
> in imagebased (bind mount /gluster_bricks in the chroot?).

Shouldn't it access/detect it through /dev, just as it does with the LVM based volumes?
*** Bug 2104515 has been marked as a duplicate of this bug. ***
(In reply to Lev Veyde from comment #26)
> (In reply to Nir Soffer from comment #24)
> > Lev, please check comment 22 and comment 23. I think this should be fixed
> > in imagebased (bind mount /gluster_bricks in the chroot?).
>
> Shouldn't it access/detect it through /dev, just as it does with the LVM
> based volumes?

No, it needs to see the mounts to detect the required lvs. Run lsblk in the chroot - if it does not show the mountpoints for the lvs, the lvs are not considered for creating the filter/adding to the devices file.
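The detection rule described above can be illustrated with a small parsing sketch. The lsblk output below is canned sample text, not from a real host; it only shows the idea that entries without a visible mountpoint are not considered:

```shell
#!/bin/sh
# Hedged illustration: only LVs whose mountpoints are visible (e.g. via
# a bind mount of /gluster_bricks inside the chroot) would be picked up.
# In a chroot without the bind mount, the MOUNTPOINT column is empty.
set -eu

# Sample `lsblk -o NAME,MOUNTPOINT` output (illustrative):
lsblk_out='luks_sdb
gluster_vg_luks_sdb-engine /gluster_bricks/engine
gluster_vg_luks_sdb-data /gluster_bricks/data'

# Only entries that show a mountpoint (two fields) are considered:
echo "$lsblk_out" | awk 'NF == 2 { print $1 }'
# prints:
# gluster_vg_luks_sdb-engine
# gluster_vg_luks_sdb-data
```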
Temporary solution without Ansible:

Before upgrading, the following procedure can also be applied to avoid hypervisor boot issues.

* Remove LVM filters.
~~~
# sed -i /^filter/d /etc/lvm/lvm.conf
~~~

* Enable system devices. Search for *Allow_mixed_block_sizes* in the */etc/lvm/lvm.conf* file and add a new line after it as follows.
~~~
# sed -i '/^Allow_mixed_block_sizes = 0/a use_devicesfile = 1' /etc/lvm/lvm.conf
~~~

* Populate system devices.
~~~
# vgimportdevices -a
~~~

Continuing with the upgrade will not hit any issue after that.
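The two sed edits in the procedure can be dry-run against a scratch copy before touching the real file; a minimal sketch (the sample lvm.conf content is illustrative, and on a real host the filter line will differ):

```shell
#!/bin/sh
# Dry run of the lvm.conf edits from the procedure above, applied to a
# scratch copy. On a real host the target is /etc/lvm/lvm.conf.
set -eu

conf=$(mktemp)
cat > "$conf" <<'EOF'
filter = [ "a|^/dev/mapper/luks_sdb$|", "r|.*|" ]
Allow_mixed_block_sizes = 0
EOF

# Step 1: remove LVM filter lines.
sed -i '/^filter/d' "$conf"

# Step 2: enable the devices file right after Allow_mixed_block_sizes.
sed -i '/^Allow_mixed_block_sizes = 0/a use_devicesfile = 1' "$conf"

cat "$conf"
# prints:
# Allow_mixed_block_sizes = 0
# use_devicesfile = 1

rm -f "$conf"
```

Step 3 (`vgimportdevices -a`) is not run here since it needs real PVs.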
The attached KCS was validated; please check out the minor suggestion to improve it in comment 35.

Since we don't have an easy way to handle this and it is a one-time issue (once fixed, it won't reproduce on future upgrades), following the KCS is the best way to go.
(In reply to Arik from comment #38)
> The attached KCS was validated, please checkout a minor suggestion to
> improve it in comment 35

Only minor point I'd make to the KCS solution is maybe using this in step 3:

# vgimportdevices <volume group name>
(In reply to Arik from comment #39)
> (In reply to Arik from comment #38)
> > The attached KCS was validated, please checkout a minor suggestion to
> > improve it in comment 35
>
> Only minor point I'd make to the KCS solution is maybe using this in step 3:
> # vgimportdevices <volume group name>

I assume this command will leave the current disks in the devices file and add the ones specified as part of "<volume group name>"? If so I can make the edit.
(In reply to Sean Haselden from comment #40)
> (In reply to Arik from comment #39)
> > (In reply to Arik from comment #38)
> > > The attached KCS was validated, please checkout a minor suggestion to
> > > improve it in comment 35
> >
> > Only minor point I'd make to the KCS solution is maybe using this in step 3:
> > # vgimportdevices <volume group name>
>
> I assume this command will leave the current disks in the devices file and
> add the ones specified as part of "<volume group name>"? If so I can make
> the edit.

Yes, that is correct. vgimportdevices creates the devices file if none exists, and appends new devices individually. In fact, vdsm-tool invokes vgimportdevices in a loop for the proper devices when we run "vdsm-tool config-lvm-filter".