Description of problem:

A fresh installation of RHVH 4.4.7 on an external FC disk fails to boot on the first reboot after Anaconda. After switching to the real root, the /boot partition fails to mount and the boot process then hangs forever after failing to start the kdump service. We managed to boot by using the multipath device instead of the UUID in /etc/fstab for the /boot and /boot/efi filesystems. If we interrupt the boot process with 'rd.break=pre-pivot', we correctly see the multipath device with 4 healthy paths. Running blkid shows the UUID of the /boot filesystem duplicated for every path.

Version-Release number of selected component (if applicable):

RHVH 4.4.7 (RHVH-4.4-20210715.1-RHVH-x86_64-dvd1.iso)

How reproducible:

It has to be a race condition, because about 10% of the boots using the UUID succeed. The customer has to install dozens of new servers and it happens on all of them.

Steps to Reproduce:
1. Install RHVH 4.4.7 on an FC disk. In Anaconda, select the multipath device.
2. Reboot

Actual results:
~~~
[ TIME ] Timed out waiting for device dev-mapper-rhvh\x2dswap.device.
[DEPEND] Dependency failed for Resume from hibernation using device /dev/mapper/rhvh-swap.
[FAILED] Failed to mount /boot.
         See 'systemctl status boot.mount' for details.
[DEPEND] Dependency failed for /boot/efi.
[DEPEND] Dependency failed for Local File Systems.
[FAILED] Failed to start Crash recovery kernel arming.
~~~

Expected results:

A system booting without problems.

Additional info:
- We tried adding the options 'x-systemd.device-timeout=0,x-systemd.mount-timeout=0' to /etc/fstab, but the mount process hung forever.
- Storage array is NetApp in C-Mode
- HBA using driver qla2xxx: Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)
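For reference, the /etc/fstab change that made the system boot looks roughly like this (a sketch only; the UUIDs and the multipath WWID below are placeholders, not values from the affected machine):

~~~
# Entries written by Anaconda (fail to mount intermittently):
# UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee /boot     ext4 defaults 1 2
# UUID=AAAA-BBBB                            /boot/efi vfat umask=0077,shortname=winnt 0 2

# Workaround: reference the multipath device node instead of the UUID:
/dev/mapper/3600a098038304437415d4b6a59684a68p2 /boot     ext4 defaults 1 2
/dev/mapper/3600a098038304437415d4b6a59684a68p1 /boot/efi vfat umask=0077,shortname=winnt 0 2
~~~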
Created attachment 1814689: the file doc
I can't reproduce this issue on an FC machine.

Test versions:
RHVH-4.4-20210715.1-RHVH-x86_64-dvd1.iso
RHVH-4.4-20210812.0-RHVH-x86_64-dvd1.iso

Test steps:
1. Install RHVH 4.4.7 on an FC disk, selecting the multipath device.
2. Finish the installation and reboot.
3. Register to the engine.
4. Attach FC storage.

Test result:
All of the above steps succeed.
I tried these two ISOs, but they don't work. When the installation finishes and the machine reboots, it drops into dracut mode and can't see any storage or any of the filesystems (/, /boot, /var, etc.). This is a bug in this version: the FC storage is not visible after RHV is installed.
We're trying to reproduce this issue on an additional environment and will provide an update as soon as we have more information.
(In reply to Mohamed Hegazy from comment #4)
> I tried these two ISOs, but they don't work. When the installation finishes
> and the machine reboots, it drops into dracut mode and can't see any storage
> or any of the filesystems (/, /boot, /var, etc.).
>
> This is a bug in this version: the FC storage is not visible after RHV is
> installed.

Have you tried to install plain RHEL, and if so, do you have the same issue with RHEL installations as well?
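If you can reach the dracut emergency shell, the output of something along these lines would help confirm whether the LUNs are visible at all (a minimal sketch; device and driver names will differ per setup):

~~~
lsblk                               # are any disks visible at all?
cat /proc/partitions                # raw partition view from the kernel
multipath -ll                       # have the multipath maps been assembled?
dmesg | grep -i -e qla2xxx -e fcoe  # HBA/FCoE driver messages
~~~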
Yes, I installed RHEL and the installation was successful.
And it is not FCoE by any chance, is it?
Actually, I am using FCoE. With RHEL the installation is successful, but when installing RHV 4.4.7 over FCoE, the installation succeeds and then the machine can't boot after the post-installation reboot.
I wonder if just installing vdsm-hook-fcoe would make a difference. What exactly is needed for the host to see the storage? The fcoe kernel module, or anything else? Perhaps just the fcoe-utils rpm? (That's what was removed from the default RHVH install when vdsm-hook-fcoe was recently made optional.)
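A quick way to check on an affected host would be something like this (a sketch; the package and module names are the ones mentioned above):

~~~
rpm -q fcoe-utils vdsm-hook-fcoe   # are the packages installed at all?
lsmod | grep -i fcoe               # is the fcoe kernel module loaded?
# If missing, installing them for a test:
dnf install fcoe-utils vdsm-hook-fcoe
~~~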
The FCoE issue seems to be related to bug 1575930; for details see https://bugzilla.redhat.com/show_bug.cgi?id=1575930#c44
RHEL 8 does not support software-based FCoE; only hardware-assisted FCoE is supported, see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_storage_devices/configuring-fibre-channel-over-ethernet_managing-storage-devices

So as long as we are using one of these drivers, we should be fine:
* qedf
* bnx2fc
* fnic

Since multipath does see the devices, I don't think it has anything to do with the fcoe-utils package. Also, from comment #0 it looks like proper Fibre Channel is used:

> - HBA using driver qla2xxx: Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)

This is a plain FC HBA and not a hybrid network/FCoE device.

So can you please clarify which device is used for boot? Also, the output of the following commands would be helpful:
- lspci
- multipath -vvv
- lsmod

Cheers,
Martin
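For example, something like this would be enough (the grep filters are only a suggestion to keep the output small; full output is fine too):

~~~
lspci | grep -i -e fibre -e ethernet            # which HBAs/NICs are present?
multipath -vvv                                  # verbose multipath view
lsmod | grep -E 'qla2xxx|qedf|bnx2fc|fnic|fcoe' # which FC/FCoE drivers are loaded?
~~~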
As an update regarding the additional environment: we weren't able to reproduce the issue there either. Could we get the output from the above commands?
For reference, this is the RHV KCS about software FCoE not being supported starting with RHV 4.4: https://access.redhat.com/solutions/5269201