Bug 1994438 - RHVH 4.4.7 fails to boot from SAN when using UUID for /boot partition
Summary: RHVH 4.4.7 fails to boot from SAN when using UUID for /boot partition
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: redhat-virtualization-host
Version: 4.4.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.4.9
Target Release: ---
Assignee: Sanja Bonic
QA Contact: cshao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-17 09:27 UTC by Mohamed Hegazy
Modified: 2022-12-26 03:15 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-26 12:43:32 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:
sanja: needinfo-
sanja: needinfo-


Attachments
the file doc (133.94 KB, text/plain)
2021-08-17 09:59 UTC, Mohamed Hegazy
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43043 0 None None None 2021-08-17 09:29:05 UTC

Description Mohamed Hegazy 2021-08-17 09:27:20 UTC
Description of problem:
A fresh installation of RHVH 4.4.7 on an external FC disk fails to boot on the first reboot after the Anaconda installation completes.

After switching to the real root, the /boot partition fails to mount, and the boot process then hangs indefinitely after the kdump service fails to start.

We got the host to boot by using the multipath device instead of the UUID in /etc/fstab for the /boot and /boot/efi filesystems.

If we interrupt the boot process with 'rd.break=pre-pivot', we correctly see the multipath device with 4 healthy paths. Running blkid at that point shows the UUID of the /boot filesystem repeated for every path.
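
The repeated UUIDs can be listed with a short script. This is a minimal sketch and not part of the original report: it assumes the usual `blkid` line format (`/dev/xyz: UUID="..." TYPE="..."`), and the function name `duplicate_uuids` is hypothetical. Every path to the same LUN exposes the same filesystem signature, so one UUID appearing on several devices is expected on a multipath setup; the sketch only makes it visible.

```python
import re
from collections import defaultdict

# Matches the device name and filesystem UUID in one line of blkid output.
# The \b keeps PARTUUID="..." from being mistaken for UUID="...".
_BLKID_LINE = re.compile(r'^(?P<dev>\S+):.*?\bUUID="(?P<uuid>[^"]+)"')


def duplicate_uuids(blkid_output):
    """Return {uuid: [devices]} for every UUID reported by more than one device."""
    by_uuid = defaultdict(list)
    for line in blkid_output.splitlines():
        match = _BLKID_LINE.match(line.strip())
        if match:
            by_uuid[match.group("uuid")].append(match.group("dev"))
    return {uuid: devs for uuid, devs in by_uuid.items() if len(devs) > 1}
```

Feeding it the text captured from `blkid` in the rd.break shell would show which device nodes compete for the same `UUID=` entry in /etc/fstab.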


Version-Release number of selected component (if applicable):
RHVH 4.4.7 (RHVH-4.4-20210715.1-RHVH-x86_64-dvd1.iso)

How reproducible:
This looks like a race condition, because about 10% of the boots that use the UUID succeed. The customer has to install dozens of new servers, and it happens on all of them.

Steps to Reproduce:
1. Install RHVH 4.4.7 on an FC disk. In Anaconda, select the multipath device.
2. Reboot.

Actual results:

~~~
[ TIME ] Timed out waiting for device dev-mapper-rhvh\x2dswap.device.
[DEPEND] Dependency failed for Resume from hibernation using device /dev/mapper/rhvh-swap.

[FAILED] Failed to mount /boot.
See 'systemctl status boot.mount' for details.
[DEPEND] Dependency failed for /boot/efi.
[DEPEND] Dependency failed for Local File Systems.

[FAILED] Failed to start Crash recovery kernel arming.
~~~
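
When comparing console captures from many hosts, the failing units can be pulled out mechanically. The following is a rough sketch under the assumption that the console uses the bracketed systemd status prefixes seen above (`[FAILED]`, `[DEPEND]`, `[ TIME ]`); the function name `boot_problems` is hypothetical.

```python
import re

# A systemd console status line: a bracketed tag followed by the message.
_STATUS_LINE = re.compile(r'^\[\s*(?P<status>[A-Z]+)\s*\]\s+(?P<message>.+)$')

# Tags that indicate a problem rather than a successful start.
_PROBLEM_TAGS = {"FAILED", "DEPEND", "TIME"}


def boot_problems(console_text):
    """Return (status, message) tuples for failed, timed-out, or skipped units."""
    problems = []
    for line in console_text.splitlines():
        match = _STATUS_LINE.match(line.strip())
        if match and match.group("status") in _PROBLEM_TAGS:
            problems.append((match.group("status"), match.group("message")))
    return problems
```

Applied to the output above, the first `FAILED` entry is the /boot mount, and the `DEPEND` entries that follow it are consequences of that failure.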

Expected results:
A system booting without problems.


Additional info:
- We tried adding the options 'x-systemd.device-timeout=0,x-systemd.mount-timeout=0' to /etc/fstab, but the mount process then hung indefinitely.
- Storage array is NetApp in C-Mode
- HBA using driver qla2xxx: Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)
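
The workaround described above (referencing the multipath device instead of the UUID for /boot and /boot/efi) amounts to a small rewrite of /etc/fstab. Below is a minimal sketch of that rewrite, assuming the standard whitespace-separated fstab layout. The function name `use_device_paths` and the device names in the usage note are hypothetical, since the report does not give the actual multipath device names.

```python
def use_device_paths(fstab_text, device_for_mount):
    """Replace UUID= sources with explicit device paths for selected mount points.

    device_for_mount maps a mount point (e.g. "/boot") to the device path
    that should be used instead of the UUID. Other lines are left untouched.
    """
    rewritten = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 2 and not line.lstrip().startswith("#")
        if is_entry and fields[0].startswith("UUID=") and fields[1] in device_for_mount:
            fields[0] = device_for_mount[fields[1]]
            # Note: rewritten lines have their whitespace normalized to single spaces.
            line = " ".join(fields)
        rewritten.append(line)
    return "\n".join(rewritten) + "\n"
```

For example, `use_device_paths(text, {"/boot": "/dev/mapper/mpatha1", "/boot/efi": "/dev/mapper/mpatha2"})` would return the new file contents as a string; writing them back to /etc/fstab is left to the caller.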

Comment 1 RHEL Program Management 2021-08-17 09:48:50 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Mohamed Hegazy 2021-08-17 09:59:38 UTC
Created attachment 1814689 [details]
the file doc

Comment 3 cshao 2021-08-17 11:39:55 UTC
I can't reproduce this issue on an FC machine.

Test versions:
RHVH-4.4-20210715.1-RHVH-x86_64-dvd1.iso
RHVH-4.4-20210812.0-RHVH-x86_64-dvd1.iso

Test steps:
1. Install RHVH 4.4.7 on an FC disk, selecting the multipath device.
2. Finish the installation and reboot.
3. Register to engine.
4. Attach FC storage.

Test result:
All of the above steps succeed.

Comment 4 Mohamed Hegazy 2021-08-17 11:48:54 UTC
I tried both of these ISOs, but neither works.

After the installation finishes and the machine reboots, it drops into the dracut shell and does not see any storage or any of the filesystems (/, /boot, /var, etc.).

This is a bug in this version: it does not see the FC storage once RHV is installed.

Comment 5 Sanja Bonic 2021-08-18 14:28:21 UTC
We're trying to reproduce this issue on an additional environment and will provide an update as soon as we have more information.

Comment 6 Lev Veyde 2021-08-19 11:36:51 UTC
(In reply to Mohamed Hegazy from comment #4)
> I tried both of these ISOs, but neither works.
> 
> After the installation finishes and the machine reboots, it drops into the
> dracut shell and does not see any storage or any of the filesystems
> (/, /boot, /var, etc.).
> 
> This is a bug in this version: it does not see the FC storage once RHV is installed.

Have you tried to install plain RHEL, and if so, do you have the same issue with RHEL installations as well?

Comment 7 Mohamed Hegazy 2021-08-19 12:44:52 UTC
Yes, I installed RHEL and the installation was successful.

Comment 8 Michal Skrivanek 2021-08-19 12:52:25 UTC
and it is not FCoE by any chance, is it?

Comment 9 Mohamed Hegazy 2021-08-19 12:59:53 UTC
Actually, I am using FCoE. With RHEL, the installation is successful. With RHV 4.4.7 over FCoE, the installation also succeeds, but the machine cannot boot when I reboot afterwards.

Comment 10 Michal Skrivanek 2021-08-19 13:17:36 UTC
I wonder if just installing vdsm-hook-fcoe would make a difference. What is needed for the host to see the storage: the fcoe kernel module, or anything else? Perhaps just the fcoe-utils RPM? (That is what was removed from the default RHVH install when vdsm-hook-fcoe was recently made optional.)

Comment 11 cshao 2021-08-19 14:20:43 UTC
The FCoE issue seems to be related to bug 1575930; for details, see https://bugzilla.redhat.com/show_bug.cgi?id=1575930#c44

Comment 12 Martin Tessun 2021-08-19 15:52:42 UTC
Well, RHEL 8 does not support software-based FCoE. Only hardware-assisted FCoE is supported; see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_storage_devices/configuring-fibre-channel-over-ethernet_managing-storage-devices

So as long as we are using one of these drivers, we should be fine:
* qedf
* bnx2fc
* fnic 

As multipath does see the devices, I don't think it has anything to do with the fcoe-utils package then.


Also, from comment #0 it looks like proper Fibre Channel is used:

> - HBA using driver qla2xxx: Fibre Channel [0c04]: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter [1077:2261] (rev 01)

This is a plain FC HBA and not a hybrid network/FCoE device. So can you please clarify which device is used for boot? The output of the following commands would also be helpful:

- lspci
- multipath -vvv
- lsmod

Cheers,
Martin
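
The driver question raised in this comment can be checked against captured `lsmod` output. What follows is a small sketch rather than a supported tool: it assumes the standard `lsmod` column layout (module name in the first column, after a `Module` header line). The driver names come from this thread; the function names are hypothetical.

```python
# Hardware-assisted FCoE drivers listed in this comment, plus the plain FC
# HBA driver (qla2xxx) reported in the bug description.
HARDWARE_FCOE_DRIVERS = ("qedf", "bnx2fc", "fnic")
FC_HBA_DRIVERS = ("qla2xxx",)


def loaded_modules(lsmod_output):
    """Return the set of module names found in lsmod output."""
    modules = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if fields and fields[0] != "Module":
            modules.add(fields[0])
    return modules


def storage_drivers_loaded(lsmod_output):
    """Map each driver of interest to True if it appears as a loaded module."""
    loaded = loaded_modules(lsmod_output)
    return {name: name in loaded for name in HARDWARE_FCOE_DRIVERS + FC_HBA_DRIVERS}
```

A result where only `qla2xxx` is `True` would be consistent with a plain FC HBA; any of the other three being `True` would point to hardware-assisted FCoE.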

Comment 15 Sanja Bonic 2021-08-20 14:18:23 UTC
As an update regarding the additional environment: we weren't able to reproduce the issue there either. Could we get the output from the above commands?

Comment 16 Marina Kalinin 2021-08-20 15:57:21 UTC
For reference, this is the RHV KCS article stating that software FCoE is not supported starting with RHV 4.4:
https://access.redhat.com/solutions/5269201

