Bug 1573776

Summary: [blocked] Enter dracut mode after reboot RHVH with FCoE storage
Product: Red Hat Enterprise Virtualization Manager
Reporter: cshao <cshao>
Component: redhat-virtualization-host
Assignee: Sandro Bonazzola <sbonazzo>
Status: CLOSED WONTFIX
QA Contact: cshao <cshao>
Severity: high
Docs Contact:
Priority: high
Version: 4.2.3
CC: cshao, dfediuck, huzhao, jkonecny, mgoldboi, michal.skrivanek, qiyuan, sbueno, weiwang, yaniwang, ycui
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-24 08:57:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1575930
Bug Blocks:
Attachments:
Description                         Flags
rdsosreport.png & journalctl.png    none
/tmp/* grub.cfg                     none

Description cshao 2018-05-02 09:01:06 UTC
Created attachment 1429801 [details]
rdsosreport.png & journalctl.png

Description of problem:
RHVH drops into the dracut emergency shell after reboot when it is installed on FCoE storage.

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.2-20180430.0 
fcoe-utils-1.0.32-1.el7.x86_64
imgbased-1.0.14-0.1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install RHVH on FCoE storage.
2. Specialized & Network disks 
  -> Add a disk 
     -> Add FCoE SAN 
        -> NIC(p5p1/p5p2) 
           -> choose "use auto Vlan"
3. Choose the FCoE boot LUN as the install target for RHVH.
4. Choose automatic partitioning in the Anaconda GUI and finish the other mandatory steps.
5. Reboot RHVH.

Actual results:
RHVH enters emergency mode.

Expected results:
RHVH boots normally and login succeeds without errors.

Additional info:

Comment 1 Ryan Barry 2018-05-02 10:23:35 UTC
Chen, does this happen if it's not a vlan?

This looks like it may be another 7.5 vlan bug

Comment 2 cshao 2018-05-02 10:30:09 UTC
(In reply to Ryan Barry from comment #1)
> Chen, does this happen if it's not a vlan?
> 
> This looks like it may be another 7.5 vlan bug

Ryan,

We have two FCoE machines and both use a VLAN. Is there another way to debug this? If not, I will file a ticket asking the admin to remove the VLAN connection.

Thanks.

Comment 3 Ryan Barry 2018-05-03 02:34:49 UTC
Samantha - any known problems here?

Comment 4 Jiri Konecny 2018-05-07 10:30:59 UTC
Hello,

We have a few FCoE bugs reported so it could be related. However, I'm not able to tell you much from those pictures.

cshao, could you please provide the installation logs as plain text files from /tmp/*.log at the end of the installation? It would also be great if you could provide the grub command line used for the failed system boot.

Thank you.

Comment 5 cshao 2018-05-07 11:11:19 UTC
Created attachment 1432575 [details]
/tmp/*   grub.cfg

Comment 6 Yuval Turgeman 2018-05-07 20:51:57 UTC
I don't have much experience with FCoE, so I might be way off here, but I noticed two things on that machine that might be worth something:

1. AUTO_VLAN in the initrd (/etc/fcoe/cfg-p5p1) is set to "no" and `fcoeadm -i` shows p5p1 as Offline.

2. Running `fipvlan -dcs p5p1` in the initrd starts FCoE; afterwards fcoeadm shows p5p1 as Online and lists a number of LUNs as well. After this, running lvm_scan shows all the missing LVs under /dev/rhvh_dell-per730-35/
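
The first observation can also be checked offline against a copy of the config file pulled from the initrd. A minimal Python sketch, assuming the file uses the shell-style KEY="value" lines that fcoe-utils config files normally have. The helper name `parse_fcoe_cfg` and the sample text (including the FCOE_ENABLE line) are illustrative and were not taken from the affected machine:

```python
def parse_fcoe_cfg(text):
    """Parse shell-style KEY="value" lines into a dict (hypothetical helper)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Sample content is an assumption that mirrors the state described above.
sample = (
    "# illustrative copy of /etc/fcoe/cfg-p5p1\n"
    'FCOE_ENABLE="yes"\n'
    'AUTO_VLAN="no"\n'
)
cfg = parse_fcoe_cfg(sample)
# Treat a missing AUTO_VLAN key the same as "no" for this check.
auto_vlan_enabled = cfg.get("AUTO_VLAN", "no").lower() == "yes"
print("AUTO_VLAN enabled:", auto_vlan_enabled)
```

If the installer was told to use auto VLAN but the copy inside the initrd still reports "no", that mismatch would be consistent with the interface staying Offline until fipvlan is run by hand.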

Comment 7 Ryan Barry 2018-05-07 21:40:59 UTC
I think this is very likely.

Or, rather, it's likely that one or both of the following are true:

* fcoe= is not present on the cmdline
* the dracut fcoe hook is missing
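
The first of these two conditions is easy to check mechanically. A minimal Python sketch, assuming the dracut-style `fcoe=<interface>:{dcb|nodcb}` argument form. The function name `find_fcoe_args` and the sample command line are hypothetical and are not the command line from the failing host:

```python
def find_fcoe_args(cmdline):
    """Return the value of every fcoe= argument on a kernel command line."""
    prefix = "fcoe="
    return [tok[len(prefix):] for tok in cmdline.split() if tok.startswith(prefix)]


# Hypothetical command line; on a live system, read /proc/cmdline instead.
sample_cmdline = "ro rd.lvm.lv=vg/root fcoe=p5p1:nodcb rhgb quiet"
fcoe_args = find_fcoe_args(sample_cmdline)
if fcoe_args:
    print("fcoe= present:", fcoe_args)
else:
    print("fcoe= missing from the command line")
```

This only covers the command-line condition. Whether the dracut fcoe hook is present has to be verified by inspecting the initramfs contents, which the sketch does not attempt.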

Comment 8 Yuval Turgeman 2018-05-08 03:59:18 UTC
fcoe= looks ok in cmdline, don't know about the hook, I'll check today

Comment 9 Yuval Turgeman 2018-05-08 07:30:23 UTC
dracut fcoe modules are there also (from dracut-network), I wonder does this work with a regular RHEL ?

Comment 10 cshao 2018-05-08 10:10:49 UTC
(In reply to Yuval Turgeman from comment #9)
> dracut fcoe modules are there also (from dracut-network), I wonder does this
> work with a regular RHEL ?

I can reproduce this issue on a RHEL 7.5 host.
I have already sent the test environment to you by mail.

Comment 11 Ryan Barry 2018-05-08 10:26:34 UTC
Is this actually a regression in RHV-H? Has this ever worked?

If this is reproducible on RHEL, I'll open a platform bug and we'll block on it

Comment 12 cshao 2018-05-09 02:28:15 UTC
(In reply to Ryan Barry from comment #11)
> Is this actually a regression in RHV-H? 
Not a regression issue.

> Has this ever worked?
No. Installing RHVH on FCoE storage succeeds, but the system then fails to boot (it enters dracut).
 

> If this is reproducible on RHEL, I'll open a platform bug and we'll block on
> it
Agree with you.

Comment 13 cshao 2018-05-29 09:56:55 UTC
Removing the regression keyword according to comment #12.

Comment 14 Ryan Barry 2018-06-05 09:10:32 UTC
Moving out while we wait for Anaconda

Comment 16 Sandro Bonazzola 2020-03-11 08:27:05 UTC
Dependent bug has been dropped from 8.2, not going to be fixed for RHV 4.4

Comment 17 cshao 2020-03-19 13:31:55 UTC
Test version:
redhat-virtualization-host-4.4.0-20200318.0.el8_2
fcoe-utils-1.0.32-7.el8.x86_64
imgbased-1.2.8-1.el8ev.noarch

RHVH can't detect FCOE storage at all.
1. Install RHVH-UNSIGNED-ISO-4.4-RHEL-8-20200318.0-RHVH-x86_64-dvd1.iso via anaconda GUI on FCoE storage machine.
2. Specialized & Network disks 
  -> Add a disk 
     -> Add FCoE SAN 
        -> NIC(p5p1/p5p2) 
           -> choose "use auto Vlan"

Test result:
RHVH can't detect FCOE storage at all.

Comment 18 Sandro Bonazzola 2021-02-04 08:28:47 UTC
Let's give this another try when RHEL 8.4 is available.

Comment 21 Sandro Bonazzola 2021-03-24 08:57:21 UTC
The dependent bug has been closed WONTFIX; closing this one accordingly.