Bug 1575930 - Installation over FCOE results in a system which cannot be booted [NEEDINFO]
Summary: Installation over FCOE results in a system which cannot be booted
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: dracut
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: alpha
Target Release: 8.1
Assignee: Lukáš Nykrýn
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks: 1573776 1861898 1767643
Reported: 2018-05-08 10:29 UTC by Ryan Barry
Modified: 2020-07-29 20:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
lnykryn: needinfo? (cleech)


Attachments
all log (250.97 KB, application/x-gzip)
2018-05-10 03:15 UTC, cshao
/etc/fcoe/* after reboot (14.67 KB, image/png)
2018-05-10 03:27 UTC, cshao

Description Ryan Barry 2018-05-08 10:29:30 UTC
Description of problem:
After FCOE installations, the resulting initrd does not set AUTO_VLAN for the associated NIC. 

How reproducible:
100%

Steps to Reproduce:
1. Install over FCOE with a target which is visible on a VLAN
2. Reboot
3.

Actual results:
No LUNs are visible from dracut, and the system is not bootable

Expected results:
Everything works.

Additional info:
I don't really have much experience with FCOE, so I might be way off here - But 2 things I noticed on that machine that might be worth something:

1. AUTO_VLAN in the initrd (/etc/fcoe/cfg-p5p1) is set to "no" and `fcoeadm -i` shows p5p1 as Offline.

2. Running `fipvlan -dcs p5p1` in initrd will start fcoe, and now fcoeadm will show p5p1 as Online and display a bunch of luns there as well.  After this, running lvm_scan will show all the missing lvs under /dev/rhvh_dell-per730-35/
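
For anyone checking the same thing on another machine, the AUTO_VLAN check from point 1 can be sketched in Python. This is only a sketch under assumptions: it assumes the cfg files use shell-style KEY="value" lines, the sample contents are hypothetical apart from AUTO_VLAN="no", and the helper names parse_fcoe_cfg and auto_vlan_enabled are made up for illustration.

```python
import re

# Assumption: /etc/fcoe/cfg-* files contain shell-style KEY="value" lines.
_ASSIGN = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)=(.*)$')


def parse_fcoe_cfg(text):
    """Return a dict of KEY -> value for every assignment line in text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = _ASSIGN.match(line)
        if match is None:
            continue
        key, value = match.group(1), match.group(2).strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key] = value
    return settings


def auto_vlan_enabled(text):
    """True only if AUTO_VLAN is explicitly set to yes (case-insensitive)."""
    return parse_fcoe_cfg(text).get("AUTO_VLAN", "").lower() == "yes"


# Hypothetical contents of /etc/fcoe/cfg-p5p1 as seen in the initrd;
# only the AUTO_VLAN="no" line is taken from this report.
sample = '''
# hypothetical example
FCOE_ENABLE="yes"
DCB_REQUIRED="no"
AUTO_VLAN="no"
'''

print(auto_vlan_enabled(sample))
```

Inline comments after a value are not handled here, so treat the result as a quick diagnostic rather than a full parser.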

Comment 2 Radek Vykydal 2018-05-09 08:25:27 UTC
Please attach
- installer logs (/var/log/anaconda/* in the installed system)
- the kickstart file, if one was used
- the content of /etc/fcoe/cfg-* files from initrd and installed system root.

Comment 3 Ryan Barry 2018-05-09 12:03:18 UTC
Chen, can you provide these? I don't have an FCOE environment to test with

Comment 4 cshao 2018-05-10 03:15:51 UTC
Created attachment 1434166 [details]
all log

Comment 5 cshao 2018-05-10 03:17:17 UTC
(In reply to Ryan Barry from comment #3)
> Chen, can you provide these? I don't have an FCOE environment to test with

Sure, already uploaded all log info.

Comment 6 cshao 2018-05-10 03:27:02 UTC
Created attachment 1434167 [details]
/etc/fcoe/* after reboot

Comment 7 Radek Vykydal 2018-05-10 07:58:01 UTC
(In reply to cshao from comment #6)
> Created attachment 1434167 [details]
> /etc/fcoe/* after reboot

I think the value (cfg file) is set by dracut, probably here:
https://github.com/dracutdevs/dracut/blob/RHEL-7/modules.d/95fcoe/fcoe-up.sh#L94

Anaconda is not passing vlan value to dracut as there is no option to do that, and it seems the value is inferred from other values and drivers used.
Related patch:
https://github.com/dracutdevs/dracut/commit/d02f522089863af2a802cef9e63965349bfcc819

Comment 8 Radek Vykydal 2018-05-10 08:00:07 UTC
Asking Chris for ideas.

Comment 9 Lukáš Nykrýn 2018-05-10 09:47:23 UTC
After you install the machine, can you add rd.debug to the kernel cmdline and, after the boot times out, get the rdsosreport.txt and attach it here?

Comment 10 cshao 2018-05-10 10:52:01 UTC
(In reply to Lukáš Nykrýn from comment #9)
> After you install the machine, can you add rd.debug to the kernel cmdline and,
> after the boot times out, get the rdsosreport.txt and attach it here?

I have already sent the test environment to you by mail.

Comment 11 Lukáš Nykrýn 2018-05-11 11:21:14 UTC
I think we need to backport https://github.com/dracutdevs/dracut/commit/2aac3194100b903740bb9057aed71a35ce92a2e3 , but I would like to have Chris' opinion on that.

Comment 12 Chris Leech 2018-05-11 17:26:36 UTC
(In reply to Lukáš Nykrýn from comment #11)
> I think we need to backport
> https://github.com/dracutdevs/dracut/commit/
> 2aac3194100b903740bb9057aed71a35ce92a2e3 , but I would like to have Chris'
> opinion on that.

That seems reasonable, as switches should reply to VLAN discovery with an ID of 0 and fcoemon looks to enable FCoE on the base interface in that case.

I'm trying to think of where we might run into switches that don't do VLAN discovery at all, and the only place I'm worried about is really old Cisco UCS fnic setups.  And I'm not sure there's going to be an issue there.

Comment 13 Sandro Bonazzola 2018-05-17 11:21:35 UTC
Please consider backporting this to 7.5

Comment 14 Lukáš Nykrýn 2018-05-17 13:16:17 UTC
(In reply to Sandro Bonazzola from comment #13)
> Please consider backporting this to 7.5

I am not 100% confident that the patch can't break anything and given the limited testing during z-stream I don't think it should go there. Also, we don't respin the installation images, so the fix might not be that useful in the end

Comment 15 Sandro Bonazzola 2018-10-09 09:12:03 UTC
(In reply to Lukáš Nykrýn from comment #14)
> (In reply to Sandro Bonazzola from comment #13)
> > Please consider backporting this to 7.5
> 
> I am not 100% confident that the patch can't break anything and given the
> limited testing during z-stream I don't think it should go there. Also, we
> don't respin the installation images, so the fix might not be that useful in
> the end

Any update on this for 7.6?

Comment 16 Sandro Bonazzola 2018-10-10 15:19:50 UTC
Samantha?

Comment 17 Sandro Bonazzola 2018-11-28 10:33:21 UTC
Missed 7.6, retrying with 7.6.z

Comment 18 Sandro Bonazzola 2019-06-14 07:03:31 UTC
Missed 7.7, retrying with 8.1 for RHV 4.4

Comment 19 Sandro Bonazzola 2019-09-10 08:13:01 UTC
Any update?

Comment 21 Lukáš Nykrýn 2019-11-26 14:52:20 UTC
Looks like we should be able to backport this for 8.2.

Comment 22 Lukáš Nykrýn 2019-11-29 12:32:39 UTC
Hmm, it looks like we already have this patch in RHEL. So I have no idea what to do here.

Comment 23 Lukáš Nykrýn 2019-11-29 12:56:25 UTC
Chris, any ideas?

Comment 26 Gianni Salinetti 2020-03-12 16:49:25 UTC
Hi, any updates on this issue? We are following a similar case and any hint would be really helpful.

Comment 27 Sandro Bonazzola 2020-03-18 11:15:53 UTC
Chen, is this still reproducible with RHV-4.4 based on RHEL 8.2?

Comment 28 cshao 2020-03-19 03:14:38 UTC
(In reply to Sandro Bonazzola from comment #27)
> Chen, is this still reproducible with RHV-4.4 based on RHEL 8.2?

Working on this now, will update later.

Comment 29 cshao 2020-03-19 13:32:24 UTC
(In reply to Sandro Bonazzola from comment #27)
> Chen, is this still reproducible with RHV-4.4 based on RHEL 8.2?

Test version:
redhat-virtualization-host-4.4.0-20200318.0.el8_2
fcoe-utils-1.0.32-7.el8.x86_64
imgbased-1.2.8-1.el8ev.noarch

RHVH can't detect FCOE storage at all.
1. Install RHVH-UNSIGNED-ISO-4.4-RHEL-8-20200318.0-RHVH-x86_64-dvd1.iso via anaconda GUI on FCoE storage machine.
2. Specialized & Network disks 
  -> Add a disk 
     -> Add FCoE SAN 
        -> NIC(p5p1/p5p2) 
           -> choose "use auto Vlan"

Test result:
RHVH can't detect FCOE storage at all.

Comment 30 Gianni Salinetti 2020-03-21 18:41:31 UTC
Thank you for testing it, Chen.

In previous comments, it was clear that the above-mentioned patches were already applied in RHEL.

Could it be useful to pass the fcoe boot argument to dracut to be totally sure that the desired interface has been used?

The boot argument is:
fcoe=<edd|interface|MAC>:{dcb|nodcb}:{fabric|vn2vn}

MAC addresses must be lowercase.
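
To make the format concrete, here is a rough Python sketch that checks a filled-in fcoe= argument against the three-field template above before it goes on the kernel command line. It is not dracut's own parser; the function name parse_fcoe_arg is hypothetical, and the nodcb/fabric values in the examples are placeholders, not a recommendation for this hardware.

```python
import re

_MAC_LOWER = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")
_MAC_ANY_CASE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def parse_fcoe_arg(arg):
    """Split 'fcoe=<target>:<dcb>:<mode>' into (target, dcb, mode).

    Raises ValueError if the argument does not fit the template
    fcoe=<edd|interface|MAC>:{dcb|nodcb}:{fabric|vn2vn}.
    """
    prefix = "fcoe="
    if not arg.startswith(prefix):
        raise ValueError("argument must start with 'fcoe='")
    value = arg[len(prefix):]
    # A MAC address contains colons itself, so split from the right.
    parts = value.rsplit(":", 2)
    if len(parts) != 3:
        raise ValueError("expected three colon-separated fields")
    target, dcb, mode = parts
    if dcb not in ("dcb", "nodcb"):
        raise ValueError("second field must be 'dcb' or 'nodcb'")
    if mode not in ("fabric", "vn2vn"):
        raise ValueError("third field must be 'fabric' or 'vn2vn'")
    if not target:
        raise ValueError("first field must be 'edd', an interface or a MAC")
    # The angle brackets, braces and pipes are template notation only.
    if any(ch in target for ch in "<>{}|"):
        raise ValueError("template placeholders must be replaced by one value")
    # Reject MAC addresses that contain uppercase hex digits.
    if _MAC_ANY_CASE.match(target) and not _MAC_LOWER.match(target):
        raise ValueError("MAC addresses must be lowercase")
    return target, dcb, mode


# Example values only: an interface name and a lowercase MAC address.
for example in ("fcoe=p5p1:nodcb:fabric",
                "fcoe=a0:36:9f:ae:9f:50:nodcb:fabric"):
    print(parse_fcoe_arg(example))
```

The point of the placeholder check is that exactly one alternative is chosen per field; the template text itself is not a valid argument.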

Comment 31 Sandro Bonazzola 2020-03-26 07:20:33 UTC
Not blocking RHV 4.4 on this bug but still important for RHV.

Comment 32 cshao 2020-05-21 08:52:06 UTC
Any update?

Comment 33 Michal Skrivanek 2020-06-24 11:10:08 UTC
(In reply to cshao from comment #32)
> Any update?

can you please try again with suggestion from comment #30?

Comment 34 cshao 2020-06-25 13:47:20 UTC
(In reply to Gianni Salinetti from comment #30)
> Thank you for testing it, Chen.
> 
> In previous comments, it was clear that the above-mentioned patches were
> already applied in RHEL.
> 
> Could it be useful to pass the fcoe boot argument to dracut to be totally
> sure that the desired interface has been used?
> 
> The boot argument is:
> fcoe=<edd|interface|MAC>:{dcb|nodcb}:{fabric|vn2vn}
> 
> MAC addresses must be lowercase.



Test version:
RHVH-4.4-20200618.0-RHVH-x86_64-dvd1.iso


RHVH can't detect FCOE storage at all.
1. Pass the following fcoe boot argument to dracut:
   fcoe=<edd|enp7sofo|a0:36:9f:ae:9f:50>:{dcb|nodcb}:{fabric|vn2vn}
2. Install RHVH-4.4-20200618.0-RHVH-x86_64-dvd1.iso via anaconda GUI on FCoE storage machine.
3. Specialized & Network disks 
  -> Add a disk 
     -> Add FCoE SAN 
        -> NIC(enp7sofo) 
           -> choose "use auto Vlan"

Test result:
RHVH can't detect FCOE storage at all.

