Bug 2171350

Summary: anaconda failed to detect the fcoe target (only affects ixgbe)
Product: [Fedora] Fedora Reporter: lnie <lnie>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 38 CC: acaringi, adscvr, airlied, alciregi, anaconda-maint-list, awilliam, bgalvani, bskeggs, cleech, dcbw, ferferna, gary.buhrmaster, gnome-sig, hdegoede, hpa, jarodwilson, jforbes, jglisse, josef, kernel-maint, kparal, lgoncalv, liangwen12year, linville, lkundrak, masami256, mchehab, mclasen, ptalbert, robatino, rstrode, sandmann, steved, vbubela, vponcova, vslavik, vtrefny, w
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: RejectedBlocker
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                 Flags
syslog                      none
screencast                  none
syslog with inst.selinux=0  none

Description lnie 2023-02-20 05:11:09 UTC
Created attachment 1945170 [details]
syslog

Description of problem:
As shown in the attached screencast, anaconda failed to show the FCoE target after
I selected the CNA and clicked "Add fcoe disk".

Version-Release number of selected component (if applicable):
fcoe-utils-1.0.34-3.gitb233050.fc37.x86_64
anaconda-38.21-1.fc38.x86_64.rpm 

How reproducible:
always

Comment 1 lnie 2023-02-20 05:12:54 UTC
Created attachment 1945171 [details]
screencast

Comment 2 lnie 2023-02-20 07:41:27 UTC
I see the following in the log. I tried adding inst.selinux=0, but that doesn't help:
07:10:27,798 NOTICE audit:AVC avc:  denied  { create } for  pid=2670 comm="fcoemon" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=netlink_scsitransport_socket permissive=1
07:10:27,798 NOTICE kernel:audit: type=1400 audit(1676877027.796:401): avc:  denied  { create } for  pid=2670 comm="fcoemon" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=netlink_scsitransport_socket permissive=1

Comment 3 lnie 2023-02-20 07:46:25 UTC
Created attachment 1945178 [details]
syslog with inst.selinux=0

Comment 4 Fedora Blocker Bugs Application 2023-02-21 09:44:00 UTC
Proposed as a Blocker for 38-final by Fedora user lnie using the blocker tracking app because:

 This affects:
The installer must be able to detect (if possible) and install to supported network-attached storage devices.

Comment 5 Vendula Poncova 2023-02-21 14:58:30 UTC
The installer runs in a permissive mode, so the SELinux warnings are not relevant.
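
One way to double-check this from the log itself is to look at the permissive= field of each AVC record: permissive=1 means the denial was only logged, not enforced. The helper below is a minimal sketch; avc_summary is a hypothetical name, and it assumes the AVC lines are fed on stdin in the form quoted in comment 2.

```sh
# avc_summary (hypothetical helper): read log lines on stdin and count
# AVC denials that were only logged (permissive=1) versus enforced
# (permissive=0).
avc_summary() {
    awk '/avc:  *denied/ {
        if ($0 ~ /permissive=1/) logged++
        else if ($0 ~ /permissive=0/) enforced++
    }
    END { printf "logged-only=%d enforced=%d\n", logged, enforced }'
}

# Example use against a saved syslog attachment (the path is an assumption):
#   avc_summary < /tmp/syslog
```

If the enforced count is zero, the denials cannot be what is blocking fcoemon.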

From syslog:

03:56:17,250 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:anaconda.threading:Running Thread: AnaTaskThread-FCOEDiscoverTask-2 (139754307974848)
03:56:17,250 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:anaconda.modules.common.task.task:Discover a FCoE
03:56:17,250 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:blivet:Activating FCoE SAN attached to ens2f1, dcb: True autovlan: True
03:56:17,251 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... systemctl start lldpad.service
03:56:17,270 INFO systemd:Listening on lldpad.socket - Link Layer Discovery Protocol Agent Socket..
03:56:17,278 INFO systemd:Started lldpad.service - Link Layer Discovery Protocol Agent Daemon..
03:56:17,280 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,280 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... lldptool -p
03:56:17,340 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:stdout:
03:56:17,340 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:2841
03:56:17,340 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,340 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... dcbtool sc ens2f1 dcb on
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:stdout:
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Command:   #011Set Config
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Feature:   #011DCB State
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Port:      #011ens2f1
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Status:    #011Successful
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... dcbtool sc ens2f1 pfc e:1 a:1 w:1
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:stdout:
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Command:   #011Set Config
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Feature:   #011Priority Flow Control
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Port:      #011ens2f1
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Status:    #011Successful
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... dcbtool sc ens2f1 app:fcoe e:1 a:1 w:1
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:stdout:
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Command:   #011Set Config
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Feature:   #011Application FCoE
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Port:      #011ens2f1
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Status:    #011Successful
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,380 DEBUG NetworkManager:<debug> [1676865377.3807] ndisc-lndp[0x55f1e987cce0,"eno0"]: processing libndp events
03:56:18,366 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... systemctl restart fcoe.service
03:56:18,381 INFO fcoemon:fcoemon: error 9 Bad file descriptor
03:56:18,381 INFO fcoemon:fcoemon: Failed write req D len 1
03:56:18,381 INFO systemd:Stopping fcoe.service - Open-FCoE initiator daemon...
03:56:18,382 INFO systemd:fcoe.service: Deactivated successfully.
03:56:18,393 INFO systemd:Stopped fcoe.service - Open-FCoE initiator daemon.
03:56:18,405 INFO systemd:Starting fcoe.service - Open-FCoE initiator daemon...
03:56:18,410 INFO systemd:Started fcoe.service - Open-FCoE initiator daemon.
03:56:18,412 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0

The fcoemon tool seems to fail. Reassigning.
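
To pull just the fcoemon lines out of a long syslog like the one above, something along these lines may help. It is a rough sketch: log_from is a hypothetical name, and it assumes every line follows the "HH:MM:SS,mmm LEVEL source:message" layout seen in this excerpt.

```sh
# log_from (hypothetical helper): print only the log lines on stdin whose
# source field (the part of the third word before the first ':') equals $1.
log_from() {
    awk -v src="$1" '{ split($3, f, ":"); if (f[1] == src) print }'
}

# Example use against a saved syslog attachment (the path is an assumption):
#   log_from fcoemon < /tmp/syslog
```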

Comment 6 Adam Williamson 2023-03-07 23:15:50 UTC
+4 in https://pagure.io/fedora-qa/blocker-review/issue/1041 , marking accepted.

Comment 7 Adam Williamson 2023-03-13 17:55:27 UTC
Chris, can you please take a look at this? It has been sitting here a long time. It is a Fedora 38 final release blocker, which means we need it fixed in the next month or so.

Comment 8 Chris Leech 2023-03-17 19:22:14 UTC
I'm pretty sure this is a network problem on these interfaces and not specific to fcoe.

For comparison, on F37: start anaconda with inst.sshd and ssh in without interacting with anaconda at all.

ens2f0/1 are both connected

# nmcli dev
DEVICE  TYPE      STATE        CONNECTION       
eno0    ethernet  connected    Wired Connection 
ens2f0  ethernet  connected    ens2f0           
ens2f1  ethernet  connected    ens2f1           
eno1    ethernet  unavailable  --               
lo      loopback  unmanaged    -- 

The fipvlan diagnostic command finds the fabric gateways:

# fipvlan ens2f0 ens2f1
Fibre Channel Forwarders Discovered
interface       | VLAN | FCF MAC          
------------------------------------------
ens2f0          | 802  | 00:05:73:b2:7f:00
ens2f1          | 802  | 00:05:73:b2:7f:00


now let's try that with F38-20230317.n.0

NetworkManager seems to have activated the connections

# nmcli dev
DEVICE  TYPE      STATE                   CONNECTION       
eno0    ethernet  connected               Wired Connection 
ens2f0  ethernet  connected               ens2f0           
ens2f1  ethernet  connected               ens2f1           
lo      loopback  connected (externally)  lo               
eno1    ethernet  unavailable             --          

but now the link state shows NO-CARRIER and DORMANT
(we haven't done anything except query network state at this point, on F37 the network links were "state UP mode DEFAULT")
     
# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b8 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
3: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT mode DORMANT group default qlen 1000
    link/ether 00:1b:21:59:12:34 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f0
4: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b9 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT mode DORMANT group default qlen 1000
    link/ether 00:1b:21:59:12:35 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f1
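
For comparing releases, the per-interface state can be condensed from output like the above into one line per interface. This is only a sketch: link_states is a hypothetical name, and it assumes the default multi-line `ip link` format shown here, read from stdin.

```sh
# link_states (hypothetical helper): read `ip link` output on stdin and
# print "<interface> <state>" for each interface header line.
link_states() {
    awk '/^[0-9]+: / {
        name = $2
        sub(/:$/, "", name)      # drop the trailing colon
        sub(/@.*/, "", name)     # drop any "@parent" suffix (e.g. on VLANs)
        state = "?"
        for (i = 3; i < NF; i++)
            if ($i == "state") state = $(i + 1)
        print name, state
    }'
}

# Example use on a live system:
#   ip link | link_states
```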

fipvlan fails, because the interface isn't IFF_RUNNING

# fipvlan -d ens2f0
fipvlan: creating netlink socket
fipvlan: Using libfcoe module parameter interfaces
fipvlan: sending RTM_GETLINK dump request
fipvlan: RTM_NEWLINK: ifindex 1, type 772, flags 10049
fipvlan: RTM_NEWLINK: ifindex 2, type 1, flags 11043
fipvlan: RTM_NEWLINK: ifindex 3, type 1, flags 11003
fipvlan: RTM_NEWLINK: ifindex 4, type 1, flags 1003
fipvlan: RTM_NEWLINK: ifindex 5, type 1, flags 11003
fipvlan: NLMSG_DONE
fipvlan: if 3 not running, starting
fipvlan: sending RTM_SETLINK request
fipvlan: NLMSG_ERROR (0) Success
fipvlan: waiting for IFF_RUNNING [1/20]
fipvlan: return from poll 0
fipvlan: if 3 not running, waiting for link up
...
fipvlan: waiting for IFF_RUNNING [20/20]
fipvlan: return from poll 0
fipvlan: if 3 not running, waiting for link up
fipvlan: return from poll 0
fipvlan: if 2: skipping, FIP not ready
fipvlan: if 3: skipping, FIP not ready
fipvlan: if 4: skipping, FIP not ready
fipvlan: if 5: skipping, FIP not ready
No Fibre Channel Forwarders or VN2VN Responders Found
fipvlan: shutdown if 3
fipvlan: sending RTM_SETLINK request
fipvlan: NLMSG_ERROR (0) Success
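
The flags values in the RTM_NEWLINK lines can be decoded to confirm which bit is missing. The sketch below assumes fipvlan prints the flags in hexadecimal without a 0x prefix (the values only decode sensibly that way, e.g. 10049 for lo includes LOOPBACK); the bit names are the IFF_* constants from <linux/if.h>, and decode_iff_flags is a hypothetical helper name.

```sh
# decode_iff_flags (hypothetical helper): turn a hexadecimal interface
# flags value into a comma-separated list of IFF_* names.
# The names are listed in bit order, starting at bit 0 (IFF_UP = 0x1).
decode_iff_flags() {
    value=$((0x$1))
    bit=1
    out=""
    for name in UP BROADCAST DEBUG LOOPBACK POINTOPOINT NOTRAILERS RUNNING \
                NOARP PROMISC ALLMULTI MASTER SLAVE MULTICAST PORTSEL \
                AUTOMEDIA DYNAMIC LOWER_UP DORMANT ECHO; do
        if [ $((value & bit)) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$((bit * 2))
    done
    echo "$out"
}

decode_iff_flags 11043   # eno0:   prints UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
decode_iff_flags 11003   # ens2f0: prints UP,BROADCAST,MULTICAST,LOWER_UP
```

Under that assumption, ifindex 3 and 5 (ens2f0 and ens2f1) report LOWER_UP but not RUNNING, which matches the "not running, waiting for link up" messages.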

and now let's check NetworkManager again

# nmcli dev
DEVICE  TYPE      STATE                   CONNECTION       
eno0    ethernet  connected               Wired Connection 
ens2f1  ethernet  connected               ens2f1           
lo      loopback  connected (externally)  lo               
eno1    ethernet  unavailable             --               
ens2f0  ethernet  unavailable             --

Comment 9 Chris Leech 2023-03-17 19:37:11 UTC
If I stop NetworkManager from managing these interfaces, and reload the driver, things seem better.

# nmcli dev set ens2f0 managed no
# nmcli dev set ens2f1 managed no
# rmmod ixgbe
# modprobe ixgbe

# fipvlan ens2f0 ens2f1
Fibre Channel Forwarders Discovered
interface       | VLAN | FCF MAC          
------------------------------------------
ens2f0          | 802  | 00:05:73:b2:7f:00
ens2f1          | 802  | 00:05:73:b2:7f:00


But after returning to Anaconda and attempting to add an FCoE SAN, it fails again and the link returns to the DORMANT state?

Comment 10 Adam Williamson 2023-03-18 03:50:39 UTC
Thanks for looking into it. Could it be a kernel issue?

Comment 11 Chris Leech 2023-03-19 00:56:37 UTC
(In reply to Adam Williamson from comment #10)
> Thanks for looking into it. Could it be a kernel issue?

Could be, I'm not familiar enough with the DORMANT state here.  But I could only manage to get the link working by telling NM to stop managing it, and I'm guessing that the Anaconda FCoE connection code might have gone back to requesting NM to activate the connection?

Comment 12 Beniamino Galvani 2023-03-22 14:00:51 UTC
I don't know why the device ends up in "NO-CARRIER state DORMANT", and from my understanding that only depends on the NIC and the kernel driver, not on NetworkManager.

I'm reassigning this bz to kernel.

Comment 13 Justin M. Forbes 2023-03-27 16:51:23 UTC
What kernel version was last tested with this?  Does it reproduce with the rawhide kernel/installer?  Does it reproduce with F37 and a 6.1.x kernel? The latter might be harder to test, but would be helpful in determining where this regression came in.

Comment 14 Gary Buhrmaster 2023-03-30 16:00:14 UTC
There is a comment in the Intel ixgbe driver docs that states the following (which may, or may not, be related to no-carrier).



Unable to obtain DHCP lease on boot with Red Hat
-----------------------------------------------
In configurations where the auto-negotiation process takes more than 5 seconds,
the boot script may fail with the following message:
"<ethX>: failed. No link present. Check cable?"

This error may occur even though the presence of link can be confirmed using
ethtool <ethX>. In this case, try setting "LINKDELAY=30" in
/etc/sysconfig/network-scripts/ifcfg-<ethX>.

The same issue can occur during a network boot (via PXE) on Red Hat
distributions that use the dracut script:
"Warning: No carrier detected on interface <ethX>"

In this case add "rd.net.timeout.carrier=30" at the kernel command line.

NOTE: Link time can vary. Adjust LINKDELAY value accordingly.

Comment 15 lnie 2023-04-03 07:14:12 UTC
Sorry for the late reply, I was on PTO.
The last known good release is... F33. Rawhide is also affected.
rd.net.timeout.carrier doesn't help.

Comment 16 Adam Williamson 2023-04-03 15:25:10 UTC
If last known good is F33, how did you record passes for the FCoE test in F37?

https://openqa.fedoraproject.org/testcase_stats/37/Installation/QA_Testcase_install_to_FCoE_target_Storage_devices.html

was that with different hardware?

Comment 17 lnie 2023-04-03 15:48:54 UTC
Hi Chris,
Please Note:
Here is the output from the f38 system *before* ip link set ens2f1/ens2f0 down (and then up):
[root@storageqe-13 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b8 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
3: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b9 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
4: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:59:12:34 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f0
5: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:59:12:35 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f1
[root@storageqe-13 ~]# fipvlan ens2f0 ens2f1
Fibre Channel Forwarders Discovered
interface       | VLAN | FCF MAC          
------------------------------------------
ens2f0          | 802  | 00:05:73:b2:7f:00
ens2f1          | 802  | 00:05:73:b2:7f:00


Here is the "ip link" output from the f37 system *after* "ip link set ens2f1/ens2f0 down (and then up)" or "fcoeadm -i":
[root@storageqe-13 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b8 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
3: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b9 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
4: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT mode DORMANT group default qlen 1000
    link/ether 00:1b:21:59:12:34 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f0
5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT mode DORMANT group default qlen 1000
    link/ether 00:1b:21:59:12:35 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f1

[root@storageqe-13 ~]# fipvlan -d ens2f0
fipvlan: creating netlink socket
fipvlan: Using /sys/bus/fcoe interfaces
fipvlan: sending RTM_GETLINK dump request
fipvlan: RTM_NEWLINK: ifindex 1, type 772, flags 10049
fipvlan: RTM_NEWLINK: ifindex 2, type 1, flags 11043
fipvlan: RTM_NEWLINK: ifindex 3, type 1, flags 1003
fipvlan: RTM_NEWLINK: ifindex 4, type 1, flags 11003
fipvlan: RTM_NEWLINK: ifindex 5, type 1, flags 11003
fipvlan: NLMSG_DONE
fipvlan: if 4 not running, starting
fipvlan: sending RTM_SETLINK request
fipvlan: NLMSG_ERROR (0) Success
fipvlan: waiting for IFF_RUNNING [1/20]
fipvlan: return from poll 0
...
fipvlan: if 4 not running, waiting for link up
fipvlan: waiting for IFF_RUNNING [20/20]
fipvlan: return from poll 0
fipvlan: if 4 not running, waiting for link up
fipvlan: return from poll 0
fipvlan: if 2: skipping, FIP not ready
fipvlan: if 3: skipping, FIP not ready
fipvlan: if 4: skipping, FIP not ready
fipvlan: if 5: skipping, FIP not ready
No Fibre Channel Forwarders or VN2VN Responders Found
fipvlan: shutdown if 4
fipvlan: sending RTM_SETLINK request
fipvlan: NLMSG_ERROR (0) Success

Comment 18 lnie 2023-04-03 15:52:23 UTC
> was that with different hardware?
Yes, the pass was for the bnx2fc driver; the ixgbe server was, er, pretty busy.

Comment 19 Adam Williamson 2023-04-03 15:55:05 UTC
Can you test F38 with the hardware that you tested F34-F37 on, then? If F38 works there, we probably don't need to treat this as a blocker.

Comment 20 lnie 2023-04-04 06:01:28 UTC
> Can you test F38 with the hardware that you tested F34-F37 on, then? If F38 works there, we probably don't need to treat this as a blocker.

F38 works well on the bnx2fc servers on which I tested f34-f37.

Here is some more information:
I'm not able to create an fcoe instance on f34-f38 installed systems either, and I guess the original (installer environment) bug will be fixed once fcoe-utils works well on an installed system.

On an f33 system, /sys/class/net/<nic>/operstate always reads "up". On f34-f38 installed systems it initially reads "up" as well,
but it changes to "dormant" after you run systemctl start lldpad or ip link set ens2f1/ens2f0 down (and then up).
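
A quick way to watch for that transition is to read operstate before and after each step. The helpers below are a minimal sketch; operstate and link_is_up are hypothetical names, and the optional second argument (an alternate sysfs root) exists only so the functions can be exercised without real hardware.

```sh
# operstate (hypothetical helper): print the kernel's operational state
# ("up", "down", "dormant", ...) for interface $1.
# $2 optionally overrides the sysfs directory; it defaults to /sys/class/net.
operstate() {
    cat "${2:-/sys/class/net}/$1/operstate"
}

# link_is_up (hypothetical helper): succeed only if the state is exactly "up".
link_is_up() {
    [ "$(operstate "$1" "$2")" = "up" ]
}

# Example use on the affected machine:
#   operstate ens2f1
#   systemctl start lldpad
#   operstate ens2f1
```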

It seems that fcoe-utils depends on that directory (?), and that may be the cause of the problem.
I didn't check the iproute code, so I'm not sure about the NO-CARRIER flag, but you will get:
Apr 04 01:31:19 storageqe-13.sqe.lab.eng.bos.redhat.com NetworkManager[780]: <info>  [1680586279.0479] dhcp4 (ens2f1): state changed new lease, address=172.17.x.x
***
Apr 04 01:31:19 storageqe-13.sqe.lab.eng.bos.redhat.com NetworkManager[780]: <info>  [1680586279.0960] device (ens2f1): Activation: successful, device activated.

ping and ssh work on that IP address.

A side problem: I get *unavailable* (even on the bnx2x server) when I run ip link set ens2f1 down, though nmcli dev shows *connected* and ethtool ens2f1 shows "Link detected: yes":
 
NetworkManager[780]: <info>  [1680586244.7495] device (ens2f1): state change:  ***unavailable*** -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')

Please feel free to tell me if you need more information.

Comment 22 Kamil Páral 2023-04-06 08:55:06 UTC
Let's discuss the blocker status again:
https://pagure.io/fedora-qa/blocker-review/issue/1041#comment-850286

Comment 23 Adam Williamson 2023-04-06 19:03:13 UTC
-4 in https://pagure.io/fedora-qa/blocker-review/issue/1041 , marking rejected (with the new info about affected hw).