Created attachment 1945170 [details]
syslog

Description of problem:
As shown in the attached screencast, anaconda fails to show the FCoE target after I select the CNA and click "Add fcoe disk".

Version-Release number of selected component (if applicable):
fcoe-utils-1.0.34-3.gitb233050.fc37.x86_64
anaconda-38.21-1.fc38.x86_64.rpm

How reproducible:
always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1945171 [details]
screencast
I see the following in the log, and I tried adding inst.selinux=0, which doesn't help:

07:10:27,798 NOTICE audit:AVC avc: denied { create } for pid=2670 comm="fcoemon" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=netlink_scsitransport_socket permissive=1
07:10:27,798 NOTICE kernel:audit: type=1400 audit(1676877027.796:401): avc: denied { create } for pid=2670 comm="fcoemon" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=netlink_scsitransport_socket permissive=1
Created attachment 1945178 [details]
syslog with inst.selinux=0
Proposed as a Blocker for 38-final by Fedora user lnie using the blocker tracking app because:

This affects: "The installer must be able to detect (if possible) and install to supported network-attached storage devices."
The installer runs in a permissive mode, so the SELinux warnings are not relevant.

From syslog:

03:56:17,250 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:anaconda.threading:Running Thread: AnaTaskThread-FCOEDiscoverTask-2 (139754307974848)
03:56:17,250 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:anaconda.modules.common.task.task:Discover a FCoE
03:56:17,250 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:blivet:Activating FCoE SAN attached to ens2f1, dcb: True autovlan: True
03:56:17,251 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... systemctl start lldpad.service
03:56:17,270 INFO systemd:Listening on lldpad.socket - Link Layer Discovery Protocol Agent Socket..
03:56:17,278 INFO systemd:Started lldpad.service - Link Layer Discovery Protocol Agent Daemon..
03:56:17,280 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,280 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... lldptool -p
03:56:17,340 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:stdout:
03:56:17,340 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:2841
03:56:17,340 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,340 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... dcbtool sc ens2f1 dcb on
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:stdout:
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Command: #011Set Config
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Feature: #011DCB State
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Port: #011ens2f1
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Status: #011Successful
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,352 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... dcbtool sc ens2f1 pfc e:1 a:1 w:1
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:stdout:
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Command: #011Set Config
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Feature: #011Priority Flow Control
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Port: #011ens2f1
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Status: #011Successful
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,358 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... dcbtool sc ens2f1 app:fcoe e:1 a:1 w:1
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:stdout:
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Command: #011Set Config
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Feature: #011Application FCoE
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Port: #011ens2f1
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Status: #011Successful
03:56:17,364 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0
03:56:17,380 DEBUG NetworkManager:<debug> [1676865377.3807] ndisc-lndp[0x55f1e987cce0,"eno0"]: processing libndp events
03:56:18,366 WARNING org.fedoraproject.Anaconda.Modules.Storage:INFO:program:Running... systemctl restart fcoe.service
03:56:18,381 INFO fcoemon:fcoemon: error 9 Bad file descriptor
03:56:18,381 INFO fcoemon:fcoemon: Failed write req D len 1
03:56:18,381 INFO systemd:Stopping fcoe.service - Open-FCoE initiator daemon...
03:56:18,382 INFO systemd:fcoe.service: Deactivated successfully.
03:56:18,393 INFO systemd:Stopped fcoe.service - Open-FCoE initiator daemon.
03:56:18,405 INFO systemd:Starting fcoe.service - Open-FCoE initiator daemon...
03:56:18,410 INFO systemd:Started fcoe.service - Open-FCoE initiator daemon.
03:56:18,412 WARNING org.fedoraproject.Anaconda.Modules.Storage:DEBUG:program:Return code: 0

The fcoemon tool seems to fail. Reassigning.
+4 in https://pagure.io/fedora-qa/blocker-review/issue/1041, marking accepted.
Chris, can you please take a look at this? It has been sitting here a long time. It is a Fedora 38 final release blocker, which means we need it fixed in the next month or so.
I'm pretty sure this is a network problem on these interfaces and not specific to FCoE.

F37 for comparison: start anaconda with inst.sshd and ssh in without interacting with anaconda at all. ens2f0/1 are both connected:

# nmcli dev
DEVICE  TYPE      STATE        CONNECTION
eno0    ethernet  connected    Wired Connection
ens2f0  ethernet  connected    ens2f0
ens2f1  ethernet  connected    ens2f1
eno1    ethernet  unavailable  --
lo      loopback  unmanaged    --

The fipvlan diagnostic command finds the fabric gateways:

# fipvlan ens2f0 ens2f1
Fibre Channel Forwarders Discovered
interface | VLAN | FCF MAC
------------------------------------------
ens2f0    | 802  | 00:05:73:b2:7f:00
ens2f1    | 802  | 00:05:73:b2:7f:00

Now let's try that with F38-20230317.n.0. NetworkManager seems to have activated the connections:

# nmcli dev
DEVICE  TYPE      STATE                   CONNECTION
eno0    ethernet  connected               Wired Connection
ens2f0  ethernet  connected               ens2f0
ens2f1  ethernet  connected               ens2f1
lo      loopback  connected (externally)  lo
eno1    ethernet  unavailable             --

But now the link state shows NO-CARRIER and DORMANT (we haven't done anything except query network state at this point; on F37 the network links were "state UP mode DEFAULT"):

# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b8 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
3: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT mode DORMANT group default qlen 1000
    link/ether 00:1b:21:59:12:34 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f0
4: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b9 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT mode DORMANT group default qlen 1000
    link/ether 00:1b:21:59:12:35 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f1

fipvlan fails, because the interface isn't IFF_RUNNING:

# fipvlan -d ens2f0
fipvlan: creating netlink socket
fipvlan: Using libfcoe module parameter interfaces
fipvlan: sending RTM_GETLINK dump request
fipvlan: RTM_NEWLINK: ifindex 1, type 772, flags 10049
fipvlan: RTM_NEWLINK: ifindex 2, type 1, flags 11043
fipvlan: RTM_NEWLINK: ifindex 3, type 1, flags 11003
fipvlan: RTM_NEWLINK: ifindex 4, type 1, flags 1003
fipvlan: RTM_NEWLINK: ifindex 5, type 1, flags 11003
fipvlan: NLMSG_DONE
fipvlan: if 3 not running, starting
fipvlan: sending RTM_SETLINK request
fipvlan: NLMSG_ERROR (0) Success
fipvlan: waiting for IFF_RUNNING [1/20]
fipvlan: return from poll 0
fipvlan: if 3 not running, waiting for link up
...
fipvlan: waiting for IFF_RUNNING [20/20]
fipvlan: return from poll 0
fipvlan: if 3 not running, waiting for link up
fipvlan: return from poll 0
fipvlan: if 2: skipping, FIP not ready
fipvlan: if 3: skipping, FIP not ready
fipvlan: if 4: skipping, FIP not ready
fipvlan: if 5: skipping, FIP not ready
No Fibre Channel Forwarders or VN2VN Responders Found
fipvlan: shutdown if 3
fipvlan: sending RTM_SETLINK request
fipvlan: NLMSG_ERROR (0) Success

And now let's check NetworkManager again:

# nmcli dev
DEVICE  TYPE      STATE                   CONNECTION
eno0    ethernet  connected               Wired Connection
ens2f1  ethernet  connected               ens2f1
lo      loopback  connected (externally)  lo
eno1    ethernet  unavailable             --
ens2f0  ethernet  unavailable             --
If I stop NetworkManager from managing these interfaces, and reload the driver, things seem better:

# nmcli dev set ens2f0 managed no
# nmcli dev set ens2f1 managed no
# rmmod ixgbe
# modprobe ixgbe
# fipvlan ens2f0 ens2f1
Fibre Channel Forwarders Discovered
interface | VLAN | FCF MAC
------------------------------------------
ens2f0    | 802  | 00:05:73:b2:7f:00
ens2f1    | 802  | 00:05:73:b2:7f:00

But, returning to Anaconda and attempting to add an FCoE SAN, it fails again and returns to the DORMANT state?
Thanks for looking into it. Could it be a kernel issue?
(In reply to Adam Williamson from comment #10)
> Thanks for looking into it. Could it be a kernel issue?

Could be; I'm not familiar enough with the DORMANT state here. But I could only manage to get the link working by telling NM to stop managing it, and I'm guessing that the Anaconda FCoE connection code might have gone back to requesting NM to activate the connection?
I don't know why the device ends up in "NO-CARRIER state DORMANT", and from my understanding that depends only on the NIC and the kernel driver, not on NetworkManager. I'm reassigning this bz to kernel.
What kernel version was last tested with this? Does it reproduce with the Rawhide kernel/installer? Does it reproduce with F37 and a 6.1.x kernel? The latter might be harder to test, but would be helpful in determining where this regression came in.
There is a comment in the Intel ixgbe driver docs that states the following (which may or may not be related to NO-CARRIER):

Unable to obtain DHCP lease on boot with Red Hat
-----------------------------------------------
In configurations where the auto-negotiation process takes more than 5 seconds, the boot script may fail with the following message:

"<ethX>: failed. No link present. Check cable?"

This error may occur even though the presence of link can be confirmed using ethtool <ethX>. In this case, try setting "LINKDELAY=30" in /etc/sysconfig/network-scripts/ifcfg-<ethX>.

The same issue can occur during a network boot (via PXE) on Red Hat distributions that use the dracut script:

"Warning: No carrier detected on interface <ethX>"

In this case add "rd.net.timeout.carrier=30" to the kernel command line.

NOTE: Link time can vary. Adjust the LINKDELAY value accordingly.
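For reference, what LINKDELAY=30 effectively buys you is a polling loop on the kernel's carrier flag before the script gives up. A minimal sketch, assuming the standard /sys/class/net/<if>/carrier attribute; the `wait_for_carrier` helper name is my own, not something the ixgbe docs or the ifup scripts provide:

```shell
# Sketch: wait up to N seconds for carrier on an interface, roughly
# what LINKDELAY=30 makes the legacy network scripts do.
# wait_for_carrier is a hypothetical helper, for illustration only.
wait_for_carrier() {
    ifx="$1"
    timeout="${2:-30}"
    i=0
    while [ "$i" -lt "$timeout" ]; do
        # the carrier attribute reads "1" once link is detected;
        # it is unreadable (EINVAL) while the interface is admin-down,
        # hence the 2>/dev/null
        if [ "$(cat "/sys/class/net/$ifx/carrier" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}
```

Usage would be e.g. `wait_for_carrier ens2f1 30 || echo "no link on ens2f1"`. Note this only tells you about carrier, not about the DORMANT operstate seen in this bug.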
Sorry for the late reply, I was on PTO. The last known good release version is F33; Rawhide is also affected. rd.net.timeout.carrier doesn't help.
If last known good is F33, how did you record passes for the FCoE test in F37?

https://openqa.fedoraproject.org/testcase_stats/37/Installation/QA_Testcase_install_to_FCoE_target_Storage_devices.html

Was that with different hardware?
Hi Chris,

Please note: here is the output from the f38 system, *before* "ip link set ens2f1/ens2f0 down" (and then up):

[root@storageqe-13 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b8 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
3: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b9 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
4: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:59:12:34 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f0
5: ens2f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:59:12:35 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f1
[root@storageqe-13 ~]# fipvlan ens2f0 ens2f1
Fibre Channel Forwarders Discovered
interface | VLAN | FCF MAC
------------------------------------------
ens2f0    | 802  | 00:05:73:b2:7f:00
ens2f1    | 802  | 00:05:73:b2:7f:00

Here is the "ip link" output from the f37 system *after* "ip link set ens2f1/ens2f0 down" (and then up) or "fcoeadm -i":

[root@storageqe-13 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b8 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
3: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:39:35:2d:e0:b9 brd ff:ff:ff:ff:ff:ff
    altname enp3s0
4: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT mode DORMANT group default qlen 1000
    link/ether 00:1b:21:59:12:34 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f0
5: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state DORMANT mode DORMANT group default qlen 1000
    link/ether 00:1b:21:59:12:35 brd ff:ff:ff:ff:ff:ff
    altname enp7s0f1
[root@storageqe-13 ~]# fipvlan -d ens2f0
fipvlan: creating netlink socket
fipvlan: Using /sys/bus/fcoe interfaces
fipvlan: sending RTM_GETLINK dump request
fipvlan: RTM_NEWLINK: ifindex 1, type 772, flags 10049
fipvlan: RTM_NEWLINK: ifindex 2, type 1, flags 11043
fipvlan: RTM_NEWLINK: ifindex 3, type 1, flags 1003
fipvlan: RTM_NEWLINK: ifindex 4, type 1, flags 11003
fipvlan: RTM_NEWLINK: ifindex 5, type 1, flags 11003
fipvlan: NLMSG_DONE
fipvlan: if 4 not running, starting
fipvlan: sending RTM_SETLINK request
fipvlan: NLMSG_ERROR (0) Success
fipvlan: waiting for IFF_RUNNING [1/20]
fipvlan: return from poll 0
...
fipvlan: if 4 not running, waiting for link up
fipvlan: waiting for IFF_RUNNING [20/20]
fipvlan: return from poll 0
fipvlan: if 4 not running, waiting for link up
fipvlan: return from poll 0
fipvlan: if 2: skipping, FIP not ready
fipvlan: if 3: skipping, FIP not ready
fipvlan: if 4: skipping, FIP not ready
fipvlan: if 5: skipping, FIP not ready
No Fibre Channel Forwarders or VN2VN Responders Found
fipvlan: shutdown if 4
fipvlan: sending RTM_SETLINK request
fipvlan: NLMSG_ERROR (0) Success
> was that with different hardware?

Yes, the pass is for the bnx2fc driver; the ixgbe server was, er, pretty busy.
Can you test F38 with the hardware that you tested F34-F37 on, then? If F38 works there, we probably don't need to treat this as a blocker.
> Can you test F38 with the hardware that you tested F34-F37 on, then? If F38 works there, we probably don't need to treat this as a blocker.

F38 works well on the bnx2x servers I tested f34-f37 on.

Here is more information:

I'm not able to create an FCoE instance on an f34-f38 installed system either, and I guess the original (installer environment) bug will be fixed if fcoe-utils works well on an installed system.

On an f33 system, you get "up" all the time from /sys/class/net/<nic>/operstate. You also get "up" on an f34-f38 installed system, but you get "dormant" after you run "systemctl start lldpad" or "ip link set ens2f1/ens2f0 down" (and then up). It seems that fcoe-utils depends on that attribute(?), and that may be the cause of the problem.

I didn't check the iproute code, so I'm not sure about the NO-CARRIER, but you will get:

Apr 04 01:31:19 storageqe-13.sqe.lab.eng.bos.redhat.com NetworkManager[780]: <info> [1680586279.0479] dhcp4 (ens2f1): state changed new lease, address=172.17.x.x ***
Apr 04 01:31:19 storageqe-13.sqe.lab.eng.bos.redhat.com NetworkManager[780]: <info> [1680586279.0960] device (ens2f1): Activation: successful, device activated.

and ping and ssh work on that IP address.

A side problem: I get *unavailable* (even on a bnx2x server) when I run "ip link set ens2f1 down", though nmcli dev shows *connected* and "ethtool ens2f1" shows "Link detected: yes":

NetworkManager[780]: <info> [1680586244.7495] device (ens2f1): state change: ***unavailable*** -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')

Please feel free to tell me if you need more information.
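The operstate values mentioned above come straight from sysfs, so they can be dumped for every interface without any FCoE tooling. A quick sketch (standard sysfs attributes; the `dormant` flag isn't readable on every interface, hence the fallback):

```shell
# Print the kernel's view of operstate (and the dormant flag) for
# each interface; "dormant" here is what trips up fcoe-utils above.
for d in /sys/class/net/*; do
    printf '%s: operstate=%s dormant=%s\n' \
        "$(basename "$d")" \
        "$(cat "$d/operstate")" \
        "$(cat "$d/dormant" 2>/dev/null || echo '?')"
done
```

On the affected machine this should show ens2f0/ens2f1 flipping from "up" to "dormant" after "systemctl start lldpad", matching the ip link output earlier in this bug.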
Let's discuss the blocker status again: https://pagure.io/fedora-qa/blocker-review/issue/1041#comment-850286
-4 in https://pagure.io/fedora-qa/blocker-review/issue/1041, marking rejected (with the new info about affected hw).
This message is a reminder that Fedora Linux 38 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 38 on 2024-05-21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '38'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 38 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
lnie - can you retest with f40 or rawhide and update this? Thanks!
Adam, sorry for the late reply, I'm really pretty busy with other stuff. The ixgbe servers I used are obsolete, and the storage QE told me that his team doesn't own any ixgbe servers now, and likely never will, as ixgbe is being eliminated.
OK, let's just close this out, then.