Bug 2226912 - Solarflare SFN8522 adapter loses physical link if tuned is started with profile powersave
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: tuned
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Jaroslav Škarvada
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-27 00:21 UTC by Trevor Hemsley
Modified: 2023-07-27 00:21 UTC

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:



Description Trevor Hemsley 2023-07-27 00:21:46 UTC
I recently swapped network cards, from a dual-port Solarflare SFN6122F (uses the sfc_siena driver) to a Solarflare SFN8522 (uses the sfc driver). I shut down, swapped the cards, rebooted, and have just spent the last 3 hours working out why the network card no longer has a link detected for more than a few seconds after power on. All cables are the same between the two adapter cards, and I have also tried different cables.

kernel-6.4.4-200.fc38.x86_64
tuned-2.20.0-1.fc38.noarch

This is possibly a tuned bug, or maybe it's a kernel bug. I thought I'd start with tuned, as not running it fixes the immediate problem.

Reproducible: Always

Steps to Reproduce:
1. Install Solarflare SFN8522 and connect using Direct Attach cable to a switch or to another machine (I do both)
2. Install tuned and set to use profile 'powersave'
3. May need a reboot to activate tuned
Actual Results:  
A cold boot (power on) comes up as normal: the link is detected and a connection is established for anywhere between 1 second and about 1 minute, then the link goes away. Running `ethtool enp9s0f0np0` then shows "Link detected: no". Syslog shows:

Jul 26 21:48:29 trevor4 kernel: [    5.263140] sfc 0000:09:00.0: Solarflare NIC detected
Jul 26 21:48:29 trevor4 kernel: [    5.269260] sfc 0000:09:00.0: Part Number : SFN8522
Jul 26 21:48:29 trevor4 kernel: [    5.498217] sfc 0000:09:00.1: Solarflare NIC detected
Jul 26 21:48:29 trevor4 kernel: [    5.501776] sfc 0000:09:00.1: Part Number : SFN8522
Jul 26 21:48:29 trevor4 kernel: [    5.522425] sfc 0000:09:00.0 enp9s0f0np0: renamed from eth0
Jul 26 21:48:29 trevor4 kernel: [    5.617192] sfc 0000:09:00.1 enp9s0f1np1: renamed from eth1
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.0: Solarflare NIC detected
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.0: Part Number : SFN8522
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.1: Solarflare NIC detected
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.1: Part Number : SFN8522
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.0 enp9s0f0np0: renamed from eth0
Jul 26 21:48:28 trevor4 kernel: sfc 0000:09:00.1 enp9s0f1np1: renamed from eth1
Jul 26 21:48:29 trevor4 kernel: [    7.167012] sfc 0000:09:00.0 enp9s0f0np0: link up at 10000Mbps full-duplex (MTU 1500)
Jul 26 21:48:29 trevor4 kernel: sfc 0000:09:00.0 enp9s0f0np0: link up at 10000Mbps full-duplex (MTU 1500)
Jul 26 21:48:29 trevor4 kernel: [    7.340488] sfc 0000:09:00.1 enp9s0f1np1: link up at 10000Mbps full-duplex (MTU 1500)
Jul 26 21:48:29 trevor4 kernel: sfc 0000:09:00.1 enp9s0f1np1: link up at 10000Mbps full-duplex (MTU 1500)
Jul 26 21:49:20 trevor4 kernel: [   57.446270] sfc 0000:09:00.0 enp9s0f0np0: link down
Jul 26 21:49:20 trevor4 kernel: [   57.446440] sfc 0000:09:00.0 enp9s0f0np0: link down
Jul 26 21:49:20 trevor4 kernel: [   57.488974] sfc 0000:09:00.1 enp9s0f1np1: link down
Jul 26 21:49:20 trevor4 kernel: [   57.489024] sfc 0000:09:00.1 enp9s0f1np1: link down


# sfctool enp9s0f0np0
Settings for enp9s0f0np0:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseT/Full 
	                        1000baseX/Full 
	                        10000baseCR/Full 
	                        10000baseSR/Full 
	                        10000baseLR/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseT/Full 
	                        1000baseX/Full 
	                        10000baseCR/Full 
	                        10000baseSR/Full 
	                        10000baseLR/Full 
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  Not reported
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: No
	Link partner advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Port: FIBRE
	PHYAD: 255
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x000020f7 (8439)
			       drv probe link ifdown ifup rx_err tx_err hw
	Link detected: no

At this point I found that the only way to get "Link detected: yes" back was to cold boot the machine using the power button. Ctrl-Alt-Del sometimes seemed to work, but the only reliable fix was to power off and on. This was repeatable on every boot: the link would connect and things would work for some time, never more than about one minute, sometimes going away before I could even log in to ping things.
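
For anyone trying to reproduce this, the "Link detected" field can be checked from a script rather than by eye. The following is a minimal sketch, not part of the original report: the helper name `link_detected` is hypothetical, and it only assumes the `ethtool`/`sfctool` output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Link detected" field from
# ethtool-style output read on stdin. The value is lower-cased so that
# "Yes" and "yes" compare equal. Exits non-zero if the field is absent.
link_detected() {
    awk -F': *' '
        /Link detected:/ { print tolower($2); found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on real hardware (interface name taken from the report):
#   ethtool enp9s0f0np0 | link_detected
```

Keeping the parser separate from the `ethtool` call means it can be tried against saved output from a failing boot as well as against a live interface.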

I have two of these cards installed. One is in a machine with tuned set to profile powersave, which exhibits the problem; the other is in a machine set to profile virtual-host, which does not. I have swapped the SFN8522 cards between the two systems: both work in the virtual-host system and both fail in the powersave one.

After many, many reboots into single-user, multi-user, and emergency targets, activating the network manually with `ip` and then bringing up services one by one, I found that `systemctl mask tuned` stops this. I haven't experimented with tuned settings to see whether I can stop it doing whatever it is that breaks the link. I'm just thankful to have a working network connection again!
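
One way to narrow down which service takes the link away is to log carrier changes with timestamps and compare them against the time tuned starts. The sketch below is an illustration only, not something used in the report. The function names and the `SYSFS_NET` override are hypothetical; the only assumption about the system is the standard `/sys/class/net/<iface>/carrier` attribute.

```shell
#!/bin/sh
# Hypothetical helper: print "up", "down", or "unknown" for an interface.
# SYSFS_NET can be overridden so the function can be tried without hardware.
carrier_state() {
    iface=$1
    file="${SYSFS_NET:-/sys/class/net}/$iface/carrier"
    # Reading carrier fails while the interface is administratively down.
    value=$(cat "$file" 2>/dev/null) || { echo unknown; return 0; }
    case $value in
        1) echo up ;;
        0) echo down ;;
        *) echo unknown ;;
    esac
}

# Poll once per second and print a timestamped line on every state change.
# Runs until interrupted.
watch_carrier() {
    iface=$1
    last=
    while :; do
        now=$(carrier_state "$iface")
        if [ "$now" != "$last" ]; then
            printf '%s %s carrier %s\n' "$(date '+%F %T')" "$iface" "$now"
            last=$now
        fi
        sleep 1
    done
}

# Usage (interface name taken from the report):
#   watch_carrier enp9s0f0np0
```

The timestamps can then be lined up with `journalctl -u tuned` to see whether the "carrier down" line follows the profile being applied.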

Expected Results:  
Network connection works reliably without needing power off/on!

	Link detected: yes


While debugging this problem I used sfboot to reset the dual-port adapter to all default settings and upgraded to the latest Solarflare firmware:

    Firmware version:   v8.5.2
    Controller type:    Solarflare SFC9200 family
    Controller version: v8.5.0.1002
    Boot ROM version:   v5.2.2.1006
    UEFI ROM version:   v2.9.6.3

I actually use a profile called powersave-nodisk, which is set up as follows:

# cat /etc/tuned/powersave-nodisk/tuned.conf 
[main]
summary=Optimize for low power consumption but leave storage alone
include=powersave

[disk]
devices=sdz

I have no sdz; I just wanted it to stop powering down the two spinning rust devices in my mdadm array.
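
Because the custom profile inherits from powersave, it is worth confirming which profile tuned has applied before comparing machines. The following is a minimal sketch with a hypothetical helper name, `active_profile`. It assumes `tuned-adm active` prints a line of the form "Current active profile: NAME".

```shell
#!/bin/sh
# Hypothetical helper: extract the profile name from `tuned-adm active`
# output read on stdin. Prints nothing if no profile line is present.
active_profile() {
    sed -n 's/^Current active profile: *//p'
}

# Usage:
#   tuned-adm active | active_profile
# Switching profiles for an A/B comparison, using the two profiles
# mentioned in the report:
#   tuned-adm profile virtual-host
#   tuned-adm profile powersave-nodisk
```

Switching between the two profiles on the failing machine, without swapping hardware, would help separate the effect of the profile from anything else that differs between the two systems.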

