Bug 2226912
| Summary: | Solarflare SFN8522 adapter loses physical link if tuned is started with profile powersave | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Trevor Hemsley <trevor.hemsley> |
| Component: | tuned | Assignee: | Jaroslav Škarvada <jskarvad> |
| Status: | NEW --- | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 38 | CC: | jskarvad, jzerdik, olysonek-foss |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | --- |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I have recently swapped network cards from a dual port Solarflare SFN6122F (uses the sfc_siena driver) to a Solarflare SFN8522 (uses sfc) adapter. I shut down, swapped cards, rebooted, and have just spent the last 3 hours working out why my network card no longer has a link detected for more than a few seconds after power on. All cables are the same between the two adapter cards and I have also tried with different cables.

kernel-6.4.4-200.fc38.x86_64
tuned-2.20.0-1.fc38.noarch

This is possibly a tuned bug or maybe it's a kernel bug. Thought I'd start with tuned as not running that fixes the immediate problem.

Reproducible: Always

Steps to Reproduce:
1. Install Solarflare SFN8522 and connect using Direct Attach cable to a switch or to another machine (I do both)
2. Install tuned and set to use profile 'powersave'
3. May need a reboot to activate tuned

Actual Results:
A cold boot (power on) comes up as normal, link is detected, and the connection is established for anything between 1s and about 1 minute, then it goes away. Running `ethtool enp9s0f0np0` shows "Link detected: No".
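To pin down exactly when the link drops relative to tuned starting, a small polling loop around `ethtool` may help. This is only a sketch: the function names `link_state` and `watch_link` are made up for illustration, and it assumes the `ethtool` output contains a `Link detected:` line as quoted above.

```shell
#!/bin/sh
# Extract the "Link detected" value (yes/no) from ethtool-style output on stdin.
link_state() {
    awk -F': *' '/Link detected:/ { print $2 }'
}

# Poll an interface once per second and print a timestamped line
# whenever the reported link state changes. Runs until interrupted.
watch_link() {
    iface="$1"
    prev=""
    while :; do
        cur="$(ethtool "$iface" 2>/dev/null | link_state)"
        if [ "$cur" != "$prev" ]; then
            printf '%s %s link detected: %s\n' \
                "$(date '+%H:%M:%S')" "$iface" "${cur:-unknown}"
            prev="$cur"
        fi
        sleep 1
    done
}

# Usage (interface name taken from this report):
#   watch_link enp9s0f0np0
```

Running it on a console before tuned starts and comparing its timestamps with the tuned log should show whether the drop lines up with the profile being applied.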
Syslog shows

Jul 26 21:48:29 trevor4 kernel: [ 5.263140] sfc 0000:09:00.0: Solarflare NIC detected
Jul 26 21:48:29 trevor4 kernel: [ 5.269260] sfc 0000:09:00.0: Part Number : SFN8522
Jul 26 21:48:29 trevor4 kernel: [ 5.498217] sfc 0000:09:00.1: Solarflare NIC detected
Jul 26 21:48:29 trevor4 kernel: [ 5.501776] sfc 0000:09:00.1: Part Number : SFN8522
Jul 26 21:48:29 trevor4 kernel: [ 5.522425] sfc 0000:09:00.0 enp9s0f0np0: renamed from eth0
Jul 26 21:48:29 trevor4 kernel: [ 5.617192] sfc 0000:09:00.1 enp9s0f1np1: renamed from eth1
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.0: Solarflare NIC detected
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.0: Part Number : SFN8522
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.1: Solarflare NIC detected
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.1: Part Number : SFN8522
Jul 26 21:48:27 trevor4 kernel: sfc 0000:09:00.0 enp9s0f0np0: renamed from eth0
Jul 26 21:48:28 trevor4 kernel: sfc 0000:09:00.1 enp9s0f1np1: renamed from eth1
Jul 26 21:48:29 trevor4 kernel: [ 7.167012] sfc 0000:09:00.0 enp9s0f0np0: link up at 10000Mbps full-duplex (MTU 1500)
Jul 26 21:48:29 trevor4 kernel: sfc 0000:09:00.0 enp9s0f0np0: link up at 10000Mbps full-duplex (MTU 1500)
Jul 26 21:48:29 trevor4 kernel: [ 7.340488] sfc 0000:09:00.1 enp9s0f1np1: link up at 10000Mbps full-duplex (MTU 1500)
Jul 26 21:48:29 trevor4 kernel: sfc 0000:09:00.1 enp9s0f1np1: link up at 10000Mbps full-duplex (MTU 1500)
Jul 26 21:49:20 trevor4 kernel: [ 57.446270] sfc 0000:09:00.0 enp9s0f0np0: link down
Jul 26 21:49:20 trevor4 kernel: [ 57.446440] sfc 0000:09:00.0 enp9s0f0np0: link down
Jul 26 21:49:20 trevor4 kernel: [ 57.488974] sfc 0000:09:00.1 enp9s0f1np1: link down
Jul 26 21:49:20 trevor4 kernel: [ 57.489024] sfc 0000:09:00.1 enp9s0f1np1: link down

# sfctool enp9s0f0np0
Settings for enp9s0f0np0:
    Supported ports: [ FIBRE ]
    Supported link modes:   1000baseT/Full
                            1000baseX/Full
                            10000baseCR/Full
                            10000baseSR/Full
                            10000baseLR/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  1000baseT/Full
                            1000baseX/Full
                            10000baseCR/Full
                            10000baseSR/Full
                            10000baseLR/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Link partner advertised link modes: Not reported
    Link partner advertised pause frame use: No
    Link partner advertised auto-negotiation: No
    Link partner advertised FEC modes: Not reported
    Speed: 10000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 255
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x000020f7 (8439)
                           drv probe link ifdown ifup rx_err tx_err hw
    Link detected: no

At this point I found that the only way to get the Link Detected: yes back was to cold boot the machine using the power button. Ctrl-Alt-Del sometimes seemed to work but the only reliable way to get it working again was to power off/on. This was repeatable on every boot: link would connect, things would work for some time (never more than about one minute), sometimes going away before I could even log in to ping things.

I have two of these cards installed. One is in a machine with tuned set to profile powersave, which exhibits the problem. The other is in a machine set to profile virtual-host and does not. I have swapped SFN8522 cards between the 2 systems and both work in the virtual-host system and both fail in the one in powersave mode.

After many, many reboots into single user, multi-user, and emergency targets, activating the network manually with `ip` and then bringing up services one by one, I found that `systemctl mask tuned` will stop this. I haven't experimented with tuned settings to see if I can get it to stop doing whatever it is that it's doing that breaks this. I'm just thankful to have a working network connection again!

Expected Results:
Network connection works reliably without needing power off/on!
Link detected: yes

While debugging this problem I have used sfboot to reset the dual port adapter to all default settings, upgraded to latest Solarflare firmware:

Firmware version: v8.5.2
Controller type: Solarflare SFC9200 family
Controller version: v8.5.0.1002
Boot ROM version: v5.2.2.1006
UEFI ROM version: v2.9.6.3

I actually use a profile called powersave-nodisk which is set to use the following

# cat /etc/tuned/powersave-nodisk/tuned.conf
[main]
summary=Optimize for low power consumption but leave storage alone
include=powersave

[disk]
devices=sdz

I have no sdz, I just wanted it to stop powering down the two spinning rust devices in my mdadm array.
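One possible way to narrow down which part of powersave takes the link down would be to apply the same nonexistent-device trick to the network plugin and see whether the link survives. The sketch below is untested on this hardware. The profile name `powersave-nonet`, the helper `write_profile`, and the placeholder device `nonexistent0` are all made up for illustration, and it assumes tuned's `[net]` section honours `devices=` the same way `[disk]` does in the profile above.

```shell
#!/bin/sh
# Write a diagnostic tuned profile into the given directory.
# The profile inherits powersave but points both the disk and net
# plugins at devices that do not exist, so neither touches real hardware.
write_profile() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/tuned.conf" <<'EOF'
[main]
summary=powersave without disk or net tuning (diagnostic only)
include=powersave

[disk]
devices=sdz

[net]
# Assumption: a nonexistent device name keeps the net plugin off the real NICs.
devices=nonexistent0
EOF
}

# Usage (as root; hypothetical profile name):
#   write_profile /etc/tuned/powersave-nonet
#   tuned-adm profile powersave-nonet
```

If the link stays up with this profile, the network plugin is the likely trigger. If it still drops, the cause is probably elsewhere in powersave (for example its other plugin sections or its script), and those can be ruled out one at a time in the same way.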