Created attachment 802064 [details]
Callstack SS 1/2

Description of problem:
Since updating to 3.11, the ALX driver no longer resumes successfully from suspend. After resume, the kernel log gets flooded with the following message (thousands of times):
alx 0000:04:00.0: invalid PHY speed/duplex: 0xffff
....
Attempting to shut down the machine then results in a slowpath OOPS (screenshot attached; sorry for the poor quality, there is no serial port on this laptop).
Switching back to 3.10 solves the issue.

Version-Release number of selected component (if applicable):
3.11.1-200.fc19.x86_64

Additional info:
Also reported by Ubuntu users:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1213009
*** Bug 1011777 has been marked as a duplicate of this bug. ***
I emailed upstream about this. It's possible this was introduced when WoL support was removed, but I'm unsure at the moment. http://thread.gmane.org/gmane.linux.network/284929
Let me know if you want me to open an upstream bug about it. Thanks. - Gilboa
The problem persists in the latest kernel:
...
[ 2992.767858] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 2992.768989] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 2992.770126] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 2992.771228] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 2992.772355] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 2992.773521] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 2992.774662] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 2992.775797] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 2992.776924] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[neteler@oboe ~]$ uname -a
Linux oboe.localdomain 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27 19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
I need to keep using 3.10.6, the last working kernel in F19, since the 3.11 kernels that followed were affected by bug 917081.
The callstack is helpful. Could you please upload the dmesg output so I can get a fuller picture? The output of the following would help too (just the part with the alx device is all I need):
lspci -vvnn
Created attachment 807162 [details] lspci log. (Slowpath OOPS callstack already attached as a screenshot, due to the lack of a serial port on the laptop.)
Created attachment 807164 [details] dmesg (boot) dmesg post resume is useless (invalid PHY speed/duplex: 0xffff by the millions) Here's a fresh dmesg (pre-suspend). - Gilboa
I see the same symptoms as the original report using the 3.11.3-201.fc19.x86_64 kernel. I ended up with over 5GiB of these "invalid PHY speed/duplex" lines in /var/log/messages after a suspend. I'm using an ASUS X202E (Intel i3-3217U and Atheros AR816x/AR817x).
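For anyone who wants to gauge how badly a log was flooded before rotating or deleting it, here is a minimal sketch that counts the "invalid PHY speed/duplex" lines per PCI device. The function name count_phy_flood is made up for illustration, and the pattern assumes the message format quoted in this bug.

```shell
# Count "invalid PHY speed/duplex" lines per PCI device in a log file.
# count_phy_flood is a hypothetical helper name, not part of any tool.
count_phy_flood() {
    # Keep only the "alx <pci-address>: invalid PHY speed/duplex" part of each line,
    # then tally by PCI address (field 2, with its trailing colon stripped).
    grep -o 'alx [0-9a-f:.]*: invalid PHY speed/duplex' "$1" \
        | awk '{ sub(/:$/, "", $2); n[$2]++ } END { for (d in n) print d, n[d] }' \
        | sort
}

# Example (reading /var/log/messages typically needs root):
# count_phy_flood /var/log/messages
```

Each output line is a PCI address followed by the number of flood messages logged for it.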
Same here: after suspending, my CPU was running at full speed writing to the journal... Using a 3.10 kernel solves the issue for now.
So it appears a regression has occurred from 3.10 to 3.11.

Small change list:
c3eb7a7 alx: remove redundant D0 power state set
a8798a5 alx: fix lockdep annotation
bc2bebe alx: remove WoL support
7ec5689 alx: fix ethtool support code
46ab9b3 alx: fix MAC address alignment problem
a5b87cc alx: separate link speed/duplex fields
4a134c3 alx: make sizes unsigned
17fdd35 alx: fix 100mbit/half duplex speed translation
ef0cc4b alx: treat flow control correctly in alx_set_pauseparam()

My educated guess is that the problem is one of these:
c3eb7a7 alx: remove redundant D0 power state set
7ec5689 alx: fix ethtool support code
a5b87cc alx: separate link speed/duplex fields
17fdd35 alx: fix 100mbit/half duplex speed translation
a5b87cc alx: separate link speed/duplex fields

Do any of you have the ability to build and test kernels?
Try reverting these on 3.11...
Looks like it is still an issue after a kernel upgrade in F20:
Linux localhost.localdomain 3.11.4-302.fc20.x86_64 #1 SMP Fri Oct 11 17:43:41 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Also, this looks like the upstream bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=62491
As of Oct 11 they had no idea what causes it.
(In reply to John Greene from comment #10)
> Do any of you have the ability to build and test kernels?
> Try to reverting these on 3.11...
I'll try and free some time next week to revert each patch and see what breaks. - Gilboa
Still unsolved:
[ 928.222244] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 928.223338] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 928.224448] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 928.225572] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 928.226722] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 928.227844] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 928.228971] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 928.230081] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
[ 928.231190] alx 0000:03:00.0: invalid PHY speed/duplex: 0xffff
uname -a
Linux oboe.localdomain 3.11.4-201.fc19.x86_64 #1 SMP Thu Oct 10 14:11:18 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
The only usable kernel for F19 remains 3.10.6. I will preserve it carefully...
Any news / progress on this? Just had yet another hit, the 2nd in two days.
No fix upstream at this point, but I did some additional looking at the history. The 3.10.x kernels all seem to share the same history wrt the ALX driver. You may be able to
(In reply to John Greene from comment #10)
> So it appears a regression has occurred from 3.10 to 3.11.
>
> Small change list
> c3eb7a7 alx: remove redundant D0 power state set
> a8798a5 alx: fix lockdep annotation
> bc2bebe alx: remove WoL support
> 7ec5689 alx: fix ethtool support code
> 46ab9b3 alx: fix MAC address alignment problem
> a5b87cc alx: separate link speed/duplex fields
> 4a134c3 alx: make sizes unsigned
> 17fdd35 alx: fix 100mbit/half duplex speed translation
> ef0cc4b alx: treat flow control correctly in alx_set_pauseparam()
>
> Educated guess is the problem is one of the above..
> c3eb7a7 alx: remove redundant D0 power state set
> 7ec5689 alx: fix ethtool support code
> a5b87cc alx: separate link speed/duplex fields
> 17fdd35 alx: fix 100mbit/half duplex speed translation
> a5b87cc alx: separate link speed/duplex fields
>
> Do any of you have the ability to build and test kernels?
>
> Try to reverting these on 3.11...
I don't have access to this device internally, so if I get time I might be able to revert these for you. It may take a while to get around to that. Hence the question: do any of you have the ability to build and test kernels? Or at least the willingness to test a kernel I might be able to generate?
Oh, could somebody try this: add the following to the kernel command line and see if it helps at all:
pcie_aspm=off
Let me know what you come up with.
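Before drawing conclusions from a test with this option, it helps to confirm that the running kernel was actually booted with it. Below is a minimal sketch that checks /proc/cmdline for a given argument; has_kernel_arg is a hypothetical helper name. The commented grubby line is one way to add the option persistently on Fedora and is not executed by the sketch.

```shell
# has_kernel_arg ARG [CMDLINE_FILE]
# Succeeds if ARG appears as a whole word on the kernel command line.
# has_kernel_arg is a hypothetical helper name; CMDLINE_FILE defaults to /proc/cmdline.
has_kernel_arg() {
    # Split the command line into one argument per line, then match ARG exactly.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# To add the option to all installed kernels on Fedora (needs root, not run here):
# grubby --update-kernel=ALL --args="pcie_aspm=off"

# After rebooting:
# has_kernel_arg pcie_aspm=off && echo "ASPM disabled on the command line"
```

If the check fails after a reboot, the option never reached the kernel, so a recurrence of the flood says nothing about whether ASPM is involved.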
I've just had another hit so willing to try anything. As time is pretty scarce for me ATM I've taken the easy route with the pcie_aspm option. Obviously will not know if that fixes the problem or just minimises it. If that doesn't work I'm happy to test a kernel - but don't have the time to build it myself.
Kevin, great. Please try that and let me know whether the problem goes away. It's a common workaround, and the result will help me focus the bisect for the problem.
Here is the comment from kernel.org with the attached patch:
https://bugzilla.kernel.org/show_bug.cgi?id=62491#c7

Link to the patch:
https://bugzilla.kernel.org/attachment.cgi?id=114381&action=diff#a/drivers/net/ethernet/atheros/alx/main.c_sec1

and here's the patch:
==============================================
From 27744b24f9291782c1342dbd6cac511e68da907c Mon Sep 17 00:00:00 2001
From: hahnjo <hahnjo>
Date: Tue, 12 Nov 2013 18:19:24 +0100
Subject: [PATCH] alx: Reset phy speed after resume

This fixes bug 62491 (https://bugzilla.kernel.org/show_bug.cgi?id=62491).
After resuming some users got the following error flooding the kernel log:
alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff
---
 drivers/net/ethernet/atheros/alx/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c
index fc95b23..6305a5d 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -1389,6 +1389,9 @@ static int alx_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct alx_priv *alx = pci_get_drvdata(pdev);
+	struct alx_hw *hw = &alx->hw;
+
+	alx_reset_phy(hw);
 
 	if (!netif_running(alx->dev))
 		return 0;
-- 
1.8.4.2
======================================
Good find. Will check this out soon.
I do have time to install a patched kernel and check it out. But unfortunately no time for building.
I hope that patch works because the kernel parameter has not - 2 * hits this morning. :( Also willing to try the custom kernel.
I made a local mock build with the patch from comment #19 and tested it on my Asus X202E. It fixes the problem for me. I also have some koji scratch builds submitted with the patch applied. Try these out when they finish: F19: http://koji.fedoraproject.org/koji/taskinfo?taskID=6187061 F18: http://koji.fedoraproject.org/koji/taskinfo?taskID=6186733
(In reply to Charles R. Anderson from comment #23)
> F19:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=6187061
I would be happy to test it. How do I install it? Is there any documentation for these builds, or do I simply download all the relevant RPMs from there?
For Fedora kernel folks, this has now hit the 'net' tree, so it will trickle down to Linus:

commit b54629e226d196e802abdd30c5e34f2a47cddcf2
Author: hahnjo <hahnjo>
Date:   Tue Nov 12 18:19:24 2013 +0100

    alx: Reset phy speed after resume

    This fixes bug 62491 (https://bugzilla.kernel.org/show_bug.cgi?id=62491).
    After resuming some users got the following error flooding the kernel log:
    alx 0000:02:00.0: invalid PHY speed/duplex: 0xffff

    Signed-off-by: Jonas Hahnfeld <linux>
    Signed-off-by: David S. Miller <davem>

I don't currently see it in Davem's stable patchwork, so it might be worth adding to Fedora's tree for the time being (http://patchwork.ozlabs.org/bundle/davem/stable/?state=*)
(In reply to markusN from comment #24)
> (In reply to Charles R. Anderson from comment #23)
> > F19:
> > http://koji.fedoraproject.org/koji/taskinfo?taskID=6187061
>
> I would be happy to test it. How to install, any RTFM for these builts?
> Or simply download all relevant RPMs from there?
On your system, find out which ones you need first by doing this:
rpm -qa kernel\* | sort
Then download the ones you need (typically only kernel-3.* and kernel-modules-extra-3.* for your arch, either i686 or x86_64) and do:
yum update kernel*
and reboot to the new kernel.
(In reply to markusN from comment #24)
> (In reply to Charles R. Anderson from comment #23)
> > F19:
> > http://koji.fedoraproject.org/koji/taskinfo?taskID=6187061
>
> I would be happy to test it. How to install, any RTFM for these builts?
> Or simply download all relevant RPMs from there?
As per Comment #26 I have updated to 3.11.8-200.bz1011362.fc19.x86_64 and have already resumed successfully from suspend twice. No more message flooding and the wireless device works! Looks good, thanks for the test kernel which I'll continue to test.
Nice start to the week..thanks Charles. I'll close the loop and see to it this flows into Fedora asap, if not in process already. Please update your testing status here as you go.
Thanks for testing everyone. I've applied the patch Michele pointed to with comment #25.
I see that kernel 3.11.8-200 has been released. Does it contain the bugfix which continues to work fine on my ASUS X202E? (I ask since I don't see it mentioned in http://koji.fedoraproject.org/koji/buildinfo?buildID=478117 ) Thanks again for the fix.
(In reply to markusN from comment #31)
> I see that kernel 3.11.8-200 has been released. Does it contain the bugfix
> which continues to work fine on my ASUS X202E?
>
> (I ask since I don't see it mentioned in
> http://koji.fedoraproject.org/koji/buildinfo?buildID=478117 )
>
> Thanks again for the fix.
The last date on the changelog for the link you gave is Nov 13. The patch was applied on Nov 18. See comment 29.
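A rough way to answer this kind of question locally is to search the package changelog of an installed kernel. The sketch below reads a changelog on stdin and reports whether any entry mentions the alx driver or this bug's number (1011362, as it appears in the test kernel's name earlier in this bug). The helper name changelog_mentions_fix is made up, and both search terms are assumptions about how the changelog entry is worded.

```shell
# changelog_mentions_fix: read an RPM changelog on stdin and succeed if any
# line mentions the alx driver or bug number 1011362.
# Hypothetical helper; the search terms are guesses at the changelog wording.
changelog_mentions_fix() {
    grep -qi -e 'alx' -e '1011362'
}

# Example (not run here): check an installed kernel package.
# rpm -q --changelog kernel-3.11.9-200.fc19 | changelog_mentions_fix && echo "fix listed"
```

A miss does not prove the fix is absent, since the entry may be worded differently; reading the changelog directly settles it.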
kernel-3.11.9-300.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.11.9-300.fc20
kernel-3.11.9-200.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/kernel-3.11.9-200.fc19
kernel-3.11.9-100.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.11.9-100.fc18
(In reply to Fedora Update System from comment #34) > kernel-3.11.9-200.fc19 has been submitted as an update for Fedora 19. > https://admin.fedoraproject.org/updates/kernel-3.11.9-200.fc19 Thanks, suspend/resume works with this kernel. I left stable karma.
Package kernel-3.11.9-100.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.9-100.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-21822/kernel-3.11.9-100.fc18
then log in and leave karma (feedback).
kernel-3.11.9-200.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
Installed the new kernel last night and this is the first morning in over a week that I've not had to do a reboot. THANK YOU!!!!!!!
kernel-3.11.9-300.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.11.9-100.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.