Bug 751990

Summary: [atl1c] network is unreliable (regression from F17)
Product: Fedora
Reporter: Nathanael Noblet <nathanael>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 19
CC: gansalmon, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, michele, michele, nathanael, sgruszka
Target Milestone: ---
Keywords: Reopened
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2013-10-08 17:47:09 UTC
Attachments:
Description                    Flags
lspci -t and lspci -vvnn       none
lspci -vvvv (non-working)      none
lspci -vvvv (working)          none
lspci -vvvv (working)          none
lspci -vvvv (not-working)      none

Description Nathanael Noblet 2011-11-08 09:15:41 UTC
Description of problem:
When trying to install F15 as well as F16 onto a netbook with 

Atheros Communications AR8152 v1.1 Fast Ethernet (rev c1)

using the netinstall method via a local HTTP mirror. The installation gets stuck because the network stops working: files don't download, so I get errors. If I manually run ifdown em1 and then ifup em1, it continues for a short while but eventually stops working again.

This worked fine with whatever kernel F14 shipped. I can't see any odd messages in dmesg; the only one is:

vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update.

Otherwise there are no errors that indicate an issue.


How reproducible:
Always


Additional info:

ifup em1 tends to print random errors, but things start working again.

Also, when the network stops working, ifconfig still shows em1 with an IP address and all the other stats you would expect, yet ping and all other network communication cease to function.

Comment 1 Nathanael Noblet 2011-11-08 20:41:32 UTC
So I've been able to narrow the behaviour down to something between 2.6.38.4 and 2.6.38.5, but I can't see any changes in git log that point to anything untoward.

Comment 2 Nathanael Noblet 2011-11-09 21:44:20 UTC
Scratch that. 2.6.38.4 took a long time to fail.

I've played with 2.6.38.1 and 2.6.38.2.

Between those two, .1 was able to git clone the kernel repo 5 times (with one remote branch), while .2 never completed a single clone. There were no changes to the driver itself between those two trees. Where do I look now?

Comment 3 Nathanael Noblet 2011-11-09 22:22:06 UTC
jwb on IRC suggested adding pcie_aspm=off to the kernel command line. With that, the network keeps working continuously: I can complete an entire kickstart net install without a single ifdown/ifup cycle.

Comment 4 Nathanael Noblet 2011-11-10 22:08:11 UTC
Created attachment 532937 [details]
lspci -t and lspci -vvnn

Comment 5 Matthew Garrett 2011-11-10 23:05:42 UTC
Thanks - any chance we could also have lspci -vvv (note the extra v) with and without pci_aspm=off?

Comment 6 Nathanael Noblet 2011-11-11 02:49:49 UTC
Created attachment 532957 [details]
lspci -vvvv (non-working)

Comment 7 Nathanael Noblet 2011-11-11 02:51:32 UTC
Created attachment 532958 [details]
lspci -vvvv (working)

Comment 8 Matthew Garrett 2011-11-11 03:24:04 UTC
Ah, sorry, those need to be done as root.

Comment 9 Nathanael Noblet 2011-11-11 04:11:55 UTC
Created attachment 532966 [details]
lspci -vvvv (working)

Comment 10 Nathanael Noblet 2011-11-11 04:14:08 UTC
Created attachment 532967 [details]
lspci -vvvv (not-working)

Comment 11 Stanislaw Gruszka 2012-02-26 12:37:04 UTC
atl1c does some ASPM magic in atl1c_set_aspm() which apparently it should not do...

Comment 12 Josh Boyer 2012-02-29 17:55:49 UTC
Matthew had some patches to atl1c to fix the ASPM issues, but they were dropped upstream because further rework in the PCI area was needed to really fix the problems.

Comment 13 Dave Jones 2012-03-22 16:47:27 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 16 Nathanael Noblet 2012-04-25 20:29:19 UTC
So I tried installing the F17 beta on the affected machine.
The beta seems to ship kernel 3.3.0-1.fc17.i686. I'm not sure whether the release number matters, but the problem isn't fixed.

Comment 17 Stanislaw Gruszka 2012-05-03 08:21:03 UTC
An Atheros developer posted atl1c patches to net-next that change various register-programming code in atl1c, including ASPM. The kernel build below includes the atl1c driver update from net-next. Please check whether it solves the problem:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4045082

Comment 18 Nathanael Noblet 2012-05-17 17:44:01 UTC
Sorry, I didn't get to it in time. Can you post an SRPM I can use to build?

Comment 19 Stanislaw Gruszka 2012-05-18 10:59:00 UTC
We applied this atl1c update because it fixes another bug. The patch is already in kernel-3.3.6-3.fc17; does it also fix this problem?

Comment 20 Nathanael Noblet 2012-07-13 22:50:40 UTC
That seems to have fixed it. I built an F17 boot.iso with an updated kernel (3.4.4-5) to use as the installer, and the install proceeds normally.

Comment 21 Nathanael Noblet 2013-07-17 17:08:23 UTC
So this bug is back... With F19 (kernel 3.9.5-301.fc19) I see the same behaviour on kickstart installs over the network: suddenly I can't ping or resolve any network host. After unplugging and replugging the cable it starts working again, presumably because NM reloads the config or something similar.

Comment 22 Nathanael Noblet 2013-07-17 17:42:10 UTC
Also, in this case adding pcie_aspm=off pci_aspm=off doesn't fix the problem.

Comment 23 Michele Baldessari 2013-08-17 23:06:05 UTC
Nathanael,

can you check the latest comment on https://bugzilla.redhat.com/show_bug.cgi?id=995308 ?

Comment 24 Josh Boyer 2013-09-18 20:32:55 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 25 Michele Baldessari 2013-09-20 21:29:52 UTC
Nathanael,

bug https://bugzilla.redhat.com/show_bug.cgi?id=995308 was fixed long ago
and is very similar to this issue. If you could double-check with the newer
kernel and report back, that would be great.

Thanks,
Michele

Comment 26 Josh Boyer 2013-10-08 17:47:09 UTC
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.