Bug 751990 - [atl1c] network is unreliable (regression from F17)
Summary: [atl1c] network is unreliable (regression from F17)
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-11-08 09:15 UTC by Nathanael Noblet
Modified: 2013-10-08 17:47 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-08 17:47:09 UTC
Type: ---


Attachments
lspci -t and lspci -vvnn (10.33 KB, text/plain)
2011-11-10 22:08 UTC, Nathanael Noblet
lspci -vvvv (non-working) (9.74 KB, text/plain)
2011-11-11 02:49 UTC, Nathanael Noblet
lspci -vvvv (working) (9.74 KB, text/plain)
2011-11-11 02:51 UTC, Nathanael Noblet
lspci -vvvv (working) (20.79 KB, text/plain)
2011-11-11 04:11 UTC, Nathanael Noblet
lspci -vvvv (not-working) (20.81 KB, text/plain)
2011-11-11 04:14 UTC, Nathanael Noblet

Description Nathanael Noblet 2011-11-08 09:15:41 UTC
Description of problem:
When trying to install F15 as well as F16 onto a netbook with 

Atheros Communications AR8152 v1.1 Fast Ethernet (rev c1)

using the netinstall method via a local HTTP mirror, the installation gets stuck because the network stops working. Files don't download, so I get errors. If I manually run ifdown em1 and then ifup em1, it can continue for a short while, but it eventually stops working again.

This worked fine on whatever kernel F14 was running. I can't see any odd messages in dmesg; the only one is

vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update.

Otherwise there are no errors to indicate an issue.


How reproducible:
Always


Additional info:

ifup em1 tends to spit out random errors; however, things start working again afterwards.

Also, when the network stops working, ifconfig still shows em1 with an IP address and all the other stats you would expect, but ping and other network communications cease to function.
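
The manual workaround described above (cycling the interface once traffic stops) could be automated while debugging. What follows is only a rough sketch and not part of the original report: it assumes Python 3 on the affected machine, that ping, ifdown and ifup are on the PATH, that it runs as root, and that the address passed in is a host that normally answers pings. All function names are made up for illustration.

```python
import subprocess
import time


def link_alive(host, run=subprocess.run):
    """Send one ICMP echo with a 2-second timeout; True if a reply came back."""
    result = run(["ping", "-c", "1", "-W", "2", host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def cycle_interface(iface, run=subprocess.run):
    """Apply the manual workaround: ifdown followed by ifup (needs root)."""
    run(["ifdown", iface])
    run(["ifup", iface])


def watch(host, iface, interval=5.0, rounds=0,
          run=subprocess.run, sleep=time.sleep):
    """Poll host and cycle iface after each failed ping.

    rounds=0 means loop forever; otherwise stop after that many polls.
    Returns the number of stalls that were detected.
    """
    stalls = 0
    done = 0
    while rounds == 0 or done < rounds:
        if not link_alive(host, run):
            stalls += 1
            # Timestamp each stall so time-to-failure can be compared per kernel.
            print(time.strftime("%H:%M:%S"), "stall detected, cycling", iface)
            cycle_interface(iface, run)
        sleep(interval)
        done += 1
    return stalls
```

Calling watch("192.0.2.1", "em1") with the mirror's real address in place of the placeholder would keep the interface cycling during an install. The printed timestamps give a rough idea of how long each kernel runs before the link stalls.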

Comment 1 Nathanael Noblet 2011-11-08 20:41:32 UTC
So I've been able to narrow down the behaviour to something between

2.6.38.4 and 2.6.38.5, but I can't see any changes in git log that point to anything untoward.

Comment 2 Nathanael Noblet 2011-11-09 21:44:20 UTC
Scratch that. 2.6.38.4 took a long time to fail.

I've played with 2.6.38.1 and 2.6.38.2

Between those two, .1 was able to git clone the kernel repo 5 times (with one remote branch); .2 never completed a single clone. There have been no changes to the driver itself between those two trees. Where do I look now?
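
Since some kernels take a long time to fail, a repeatable pass/fail count per kernel makes the comparison less anecdotal. The snippet below is a hedged sketch of the clone test described above, assuming Python 3 and git are installed. The function name, the default attempt count and the one-hour timeout are illustrative choices, not something taken from this report.

```python
import subprocess
import tempfile


def count_successful_clones(repo_url, attempts=5, timeout=3600,
                            run=subprocess.run):
    """Clone repo_url into a fresh temporary directory `attempts` times.

    A clone counts as a success only if git exits with status 0 before the
    timeout (in seconds) expires. Returns the number of successes.
    """
    successes = 0
    for _ in range(attempts):
        # Each attempt gets an empty directory that is removed afterwards.
        with tempfile.TemporaryDirectory() as target:
            try:
                result = run(["git", "clone", "--quiet", repo_url, target],
                             timeout=timeout)
            except subprocess.TimeoutExpired:
                # A hung network shows up as a timeout rather than an error.
                continue
            if result.returncode == 0:
                successes += 1
    return successes
```

Running the same call under each candidate kernel and recording the result gives a simple good/bad verdict for narrowing down the version range.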

Comment 3 Nathanael Noblet 2011-11-09 22:22:06 UTC
jwb on IRC suggested adding pcie_aspm=off to the kernel command line. Doing so allows the network to function continuously. I can complete an entire kickstart net install without a single ifdown / ifup cycle.
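
To confirm that the option actually took effect on a given boot, the running kernel's command line and the ASPM policy can be inspected. This is a minimal sketch, assuming Python 3 and a kernel that exposes /proc/cmdline and /sys/module/pcie_aspm/parameters/policy (the latter may be absent, depending on kernel configuration). The helper names are hypothetical.

```python
from pathlib import Path


def aspm_disabled_on_cmdline(cmdline):
    """Return True if pcie_aspm=off appears as a kernel command-line option."""
    return "pcie_aspm=off" in cmdline.split()


def parse_active_policy(policy_text):
    """Return the bracketed (active) entry of the pcie_aspm policy string.

    The sysfs file lists every policy and brackets the active one,
    for example "[default] performance powersave".
    """
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        print("pcie_aspm=off on command line:",
              aspm_disabled_on_cmdline(cmdline_file.read_text()))

    policy_file = Path("/sys/module/pcie_aspm/parameters/policy")
    if policy_file.exists():
        print("active ASPM policy:",
              parse_active_policy(policy_file.read_text()))
    else:
        print("ASPM policy file not present")
```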

Comment 4 Nathanael Noblet 2011-11-10 22:08:11 UTC
Created attachment 532937 [details]
lspci -t and lspci -vvnn

Comment 5 Matthew Garrett 2011-11-10 23:05:42 UTC
Thanks - any chance we could also have lspci -vvv (note the extra v) with and without pcie_aspm=off?

Comment 6 Nathanael Noblet 2011-11-11 02:49:49 UTC
Created attachment 532957 [details]
lspci -vvvv (non-working)

Comment 7 Nathanael Noblet 2011-11-11 02:51:32 UTC
Created attachment 532958 [details]
lspci -vvvv (working)

Comment 8 Matthew Garrett 2011-11-11 03:24:04 UTC
Ah, sorry, those need to be done as root.

Comment 9 Nathanael Noblet 2011-11-11 04:11:55 UTC
Created attachment 532966 [details]
lspci -vvvv (working)

Comment 10 Nathanael Noblet 2011-11-11 04:14:08 UTC
Created attachment 532967 [details]
lspci -vvvv (not-working)
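
For anyone comparing the two root-level dumps, the interesting part is the ASPM field on each device's LnkCtl line. The sketch below, assuming Python 3, extracts that field per device and reports the differences between two lspci -vv style outputs. It assumes the usual layout (an unindented device header followed by indented capability lines), and the function names are illustrative. Without root, lspci does not print the LnkCtl lines at all, so nothing would be reported.

```python
import re

# Matches the ASPM field of a LnkCtl line, e.g. "LnkCtl: ASPM L0s L1 Enabled;"
LNKCTL_RE = re.compile(r"LnkCtl:\s*ASPM\s+([^;]+);")


def aspm_by_device(lspci_text):
    """Map each PCI address to the ASPM state shown on its LnkCtl line."""
    states = {}
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device block: "02:00.0 Ethernet ..."
            device = line.split()[0]
            continue
        match = LNKCTL_RE.search(line)
        if match and device is not None:
            states[device] = match.group(1).strip()
    return states


def aspm_changes(first, second):
    """Return {address: (state_in_first, state_in_second)} where they differ."""
    a = aspm_by_device(first)
    b = aspm_by_device(second)
    return {dev: (a.get(dev), b.get(dev))
            for dev in sorted(set(a) | set(b))
            if a.get(dev) != b.get(dev)}
```

Feeding it the text of the not-working and working attachments would list only the devices whose link ASPM setting differs between the two boots.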

Comment 11 Stanislaw Gruszka 2012-02-26 12:37:04 UTC
atl1c does some ASPM magic in atl1c_set_aspm, which apparently it should not do...

Comment 12 Josh Boyer 2012-02-29 17:55:49 UTC
Matthew had some patches to atl1c to fix the ASPM issues, but they were dropped upstream because further rework in the PCI area was needed to really fix the problems.

Comment 13 Dave Jones 2012-03-22 16:47:27 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 14 Dave Jones 2012-03-22 16:52:10 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 15 Dave Jones 2012-03-22 17:02:39 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 16 Nathanael Noblet 2012-04-25 20:29:19 UTC
So I tried installing F17 beta on the machine with this issue.
The beta seems to have kernel 3.3.0-1.fc17.i686. I'm not sure if the release number matters, but the problem isn't fixed.

Comment 17 Stanislaw Gruszka 2012-05-03 08:21:03 UTC
An Atheros developer posted atl1c patches to net-next that change various register programming code in atl1c, including ASPM. The kernel build below includes the atl1c driver update from net-next. Please check whether it solves the problem:

http://koji.fedoraproject.org/koji/taskinfo?taskID=4045082

Comment 18 Nathanael Noblet 2012-05-17 17:44:01 UTC
Sorry - I didn't get to it in time. Can you post an SRPM I can use to build?

Comment 19 Stanislaw Gruszka 2012-05-18 10:59:00 UTC
We applied this atl1c update because it fixes another bug. The patch is already in kernel-3.3.6-3.fc17; does it also fix this problem?

Comment 20 Nathanael Noblet 2012-07-13 22:50:40 UTC
Seems to have fixed this. I built an F17 boot.iso with an updated kernel to use as the installer (using 3.4.4-5). The install proceeds normally.

Comment 21 Nathanael Noblet 2013-07-17 17:08:23 UTC
So this bug is back. On F19 with 3.9.5-301.fc19 I see the same behaviour on kickstart installs over the network: suddenly I can't ping or resolve any network host. Unplugging and replugging the cable gets it working again, as NM reloads the config or something.

Comment 22 Nathanael Noblet 2013-07-17 17:42:10 UTC
Also, in this case pcie_aspm=off pci_aspm=off doesn't fix the problem.

Comment 23 Michele Baldessari 2013-08-17 23:06:05 UTC
Nathanael,

can you check the latest comment on https://bugzilla.redhat.com/show_bug.cgi?id=995308 ?

Comment 24 Josh Boyer 2013-09-18 20:32:55 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 25 Michele Baldessari 2013-09-20 21:29:52 UTC
Nathanael,

the bug https://bugzilla.redhat.com/show_bug.cgi?id=995308 has long been fixed
and is very similar to this issue. If you could double-check with the newer kernel
and report back, that'd be great.

Thanks,
Michele

Comment 26 Josh Boyer 2013-10-08 17:47:09 UTC
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

