| Summary: | [atl1c] network is unreliable (regression from F17) | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Nathanael Noblet <nathanael> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Priority: | unspecified |
| Version: | 19 | Keywords: | Reopened |
| CC: | gansalmon, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, michele, michele, nathanael, sgruszka | | |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | Bug Fix | Last Closed: | 2013-10-08 17:47:09 UTC |
Description
Nathanael Noblet
2011-11-08 09:15:41 UTC
So I've been able to narrow down the behaviour to something between 2.6.38.4 and 2.6.38.5, but I can't see any changes in git log that point to anything untoward.

Scratch that. 2.6.38.4 took a long time to fail. I've tested 2.6.38.1 and 2.6.38.2: .1 was able to git clone the kernel repo 5 times (with one remote branch), while .2 never completed a single clone. There have been no changes to the driver itself between those two trees. Where do I look now?

jwb on IRC suggested adding pcie_aspm=off to the kernel command line. Doing so keeps the network working continuously: I can complete an entire kickstart network install without a single ifdown/ifup cycle.

Created attachment 532937
lspci -t and lspci -vvnn
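The pcie_aspm=off workaround only has an effect if the parameter actually reached the running kernel, whose command line Linux exposes in /proc/cmdline. The Python sketch below is a hypothetical helper (kernel_param is not part of any tool mentioned in this report) for looking up a parameter in such a string. It assumes plain whitespace-separated tokens, so quoted values containing spaces are not handled, and if a parameter is repeated it simply keeps the last occurrence.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Look up `name` in a kernel command-line string (hypothetical helper).

    Returns None if the parameter is absent, "" if it appears as a bare
    flag (e.g. "quiet"), and the text after the first "=" otherwise.
    Splits on whitespace only, so quoted values with spaces are not
    handled. If the parameter appears more than once, the last one wins.
    """
    value: Optional[str] = None
    for token in cmdline.split():
        # partition() splits on the first "=" only, so "root=UUID=x"
        # yields key "root" and value "UUID=x".
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    # Illustrative string; on a running system, read /proc/cmdline instead.
    example = "ro quiet pcie_aspm=off"
    print(kernel_param(example, "pcie_aspm"))  # off
    print(repr(kernel_param(example, "quiet")))  # ''
```

Checking /proc/cmdline rather than the bootloader configuration shows what the booted kernel actually received, which matters when comparing working and non-working boots.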
Thanks. Any chance we could also have lspci -vvv (note the extra v) with and without pci_aspm=off?

Created attachment 532957
lspci -vvvv (non-working)

Created attachment 532958
lspci -vvvv (working)

Ah, sorry, those need to be run as root.

Created attachment 532966
lspci -vvvv (working)

Created attachment 532967
lspci -vvvv (not-working)
atl1c does some ASPM magic in atl1c_set_aspm which it apparently should not do. Matthew had some patches to atl1c to fix the ASPM issues, but they were dropped upstream because further rework in the PCI area was needed to really fix the problems.

[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.

So I tried installing the F17 beta on the machine with this issue. The beta seems to ship kernel 3.3.0-1.fc17.i686; I'm not sure whether the release number matters, but the problem isn't fixed.

An Atheros developer posted atl1c patches to net-next that change various register-programming code in atl1c, including ASPM. The kernel build below includes the atl1c driver update from net-next. Please check whether it solves the problem: http://koji.fedoraproject.org/koji/taskinfo?taskID=4045082

Sorry, I didn't get to it in time. Can you post an SRPM I can use to build?

We applied this atl1c update because it fixes another bug. The patch is already in kernel-3.3.6-3.fc17; does it also fix this problem?

It seems to have fixed this. I built an F17 boot.iso with the updated kernel (3.4.4-5) to use as the installer, and the install proceeds normally.

So this bug is back... On F19 with 3.9.5-301.fc19 I see the same behaviour on kickstart installs over the network: suddenly I can't ping or resolve any network host. Unplugging and replugging the cable makes it work again, apparently because NetworkManager reloads the config or something similar. Also, in this case pcie_aspm=off pci_aspm=off doesn't fix the problem.

Nathanael, can you check the latest comment on https://bugzilla.redhat.com/show_bug.cgi?id=995308 ?

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Nathanael, the bug https://bugzilla.redhat.com/show_bug.cgi?id=995308 was fixed long ago and is very similar to this issue. If you could double-check the newer kernel and report back, that'd be great.

Thanks,
Michele

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.