Bug 889749
| Summary: | IOMMU / AMD Vi Event: IO PAGE FAULT causes gbit NIC lockups |
|---|---|
| Product: | [Fedora] Fedora |
| Component: | kernel |
| Version: | 18 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED CANTFIX |
| Severity: | high |
| Priority: | unspecified |
| Reporter: | bob <redzilla.coralnut> |
| Assignee: | Kernel Maintainer List <kernel-maint> |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | flux-redhat, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda |
| Doc Type: | Bug Fix |
| Type: | Bug |
| Last Closed: | 2013-04-08 13:16:51 UTC |
Description
bob
2012-12-23 09:17:31 UTC
Are you still seeing this with 3.8.x?
I'm not a Fedora user, but I noticed that my problem seems very similar to this one, so if it is the same problem, it hasn't been solved for all cases.
I'm running kernel 3.8.5 (from Debian experimental). I cannot disable the IOMMU entirely, as for some reason that breaks my USB. Disabling HPET has no effect. I have yet to try reducing the link speed to 100 Mbit, but that would not be a solution :). The problem occurs only when transferring data out, never when transferring data in.
I too am running a board (Asus Sabertooth 990FX 2.0) with a Realtek R8168 and an AMD FX-8350. The chipset is 990FX/SB950. The system has 16 GB of memory installed.
Fragment from my kernel logs:
Apr 5 21:26:58 aiee kernel: [ 288.814737] AMD-Vi: Event logged [IO_PAGE_FAULT device=0a:00.0 domain=0x001e address=0x0000000000003000 flags=0x0050]
Apr 5 21:27:17 aiee kernel: [ 307.928142] ------------[ cut here ]------------
Apr 5 21:27:17 aiee kernel: [ 307.928155] WARNING: at /build/buildd-linux_3.8.5-1~experimental.1-amd64-_t_ZfP/linux-3.8.5/net/sched/sch_generic.c:254 dev_watchdog+0xe3/0x153()
Apr 5 21:27:17 aiee kernel: [ 307.928159] Hardware name: To be filled by O.E.M.
Apr 5 21:27:17 aiee kernel: [ 307.928163] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
aiee# uname -a
Linux aiee 3.8-trunk-amd64 #1 SMP Debian 3.8.5-1~experimental.1 x86_64 GNU/Linux
aiee# lspci | grep 0a:00.0
0a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 09)
aiee# dmesg
(in attachment)
Created attachment 731992: Kernel log from startup till restart with the problem visible
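For anyone comparing their own logs against the fragment above, here is a minimal Python sketch that pulls the fields out of an AMD-Vi IO_PAGE_FAULT line. The regular expression is modelled only on the line quoted in this report; other kernel versions may format the event differently. The function name `parse_io_page_fault` is made up for illustration.

```python
import re

# Pattern modelled on the single AMD-Vi event line quoted in this report.
# Assumption: other kernels may print the event in a different layout.
IO_PAGE_FAULT_RE = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_io_page_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = IO_PAGE_FAULT_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),             # PCI bus:device.function
        "domain": int(match.group("domain"), 16),    # IOMMU domain id
        "address": int(match.group("address"), 16),  # faulting I/O address
        "flags": int(match.group("flags"), 16),      # raw event flags
    }


if __name__ == "__main__":
    sample = (
        "Apr 5 21:26:58 aiee kernel: [ 288.814737] AMD-Vi: Event logged "
        "[IO_PAGE_FAULT device=0a:00.0 domain=0x001e "
        "address=0x0000000000003000 flags=0x0050]"
    )
    print(parse_io_page_fault(sample))
```

The `device` field can then be matched against `lspci` output; in the report above, 0a:00.0 is the Realtek Ethernet controller.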
> Are you still seeing this with 3.8.x?
I wouldn't know. I filed this bug three months ago in December 2012 and never got a response. When it was evident that nobody cared enough to respond to the bug report I gave up on getting support for the on-board chipset and bought an Intel Gigabit PCIE card. Now things work. Now I have no reason to worry about the kernel/Realtek driver problem.
Sorry I can't help, but I couldn't wait three months for someone to even acknowledge that the problem exists.
I should add that the kernel option iommu=pt resolved my problem. I'm not sure what impact that might have on using KVM, though.
This bug has nothing to do with the IOMMU. The real cause is the Realtek driver. Here is the real bug and the patch to fix it (kudos to Francois Romieu): https://bugzilla.kernel.org/show_bug.cgi?id=14962
That change was integrated in v3.5-rc2-237-geb2dc35 and the problem still persisted, so it doesn't seem to be the root cause.
...but maybe a similar fix, simply enumerating the version in the switch-case statement, would apply here. I don't promise to try it, though :). The iommu=pt kernel switch has indeed been a 100% workaround for the issue for me.
After some digging I find that my card (one of the 8168F family) would be either RTL_GIGA_MAC_VER_35 or RTL_GIGA_MAC_VER_36, and the patch is only for RTL_GIGA_MAC_VER_34, so it may very well be the solution. Should've realized it earlier, I had seen the patch :(. Thanks for the pointer!
If it turns out to be the case then it should really be a module option (as well) so people can easily try it out.
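For the iommu=pt workaround mentioned in this thread, a small Python sketch like the following can confirm whether the running kernel was actually booted with that option. It only inspects the command-line string; `/proc/cmdline` is the standard Linux location, while the helper names and the sample command line are made up for illustration. Returning the last `iommu=` occurrence is a simplification, not a statement about how the kernel resolves repeated options.

```python
def iommu_option(cmdline):
    """Return the value of the last iommu= token in a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("iommu="):
            value = token[len("iommu="):]
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    # Hypothetical command line, used only to show the expected token format.
    sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro iommu=pt quiet"
    print(iommu_option(sample))
```

On a live system, `iommu_option(read_cmdline())` gives the value in effect for the current boot.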