Bug 716623

Summary: Frequent page allocation failures - forcedeth-related?
Product: [Fedora] Fedora
Reporter: Adam Huffman <bloch>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WORKSFORME
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 15
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, sergio
Hardware: Unspecified
OS: Unspecified
Last Closed: 2012-06-06 15:27:28 UTC
Attachments: kernel error log

Description Adam Huffman 2011-06-25 16:57:44 UTC
Created attachment 509910: kernel error log

Description of problem:

I have a couple of Asrock ION boxes that use NFS quite heavily.  In F13 and F14 I saw lots of kernel errors that seemed to be correlated with significant network traffic.  I've just updated one of them to F15 and the error rate seems to have increased.

An example is attached.



Version-Release number of selected component (if applicable):
2.6.38.8-32.fc15.x86_64


Comment 1 Adam Huffman 2011-06-25 16:58:39 UTC
lspci:


00:00.0 Host bridge: nVidia Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: nVidia Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: nVidia Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: nVidia Corporation MCP79 Co-processor (rev b1)
00:04.0 USB Controller: nVidia Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB Controller: nVidia Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: nVidia Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: nVidia Corporation MCP79 PCI Bridge (rev b1)
00:0a.0 Ethernet controller: nVidia Corporation MCP79 Ethernet (rev b1)
00:0b.0 SATA controller: nVidia Corporation MCP79 AHCI Controller (rev b1)
00:10.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
00:15.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
01:00.0 VGA compatible controller: nVidia Corporation ION VGA (rev b1)
02:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)

Comment 2 Chuck Ebbert 2011-06-27 05:11:18 UTC
It's trying to find 16k of contiguous space for packets. Are you using large packets?

Comment 3 Adam Huffman 2011-06-27 12:24:18 UTC
Yes, MTU is set to 9000 on that interface.  Is that not advised with forcedeth?  I tried something similar on a different machine on the same network, with a different chipset, and the driver in that case gives a warning when the MTU is set above 1500.  I didn't see a similar warning with the machine referred to here.

Is this the sort of thing that would be fixed by adding more RAM, or is it the result of memory fragmentation, which would happen regardless of the total size?
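
One way to look at fragmentation directly is `/proc/buddyinfo`, which lists the number of free blocks per allocation order for each memory zone. The sketch below parses that format. The sample text is made up for illustration and is not taken from the affected machine, and the helper names are hypothetical.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone Normal <count> <count> ...
        if len(parts) < 5 or parts[0] != "Node":
            continue
        node = int(parts[1].rstrip(","))
        zone = parts[3]
        result[(node, zone)] = [int(n) for n in parts[4:]]
    return result


def free_blocks_at_or_above(counts, order):
    """Count free blocks large enough to satisfy an allocation of `order`."""
    return sum(counts[order:])


# Illustrative sample only: plenty of free single pages, very few larger blocks.
SAMPLE = "Node 0, zone   Normal    120     40      3      0      0      0      0      0      0      0      0\n"

zones = parse_buddyinfo(SAMPLE)
for (node, zone), counts in zones.items():
    print(node, zone, "free blocks of order >= 2:", free_blocks_at_or_above(counts, 2))
```

On a live system the same function could be fed `open("/proc/buddyinfo").read()`. Low counts in the higher-order columns while the order-0 column stays large would be consistent with fragmentation rather than an overall shortage of RAM.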

Comment 4 Adam Huffman 2011-06-27 22:19:01 UTC
I've switched back to MTU=1500 and there's no sign of those failures so far.

Comment 5 Chuck Ebbert 2011-06-29 14:16:50 UTC
(In reply to comment #3)
> Yes, MTU is set to 9000 on that interface.  Is that not advised with forcedeth?
>  I tried something similar on a different machine on the same network, with a
> different chipset, and the driver in that case gives a warning when the MTU is
> set above 1500.  I didn't see a similar warning with the machine referred to
> here.
> 
> Is this the sort of thing that would be fixed by adding more RAM, or is it the
> result of memory fragmentation, which would happen regardless of the total
> size?

It's due to fragmentation; you might be able to set MTU to something around 3500 bytes and still only use 1-page allocations. (I'm not sure what the overhead is.)
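
As a rough illustration of the arithmetic behind this suggestion: assuming 4096-byte pages, and assuming the receive buffer is rounded up to the next power of two, an MTU of 9000 lands in a 16 KiB (order-2) allocation, while an MTU near 3500 still fits in a single page. The `overhead` value below is a placeholder, not the driver's actual figure, and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def alloc_bytes(mtu, overhead=256):
    """Round mtu + overhead up to the next power of two.

    `overhead` is a placeholder for headers and bookkeeping; the real
    per-packet overhead is not stated in this report.
    """
    need = mtu + overhead
    size = 1
    while size < need:
        size *= 2
    return size


def alloc_order(mtu, overhead=256):
    """Page allocation order: 0 means one page, 1 means two, 2 means four."""
    pages = max(1, alloc_bytes(mtu, overhead) // PAGE_SIZE)
    return pages.bit_length() - 1


for mtu in (1500, 3500, 9000):
    print(mtu, alloc_bytes(mtu), alloc_order(mtu))
```

Under these assumptions, only order-0 requests can be satisfied from any single free page. Anything larger needs physically contiguous pages, which is where fragmentation starts to matter.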

Comment 6 Sergio Basto 2011-07-14 16:19:05 UTC
Have you tried the new kernel update, kernel 2.6.38.8-35?

Comment 7 Sergio Basto 2011-07-14 16:22:45 UTC
My issue is fixed in kernel 2.6.38.8-35. I think it was this change:

* Wed Jul 06 2011 Chuck Ebbert <cebbert> 2.6.38.8-35
- Revert SCSI/block patches from 2.6.38.6 that caused more problems