Bug 716623

Summary:

Frequent page allocation failures - forcedeth-related?

Product:

[Fedora] Fedora

Reporter:

Adam Huffman <bloch>

Component:

kernel

Assignee:

Kernel Maintainer List <kernel-maint>

Status:

CLOSED WORKSFORME

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

CC:

gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, sergio

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Last Closed:

2012-06-06 15:27:28 UTC

Attachments:

Description	Flags
kernel error log	none

Description Adam Huffman 2011-06-25 16:57:44 UTC

Created attachment 509910
kernel error log

Description of problem:

I have a couple of Asrock ION boxes that use NFS quite heavily.  In F13 and F14 I saw lots of kernel errors that seemed to be correlated with significant network traffic.  I've just updated one of them to F15 and the error rate seems to have increased.

An example is attached.



Version-Release number of selected component (if applicable):
2.6.38.8-32.fc15.x86_64

Comment 1 Adam Huffman 2011-06-25 16:58:39 UTC

lspci:


00:00.0 Host bridge: nVidia Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: nVidia Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: nVidia Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: nVidia Corporation MCP79 Co-processor (rev b1)
00:04.0 USB Controller: nVidia Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB Controller: nVidia Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: nVidia Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: nVidia Corporation MCP79 PCI Bridge (rev b1)
00:0a.0 Ethernet controller: nVidia Corporation MCP79 Ethernet (rev b1)
00:0b.0 SATA controller: nVidia Corporation MCP79 AHCI Controller (rev b1)
00:10.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
00:15.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
01:00.0 VGA compatible controller: nVidia Corporation ION VGA (rev b1)
02:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)

Comment 2 Chuck Ebbert 2011-06-27 05:11:18 UTC

It's trying to find 16k of contiguous space for packets. Are you using large packets?
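For context on where the "16k" comes from: large buffers are served as blocks of 2^order physically contiguous pages, so a request is effectively rounded up to the next power-of-two number of pages. The sketch below illustrates that rounding, assuming 4 KiB pages; the helper names `allocation_order` and `block_size` are hypothetical, and the per-packet overhead forcedeth adds on top of the MTU is not modeled (it isn't known from this report).

```python
# Minimal sketch: round a buffer size up to a power-of-two number of pages,
# the way the Linux buddy allocator serves "order-N" requests.
# Assumption: 4 KiB pages (typical on x86_64). Driver/skb overhead not modeled.
PAGE_SIZE = 4096


def allocation_order(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Return the smallest order such that page_size << order >= nbytes."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    order = 0
    while (page_size << order) < nbytes:
        order += 1
    return order


def block_size(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of the contiguous block actually requested."""
    return page_size << allocation_order(nbytes, page_size)


if __name__ == "__main__":
    for mtu in (1500, 3500, 9000):
        print(f"MTU {mtu:>5}: order {allocation_order(mtu)}, "
              f"{block_size(mtu) // 1024} KiB contiguous")
```

With 9000-byte frames each receive buffer needs four contiguous pages (16 KiB, order 2), whereas 1500-byte frames fit in a single page.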

Comment 3 Adam Huffman 2011-06-27 12:24:18 UTC

Yes, MTU is set to 9000 on that interface.  Is that not advised with forcedeth?  I tried something similar on a different machine on the same network, with a different chipset, and the driver in that case gives a warning when the MTU is set above 1500.  I didn't see a similar warning with the machine referred to here.

Is this the sort of thing that would be fixed by adding more RAM, or is it the result of memory fragmentation, which would happen regardless of the total size?

Comment 4 Adam Huffman 2011-06-27 22:19:01 UTC

I've switched back to MTU=1500 and there's no sign of those failures so far.

Comment 5 Chuck Ebbert 2011-06-29 14:16:50 UTC

(In reply to comment #3)
> Yes, MTU is set to 9000 on that interface.  Is that not advised with forcedeth?
>  I tried something similar on a different machine on the same network, with a
> different chipset, and the driver in that case gives a warning when the MTU is
> set above 1500.  I didn't see a similar warning with the machine referred to
> here.
> 
> Is this the sort of thing that would be fixed by adding more RAM, or is it the
> result of memory fragmentation, which would happen regardless of the total
> size?

It's due to fragmentation; you might be able to set MTU to something around 3500 bytes and still only use 1-page allocations. (I'm not sure what the overhead is.)
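Fragmentation of this kind can be observed in /proc/buddyinfo, which lists, per memory zone, how many free blocks of each order (0, 1, 2, ...) are available. A machine can have plenty of free memory in total and still have no free order-2 blocks, which is when a 16k request fails; a buffer of roughly 3500 bytes plus overhead would only draw on the order-0 column. The sketch below parses text in that format. The helper names are hypothetical, 4 KiB pages are assumed, and the sample line is made up for illustration, not taken from the reporter's machine.

```python
# Minimal sketch: parse /proc/buddyinfo-style text and compare total free
# memory with memory free in blocks large enough for an order-N request.
# Assumption: 4 KiB pages. Sample data below is hypothetical.
PAGE_SIZE = 4096


def parse_buddyinfo(text: str) -> dict:
    """Map 'Node N/Zone' -> list of free-block counts indexed by order."""
    zones = {}
    for line in text.splitlines():
        if "zone" not in line:
            continue
        head, _, rest = line.partition("zone")
        node = head.strip().rstrip(",")  # e.g. "Node 0"
        fields = rest.split()
        name, counts = fields[0], [int(c) for c in fields[1:]]
        zones[f"{node}/{name}"] = counts
    return zones


def free_bytes(counts, min_order=0, page_size=PAGE_SIZE):
    """Free bytes held in blocks of order >= min_order."""
    return sum(n * (page_size << order)
               for order, n in enumerate(counts) if order >= min_order)


if __name__ == "__main__":
    # Hypothetical sample: many free order-0/1 blocks, none at order >= 2.
    sample = "Node 0, zone   Normal   5000   800   0   0   0   0   0   0   0   0   0\n"
    for zone, counts in parse_buddyinfo(sample).items():
        print(zone,
              "free total:", free_bytes(counts) // 1024, "KiB;",
              "in order>=2 blocks:", free_bytes(counts, min_order=2) // 1024, "KiB")
    # On a live system: parse_buddyinfo(open("/proc/buddyinfo").read())
```

A large total with near-zero counts from the order-2 column onward is the pattern described here: the failure is about contiguity, not about total free memory.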

Comment 6 Sergio Basto 2011-07-14 16:19:05 UTC

Have you tried the new kernel update, kernel 2.6.38.8-35?

Comment 7 Sergio Basto 2011-07-14 16:22:45 UTC

My issue is fixed in kernel 2.6.38.8-35. I think the relevant change was:

* Wed Jul 06 2011 Chuck Ebbert <cebbert> 2.6.38.8-35
- Revert SCSI/block patches from 2.6.38.6 that caused more problems