Bug 243657

Summary: [PATCH] Fix memory leak of dma_alloc_coherent() on x86_64
Product: Red Hat Enterprise Linux 4 Reporter: Masaki MAENO <maeno.masaki>
Component: kernel Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 4.5 CC: alan.tyson, alice.pancamo, juanino, tao
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0791 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-15 16:28:28 UTC Type: ---
Bug Depends On:    
Bug Blocks: 282351    
Attachments:
  Description                                        Flags
  dma_alloc_coherent memleak fix patch               none
  memleak example                                    none
  debug code (MMDEBUG) to get evidence               none
  evidence of memleak by debug code (MMDEBUG line)   none
  RHEL4.6 Fix for this issue                         none

Description Masaki MAENO 2007-06-11 10:15:51 UTC
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=

[PATCH] Fix memory leak of dma_alloc_coherent() on x86_64

Description of problem:

The function dma_alloc_coherent() leaks memory on x86_64.

The problem is especially severe on machines with the HP ProLiant
Support Pack (PSP) installed, because the leaking code path is hit
frequently through cciss_ioctl() calls made by cmaidad and cmaeventd,
which run whenever PSP is installed.
In one HP PSP environment the leak grew at a rate of between 100 KB/h
and 15 MB/h. The kernel eventually stopped responding and the machine
rebooted.


This problem has already been fixed in vanilla kernel 2.6.10.
  - ChangeLog         : http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.10
  - VanillaKernelPatch: http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=792b87d770df447f3e4190d2b4732a3a36800adb
  > <ak>
  > 	[PATCH] x86_64: Fallback to swiotlb for dma_alloc_coherent
  > 	
  > 	From: Suresh B Siddha
  > 	
  > 	Coresponding change to IA64 code is in, so this can be merged too.
  > 	
  > 	- fallback to swiotlb for consistent DMA mappings
  > 	- fix a memory leak in dma_alloc_coherent


I hope Red Hat will include the attached patch in a RHEL4.5 kernel
errata release.
  >diff -urN kernel-2.6.9-55.EL.org/arch/x86_64/kernel/pci-gart.c kernel-2.6.9-55.EL/arch/x86_64/kernel/pci-gart.c
  >--- kernel-2.6.9-55.EL.org/arch/x86_64/kernel/pci-gart.c      2007-06-11 17:58:08.000000000 +0900
  >+++ kernel-2.6.9-55.EL/arch/x86_64/kernel/pci-gart.c      2007-06-11 17:58:38.000000000 +0900
  >@@ -238,6 +238,7 @@
  >                        if (high) {
  >                                if (!(gfp & GFP_DMA)) {
  >                                        gfp |= GFP_DMA;
  >+                                       free_pages((unsigned long)memory, get_order(size));
  >                                        goto again;
  >                                }
  >                                goto free;
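
To make the failure mode in the hunk above easier to see, here is a minimal user-space sketch of the same retry pattern. It is not the kernel code: fake_alloc(), fake_free(), alloc_coherent_sim() and leaked_after() are hypothetical stand-ins that only count outstanding allocations, and the "high" test is simplified to a comparison against a 32-bit mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the page allocator: they only keep a count. */
static long pages_outstanding;

struct fake_block { uint64_t bus_addr; };
static struct fake_block pool[2];

static struct fake_block *fake_alloc(int dma_zone)
{
    struct fake_block *b = &pool[dma_zone ? 1 : 0];
    /* Assumption: the normal zone lands above 4GB, the DMA zone below it. */
    b->bus_addr = dma_zone ? 0x00100000ULL : 0x100000000ULL;
    pages_outstanding++;
    return b;
}

static void fake_free(struct fake_block *b)
{
    (void)b;
    pages_outstanding--;
}

/* Mirrors the retry logic; free_before_retry != 0 models the patched code. */
static struct fake_block *alloc_coherent_sim(uint64_t dma_mask,
                                             int free_before_retry)
{
    int dma_zone = 0;
    struct fake_block *mem;
again:
    mem = fake_alloc(dma_zone);
    if (mem->bus_addr > dma_mask) {        /* simplified "high" check */
        if (!dma_zone) {
            dma_zone = 1;                  /* like gfp |= GFP_DMA */
            if (free_before_retry)
                fake_free(mem);            /* the one-line fix */
            goto again;                    /* without the fix, mem is lost */
        }
        fake_free(mem);
        return NULL;
    }
    return mem;
}

/* Run `calls` allocate/free cycles and report what is still outstanding. */
static long leaked_after(int calls, int free_before_retry)
{
    assert(calls >= 0);
    pages_outstanding = 0;
    for (int i = 0; i < calls; i++) {
        struct fake_block *mem =
            alloc_coherent_sim(0xffffffffULL, free_before_retry);
        if (mem)
            fake_free(mem);                /* caller releases its buffer */
    }
    return pages_outstanding;
}
```

In this model, leaked_after(n, 0) leaves one allocation outstanding per call because the first, too-high buffer is overwritten by the retry, while leaked_after(n, 1) leaves none.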


Steps to Reproduce & Actual results:

1. Install the HP ProLiant Support Pack.
2. The hpasm service runs (cmaidad and cmaeventd are active).
3. The hidden memory (*1) keeps increasing.
   (example: memleak.png (vertical axis: leaked amount [KB], horizontal axis: time [min]))
   (*1): The hidden memory is the value of "MemTotal - MemFree - MemUsage(*2)",
         computed from /proc/meminfo.
   (*2): "MemUsage" is the value of "Active + Inactive + Slab + PageTables
         + VmallocUsed", computed from /proc/meminfo.
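
The hidden-memory figure defined in (*1) and (*2) could be computed along the following lines. This is only a sketch: meminfo_field() and hidden_kb() are hypothetical helper names, and the code assumes the text is in the usual "Name:   value kB" layout of /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of field `key` in meminfo-style text, or -1. */
static long meminfo_field(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p && *p) {
        /* Match only at the start of a line and only the exact field name. */
        if (strncmp(p, key, klen) == 0 && p[klen] == ':') {
            long v;
            if (sscanf(p + klen + 1, "%ld", &v) == 1)
                return v;
            return -1;
        }
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}

/* hidden = MemTotal - MemFree - (Active + Inactive + Slab + PageTables
 *          + VmallocUsed); returns -1 if any field is missing. */
static long hidden_kb(const char *text)
{
    static const char *used[] = {
        "Active", "Inactive", "Slab", "PageTables", "VmallocUsed"
    };
    long total, free_kb, hidden;

    assert(text != NULL);
    total = meminfo_field(text, "MemTotal");
    free_kb = meminfo_field(text, "MemFree");
    if (total < 0 || free_kb < 0)
        return -1;

    hidden = total - free_kb;
    for (size_t i = 0; i < sizeof(used) / sizeof(used[0]); i++) {
        long v = meminfo_field(text, used[i]);
        if (v < 0)
            return -1;
        hidden -= v;
    }
    return hidden;
}
```

To use it, read the contents of /proc/meminfo into a NUL-terminated buffer and pass that to hidden_kb(); sampling the result periodically would give a curve like the one in memleak.png.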

Expected results:

dma_alloc_coherent() does not leak memory.

Comment 1 Masaki MAENO 2007-06-11 10:15:51 UTC
Created attachment 156692 [details]
dma_alloc_coherent memleak fix patch

Comment 2 Masaki MAENO 2007-06-11 10:18:29 UTC
Created attachment 156693 [details]
memleak example

Comment 3 Masaki MAENO 2007-06-11 10:35:08 UTC
Created attachment 156694 [details]
debug code (MMDEBUG) to get evidence

Comment 4 Masaki MAENO 2007-06-11 10:37:18 UTC
Created attachment 156696 [details]
evidence of memleak by debug code (MMDEBUG line)

Comment 5 Masaki MAENO 2007-06-12 02:32:06 UTC
For reference, here are the conditions under which the leak occurs.

* Condition:
- Arch: x86_64
- Memory: larger than 4GB (if cciss)
- Function: arch/x86_64/kernel/pci-gart.c:dma_alloc_coherent()
- Detail:
    About 4KB is leaked each time the bus address of the allocated
    memory is 4GB or above. (if cciss)
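
As a rough cross-check, the 4KB leak size can be combined with the leak rates quoted in the description (100KB/h to 15MB/h) to estimate how often the leaking path is taken. The sketch below assumes that every leak event loses exactly one 4KB allocation and that 1MB = 1024KB; leaks_per_hour() is a hypothetical helper name.

```c
#include <assert.h>

/* Assumption: each leak event loses exactly one 4 KB allocation. */
#define LEAK_KB 4L

/* Number of leak events per hour implied by an observed leak rate in KB/h. */
static long leaks_per_hour(long rate_kb_per_hour)
{
    assert(rate_kb_per_hour >= 0);
    return rate_kb_per_hour / LEAK_KB;
}
```

Under these assumptions, 100KB/h corresponds to 25 leaked allocations per hour and 15MB/h (15360KB/h) to 3840 per hour.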


Comment 6 Prarit Bhargava 2007-06-13 14:19:17 UTC
I took Masaki's test code and ran it on an AMD box in Westford.  Sure enough,
there is a memory leak.  I patched the kernel with the patch above and the leak
went away.

I'm redoing the patch and will submit it to rhkernel-list.

P.

Comment 7 Prarit Bhargava 2007-06-13 14:26:23 UTC
Created attachment 156880 [details]
RHEL4.6 Fix for this issue

Comment 8 RHEL Program Management 2007-06-13 14:32:09 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 9 Masaki MAENO 2007-06-14 08:16:14 UTC
OK. Thank you.
I hope the new RHEL4 maintenance kernel is released soon.
 

Comment 10 Jason Baron 2007-06-19 14:09:38 UTC
Committed in stream U6 build 55.9. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 12 Masaki MAENO 2007-06-20 02:54:01 UTC
I got your kernel-2.6.9-55.9 tree and confirmed that this patch is included.
I also confirmed that it boots and works well.
Thank you.


Comment 29 Zhang Kexin 2007-10-31 13:41:14 UTC
I don't have suitable hardware to run the test (it needs a system with a
device that calls dma_alloc_coherent; an SB600 system should be OK), so I
only did a code review.

Comment 31 errata-xmlrpc 2007-11-15 16:28:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html