Bug 428950

Summary:	Memory leak (slub) with kernel 2.6.23
Product:	[Fedora] Fedora	Reporter:	fdupoux <fdbugs>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED DUPLICATE	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	low
Version:	7
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2008-01-18 23:11:53 UTC	Type:	---

Description fdupoux 2008-01-16 11:58:39 UTC

Description of problem:
There is a memory leak in the 2.6.23 kernel that comes with a "yum upgrade" on
fedora7-amd64. The same problem occurs with 2.6.23.1-10.fc7 and
kernel-2.6.23.12-52.fc7. The slabinfo program reports that the SLUB cache
":0000016" is growing quickly. The 4GB server ran out of memory after a few
weeks of very low activity and was no longer able to process network packets
(it's a firewall).

Version-Release number of selected component (if applicable):
Linux hostname 2.6.23.1-10.fc7 #1 SMP Fri Oct 19 14:35:28 EDT 2007 x86_64 x86_64
x86_64 GNU/Linux

How reproducible:
I don't know which kernel module is allocating from this SLUB cache. Our
hardware is an HP-Proliant-DL140-G3 server with 2 CPUs (2 Quad Core).

Steps to Reproduce:
1. install fedora7-amd64 on HP-Proliant-DL140-G3
2. upgrade to 2.6.23.1-10.fc7 or kernel-2.6.23.12-52.fc7
3. use only some network applications (shorewall/iptables, ssh)
  
Actual results:
I wrote a script to monitor the growth of the 16-byte SLUB objects. Here is
the result:
http://pastebin.com/f413aa933
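
The reporter's monitoring script is only available through the pastebin link above. As a rough illustration of the same idea (not the original script), the sketch below samples the object count of the ":0000016" cache and prints how fast it grows. It assumes the SLUB sysfs layout, where each cache directory has an "objects" attribute whose first field is the total object count. The directory locations (/sys/slab on kernels of this era, /sys/kernel/slab on later ones) and the helper names are assumptions for illustration.

```python
import time
from pathlib import Path

# Assumption: SLUB caches appear under one of these sysfs directories.
# Kernels of the 2.6.23 era used /sys/slab; later kernels use /sys/kernel/slab.
SLAB_DIRS = (Path("/sys/kernel/slab"), Path("/sys/slab"))
CACHE_NAME = ":0000016"  # merged 16-byte cache named in this report


def parse_object_count(text):
    """Return the leading integer of an 'objects' attribute, e.g. '1200 N0=1200'."""
    return int(text.split()[0])


def growth_per_hour(samples):
    """Average growth in objects/hour over (unix_seconds, count) samples, oldest first."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (n1 - n0) * 3600.0 / (t1 - t0)


def read_object_count(cache=CACHE_NAME):
    """Read the current object count of one cache from sysfs (usually needs root)."""
    for base in SLAB_DIRS:
        path = base / cache / "objects"
        if path.exists():
            return parse_object_count(path.read_text())
    raise FileNotFoundError("no SLUB sysfs entry found for cache " + cache)


def monitor(interval_s=600, rounds=6):
    """Sample the cache every interval_s seconds and print the running growth rate."""
    samples = []
    for i in range(rounds):
        samples.append((time.time(), read_object_count()))
        if len(samples) > 1:
            print("%d objects, %.0f objects/hour" % (samples[-1][1], growth_per_hour(samples)))
        if i < rounds - 1:
            time.sleep(interval_s)
```

Calling monitor() on an affected machine would print one line per sample after the first. A count that keeps rising on an idle system is the pattern described in this report.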

Additional info:
We had exactly the same problem on two servers. The two servers have the same
hardware (HP-Proliant-DL140-G3) since they are redundant firewalls. The problem
was reported by a script when the memory was already very low (after an uptime
of a few weeks). Both servers have very low activity.

Here are other details:
We noticed the problem when the servers became very slow to respond to ssh
connections. The CPU appeared very busy only because it was waiting on I/O
(all the processes were at around 0% CPU). This is just a consequence of the
kernel not being able to allocate memory. (http://pastebin.com/f20a67f4d)

Here is the slabinfo output from just before we rebooted the server:
http://pastebin.com/f1194768c
Before the reboot, some kernel threads crashed:
http://pastebin.com/f28724444
Rebooting allowed us to connect to the server again, but the memory leak is
still occurring.

Comment 1 fdupoux 2008-01-16 12:02:57 UTC

Here are the kernel modules currently loaded on one of the servers that is still
running with the memory leak: http://pastebin.com/f64858e3d

Comment 2 Chuck Ebbert 2008-01-16 22:56:07 UTC

Can you boot with the kernel option slub_debug=U?

Also see bug 352281
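
For readers following along: slub_debug=U turns on SLUB user tracking, which records the call site of each allocation and free. With it enabled, the cache's "alloc_calls" attribute in sysfs should list allocation call sites with a count per site. The sketch below ranks those sites so that the likely source of the leaking 16-byte objects stands out. It assumes each line starts with a count followed by a symbol+offset field. The sample text and its symbol names are made up for illustration and are not output from this system.

```python
import re

# Assumed line shape of alloc_calls:
#   "<count> <symbol>+0x<offset>/0x<size> age=... pid=... cpus=..."
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)")

# Hypothetical sample, NOT taken from the affected servers.
SAMPLE = """\
     12 alloc_a+0x1b/0x60 age=100/200/300 pid=1 cpus=0
 480213 alloc_b+0x2c/0x90 age=5/9000/18000 pid=0-4000 cpus=0-7
"""


def top_allocators(text, limit=5):
    """Return up to `limit` (count, call_site) pairs, largest count first."""
    sites = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # skip blank or unrecognized lines
            sites.append((int(match.group(1)), match.group(2)))
    sites.sort(key=lambda site: site[0], reverse=True)
    return sites[:limit]


for count, site in top_allocators(SAMPLE):
    print("%8d  %s" % (count, site))
```

On a machine booted with the option, the text passed to top_allocators would be the contents of the alloc_calls file in the ":0000016" cache directory instead of SAMPLE.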

Comment 3 fdupoux 2008-01-17 06:53:24 UTC

Thanks for your quick reply. It seems to be the same bug as 352281, since 16-byte
objects are involved.

Unfortunately I cannot reboot the server, since that would break all the current
connections. All I can do is collect information from the running system that
currently has the memory leak. Thanks.

Comment 4 Chuck Ebbert 2008-01-18 23:11:53 UTC

Closing as dup of 352281.

*** This bug has been marked as a duplicate of 352281 ***