Bug 682324 - Unknown kernel memory leak
Summary: Unknown kernel memory leak
Keywords:
Status: CLOSED DUPLICATE of bug 683568
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Hardware: i386
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Eric Paris
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-03-04 20:34 UTC by Jan ONDREJ
Modified: 2011-04-19 05:41 UTC
CC List: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-04-18 17:53:46 UTC
Type: ---
Embargoed:


Attachments
/sys/kernel/slab data + free + top after 5 days of running (17.32 KB, application/x-bzip)
2011-03-14 08:19 UTC, Jan ONDREJ

Description Jan ONDREJ 2011-03-04 20:34:52 UTC
Description of problem:
After approx. 10 days of uptime, my KVM guest machine has >400 MB of memory in use, with minimal swap and minimal buffers. These 400 MB remain in use even after almost all processes are killed.

Version-Release number of selected component (if applicable):
Linux stats.upjs.sk 2.6.35.11-83.fc14.i686.PAE #1 SMP Mon Feb 7 06:57:55 UTC 2011 i686 i686 i386 GNU/Linux
(host is a fully updated f13 2.6.34.7-66.fc13.x86_64)

How reproducible:
Seeing it now on 2 servers; unable to check the other servers, because I need to kill all processes to see how the RAM is being used.
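
(For reference, kernel slab usage can also be read directly, without killing processes; a quick check, assuming /proc/meminfo and slabtop from procps are available:)

# grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo
# slabtop -o -s c | head -20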

Additional info:
[root@stats ~]# lsmod 
Module                  Size  Used by
sunrpc                165546  1 
ip6t_REJECT             3470  1 
ip6table_filter         1207  1 
ip6_tables              9929  1 ip6table_filter
ipv6                  229581  37 ip6t_REJECT
nf_conntrack_netbios_ns     1126  0 
i2c_piix4              10574  0 
i2c_core               21445  1 i2c_piix4
virtio_net             11031  0 
virtio_balloon          3557  0 
raid1                  17323  2 
virtio_blk              4059  6 
virtio_pci              4902  0 
[root@stats ~]# 

top sorted by memory usage:
top - 21:28:42 up 10 days, 13:33,  1 user,  load average: 0.06, 0.23, 0.47
Tasks:  71 total,   1 running,  70 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    508596k total,   441048k used,    67548k free,     3780k buffers
Swap:   524284k total,     6664k used,   517620k free,    21376k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
27409 root      20   0  6708 1820  720 S  0.0  0.4   0:00.74 bash               
 9528 root      20   0  2592  960  760 R  0.0  0.2   0:00.02 top                
 1668 ntp       20   0  5300  452  344 S  0.0  0.1   0:15.90 ntpd               
27406 root      20   0 12224  364  256 S  0.0  0.1   0:00.71 sshd               
 1343 root      20   0 30436   24   24 S  0.0  0.0   0:21.88 rsyslogd           
 1472 root      20   0  2764    4    4 S  0.0  0.0   0:04.79 mdadm              
 2026 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty           
 2028 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty           
 2030 root      20   0  2012    4    4 S  0.0  0.0   0:00.00 agetty             
 2031 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty           
 2033 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty           
 2035 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty           
 2041 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty           
    1 root      20   0  2872    0    0 S  0.0  0.0   5:50.42 init               
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd           
    3 root      20   0     0    0    0 S  0.0  0.0   1:29.00 ksoftirqd/0        
    4 root      RT   0     0    0    0 S  0.0  0.0   1:18.29 migration/0        
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0         

slabtop:
 Active / Total Objects (% used)    : 6960524 / 6993548 (99.5%)
 Active / Total Slabs (% used)      : 100678 / 100678 (100.0%)
 Active / Total Caches (% used)     : 58 / 77 (75.3%)
 Active / Total Size (% used)       : 403166.52K / 406684.44K (99.1%)
 Minimum / Average / Maximum Object : 0.01K / 0.06K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
3064192 3063036  99%    0.03K  23939      128     95756K kmalloc-32
1544928 1535693  99%    0.12K  48279       32    193116K kmalloc-128
1507776 1501355  99%    0.06K  23559       64     94236K kmalloc-64
503808 500719  99%    0.02K   1968      256      7872K kmalloc-16
294400 294361  99%    0.01K    575      512      2300K kmalloc-8
 22610  18994  84%    0.05K    266       85      1064K selinux_inode_security
 18690  18446  98%    0.19K    890       21      3560K kmalloc-192
  9480   8424  88%    0.13K    316       30      1264K dentry
  5808   5762  99%    0.35K    264       22      2112K inode_cache
  5100   1669  32%    0.02K     30      170       120K anon_vma_chain
  2856   1888  66%    0.09K     68       42       272K kmalloc-96
  2380    808  33%    0.02K     14      170        56K anon_vma
  1392   1392 100%    0.50K     87       16       696K ext3_inode_cache
  1387   1102  79%    0.05K     19       73        76K buffer_head
  1326   1223  92%    0.04K     13      102        52K Acpi-Operand

I am now trying the latest koji kernel for F14.

Comment 1 Chuck Ebbert 2011-03-08 00:30:57 UTC
Can you try this to debug it?

1. Boot with kernel option "slub_debug=U"
2. Let it run for a while until there's obviously a leak
3. Attach the contents of /sys/kernel/slab/kmalloc-128/{alloc,free}_calls to this bug report (two separate plain-text attachments).

See if there's any obvious imbalance between the alloc and free calls in those files. You may want to check the kmalloc-32 and kmalloc-64 slabs as well.
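
A rough sketch of those steps (assuming a grubby-managed bootloader, as on stock Fedora; the option can also be added to the kernel line in grub.conf by hand):

# add the debug option to all installed kernels, then reboot
grubby --update-kernel=ALL --args="slub_debug=U"

# once the leak is visible, capture the per-callsite counters
cat /sys/kernel/slab/kmalloc-128/alloc_calls > kmalloc-128-alloc_calls.txt
cat /sys/kernel/slab/kmalloc-128/free_calls  > kmalloc-128-free_calls.txt

# each line starts with a call count, so the heaviest callers sort first
sort -rn kmalloc-128-alloc_calls.txt | head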

Comment 2 Jan ONDREJ 2011-03-14 08:19:31 UTC
Created attachment 484091 [details]
/sys/kernel/slab data + free + top after 5 days of running

After 5 days of data collection, here are the results. I don't know exactly what to search for in these logs, but to my amateur eye there are too many selinux_cred_prepare kmalloc calls and too few frees:

 847099 selinux_cred_prepare+0x18/0x2b age=16624/257343151/508671994 pid=0-32767 cpus=0-1
 218536 selinux_cred_free+0x22/0x24 age=16806/248721201/508671685 pid=0-32767 cpus=0-1
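
If those two counters come from the kmalloc-128 {alloc,free}_calls files (128-byte objects), the imbalance alone would pin roughly 847099 - 218536 = 628563 objects, about 76 MB:

# outstanding objects x 128-byte object size, in MB
# echo $(( (847099 - 218536) * 128 / 1024 / 1024 ))
76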

If you need it, I have the whole /sys/kernel/slab directory stored.

I can disable SELinux on this machine for testing, if requested, but the result will only be available after approximately another 5 days.
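
(Note: assuming the usual SELinux tooling, setenforce 0 only switches to permissive mode, where the hooks still run and still allocate; to take SELinux out of the picture completely, something like the following is needed:)

# permissive only -- hooks keep allocating:
setenforce 0
# fully disabled, effective after a reboot (or boot with selinux=0):
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config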

Comment 3 Jan ONDREJ 2011-03-18 18:44:19 UTC
Looks like this is not an SELinux issue. After 2 days of uptime, 150 MB is still used, even though SELinux is disabled.

[root@stats ~]# sestatus 
SELinux status:                 disabled
[root@stats ~]# uptime
 19:43:21 up 2 days,  5:01,  1 user,  load average: 0.05, 0.43, 0.74
[root@stats ~]# free
             total       used       free     shared    buffers     cached
Mem:        508596     151392     357204          0        272       5152
-/+ buffers/cache:     145968     362628
Swap:       524284          0     524284
[root@stats ~]#

Comment 4 Jan ONDREJ 2011-04-07 06:30:52 UTC
Today most of the processes on my server were killed by the OOM killer. I was unable to collect more data. SELinux is disabled.

Any progress with this bug? This happens mostly on my nagios/mrtg monitoring server. Maybe this is something networking-related?

Comment 5 Dan Rimal 2011-04-17 17:51:07 UTC
Hello,

I have a similar issue. Over one month, the kernel has used 5.3 GB of RAM for slab, as you can see: http://gal.danrimal.net/main.php?g2_itemId=1842

I use a slightly modified kernel based on F14's 2.6.35.11-83.rt3.fc14.x86_64; the significant change is using fib_trie instead of the hash table, plus unused drivers etc. are disabled.

The leak is probably in kmalloc-192, because after one hour only kmalloc-192 increases (by about 5 MB/hour).

The server routes internet traffic, about 200 Mbps at peak.
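
A simple way to confirm the growth rate (a sketch, assuming root access and that /proc/slabinfo is available on this kernel) is to sample the cache once an hour:

# columns 2 and 3 of /proc/slabinfo are active and total objects
while true; do
    echo "$(date '+%F %T') $(grep '^kmalloc-192 ' /proc/slabinfo)"
    sleep 3600
done >> kmalloc-192.log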

Comment 6 Chuck Ebbert 2011-04-18 17:53:20 UTC
(In reply to comment #5)
> 
> I use a slightly modified kernel based on F14's 2.6.35.11-83.rt3.fc14.x86_64;
> the significant change is using fib_trie instead of the hash table, plus
> unused drivers etc. are disabled.

Why are you still using 2.6.35.11 when 2.6.35.12 is out?

Comment 7 Chuck Ebbert 2011-04-18 17:53:46 UTC

*** This bug has been marked as a duplicate of bug 683568 ***

Comment 8 Jan ONDREJ 2011-04-19 05:41:49 UTC
Looks like this is really fixed in 2.6.35.12-88.fc14.i686.PAE.

However, starting with 2.6.35.12-88.fc14.i686.PAE on another machine, I get repeated freezes on at least one virtual machine. The serial and VNC consoles are dead and there is no ping, so I can't collect more information. How should this bug be reported? After booting into an older kernel, the machine works again.

