Description of problem:
After approx. 10 days of running, my KVM guest machine has >400 MB of memory used, minimal swap, minimal buffers. These 400 MB remain even after almost all processes are killed.

Version-Release number of selected component (if applicable):
Linux stats.upjs.sk 2.6.35.11-83.fc14.i686.PAE #1 SMP Mon Feb 7 06:57:55 UTC 2011 i686 i686 i386 GNU/Linux
(host is a fully updated F13, 2.6.34.7-66.fc13.x86_64)

How reproducible:
Seeing it now on 2 servers; unable to check the other servers, because I need to kill all processes to see how the RAM is used.

Additional info:

[root@stats ~]# lsmod
Module                   Size  Used by
sunrpc                 165546  1
ip6t_REJECT              3470  1
ip6table_filter          1207  1
ip6_tables               9929  1 ip6table_filter
ipv6                   229581  37 ip6t_REJECT
nf_conntrack_netbios_ns  1126  0
i2c_piix4               10574  0
i2c_core                21445  1 i2c_piix4
virtio_net              11031  0
virtio_balloon           3557  0
raid1                   17323  2
virtio_blk               4059  6
virtio_pci               4902  0

[root@stats ~]# top, sorted by memory usage:
top - 21:28:42 up 10 days, 13:33,  1 user,  load average: 0.06, 0.23, 0.47
Tasks:  71 total,   1 running,  70 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    508596k total,   441048k used,    67548k free,     3780k buffers
Swap:   524284k total,     6664k used,   517620k free,    21376k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27409 root      20   0  6708 1820  720 S  0.0  0.4   0:00.74 bash
 9528 root      20   0  2592  960  760 R  0.0  0.2   0:00.02 top
 1668 ntp       20   0  5300  452  344 S  0.0  0.1   0:15.90 ntpd
27406 root      20   0 12224  364  256 S  0.0  0.1   0:00.71 sshd
 1343 root      20   0 30436   24   24 S  0.0  0.0   0:21.88 rsyslogd
 1472 root      20   0  2764    4    4 S  0.0  0.0   0:04.79 mdadm
 2026 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty
 2028 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty
 2030 root      20   0  2012    4    4 S  0.0  0.0   0:00.00 agetty
 2031 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty
 2033 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty
 2035 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty
 2041 root      20   0  2000    4    4 S  0.0  0.0   0:00.00 mingetty
    1 root      20   0  2872    0    0 S  0.0  0.0   5:50.42 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   1:29.00 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   1:18.29 migration/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0

slabtop:
 Active / Total Objects (% used)    : 6960524 / 6993548 (99.5%)
 Active / Total Slabs (% used)      : 100678 / 100678 (100.0%)
 Active / Total Caches (% used)     : 58 / 77 (75.3%)
 Active / Total Size (% used)       : 403166.52K / 406684.44K (99.1%)
 Minimum / Average / Maximum Object : 0.01K / 0.06K / 8.00K

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
3064192 3063036  99%    0.03K  23939      128     95756K kmalloc-32
1544928 1535693  99%    0.12K  48279       32    193116K kmalloc-128
1507776 1501355  99%    0.06K  23559       64     94236K kmalloc-64
 503808  500719  99%    0.02K   1968      256      7872K kmalloc-16
 294400  294361  99%    0.01K    575      512      2300K kmalloc-8
  22610   18994  84%    0.05K    266       85      1064K selinux_inode_security
  18690   18446  98%    0.19K    890       21      3560K kmalloc-192
   9480    8424  88%    0.13K    316       30      1264K dentry
   5808    5762  99%    0.35K    264       22      2112K inode_cache
   5100    1669  32%    0.02K     30      170       120K anon_vma_chain
   2856    1888  66%    0.09K     68       42       272K kmalloc-96
   2380     808  33%    0.02K     14      170        56K anon_vma
   1392    1392 100%    0.50K     87       16       696K ext3_inode_cache
   1387    1102  79%    0.05K     19       73        76K buffer_head
   1326    1223  92%    0.04K     13      102        52K Acpi-Operand

I am now trying the latest koji kernel for F14.
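For what it's worth, a quick way to confirm that the missing memory is sitting in kernel slab caches rather than in any process (this assumes the kernel exposes the slab counters in /proc/meminfo, which the stock F14 kernel does):

  grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo
  slabtop -o -s c | head -n 25

slabtop -o prints a single snapshot instead of refreshing, and -s c sorts the caches by size.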
Can you try this to debug it?

1. Boot with the kernel option "slub_debug=U".
2. Let it run for a while until there's obviously a leak.
3. Attach the contents of /sys/kernel/slab/kmalloc-128/{alloc,free}_calls to this bug report (two separate plaintext attachments).

See if there's any obvious imbalance between alloc and free calls in those files. You may also want to check the kmalloc-32 and kmalloc-64 slabs as well.
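For reference, a rough sketch of those steps on a stock F14 guest (the grub.conf path assumes GRUB legacy; the output file names are just examples):

  # 1. append slub_debug=U to the kernel line in /boot/grub/grub.conf, then reboot
  # 2. once the leak is clearly visible again:
  cat /sys/kernel/slab/kmalloc-128/alloc_calls > kmalloc-128-alloc_calls.txt
  cat /sys/kernel/slab/kmalloc-128/free_calls  > kmalloc-128-free_calls.txt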
Created attachment 484091 [details]
/sys/kernel/slab data + free + top after 5 days of running

After 5 days of data collection, here are the results. I don't know exactly what to search for in these logs, but to my amateur eye there are too many selinux_cred_prepare allocations and too few frees:

 847099 selinux_cred_prepare+0x18/0x2b age=16624/257343151/508671994 pid=0-32767 cpus=0-1
 218536 selinux_cred_free+0x22/0x24 age=16806/248721201/508671685 pid=0-32767 cpus=0-1

If you need it, I have the whole /sys/kernel/slab directory stored. I can disable SELinux on this machine for testing, if requested, but the result will only be available after approx. 5 more days.
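In case it helps anyone reading along, an imbalance like the one above can be spotted simply by sorting the call-site dumps by their object count (the first column) and comparing the top entries, roughly:

  sort -rn /sys/kernel/slab/kmalloc-128/alloc_calls | head
  sort -rn /sys/kernel/slab/kmalloc-128/free_calls | head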
Looks like this is not an SELinux issue. After 2 days of uptime, 150 MB are still used even though SELinux was disabled.

[root@stats ~]# sestatus
SELinux status:                 disabled
[root@stats ~]# uptime
 19:43:21 up 2 days,  5:01,  1 user,  load average: 0.05, 0.43, 0.74
[root@stats ~]# free
             total       used       free     shared    buffers     cached
Mem:        508596     151392     357204          0        272       5152
-/+ buffers/cache:     145968     362628
Swap:       524284          0     524284
[root@stats ~]#
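In case anyone wants to repeat the test: disabling SELinux permanently on F14 amounts to the following (assuming the default config location), followed by a reboot:

  # set SELINUX=disabled in /etc/selinux/config, then reboot
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
  reboot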
Today most of the processes on my server were killed by the OOM killer. I was unable to collect more data. SELinux is disabled. Any progress with this bug? This happens mostly on my nagios/mrtg monitoring server; maybe it is something networking related?
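If it helps, next time I will try to grab the OOM killer reports before restarting anything; something like this should pull them out of the log (path assuming the default rsyslog setup):

  grep -iE 'out of memory|oom-killer|killed process' /var/log/messages | tail -n 50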
Hello, I have a similar issue. Over one month the kernel used 5.3 GB of RAM for slab, as you can see here: http://gal.danrimal.net/main.php?g2_itemId=1842

I use a slightly modified kernel based on the F14 2.6.35.11-83.rt3.fc14.x86_64, where the significant changes are using fib_trie instead of the hash table and disabling unused drivers etc. The leak is probably in kmalloc-192, because over one hour only kmalloc-192 grows (about 5 MB/hour). The server routes internet traffic, about 200 Mbps at peak.
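For anyone who wants to reproduce the measurement, the growth of a single cache can be sampled periodically roughly like this (assuming /proc/slabinfo is available on your kernel; the log file name is just an example):

  while true; do
      echo -n "$(date '+%F %T') "
      grep '^kmalloc-192 ' /proc/slabinfo
      sleep 3600
  done >> /root/kmalloc-192.log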
(In reply to comment #5)
> I use a slightly modified kernel based on the F14 2.6.35.11-83.rt3.fc14.x86_64,
> where the significant changes are using fib_trie instead of the hash table and
> disabling unused drivers etc.

Why are you still using 2.6.35.11 when 2.6.35.12 is out?
*** This bug has been marked as a duplicate of bug 683568 ***
It looks like this is really fixed in 2.6.35.12-88.fc14.i686.PAE. However, on another machine, starting with 2.6.35.12-88.fc14.i686.PAE I get repeated freezes of at least one virtual machine: the serial and VNC consoles are dead, and there is no ping. I can't collect more information, because the guest is dead. How should this bug be reported? After booting back to the older kernel the machine works again.
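If it would help, next time the guest freezes I can try to collect its state from the host via libvirt before rebooting it, e.g. (the guest name and dump path below are only placeholders):

  virsh list --all
  virsh dump GUESTNAME /var/tmp/GUESTNAME.core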