Red Hat Bugzilla – Bug 352281
Kernel memory leak with lots of threads?
Last modified: 2008-02-11 17:38:20 EST
I am running Fedora 7 with kernel-184.108.40.206-91.fc7.x86_64 on a dual Xeon 2.8GHz
(with hyperthreading) and 2G RAM. The only thing running on this box is clamd
and clamav-milter. There are several busy mail servers running sendmail that
connect to this milter to scan email for viruses, so clamav-milter often has
1000 threads (and they come and go quickly). The CPU load is light (usually in
the 90% idle range). The clamd and clamav-milter processes have a relatively
small resident size, although clamav-milter has a virtual size in the
multi-gigabyte range (I guess due to thread stacks?).
The kernel appears to be leaking RAM on this box at a fairly consistent rate.
You can see the RAM use graph here:
Shutting down the clamd and clamav-milter processes has no effect. The used RAM
of course goes down by ~100M, but that is equal to the resident size of the two
processes. The only thing that will free up the memory is a reboot (that's
where the sawtooth jumps are on the graph).
Did you try 220.127.116.11-10?
I have loaded that kernel. I will watch it for a couple of days and let you
know how it looks.
Still appears to be leaking memory.
The only other thing "interesting" I can think of about this system is that
clamav-milter and clamd are using a tmpfs for storage (clamav-milter actually
writes incoming messages to a temp directory and signals clamd to scan the
files). I have it mounted with "mode=1777,size=500m" and it usually doesn't
have much data in it (it has 640K right now).
Just to see if it made a difference, I stopped clamav-milter and clamd and
unmounted the tmpfs, and the system still showed over 300M RAM in use not
counting buffers/cache. There are no other processes using any significant
amount of resident RAM.
What does /proc/meminfo say as the count goes up?
Also, with SLUB there should be lots of debug info in /sys/slab/*
The file Documentation/vm/slub.txt from the kernel-doc package describes how to
use the SLUB debugging.
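One way to answer the /proc/meminfo question is to take two snapshots some time apart and look at which fields moved. The following is a minimal sketch, not a tool mentioned in this bug: the function names parse_meminfo and meminfo_delta are hypothetical, and it assumes the usual "Name: value kB" layout of /proc/meminfo.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        # Lines are assumed to look like "MemFree:   123456 kB".
        match = re.match(r"(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def meminfo_delta(before, after):
    """Return {field: after - before} for every field whose value changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }
```

To use it, read /proc/meminfo twice (for example an hour apart), pass both texts through parse_meminfo, and print meminfo_delta(first, second). A field such as Slab that only ever grows would point at a kernel-side allocation rather than a process.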
I figured I'd start with seeing what slabinfo reports on the running (no debug
enabled) kernel, and I get:
Unresolved alias :0000360
If I look in /sys/slab, kcopyd_job is a symlink to ../slab/:0000360, which does
not exist (all the other symlinks appear to point to things that exist).
kcopyd is part of lvm, and used by dm_snapshot and dm_mirror.
For the moment, I hacked slabinfo.c to ignore kcopyd_job, just to see what I
got. It looks like the :0000016 slab is the problem:
Name      Objects   Objsize  Space   Slabs/Part/Cpu  O/S  O  %Fr  %Ef  Flg
:0000016  14699932  16       235.2M  57423/8/4       256  0  0    8    *
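As a rough cross-check of that row, the arithmetic can be sketched in Python. The object count, object size and slab count are taken from the row; the 4096-byte page size is an assumption (the row only shows order 0), and the variable names are made up for illustration.

```python
# Assumption: 4 KiB pages and order-0 slabs, so one slab is one page.
PAGE_SIZE = 4096

objects = 14699932   # "Objects" column
object_size = 16     # "Objsize" column
slabs = 57423        # first number in "Slabs/Part/Cpu"

payload_bytes = objects * object_size            # bytes held in live objects
slab_bytes = slabs * PAGE_SIZE                   # bytes of pages backing the cache
objects_per_slab = PAGE_SIZE // object_size      # should match the "O/S" column
free_slots = slabs * objects_per_slab - objects  # unused object slots

print(f"payload:          {payload_bytes / 1e6:.1f}M")  # 235.2M
print(f"slab pages:       {slab_bytes / 1e6:.1f}M")     # 235.2M
print(f"objects per slab: {objects_per_slab}")          # 256
print(f"free slots:       {free_slots}")                # 356
```

Under those assumptions nearly every slot in the cache is occupied, so the 235.2M is live 16-byte objects rather than fragmentation.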
I enabled tracing on that slab for about a minute which made the server really
busy (since that's a busy slab I guess). I'm not quite sure what to do with the
I think you need to try booting with slub_debug=U
It's hard to say how much that will slow down the system, but it shouldn't be
Here is the info on kmalloc-16:
Slabcache: kmalloc-16   Aliases: 0   Order: 0   Objects: 69745
Sizes (bytes)    Slabs            Debug                 Memory
Object :   16    Total  : 1091    Sanity Checks : Off   Total: 4468736
SlabObj:   64    Full   : 1082    Redzoning     : Off   Used : 1115920
SlabSiz: 4096    Partial:    5    Poisoning     : Off   Loss : 3352816
Loss   :   48    CpuSlab:    4    Tracking      : On    Lalig: 3347760
Align  :    8    Objects:   64    Tracing       : Off   Lpadd:       0
kmalloc-16 has no kmem_cache operations
kmalloc-16: Kernel object allocation
4 arch_acpi_processor_init_pdc+0x41/0x118 age=491088/491092/491100 pid=1
4 arch_acpi_processor_init_pdc+0x95/0x118 age=491088/491092/491100 pid=1
226 sys_init_module+0x13ac/0x168c age=479352/484890/489962 pid=516-1454 cpus=0-3
13 mempool_create_node+0x53/0xca age=489299/490985/492039 pid=1-576 cpus=0-1
1 __vmalloc_area_node+0x71/0x170 age=212438 pid=2074 cpus=0
11 alloc_vfsmnt+0xbe/0xdb age=476442/483363/491794 pid=1-1495 cpus=0-3
1 proc_symlink+0x51/0x8f age=492335 pid=0 cpus=0
808 sysfs_new_dirent+0x29/0x105 age=456619/489711/492072 pid=1-2204 cpus=0-3
3 copy_semundo+0x32/0x79 age=458940/460478/463526 pid=1972-2074 cpus=2-3
11 key_alloc+0xef/0x34c age=455601/459193/464277 pid=1933-2230 cpus=0-3
17 selinux_key_alloc+0x21/0x48 age=455601/463408/492343 pid=0-2230 cpus=0-3
824 selinux_file_alloc_security+0x2a/0x53 age=0/303719/484179 pid=1-20430
1 hashtab_create+0x4d/0x83 age=484339 pid=1 cpus=1
159 perm_read+0x75/0xd1 age=484338/484344/484346 pid=1 cpus=1
1 policydb_init+0xa7/0x12e age=484339 pid=1 cpus=1
21 class_read+0xc6/0x233 age=484338/484343/484346 pid=1 cpus=1
3 role_read+0x79/0x139 age=484345/484345/484345 pid=1 cpus=1
981 type_read+0x89/0xe5 age=484333/484343/484345 pid=1 cpus=1
1 user_read+0x80/0x121 age=484333 pid=1 cpus=1
1 sens_read+0x28/0xfe age=484333 pid=1 cpus=1
6 policydb_read+0x3e0/0xbef age=484239/484239/484239 pid=1 cpus=0
1 policydb_read+0x645/0xbef age=484238 pid=1 cpus=0
12 policydb_read+0x7ac/0xbef age=484238/484238/484238 pid=1 cpus=0
6 policydb_read+0x89f/0xbef age=484238/484238/484238 pid=1 cpus=0
18011 security_netlbl_sid_to_secattr+0x73/0x105 age=12/212076/474187
117 cond_read_list+0xec/0x201 age=484239/484260/484298 pid=1 cpus=0
22 cond_read_bool+0xc4/0xde age=484333/484333/484333 pid=1 cpus=1
48393 cond_insertf+0xdd/0x138 age=484239/484269/484298 pid=1 cpus=0
21 kvasprintf+0x44/0x6b age=481130/491870/492408 pid=0-930 cpus=0
3 acpi_ds_build_internal_package_obj+0x9b/0x11f age=492009/492012/492014
7 acpi_ds_build_internal_buffer_obj+0xd8/0x123 age=492006/492007/492008
2 reserve_range+0x25/0xe7 age=491746/491767/491789 pid=1 cpus=1
5 pnpacpi_option_resource+0x172/0x401 age=491850/491850/491850 pid=1 cpus=1
3 hub_probe+0x1ed/0x681 age=489634/489786/489928 pid=516-526 cpus=1-2
3 usb_cache_string+0x5a/0x88 age=489641/489793/489936 pid=516-526 cpus=1-2
1 usb_create_ep_files+0x43/0x2d9 age=489935 pid=516 cpus=1
1 pci_acpi_scan_root+0x23/0xc8 age=491968 pid=1 cpus=1
1 proto_register+0x111/0x1ec age=491640 pid=1 cpus=1
2 genl_register_family+0xd4/0x14c age=491794/491798/491802 pid=1 cpus=1
1 netlbl_domhsh_init+0x26/0xc9 age=491802 pid=1 cpus=1
15 pnpacpi_add_device_handler+0x16d/0x296 age=491846/491851/491863 pid=1 cpus=1
2 pnpacpi_add_device_handler+0x220/0x296 age=491853/491853/491854 pid=1 cpus=1
1 pci_mmcfg_arch_init+0x26/0x103 age=492052 pid=1 cpus=1
1 pci_mmcfg_e7520+0x5c/0xa8 age=492059 pid=1 cpus=1
1 ext3_fill_super+0x71f/0x1472 [ext3] age=484663 pid=1 cpus=3
2 run+0xe4/0x422 [raid1] age=479530/482338/485147 pid=645-1437 cpus=0,3
6 linear_ctr+0x31/0xc3 [dm_mod] age=484996/485029/485063 pid=650 cpus=1-3
6 dev_create+0xa0/0x1fa [dm_mod] age=484996/485029/485063 pid=650 cpus=1-3
kmalloc-16: Kernel object freeing
58147 <not-available> age=4295159737 pid=0 cpus=0
4 free_sect_attrs+0x1f/0x31 age=479865/481223/485299 pid=636-1392 cpus=0-1
14 slab_objects+0x241/0x268 age=38128/319152/407343 pid=2382-19662 cpus=0-3
69 put_filp+0x1c/0x45 age=161882/464441/491164 pid=1-2254 cpus=0-3
11304 __fput+0xd2/0x191 age=2/221287/491176 pid=1-20427 cpus=0-3
1 free_vfsmnt+0xd/0x1d age=484620 pid=1 cpus=3
1 bio_free_map_data+0xc/0x15 age=457491 pid=2175 cpus=2
1 bio_uncopy_user+0xf4/0x10e age=457491 pid=2175 cpus=2
5 remove_kevent+0x31/0x44 age=456796/460105/464967 pid=1900 cpus=1-3
3 release_sysfs_dirent+0x38/0x95 age=479865/487357/491114 pid=1-1392 cpus=0
2 key_cleanup+0x72/0xf2 age=383438/419532/455626 pid=15-17 cpus=0,2
2 key_cleanup+0xde/0xf2 age=383438/400265/417093 pid=15-17 cpus=0,2
10 selinux_bprm_free_security+0x10/0x1d age=54456/389987/483934
1 sel_write_load+0x50e/0x5da age=484198 pid=1 cpus=0
1 role_destroy+0x9/0x27 age=484359 pid=1 cpus=1
1 policydb_read+0x125/0xbef age=484352 pid=1 cpus=1
1 ksign_free_signature+0x35/0x44 age=491135 pid=1 cpus=0
142 kobject_uevent_env+0x3cc/0x40f age=456632/488785/492086 pid=1-2204 cpus=0-3
1 acpi_ds_create_operand+0x1d6/0x1f2 age=491860 pid=1 cpus=1
4 acpi_evaluate_object+0x1c9/0x1e5 age=491102/491106/491113 pid=1 cpus=0
4 acpi_ut_delete_internal_obj+0x151/0x162 age=491102/491110/491119 pid=1
1 reserve_range+0x8f/0xe7 age=491764 pid=1 cpus=1
1 device_del+0xbd/0x2ea age=490824 pid=1 cpus=0
10 device_add+0x210/0x630 age=480597/487754/490824 pid=1-1058 cpus=0-2
3 bus_add_driver+0xb1/0x19d age=487847/489734/491079 pid=1-596 cpus=0-2
4 class_device_add+0x26e/0x3ca age=488592/488938/489289 pid=596 cpus=0,3
3 usb_get_configuration+0xe86/0xee3 age=489654/489807/489950 pid=516-526
1 get_modalias+0xb9/0xf2 age=492073 pid=1 cpus=1
kmalloc-16: No NUMA information available.
The allocation entry that is increasing most is security_netlbl_sid_to_secattr,
and the freeing entries that are increasing most are <not-available> and __fput.
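Spotting which entries are increasing can be done by diffing two captures of the tracking output instead of comparing them by eye. The sketch below is only an illustration: parse_tracking and fastest_growing are hypothetical helper names, and the line format is assumed to be the one shown in the listing above (a count, a caller, an optional [module], then age=...).

```python
import re

# Assumed line shape: "<count> <caller> [module] age=... pid=... cpus=..."
# where the "[module]" part is optional.
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)(?:\s+\[\S+\])?\s+age=")


def parse_tracking(text):
    """Map each caller (e.g. 'kvasprintf+0x44/0x6b') to its object count."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            caller = match.group(2)
            counts[caller] = counts.get(caller, 0) + int(match.group(1))
    return counts


def fastest_growing(before, after, limit=3):
    """Return up to `limit` (caller, growth) pairs, largest growth first."""
    growth = {caller: n - before.get(caller, 0) for caller, n in after.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return [(caller, delta) for caller, delta in ranked if delta > 0][:limit]
```

Feeding it the allocation section from two captures taken some minutes apart would list the callers whose outstanding objects grew the most. Boot-time allocations such as cond_insertf have a large count but should show no growth, so they drop out of the result.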
I rebooted with SELinux disabled, and kmalloc-16 is now sitting at 1200 objects
and holding (and everything else looks to be steady as well).
Was the above trace from kernel 18.104.22.168-10? I'm trying to track down the leak.
Yes, it was from 2.6.23-1.10.fc7.x86_64.
I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.
There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?
If the problem no longer exists, please close this bug. If no additional
information is lodged, I'll close it myself in a few days.
I have not tried it again; this is a production system so I can't just poke at
it. I was waiting to see if someone thought the problem might be fixed before I
tried again; if something has changed that should fix it, let me know and I'll
*** Bug 428950 has been marked as a duplicate of this bug. ***
I've just posted a quick (aka untested) patch which I believe should solve the
problem. If you are able to test the patch, please let me know whether it solves
your problem. The patch can be found here:
I can create a test kernel for you with just this additional patch applied if
you wish. I take it you are still running Fedora 7?
I'm building kernel-22.214.171.124-52 right now with this patch applied. I'll have
to re-label the filesystem as well (since I've loaded updates and such since
setting SELinux to disabled), but I think I will be able to install this today.
I'll let you know what I see.
It has been running about an hour now, and it is looking okay.
One question for Red Hat kernel people on this CC: I'm planning on switching
this box from Fedora 7 to RHEL 5; will this bug also affect the current RHEL 5
kernel (and if so, will this patch be applied there as well)?
selinux_netlbl_socket_setsid() in RHEL5 does not have this leak. I looked
through all of the struct netlbl_lsm_secattr users in
security/selinux/ss/services.c and they all seem to init/destroy.
Good to know about RHEL - thanks for checking that.
Fix is in 126.96.36.199-66.
kernel-188.8.131.52-80.fc7 has been submitted as an update for Fedora 7
kernel-184.108.40.206-80.fc7 has been pushed to the Fedora 7 stable repository. If problems still persist, please make note of it in this bug report.