I am running Fedora 7 with kernel-2.6.22.9-91.fc7.x86_64 on a dual Xeon 2.8GHz (with hyperthreading) and 2G RAM. The only thing running on this box is clamd and clamav-milter. There are several busy mail servers running sendmail that connect to this milter to scan email for viruses, so clamav-milter often has 1000 threads (and they come and go quickly). The CPU load is light (usually in the 90% idle range). The clamd and clamav-milter processes have a relatively small resident size, although clamav-milter has a virtual size in the multi-gigabyte range (I guess due to thread stacks?). The kernel appears to be leaking RAM on this box at a fairly consistent rate. You can see the RAM use graph here: http://status.hiwaay.net/cricket/grapher.cgi?target=/servers/net-snmp/hornet;view=RAM+use;ranges=d:w:m:y Shutting down the clamd and clamav-milter processes has no effect. The used RAM of course goes down by ~100M, but that is equal to the resident size of the two processes. The only thing that will free up the memory is a reboot (that's where the sawtooth jumps are on the graph).
Did you try 2.6.23.1-10? http://koji.fedoraproject.org/koji/buildinfo?buildID=21666
I have loaded that kernel. I will watch it for a couple of days and let you know how it looks.
Still appears to be leaking memory. The only other thing "interesting" I can think of about this system is that clamav-milter and clamd are using a tmpfs for storage (clamav-milter actually writes incoming messages to a temp directory and signals clamd to scan the files). I have it mounted with "mode=1777,size=500m" and it usually doesn't have much data in it (it has 640K right now). Just to see if it made a difference, I stopped clamav-milter and clamd and unmounted the tmpfs, and the system still showed over 300M RAM in use not counting buffers/cache. There are no other processes using any significant amount of resident RAM.
What does /proc/meminfo say as the count goes up? Also, with SLUB there should be lots of debug info in /sys/slab/* The file Documentation/vm/slub.txt from the kernel-doc package describes how to use the SLUB debugging.
I figured I start with seeing what slabinfo reports on the running (no debug enabled) kernel, and I get: Unresolved alias :0000360 If I look in /sys/slab, kcopyd_job is a symlink to ../slab/:0000360, which does not exist (all the other symlinks appear to point to things that exist).
kcopyd is part of lvm, and used by dm_snapshot and dm_mirror.
For the moment, I hacked slabinfo.c to ignore kcopyd_job, just to see what I got. It looks like the :0000016 slab is the problem: Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg :0000016 14699932 16 235.2M 57423/8/4 256 0 0 8 * I enabled tracing on that slab for about a minute which made the server really busy (since that's a busy slab I guess). I'm not quite sure what to do with the logged info.
I think you need to try booting with slub_debug=U It's hard to say how much that will slow down the system, but it shouldn't be too bad.
Here is the info on kmalloc-16: Slabcache: kmalloc-16 Aliases: 0 Order : 0 Objects: 69745 Sizes (bytes) Slabs Debug Memory ------------------------------------------------------------------------ Object : 16 Total : 1091 Sanity Checks : Off Total: 4468736 SlabObj: 64 Full : 1082 Redzoning : Off Used : 1115920 SlabSiz: 4096 Partial: 5 Poisoning : Off Loss : 3352816 Loss : 48 CpuSlab: 4 Tracking : On Lalig: 3347760 Align : 8 Objects: 64 Tracing : Off Lpadd: 0 kmalloc-16 has no kmem_cache operations kmalloc-16: Kernel object allocation ----------------------------------------------------------------------- 4 arch_acpi_processor_init_pdc+0x41/0x118 age=491088/491092/491100 pid=1 cpus=0 4 arch_acpi_processor_init_pdc+0x95/0x118 age=491088/491092/491100 pid=1 cpus=0 226 sys_init_module+0x13ac/0x168c age=479352/484890/489962 pid=516-1454 cpus=0-3 13 mempool_create_node+0x53/0xca age=489299/490985/492039 pid=1-576 cpus=0-1 1 __vmalloc_area_node+0x71/0x170 age=212438 pid=2074 cpus=0 11 alloc_vfsmnt+0xbe/0xdb age=476442/483363/491794 pid=1-1495 cpus=0-3 1 proc_symlink+0x51/0x8f age=492335 pid=0 cpus=0 808 sysfs_new_dirent+0x29/0x105 age=456619/489711/492072 pid=1-2204 cpus=0-3 3 copy_semundo+0x32/0x79 age=458940/460478/463526 pid=1972-2074 cpus=2-3 11 key_alloc+0xef/0x34c age=455601/459193/464277 pid=1933-2230 cpus=0-3 17 selinux_key_alloc+0x21/0x48 age=455601/463408/492343 pid=0-2230 cpus=0-3 824 selinux_file_alloc_security+0x2a/0x53 age=0/303719/484179 pid=1-20430 cpus=0-3 1 hashtab_create+0x4d/0x83 age=484339 pid=1 cpus=1 159 perm_read+0x75/0xd1 age=484338/484344/484346 pid=1 cpus=1 1 policydb_init+0xa7/0x12e age=484339 pid=1 cpus=1 21 class_read+0xc6/0x233 age=484338/484343/484346 pid=1 cpus=1 3 role_read+0x79/0x139 age=484345/484345/484345 pid=1 cpus=1 981 type_read+0x89/0xe5 age=484333/484343/484345 pid=1 cpus=1 1 user_read+0x80/0x121 age=484333 pid=1 cpus=1 1 sens_read+0x28/0xfe age=484333 pid=1 cpus=1 6 policydb_read+0x3e0/0xbef age=484239/484239/484239 pid=1 cpus=0 1 policydb_read+0x645/0xbef age=484238 pid=1 cpus=0 12 policydb_read+0x7ac/0xbef age=484238/484238/484238 pid=1 cpus=0 6 policydb_read+0x89f/0xbef age=484238/484238/484238 pid=1 cpus=0 18011 security_netlbl_sid_to_secattr+0x73/0x105 age=12/212076/474187 pid=1632-3338 cpus=0-3 117 cond_read_list+0xec/0x201 age=484239/484260/484298 pid=1 cpus=0 22 cond_read_bool+0xc4/0xde age=484333/484333/484333 pid=1 cpus=1 48393 cond_insertf+0xdd/0x138 age=484239/484269/484298 pid=1 cpus=0 21 kvasprintf+0x44/0x6b age=481130/491870/492408 pid=0-930 cpus=0 3 acpi_ds_build_internal_package_obj+0x9b/0x11f age=492009/492012/492014 pid=1 cpus=1 7 acpi_ds_build_internal_buffer_obj+0xd8/0x123 age=492006/492007/492008 pid=1 cpus=1 2 reserve_range+0x25/0xe7 age=491746/491767/491789 pid=1 cpus=1 5 pnpacpi_option_resource+0x172/0x401 age=491850/491850/491850 pid=1 cpus=1 3 hub_probe+0x1ed/0x681 age=489634/489786/489928 pid=516-526 cpus=1-2 3 usb_cache_string+0x5a/0x88 age=489641/489793/489936 pid=516-526 cpus=1-2 1 usb_create_ep_files+0x43/0x2d9 age=489935 pid=516 cpus=1 1 pci_acpi_scan_root+0x23/0xc8 age=491968 pid=1 cpus=1 1 proto_register+0x111/0x1ec age=491640 pid=1 cpus=1 2 genl_register_family+0xd4/0x14c age=491794/491798/491802 pid=1 cpus=1 1 netlbl_domhsh_init+0x26/0xc9 age=491802 pid=1 cpus=1 15 pnpacpi_add_device_handler+0x16d/0x296 age=491846/491851/491863 pid=1 cpus=1 2 pnpacpi_add_device_handler+0x220/0x296 age=491853/491853/491854 pid=1 cpus=1 1 pci_mmcfg_arch_init+0x26/0x103 age=492052 pid=1 cpus=1 1 pci_mmcfg_e7520+0x5c/0xa8 age=492059 pid=1 cpus=1 1 ext3_fill_super+0x71f/0x1472 [ext3] age=484663 pid=1 cpus=3 2 run+0xe4/0x422 [raid1] age=479530/482338/485147 pid=645-1437 cpus=0,3 6 linear_ctr+0x31/0xc3 [dm_mod] age=484996/485029/485063 pid=650 cpus=1-3 6 dev_create+0xa0/0x1fa [dm_mod] age=484996/485029/485063 pid=650 cpus=1-3 kmalloc-16: Kernel object freeing ------------------------------------------------------------------------ 58147 <not-available> age=4295159737 pid=0 cpus=0 4 free_sect_attrs+0x1f/0x31 age=479865/481223/485299 pid=636-1392 cpus=0-1 14 slab_objects+0x241/0x268 age=38128/319152/407343 pid=2382-19662 cpus=0-3 69 put_filp+0x1c/0x45 age=161882/464441/491164 pid=1-2254 cpus=0-3 11304 __fput+0xd2/0x191 age=2/221287/491176 pid=1-20427 cpus=0-3 1 free_vfsmnt+0xd/0x1d age=484620 pid=1 cpus=3 1 bio_free_map_data+0xc/0x15 age=457491 pid=2175 cpus=2 1 bio_uncopy_user+0xf4/0x10e age=457491 pid=2175 cpus=2 5 remove_kevent+0x31/0x44 age=456796/460105/464967 pid=1900 cpus=1-3 3 release_sysfs_dirent+0x38/0x95 age=479865/487357/491114 pid=1-1392 cpus=0 2 key_cleanup+0x72/0xf2 age=383438/419532/455626 pid=15-17 cpus=0,2 2 key_cleanup+0xde/0xf2 age=383438/400265/417093 pid=15-17 cpus=0,2 10 selinux_bprm_free_security+0x10/0x1d age=54456/389987/483934 pid=657-18857 cpus=0-3 1 sel_write_load+0x50e/0x5da age=484198 pid=1 cpus=0 1 role_destroy+0x9/0x27 age=484359 pid=1 cpus=1 1 policydb_read+0x125/0xbef age=484352 pid=1 cpus=1 1 ksign_free_signature+0x35/0x44 age=491135 pid=1 cpus=0 142 kobject_uevent_env+0x3cc/0x40f age=456632/488785/492086 pid=1-2204 cpus=0-3 1 acpi_ds_create_operand+0x1d6/0x1f2 age=491860 pid=1 cpus=1 4 acpi_evaluate_object+0x1c9/0x1e5 age=491102/491106/491113 pid=1 cpus=0 4 acpi_ut_delete_internal_obj+0x151/0x162 age=491102/491110/491119 pid=1 cpus=0 1 reserve_range+0x8f/0xe7 age=491764 pid=1 cpus=1 1 device_del+0xbd/0x2ea age=490824 pid=1 cpus=0 10 device_add+0x210/0x630 age=480597/487754/490824 pid=1-1058 cpus=0-2 3 bus_add_driver+0xb1/0x19d age=487847/489734/491079 pid=1-596 cpus=0-2 4 class_device_add+0x26e/0x3ca age=488592/488938/489289 pid=596 cpus=0,3 3 usb_get_configuration+0xe86/0xee3 age=489654/489807/489950 pid=516-526 cpus=1-2 1 get_modalias+0xb9/0xf2 age=492073 pid=1 cpus=1 kmalloc-16: No NUMA information available. The allocation entry that is increasing most is security_netlbl_sid_to_secattr, and the freeing entries that are increasing most are <not-available> and __fput. I rebooted with SELinux disabled, and kmalloc-16 is now sitting as 1200 objects and holding (and everything else looks to be steady as well).
Was the above trace from kernel 2.6.23.1-10? I'm trying to track down the leak.
Yes, it was from 2.6.23-1.10.fc7.x86_64.
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? If the problem no longer exists then please close this bug or I'll do so in a few days if there is no additional information lodged.
I have not tried it again; this is a production system so I can't just poke at it. I was waiting to see if someone thought the problem might be fixed before I tried again; if something has changed that should fix it, let me know and I'll try it.
*** Bug 428950 has been marked as a duplicate of this bug. ***
I've just posted a quick (aka untested) patch which I believe should solve the problem, if you are able to test the patch please let me know if this solves your problem. The patch can be found here: * http://marc.info/?l=linux-netdev&m=120071344429617&w=2
Chris, I can create a test kernel for you with just this additional patch applied if you wish. I take it you are still running Fedora 7?
I'm building kernel-2.6.23.12-52 right now with this patch applied. I'll have to re-label the filesystem as well (since I've loaded updates and such since setting SELinux to disabled), but I think I will be able to install this today. I'll let you know what I see.
It has been running about an hour now, and it is looking okay. One question for Red Hat kernel people on this CC: I'm planning on switching this box from Fedora 7 to RHEL 5; will this bug also affect the current RHEL 5 kernel (and if so, will this patch be applied there as well)?
selinux_netlbl_socket_setsid() in RHEL5 does not have this leak. I looked through all of the struct netlbl_lsm_secattr users in security/selinux/ss/services.c and they all seem to init/destroy.
Good to know about RHEL - thanks for checking that.
Fix is in 2.6.23.14-66.
kernel-2.6.23.15-80.fc7 has been submitted as an update for Fedora 7
kernel-2.6.23.15-80.fc7 has been pushed to the Fedora 7 stable repository. If problems still persist, please make note of it in this bug report.