Bug 352281
| Summary: | Kernel memory leak with lots of threads? | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Chris Adams <linux> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 7 | CC: | chris.brown, eparis, fdbugs, jmorris, paul.moore |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | 2.6.23.15-80.fc7 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2008-02-11 22:38:20 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Chris Adams
2007-10-25 13:41:12 UTC
Did you try 2.6.23.1-10? http://koji.fedoraproject.org/koji/buildinfo?buildID=21666 I have loaded that kernel. I will watch it for a couple of days and let you know how it looks. Still appears to be leaking memory. The only other thing "interesting" I can think of about this system is that clamav-milter and clamd are using a tmpfs for storage (clamav-milter actually writes incoming messages to a temp directory and signals clamd to scan the files). I have it mounted with "mode=1777,size=500m" and it usually doesn't have much data in it (it has 640K right now). Just to see if it made a difference, I stopped clamav-milter and clamd and unmounted the tmpfs, and the system still showed over 300M RAM in use not counting buffers/cache. There are no other processes using any significant amount of resident RAM. What does /proc/meminfo say as the count goes up? Also, with SLUB there should be lots of debug info in /sys/slab/* The file Documentation/vm/slub.txt from the kernel-doc package describes how to use the SLUB debugging. I figured I start with seeing what slabinfo reports on the running (no debug enabled) kernel, and I get: Unresolved alias :0000360 If I look in /sys/slab, kcopyd_job is a symlink to ../slab/:0000360, which does not exist (all the other symlinks appear to point to things that exist). kcopyd is part of lvm, and used by dm_snapshot and dm_mirror. For the moment, I hacked slabinfo.c to ignore kcopyd_job, just to see what I got. It looks like the :0000016 slab is the problem: Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg :0000016 14699932 16 235.2M 57423/8/4 256 0 0 8 * I enabled tracing on that slab for about a minute which made the server really busy (since that's a busy slab I guess). I'm not quite sure what to do with the logged info. I think you need to try booting with slub_debug=U It's hard to say how much that will slow down the system, but it shouldn't be too bad. Here is the info on kmalloc-16:
Slabcache: kmalloc-16 Aliases: 0 Order : 0 Objects: 69745
Sizes (bytes) Slabs Debug Memory
------------------------------------------------------------------------
Object : 16 Total : 1091 Sanity Checks : Off Total: 4468736
SlabObj: 64 Full : 1082 Redzoning : Off Used : 1115920
SlabSiz: 4096 Partial: 5 Poisoning : Off Loss : 3352816
Loss : 48 CpuSlab: 4 Tracking : On Lalig: 3347760
Align : 8 Objects: 64 Tracing : Off Lpadd: 0
kmalloc-16 has no kmem_cache operations
kmalloc-16: Kernel object allocation
-----------------------------------------------------------------------
4 arch_acpi_processor_init_pdc+0x41/0x118 age=491088/491092/491100 pid=1
cpus=0
4 arch_acpi_processor_init_pdc+0x95/0x118 age=491088/491092/491100 pid=1
cpus=0
226 sys_init_module+0x13ac/0x168c age=479352/484890/489962 pid=516-1454 cpus=0-3
13 mempool_create_node+0x53/0xca age=489299/490985/492039 pid=1-576 cpus=0-1
1 __vmalloc_area_node+0x71/0x170 age=212438 pid=2074 cpus=0
11 alloc_vfsmnt+0xbe/0xdb age=476442/483363/491794 pid=1-1495 cpus=0-3
1 proc_symlink+0x51/0x8f age=492335 pid=0 cpus=0
808 sysfs_new_dirent+0x29/0x105 age=456619/489711/492072 pid=1-2204 cpus=0-3
3 copy_semundo+0x32/0x79 age=458940/460478/463526 pid=1972-2074 cpus=2-3
11 key_alloc+0xef/0x34c age=455601/459193/464277 pid=1933-2230 cpus=0-3
17 selinux_key_alloc+0x21/0x48 age=455601/463408/492343 pid=0-2230 cpus=0-3
824 selinux_file_alloc_security+0x2a/0x53 age=0/303719/484179 pid=1-20430
cpus=0-3
1 hashtab_create+0x4d/0x83 age=484339 pid=1 cpus=1
159 perm_read+0x75/0xd1 age=484338/484344/484346 pid=1 cpus=1
1 policydb_init+0xa7/0x12e age=484339 pid=1 cpus=1
21 class_read+0xc6/0x233 age=484338/484343/484346 pid=1 cpus=1
3 role_read+0x79/0x139 age=484345/484345/484345 pid=1 cpus=1
981 type_read+0x89/0xe5 age=484333/484343/484345 pid=1 cpus=1
1 user_read+0x80/0x121 age=484333 pid=1 cpus=1
1 sens_read+0x28/0xfe age=484333 pid=1 cpus=1
6 policydb_read+0x3e0/0xbef age=484239/484239/484239 pid=1 cpus=0
1 policydb_read+0x645/0xbef age=484238 pid=1 cpus=0
12 policydb_read+0x7ac/0xbef age=484238/484238/484238 pid=1 cpus=0
6 policydb_read+0x89f/0xbef age=484238/484238/484238 pid=1 cpus=0
18011 security_netlbl_sid_to_secattr+0x73/0x105 age=12/212076/474187
pid=1632-3338 cpus=0-3
117 cond_read_list+0xec/0x201 age=484239/484260/484298 pid=1 cpus=0
22 cond_read_bool+0xc4/0xde age=484333/484333/484333 pid=1 cpus=1
48393 cond_insertf+0xdd/0x138 age=484239/484269/484298 pid=1 cpus=0
21 kvasprintf+0x44/0x6b age=481130/491870/492408 pid=0-930 cpus=0
3 acpi_ds_build_internal_package_obj+0x9b/0x11f age=492009/492012/492014
pid=1 cpus=1
7 acpi_ds_build_internal_buffer_obj+0xd8/0x123 age=492006/492007/492008
pid=1 cpus=1
2 reserve_range+0x25/0xe7 age=491746/491767/491789 pid=1 cpus=1
5 pnpacpi_option_resource+0x172/0x401 age=491850/491850/491850 pid=1 cpus=1
3 hub_probe+0x1ed/0x681 age=489634/489786/489928 pid=516-526 cpus=1-2
3 usb_cache_string+0x5a/0x88 age=489641/489793/489936 pid=516-526 cpus=1-2
1 usb_create_ep_files+0x43/0x2d9 age=489935 pid=516 cpus=1
1 pci_acpi_scan_root+0x23/0xc8 age=491968 pid=1 cpus=1
1 proto_register+0x111/0x1ec age=491640 pid=1 cpus=1
2 genl_register_family+0xd4/0x14c age=491794/491798/491802 pid=1 cpus=1
1 netlbl_domhsh_init+0x26/0xc9 age=491802 pid=1 cpus=1
15 pnpacpi_add_device_handler+0x16d/0x296 age=491846/491851/491863 pid=1 cpus=1
2 pnpacpi_add_device_handler+0x220/0x296 age=491853/491853/491854 pid=1 cpus=1
1 pci_mmcfg_arch_init+0x26/0x103 age=492052 pid=1 cpus=1
1 pci_mmcfg_e7520+0x5c/0xa8 age=492059 pid=1 cpus=1
1 ext3_fill_super+0x71f/0x1472 [ext3] age=484663 pid=1 cpus=3
2 run+0xe4/0x422 [raid1] age=479530/482338/485147 pid=645-1437 cpus=0,3
6 linear_ctr+0x31/0xc3 [dm_mod] age=484996/485029/485063 pid=650 cpus=1-3
6 dev_create+0xa0/0x1fa [dm_mod] age=484996/485029/485063 pid=650 cpus=1-3
kmalloc-16: Kernel object freeing
------------------------------------------------------------------------
58147 <not-available> age=4295159737 pid=0 cpus=0
4 free_sect_attrs+0x1f/0x31 age=479865/481223/485299 pid=636-1392 cpus=0-1
14 slab_objects+0x241/0x268 age=38128/319152/407343 pid=2382-19662 cpus=0-3
69 put_filp+0x1c/0x45 age=161882/464441/491164 pid=1-2254 cpus=0-3
11304 __fput+0xd2/0x191 age=2/221287/491176 pid=1-20427 cpus=0-3
1 free_vfsmnt+0xd/0x1d age=484620 pid=1 cpus=3
1 bio_free_map_data+0xc/0x15 age=457491 pid=2175 cpus=2
1 bio_uncopy_user+0xf4/0x10e age=457491 pid=2175 cpus=2
5 remove_kevent+0x31/0x44 age=456796/460105/464967 pid=1900 cpus=1-3
3 release_sysfs_dirent+0x38/0x95 age=479865/487357/491114 pid=1-1392 cpus=0
2 key_cleanup+0x72/0xf2 age=383438/419532/455626 pid=15-17 cpus=0,2
2 key_cleanup+0xde/0xf2 age=383438/400265/417093 pid=15-17 cpus=0,2
10 selinux_bprm_free_security+0x10/0x1d age=54456/389987/483934
pid=657-18857 cpus=0-3
1 sel_write_load+0x50e/0x5da age=484198 pid=1 cpus=0
1 role_destroy+0x9/0x27 age=484359 pid=1 cpus=1
1 policydb_read+0x125/0xbef age=484352 pid=1 cpus=1
1 ksign_free_signature+0x35/0x44 age=491135 pid=1 cpus=0
142 kobject_uevent_env+0x3cc/0x40f age=456632/488785/492086 pid=1-2204 cpus=0-3
1 acpi_ds_create_operand+0x1d6/0x1f2 age=491860 pid=1 cpus=1
4 acpi_evaluate_object+0x1c9/0x1e5 age=491102/491106/491113 pid=1 cpus=0
4 acpi_ut_delete_internal_obj+0x151/0x162 age=491102/491110/491119 pid=1
cpus=0
1 reserve_range+0x8f/0xe7 age=491764 pid=1 cpus=1
1 device_del+0xbd/0x2ea age=490824 pid=1 cpus=0
10 device_add+0x210/0x630 age=480597/487754/490824 pid=1-1058 cpus=0-2
3 bus_add_driver+0xb1/0x19d age=487847/489734/491079 pid=1-596 cpus=0-2
4 class_device_add+0x26e/0x3ca age=488592/488938/489289 pid=596 cpus=0,3
3 usb_get_configuration+0xe86/0xee3 age=489654/489807/489950 pid=516-526
cpus=1-2
1 get_modalias+0xb9/0xf2 age=492073 pid=1 cpus=1
kmalloc-16: No NUMA information available.
The allocation entry that is increasing most is security_netlbl_sid_to_secattr,
and the freeing entries that are increasing most are <not-available> and __fput.
I rebooted with SELinux disabled, and kmalloc-16 is now sitting as 1200 objects
and holding (and everything else looks to be steady as well).
Was the above trace from kernel 2.6.23.1-10? I'm trying to track down the leak. Yes, it was from 2.6.23-1.10.fc7.x86_64. Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? If the problem no longer exists then please close this bug or I'll do so in a few days if there is no additional information lodged. I have not tried it again; this is a production system so I can't just poke at it. I was waiting to see if someone thought the problem might be fixed before I tried again; if something has changed that should fix it, let me know and I'll try it. *** Bug 428950 has been marked as a duplicate of this bug. *** I've just posted a quick (aka untested) patch which I believe should solve the problem, if you are able to test the patch please let me know if this solves your problem. The patch can be found here: * http://marc.info/?l=linux-netdev&m=120071344429617&w=2 Chris, I can create a test kernel for you with just this additional patch applied if you wish. I take it you are still running Fedora 7? I'm building kernel-2.6.23.12-52 right now with this patch applied. I'll have to re-label the filesystem as well (since I've loaded updates and such since setting SELinux to disabled), but I think I will be able to install this today. I'll let you know what I see. It has been running about an hour now, and it is looking okay. One question for Red Hat kernel people on this CC: I'm planning on switching this box from Fedora 7 to RHEL 5; will this bug also affect the current RHEL 5 kernel (and if so, will this patch be applied there as well)? selinux_netlbl_socket_setsid() in RHEL5 does not have this leak. I looked through all of the struct netlbl_lsm_secattr users in security/selinux/ss/services.c and they all seem to init/destroy. Good to know about RHEL - thanks for checking that. Fix is in 2.6.23.14-66. kernel-2.6.23.15-80.fc7 has been submitted as an update for Fedora 7 kernel-2.6.23.15-80.fc7 has been pushed to the Fedora 7 stable repository. If problems still persist, please make note of it in this bug report. |