Created attachment 315498 [details]
/var/log/messages from run with 'oom freeze/crash'

Description of problem:
A hard disk problem caused me to reinstall, F9 -> rawhide, over the weekend. I noticed a hard freeze/crash with 0.290, and on rebooting noticed 'oom' messages as one process after another was murdered. I noticed 0.295 in koji and downloaded/installed that, but experienced the same problem: an oom freeze/crash after hours of running. In both cases the system was lightly loaded: a couple of gnome-terminal windows, rhythmbox playing, firefox just sitting there, etc. This time I captured /var/log/messages and /var/log/audit/audit.log for the run; I attach them below. The system is a Thinkpad X60 running full rawhide (except for 0.295). Would any other debugging info be useful?

Version-Release number of selected component (if applicable):
kernel-2.6.27-0.290.rc5.fc10.i686
kernel-2.6.27-0.295.rc5.git2.fc10.i686

How reproducible:
Repeatable, but unpredictable

Steps to Reproduce:
1. Boot to gnome
2. Bring up a few terminal windows, firefox, rhythmbox (playing from a mounted ntfs-3g partition), etc.
3. Go away/wait

Actual results:

Expected results:

Additional info:
Created attachment 315499 [details]
/var/log/audit/audit.log from run with 'oom freeze/crash'
Possibly related to the kmalloc-32 leak that a few people have reported. What does /sys/kernel/slab/kmalloc-32/alloc_calls look like?
What it looks like currently is below. Not sure how to grab that "5 seconds before the system freezes" ;)

[root@tlondon ~]# cat /sys/kernel/slab/kmalloc-32/alloc_calls
42 alternatives_smp_module_add+0x6a/0x110 age=4284403/4307242/4324988 pid=0-2769 cpus=0-1
1 mtrr_file_add+0x37/0x8c age=4284027 pid=2755 cpus=1
83 setup_modinfo_srcversion+0x17/0x20 age=4239398/4305803/4321090 pid=656-3325 cpus=0-1
300 load_module+0x14f9/0x1870 age=4239397/4305973/4321058 pid=656-3325 cpus=0-1
2 audit_log_start+0x1ae/0x277 age=2380831/3352400/4323970 pid=1-2255 cpus=1
20 request_irq+0x6b/0x117 age=4283750/4312806/4324781 pid=1-2755 cpus=0-1
3 devm_request_irq+0x34/0x7c age=4308608/4312714/4320927 pid=663-1124 cpus=1
1 krealloc+0x21/0x38 age=4301972 pid=2129 cpus=1
154 kmalloc_node+0xd/0xf age=4239398/4307504/4324994 pid=0-3325 cpus=0-1
1 add_swap_extent+0x6c/0xa6 age=4302997 pid=2047 cpus=0
16 dma_pool_alloc+0x5f/0x1ba age=4321698/4322949/4323354 pid=1 cpus=0
1 do_kern_mount+0x87/0xbf age=4257511 pid=3121 cpus=0
29 alloc_fdtable+0x3d/0xac age=5/3919798/4287097 pid=2254-4231 cpus=0-1
1 alloc_vfsmnt+0x7a/0xf2 age=4257513 pid=3121 cpus=0
1 bioset_create+0x22/0x95 age=4324796 pid=1 cpus=0
6 proc_reg_open+0x26/0xf8 age=1274274/3790836/4300621 pid=2232-5802 cpus=0-1
79 sysfs_new_dirent+0x2e/0xe0 age=4295740/4317324/4323962 pid=1-2455 cpus=0-1
2 __key_link+0x258/0x305 age=4181528/4221930/4262332 pid=2921-3493 cpus=1
4 avc_add_callback+0x20/0x54 age=4323915/4323915/4323915 pid=1 cpus=1
240 selinux_task_alloc_security+0x27/0x6d age=6/4161432/4324904 pid=0-4232 cpus=0-1
11 sel_netport_sid+0xff/0x1c5 age=3791935/4225075/4294841 pid=2499-3854 cpus=0-1
8895 ebitmap_read+0x103/0x1d0 age=4314001/4314048/4314640 pid=1 cpus=0-1
4326 ebitmap_cpy+0x39/0x96 age=4262373/4275955/4311990 pid=712-2921 cpus=0-1
3811530 ebitmap_set_bit+0x109/0x17c age=26/2085013/4299425 pid=2254-3054 cpus=0-1
85 hashtab_create+0x24/0x78 age=4314638/4314638/4314640 pid=1 cpus=0
181 read_cons_helper+0x96/0x15a age=4314638/4314639/4314640 pid=1 cpus=0
1033 type_read+0x6a/0xad age=4314631/4314634/4314638 pid=1 cpus=0
10 role_read+0x28/0x111 age=4314638/4314638/4314638 pid=1 cpus=0
4 perm_read+0x62/0xa9 age=4314638/4314639/4314640 pid=1 cpus=0
73 class_read+0x2b/0x1d2 age=4314638/4314638/4314640 pid=1 cpus=0
15 class_read+0x93/0x1d2 age=4314638/4314638/4314640 pid=1 cpus=0
1 policydb_read+0x7e/0x1024 age=4314640 pid=1 cpus=0
1 policydb_read+0x76d/0x1024 age=4314015 pid=1 cpus=1
1 policydb_read+0x79b/0x1024 age=4314015 pid=1 cpus=1
1 policydb_read+0xaee/0x1024 age=4314014 pid=1 cpus=1
2 policydb_read+0xc41/0x1024 age=4314014/4314014/4314014 pid=1 cpus=1
83 security_get_bools+0xb3/0x130 age=4313969/4313969/4313969 pid=1 cpus=1
135 cond_read_list+0x6c/0x1dc age=4314016/4314194/4314437 pid=1 cpus=0-1
83 cond_read_bool+0xaa/0xb5 age=4314631/4314631/4314631 pid=1 cpus=0
1 devcgroup_create+0x50/0x133 age=4324992 pid=0 cpus=0
21 register_blkdev+0x60/0xe4 age=4305715/4321809/4324409 pid=1-1877 cpus=0-1
50 kvasprintf+0x28/0x45 age=4301489/4318650/4323962 pid=1-2185 cpus=0-1
6 pci_save_state+0x7e/0x1ce age=4309568/4319282/4323910 pid=1-825 cpus=0-1
2 pcim_enable_device+0x40/0x84 age=4308608/4314768/4320928 pid=663-1124 cpus=1
1 pci_create_sysfs_dev_files+0x180/0x2ca age=4321686 pid=1 cpus=0
12 acpi_os_validate_address+0x31/0x95 age=4324696/4324736/4324752 pid=1 cpus=0
132 acpi_os_allocate_zeroed+0x34/0x36 age=4323770/4324659/4324766 pid=1 cpus=0
1 acpi_os_allocate_zeroed+0x34/0x36 age=4324695 pid=1 cpus=0
1 acpi_ut_initialize_buffer+0x58/0xd0 age=4309815 pid=934 cpus=1
30 acpi_add_single_object+0x969/0xc13 age=4324551/4324608/4324647 pid=1 cpus=0
54 acpi_ec_add_query_handler+0x24/0x72 age=4324548/4324548/4324548 pid=1 cpus=0
1 find_dock+0x23a/0x358 age=4324811 pid=1 cpus=0
9 find_dock_devices+0x52/0x79 age=4324811/4324811/4324812 pid=1 cpus=0
37 acpi_pci_irq_add_prt+0x150/0x2bf age=4324527/4324532/4324537 pid=1 cpus=0
[root@tlondon ~]#
Is it worth me collecting this, say, every 30 or 60 seconds?
3811530 ebitmap_set_bit+0x109/0x17c age=26/2085013/4299425 pid=2254-3054 cpus=0-1

That looks a bit suspect. Wonder what's making all those calls. Hmm. Will see if I can come up with some better instrumentation.
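In the meantime, a simple shell loop over the sysfs file is enough to show whether that one count keeps climbing. This is only a rough sketch: it assumes alloc_calls is available (i.e. a kernel with SLUB debugging enabled), and the log path and 60-second interval are arbitrary choices.

#!/bin/sh
# Log the ebitmap_set_bit allocation count from the kmalloc-32 cache once a
# minute so its growth over time is easy to see.
LOG=/var/tmp/kmalloc-32-ebitmap.log
while :; do
    {
        date '+%F %T'
        grep ebitmap_set_bit /sys/kernel/slab/kmalloc-32/alloc_calls \
            || echo "no ebitmap_set_bit entries"
    } >> "$LOG"
    sleep 60
done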
For completeness, here is another "dump" (I believe about 20 minutes later). Now up to 5002188.

[root@tlondon ~]# cat /sys/kernel/slab/kmalloc-32/alloc_calls
42 alternatives_smp_module_add+0x6a/0x110 age=5467518/5490357/5508103 pid=0-2769 cpus=0-1
1 mtrr_file_add+0x37/0x8c age=5467142 pid=2755 cpus=1
83 setup_modinfo_srcversion+0x17/0x20 age=5422513/5488918/5504205 pid=656-3325 cpus=0-1
300 load_module+0x14f9/0x1870 age=5422512/5489088/5504173 pid=656-3325 cpus=0-1
2 audit_log_start+0x1ae/0x277 age=3563946/4535515/5507085 pid=1-2255 cpus=1
20 request_irq+0x6b/0x117 age=5466865/5495921/5507896 pid=1-2755 cpus=0-1
3 devm_request_irq+0x34/0x7c age=5491723/5495829/5504042 pid=663-1124 cpus=1
1 krealloc+0x21/0x38 age=5485087 pid=2129 cpus=1
154 kmalloc_node+0xd/0xf age=5422513/5490619/5508109 pid=0-3325 cpus=0-1
1 add_swap_extent+0x6c/0xa6 age=5486112 pid=2047 cpus=0
16 dma_pool_alloc+0x5f/0x1ba age=5504813/5506064/5506469 pid=1 cpus=0
1 do_kern_mount+0x87/0xbf age=5440626 pid=3121 cpus=0
28 alloc_fdtable+0x3d/0xac age=6/5155120/5470212 pid=2254-4231 cpus=0-1
1 alloc_vfsmnt+0x7a/0xf2 age=5440628 pid=3121 cpus=0
1 bioset_create+0x22/0x95 age=5507911 pid=1 cpus=0
5 proc_reg_open+0x26/0xf8 age=5468540/5477263/5483736 pid=2232-2755 cpus=0-1
79 sysfs_new_dirent+0x2e/0xe0 age=5478855/5500439/5507077 pid=1-2455 cpus=0-1
2 __key_link+0x258/0x305 age=5364643/5405045/5445447 pid=2921-3493 cpus=1
4 avc_add_callback+0x20/0x54 age=5507030/5507030/5507030 pid=1 cpus=1
237 selinux_task_alloc_security+0x27/0x6d age=6/5374722/5508019 pid=0-4232 cpus=0-1
11 sel_netport_sid+0xff/0x1c5 age=4975050/5408190/5477956 pid=2499-3854 cpus=0-1
8895 ebitmap_read+0x103/0x1d0 age=5497116/5497163/5497755 pid=1 cpus=0-1
4326 ebitmap_cpy+0x39/0x96 age=5445488/5459070/5495105 pid=712-2921 cpus=0-1
5002188 ebitmap_set_bit+0x109/0x17c age=47/2642918/5482540 pid=2254-3054 cpus=0-1
85 hashtab_create+0x24/0x78 age=5497753/5497753/5497755 pid=1 cpus=0
181 read_cons_helper+0x96/0x15a age=5497753/5497754/5497755 pid=1 cpus=0
1033 type_read+0x6a/0xad age=5497746/5497749/5497753 pid=1 cpus=0
10 role_read+0x28/0x111 age=5497753/5497753/5497753 pid=1 cpus=0
4 perm_read+0x62/0xa9 age=5497753/5497754/5497755 pid=1 cpus=0
73 class_read+0x2b/0x1d2 age=5497753/5497753/5497755 pid=1 cpus=0
15 class_read+0x93/0x1d2 age=5497753/5497753/5497755 pid=1 cpus=0
1 policydb_read+0x7e/0x1024 age=5497755 pid=1 cpus=0
1 policydb_read+0x76d/0x1024 age=5497130 pid=1 cpus=1
1 policydb_read+0x79b/0x1024 age=5497130 pid=1 cpus=1
1 policydb_read+0xaee/0x1024 age=5497129 pid=1 cpus=1
2 policydb_read+0xc41/0x1024 age=5497129/5497129/5497129 pid=1 cpus=1
83 security_get_bools+0xb3/0x130 age=5497084/5497084/5497084 pid=1 cpus=1
135 cond_read_list+0x6c/0x1dc age=5497131/5497309/5497552 pid=1 cpus=0-1
83 cond_read_bool+0xaa/0xb5 age=5497746/5497746/5497746 pid=1 cpus=0
1 devcgroup_create+0x50/0x133 age=5508107 pid=0 cpus=0
21 register_blkdev+0x60/0xe4 age=5488830/5504924/5507524 pid=1-1877 cpus=0-1
50 kvasprintf+0x28/0x45 age=5484604/5501765/5507077 pid=1-2185 cpus=0-1
6 pci_save_state+0x7e/0x1ce age=5492683/5502397/5507025 pid=1-825 cpus=0-1
2 pcim_enable_device+0x40/0x84 age=5491723/5497883/5504043 pid=663-1124 cpus=1
1 pci_create_sysfs_dev_files+0x180/0x2ca age=5504801 pid=1 cpus=0
12 acpi_os_validate_address+0x31/0x95 age=5507811/5507851/5507867 pid=1 cpus=0
132 acpi_os_allocate_zeroed+0x34/0x36 age=5506885/5507774/5507881 pid=1 cpus=0
1 acpi_os_allocate_zeroed+0x34/0x36 age=5507810 pid=1 cpus=0
1 acpi_ut_initialize_buffer+0x58/0xd0 age=5492930 pid=934 cpus=1
30 acpi_add_single_object+0x969/0xc13 age=5507666/5507723/5507762 pid=1 cpus=0
54 acpi_ec_add_query_handler+0x24/0x72 age=5507663/5507663/5507663 pid=1 cpus=0
1 find_dock+0x23a/0x358 age=5507926 pid=1 cpus=0
9 find_dock_devices+0x52/0x79 age=5507926/5507926/5507927 pid=1 cpus=0
37 acpi_pci_irq_add_prt+0x150/0x2bf age=5507642/5507647/5507652 pid=1 cpus=0
[root@tlondon ~]#
Looks like the only callers of that function are in selinux. James & Eric, does that look normal?
Created attachment 315501 [details]
60 calls of 'cat /sys/kernel/slab/kmalloc-32/alloc_calls'

Collected 'cat /sys/kernel/slab/kmalloc-32/alloc_calls' once a minute for about 29 minutes. About 40,000-50,000 calls per minute to ebitmap_set_bit.
I'm not seeing ebitmap_set_bit anywhere in any alloc_calls for 2.6.27-0.284.rc4.git6.fc10.x86_64, and will test an updated kernel.
The latest kernels don't boot on my system. "avcstat 1" output would be interesting.
Here is output just after booting, when the system was pretty idle (just 2 console windows, firefox, pidgin, rhythmbox):

[root@tlondon ~]# avcstat 1
lookups hits misses allocs reclaims frees
3663720 3657012 6708 6708 6176 6206
3369 3369 0 0 0 0
3179 3179 0 0 0 0
5247 5247 0 0 0 0
3277 3277 0 0 0 0
3833 3833 0 0 0 0
11134 11134 0 0 0 0
11350 11350 0 0 0 0
4953 4953 0 0 0 0
4026 4026 0 0 0 0
19115 19115 0 0 0 0
11503 11503 0 0 0 0
4636 4636 0 0 0 0
4031 4031 0 0 0 0
3227 3227 0 0 0 0
2937 2937 0 0 0 0
3122 3122 0 0 0 0
5918 5916 2 2 0 0
15918 15918 0 0 0 0
8532 8532 0 0 0 0
9718 9718 0 0 0 0
3261 3261 0 0 0 0
lookups hits misses allocs reclaims frees
5890 5890 0 0 0 0
4320 4320 0 0 0 0
4564 4564 0 0 0 0
2971 2971 0 0 0 0
3248 3248 0 0 0 0
3636 3636 0 0 0 0
4315 4315 0 0 0 0
3340 3340 0 0 0 0
4800 4800 0 0 0 0
4235 4221 14 14 16 0
3955 3955 0 0 0 16
4422 4422 0 0 0 0
3233 3233 0 0 0 0
3339 3339 0 0 0 0
4358 4358 0 0 0 0
3133 3133 0 0 0 0
4502 4502 0 0 0 0
3269 3269 0 0 0 0
4587 4587 0 0 0 0
3231 3226 5 5 0 0
4650 4650 0 0 0 0
4849 4849 0 0 0 0
lookups hits misses allocs reclaims frees
11062 11062 0 0 0 0
9856 9856 0 0 0 0
3623 3623 0 0 0 0
6557 6557 0 0 0 0
14470 14470 0 0 0 0
10160 10160 0 0 0 0
10263 10263 0 0 0 0
14610 14610 0 0 0 0
8794 8794 0 0 0 0
16743 16743 0 0 0 0
9094 9093 1 1 0 0
3892 3892 0 0 0 0
3582 3582 0 0 0 0

I'll set this up to run in the background. Anything else?
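For reference, the background collection can be as simple as the sketch below; the log path is an arbitrary choice, and it only assumes the avcstat tool used above is installed.

# Record the start time so samples can be lined up with /var/log/messages
# later, then run avcstat in the background, appending one line per second.
date '+%F %T' >> /var/tmp/avcstat.log
nohup avcstat 1 >> /var/tmp/avcstat.log 2>&1 &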
I get about 2000 avc/s with rhythmbox playing (otherwise it's < 1000), so your numbers aren't unusual, although still on the high side. It can jump over 10k avc/s when switching virtual desktops.
Came back from lunch-time walk and found another. /var/log/messages attached.
Created attachment 315584 [details]
/var/log/messages showing another "oom slow suicide...."

Not sure what is triggering this....
Created attachment 315585 [details]
code review, this looks like a leaked context

Untested, uncompiled. Still working on getting a machine that reproduces.
In the meantime, I'll patch 0.295 and see if I can build/test.
ajax was kind enough to test on a system without SELinux X windows support and didn't see the leak. That leads me to believe my diagnosis is right: this is mainly a result of excessive use of the selinuxfs interfaces, and that patch will probably help/solve the leak. I'm looking for other similar places...
Looks correct, and the bug was introduced by the deferred context mapping patch.
I believe I patched/built 0.295 with this fix. Here is the output of "cat /sys/kernel/slab/kmalloc-32/alloc_calls". I don't see a huge number for ebitmap_set_bit; in fact, I don't see it listed at all. Is there some other test/data I should collect?

tom

[root@tlondon ~]# cat /sys/kernel/slab/kmalloc-32/alloc_calls
42 alternatives_smp_module_add+0x6a/0x110 age=549577/572758/589336 pid=0-2761 cpus=0-1
1 mtrr_file_add+0x37/0x8c age=549185 pid=2744 cpus=1
82 setup_modinfo_srcversion+0x17/0x20 age=549581/573084/584506 pid=675-2761 cpus=0-1
297 load_module+0x14f9/0x1870 age=549576/572722/584474 pid=675-2761 cpus=0-1
1 audit_log_start+0x1ae/0x277 age=588316 pid=1 cpus=1
20 request_irq+0x6b/0x117 age=548738/577684/589130 pid=1-2744 cpus=0-1
3 devm_request_irq+0x34/0x7c age=574789/577975/584348 pid=682-1100 cpus=1
1 krealloc+0x21/0x38 age=567793 pid=2122 cpus=1
168 kmalloc_node+0xd/0xf age=134655/535004/589342 pid=0-3725 cpus=0-1
1 add_swap_extent+0x6c/0xa6 age=568925 pid=2038 cpus=0
16 dma_pool_alloc+0x5f/0x1ba age=585133/587238/587701 pid=1 cpus=0
1 do_kern_mount+0x87/0xbf age=523565 pid=3061 cpus=0
30 alloc_fdtable+0x3d/0xac age=6/409558/552077 pid=2247-3799 cpus=0-1
1 alloc_vfsmnt+0x7a/0xf2 age=523567 pid=3061 cpus=0
1 bioset_create+0x22/0x95 age=589144 pid=1 cpus=0
6 proc_reg_open+0x26/0xf8 age=115344/485483/566123 pid=2225-3766 cpus=0-1
79 sysfs_new_dirent+0x2e/0xe0 age=560289/582126/588308 pid=1-2462 cpus=0-1
2 __key_link+0x258/0x305 age=142812/335345/527878 pid=2916-3683 cpus=0-1
4 avc_add_callback+0x20/0x54 age=588261/588261/588261 pid=1 cpus=1
246 selinux_task_alloc_security+0x27/0x6d age=6/513447/589251 pid=0-3916 cpus=0-1
16 sel_netport_sid+0xff/0x1c5 age=31772/406819/559228 pid=2506-3720 cpus=0-1
8895 ebitmap_read+0x103/0x1d0 age=580280/580320/580804 pid=1 cpus=0
4326 ebitmap_cpy+0x39/0x96 age=527919/541052/578026 pid=725-2916 cpus=0-1
85 hashtab_create+0x24/0x78 age=580801/580802/580804 pid=1 cpus=0
181 read_cons_helper+0x96/0x15a age=580801/580803/580804 pid=1 cpus=0
1033 type_read+0x6a/0xad age=580795/580798/580802 pid=1 cpus=0
10 role_read+0x28/0x111 age=580802/580802/580802 pid=1 cpus=0
4 perm_read+0x62/0xa9 age=580802/580803/580804 pid=1 cpus=0
73 class_read+0x2b/0x1d2 age=580801/580802/580804 pid=1 cpus=0
15 class_read+0x93/0x1d2 age=580801/580802/580803 pid=1 cpus=0
1 policydb_read+0x7e/0x1024 age=580801 pid=1 cpus=0
1 policydb_read+0x76d/0x1024 age=580296 pid=1 cpus=0
1 policydb_read+0x79b/0x1024 age=580296 pid=1 cpus=0
1 policydb_read+0xaee/0x1024 age=580295 pid=1 cpus=0
2 policydb_read+0xc41/0x1024 age=580295/580295/580295 pid=1 cpus=0
83 security_get_bools+0xb3/0x130 age=580199/580199/580199 pid=1 cpus=0
135 cond_read_list+0x6c/0x1dc age=580297/580469/580656 pid=1 cpus=0-1
83 cond_read_bool+0xaa/0xb5 age=580795/580795/580795 pid=1 cpus=0
1 devcgroup_create+0x50/0x133 age=589340 pid=0 cpus=0
21 register_blkdev+0x60/0xe4 age=571454/586288/588758 pid=1-1932 cpus=0-1
50 kvasprintf+0x28/0x45 age=567238/583347/588308 pid=1-2177 cpus=0-1
6 pci_save_state+0x7e/0x1ce age=575262/584081/588256 pid=1-842 cpus=0-1
2 pcim_enable_device+0x40/0x84 age=574790/579569/584349 pid=682-1100 cpus=1
1 pci_create_sysfs_dev_files+0x180/0x2ca age=585102 pid=1 cpus=0
12 acpi_os_validate_address+0x31/0x95 age=589045/589085/589101 pid=1 cpus=0
132 acpi_os_allocate_zeroed+0x34/0x36 age=588117/589008/589116 pid=1 cpus=0
1 acpi_os_allocate_zeroed+0x34/0x36 age=589044 pid=1 cpus=0
1 acpi_ut_initialize_buffer+0x58/0xd0 age=575408 pid=981 cpus=0
30 acpi_add_single_object+0x969/0xc13 age=588899/588956/588995 pid=1 cpus=0
54 acpi_ec_add_query_handler+0x24/0x72 age=588897/588897/588897 pid=1 cpus=0
1 find_dock+0x23a/0x358 age=589159 pid=1 cpus=0
9 find_dock_devices+0x52/0x79 age=589159/589159/589160 pid=1 cpus=0
37 acpi_pci_irq_add_prt+0x150/0x2bf age=588876/588881/588886 pid=1 cpus=0
20 rand_initialize_disk+0x1a/0x27 age=572082/585901/587717 pid=1-1784 cpus=0-1
1 agpioc_acquire_wrap+0x148/0x157 age=549710 pid=2744 cpus=1
2 kobj_map_init+0x33/0x7e age=589144/589242/589341 pid=0-1 cpus=0
[root@tlondon ~]#
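In case another data point helps, a simple sanity check is to sample the total object count of the kmalloc-32 cache over time; with the leak fixed it should stay roughly flat instead of growing without bound. This is only a sketch: it assumes /proc/slabinfo is available on this kernel, and the interval and log path are arbitrary.

# Log the kmalloc-32 line from /proc/slabinfo every 5 minutes; the second
# column is the number of active objects in the cache.
while :; do
    printf '%s ' "$(date '+%F %T')"
    grep '^kmalloc-32 ' /proc/slabinfo || echo 'kmalloc-32 not found'
    sleep 300
done >> /var/tmp/kmalloc-32-objects.log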
I left my laptop running all night and experienced no problems:

[root@tlondon ~]# uptime
06:53:06 up 11:05, 3 users, load average: 1.23, 0.76, 0.50
[root@tlondon ~]# cat /sys/kernel/slab/kmalloc-32/alloc_calls | grep ebit
8895 ebitmap_read+0x103/0x1d0 age=39934905/39934944/39935425 pid=1 cpus=0-1
4326 ebitmap_cpy+0x39/0x96 age=39880680/39894994/39932707 pid=706-2864 cpus=0-1
[root@tlondon ~]#

Really appreciate the quick review/patch. Thanks!
Er, look at that patch:

-	if (rc)
-		context_destroy(&context);
 out:
+	context_destroy(&context);

You've removed the conditional call to context_destroy, which can now mean it is passed a NULL pointer. That NULL can exist if sidtab_context_to_sid returns NULL, which it can do, and then context_destroy will happily go poking at it, resulting in an oops/panic. You'll want to check for rc on the out path before sending this upstream.

Jon.
Re-opening as the patch is not correct.
No, try again: the context is on the stack, so no panics here, and context_destroy is safe to run twice. The eventual patch upstream was fine, and even the early version of the patch committed to rawhide (although not the same) is fine. No issues here..... Re-closing; if it turns out I really can't read code, reopen and explain better.
Ignore me, I was clearly smoking something on Friday afternoon. Sorry, I skimmed the code and decided something was wrong before I turned my brain back on ;)