Red Hat Bugzilla – Bug 233157
Kernel memory leak in audit subsystem
Last modified: 2007-11-30 17:07:42 EST
+++ This bug was initially created as a clone of Bug #228409 +++

Description of problem:
When running regular ipsec in upstream linux-2.6.20, after a few minutes of sending streams of packets, the kernel crashes.

Version-Release number of selected component (if applicable):
linux-2.6.20

How reproducible:
Happens every time.

Steps to Reproduce:
1. Configure regular ipsec on machines, A & B.

In /etc/racoon/racoon.conf:

    path include "/etc/racoon";
    path pre_shared_key "/etc/racoon/psk.txt";
    path certificate "/etc/racoon/certs";

    remote anonymous {
        exchange_mode main,aggressive;
        doi ipsec_doi;
        situation identity_only;
        my_identifier address;
        lifetime time 10 minutes; # sec,min,hour
        initial_contact on;
        proposal_check obey; # obey, strict or claim
        proposal {
            encryption_algorithm 3des;
            hash_algorithm sha1;
            authentication_method pre_shared_key ;
            dh_group 2 ;
        }
    }

    sainfo anonymous {
        pfs_group 2;
        lifetime time 3 minutes ;
        encryption_algorithm 3des, blowfish 448, rijndael ;
        authentication_algorithm hmac_sha1, hmac_md5 ;
        compression_algorithm deflate ;
    }

In /etc/racoon/psk.txt:

    10.1.1.2 flibbertigibbet
    10.1.1.3 flibbertigibbet

On machine A:

    # echo "spdadd 10.1.1.2 10.1.1.3 any -P in ipsec esp/transport//require; spdadd 10.1.1.3 10.1.1.2 any -P out ipsec esp/transport//require;" | setkey -c
    # racoon

On machine B:

    # echo "spdadd 10.1.1.2 10.1.1.3 any -P out ipsec esp/transport//require; spdadd 10.1.1.3 10.1.1.2 any -P in ipsec esp/transport//require;" | setkey -c
    # racoon

2. Now that ipsec is configured, do a ping to ensure connection is up.
3. Let ping run for awhile (about 3 minutes, time it takes for a new re-key), and eventually system will crash.
64 bytes from 9.3.189.55: icmp_seq=175 ttl=63 time=0.978 ms
64 bytes from 9.3.189.55: icmp_seq=176 ttl=63 time=0.885 ms
64 bytes from 9.3.189.55: icmp_seq=177 ttl=63 time=0.793 ms
64 bytes from 9.3.189.55: icmp_seq=178 ttl=63 time=0.710 ms
64 bytes from 9.3.189.55: icmp_seq=179 ttl=63 time=0.724 ms
BUG: scheduling while atomic: swapper/0x10000200/0
Call Trace:
[C00000000FFFF860] [C00000000000F808] .show_stack+0x68/0x1b0 (unreliable)
[C00000000FFFF900] [C00000000035CB04] .schedule+0xac/0xd0c
[C00000000FFFFA10] [C000000000063070] .__cond_resched+0x24/0x50
[C00000000FFFFA90] [C00000000035D844] .cond_resched+0x48/0x60
[C00000000FFFFB10] [C0000000000DB254] .__kmalloc+0x6c/0x154
[C00000000FFFFBB0] [C0000000000A4890] .audit_log_task_context+0x88/0x128
[C00000000FFFFC50] [C000000000342A68] .xfrm_audit_log+0x148/0x36c
[C00000000FFFFDB0] [C0000000003491C8] .xfrm_timer_handler+0x22c/0x280
[C00000000FFFFE40] [C000000000077578] .run_timer_softirq+0x194/0x264
[C00000000FFFFEF0] [C0000000000716E8] .__do_softirq+0xa8/0x164
[C00000000FFFFF90] [C000000000027740] .call_do_softirq+0x14/0x24
[C000000000593910] [C00000000000C1E8] .do_softirq+0x68/0xac
[C0000000005939A0] [C0000000000717F8] .irq_exit+0x54/0x6c
[C000000000593A20] [C000000000024904] .timer_interrupt+0x478/0x4c4
[C000000000593B00] [C000000000003608] decrementer_common+0x108/0x180
--- Exception: 901 at .local_irq_restore+0x3c/0x40
    LR = .cpu_idle+0x114/0x1e0
[C000000000593DF0] [C000000000011CD4] .cpu_idle+0x108/0x1e0 (unreliable)
[C000000000593E70] [C000000000009200] .rest_init+0x44/0x5c
[C000000000593EF0] [C000000000430918] .start_kernel+0x354/0x370
[C000000000593F90] [C000000000008528] .start_here_common+0x54/0xac
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0xc00000000ffff8a0
cpu 0x0: Vector: 400 (Instruction Access) at [c00000000ffff5f0]
    pc: c00000000ffff8a0
    lr: c00000000ffff8a0
    sp: c00000000ffff870
   msr: 8000000010009032
  current = 0xc0000000004a4840
  paca    = 0xc0000000004a5100
    pid   = 0, comm = swapper
enter ? for help
[link register   ] c00000000ffff8a0
[c00000000ffff870] c000000000010048 .__switch_to+0x12c/0x160 (unreliable)
[c00000000ffff900] ffffffffe0000000
[c00000000ffffa10] c0000000005533a0
[c00000000ffffac0] 0000000000128000
SP (1) is in userspace
0:mon> t
[link register   ] c00000000ffff8a0
[c00000000ffff870] c000000000010048 .__switch_to+0x12c/0x160 (unreliable)
[c00000000ffff900] ffffffffe0000000
[c00000000ffffa10] c0000000005533a0
[c00000000ffffac0] 0000000000128000
SP (1) is in userspace
0:mon>

-- Additional comment from sgrubb@redhat.com on 2007-02-12 18:28 EST --

Al do you think audit_log_task_context() needs to take the task and memory pool as passed parameters?

-- Additional comment from aviro@redhat.com on 2007-02-13 07:24 EST --

Either that, or just make that allocation GFP_ATOMIC unconditionally.

BTW, the use of getprocattr() is an atrocity wrt allocations; we end up calculating size, calling selinux_getsecurity(), calling security_sid_to_context() that does allocation (atomic) and puts the string there; then we free what we'd allocated, return size, do allocation in audit_log_task_context(), get through exactly the same work *again* (including recalculation of size and atomic allocation), copy string from atomically allocated into what we'd allocated in audit_log_task_context() and free atomically allocated. Revolting.

What we need is an analog of getprocattr (and getsecurity) that would _not_ take buffer+len as an argument but just return whatever security_sid_to_context() allocated and filled. Simple and sane...
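
For illustration only, the allocation pattern described in the comment above can be modelled in userspace C. This is a rough sketch, not kernel code and not the patch that was applied: sid_to_context(), get_attr_into(), context_two_pass(), context_one_pass(), counted_alloc() and alloc_count are hypothetical stand-ins, the label string is a placeholder, and malloc() takes the place of kmalloc(), so GFP flags and atomic context are not modelled at all. It only counts how many allocations each calling convention costs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so the two call patterns can be compared. */
int alloc_count;

void *counted_alloc(size_t n)
{
    alloc_count++;
    return malloc(n);
}

/* Stand-in for security_sid_to_context(): allocates a string and
 * returns it through *ctx, with its size (including NUL) in *len.
 * The label is a placeholder; sid is ignored in this model. */
int sid_to_context(unsigned sid, char **ctx, size_t *len)
{
    static const char label[] = "user:role:type";

    (void)sid;
    assert(ctx != NULL && len != NULL);
    *ctx = counted_alloc(sizeof(label));
    if (*ctx == NULL)
        return -1;
    memcpy(*ctx, label, sizeof(label));
    *len = sizeof(label);
    return 0;
}

/* Buffer+len style, modelled on the description of getprocattr():
 * with buf == NULL it only reports the size, otherwise it copies the
 * string into buf. Both paths build and free a temporary string. */
long get_attr_into(unsigned sid, char *buf, size_t size)
{
    char *tmp;
    size_t len;

    if (sid_to_context(sid, &tmp, &len) != 0)
        return -1;
    if (buf != NULL) {
        if (size < len) {
            free(tmp);
            return -1;
        }
        memcpy(buf, tmp, len);
    }
    free(tmp);
    return (long)len;
}

/* Two-pass caller: query the size, allocate, then fill. */
char *context_two_pass(unsigned sid)
{
    long len = get_attr_into(sid, NULL, 0);
    char *buf;

    if (len < 0)
        return NULL;
    buf = counted_alloc((size_t)len);
    if (buf == NULL)
        return NULL;
    if (get_attr_into(sid, buf, (size_t)len) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* One-pass caller: hand back whatever sid_to_context() allocated.
 * The caller owns the result and must free() it. */
char *context_one_pass(unsigned sid)
{
    char *ctx;
    size_t len;

    if (sid_to_context(sid, &ctx, &len) != 0)
        return NULL;
    return ctx;
}
```

In this model, one call to context_two_pass() performs three allocations (two temporaries plus the caller's buffer), while context_one_pass() performs one and returns the same string. With a single allocation site, choosing the allocation mode for atomic callers such as the timer path in the trace becomes a one-place decision.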
A patch for this issue has been included in zstream build 2.6.18-8.1.2.el5.
So far, I have not been able to reproduce the crash with the testcase; however, we can verify that the fix is in the 8.1.3 kernel.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2007-0169.html