Bug 233157

Summary: Kernel memory leak in audit subsystem
Product: Red Hat Enterprise Linux 5
Reporter: Steve Grubb <sgrubb>
Component: kernel
Assignee: Alexander Viro <aviro>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: medium
Version: 5.0
CC: aviro, eparis, iboverma, sgrubb
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: RHSA-2007-0169
Doc Type: Bug Fix
Last Closed: 2007-04-30 16:38:29 UTC
Bug Depends On: 228409    
Bug Blocks:    

Description Steve Grubb 2007-03-20 18:28:55 UTC
+++ This bug was initially created as a clone of Bug #228409 +++

Description of problem:
When running regular ipsec in upstream linux-2.6.20, after a few minutes of
sending streams of packets, the kernel crashes. 

Version-Release number of selected component (if applicable):
linux-2.6.20

How reproducible:
Happens every time.

Steps to Reproduce:
1. Configure regular ipsec on two machines, A and B.
In /etc/racoon/racoon.conf:

path include "/etc/racoon";
path pre_shared_key "/etc/racoon/psk.txt";
path certificate "/etc/racoon/certs";

remote anonymous
{
        exchange_mode main,aggressive;
        doi ipsec_doi;
        situation identity_only;

        my_identifier address;

        lifetime time 10 minutes;   # sec,min,hour
        initial_contact on;
        proposal_check obey;    # obey, strict or claim


        proposal {
                encryption_algorithm 3des;
                hash_algorithm sha1;
                authentication_method pre_shared_key ;
                dh_group 2 ;
        }
}

sainfo anonymous
{
        pfs_group 2;
        lifetime time 3 minutes ;
        encryption_algorithm 3des, blowfish 448, rijndael ;
        authentication_algorithm hmac_sha1, hmac_md5 ;
        compression_algorithm deflate ;
}

In /etc/racoon/psk.txt:
10.1.1.2                      flibbertigibbet
10.1.1.3                      flibbertigibbet

On machine A:
# echo "spdadd 10.1.1.2 10.1.1.3 any -P in ipsec esp/transport//require; spdadd
10.1.1.3 10.1.1.2 any -P out ipsec esp/transport//require;" | setkey -c

# racoon

On machine B:
# echo "spdadd 10.1.1.2 10.1.1.3 any -P out ipsec esp/transport//require; spdadd
10.1.1.3 10.1.1.2 any -P in ipsec esp/transport//require;" | setkey -c

# racoon

2. Now that ipsec is configured, do a ping to ensure connection is up.

3. Let ping run for a while (about 3 minutes, the time it takes for a new re-key),
and eventually the system will crash.

64 bytes from 9.3.189.55: icmp_seq=175 ttl=63 time=0.978 ms
64 bytes from 9.3.189.55: icmp_seq=176 ttl=63 time=0.885 ms
64 bytes from 9.3.189.55: icmp_seq=177 ttl=63 time=0.793 ms
64 bytes from 9.3.189.55: icmp_seq=178 ttl=63 time=0.710 ms
64 bytes from 9.3.189.55: icmp_seq=179 ttl=63 time=0.724 ms
BUG: scheduling while atomic: swapper/0x10000200/0
Call Trace:
[C00000000FFFF860] [C00000000000F808] .show_stack+0x68/0x1b0 (unreliable)
[C00000000FFFF900] [C00000000035CB04] .schedule+0xac/0xd0c
[C00000000FFFFA10] [C000000000063070] .__cond_resched+0x24/0x50
[C00000000FFFFA90] [C00000000035D844] .cond_resched+0x48/0x60
[C00000000FFFFB10] [C0000000000DB254] .__kmalloc+0x6c/0x154
[C00000000FFFFBB0] [C0000000000A4890] .audit_log_task_context+0x88/0x128
[C00000000FFFFC50] [C000000000342A68] .xfrm_audit_log+0x148/0x36c
[C00000000FFFFDB0] [C0000000003491C8] .xfrm_timer_handler+0x22c/0x280
[C00000000FFFFE40] [C000000000077578] .run_timer_softirq+0x194/0x264
[C00000000FFFFEF0] [C0000000000716E8] .__do_softirq+0xa8/0x164
[C00000000FFFFF90] [C000000000027740] .call_do_softirq+0x14/0x24
[C000000000593910] [C00000000000C1E8] .do_softirq+0x68/0xac
[C0000000005939A0] [C0000000000717F8] .irq_exit+0x54/0x6c
[C000000000593A20] [C000000000024904] .timer_interrupt+0x478/0x4c4
[C000000000593B00] [C000000000003608] decrementer_common+0x108/0x180
--- Exception: 901 at .local_irq_restore+0x3c/0x40
    LR = .cpu_idle+0x114/0x1e0
[C000000000593DF0] [C000000000011CD4] .cpu_idle+0x108/0x1e0 (unreliable)
[C000000000593E70] [C000000000009200] .rest_init+0x44/0x5c
[C000000000593EF0] [C000000000430918] .start_kernel+0x354/0x370
[C000000000593F90] [C000000000008528] .start_here_common+0x54/0xac
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0xc00000000ffff8a0
cpu 0x0: Vector: 400 (Instruction Access) at [c00000000ffff5f0]
    pc: c00000000ffff8a0
    lr: c00000000ffff8a0
    sp: c00000000ffff870
   msr: 8000000010009032
  current = 0xc0000000004a4840
  paca    = 0xc0000000004a5100
    pid   = 0, comm = swapper
enter ? for help
[link register   ] c00000000ffff8a0
[c00000000ffff870] c000000000010048 .__switch_to+0x12c/0x160 (unreliable)
[c00000000ffff900] ffffffffe0000000
[c00000000ffffa10] c0000000005533a0
[c00000000ffffac0] 0000000000128000
SP (1) is in userspace
0:mon> t
[link register   ] c00000000ffff8a0
[c00000000ffff870] c000000000010048 .__switch_to+0x12c/0x160 (unreliable)
[c00000000ffff900] ffffffffe0000000
[c00000000ffffa10] c0000000005533a0
[c00000000ffffac0] 0000000000128000
SP (1) is in userspace
0:mon>

-- Additional comment from sgrubb on 2007-02-12 18:28 EST --
Al, do you think audit_log_task_context() needs to take the task and memory pool
as passed parameters?
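
A minimal user-space sketch of the idea in the question above, assuming the
"memory pool" parameter means an allocation mode chosen by the caller. This is
a model, not kernel code: model_gfp_t, model_kmalloc(), model_may_sleep and
model_log_task_context() are hypothetical names invented for illustration, and
malloc() stands in for the kernel allocator. In the trace above the allocation
is reached from run_timer_softirq(), where sleeping is not allowed, so a caller
on that path would have to ask for the non-sleeping mode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's gfp flags; not the real definitions. */
typedef enum { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC } model_gfp_t;

/* Models whether the current caller may sleep (false in softirq context). */
static bool model_may_sleep = true;

/* Counts requests that would have tried to sleep in atomic context. */
static int model_sleep_violations = 0;

/* Toy allocator: a MODEL_GFP_KERNEL request is allowed to sleep, so issuing
 * one while sleeping is forbidden is recorded as a violation. */
static void *model_kmalloc(size_t len, model_gfp_t gfp)
{
    if (gfp == MODEL_GFP_KERNEL && !model_may_sleep)
        model_sleep_violations++;
    return malloc(len);
}

/* Sketch of the proposed shape: the caller, which knows its own context,
 * passes the allocation mode instead of the callee assuming one. */
static char *model_log_task_context(const char *ctx, model_gfp_t gfp)
{
    size_t len = strlen(ctx) + 1;
    char *copy = model_kmalloc(len, gfp);

    if (copy)
        memcpy(copy, ctx, len);
    return copy; /* caller frees */
}
```

In this model, a caller that has set model_may_sleep to false and passes
MODEL_GFP_ATOMIC records no violation, while the same caller passing
MODEL_GFP_KERNEL records one.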

-- Additional comment from aviro on 2007-02-13 07:24 EST --
Either that, or just make that allocation GFP_ATOMIC unconditionally.
BTW, the use of getprocattr() is an atrocity wrt allocations; we
end up calculating size, calling selinux_getsecurity(), calling
security_sid_to_context() that does allocation (atomic) and puts
the string there; then we free what we'd allocated, return size, do
allocation in audit_log_task_context(), get through exactly the same
work *again* (including recalculation of size and atomic allocation),
copy string from atomically allocated into what we'd allocated in
audit_log_task_context() and free atomically allocated.  Revolting.

What we need is an analog of getprocattr (and getsecurity) that would
_not_ take buffer+len as an argument but just return whatever
security_sid_to_context() allocated and filled.  Simple and sane...
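
The difference between the two interface shapes described in this comment can
be sketched in user-space C. Everything here is a hypothetical model written
for illustration: model_sid_to_context() only stands in for
security_sid_to_context(), model_getattr_buf() for a buffer+len style getter,
and the context string is a placeholder. None of these are the real kernel
signatures.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts allocations so the two call patterns can be compared. */
static int model_alloc_count = 0;

static void *counted_alloc(size_t len)
{
    model_alloc_count++;
    return malloc(len);
}

/* Stand-in for the lower layer: allocates a string and fills it in. */
static int model_sid_to_context(char **out, size_t *len)
{
    static const char ctx[] = "example_context"; /* placeholder value */
    char *s = counted_alloc(sizeof(ctx));

    if (!s)
        return -1;
    memcpy(s, ctx, sizeof(ctx));
    *out = s;
    *len = sizeof(ctx); /* length includes the terminating NUL */
    return 0;
}

/* buffer+len style getter: with buf == NULL it only reports the size,
 * but it still has to build the string to learn that size. */
static long model_getattr_buf(char *buf, size_t buflen)
{
    char *tmp;
    size_t len;

    if (model_sid_to_context(&tmp, &len) != 0)
        return -1;
    if (buf) {
        if (buflen < len) {
            free(tmp);
            return -1;
        }
        memcpy(buf, tmp, len);
    }
    free(tmp);
    return (long)len;
}

/* Caller in the two-pass style: probe the size, allocate, fetch again. */
static char *model_context_two_pass(void)
{
    long len = model_getattr_buf(NULL, 0);
    char *buf;

    if (len < 0)
        return NULL;
    buf = counted_alloc((size_t)len);
    if (!buf)
        return NULL;
    if (model_getattr_buf(buf, (size_t)len) < 0) {
        free(buf);
        return NULL;
    }
    return buf; /* caller frees */
}

/* Suggested style: return whatever the lower layer allocated and filled. */
static char *model_context_direct(void)
{
    char *s;
    size_t len;

    return model_sid_to_context(&s, &len) == 0 ? s : NULL; /* caller frees */
}
```

In this model, model_context_two_pass() performs three allocations and builds
the string twice, while model_context_direct() performs one allocation and
builds it once.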

Comment 9 Don Howard 2007-03-29 21:41:49 UTC
A patch for this issue has been included in zstream build 2.6.18-8.1.2.el5.

Comment 12 Mike Gahagan 2007-04-27 18:27:36 UTC
So far I have not been able to reproduce the crash with the test case; however,
we can verify that the fix is in the 8.1.3 kernel.


Comment 14 Red Hat Bugzilla 2007-04-30 16:38:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0169.html