Description of problem:
When running regular ipsec in upstream linux-2.6.20, after a few minutes of sending streams of packets, the kernel crashes.

Version-Release number of selected component (if applicable):
linux-2.6.20

How reproducible:
Happens every time.

Steps to Reproduce:
1. Configure regular ipsec on machines, A & B.

In /etc/racoon/racoon.conf:

path include "/etc/racoon";
path pre_shared_key "/etc/racoon/psk.txt";
path certificate "/etc/racoon/certs";

remote anonymous {
        exchange_mode main,aggressive;
        doi ipsec_doi;
        situation identity_only;
        my_identifier address;
        lifetime time 10 minutes; # sec,min,hour
        initial_contact on;
        proposal_check obey; # obey, strict or claim
        proposal {
                encryption_algorithm 3des;
                hash_algorithm sha1;
                authentication_method pre_shared_key ;
                dh_group 2 ;
        }
}

sainfo anonymous {
        pfs_group 2;
        lifetime time 3 minutes ;
        encryption_algorithm 3des, blowfish 448, rijndael ;
        authentication_algorithm hmac_sha1, hmac_md5 ;
        compression_algorithm deflate ;
}

In /etc/racoon/psk.txt:

10.1.1.2 flibbertigibbet
10.1.1.3 flibbertigibbet

On machine A:

# echo "spdadd 10.1.1.2 10.1.1.3 any -P in ipsec esp/transport//require;
spdadd 10.1.1.3 10.1.1.2 any -P out ipsec esp/transport//require;" | setkey -c
# racoon

On machine B:

# echo "spdadd 10.1.1.2 10.1.1.3 any -P out ipsec esp/transport//require;
spdadd 10.1.1.3 10.1.1.2 any -P in ipsec esp/transport//require;" | setkey -c
# racoon

2. Now that ipsec is configured, do a ping to ensure connection is up.
3. Let ping run for awhile (about 3 minutes, time it takes for a new re-key), and eventually system will crash.

64 bytes from 9.3.189.55: icmp_seq=175 ttl=63 time=0.978 ms
64 bytes from 9.3.189.55: icmp_seq=176 ttl=63 time=0.885 ms
64 bytes from 9.3.189.55: icmp_seq=177 ttl=63 time=0.793 ms
64 bytes from 9.3.189.55: icmp_seq=178 ttl=63 time=0.710 ms
64 bytes from 9.3.189.55: icmp_seq=179 ttl=63 time=0.724 ms
BUG: scheduling while atomic: swapper/0x10000200/0
Call Trace:
[C00000000FFFF860] [C00000000000F808] .show_stack+0x68/0x1b0 (unreliable)
[C00000000FFFF900] [C00000000035CB04] .schedule+0xac/0xd0c
[C00000000FFFFA10] [C000000000063070] .__cond_resched+0x24/0x50
[C00000000FFFFA90] [C00000000035D844] .cond_resched+0x48/0x60
[C00000000FFFFB10] [C0000000000DB254] .__kmalloc+0x6c/0x154
[C00000000FFFFBB0] [C0000000000A4890] .audit_log_task_context+0x88/0x128
[C00000000FFFFC50] [C000000000342A68] .xfrm_audit_log+0x148/0x36c
[C00000000FFFFDB0] [C0000000003491C8] .xfrm_timer_handler+0x22c/0x280
[C00000000FFFFE40] [C000000000077578] .run_timer_softirq+0x194/0x264
[C00000000FFFFEF0] [C0000000000716E8] .__do_softirq+0xa8/0x164
[C00000000FFFFF90] [C000000000027740] .call_do_softirq+0x14/0x24
[C000000000593910] [C00000000000C1E8] .do_softirq+0x68/0xac
[C0000000005939A0] [C0000000000717F8] .irq_exit+0x54/0x6c
[C000000000593A20] [C000000000024904] .timer_interrupt+0x478/0x4c4
[C000000000593B00] [C000000000003608] decrementer_common+0x108/0x180
--- Exception: 901 at .local_irq_restore+0x3c/0x40
    LR = .cpu_idle+0x114/0x1e0
[C000000000593DF0] [C000000000011CD4] .cpu_idle+0x108/0x1e0 (unreliable)
[C000000000593E70] [C000000000009200] .rest_init+0x44/0x5c
[C000000000593EF0] [C000000000430918] .start_kernel+0x354/0x370
[C000000000593F90] [C000000000008528] .start_here_common+0x54/0xac
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0xc00000000ffff8a0
cpu 0x0: Vector: 400 (Instruction Access) at [c00000000ffff5f0]
    pc: c00000000ffff8a0
    lr: c00000000ffff8a0
    sp: c00000000ffff870
   msr: 8000000010009032
  current = 0xc0000000004a4840
  paca    = 0xc0000000004a5100
    pid   = 0, comm = swapper
enter ? for help
[link register   ] c00000000ffff8a0
[c00000000ffff870] c000000000010048 .__switch_to+0x12c/0x160 (unreliable)
[c00000000ffff900] ffffffffe0000000
[c00000000ffffa10] c0000000005533a0
[c00000000ffffac0] 0000000000128000
SP (1) is in userspace
0:mon> t
[link register   ] c00000000ffff8a0
[c00000000ffff870] c000000000010048 .__switch_to+0x12c/0x160 (unreliable)
[c00000000ffff900] ffffffffe0000000
[c00000000ffffa10] c0000000005533a0
[c00000000ffffac0] 0000000000128000
SP (1) is in userspace
0:mon>
Al, do you think audit_log_task_context() needs to take the task and memory pool as passed parameters?
Either that, or just make that allocation GFP_ATOMIC unconditionally. BTW, the use of getprocattr() is an atrocity wrt allocations; we end up calculating size, calling selinux_getsecurity(), calling security_sid_to_context() that does allocation (atomic) and puts the string there; then we free what we'd allocated, return size, do allocation in audit_log_task_context(), get through exactly the same work *again* (including recalculation of size and atomic allocation), copy string from atomically allocated into what we'd allocated in audit_log_task_context() and free atomically allocated. Revolting. What we need is an analog of getprocattr (and getsecurity) that would _not_ take buffer+len as an argument but just return whatever security_sid_to_context() allocated and filled. Simple and sane...
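The difference between the two calling conventions described above can be sketched in userspace C. This is only an illustration, not kernel code: malloc() stands in for the kernel allocator, and sid_to_context(), getattr_buf(), context_two_pass(), context_one_pass(), alloc_count and the context string format are all hypothetical names made up for this sketch. It counts allocations to show that the buffer+len style does the work twice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts every allocation made by the helpers below (illustration only). */
int alloc_count;

/* Hypothetical stand-in for security_sid_to_context(): allocates and
 * fills a context string, returning it and its size including the NUL. */
char *sid_to_context(unsigned sid, size_t *len)
{
    char tmp[64];
    int n = snprintf(tmp, sizeof(tmp), "user_u:role_r:type_t:s%u", sid);
    assert(n > 0 && (size_t)n < sizeof(tmp));

    char *ctx = malloc((size_t)n + 1);
    if (!ctx)
        return NULL;
    alloc_count++;
    memcpy(ctx, tmp, (size_t)n + 1);
    *len = (size_t)n + 1;
    return ctx;
}

/* Buffer+len style accessor: with buf == NULL it only reports the size.
 * Either way it builds the string internally and then throws it away. */
long getattr_buf(unsigned sid, char *buf, size_t size)
{
    size_t len;
    char *ctx = sid_to_context(sid, &len);
    if (!ctx)
        return -1;
    if (buf) {
        if (size < len) {
            free(ctx);
            return -1;
        }
        memcpy(buf, ctx, len);
    }
    free(ctx);
    return (long)len;
}

/* Caller in the style described above: query size, allocate, call again. */
char *context_two_pass(unsigned sid)
{
    long len = getattr_buf(sid, NULL, 0);
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len);
    if (!buf)
        return NULL;
    alloc_count++;

    if (getattr_buf(sid, buf, (size_t)len) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Suggested alternative: hand back whatever the lower layer allocated. */
char *context_one_pass(unsigned sid)
{
    size_t len;
    return sid_to_context(sid, &len);
}
```

In this sketch context_two_pass() performs three allocations and builds the string twice, while context_one_pass() performs one allocation; the caller frees the returned string in both cases.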
The patch for this problem has been tested and shows no problems in the LSPP kernel. Submitted internally on 3/20. Moving to POST.
*** Bug 231690 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.
in 2.6.18-13.el5
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jon.thomas.com

------- Additional Comments From jon.thomas.com (prefers email at jrthomas.com) 2007-08-01 13:29 EDT -------
Joy,

Have you tested 2.6.18-13.el5 or rhel5.1beta1 to see if this is fixed? If so, please close this bug.
Yes, I recall testing this OK in the last LSPP kernel.
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ACCEPTED                    |CLOSED

------- Additional Comments From jon.thomas.com (prefers email at jrthomas.com) 2007-08-01 15:22 EDT -------
Closing based on the last comment.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0959.html