Bug 235475 - LSPP: Panic when running IPSEC labeled loopback on LSPP kernel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: James Morris
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: RHEL5LSPPCertTracker
Reported: 2007-04-06 00:06 UTC by Joe Nall
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 19:46:18 UTC
Target Upstream Version:
Embargoed:


Attachments
Test script (881 bytes, text/plain), 2007-04-06 00:11 UTC, Joe Nall
setrans.conf (24.55 KB, text/plain), 2007-04-06 06:01 UTC, Joe Nall
Better test script (5.29 KB, text/plain), 2007-04-06 15:17 UTC, Joe Nall
localhost/eth0 init.d ipsec configuration script (2.12 KB, application/octet-stream), 2007-04-10 01:29 UTC, Joe Nall
ipsec-tools loopback patch (1.66 KB, text/plain), 2007-04-10 03:31 UTC, Joe Nall
src rpm (682.04 KB, application/x-rpm), 2007-04-11 02:58 UTC, Joe Nall
/etc/racoon/racoon.conf (414 bytes, text/plain), 2007-04-11 03:47 UTC, Joe Nall
setkey -D (15.79 KB, text/plain), 2007-04-11 16:51 UTC, Joe Nall
Attached contents of my racoon.conf, psk.txt, and ipsec policy for labeled ipsec over loopback (1.76 KB, text/plain), 2007-04-11 19:50 UTC, Joy Latten
patch for xfrm_send_acquire() (1.25 KB, patch), 2007-04-12 19:57 UTC, Joy Latten
Patch for socket leak (551 bytes, patch), 2007-04-14 17:15 UTC, Joe Nall


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2007:0959 | 0 | normal | SHIPPED_LIVE | Updated kernel packages for Red Hat Enterprise Linux 5 Update 1 | 2007-11-08 00:47:37 UTC

Description Joe Nall 2007-04-06 00:06:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3

Description of problem:
Kernel panic when using LSPP kernel, patched ipsec-tools for labeled loopback

Version-Release number of selected component (if applicable):
kernel-2.6.18-8.1.1.lspp.72.el5

How reproducible:
Always


Steps to Reproduce:
install mls policy
install attached /etc/selinux/mls/setrans.conf
switch to mls/permissive
enable IPSec based labeled networking for localhost
install xinetd
enable discard service in xinetd
run multiple copies of attached testnet script


Actual Results:
kernel BUG at net/xfrm/xfrm_user.c:1781!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /class/usb_device/usbdev1.6/dev
Modules linked in: xfrm4_mode_transport esp4 autofs4 hidp rfcomm l2cap bluetooth deflate zlib_deflate twofish serpent aes blowfish des sha256 md5 crypto_null af_key sunrpc ipv6 dm_multipath video sbs i2c_ec i2c_core button battery asus_acpi ac lp sg snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm parport_pc parport i82875p_edac serio_raw snd_timer snd pcspkr edac_mc tg3 soundcore snd_page_alloc ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<c06084cc>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18-8.1.1.lspp.72.el5 #1)
EIP is at xfrm_send_acquire+0x1fd/0x22e
eax: f5424da0   ebx: f5424db0   ecx: 00000000   edx: f1c659cc
esi: f21c8e04   edi: f5424e10   ebp: f56f8c68   esp: f56f8c40
ds: 007b   es: 007b   ss: 0068
Process icm (pid: 4466, ti=f56f8000 task=f1cdeab0 task.ti=f56f8000)
Stack: f5424da0 f6ed903c f6ed9130 f21c8d84 f1c659cc 00000029 f5424da0 c06b5c60
       ffffffea f6ed903c f56f8c88 c06042ad 00000001 f6ed9130 f21c8d84 c06906e0
       f21c8d84 f7ea4b40 f56f8cf8 c0605994 00028f0b f56f8e48 f56f8e54 f56f8e50
Call Trace:
[<c04051ff>] show_trace_log_lvl+0x12/0x25
[<c040529c>] show_stack_log_lvl+0x8a/0x95
[<c04053d4>] show_registers+0x12d/0x19a
[<c04055d1>] die+0x190/0x293
[<c06143da>] do_trap+0x7c/0x96
[<c0405dfe>] do_invalid_op+0x89/0x93
[<c0404c09>] error_code+0x39/0x40
[<c06042ad>] km_query+0x35/0x64
[<c0605994>] xfrm_state_find+0x83c/0xb2b
[<c0602445>] xfrm_tmpl_resolve+0xd0/0x180
[<c0602698>] xfrm_lookup+0x14f/0x3fa
[<c05d0480>] ip_route_output_flow+0x4b/0x54
[<c05e9e5d>] tcp_v4_connect+0x195/0x686
[<c05f3a86>] inet_stream_connect+0x83/0x206
[<c05af839>] sys_connect+0x67/0x84
[<c05afeab>] sys_socketcall+0x8c/0x186
[<c0403fd7>] syscall_call+0x7/0xb
=======================
Code: 68 f1 8e 63 c0 68 ef c8 62 c0 e8 7a c7 e1 ff e8 58 d3 df ff 83 c4 14 eb 12 8b 55 e8 89 42 60 03 82 a0 00 00 00 89 82 a4 00 00 00 <0f> 0b f5 06 e5 55 66 c0 8b 45 e8 31 c9 c7 40 44 01 00 00 00 89
EIP: [<c06084cc>] xfrm_send_acquire+0x1fd/0x22e SS:ESP 0068:f56f8c40
<0>Kernel panic - not syncing: Fatal exception in interrupt


Expected Results:


Additional info:

Comment 1 Joe Nall 2007-04-06 00:11:46 UTC
Created attachment 151821 [details]
Test script

Comment 2 Joe Nall 2007-04-06 06:01:21 UTC
Created attachment 151825 [details]
setrans.conf

Comment 3 Joe Nall 2007-04-06 15:17:03 UTC
Created attachment 151880 [details]
Better test script

This test script adds two large context translations to the existing
setrans.conf using semanage and then uses netcat to panic the system by
connecting to the discard service at the new context.

Comment 4 George C. Wilson 2007-04-09 20:43:27 UTC
No target date yet for this one.

Comment 5 George C. Wilson 2007-04-09 20:45:35 UTC
This is a blocker issue for the HP evaluation.

Comment 6 James Morris 2007-04-10 00:10:01 UTC
Joe, where are the userland patches for loopback ipsec, and what is your ipsec
configuration ?

Comment 7 Eric Paris 2007-04-10 01:06:38 UTC
See BZ 218386 for the loopback patches to ipsec-tools. I think the patches are
against the upstream ipsec-tools, so it might be easiest for you to start by
trying to reproduce on a rawhide machine.

All of the networking changes in the LSPP kernel are in the latest upstream
kernels.  Obviously upstream has more changes than the LSPP kernels, but all the
LSPP networking changes are in.  (There are a number of audit changes in LSPP that
are not upstream, but I don't have any reason to think those are the problem ATM.
If you start to feel that way, I believe all the audit patches in the src
rpm will apply to an upstream kernel.)

In an offline discussion Joy seemed to indicate that she didn't think
ipsec-tools ever exercised this code path.  Since Joe seemed to indicate that he
was using a real mash-up of a system (packages from FC6, FC7 and RHEL5 on the
same machine), I was wondering if you were running some other ipsec userspace
program at the same time (i.e., is openswan doing something at the same time?).
Is that possible?

Comment 8 Joe Nall 2007-04-10 01:29:45 UTC
Created attachment 152078 [details]
localhost/eth0 init.d ipsec configuration script

Comment 9 James Morris 2007-04-10 01:50:12 UTC
Joy's patch does not apply to current cvs, and it's not entirely clear what the
correct code is supposed to look like.  I tried to get a copy of the patch from
the ipsec-tools-devel archive, but their archives are broken and it appears that
there was no inline version of the patch.

Please provide a patch which applies to current cvs.

Comment 10 Joe Nall 2007-04-10 03:31:22 UTC
Created attachment 152084 [details]
ipsec-tools loopback patch

This is the racoon patch I used. It applies cleanly against 0.6.5 and 0.6.6. It
differs significantly from the one that Joy submitted upstream. I replicated
the panic with this patch applied to ipsec-tools-0.6.6,
kernel-2.6.18-8.1.1.lspp.72.el5 and the supplied configuration script.

Comment 11 James Morris 2007-04-11 00:07:38 UTC
I'm unable to get racoon to negotiate an SA via loopback with this patch.  I've
verified it works with static keying.  What is in your key.conf file ?  I don't
have that on my system at all (FC6 rawhide).

All I see is a larval SA, then racoon eventually gives up with:

Apr 10 20:00:39 xeon racoon: ERROR: phase1 negotiation failed due to time up. 18e37efc2d266cb1:0000000000000000
Apr 10 20:00:48 xeon racoon: ERROR: phase2 negotiation failed due to time up waiting for phase1. ESP 127.0.0.1[500]->127.0.0.1[500]

This is with the ipsec-tools 0.6.5 srpm patched with your patch & rebuilt.

Can you supply an srpm with your patch integrated ?  Then we'll be closer to
testing the same thing.



Comment 12 Joe Nall 2007-04-11 02:52:30 UTC
Did you use the init script in comment #8?


Comment 13 Joe Nall 2007-04-11 02:58:41 UTC
Created attachment 152194 [details]
src rpm

Requested source rpm built off of fc7. Requires config from comment #8.

Comment 14 James Morris 2007-04-11 03:15:29 UTC
(In reply to comment #12)
> Did you use the init script in comment #8?
> 

It won't work on my development system as-is, but what I have is pretty close. 
I don't know what your key.conf file is, and there is not one on my system.

Comment 15 James Morris 2007-04-11 03:23:09 UTC
What does your racoon.conf look like?

Comment 16 Joe Nall 2007-04-11 03:47:01 UTC
Created attachment 152196 [details]
/etc/racoon/racoon.conf

Comment 17 Joe Nall 2007-04-11 05:41:12 UTC
Duplicated on lspp.73 X86_64

Comment 18 Joe Nall 2007-04-11 05:44:01 UTC
My key.conf is empty.

Comment 19 James Morris 2007-04-11 16:28:33 UTC
(In reply to comment #16)
> Created an attachment (id=152196) [edit]
> /etc/racoon/racoon.conf
> 

With this configuration, I don't see any negotiation happen:

Apr 11 12:24:32  xeon racoon: ERROR: no configuration found for 127.0.0.1. 
Apr 11 12:24:32  xeon racoon: ERROR: failed to begin ipsec sa negotication.

Can you post the output of /var/log/messages from when racoon starts, and also
tcpdump -i lo and setkey -D if/when the SA is established.



Comment 20 Joe Nall 2007-04-11 16:40:01 UTC
Did you do this from the setup script?

    echo 0 > /proc/sys/net/ipv4/conf/lo/disable_xfrm
    echo 0 > /proc/sys/net/ipv4/conf/lo/disable_policy
    /sbin/setkey -F -FP
    /sbin/setkey -c <<EoF
spdadd 127.0.0.1 127.0.0.1 any
-ctx 1 1 "system_u:object_r:ipsec_spd_t:s0-s15:c0.c1023"
-P out ipsec
esp/transport//require;

spdadd 127.0.0.1 127.0.0.1 any
-ctx 1 1 "system_u:object_r:ipsec_spd_t:s0-s15:c0.c1023"
-P in ipsec
esp/transport//require;

EoF


Comment 21 James Morris 2007-04-11 16:44:45 UTC
(In reply to comment #20)
> Did you do this from the setup script?
> 
>     echo 0 > /proc/sys/net/ipv4/conf/lo/disable_xfrm
>     echo 0 > /proc/sys/net/ipv4/conf/lo/disable_policy
>     /sbin/setkey -F -FP
>     /sbin/setkey -c <<EoF
> spdadd 127.0.0.1 127.0.0.1 any
> -ctx 1 1 "system_u:object_r:ipsec_spd_t:s0-s15:c0.c1023"
> -P out ipsec
> esp/transport//require;
> 
> spdadd 127.0.0.1 127.0.0.1 any
> -ctx 1 1 "system_u:object_r:ipsec_spd_t:s0-s15:c0.c1023"
> -P in ipsec
> esp/transport//require;
> 
> EoF
> 

Yes.



Comment 22 Joe Nall 2007-04-11 16:51:29 UTC
Created attachment 152289 [details]
setkey -D

Comment 23 Joe Nall 2007-04-11 16:55:54 UTC
What selinux-mls-policy version are you running?
Does it have the ipsec support patches?
I'm running 2.5.2 with Eamon's trusted X patches applied

Comment 24 Joy Latten 2007-04-11 19:19:47 UTC
Let me try with sgrubb's latest ipsec-tools in his repo and verify that all was
built correctly and is working. Give me a few minutes to do this and I will
respond here with my results. 

Also, I am currently running a labeled ipsec stress test for almost 24 hours...
my goal is for 48 hours with no problems. When I am done, I will then do a
stress test for loopback.

I did one last week for loopback and didn't get a panic... but will run another
one for 48 hours.

Also, I am using ppc64-bit on pSeries 520 model...

Comment 25 Joe Nall 2007-04-11 19:37:15 UTC
Joy - are you testing with a large context like the ones in comment #8?

Comment 26 Joe Nall 2007-04-11 19:38:19 UTC
Joy - are you testing with a large context like the ones in comment #3?

Comment 27 James Morris 2007-04-11 19:49:10 UTC
FYI, I now have loopback ipsec working with the latest lspp repo ipsec-tools,
the lspp kernel and fc rawhide, mls policy and only in permissive mode.

Will try the large context file & test scripts next.


Comment 28 Joy Latten 2007-04-11 19:50:12 UTC
Created attachment 152320 [details]
Attached contents of my racoon.conf, psk.txt, and ipsec policy for labeled ipsec over loopback

OK, I updated to latest stuff in sgrubb's repo.
I setup labeled ipsec over loopback and did a ping and 
racoon did setup the SAs. 

Please let me know if the attachment with my racoon.conf, etc... helps or not.

Comment 29 Joy Latten 2007-04-11 19:52:14 UTC
oh great, I see you have it working. Sorry for the delay.
Let me know how else I can help.

Comment 30 Joy Latten 2007-04-11 20:58:17 UTC
Joe, does the ACQUIRE that racoon receives have this large context string, which
he must negotiate?

Asking because right now racoon can only handle context strings up to 50
characters. This is an ugly limitation that I have not had time to fix in racoon. 

Have a meeting to go to, but if I have time afterwards, I will try this.
Otherwise, I will try it tomorrow.

Comment 31 Joe Nall 2007-04-11 21:29:59 UTC
I don't know. The original test created translations for "EVEN" and "ODD" that set all of the even categories
and odd categories respectively. "EVEN" and "ODD" are less than 50 characters; the translations are not.

runcon "really large context" -- ping localhost 

causes a panic, so the 50 character limit doesn't seem to be a factor.

We should open another bug on the 50 character limit.
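
To put a number on how large the expanded translation gets, here is a small sketch that rebuilds the raw context behind an "EVEN"-style translation. The `root:sysadm_r:sysadm_t:s2` prefix is borrowed from the setkey -D output in comment #49, counting one extra byte for a trailing NUL is an assumption, and the helper names are made up for illustration.

```python
# Sketch: expand an "EVEN"-style translation into its raw MLS context.
# PREFIX is borrowed from the setkey -D output in comment #49; the +1
# for a trailing NUL byte is an assumption about how the length is counted.

PREFIX = "root:sysadm_r:sysadm_t:s2"
RACOON_CTX_LIMIT = 50  # limit mentioned in comment #30


def even_categories(max_cat=1023):
    """Return 'c0,c2,...' covering every even category up to max_cat."""
    return ",".join(f"c{n}" for n in range(0, max_cat + 1, 2))


def raw_context(prefix=PREFIX):
    """Build the full context string that the short name stands for."""
    return f"{prefix}:{even_categories()}"


if __name__ == "__main__":
    ctx = raw_context()
    print("short name length:", len("EVEN"))
    print("raw context length (with NUL):", len(ctx) + 1)
    print("exceeds racoon limit:", len(ctx) > RACOON_CTX_LIMIT)
```

Under those assumptions the length works out to 2543, the same value that setkey -D reports in comment #49, so the short name is well under 50 characters while the context it stands for is far over.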

Comment 32 Joe Nall 2007-04-11 22:02:30 UTC
Maybe we don't need a new bug :(

After looking at the racoon code briefly, I don't see any checks for buffer overflow on the 50 character
ctx_str. A few science experiments:
 a 48 character context works,
 52 - 136 character contexts fail (but do not panic),
 a 140 character context causes a panic

Is it possible that the kernel is not validating the pfkey interface adequately?
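
For reference, a minimal sketch of the kind of bounds check that appears to be missing, written as a Python model rather than racoon's actual C. The 50-byte limit comes from comment #30; `CTX_STR_MAX`, `store_ctx_str`, and `ContextTooLong` are hypothetical names, not identifiers from the racoon source.

```python
# Model of a fixed-size context buffer with an explicit length check.
# CTX_STR_MAX mirrors the 50-character limit from comment #30; all names
# here are hypothetical and do not come from the racoon source.

CTX_STR_MAX = 50


class ContextTooLong(ValueError):
    """Raised instead of silently writing past the end of the buffer."""


def store_ctx_str(ctx: bytes) -> bytearray:
    """Copy ctx into a CTX_STR_MAX-byte buffer, rejecting oversized input."""
    if len(ctx) > CTX_STR_MAX:
        raise ContextTooLong(
            f"context is {len(ctx)} bytes, buffer holds {CTX_STR_MAX}"
        )
    buf = bytearray(CTX_STR_MAX)      # zero-filled, fixed capacity
    buf[: len(ctx)] = ctx             # bounded copy
    return buf


if __name__ == "__main__":
    for size in (48, 52, 140):        # sizes from the experiments above
        try:
            store_ctx_str(b"x" * size)
            print(size, "accepted")
        except ContextTooLong as err:
            print(size, "rejected:", err)
```

With a check like this, oversized contexts would be rejected at one well-defined point instead of failing or crashing depending on how far past the buffer they run.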
 

Comment 33 James Morris 2007-04-11 23:24:11 UTC
Interesting -- the large context is killing my racoon before it can do anything
(via the buffer overflow detector).

*** buffer overflow detected ***: racoon terminated
======= Backtrace: =========
/lib/libc.so.6(__chk_fail+0x41)[0x961501]
racoon[0x8069ec2]
racoon[0x8069028]
racoon[0x804c90c]
racoon[0x804bf81]
/lib/libc.so.6(__libc_start_main+0xe0)[0x892ef0]
racoon[0x804bbe1]


Comment 34 Joy Latten 2007-04-11 23:31:23 UTC
ok, I was able to reproduce on an lspp 73 kernel with latest stuff in sgrubb's
repo. I also don't think this has anything to do with loopback because I also
tried it with ipsec policy and racoon between two pseries boxes and got same
kernel BUG.

kernel BUG in xfrm_send_acquire at net/xfrm/xfrm_user.c:1781!
cpu 0x6: Vector: 700 (Program Check) at [c000000043ae32e0]
    pc: c00000000033b074: .xfrm_send_acquire+0x240/0x2c8
    lr: c00000000033b014: .xfrm_send_acquire+0x1e0/0x2c8
    sp: c000000043ae3560
   msr: 8000000000029032
  current = 0xc00000003c0b3510
  paca    = 0xc000000000465100
    pid   = 1985, comm = ping
kernel BUG in xfrm_send_acquire at net/xfrm/xfrm_user.c:1781!
enter ? for help
6:mon> t
[c000000043ae3650] c00000000033538c .km_query+0x6c/0xec
[c000000043ae36f0] c000000000337374 .xfrm_state_find+0x7f4/0xb88
[c000000043ae37f0] c000000000332350 .xfrm_tmpl_resolve+0xc4/0x21c
[c000000043ae38d0] c0000000003326e8 .xfrm_lookup+0x1a0/0x5b0
[c000000043ae3a00] c0000000002e6ea0 .ip_route_output_flow+0x88/0xb4
[c000000043ae3aa0] c0000000003106d8 .ip4_datagram_connect+0x218/0x374
[c000000043ae3bd0] c00000000031bc00 .inet_dgram_connect+0xac/0xd4
[c000000043ae3c60] c0000000002b11ac .sys_connect+0xd8/0x120
[c000000043ae3d90] c0000000002d38d0 .compat_sys_socketcall+0xdc/0x214
[c000000043ae3e30] c00000000000869c syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000000007f0ca9c
SP (ffadf910) is in userspace



Comment 35 James Morris 2007-04-11 23:36:40 UTC
What is verify_sec_ctx_len() in the kernel code supposed to be doing?  Looks
like it's just validating data against itself.

Comment 36 Joy Latten 2007-04-11 23:41:24 UTC
James, I just looked at racoon code, and there is a bug and bad programming on
my part. Really bad, because racoon should check the length of the security
context when he receives an ACQUIRE, since he knows he has a limitation on what
he can put in that buffer to begin with.

Also, the fact that you were able to get that far sounds great to me. Doesn't
that mean your ACQUIRES got through from the kernel?

One sec, i'll go look at verify_sec_ctx_len...

Comment 37 Joy Latten 2007-04-12 00:13:08 UTC
My guess is that since data follows this structure, additional steps are taken
to ensure the length is correct. I noticed the same is done for sadb_address
since structure is followed by data containing the ip address.  Not all the
sadb_xxx do this since not all are followed by additional data.

Comment 38 Joy Latten 2007-04-12 00:21:20 UTC
Opening a separate bug to fix the buffer overflow in racoon.

Comment 39 James Morris 2007-04-12 14:50:02 UTC
(In reply to comment #37)
> My guess is that since data follows this structure, additional steps are taken
> to ensure the length is correct. I noticed the same is done for sadb_address
> since structure is followed by data containing the ip address.  Not all the
> sadb_xxx do this since not all are followed by additional data.

The addresses are fixed length.  Contexts are variable length.  Where is the
variable length field being validated against the actual supplied data?

All it's doing at the moment is validating a field against itself.
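
To make the distinction concrete, here is a sketch of the two kinds of check. It is a Python model with a simplified, hypothetical 8-byte header (two length fields plus type, alg and doi), not the kernel's structure or the real verify_sec_ctx_len(); `self_consistent` and `matches_supplied_data` are made-up names.

```python
# Model of validating a variable-length extension: comparing two header
# fields with each other versus comparing them with the bytes received.
# The 8-byte header layout is a simplified, hypothetical one.
import struct

# ext_len (in 8-byte units), ext_type, alg, doi, ctx_len (in bytes)
HDR = struct.Struct("<HHBBH")


def units(nbytes: int) -> int:
    """Round a byte count up to 8-byte units."""
    return (nbytes + 7) // 8


def self_consistent(buf: bytes) -> bool:
    """Check only that the header's two length fields agree with each other."""
    ext_len, _type, _alg, _doi, ctx_len = HDR.unpack_from(buf)
    return ext_len == units(HDR.size + ctx_len)


def matches_supplied_data(buf: bytes) -> bool:
    """Also require that the claimed size does not exceed the bytes received."""
    if len(buf) < HDR.size:
        return False
    ext_len = HDR.unpack_from(buf)[0]
    return self_consistent(buf) and ext_len * 8 <= len(buf)


if __name__ == "__main__":
    # Header claims a 100-byte context but only 10 bytes follow it.
    lying = HDR.pack(units(HDR.size + 100), 1, 1, 1, 100) + b"x" * 10
    print("self-consistent:", self_consistent(lying))
    print("matches supplied data:", matches_supplied_data(lying))
```

A message whose two length fields agree with each other passes the first check even when far fewer bytes were actually supplied; only the second check ties the claim to the data.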

Comment 40 Joy Latten 2007-04-12 17:07:08 UTC
I am noticing something weird. In xfrm_send_acquire(), the xfrm_ctx->ctx_len is
ALWAYS 46, no matter what. I am using Joe Nall's method of:
runcon "root:sysadm_r:sysadm_t:s2:c0,c2,c4,c6,c8,c10,c12,c14,c16,c18,c20" -- ping localhost
AND
runcon "root:sysadm_r:sysadm_t:s2:c0,c2,c4,c6,c8,c10,c12,c14,c16,c18,c20,c22,c23,c24,c25,c26,c27,c28,c30" -- ping localhost

These clearly should have produced different context string lengths in 
xfrm_ctx->ctx_len

Looking at xfrm_send_acquire(), looks like we are allocating space in skb based
on security context of policy and not of flow or something. Looking further into
this... 
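
A quick arithmetic check on these lengths, as a sketch. The two runcon contexts are taken from this comment and the SPD context from the setkey policy in comment #20; treating the length as the string plus one trailing NUL byte is an assumption, and `ctx_len` here is just a helper name, not the kernel field.

```python
# Compare context lengths from this report. Counting a trailing NUL byte
# in the length is an assumption.

POLICY_CTX = "system_u:object_r:ipsec_spd_t:s0-s15:c0.c1023"   # comment #20
FLOW_CTX_A = "root:sysadm_r:sysadm_t:s2:c0,c2,c4,c6,c8,c10,c12,c14,c16,c18,c20"
FLOW_CTX_B = FLOW_CTX_A + ",c22,c23,c24,c25,c26,c27,c28,c30"


def ctx_len(ctx: str) -> int:
    """Length of a context string plus one byte for the assumed NUL."""
    return len(ctx.encode("ascii")) + 1


if __name__ == "__main__":
    for name, ctx in (
        ("policy", POLICY_CTX),
        ("flow A", FLOW_CTX_A),
        ("flow B", FLOW_CTX_B),
    ):
        print(f"{name}: {ctx_len(ctx)}")
```

Under that assumption the SPD context works out to 46, while the two runcon contexts come out longer and differ from each other. That would fit the constant value seen here only if the policy being tested carries the comment #20 context, which this report does not confirm.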

Comment 41 Joy Latten 2007-04-12 18:54:20 UTC
Ok, I noticed that in xfrm_send_acquire() we do 
len += RTA_SPACE(xfrm_user_sec_ctx_size(xp));

len is later used to allocate skb.

I think we should be doing:
len += RTA_SPACE(xfrm_user_sec_ctx_size(x));
                                        ^
basing our len off of the security context in xfrm_state
which was allocated based on the security context of the flow.

I noticed in af_key.c we use security context in xfrm_state when
sending ACQUIRE and I think this is correct after looking at xfrm_state.c code.

Will create a proposed patch and attach shortly. 




Comment 42 Joy Latten 2007-04-12 19:57:40 UTC
Created attachment 152503 [details]
patch for xfrm_send_acquire()

The security context sent in the ACQUIRE is taken from the xfrm_state.
xfrm_send_acquire() seems to be computing the size/length of the security
context using the one from the xfrm_policy AND sending the one in the
xfrm_state. When adding the security context into the skb, in the case of
security contexts larger than the one in the policy, this fails.
This patch computes size using xfrm_state's security context.

I tried this with S. Grubb's latest ipsec-tools package and Joe Nall's large
context (EVEN) test and it worked.

Please let me know if this patch is ok and acceptable.
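
The mismatch described in this comment can be modeled in a few lines. This is a Python sketch, not the kernel code or the attached patch: `FixedBuffer` stands in for the skb, and every name in it is hypothetical.

```python
# Model of the sizing mismatch: capacity is computed from one context,
# but a different (possibly longer) context is written into the buffer.
# FixedBuffer stands in for the skb; every name here is hypothetical.


class FixedBuffer:
    """A buffer that refuses to grow past its allocated capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = bytearray()

    def put(self, payload: bytes) -> None:
        if len(self.data) + len(payload) > self.capacity:
            # This comment reports the kernel failing at the analogous
            # step; the model raises an exception instead.
            raise OverflowError("payload does not fit in allocated buffer")
        self.data += payload


def build_acquire(policy_ctx: bytes, state_ctx: bytes,
                  size_from_state: bool) -> FixedBuffer:
    """Allocate using one context's size, then write the state's context."""
    sizing_ctx = state_ctx if size_from_state else policy_ctx
    buf = FixedBuffer(len(sizing_ctx))
    buf.put(state_ctx)     # the ACQUIRE always carries the state's context
    return buf


if __name__ == "__main__":
    policy, state = b"p" * 46, b"s" * 140
    print(len(build_acquire(policy, state, size_from_state=True).data))
    try:
        build_acquire(policy, state, size_from_state=False)
    except OverflowError as err:
        print("sized from policy:", err)
```

Sizing from the same context that gets written removes the dependency on the policy's context happening to be at least as long as the flow's.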

Comment 43 James Morris 2007-04-12 20:09:08 UTC
(In reply to comment #42)
> Created an attachment (id=152503) [edit]
> patch for xfrm_send_acquire()
> 
> The security context sent in the ACQUIRE is taken from the xfrm_state.
> xfrm_send_acquire() seems to be computing the size/length of the security
> context using the one from the xfrm_policy AND sending the one in the
> xfrm_state. When adding the security context into the skb, in the case of
> security contexts larger than the one in the policy, this fails.
> This patch computes size using xfrm_state's security context.
>
> I tried this with S. Grubb's latest ipsec-tools package and Joe Nall's large
> context (EVEN) test and it worked.
> 
> Please let me know if this patch is ok and acceptable.
> 

Looks ok to me.  Can you please verify this with a current upstream kernel?

(I'm hitting an oops with mls + current git at the moment).


Comment 44 Joy Latten 2007-04-12 20:26:02 UTC
ok, I am downloading an upstream kernel right now and trying.

Comment 45 Joy Latten 2007-04-12 21:43:52 UTC
Ok, tried patch on upstream kernel with Joe's large context and it appears to be
working ok. I will run a 30 minute stress test on upstream kernel with a smaller
context than Joe Nall's just for assurance. I will run it over loopback although
the bug happens on loopback and between two hosts. 

I am using version 2.6.21-rc6-git5 of the kernel on a ppc64.
I used the kernel config for lspp 73 kernel.

Comment 46 Joy Latten 2007-04-12 23:21:04 UTC
ok my short stress test sent udp and tcp packets and has run ok. Should I go
ahead and send this patch upstream?

Comment 47 James Morris 2007-04-12 23:31:24 UTC
Yes, please.

Comment 48 Steve Grubb 2007-04-13 01:44:33 UTC
lspp.75 kernel has been built with this patch applied. It could use a re-test
statement on this bz so we can close it. Thanks.

Comment 49 Joe Nall 2007-04-13 14:44:17 UTC
I'll test this more today, but initial results are promising

setkey -D
...
127.0.0.1 127.0.0.1 
        esp mode=transport spi=214056136(0x0cc23cc8) reqid=0(0x00000000)
        E: 3des-cbc  7e365029 9523908c ab5549b8 724fec49 89bf98c0 00cc28b3
        A: hmac-sha1  e53892ca 483db930 42521bf9 3f95dc3c 4c2917fd
        seq=0x00000000 replay=4 flags=0x00000000 state=mature 
        created: Apr 13 09:40:10 2007   current: Apr 13 09:40:24 2007
        diff: 14(s)     hard: 1800(s)   soft: 1440(s)
        last:                           hard: 0(s)      soft: 0(s)
        current: 0(bytes)       hard: 0(bytes)  soft: 0(bytes)
        allocated: 0    hard: 0 soft: 0
        security context doi: 1
        security context algorithm: 1
        security context length: 2543
        security context:
root:sysadm_r:sysadm_t:s2:c0,c2,c4,c6,c8,c10,c12,c14,c16,c18,c20,c22,c24,c26,c28,c30,c32,c34,c36,c38,c40,c42,c44,c46,c48,c50,c52,c54,c56,c5
8,c60,c62,c64,c66,c68,c70,c72,c74,c76,c78,c80,c82,c84,c86,c88,c90,c92,c94,c96,c98,c100,c102,c104,c106,c108,c110,c112,c114,c116,c118,c120,c122,c124,c126,c128,c130,c13
2,c134,c136,c138,c140,c142,c144,c146,c148,c150,c152,c154,c156,c158,c160,c162,c164,c166,c168,c170,c172,c174,c176,c178,c180,c182,c184,c186,c188,c190,c192,c194,c196,c19
8,c200,c202,c204,c206,c208,c210,c212,c214,c216,c218,c220,c222,c224,c226,c228,c230,c232,c234,c236,c238,c240,c242,c244,c246,c248,c250,c252,c254,c256,c258,c260,c262,c26
4,c266,c268,c270,c272,c274,c276,c278,c280,c282,c284,c286,c288,c290,c292,c294,c296,c298,c300,c302,c304,c306,c308,c310,c312,c314,c316,c318,c320,c322,c324,c326,c328,c33
0,c332,c334,c336,c338,c340,c342,c344,c346,c348,c350,c352,c354,c356,c358,c360,c362,c364,c366,c368,c370,c372,c374,c376,c378,c380,c382,c384,c386,c388,c390,c392,c394,c39
6,c398,c400,c402,c404,c406,c408,c410,c412,c414,c416,c418,c420,c422,c424,c426,c428,c430,c432,c434,c436,c438,c440,c442,c444,c446,c448,c450,c452,c454,c456,c458,c460,c46
2,c464,c466,c468,c470,c472,c474,c476,c478,c480,c482,c484,c486,c488,c490,c492,c494,c496,c498,c500,c502,c504,c506,c508,c510,c512,c514,c516,c518,c520,c522,c524,c526,c52
8,c530,c532,c534,c536,c538,c540,c542,c544,c546,c548,c550,c552,c554,c556,c558,c560,c562,c564,c566,c568,c570,c572,c574,c576,c578,c580,c582,c584,c586,c588,c590,c592,c59
4,c596,c598,c600,c602,c604,c606,c608,c610,c612,c614,c616,c618,c620,c622,c624,c626,c628,c630,c632,c634,c636,c638,c640,c642,c644,c646,c648,c650,c652,c654,c656,c658,c66
0,c662,c664,c666,c668,c670,c672,c674,c676,c678,c680,c682,c684,c686,c688,c690,c692,c694,c696,c698,c700,c702,c704,c706,c708,c710,c712,c714,c716,c718,c720,c722,c724,c72
6,c728,c730,c732,c734,c736,c738,c740,c742,c744,c746,c748,c750,c752,c754,c756,c758,c760,c762,c764,c766,c768,c770,c772,c774,c776,c778,c780,c782,c784,c786,c788,c790,c79
2,c794,c796,c798,c800,c802,c804,c806,c808,c810,c812,c814,c816,c818,c820,c822,c824,c826,c828,c830,c832,c834,c836,c838,c840,c842,c844,c846,c848,c850,c852,c854,c856,c85
8,c860,c862,c864,c866,c868,c870,c872,c874,c876,c878,c880,c882,c884,c886,c888,c890,c892,c894,c896,c898,c900,c902,c904,c906,c908,c910,c912,c914,c916,c918,c920,c922,c92
4,c926,c928,c930,c932,c934,c936,c938,c940,c942,c944,c946,c948,c950,c952,c954,c956,c958,c960,c962,c964,c966,c968,c970,c972,c974,c976,c978,c980,c982,c984,c986,c988,c99
0,c992,c994,c996,c998,c1000,c1002,c1004,c1006,c1008,c1010,c1012,c1014,c1016,c1018,c1020,c1022
        sadb_seq=43 pid=3970 refcnt=0


Comment 50 Joe Nall 2007-04-14 01:11:22 UTC
Now I'm experiencing a different problem. racoon is leaking sockets

[root@fc6work ~]# lsof | grep racoon | grep soc | wc -l
623
[root@fc6work ~]# lsof | grep racoon | grep soc | wc -l
629
...

until it runs out of resources and no longer can open psk.txt - after which the
SAs expire and no new ones occur.

Each socket looks like

racoon    1981      root  629u     sock        0,5                87089 can't identify protocol

in lsof.
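
For watching the count over time without lsof, a sketch that reads /proc directly (Linux only). This is an alternative to the lsof pipeline above, not something that was run for this report, and `count_socket_fds` is a hypothetical helper name.

```python
# Count a process's open socket descriptors by reading /proc (Linux only).
# This is an alternative to the lsof pipeline above, not what was run in
# this report; count_socket_fds is a hypothetical helper name.
import os


def count_socket_fds(pid: int) -> int:
    """Return how many entries in /proc/<pid>/fd point at sockets."""
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue          # descriptor was closed while we were scanning
        if target.startswith("socket:"):
            count += 1
    return count


if __name__ == "__main__":
    print(count_socket_fds(os.getpid()))
```

Reading another user's process this way needs sufficient privileges, e.g. running as root with racoon's pid as the argument.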





Comment 51 Steve Grubb 2007-04-14 13:06:42 UTC
Depending on what their state is, this might be OK. Sockets in a wait state are
just waiting for a time period to pass before being recycled. You should check
their status (6th column) with:

netstat -taunp | grep racoon

Let us know what you see when you have a lot of sockets you think are leaked.

Comment 52 Joe Nall 2007-04-14 17:15:41 UTC
Created attachment 152618 [details]
Patch for socket leak

avc_init apparently opens a new socket every time it is called. racoon was
calling it many times. Added a static to remember when avc_init has been
called. fd usage is down to 11.

This is not a great fix, avc_init should get called during racoon init, not
during SA negotiation.
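
The guard amounts to a run-once flag. A sketch of the idea as a Python model, not the attached C patch: `open_notification_socket` stands in for whatever avc_init opens, and every name here is hypothetical.

```python
# Model of the "remember that init already ran" guard described above.
# open_notification_socket stands in for whatever avc_init opens; every
# name in this sketch is hypothetical.

opened_sockets = []          # records each "socket" so leaks are visible
_initialized = False         # plays the role of the C static flag


def open_notification_socket() -> object:
    """Pretend to open a socket and record it."""
    sock = object()
    opened_sockets.append(sock)
    return sock


def init_once() -> None:
    """Run the expensive init only on the first call."""
    global _initialized
    if _initialized:
        return
    open_notification_socket()
    _initialized = True


def negotiate_sa() -> None:
    """Each negotiation still calls init, but the guard makes repeats no-ops."""
    init_once()


if __name__ == "__main__":
    for _ in range(100):
        negotiate_sa()
    print("sockets opened:", len(opened_sockets))
```

Moving the call into startup code would make the flag unnecessary, which is the better fix suggested above.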

Comment 53 Steve Grubb 2007-04-14 20:29:21 UTC
A patch based on the finding of comment #52 was attached to bz 235680. (That's
where the socket leak is being troubleshot.)

Joe, do you agree that the panic is solved at this point?

Comment 54 Joe Nall 2007-04-14 20:59:51 UTC
Yes

Comment 55 Steve Grubb 2007-04-14 21:10:09 UTC
Removing this bug from the LSPP trackers. Thanks for the feedback.

Comment 56 Joy Latten 2007-04-16 16:27:52 UTC
The kernel fix has been accepted into the upstream kernel.

Comment 58 RHEL Program Management 2007-04-30 17:41:56 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 59 Don Zickus 2007-05-01 18:09:54 UTC
in 2.6.18-17.el5

Comment 62 errata-xmlrpc 2007-11-07 19:46:18 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html


