Bug 204756 - LSPP: kernel oops when doing an sftp with plain ipsec configured
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: ppc64 Linux
Priority: medium Severity: medium
Assigned To: Herbert Xu
QA Contact: Brian Brock
Depends On: 203593
Reported: 2006-08-31 09:44 EDT by Tim Burke
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2006-11-17 09:22:20 EST


Description Tim Burke 2006-08-31 09:44:16 EDT
+++ This bug was initially created as a clone of Bug #203593 +++

LTC Owner is: latten@us.ibm.com
LTC Originator is: latten@us.ibm.com


Problem description:

Installed rawhide on August 18 onto two lpars.

Configured IPSec as follows:

add x.x.x.55 x.x.x.206 esp 35590
-m transport
-E 3des-cbc "06183223c23a21e8b36c566b"
-A hmac-md5 "TAHITEST89ABCDEF";

add x.x.x.206 x.x.x.55 esp 12360
-m transport
-E 3des-cbc "06183223c23a21e8b36c566b"
-A hmac-md5 "TAHITEST89ABCDEF";

spdadd x.x.x.55 x.x.x.206 any -P in ipsec
        esp/transport//require;

spdadd x.x.x.206 x.x.x.55 any -P out ipsec
        esp/transport//require;

Same config on both machines, except for the spdadd entries. The "in" and "out"
are swapped on the other machine.

Then I tried to do an "sftp xxx" from one machine to the other, and the
following oops occurred:

[root@joy-hv4 jml]# sftp hvracer1
Connecting to hvracer1...
kernel BUG in skb_to_sgvec at net/xfrm/xfrm_algo.c:611!
cpu 0x0: Vector: 700 (Program Check) at [c000000047967250]
    pc: c000000000369a38: .skb_to_sgvec+0x288/0x2ec
    lr: d000000000b205f0: .esp_output+0x350/0x4e4 [esp4]
    sp: c0000000479674d0
   msr: 8000000000029032
  current = 0xc000000002511270
  paca    = 0xc000000000494380
    pid   = 24005, comm = ssh
kernel BUG in skb_to_sgvec at net/xfrm/xfrm_algo.c:611!
enter ? for help
0:mon>t
[c0000000479675a0] d000000000b205f0 .esp_output+0x350/0x4e4 [esp4]
[c000000047967680] c000000000362358 .xfrm4_output_finish2+0x2d0/0x3ec
[c000000047967720] c000000000362628 .xfrm4_output+0x74/0x88
[c0000000479677a0] c000000000322af8 .ip_queue_xmit+0x4ac/0x544
[c0000000479678a0] c0000000003360f8 .tcp_transmit_skb+0x820/0x890
[c000000047967960] c0000000003392dc .tcp_connect+0x308/0x3b0
[c000000047967a00] c00000000033d95c .tcp_v4_connect+0x53c/0x6d4
[c000000047967b80] c00000000034c03c .inet_stream_connect+0x10c/0x358
[c000000047967c60] c0000000002e3ebc .sys_connect+0xd8/0x120
[c000000047967d90] c000000000305d2c .compat_sys_socketcall+0xdc/0x214
[c000000047967e30] c00000000000871c syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000000007aef8ec
SP (f969f230) is in userspace

uname -a
Linux joy-hv4.ltc.austin.ibm.com 2.6.17-1.2571.fc6 #1 SMP Wed Aug 16 16:31:40
EDT 2006 ppc64 ppc64 ppc64 GNU/Linux

Hardware Environment
    Machine type: pSeries
    Cpu type: power 5

-- Additional comment from gjohnson@austin.ibm.com on 2006-08-23 11:56 EST --
----- Additional Comments From latten@us.ibm.com  2006-08-23 12:00 EDT -------
I updated to latest rawhide yesterday, 2.6.17-1.2573.fc6
I am seeing the same problem when I configure IPSec.
In redhat-lspp community, someone tried this also, but did not get
the oops. I believe they were not using ppc64. They also
could not get ping to work.

I am going to download and install the latest vanilla kernel with mls-nethook patches
and see what happens. 

-- Additional comment from jmorris@redhat.com on 2006-08-24 10:45 EST --
What happens with selinux=0 at the kernel boot prompt?

-- Additional comment from gjohnson@austin.ibm.com on 2006-08-24 19:36 EST --
----- Additional Comments From latten@us.ibm.com  2006-08-24 19:40 EDT -------
I compiled the latest kernel from kernel.org with CONFIG_NETLABEL=n in the .config
file. When doing a ping I got the same message. So it isn't
because of cipso. Forgot to do an sftp.

I will try with selinux=0 next. 

-- Additional comment from gjohnson@austin.ibm.com on 2006-08-25 00:11 EST --
----- Additional Comments From latten@us.ibm.com  2006-08-25 00:17 EDT -------
When I set selinux=0 on boot prompt such that selinux becomes disabled,
I still get the oops when I try to sftp with IPSec configured.

Also with the same configuration, I am unable to ping. My ping seems to hang
and there aren't any packets leaving my machine. From what I can tell, 
the kernel does find the policy, but I am not sure what happens after that.
I suspect my packet is getting dropped... perhaps something fails during esp
processing... just guessing at this point. 

From what I can understand, the IPSec behaviour is the same with or without
selinux. 

-- Additional comment from jmorris@redhat.com on 2006-08-25 12:26 EST --
I've tested this configuration with current rawhide between an x86 and x86_64
box with no problems.

The point where you're seeing the oops is:

        BUG_ON(len);

at the end of skb_to_sgvec().

This code and the immediate calling code appears to be identical to the upstream
code.

What kind of network card are you using, and can you provide the output of:
ethtool -k <device>

It'd be interesting to know if you see this problem with the current upstream
-mm kernel.
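
For readers unfamiliar with the check quoted above, the following is a simplified user-space sketch of what a trailing BUG_ON(len) guards against. It assumes the function consumes the requested length as it walks the linear area and then the page fragments, so a non-zero remainder means the caller asked for more bytes than the buffer holds. The struct toy_skb type and toy_map_remainder() are hypothetical names for illustration only; this is not the kernel's skb_to_sgvec() and it ignores details such as fragment lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff: a linear head plus page fragments. */
struct toy_skb {
    size_t headlen;      /* bytes in the linear area */
    size_t frag_len[4];  /* bytes in each page fragment */
    int nr_frags;        /* number of valid entries in frag_len */
};

/*
 * Walk `len` bytes starting at `offset` and count the segments that would
 * be emitted. Returns the number of bytes that could NOT be mapped. In this
 * model, a non-zero return corresponds to the BUG_ON(len) condition.
 */
static size_t toy_map_remainder(const struct toy_skb *skb, size_t offset,
                                size_t len, int *nsegs)
{
    size_t start;
    size_t end = skb->headlen;

    *nsegs = 0;

    /* Linear area first. */
    if (offset < end && len > 0) {
        size_t copy = end - offset;
        if (copy > len)
            copy = len;
        (*nsegs)++;
        len -= copy;
        offset += copy;
    }
    start = end;

    /* Then each page fragment in order. */
    for (int i = 0; i < skb->nr_frags && len > 0; i++) {
        end = start + skb->frag_len[i];
        if (offset < end) {
            size_t copy = end - offset;
            if (copy > len)
                copy = len;
            (*nsegs)++;
            len -= copy;
            offset += copy;
        }
        start = end;
    }

    return len; /* the kernel code asserts this is zero */
}
```

With a 100-byte head and one 50-byte fragment, a request for 150 bytes from offset 0 leaves a remainder of 0, while a request for 200 bytes leaves 50. Any corruption of the length or of the copied headers could produce a mismatch of this kind.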



-- Additional comment from gjohnson@austin.ibm.com on 2006-08-25 15:51 EST --
----- Additional Comments From latten@us.ibm.com  2006-08-25 15:47 EDT -------
On both lpars, I get following output:
[root@joy-hv4 ipv4]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off

Also, I have downloaded 2.6.17 from kernel.org and have applied
patch-2.6.18-rc4 and 2.6.18-rc4-mm2. I used the kernel config that was in 
rawhide, config-2.6.17-1.2573.fc6. I turned off CONFIG_NETLABEL, just to 
verify CIPSO was not the reason my pings were hanging.
In my text for this bugreport where I refer to a vanilla kernel,
I am referring to my kernel.org kernel with the two mentioned patch sets 
applied. I apologize for not being more explicit. 

I used this kernel with selinux=0 at boot prompt.
Also, I am using it to debug. So far from what I can tell,
my pings are hanging because in xfrm4_output_one(), the call,
x->type->output() returns a -22. I am using AH only right now.

I tried this with IPSec configured for ESP and then tried it
for AH. Both times my ping hung. 
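
As a reading aid for the return value mentioned above: kernel functions conventionally return negated errno values, and on Linux errno 22 is EINVAL ("Invalid argument"). The helper below is a small user-space sketch of that mapping; describe_kernel_rc() is a hypothetical name, and the report does not say which check inside the output path produced the error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Translate a kernel-style return code (0 or a negated errno) into
 * a human-readable string using the C library's strerror().
 */
static const char *describe_kernel_rc(int rc)
{
    return rc < 0 ? strerror(-rc) : "success";
}
```

Calling describe_kernel_rc(-22) on a Linux system yields the C library's text for EINVAL.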

-- Additional comment from jmorris@redhat.com on 2006-08-25 16:00 EST --
Are you seeing the oops with the vanilla kernel?


-- Additional comment from gjohnson@austin.ibm.com on 2006-08-25 17:11 EST --
----- Additional Comments From latten@us.ibm.com  2006-08-25 17:07 EDT -------
Yes, I see the oops with the vanilla kernel.
vanilla kernel = 2.6.17 + patch-2.6.18-rc4 + 2.6.18-rc4-mm2

If I should try a different set of patches or without -mm2,
let me know. 

-- Additional comment from jmorris@redhat.com on 2006-08-25 17:23 EST --
Ok, as it's an upstream problem, it might be worth also posting a report to
netdev, as more people can look at it.

-- Additional comment from herbert.xu@redhat.com on 2006-08-31 01:41 EST --
This bug turns out to be caused by a broken memcpy/memmove implementation on
ppc64.  Paul Mackerras has a fix for it.
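
The comment above does not say which memcpy/memmove case was broken. As a general illustration of how such a defect can be detected, here is a minimal user-space sketch that compares the C library's memmove() against a byte-wise reference over many overlapping source/destination offsets and lengths. ref_memmove() and check_memmove_overlap() are hypothetical names; this exercises the libc routine of whatever machine it runs on, not the ppc64 kernel implementation discussed in this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-wise move that handles overlap by choosing the copy direction. */
static void ref_memmove(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (dst < src) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        for (size_t i = n; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
}

/*
 * Returns 0 if memmove() matches the reference for every tested
 * combination of source offset, destination offset and length,
 * or -1 on the first mismatch.
 */
static int check_memmove_overlap(void)
{
    enum { BUF = 256, MAX_OFF = 32 };
    unsigned char a[BUF], b[BUF];

    for (size_t src = 0; src < MAX_OFF; src++) {
        for (size_t dst = 0; dst < MAX_OFF; dst++) {
            /* n + MAX_OFF <= BUF keeps every access inside the buffers. */
            for (size_t n = 0; n + MAX_OFF <= BUF; n += 7) {
                for (size_t i = 0; i < BUF; i++)
                    a[i] = b[i] = (unsigned char)(i * 31 + 7);

                memmove(a + dst, a + src, n);
                ref_memmove(b + dst, b + src, n);

                if (memcmp(a, b, BUF) != 0)
                    return -1;
            }
        }
    }
    return 0;
}
```

A return value of -1 would indicate that some overlapping copy produced different bytes from the reference, the kind of silent corruption that can later surface as an inconsistent length in unrelated code.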
Comment 2 Herbert Xu 2006-09-03 20:35:30 EDT
Dave Jones has resynced the kernel so the fix is now in CVS.
Comment 4 RHEL Product and Program Management 2006-09-07 15:00:37 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 6 Don Zickus 2006-09-15 17:00:14 EDT
kernel-2.6.17-1.2654.el5
