Bug 204756

Summary: LSPP: kernel oops when doing an sftp with plain ipsec configured
Product: Red Hat Enterprise Linux 5
Reporter: Tim Burke <tburke>
Component: kernel
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5.0
CC: herbert.xu, iboverma, jmorris, sgrubb, tgraf, wtogami
Hardware: ppc64   
OS: Linux   
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2006-11-17 14:22:20 UTC
Bug Depends On: 203593    

Description Tim Burke 2006-08-31 13:44:16 UTC
+++ This bug was initially created as a clone of Bug #203593 +++

LTC Owner is: latten.com
LTC Originator is: latten.com


Problem description:

Installed Rawhide on August 18 onto two LPARs.

Configured IPSec as follows:

add x.x.x.55 x.x.x.206 esp 35590
-m transport
-E 3des-cbc "06183223c23a21e8b36c566b"
-A hmac-md5 "TAHITEST89ABCDEF";

add x.x.x.206 x.x.x.55 esp 12360
-m transport
-E 3des-cbc "06183223c23a21e8b36c566b"
-A hmac-md5 "TAHITEST89ABCDEF";

spdadd x.x.x.55 x.x.x.206 any -P in ipsec
        esp/transport//require;

spdadd x.x.x.206 x.x.x.55 any -P out ipsec
        esp/transport//require;

The config is the same on both machines except for the spdadd entries: the
"in" and "out" directions are swapped on the other machine.

Then I tried to do an "sftp xxx" from one machine to the other, and the
following oops occurred:

[root@joy-hv4 jml]# sftp hvracer1
Connecting to hvracer1...
kernel BUG in skb_to_sgvec at net/xfrm/xfrm_algo.c:611!
cpu 0x0: Vector: 700 (Program Check) at [c000000047967250]
    pc: c000000000369a38: .skb_to_sgvec+0x288/0x2ec
    lr: d000000000b205f0: .esp_output+0x350/0x4e4 [esp4]
    sp: c0000000479674d0
   msr: 8000000000029032
  current = 0xc000000002511270
  paca    = 0xc000000000494380
    pid   = 24005, comm = ssh
kernel BUG in skb_to_sgvec at net/xfrm/xfrm_algo.c:611!
enter ? for help
0:mon>t
[c0000000479675a0] d000000000b205f0 .esp_output+0x350/0x4e4 [esp4]
[c000000047967680] c000000000362358 .xfrm4_output_finish2+0x2d0/0x3ec
[c000000047967720] c000000000362628 .xfrm4_output+0x74/0x88
[c0000000479677a0] c000000000322af8 .ip_queue_xmit+0x4ac/0x544
[c0000000479678a0] c0000000003360f8 .tcp_transmit_skb+0x820/0x890
[c000000047967960] c0000000003392dc .tcp_connect+0x308/0x3b0
[c000000047967a00] c00000000033d95c .tcp_v4_connect+0x53c/0x6d4
[c000000047967b80] c00000000034c03c .inet_stream_connect+0x10c/0x358
[c000000047967c60] c0000000002e3ebc .sys_connect+0xd8/0x120
[c000000047967d90] c000000000305d2c .compat_sys_socketcall+0xdc/0x214
[c000000047967e30] c00000000000871c syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000000007aef8ec
SP (f969f230) is in userspace

uname -a
Linux joy-hv4.ltc.austin.ibm.com 2.6.17-1.2571.fc6 #1 SMP Wed Aug 16 16:31:40 EDT 2006 ppc64 ppc64 ppc64 GNU/Linux

Hardware Environment
    Machine type: pSeries
    CPU type: POWER5

-- Additional comment from gjohnson.com on 2006-08-23 11:56 EST --
----- Additional Comments From latten.com  2006-08-23 12:00 EDT -------
I updated to the latest Rawhide kernel yesterday, 2.6.17-1.2573.fc6,
and I am seeing the same problem when I configure IPSec.
Someone in the redhat-lspp community also tried this but did not get
the oops; I believe they were not using ppc64. They also
could not get ping to work.

I am going to download and install the latest vanilla kernel with the
mls-nethook patches and see what happens.

-- Additional comment from jmorris on 2006-08-24 10:45 EST --
What happens with selinux=0 at the kernel boot prompt?

-- Additional comment from gjohnson.com on 2006-08-24 19:36 EST --
----- Additional Comments From latten.com  2006-08-24 19:40 EDT -------
I compiled the latest kernel from kernel.org with CONFIG_NETLABEL=n in the
.config file. When doing a ping I got the same message, so it isn't
because of CIPSO. I forgot to try sftp.

I will try with selinux=0 next. 

-- Additional comment from gjohnson.com on 2006-08-25 00:11 EST --
----- Additional Comments From latten.com  2006-08-25 00:17 EDT -------
When I set selinux=0 at the boot prompt so that SELinux is disabled,
I still get the oops when I try to sftp with IPSec configured.

Also, with the same configuration, I am unable to ping. My ping seems to hang
and no packets leave my machine. From what I can tell,
the kernel does find the policy, but I am not sure what happens after that.
I suspect my packet is getting dropped; perhaps something fails during ESP
processing, but I am just guessing at this point.

From what I can understand, the IPSec behaviour is the same with or without
SELinux.

-- Additional comment from jmorris on 2006-08-25 12:26 EST --
I've tested this configuration with current rawhide between an x86 and x86_64
box with no problems.

The point where you're seeing the oops is:

        BUG_ON(len);

at the end of skb_to_sgvec().

This code and the immediate calling code appear to be identical to the
upstream code.

What kind of network card are you using, and can you provide the output of:
ethtool -k <device>

It'd be interesting to know if you see this problem with the current upstream
-mm kernel.



-- Additional comment from gjohnson.com on 2006-08-25 15:51 EST --
----- Additional Comments From latten.com  2006-08-25 15:47 EDT -------
On both lpars, I get following output:
[root@joy-hv4 ipv4]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off

Also, I have downloaded 2.6.17 from kernel.org and applied
patch-2.6.18-rc4 and 2.6.18-rc4-mm2. I used the kernel config from
Rawhide, config-2.6.17-1.2573.fc6, and turned off CONFIG_NETLABEL just to
verify that CIPSO was not the reason my pings were hanging.
Wherever this bug report refers to a vanilla kernel, I mean this
kernel.org kernel with the two patch sets above applied. I apologize for
not being more explicit.

I booted this kernel with selinux=0 at the boot prompt and am using it to
debug. So far, from what I can tell, my pings hang because the call to
x->type->output() in xfrm4_output_one() returns -22 (-EINVAL). I am using
only AH right now.

I tried this with IPSec configured for ESP and then for AH. Both times my
ping hung.

-- Additional comment from jmorris on 2006-08-25 16:00 EST --
Are you seeing the oops with the vanilla kernel?


-- Additional comment from gjohnson.com on 2006-08-25 17:11 EST --
----- Additional Comments From latten.com  2006-08-25 17:07 EDT -------
Yes, I see the oops with the vanilla kernel.
vanilla kernel = 2.6.17 + patch-2.6.18-rc4 + 2.6.18-rc4-mm2

If I should try a different set of patches or without -mm2,
let me know. 

-- Additional comment from jmorris on 2006-08-25 17:23 EST --
OK, as it's an upstream problem, it might also be worth posting a report to
netdev, so more people can look at it.

-- Additional comment from herbert.xu on 2006-08-31 01:41 EST --
This bug turns out to be caused by a broken memcpy/memmove implementation on
ppc64.  Paul Mackerras has a fix for it.

Comment 2 Herbert Xu 2006-09-04 00:35:30 UTC
Dave Jones has resynced the kernel so the fix is now in CVS.

Comment 4 RHEL Program Management 2006-09-07 19:00:37 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Don Zickus 2006-09-15 21:00:14 UTC
kernel-2.6.17-1.2654.el5