Bug 203593 - LSPP: kernel oops when doing an sftp with plain ipsec configured
Product: Fedora
Classification: Fedora
Component: kernel
Hardware: ppc64 Linux
Priority: medium Severity: medium
Assigned To: James Morris
Brian Brock
Depends On:
Blocks: 204756
Reported: 2006-08-22 13:10 EDT by IBM Bug Proxy
Modified: 2007-11-30 17:11 EST
Doc Type: Bug Fix
Last Closed: 2006-09-18 22:19:01 EDT
External Trackers
Tracker                        ID     Priority  Status  Summary  Last Updated
IBM Linux Technology Center    26517  None      None    None     Never

Description IBM Bug Proxy 2006-08-22 13:10:35 EDT
LTC Owner is: latten@us.ibm.com
LTC Originator is: latten@us.ibm.com

Problem description:

Installed rawhide on August 18 onto two lpars.

Configured IPSec as follows:

add x.x.x.55 x.x.x.206 esp 35590
-m transport
-E 3des-cbc "06183223c23a21e8b36c566b"
-A hmac-md5 "TAHITEST89ABCDEF";

add x.x.x.206 x.x.x.55 esp 12360
-m transport
-E 3des-cbc "06183223c23a21e8b36c566b"
-A hmac-md5 "TAHITEST89ABCDEF";

spdadd x.x.x.55 x.x.x.206 any -P in ipsec

spdadd x.x.x.206 x.x.x.55 any -P out ipsec

Same config on both machines, except for the spdadd entries: the "in" and "out"
directions are swapped on the other machine.

Then I tried to do an "sftp xxx" from one machine to the other, and the
following oops occurred:

[root@joy-hv4 jml]# sftp hvracer1
Connecting to hvracer1...
kernel BUG in skb_to_sgvec at net/xfrm/xfrm_algo.c:611!
cpu 0x0: Vector: 700 (Program Check) at [c000000047967250]
    pc: c000000000369a38: .skb_to_sgvec+0x288/0x2ec
    lr: d000000000b205f0: .esp_output+0x350/0x4e4 [esp4]
    sp: c0000000479674d0
   msr: 8000000000029032
  current = 0xc000000002511270
  paca    = 0xc000000000494380
    pid   = 24005, comm = ssh
kernel BUG in skb_to_sgvec at net/xfrm/xfrm_algo.c:611!
enter ? for help
[c0000000479675a0] d000000000b205f0 .esp_output+0x350/0x4e4 [esp4]
[c000000047967680] c000000000362358 .xfrm4_output_finish2+0x2d0/0x3ec
[c000000047967720] c000000000362628 .xfrm4_output+0x74/0x88
[c0000000479677a0] c000000000322af8 .ip_queue_xmit+0x4ac/0x544
[c0000000479678a0] c0000000003360f8 .tcp_transmit_skb+0x820/0x890
[c000000047967960] c0000000003392dc .tcp_connect+0x308/0x3b0
[c000000047967a00] c00000000033d95c .tcp_v4_connect+0x53c/0x6d4
[c000000047967b80] c00000000034c03c .inet_stream_connect+0x10c/0x358
[c000000047967c60] c0000000002e3ebc .sys_connect+0xd8/0x120
[c000000047967d90] c000000000305d2c .compat_sys_socketcall+0xdc/0x214
[c000000047967e30] c00000000000871c syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 0000000007aef8ec
SP (f969f230) is in userspace

uname -a
Linux joy-hv4.ltc.austin.ibm.com 2.6.17-1.2571.fc6 #1 SMP Wed Aug 16 16:31:40 EDT 2006 ppc64 ppc64 ppc64 GNU/Linux

Hardware Environment
    Machine type: pSeries
    Cpu type: power 5
Comment 1 IBM Bug Proxy 2006-08-23 11:56:33 EDT
----- Additional Comments From latten@us.ibm.com  2006-08-23 12:00 EDT -------
I updated to the latest rawhide yesterday, 2.6.17-1.2573.fc6.
I am seeing the same problem when I configure IPSec.
In the redhat-lspp community, someone else tried this but did not get
the oops. I believe they were not using ppc64. They also
could not get ping to work.

I am going to download and install the latest vanilla kernel with the mls-nethook patches
and see what happens.
Comment 2 James Morris 2006-08-24 10:45:36 EDT
What happens with selinux=0 at the kernel boot prompt?
Comment 3 IBM Bug Proxy 2006-08-24 19:36:31 EDT
----- Additional Comments From latten@us.ibm.com  2006-08-24 19:40 EDT -------
I compiled the latest kernel from kernel.org with CONFIG_NETLABEL=n in the .config
file. When doing a ping I got the same message. So it isn't
because of CIPSO. I forgot to try an sftp.

I will try with selinux=0 next. 
Comment 4 IBM Bug Proxy 2006-08-25 00:11:45 EDT
----- Additional Comments From latten@us.ibm.com  2006-08-25 00:17 EDT -------
When I set selinux=0 at the boot prompt so that SELinux is disabled,
I still get the oops when I try to sftp with IPSec configured.

Also with the same configuration, I am unable to ping. My ping seems to hang
and there aren't any packets leaving my machine. From what I can tell, 
the kernel does find the policy, but I am not sure what happens after that.
I suspect my packet is getting dropped... perhaps something fails during esp
processing... just guessing at this point. 

From what I can understand, the IPSec behaviour is the same with or without selinux. 
Comment 5 James Morris 2006-08-25 12:26:01 EDT
I've tested this configuration with current rawhide between an x86 and x86_64
box with no problems.

The point where you're seeing the oops is:


at the end of skb_to_sgvec().

This code and the immediate calling code appears to be identical to the upstream

What kind of network card are you using, and can you provide the output of:
ethtool -k <device>

It'd be interesting to know if you see this problem with the current upstream
-mm kernel.
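
The trace places the BUG at the very end of skb_to_sgvec(), which suggests a final sanity check on the length bookkeeping. The sketch below is a simplified userspace model of that kind of accounting, not the kernel code: `struct frag` and `count_sg_entries()` are hypothetical names, and the idea that the check fires when some requested bytes are left unmapped is an assumption for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a packet built from several data
 * fragments. This is NOT the kernel's struct sk_buff. */
struct frag {
    size_t size;   /* bytes held by this fragment */
};

/*
 * Walk the fragments and count how many scatter-gather entries are needed
 * to cover the byte range [offset, offset + len).
 *
 * Returns the entry count, or -1 if the fragments run out while len is
 * still non-zero (the situation a trailing BUG-style check would trap,
 * under the assumption stated above).
 */
int count_sg_entries(const struct frag *frags, size_t nfrags,
                     size_t offset, size_t len)
{
    int entries = 0;
    size_t start = 0;   /* position of the current fragment in the packet */

    for (size_t i = 0; i < nfrags && len > 0; i++) {
        size_t end = start + frags[i].size;

        if (offset < end) {
            size_t copy = end - offset;   /* bytes usable in this fragment */
            if (copy > len)
                copy = len;
            entries++;
            offset += copy;
            len -= copy;
        }
        start = end;
    }

    return len == 0 ? entries : -1;
}
```

With two fragments of 100 and 50 bytes, a request for 100 bytes at offset 20 needs two entries, while a request for 200 bytes at the same offset cannot be satisfied and returns -1. Corrupted length or offset values, whatever their cause, would show up as this kind of mismatch.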

Comment 6 IBM Bug Proxy 2006-08-25 15:51:36 EDT
----- Additional Comments From latten@us.ibm.com  2006-08-25 15:47 EDT -------
On both lpars, I get the following output:
[root@joy-hv4 ipv4]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off

Also, I have downloaded 2.6.17 from kernel.org and have applied
patch-2.6.18-rc4 and 2.6.18-rc4-mm2. I used the kernel config that was in 
rawhide, config-2.6.17-1.2573.fc6. I turned off CONFIG_NETLABEL, just to 
verify CIPSO was not the reason my pings were hanging.
In my text for this bug report where I refer to a vanilla kernel,
I am referring to my kernel.org kernel with the two mentioned patch sets 
applied. I apologize for not being more explicit. 

I used this kernel with selinux=0 at the boot prompt.
I am also using it to debug. So far, from what I can tell,
my pings are hanging because in xfrm4_output_one() the call
x->type->output() returns -22 (-EINVAL). I am using AH only right now.

I tried this with IPSec configured for ESP and then tried it
for AH. Both times my ping hung. 
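
For readers decoding the -22 mentioned above: kernel functions conventionally return negated errno values, and on Linux errno 22 is EINVAL ("Invalid argument"). A minimal userspace sketch for translating such a code follows; `describe_kernel_rc()` is a hypothetical helper name, not an existing API.

```c
#include <errno.h>
#include <string.h>

/*
 * Map a negative kernel-style return code (for example -22) to the
 * corresponding errno message text.
 * Returns NULL for non-negative values, which are not error codes.
 */
const char *describe_kernel_rc(int rc)
{
    if (rc >= 0)
        return NULL;
    return strerror(-rc);   /* strerror() expects the positive errno value */
}
```

Calling `describe_kernel_rc(-22)` on a Linux system with glibc yields the message for EINVAL. This only names the error; it does not say which argument check inside the output routine rejected the packet.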
Comment 7 James Morris 2006-08-25 16:00:33 EDT
Are you seeing the oops with the vanilla kernel?
Comment 8 IBM Bug Proxy 2006-08-25 17:11:15 EDT
----- Additional Comments From latten@us.ibm.com  2006-08-25 17:07 EDT -------
Yes, I see the oops with the vanilla kernel.
vanilla kernel = 2.6.17 + patch-2.6.18-rc4 + 2.6.18-rc4-mm2

If I should try a different set of patches or without -mm2,
let me know. 
Comment 9 James Morris 2006-08-25 17:23:45 EDT
Ok, as it's an upstream problem, it might be worth also posting a report to
netdev, as more people can look at it.
Comment 10 Herbert Xu 2006-08-31 01:41:38 EDT
This bug turns out to be caused by a broken memcpy/memmove implementation on
ppc64.  Paul Mackerras has a fix for it.
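
A quick way to sanity-check a memmove implementation is to compare it against a byte-at-a-time reference over many overlapping source and destination offsets. The sketch below does this in userspace; it exercises the C library's memmove() rather than the kernel's ppc64 routine, and the thread does not say which case was broken, so the offsets and lengths chosen here are arbitrary assumptions. `ref_move()` and `check_memmove_overlap()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define BUF_LEN 256

/* Reference overlapping copy, one byte at a time. Copies forward when the
 * destination is below the source, backward otherwise. */
static void ref_move(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (dst < src) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        for (size_t i = n; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
}

/* Returns 0 if memmove() matches the reference for every tested
 * combination of offsets and length, otherwise -1. */
int check_memmove_overlap(void)
{
    unsigned char a[BUF_LEN], b[BUF_LEN];

    for (size_t src = 0; src < 32; src++) {
        for (size_t dst = 0; dst < 32; dst++) {
            for (size_t n = 0; n <= 96; n++) {
                /* Refill both buffers with the same known pattern. */
                for (size_t i = 0; i < BUF_LEN; i++)
                    a[i] = b[i] = (unsigned char)(i * 7 + 1);

                memmove(a + dst, a + src, n);
                ref_move(b + dst, b + src, n);

                if (memcmp(a, b, BUF_LEN) != 0)
                    return -1;
            }
        }
    }
    return 0;
}
```

Comparing the whole buffer, not just the copied range, also catches writes that land outside the destination. A kernel-side check would need the same idea built as a module or boot-time self-test.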
Comment 11 IBM Bug Proxy 2006-08-31 11:56:08 EDT
----- Additional Comments From latten@us.ibm.com  2006-08-31 11:53 EDT -------
Paul Mackerras' patch fixes the problem. 
Comment 12 IBM Bug Proxy 2006-08-31 11:56:33 EDT
----- Additional Comments From latten@us.ibm.com  2006-08-31 11:53 EDT -------
Paul's patch was posted to netdev. 
Comment 13 IBM Bug Proxy 2006-09-29 15:15:54 EDT
----- Additional Comments From latten@us.ibm.com  2006-09-29 15:11 EDT -------
This bug is fixed in rawhide. However, it is not fixed in rhel5 beta1 update.
I was able to reproduce it on rhel5 beta1 update on a pseries. 
