Bug 759073

Summary: ipsec ipv6 tunnels won't start after reboot
Product: Red Hat Enterprise Linux 5
Component: openswan
Version: 5.8
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Target Milestone: rc
Reporter: Adam Okuliar <aokuliar>
Assignee: Paul Wouters <pwouters>
QA Contact: Aleš Mareček <amarecek>
CC: amarecek, aokuliar, eparis, ksrot, ovasik, pwouters, tis
Flags: aokuliar: needinfo+
Doc Type: Bug Fix
Last Closed: 2014-04-23 19:26:13 UTC
Bug Blocks: 1049888
Attachments:
  log file (flags: none)
  other end log file (flags: none)

Comment 1 Paul Wouters 2013-03-20 02:12:51 UTC
I need a little bit more information to investigate this.

Did this happen once? Does it happen all the time?
Does the configuration for the tunnel have auto=start?

If possible, show the ipsec.conf (and its include files). A log made by setting plutodebug=all in /etc/ipsec.conf and rebooting would also help.

Comment 2 RHEL Program Management 2013-05-01 06:52:07 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 5 RHEL Program Management 2014-01-22 16:34:25 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 10 Paul Wouters 2014-02-11 02:06:31 UTC
I've tested this, and I have sometimes been able to reproduce it.

There seems to be a race condition somewhere. tcpdump shows that the IKE packets for the IPv6 tunnel never leave the host. However, once I am stuck in this state, I also cannot just issue an ipsec auto --add v6 or ipsec auto --up v6.

The problem seems related to:

| *received kernel message
| netlink_get: XFRM_MSG_ACQUIRE message
| xfrm netlink msg len 364
| xfrm:nlmsghdr= 16
| xfrm:acquire= 280
| xfrm:rtattr= 4
| rtattr len= 68
| xfrm: found XFRMA_TMPL
| xfrm: did not found XFRMA_SEC_CTX, trying next one
| xfrm: rta->len=68
| xfrm: remaining=0 , rta->len = 28274
| xfrm: not found anything, seems wierd
| xfrm: not found sec ctx still, perhaps not a labeled ipsec connection
| add bare shunt 0x2b501e9c7100 2001:db8:1:2::23/128:136 --58--> 2001:db8:1:2::45/128:0 => %hold 0    %acquire-netlink
| received security label string: 
initiate on demand from 2001:db8:1:2::23:136 to 2001:db8:1:2::45:0 proto=58 state: fos_start because: acquire
| find_connection: looking for policy for connection: 2001:db8:1:2::23:58/136 -> 2001:db8:1:2::45:58/0
| find_connection: conn "v6" has compatible peers: 2001:db8:1:2::23/128 -> 2001:db8:1:2::45/128 [pri: 67371018]
| find_connection: comparing best "v6" [pri:67371018]{0x2b501e9a9b00} (child none) to "v6" [pri:67371018]{0x2b501e9a9b00} (child none)
| find_connection: concluding with "v6" [pri:67371018]{0x2b501e9a9b00} kind=CK_PERMANENT
| assign hold, routing was erouted HOLD, needs to be erouted HOLD
| delete bare shunt: null pointer
| Queuing pending Quick Mode with 2001:db8:1:2::45 "v6"
| delete bare shunt 0x2b501e9c7100 2001:db8:1:2::23/128:136 --58--> 2001:db8:1:2::45/128:0 => %hold 0    %acquire-netlink
| * processed 0 messages from cryptographic helpers 
| next event EVENT_RETRANSMIT in 29 seconds for #2
| next event EVENT_RETRANSMIT in 29 seconds for #2
| 
| next event EVENT_RETRANSMIT in 0 seconds for #2
| *time to handle event
| handling event EVENT_RETRANSMIT
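
The excerpt above shows a kernel acquire turning into an "initiate on demand" line that is followed only by retransmit events. A rough way to see how often that happens in a full pluto log is to count those lines per endpoint pair. The following is a minimal Python sketch, not part of openswan; the function name and the regular expression are illustrative and are based only on the line format visible in this report.

```python
import re
from collections import Counter

# Matches the on-demand line as it appears in the log excerpts in this report.
# The endpoint strings are kept exactly as pluto printed them.
ACQUIRE_RE = re.compile(
    r"initiate on demand from (?P<src>\S+) to (?P<dst>\S+) proto=(?P<proto>\d+)"
)


def count_acquires(log_text):
    """Count 'initiate on demand' lines per (src, dst, proto) tuple."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ACQUIRE_RE.search(line)
        if match:
            key = (match["src"], match["dst"], int(match["proto"]))
            counts[key] += 1
    return counts


# Sample input copied from log lines quoted in this bug.
sample_log = """\
initiate on demand from 2001:db8:1:2::23:136 to 2001:db8:1:2::45:0 proto=58 state: fos_start because: acquire
"v4" #3: the peer proposed: 192.1.2.23/32:0/0 -> 192.1.2.45/32:0/0
initiate on demand from 2001:db8:1:2::23:136 to 2001:db8:1:2::45:0 proto=58 state: fos_start because: acquire
"""

for key, seen in count_acquires(sample_log).items():
    print(seen, key)
```

A pair that keeps accumulating acquires without any matching state transitions for that conn would be consistent with the stuck behaviour described here, although the count alone does not prove it.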

Comment 11 Paul Wouters 2014-02-11 02:09:26 UTC
Running service ipsec restart on both ends seems to always fix it.

Comment 12 Paul Wouters 2014-02-11 02:19:20 UTC
I just noticed these are not plain v4 and v6 configurations. As the log shows, there are two IPv4 connections:

packet from 192.1.2.45:500: received Vendor ID payload [Openswan (this version) 2.6.32 ]
packet from 192.1.2.45:500: received Vendor ID payload [Dead Peer Detection]
"v4" #3: responding to Main Mode
"v4" #3: transition from state STATE_MAIN_R0 to state STATE_MAIN_R1
"v4" #3: STATE_MAIN_R1: sent MR1, expecting MI2
"v4" #3: transition from state STATE_MAIN_R1 to state STATE_MAIN_R2
"v4" #3: STATE_MAIN_R2: sent MR2, expecting MI3
"v4" #3: Main mode peer ID is ID_IPV4_ADDR: '192.1.2.45'
"v4" #3: transition from state STATE_MAIN_R2 to state STATE_MAIN_R3
"v4" #3: STATE_MAIN_R3: sent MR3, ISAKMP SA established {auth=OAKLEY_PRESHARED_KEY cipher=oakley_3des_cbc_192 prf=oakley_sha group=modp1536}
initiate on demand from 2001:db8:1:2::23:136 to 2001:db8:1:2::45:0 proto=58 state: fos_start because: acquire
"v4" #3: the peer proposed: 192.1.2.23/32:0/0 -> 192.1.2.45/32:0/0
"v4" #4: responding to Quick Mode proposal {msgid:f0ca0ccf}
"v4" #4:     us: 192.1.2.23<192.1.2.23>[+S=C]
"v4" #4:   them: 192.1.2.45<192.1.2.45>[+S=C]
"v4" #4: transition from state STATE_QUICK_R0 to state STATE_QUICK_R1
"v4" #4: STATE_QUICK_R1: sent QR1, inbound IPsec SA installed, expecting QI2
"v4" #4: transition from state STATE_QUICK_R1 to state STATE_QUICK_R2
"v4" #4: STATE_QUICK_R2: IPsec SA established transport mode {ESP=>0x15614de5 <0xf10d87cc xfrm=3DES_0-HMAC_SHA1 NATOA=none NATD=none DPD=none}
"v4" #1: received Vendor ID payload [Openswan (this version) 2.6.32 ]
"v4" #1: received Vendor ID payload [Dead Peer Detection]
"v4" #1: transition from state STATE_MAIN_I1 to state STATE_MAIN_I2
"v4" #1: STATE_MAIN_I2: sent MI2, expecting MR2
"v4" #1: transition from state STATE_MAIN_I2 to state STATE_MAIN_I3
"v4" #1: STATE_MAIN_I3: sent MI3, expecting MR3
"v4" #1: received Vendor ID payload [CAN-IKEv2]
"v4" #1: Main mode peer ID is ID_IPV4_ADDR: '192.1.2.45'
"v4" #1: transition from state STATE_MAIN_I3 to state STATE_MAIN_I4
"v4" #1: STATE_MAIN_I4: ISAKMP SA established {auth=OAKLEY_PRESHARED_KEY cipher=oakley_3des_cbc_192 prf=oakley_sha group=modp1536}
"v4" #5: initiating Quick Mode PSK+ENCRYPT+UP+IKEv2ALLOW+SAREFTRACK {using isakmp#1 msgid:364dc682 proposal=3DES(3)_192-SHA1(2)_160 pfsgroup=no-pfs}
"v4" #5: transition from state STATE_QUICK_I1 to state STATE_QUICK_I2
"v4" #5: STATE_QUICK_I2: sent QI2, IPsec SA established transport mode {ESP=>0x6fc562e2 <0xf70e8d50 xfrm=3DES_0-HMAC_SHA1 NATOA=none NATD=none DPD=none}
initiate on demand from 2001:db8:1:2::23:136 to 2001:db8:1:2::45:0 proto=58 state: fos_start because: acquire
initiate on demand from 2001:db8:1:2::23:136 to 2001:db8:1:2::45:0 proto=58 state: fos_start because: acquire

The config is really doing a v4 tunnel. For the v6 conn, left/right are IPv6 addresses while connaddrfamily= defaults to ipv4, so it is a conn we should never load.

Additionally, confusion arises because we cannot turn this into a 6in4 tunnel either, since type=transport.

Even more confusingly, adding connaddrfamily=ipv6 to the second conn does not seem to address the issue either.

I am attaching the logs from both ends to this bug for further analysis tomorrow.
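
The address-family mismatch described above can be checked outside of pluto. The following is a minimal Python sketch, assuming a simplified ipsec.conf syntax (section headers at column 0, indented key=value lines, # comments, no include handling). The helper names parse_conns and family_mismatches are hypothetical, and this is not how openswan itself validates a conn. The ipv4 default for connaddrfamily is taken from the comment above.

```python
import ipaddress


def parse_conns(conf_text):
    """Simplified ipsec.conf reader: returns {conn_name: {key: value}}."""
    conns = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # naive comment stripping
        if not line.strip():
            continue
        if not line[0].isspace():
            # Section header such as "conn v6" or "config setup".
            parts = line.split()
            if len(parts) == 2 and parts[0] == "conn":
                current = conns.setdefault(parts[1], {})
            else:
                current = None
            continue
        if current is not None and "=" in line:
            key, value = line.strip().split("=", 1)
            current[key.strip()] = value.strip()
    return conns


def family_mismatches(conf_text):
    """List (conn, side, address, declared_family) where the two disagree."""
    problems = []
    for name, opts in parse_conns(conf_text).items():
        declared = opts.get("connaddrfamily", "ipv4")  # default per this report
        for side in ("left", "right"):
            value = opts.get(side)
            if value is None:
                continue
            try:
                version = ipaddress.ip_address(value).version
            except ValueError:
                continue  # hostnames and %-keywords are not checked here
            actual = "ipv6" if version == 6 else "ipv4"
            if actual != declared:
                problems.append((name, side, value, declared))
    return problems


# Illustrative config: the v6 conn deliberately omits connaddrfamily.
sample_conf = """\
conn v4
    left=192.1.2.45
    right=192.1.2.23
    type=transport

conn v6
    left=2001:db8:1:2::45
    right=2001:db8:1:2::23
    type=transport
"""

for problem in family_mismatches(sample_conf):
    print(problem)
```

A check like this only catches the configuration error. As noted above, setting connaddrfamily=ipv6 did not appear to fix the failure, so a clean result does not rule out the race.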

Comment 13 Paul Wouters 2014-02-11 02:20:29 UTC
Created attachment 861646 [details]
log file

pluto log file for rhel5a

Comment 14 Paul Wouters 2014-02-11 02:21:36 UTC
Created attachment 861647 [details]
other end log file

other end log file

Comment 15 Paul Wouters 2014-02-13 02:28:02 UTC
I can no longer reproduce this error using:

openswan-2.6.32-5.el5_9
kernel-2.6.18-371.el5

Note that those nexthops should not be needed and are technically incorrect: they are only used when packets need to be routed, but these two machines are in the same subnet.

The configs I used:

[root@rhel5a ~]# cat /etc/ipsec.conf 
# /etc/ipsec.conf - Libreswan IPsec configuration file

version 2.0

config setup
	# put the logs in /tmp for the UMLs, so that we can operate
	# without syslogd, which seems to break on UMLs
	plutostderrlog=/tmp/pluto.log
	plutodebug=all
	plutorestartoncrash=false
	dumpdir=/tmp
	protostack=netkey

conn v4
	left=192.1.2.45
	right=192.1.2.23
	authby=secret
	type=transport
	pfs=no
	ike=3des-sha1
	esp=3des-sha1
	# this tests for on-boot tunnel establishment
	auto=start

conn v6
	connaddrfamily=ipv6
	left=2001:db8:1:2::45
	right=2001:db8:1:2::23
	authby=secret
	type=transport
	pfs=no
	ike=3des-sha1
	esp=3des-sha1
	# this tests for on-boot tunnel establishment
	auto=start

[root@rhel5b ~]# cat /etc/ipsec.secrets 
192.1.2.45 192.1.2.23 : PSK "geheim"
2001:db8:1:2::23 2001:db8:1:2::45 : PSK "geheim"
[root@rhel5b ~]# 

Can you re-test with this version of openswan and kernel?

Comment 16 Eric Paris 2014-04-22 21:10:40 UTC
As this bug was not resolved in 5.11 it will likely be CLOSED WONTFIX in the near future.  If you disagree with this closure please feel free to reopen the bug through your appropriate support contact (GSS).