Bug 2216154
| Summary: | IPsec traffic is down when pod is restarted even though xfrm policies/states are configured in the kernel | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Yossi Boaron <yboaron> |
| Component: | libreswan | Assignee: | Daiki Ueno <dueno> |
| Status: | NEW --- | QA Contact: | BaseOS QE Security Team <qe-baseos-security> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.6 | CC: | dueno, nyechiel, paul.wouters, sdubroca, sgaddam, shebburn |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | Flags: | yboaron: needinfo? (dueno) |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | UDP socket on port 4500 while Pluto process being restarted (attachment 1973712) | | |
Description Yossi Boaron 2023-06-20 10:16:44 UTC
Version details:
OCP 4.12.x
RHEL 8.6

OS details: Red Hat Enterprise Linux CoreOS 412.86.202304131008-0 (Ootpa) 4.18.0-372.51.1.el8_6.x86_64 amd64

Libreswan version: pluto_version=4.5, pluto_vendorid=OE-Libreswan-4.5

Can you share the logs? There are a few things that could be happening:

1) The other end has DPD/liveness enabled, detects that the failing node's pluto is gone, and brings down the connections to restart them.
2) This happened near the "rekey" timer: the other end sent an IKE packet and didn't get a reply, resulting in the same as 1). (Since rekeying happens about once an hour with some rekeyfuzz, this is about a 1/6th chance of happening with default rekey timers.)
3) IP routing/forwarding around the containers is changed and packet flow is blocked.
4) pluto has restarted, but the connection depends on DNS and is failing to restart because of broken DNS.

Pasted the logs here: https://privatebin.net/?b03645e870d78558#FQX2TDER8rm1NsZnymdTzmPDMmVcdxLB8icjemxeU12T

It seems that @dueno managed to reproduce a similar issue.
Daiki, could you please elaborate on ^^?

Thanks for your reply @paul.wouters, I changed the log file to be public, sorry about that. The issue happens consistently; Pluto doesn't always restart near a "rekey" timer that is about to expire. We also didn't use DNS, we just ran ping to a destination IP that should be routed through the IPsec tunnel, and we don't change anything in the routing/forwarding around the containers either.

(In reply to Yossi Boaron from comment #5)
> It seems that @dueno managed to reproduce a similar issue,
>
> Daiki, could you please elaborate on ^^?

I'm actually not sure if I reproduced the exact same issue, but here are the steps I took and the results. The test is conducted on two machines (10.16.56.36 and 10.16.56.37), with the following steps.

With the following PSK secret in /etc/ipsec.d/test.secrets:

    10.16.56.36 10.16.56.37 : PSK "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"

Set up a subnet-to-subnet VPN connection on 10.16.56.36, mimicking the Submariner code:

    # ip addr add 172.32.0.254/24 dev ens2:1
    # /usr/libexec/ipsec/pluto
    # ipsec whack --psk --encrypt --forceencaps --name a \
        --id 10.16.56.36 --host 10.16.56.36 --client 172.32.0.0/16 \
        --ikeport 4500 \
        --to \
        --id 10.16.56.37 --host 10.16.56.37 --client 172.30.0.0/16 \
        --ikeport 4500 \
        --dpdaction=hold
    # ipsec whack --route --name a
    # ipsec whack --initiate --asynchronous --name a
    181 "a" #1: initiating IKEv2 connection

Same but the other way around on 10.16.56.37:

    # ip addr add 172.30.0.254/24 dev ens2:1
    # /usr/libexec/ipsec/pluto
    # ipsec whack --psk --encrypt --forceencaps --name a \
        --id 10.16.56.37 --host 10.16.56.37 --client 172.30.0.0/16 \
        --ikeport 4500 \
        --to \
        --id 10.16.56.36 --host 10.16.56.36 --client 172.32.0.0/16 \
        --ikeport 4500 --dpdaction=hold
    # ipsec whack --route --name a
    # ipsec whack --initiate --asynchronous --name a
    002 "a" #3: initiating Child SA using IKE SA #1
    189 "a" #3: sent CREATE_CHILD_SA request for new IPsec SA
    004 "a" #3: initiator established Child SA using #1; IPsec tunnel [172.30.0.0-172.30.255.255:0-65535 0] -> [172.32.0.0-172.32.255.255:0-65535 0] {ESPinUDP=>0x5b323540 <0x63012b64 xfrm=AES_GCM_16_256-NONE NATD=10.16.56.36:4500 DPD=passive}
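
Before the ping/kill test below, it can help to snapshot the kernel's view of the tunnel so that any change caused by pluto's shutdown is easy to spot afterwards. A minimal sketch, not taken from Libreswan or Submariner; it only assumes iproute2's `ip` is on PATH and that the script runs with enough privileges to read xfrm state:

```python
#!/usr/bin/env python3
"""Snapshot "ip xfrm state" and "ip xfrm policy" before and after pluto is
killed or restarted, and report whether the kernel configuration changed."""
import subprocess

def xfrm_snapshot():
    # Read-only queries of the kernel's SAD and SPD (needs root/CAP_NET_ADMIN).
    state = subprocess.run(["ip", "xfrm", "state"],
                           capture_output=True, text=True, check=True).stdout
    policy = subprocess.run(["ip", "xfrm", "policy"],
                            capture_output=True, text=True, check=True).stdout
    return state, policy

if __name__ == "__main__":
    before = xfrm_snapshot()
    input("Kill or restart pluto now, then press Enter... ")
    after = xfrm_snapshot()
    print("xfrm states changed:  ", before[0] != after[0])
    print("xfrm policies changed:", before[1] != after[1])
```

This is the same comparison as the `ip xfrm state count` observation further down, just capturing the full output so that SPI and encap details can be diffed as well.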
Test running ping on 10.16.56.36:

    # ping -I 172.32.0.254 172.30.0.254
    PING 172.30.0.254 (172.30.0.254) from 172.32.0.254 : 56(84) bytes of data.
    64 bytes from 172.30.0.254: icmp_seq=1 ttl=64 time=0.423 ms
    64 bytes from 172.30.0.254: icmp_seq=2 ttl=64 time=0.432 ms
    64 bytes from 172.30.0.254: icmp_seq=3 ttl=64 time=0.378 ms
    ^C
    # ipsec trafficstatus
    006 #9: "a", type=ESP, add_time=1687531313, inBytes=420, outBytes=420, maxBytes=2^63B, id='10.16.56.37'
    006 #10: "a", type=ESP, add_time=0, inBytes=2352, outBytes=2352, maxBytes=2^63B, id='10.16.56.37'

If I keep running ping on 10.16.56.36 and kill the pluto process on 10.16.56.37, ping stalls. Then I restart pluto and run the ipsec whack commands up to "ipsec whack --route --name a", and ping starts responding again after a couple of seconds.

Some observations:

- In the above I used "pkill pluto" to kill the pluto process (which implies a graceful shutdown through the SIGTERM handler). If I do that with "pkill -KILL pluto", ping stalls similarly, but it resumes immediately after pluto is started (no need to reissue the "ipsec whack" commands).
- The SAD count reported by "ip xfrm state count" is also different: with SIGTERM it drops to 0 (and to 1 on the peer), while it stays at 4 on both peers if pluto is killed with SIGKILL.

Thanks for the update @dueno.
Well, it seems that traffic is down in both cases when Pluto isn't running; in our case 'ip xfrm state' did not change either, but ping traffic was down.
Any idea why traffic is down although 'ip xfrm state' didn't change on either end (like your case when Pluto is killed with the KILL signal)?

(In reply to Yossi Boaron from comment #8)
> Well, it seems that traffic is down in both cases when Pluto isn't running;
> in our case 'ip xfrm state' did not change either, but ping traffic was down.
>
> Any idea why traffic is down although 'ip xfrm state' didn't change on either
> end (like your case when Pluto is killed with the KILL signal)?

I see the same issue, even with `ipsec whack --shutdown --leave-state`. Also, on the host where pluto is shut down, tcpdump indicates that the host still receives ESP packets, while `ip xfrm monitor` shows nothing. So I suspect the ESP packets are not decryptable within the kernel after the pluto shutdown.

Sabrina, do you have any clue, or could you suggest how to diagnose the issue further?

(In reply to Daiki Ueno from comment #9)
> Also, on the host where pluto is shut down, tcpdump indicates that the host
> still receives ESP packets, while `ip xfrm monitor` shows nothing.

Small note here: `ip xfrm monitor` watches the configuration (new/modified/removed states and policies), not the flow of packets.

> So I suspect the ESP packets are not decryptable within the kernel after the
> pluto shutdown.

The reproducer in comment 7 uses "--forceencaps". If the pluto daemon is gone, I guess the UDP socket it created is also gone, and UDP encap stops working.

The config in comment 0 doesn't use forceencaps, but it could still be using UDP encap. @Yossi, can you confirm whether your config uses encapsulation? You should see it in the output of "ip xfrm state":

    src <node-A> dst <node-B>
    [... some lines]
    --> encap type espinudp sport 4500 dport 4500 addr 0.0.0.0

I'd also expect to see a UDP socket on port 4500 in the pod where pluto is running (or possibly another port that matches the "encap" line from "ip xfrm state"). I'm not very familiar with Kubernetes, but I guess pluto is killed and its sockets are closed when the GW stops.

Without encap, I don't see a reason for traffic to fail if all the states and policies (ip xfrm state + ip xfrm policy) remain in place. In that case we'd have to dig into the kernel's IPsec statistics or do some tracing to figure out which function drops the packets.
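
One way to do that digging without kernel tracing is to watch the kernel's xfrm error counters while the ping test is running. A minimal sketch, assuming a kernel built with CONFIG_XFRM_STATISTICS so that /proc/net/xfrm_stat exists:

```python
#!/usr/bin/env python3
"""Diff the xfrm error counters (XfrmInNoStates, XfrmInStateMismatch,
XfrmInNoPols, ...) over a short interval to see whether the dropped packets
reach the IPsec input path at all."""
import time

def read_xfrm_stat():
    counters = {}
    with open("/proc/net/xfrm_stat") as f:
        for line in f:
            name, value = line.split()
            counters[name] = int(value)
    return counters

if __name__ == "__main__":
    before = read_xfrm_stat()
    time.sleep(10)          # keep the ping running during this window
    after = read_xfrm_stat()
    for name, value in after.items():
        delta = value - before.get(name, 0)
        if delta:
            print(f"{name} +{delta}")
```

If none of the counters move while tcpdump still shows ESP-in-UDP arriving on port 4500, the drop happens before the packets ever reach the xfrm layer, which is consistent with a missing UDP encapsulation socket rather than with a problem in the SAs themselves.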
Thanks for the feedback @sdubroca and @dueno.

Short background on Submariner and the Libreswan UDP encapsulation configuration:

Submariner Gateway nodes (Endpoint objects) publish both a PrivateIP and a PublicIP. The PrivateIP is the IP assigned to an interface on the gateway node where the Endpoint originated. The PublicIP is the source IP for packets sent from the gateway to the Internet, discovered by default via ipify.org, with my-ip.io and seeip.org as fallbacks.

Each gateway implements a UDP NAT-T discovery protocol in which it queries the gateways of the remote clusters on both the public and private IPs, in order to determine the most suitable IP (and its NAT characteristics) to use for the tunnel connections, with a preference for the private IP.

IPsec in the Libreswan cable driver is configured for the more performant plain ESP whenever possible, which is normally when NAT is not detected and connectivity over the private IP is possible. When NAT is detected, Libreswan is configured to use UDP encapsulation.

IIRC, my setup also used Libreswan UDP encapsulation; I'll redeploy the environment to double-check that and update.
A. I redeployed the environment and yes, since Submariner detects that there is NAT between the clusters, Libreswan's configuration uses UDP encapsulation, see [1].
B. I also checked the UDP socket status on port 4500 on the GW node while restarting the Submariner-GW pod (Pluto runs in this pod); the UDP sockets do indeed seem to be deleted while Pluto is not running. Attached is a file with the output of 'netstat -apn | grep -w 4500' taken every 2 seconds while restarting the GW pod.
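
A scriptable equivalent of that check, for reference. This is a hypothetical sketch rather than what was attached; it reads /proc/net/udp directly instead of calling netstat, and 0x1194 is simply port 4500 in hex:

```python
#!/usr/bin/env python3
"""Poll /proc/net/udp (and udp6) every 2 seconds and report whether any
socket is still bound to UDP port 4500, e.g. while the Submariner GW pod
(and therefore pluto) is being restarted."""
import time

def port_4500_bound():
    for path in ("/proc/net/udp", "/proc/net/udp6"):
        try:
            with open(path) as f:
                next(f)                          # skip the header line
                for line in f:
                    local = line.split()[1]      # e.g. "00000000:1194"
                    if local.endswith(":1194"):  # 0x1194 == 4500
                        return True
        except FileNotFoundError:
            pass                                 # IPv6 may be disabled
    return False

if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), "UDP/4500 bound:", port_4500_bound())
        time.sleep(2)
```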
So, it seems that when Libreswan is configured to use UDP encapsulation, traffic is down for as long as the Pluto process is not running.
[1]

    src 3.133.146.168 dst 10.0.61.21
        proto esp spi 0xff365d4c reqid 16389 mode tunnel
        replay-window 0 flag af-unspec
        aead rfc4106(gcm(aes)) 0x7ef1192334e7d4b00cd8962fa33c16fe54cd531c92326581a3b0c8e6a6676c3db08e3ee4 128
        encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
        anti-replay esn context:
         seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0
         replay_window 128, bitmap-length 4
          00000000 00000000 00000000 00000000
Created attachment 1973712: UDP socket on port 4500 while Pluto process being restarted
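
That matches the kernel-side requirement for ESP-in-UDP: the kernel only decapsulates UDP-encapsulated ESP on a UDP socket that has the UDP_ENCAP option set, which is the kind of socket pluto opens on port 4500 and which disappears when the pod (and pluto) stops. A minimal sketch of that mechanism, purely for illustration; this is not Libreswan code, and the numeric constants are the Linux values from the UDP socket API (not exposed by Python's socket module):

```python
#!/usr/bin/env python3
"""Illustration of the ESP-in-UDP prerequisite: a UDP socket bound to the
NAT-T port with UDP_ENCAP set to UDP_ENCAP_ESPINUDP.  While a process holds
such a socket, the kernel feeds incoming ESP-in-UDP into the xfrm input path;
once the process exits, the socket closes and decapsulation stops, even though
"ip xfrm state" and "ip xfrm policy" still list the SAs (as in [1] above)."""
import signal
import socket

UDP_ENCAP = 100           # setsockopt option number from <linux/udp.h>
UDP_ENCAP_ESPINUDP = 2    # RFC 3948 style ESP-in-UDP encapsulation

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 4500))   # must match the "encap ... sport/dport 4500" line
sock.setsockopt(socket.IPPROTO_UDP, UDP_ENCAP, UDP_ENCAP_ESPINUDP)

# Keep the process (and therefore the encap socket) alive.
signal.pause()
```

This is why the SAs surviving a SIGKILL (or `ipsec whack --shutdown --leave-state`) is not enough: without a live process holding the encap socket, UDP-encapsulated ESP never reaches those SAs, so traffic only resumes once pluto is running again and has re-created the socket.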