Bug 2002278 - After reboot IPSec communication stops working
Summary: After reboot IPSec communication stops working
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn-2021
Version: FDP 21.F
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Mark Gray
QA Contact: ying xu
URL:
Whiteboard:
Depends On:
Blocks: 1782056
 
Reported: 2021-09-08 12:08 UTC by Ales Musil
Modified: 2021-10-28 12:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-28 12:09:18 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1526 0 None None None 2021-09-08 12:09:55 UTC

Description Ales Musil 2021-09-08 12:08:36 UTC
Description of problem:
After rebooting two hosts that are configured with OVN IPSec, the network
stops working and host1 cannot ping host2 through the geneve tunnel.
Communication starts working again the moment we ping in the opposite direction, from host2 to host1, and it stays restored from that point on.



Version-Release number of selected component (if applicable):
ovn-2021-21.06.0-17.el8s.x86_64
ovn-2021-host-21.06.0-17.el8s.x86_64

openvswitch2.15-2.15.0-35.el8s.x86_64
openvswitch2.15-ipsec-2.15.0-35.el8s.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure two hosts to use IPSec https://docs.ovn.org/en/latest/tutorials/ovn-ipsec.html
2. Create switch with two ports, one subnet:
switch b5c419d6-bf93-4ebf-922b-b800e730dcd0 (switch0)
    port port1
        addresses: ["00:00:00:00:00:01 192.168.100.10"]
    port port2
        addresses: ["00:00:00:00:00:02 192.168.100.20"]

3. Configure ports on every host:
# ovs-vsctl add-port br-int "port1" -- set Interface "port1" type=internal external_ids:iface-id="port1"
# ip link set port1 address 00:00:00:00:00:01
# ip address add 192.168.100.10/24 dev port1
# ip link set up port1

4. ping 192.168.100.20
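
Step 3 has to be repeated on each host with that host's own port name, MAC address, and IP address. The following is a minimal sketch of a helper that prints the step 3 commands for one host. The function name `port_setup_cmds` is hypothetical, the /24 prefix for host2 is an assumption carried over from the host1 example, and the helper only prints the commands rather than running them. Per comment 2, the reboot happens after this step.

```shell
#!/bin/sh
# Hypothetical helper: print the step 3 commands for one host.
# It only prints; review the output, then run it as root on that host.
port_setup_cmds() {
    port=$1 mac=$2 cidr=$3
    printf 'ovs-vsctl add-port br-int "%s" -- set Interface "%s" type=internal external_ids:iface-id="%s"\n' "$port" "$port" "$port"
    printf 'ip link set %s address %s\n' "$port" "$mac"
    printf 'ip address add %s dev %s\n' "$cidr" "$port"
    printf 'ip link set up %s\n' "$port"
}

# host1 values from the report. host2 would use:
#   port_setup_cmds port2 00:00:00:00:00:02 192.168.100.20/24
port_setup_cmds port1 00:00:00:00:00:01 192.168.100.10/24
```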

Actual results:
ping 192.168.100.20 from host1 to host2 gets stuck

tcpdump on host2 shows a lot of unreachable messages:
13:52:10.000695 IP 192.168.122.228 > 192.168.122.125: ESP(spi=0xa5830a6b,seq=0x2), length 156
13:52:10.000721 IP 192.168.122.228 > 192.168.122.125: ESP(spi=0xa5830a6b,seq=0x2), length 156
13:52:10.000864 IP 192.168.122.125 > 192.168.122.228: ICMP host 192.168.122.125 unreachable - admin prohibited filter, length 184
13:52:10.000874 IP 192.168.122.125 > 192.168.122.228: ICMP host 192.168.122.125 unreachable - admin prohibited filter, length 184
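
The "admin prohibited filter" replies are what a packet filter on host2 sends when it rejects the incoming ESP packets, which is consistent with the firewall explanation reached later in this thread. A minimal sketch for counting such replies in saved tcpdump text output follows; the function name `count_admin_prohibited` is hypothetical and the match string assumes the tcpdump wording shown above.

```shell
#!/bin/sh
# Count ICMP "admin prohibited" replies in tcpdump text output.
# Reads stdin and prints the number of matching lines (0 if none).
count_admin_prohibited() {
    grep -c 'unreachable - admin prohibited' || true
}

# Two sample lines copied from the capture in this report.
sample='13:52:10.000695 IP 192.168.122.228 > 192.168.122.125: ESP(spi=0xa5830a6b,seq=0x2), length 156
13:52:10.000864 IP 192.168.122.125 > 192.168.122.228: ICMP host 192.168.122.125 unreachable - admin prohibited filter, length 184'

printf '%s\n' "$sample" | count_admin_prohibited
```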


Expected results:
ping should work after reboot

Additional info:
Central:
# ovn-nbctl show
switch b5c419d6-bf93-4ebf-922b-b800e730dcd0 (switch0)
    port port1
        addresses: ["00:00:00:00:00:01 192.168.100.10"]
    port port2
        addresses: ["00:00:00:00:00:02 192.168.100.20"]
# ovn-sbctl show
Chassis "758751a8-5f85-4d23-bbe4-cb1eaec6171e"
    hostname: host2
    Encap geneve
        ip: "192.168.122.125"
        options: {csum="true"}
    Port_Binding port2
Chassis "2866d843-ef5b-4c20-b5e8-4fdb2cb4bb52"
    hostname: host1
    Encap geneve
        ip: "192.168.122.228"
        options: {csum="true"}
    Port_Binding port1

Host1:
# ovs-vsctl show
5a77a52b-cf0f-4013-b6c9-c64eb5a78376
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port ovn-758751-0
            Interface ovn-758751-0
                type: geneve
                options: {csum="true", key=flow, local_ip="192.168.122.228", remote_ip="192.168.122.125", remote_name="758751a8-5f85-4d23-bbe4-cb1eaec6171e"}
        Port port1
            Interface port1
                type: internal
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.15.2"

# ovs-appctl -t ovs-monitor-ipsec tunnels/show
Interface name: ovn-758751-0 v1 (CONFIGURED)
  Tunnel Type:    geneve
  Local IP:       192.168.122.228
  Remote IP:      192.168.122.125
  Address Family: IPv4
  SKB mark:       None
  Local cert:     /etc/pki/vdsm/ovn/ovn-cert.pem
  Local name:     2866d843-ef5b-4c20-b5e8-4fdb2cb4bb52
  Local key:      /etc/pki/vdsm/ovn/ovn-key.pem
  Remote cert:    None
  Remote name:    758751a8-5f85-4d23-bbe4-cb1eaec6171e
  CA cert:        /etc/pki/vdsm/ovn/ca-cert.pem
  PSK:            None
  Ofport:         1
  CFM state:      Disabled
Kernel policies installed:
  src 192.168.122.228/32 dst 192.168.122.125/32 proto udp dport 6081
  src 192.168.122.228/32 dst 192.168.122.125/32 proto udp dport 6081
  src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
  src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
Kernel security associations installed:
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp sport 6081
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp dport 6081
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
IPsec connections that are active:
  000 #18: "ovn-758751-0-in-1" esp.cdd821df@192.168.122.125 esp.3948656@192.168.122.228 Traffic: ESPin=0B ESPout=0B! ESPmax=0B
  000 #19: "ovn-758751-0-out-1" esp.a5830a6b@192.168.122.125 esp.e6c285f7@192.168.122.228 Traffic: ESPin=0B ESPout=1KB! ESPmax=0B

Host2:
# ovs-vsctl show
032acdf8-e445-477d-aeb5-db453aa9b693
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port ovn-2866d8-0
            Interface ovn-2866d8-0
                type: geneve
                options: {csum="true", key=flow, local_ip="192.168.122.125", remote_ip="192.168.122.228", remote_name="2866d843-ef5b-4c20-b5e8-4fdb2cb4bb52"}
        Port br-int
            Interface br-int
                type: internal
        Port port2
            Interface port2
                type: internal
    ovs_version: "2.15.2"
# ovs-appctl -t ovs-monitor-ipsec tunnels/show
Interface name: ovn-2866d8-0 v1 (CONFIGURED)
  Tunnel Type:    geneve
  Local IP:       192.168.122.125
  Remote IP:      192.168.122.228
  Address Family: IPv4
  SKB mark:       None
  Local cert:     /etc/pki/vdsm/ovn/ovn-cert.pem
  Local name:     758751a8-5f85-4d23-bbe4-cb1eaec6171e
  Local key:      /etc/pki/vdsm/ovn/ovn-key.pem
  Remote cert:    None
  Remote name:    2866d843-ef5b-4c20-b5e8-4fdb2cb4bb52
  CA cert:        /etc/pki/vdsm/ovn/ca-cert.pem
  PSK:            None
  Ofport:         1
  CFM state:      Disabled
Kernel policies installed:
  src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
  src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
  src 192.168.122.125/32 dst 192.168.122.228/32 proto udp sport 6081
  src 192.168.122.125/32 dst 192.168.122.228/32 proto udp sport 6081
Kernel security associations installed:
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp dport 6081
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp sport 6081
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
IPsec connections that are active:
  000 #18: "ovn-2866d8-0-in-1" esp.e6c285f7@192.168.122.228 esp.a5830a6b@192.168.122.125 Traffic: ESPin=0B ESPout=0B! ESPmax=0B
  000 #17: "ovn-2866d8-0-out-1" esp.b095ec0a@192.168.122.228 esp.a910d31d@192.168.122.125 Traffic: ESPin=0B ESPout=0B! ESPmax=0B
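
In the status lines above, host1's outbound connection has sent data (ESPout=1KB) while host2's inbound connection with the same SPI (a5830a6b) has received none (ESPin=0B). A minimal sketch for pulling the connection name and counters out of such lines follows, so that both hosts can be compared side by side. The function name `esp_counters` is hypothetical and the pattern assumes the exact line format shown in this report.

```shell
#!/bin/sh
# Hypothetical filter: read "IPsec connections that are active" lines on
# stdin and print "<connection> in=<ESPin> out=<ESPout>" for each of them.
esp_counters() {
    sed -n 's/.*"\([^"]*\)".*ESPin=\([^ ]*\) ESPout=\([^ !]*\).*/\1 in=\2 out=\3/p'
}

# Sample line copied from the host1 output in this report.
sample='  000 #19: "ovn-758751-0-out-1" esp.a5830a6b@192.168.122.125 esp.e6c285f7@192.168.122.228 Traffic: ESPin=0B ESPout=1KB! ESPmax=0B'

printf '%s\n' "$sample" | esp_counters
```

On a live host the input would come from `ovs-appctl -t ovs-monitor-ipsec tunnels/show` instead of the sample variable.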

Comment 1 ying xu 2021-09-09 01:58:02 UTC
Hi Ales Musil,
I am a little confused. I didn't find "reboot" in your steps.

After which step should I reboot to reproduce it?

Thanks very much!

Comment 2 Ales Musil 2021-09-09 05:08:06 UTC
(In reply to ying xu from comment #1)
> Hi Ales Musil,
> I am a little confused. I didn't find "reboot" in your steps.
>
> After which step should I reboot to reproduce it?
> 
> Thanks very much!

Oh, sorry, my bad.

The hosts should be rebooted after step 3.

Thanks,
Ales

Comment 3 Mark Gray 2021-10-13 13:44:49 UTC
Hi,

When it is failing, could you run the following commands on both hosts and post the output?

ip r
ip a
ps -ef | grep pluto
ps -ef | grep ovs-monitor-ipsec
ovs-appctl -t ovs-monitor-ipsec tunnels/show

Thanks
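
To gather everything in one file per host, the requested commands could be wrapped as in the sketch below. The function names `collect_diag` and `collect_ipsec_diag` are hypothetical; the command list is the one requested above, and nothing runs until `collect_ipsec_diag` is called explicitly.

```shell
#!/bin/sh
# Hypothetical helper: run each command string given as an argument and
# print its output under a header. A failing command is reported with its
# exit status and does not stop the collection.
collect_diag() {
    for cmd in "$@"; do
        printf '=== %s ===\n' "$cmd"
        sh -c "$cmd" 2>&1 || printf '(exit status %s)\n' "$?"
    done
}

# The commands requested in this comment; run as root on each host, e.g.:
#   collect_ipsec_diag > "diag-$(hostname).txt"
collect_ipsec_diag() {
    collect_diag \
        'ip r' \
        'ip a' \
        'ps -ef | grep pluto' \
        'ps -ef | grep ovs-monitor-ipsec' \
        'ovs-appctl -t ovs-monitor-ipsec tunnels/show'
}
```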

Comment 4 Mark Gray 2021-10-13 15:25:40 UTC
Can you also try 

`systemctl stop firewalld` on both hosts?

Thanks

Comment 5 Ales Musil 2021-10-14 05:12:03 UTC
Hi,

The output for both hosts is attached. It probably has something to do with the firewall rules, as stopping firewalld helped.

Comment 8 Mark Gray 2021-10-14 08:12:08 UTC
Rather than stopping the firewall, can you add rules to it as specified in https://docs.openvswitch.org/en/latest/tutorials/ipsec/#fedora

Let me know if this resolves your issue.

Comment 9 Ales Musil 2021-10-14 08:23:49 UTC
(In reply to Mark Gray from comment #8)
> Rather than stopping the firewall, can you add rules to it as specified in
> https://docs.openvswitch.org/en/latest/tutorials/ipsec/#fedora
> 
> Let me know if this resolves your issue.

No, that didn't help. I added the permanent ipsec rule to both hosts and reloaded the firewall.
It might be even worse, because the other host cannot see the traffic at all now.

Comment 10 Ales Musil 2021-10-14 08:26:41 UTC
Sorry, ignore my last message; I had the wrong central started.

It does indeed seem to help, but it is not documented in [0].
Would it be possible to document it there to prevent further confusion?

Thanks

[0] https://docs.ovn.org/en/latest/tutorials/ovn-ipsec.html
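
For reference, the change that helped here (adding the ipsec service to firewalld permanently and reloading, on both hosts) can be sketched as below. This assumes firewalld's predefined `ipsec` service, which covers IKE and ESP traffic. The function name `allow_ipsec` and the `DRY_RUN` variable are hypothetical; with the default `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Sketch: allow IPsec through firewalld on one host.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 runs them (as root).
allow_ipsec() {
    for cmd in \
        'firewall-cmd --permanent --add-service=ipsec' \
        'firewall-cmd --reload'
    do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf '%s\n' "$cmd"
        else
            # Stop at the first failing command and return its status.
            $cmd || return
        fi
    done
}

allow_ipsec
```

The permanent rule is written first and then activated by the reload, so it also survives the reboot that triggered this report.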

Comment 11 Mark Gray 2021-10-14 13:22:49 UTC
Sent patch to update the documentation at https://patchwork.ozlabs.org/project/ovn/patch/20211014132134.67138-1-mark.d.gray@redhat.com/

Comment 12 Ales Musil 2021-10-15 05:05:47 UTC
Thanks, but is this only for Fedora? This was happening with RHEL and CentOS.

Comment 13 Mark Gray 2021-10-21 17:10:05 UTC
Yeah, but the instructions are the same. We don't specifically call out RHEL or CentOS in the documentation. Do you think we should modify it further?

Comment 14 Mark Gray 2021-10-21 17:43:17 UTC
I added an additional patch specifying RHEL and CentOS.

https://patchwork.ozlabs.org/project/ovn/list/?series=268332

Can you please review it and, if you have any issues, reply upstream?

Thanks,

Mark

Comment 15 Ales Musil 2021-10-25 05:24:31 UTC
Hi,

It looks good.

Thanks

