Description of problem:
After rebooting two hosts that are configured with OVN IPSec, the network stops working and host1 cannot ping host2 through the geneve tunnel. It starts working again the moment we ping in the other direction, from host2 to host1. From that point on, communication is restored.

Version-Release number of selected component (if applicable):
ovn-2021-21.06.0-17.el8s.x86_64
ovn-2021-host-21.06.0-17.el8s.x86_64
openvswitch2.15-2.15.0-35.el8s.x86_64
openvswitch2.15-ipsec-2.15.0-35.el8s.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure two hosts to use IPSec
   https://docs.ovn.org/en/latest/tutorials/ovn-ipsec.html
2. Create switch with two ports, one subnet:
   switch b5c419d6-bf93-4ebf-922b-b800e730dcd0 (switch0)
       port port1
           addresses: ["00:00:00:00:00:01 192.168.100.10"]
       port port2
           addresses: ["00:00:00:00:00:02 192.168.100.20"]
3. Configure ports on every host:
   # ovs-vsctl add-port br-int "port1" -- set Interface "port1" type=internal external_ids:iface-id="port1"
   # ip link set port1 address 00:00:00:00:00:01
   # ip address add 192.168.100.10/24 dev port1
   # ip link set up port1
4. ping 192.168.100.20

Actual results:
ping 192.168.100.20 from host1 to host2 gets stuck.
tcpdump on host2 shows a lot of unreachable messages:
13:52:10.000695 IP 192.168.122.228 > 192.168.122.125: ESP(spi=0xa5830a6b,seq=0x2), length 156
13:52:10.000721 IP 192.168.122.228 > 192.168.122.125: ESP(spi=0xa5830a6b,seq=0x2), length 156
13:52:10.000864 IP 192.168.122.125 > 192.168.122.228: ICMP host 192.168.122.125 unreachable - admin prohibited filter, length 184
13:52:10.000874 IP 192.168.122.125 > 192.168.122.228: ICMP host 192.168.122.125 unreachable - admin prohibited filter, length 184

Expected results:
ping should work after reboot

Additional info:

Central:
# ovn-nbctl show
switch b5c419d6-bf93-4ebf-922b-b800e730dcd0 (switch0)
    port port1
        addresses: ["00:00:00:00:00:01 192.168.100.10"]
    port port2
        addresses: ["00:00:00:00:00:02 192.168.100.20"]

# ovn-sbctl show
Chassis "758751a8-5f85-4d23-bbe4-cb1eaec6171e"
    hostname: host2
    Encap geneve
        ip: "192.168.122.125"
        options: {csum="true"}
    Port_Binding port2
Chassis "2866d843-ef5b-4c20-b5e8-4fdb2cb4bb52"
    hostname: host1
    Encap geneve
        ip: "192.168.122.228"
        options: {csum="true"}
    Port_Binding port1

Host1:
# ovs-vsctl show
5a77a52b-cf0f-4013-b6c9-c64eb5a78376
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port ovn-758751-0
            Interface ovn-758751-0
                type: geneve
                options: {csum="true", key=flow, local_ip="192.168.122.228", remote_ip="192.168.122.125", remote_name="758751a8-5f85-4d23-bbe4-cb1eaec6171e"}
        Port port1
            Interface port1
                type: internal
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.15.2"

# ovs-appctl -t ovs-monitor-ipsec tunnels/show
Interface name: ovn-758751-0 v1 (CONFIGURED)
  Tunnel Type: geneve
  Local IP: 192.168.122.228
  Remote IP: 192.168.122.125
  Address Family: IPv4
  SKB mark: None
  Local cert: /etc/pki/vdsm/ovn/ovn-cert.pem
  Local name: 2866d843-ef5b-4c20-b5e8-4fdb2cb4bb52
  Local key: /etc/pki/vdsm/ovn/ovn-key.pem
  Remote cert: None
  Remote name: 758751a8-5f85-4d23-bbe4-cb1eaec6171e
  CA cert: /etc/pki/vdsm/ovn/ca-cert.pem
  PSK: None
  Ofport: 1
  CFM state: Disabled
Kernel policies installed:
  src 192.168.122.228/32 dst 192.168.122.125/32 proto udp dport 6081
  src 192.168.122.228/32 dst 192.168.122.125/32 proto udp dport 6081
  src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
  src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
Kernel security associations installed:
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp sport 6081
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp dport 6081
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
IPsec connections that are active:
  000 #18: "ovn-758751-0-in-1" esp.cdd821df.122.125 esp.3948656.122.228 Traffic: ESPin=0B ESPout=0B! ESPmax=0B
  000 #19: "ovn-758751-0-out-1" esp.a5830a6b.122.125 esp.e6c285f7.122.228 Traffic: ESPin=0B ESPout=1KB! ESPmax=0B

Host2:
# ovs-vsctl show
032acdf8-e445-477d-aeb5-db453aa9b693
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port ovn-2866d8-0
            Interface ovn-2866d8-0
                type: geneve
                options: {csum="true", key=flow, local_ip="192.168.122.125", remote_ip="192.168.122.228", remote_name="2866d843-ef5b-4c20-b5e8-4fdb2cb4bb52"}
        Port br-int
            Interface br-int
                type: internal
        Port port2
            Interface port2
                type: internal
    ovs_version: "2.15.2"

# ovs-appctl -t ovs-monitor-ipsec tunnels/show
Interface name: ovn-2866d8-0 v1 (CONFIGURED)
  Tunnel Type: geneve
  Local IP: 192.168.122.125
  Remote IP: 192.168.122.228
  Address Family: IPv4
  SKB mark: None
  Local cert: /etc/pki/vdsm/ovn/ovn-cert.pem
  Local name: 758751a8-5f85-4d23-bbe4-cb1eaec6171e
  Local key: /etc/pki/vdsm/ovn/ovn-key.pem
  Remote cert: None
  Remote name: 2866d843-ef5b-4c20-b5e8-4fdb2cb4bb52
  CA cert: /etc/pki/vdsm/ovn/ca-cert.pem
  PSK: None
  Ofport: 1
  CFM state: Disabled
Kernel policies installed:
  src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
  src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
  src 192.168.122.125/32 dst 192.168.122.228/32 proto udp sport 6081
  src 192.168.122.125/32 dst 192.168.122.228/32 proto udp sport 6081
Kernel security associations installed:
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp dport 6081
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp sport 6081
  sel src 192.168.122.228/32 dst 192.168.122.125/32 proto udp sport 6081
  sel src 192.168.122.125/32 dst 192.168.122.228/32 proto udp dport 6081
IPsec connections that are active:
  000 #18: "ovn-2866d8-0-in-1" esp.e6c285f7.122.228 esp.a5830a6b.122.125 Traffic: ESPin=0B ESPout=0B! ESPmax=0B
  000 #17: "ovn-2866d8-0-out-1" esp.b095ec0a.122.228 esp.a910d31d.122.125 Traffic: ESPin=0B ESPout=0B! ESPmax=0B
Hi Ales Musil,

I am a little confused. I didn't find "reboot" in your steps, so after which step should I reboot to reproduce it?

Thanks very much!
(In reply to ying xu from comment #1)
> hi, Ales Musil
> I am a little confused .I didn't find "reboot" in your steps.
>
> so I should reboot after which step to reproduce it?
>
> Thanks very much!

Oh, sorry, my bad. The hosts should be rebooted after step 3.

Thanks,
Ales
Hi,

When it is failing, could you run the following commands on both hosts and post the output?

ip r
ip a
ps -ef | grep pluto
ps -ef | grep ovs-monitor-ipsec
ovs-appctl -t ovs-monitor-ipsec tunnels/show

Thanks
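For convenience, the commands requested above could be wrapped in a small script so that the output from both hosts has the same layout. This is only a sketch: the function names `run_diag` and `collect_ipsec_diag` are made up for illustration, and the `[p]luto` bracket pattern is an addition that keeps grep from matching its own process.

```shell
#!/bin/sh
# Run one diagnostic command and print its output under a header line,
# so the results from host1 and host2 can be compared side by side.
run_diag() {
    printf '===== %s =====\n' "$*"
    # Do not abort when a command fails (for example, when no pluto
    # process is running); report the exit status instead.
    sh -c "$*" 2>&1 || printf '(exit status %s)\n' "$?"
}

# Hypothetical wrapper around the commands requested in this comment.
collect_ipsec_diag() {
    run_diag "ip r"
    run_diag "ip a"
    run_diag "ps -ef | grep '[p]luto'"
    run_diag "ps -ef | grep '[o]vs-monitor-ipsec'"
    run_diag "ovs-appctl -t ovs-monitor-ipsec tunnels/show"
}
```

After sourcing the file, running `collect_ipsec_diag > "$(hostname)-ipsec-diag.txt"` as root on each host would produce one attachment per host.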
Can you also try `systemctl stop firewalld` on both hosts? Thanks
Hi, the output for both hosts is attached. It probably has something to do with the firewall rules, as stopping firewalld helped.
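The "admin prohibited filter" ICMP replies in the tcpdump output from the description are consistent with a firewall on host2 rejecting the incoming ESP packets. A minimal sketch for counting those rejections in tcpdump text output follows; the function name `count_admin_prohibited` is hypothetical.

```shell
# Count ICMP "admin prohibited" rejections in tcpdump text read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
# keeps the function's exit status at 0 in that case.
count_admin_prohibited() {
    grep -c 'unreachable - admin prohibited' || true
}
```

Assuming a capture was saved to a file (the name `host2.pcap` is only an example), it could be used as `tcpdump -nn -r host2.pcap | count_admin_prohibited`. A non-zero count while the ping is stuck would point at the firewall rather than at the IPsec tunnel setup itself.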
Rather than stopping the firewall, can you add rules to it as specified in https://docs.openvswitch.org/en/latest/tutorials/ipsec/#fedora?

Let me know if this resolves your issue.
(In reply to Mark Gray from comment #8)
> Rather than stopping the firewall, can you add rules to it as specified in
> https://docs.openvswitch.org/en/latest/tutorials/ipsec/#fedora
>
> Let me know if this resolves your issue.

No, it didn't help. I have added ipsec permanently on both hosts and reloaded the firewall. It might be even worse, because the other host cannot see the traffic at all now.
Sorry, ignore my last message; I had the wrong central started. It does indeed seem to help, but it is not documented in [0]. Would it be possible to document it there to prevent any further confusion?

Thanks

[0] https://docs.ovn.org/en/latest/tutorials/ovn-ipsec.html
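For reference, the change described in the previous comments (adding the ipsec service permanently and reloading the firewall) might look like the sketch below when done with firewall-cmd. The function name `allow_ipsec` and the `FIREWALL_CMD` override are assumptions made for illustration so the commands can be previewed without touching the firewall; they do not come from the linked tutorials.

```shell
# Allow IPsec traffic through firewalld in a way that survives reboots.
# FIREWALL_CMD is a hypothetical override; set it to "echo" for a dry run.
allow_ipsec() {
    fw=${FIREWALL_CMD:-firewall-cmd}
    # Add the predefined "ipsec" service to the permanent configuration,
    # then reload so the permanent rules become the running rules.
    $fw --permanent --add-service=ipsec &&
    $fw --reload
}
```

Running `FIREWALL_CMD=echo allow_ipsec` prints the two commands instead of executing them. The real commands need root and must be applied on both hosts.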
Sent patch to update the documentation at https://patchwork.ozlabs.org/project/ovn/patch/20211014132134.67138-1-mark.d.gray@redhat.com/
Thanks, but is it only on Fedora? This was happening with RHEL and CentOS.
Yeah, but the instructions are the same. We don't specifically call out RHEL or CentOS in the documentation. Do you think we should modify it further?
I added an additional patch specifying RHEL and CentOS. https://patchwork.ozlabs.org/project/ovn/list/?series=268332 Can you please review and, if you have any issues, can you reply upstream? Thanks, Mark
Hi, it looks good. Thanks