Bug 2138135
| Summary: | ovs-monitor-ipsec being single-threaded causes long delays in large ipsec OpenShift clusters | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | PX_OpenShift <px_openshift> |
| Component: | openvswitch2.17 | Assignee: | Mohammad Heib <mheib> |
| Status: | CLOSED EOL | QA Contact: | ovs-qe |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | FDP 22.F | CC: | akaris, ctrautma, fleitner, jhsiao, mheib, nstamate, ralongi |
| Target Milestone: | --- | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-10-08 17:49:14 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
PX_OpenShift
2022-10-27 11:12:24 UTC
This can easily be reproduced on a standalone VM with an OVS setup; e.g., something with the following nmstate configuration:
{code}
---
interfaces:
- name: br-ex
  description: ovs bridge with eth1 as a port
  type: ovs-bridge
  state: up
  bridge:
    options:
      stp: false
    port:
    - name: br-ex
- name: br-ex
  type: ovs-interface
  state: up
  ipv4:
    enabled: true
    address:
    - ip: 192.0.2.11
      prefix-length: 24
- name: eth1
  type: ethernet
  state: up
  ipv4:
    enabled: true
    address:
    - ip: 192.168.123.11
      prefix-length: 24
{code}
Then, add a bunch of dummy geneve tunnels (the destinations do not have to exist):
{code}
for i in {20..200}; do
ip_1=192.168.123.11
ip_2=192.168.123.$i
ovs-vsctl add-port br-ex tun$i -- set interface tun$i type=geneve options:remote_ip=$ip_2 options:psk=swordfish
ovs-vsctl set Interface tun$i options:local_ip=$ip_1
done
{code}
Then, hack the script to add some debug output as needed:
{code}
vim /usr/share/openvswitch/scripts/ovs-monitor-ipsec
(...)
587 def refresh(self, monitor):
588 vlog.info("Refreshing LibreSwan configuration")
589 start = time.time()
590 subprocess.call([self.IPSEC, "auto", "--ctlsocket", self.IPSEC_CTL,
591 "--config", self.IPSEC_CONF, "--rereadsecrets"])
(...)
677 subprocess.call([self.IPSEC, "auto",
678 "--config", self.IPSEC_CONF,
679 "--ctlsocket", self.IPSEC_CTL,
680 "--delete",
681 "--asynchronous", "prevent_unencrypted_vxlan"])
682 monitor.conf_in_use["skb_mark"] = monitor.conf["skb_mark"]
683 end = time.time()
684 vlog.warn("Refresh elapsed time: %f" % (end - start))
{code}
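The patch above only wraps the refresh in timers; the slowness it exposes comes from `subprocess.call` blocking until each `ipsec auto` child exits, so with roughly 180 tunnels the per-connection latencies add up linearly. As a minimal stand-alone illustration of that serial blocking (using short `time.sleep` children as stand-ins for `ipsec auto`, since the real command needs a working Libreswan setup):

```python
import subprocess
import sys
import time

def serial_calls(cmds):
    """Run each command with subprocess.call, the way ovs-monitor-ipsec
    does; each call blocks until its child process exits."""
    start = time.time()
    for cmd in cmds:
        subprocess.call(cmd)
    return time.time() - start

# Five children that each sleep 0.2 s take at least 1 s when run back
# to back; ~180 tunnels at ~1 s per "ipsec auto" invocation scale the
# same way into minutes.
cmds = [[sys.executable, "-c", "import time; time.sleep(0.2)"]] * 5
```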
And start the openvswitch-ipsec service:
{code}
systemctl restart openvswitch-ipsec.service
{code}
Then, trigger a refresh:
{code}
ovs-appctl -t ovs-monitor-ipsec refresh
{code}
The added debug output then shows a single refresh taking almost three minutes:
{code}
Oct 25 13:49:01 ovn-ipsec1 ovs-monitor-ips[12385]: ovs| 350 | ovs-monitor-ipsec | INFO | Refreshing LibreSwan configuration
Oct 25 13:51:50 ovn-ipsec1 ovs-monitor-ips[12385]: ovs| 351 | ovs-monitor-ipsec | WARN | Refresh elapsed time: 169.737755
{code}
Meanwhile, the following command shows the script working through the per-tunnel activation commands one at a time:
{code}
watch "ps aux | grep auto"
{code}
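Since each activation is an independent `ipsec auto` invocation, one possible mitigation is to fan the calls out over a thread pool instead of issuing them serially. The following is only a sketch of the idea, not the shipped ovs-monitor-ipsec code; the `run_parallel` helper, the worker count, and the example connection names are illustrative assumptions:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_parallel(cmds, max_workers=8):
    """Run the given command lines concurrently and return their exit
    codes in input order. Threads suffice here because each worker
    just blocks in subprocess.call waiting on its child."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(subprocess.call, cmds))

# Illustrative use: the per-tunnel activation commands the monitor
# currently loops over one at a time (connection names are examples).
conns = ["tun20", "tun21", "tun22"]
ipsec_cmds = [["ipsec", "auto",
               "--config", "/etc/ipsec.conf",
               "--start", "--asynchronous", c] for c in conns]
# run_parallel(ipsec_cmds) would issue all of them concurrently.
```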
Hi @akaris, sorry for the late comments, I was on a long PTO. Is this issue still relevant? Thanks, Mohammad

Disclaimer: I switched teams and I'm not on the code-writing side of things any more. But I think this is still relevant. We worked around the issue in OCP by removing some monitoring, but I assume that the initial startup delay is still there. It would be nice if the script / IPsec tunnel activation could be parallelized, if at all possible. But if you want a more definitive answer, reach out to the SDN team :-)

Thank you for your quick response, I will try to reproduce it with the steps above. Again, thank you so much for the response and good luck with the new team 🙏

This bug did not meet the criteria for automatic migration and is being closed. If the issue remains, please open a new ticket in https://issues.redhat.com/browse/FDP