Description of problem:
It seems that during "Refresh firewall rules", neutron-openvswitch-agent on the compute nodes deletes the iptables INPUT chain rules. As a result, the hypervisor and its instances become unreachable. The only log entries[1] that seem related to the issue are "Duplicate iptables rule detected" warnings. After an iptables restart, everything works again.

Version-Release number of selected component (if applicable):
- Red Hat OpenStack 16.1 (RHOSP16.1)
- non-containerized deployment on RHOSP16.1 (No director)
- RHOSP16.1 Neutron Dynamic Routing Agent support for BGP routes
- neutron-dynamic-routing plugin (DRAgent)

How reproducible:
Intermittent. The event is triggered from time to time, in an unpredictable way.

Actual results:
- hypervisor and instances become unreachable.

Additional info:
[1] 0020-sosreport-2022-10-06-ovoamfe.tar.xz/sosreport-2022-10-06-ovoamfe/var/log/neutron/openvswitch-agent.log-20221006:

2022-10-06 01:15:09.043 3985 INFO neutron.agent.securitygroups_rpc [req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] Refresh firewall rules
2022-10-06 01:15:09.326 3985 WARNING neutron.agent.linux.iptables_manager [req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] Duplicate iptables rule detected. This may indicate a bug in the iptables rule generation code. Line: -A INPUT -s x.x.22.128/26 -i bond0.18 -p tcp -m state --state NEW -m tcp --sport 1025:65535 --dport 22 -j ACCEPT
2022-10-06 01:15:09.327 3985 WARNING neutron.agent.linux.iptables_manager [req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] Duplicate iptables rule detected. This may indicate a bug in the iptables rule generation code. Line: -A INPUT -s x.x.21.128/26 -i bond0.18 -p tcp -m state --state NEW -m tcp --dport 5900:6100 -j ACCEPT
[...]
2022-10-06 01:15:09.334 3985 WARNING neutron.agent.linux.iptables_manager [req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] Duplicate iptables rule detected. This may indicate a bug in the iptables rule generation code. Line: -A INPUT -i bond0.17 -p udp -m udp --dport 4789 -j ACCEPT
2022-10-06 01:15:09.334 3985 WARNING neutron.agent.linux.iptables_manager [req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] Duplicate iptables rule detected. This may indicate a bug in the iptables rule generation code. Line: -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
2022-10-06 01:15:09.456 3985 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] process_network_ports - iteration:3686503 - agent port security group processed in 0.435
2022-10-06 01:15:09.468 3985 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] Configuration for devices up [] and devices down [] completed.
[...]
2022-10-06 01:15:59.211 3985 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 110] Connection timed out
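
To help correlate the "Duplicate iptables rule detected" warnings with a particular "Refresh firewall rules" run, the following is a minimal triage sketch, not part of Neutron. It assumes each log entry sits on a single line in the format shown in the excerpt above; the function name duplicate_rules_by_request and the SAMPLE_LINES list are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict

# Matches one iptables_manager duplicate-rule warning, assuming the
# single-line log format seen in the excerpt from openvswitch-agent.log.
DUP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \d+ WARNING "
    r"neutron\.agent\.linux\.iptables_manager "
    r"\[(?P<req>req-[0-9a-f-]+)[^\]]*\] "
    r"Duplicate iptables rule detected\..*?Line: (?P<rule>.+?)\s*$"
)


def duplicate_rules_by_request(lines):
    """Group duplicated iptables rules by the request ID that logged them.

    Returns a dict mapping request ID -> list of (timestamp, rule) tuples.
    Lines that are not duplicate-rule warnings are ignored.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = DUP_RE.match(line)
        if match:
            grouped[match.group("req")].append(
                (match.group("ts"), match.group("rule"))
            )
    return dict(grouped)


# Hypothetical sample built from three entries of the excerpt above.
SAMPLE_LINES = [
    "2022-10-06 01:15:09.043 3985 INFO neutron.agent.securitygroups_rpc "
    "[req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] "
    "Refresh firewall rules",
    "2022-10-06 01:15:09.334 3985 WARNING neutron.agent.linux.iptables_manager "
    "[req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] "
    "Duplicate iptables rule detected. This may indicate a bug in the "
    "iptables rule generation code. "
    "Line: -A INPUT -i bond0.17 -p udp -m udp --dport 4789 -j ACCEPT",
    "2022-10-06 01:15:09.334 3985 WARNING neutron.agent.linux.iptables_manager "
    "[req-2b683a29-2c35-4e8f-a5e6-dea15ab6cae8 - - - - -] "
    "Duplicate iptables rule detected. This may indicate a bug in the "
    "iptables rule generation code. "
    "Line: -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT",
]

if __name__ == "__main__":
    for req, entries in duplicate_rules_by_request(SAMPLE_LINES).items():
        print(req, len(entries))
        for ts, rule in entries:
            print("   ", ts, rule)
```

On a real node the same function could be fed the lines of the agent log file instead of SAMPLE_LINES. Comparing the reported rules against the host's own INPUT chain may show whether the duplicates come from host-level firewall rules rather than from security group rules.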
I'm wondering why the OVS agent implements security groups for the bond interface. I'm not able to see the OVS configuration because it seems all services were turned off.

* neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2022-10-06 11:22:08 CEST; 4h 55min ago
  Process: 3985 ExecStart=/usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent --log-file /var/log/neutron/openvswitch-agent.log (code=exited, status=0/SUCCESS)
  Process: 3929 ExecStartPre=/usr/bin/neutron-enable-bridge-firewall.sh (code=exited, status=0/SUCCESS)
 Main PID: 3985 (code=exited, status=0/SUCCESS)

* openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/openvswitch.service.d
           `-flowlimit.conf
   Active: inactive (dead) since Thu 2022-10-06 11:22:08 CEST; 4h 55min ago
  Process: 16235 ExecStop=/bin/true (code=exited, status=0/SUCCESS)
  Process: 3541 ExecStart=/bin/bash -c ovs-appctl upcall/set-flow-limit 32768 (code=exited, status=0/SUCCESS)
  Process: 3539 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 3541 (code=exited, status=0/SUCCESS)

Is it possible to get "ovs-vsctl show" output on the affected node?