Description of problem:

During the update from OSP16 to OSP16.1 there is a 2 minute traffic loss because the openvswitch package is updated on compute nodes:

  Installed: openvswitch2.13-2.13.0-39.el8fdp.x86_64
  Installed: rhosp-openvswitch-2.13-8.el8ost.noarch
  Removed: openvswitch2.11-2.11.0-35.el8fdp.x86_64
  Removed: rhosp-openvswitch-2.11-0.5.el8ost.noarch

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Can you please attach sosreports from the affected nodes + update logs from the Undercloud?
Hi, one more note about the possible solution. The upgrade code that prevents the openvswitch restart is rather involved[1], which is why using some migration mechanism would be better IMHO.

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/tripleo-packages/tripleo-packages-baremetal-puppet.yaml#L359-L495
Hi,

So I made a mistake in the TZ used for the different logs, especially for the ping test. Those are in Unix time and my conversion was wrong. Using no TZ when doing the conversion fixed it:

cat undercloud-0/home/stack/ping_results_202007192119.log | TZ= perl -pe 's/([\d]{10}\.[\d]{3})/localtime $1/eg;' | head -n2
PING 10.0.0.246 (10.0.0.246) 56(84) bytes of data.
[Sun Jul 19 21:19:52 2020130] 64 bytes from 10.0.0.246: icmp_seq=1 ttl=63 time=8.13 ms

which is in the same TZ as the Controller log:

head -1 undercloud-0/home/stack//overcloud_update_run_Controller.log
2020-07-19 21:19:53 | Running minor update all playbooks for Controller role

Now the cut happens at:

[Sun Jul 19 23:32:13 2020343] 64 bytes from 10.0.0.246: icmp_seq=7930 ttl=63 time=1.96 ms
[Sun Jul 19 23:32:27 2020673] From 10.0.0.246 icmp_seq=7941 Destination Host Unreachable
[Sun Jul 19 23:32:27 2020743] From 10.0.0.246 icmp_seq=7942 Destination Host Unreachable
...
[Sun Jul 19 23:34:14 2020864] From 10.0.0.246 icmp_seq=8046 Destination Host Unreachable
[Sun Jul 19 23:34:14 2020872] From 10.0.0.246 icmp_seq=8047 Destination Host Unreachable
[Sun Jul 19 23:34:15 2020672] 64 bytes from 10.0.0.246: icmp_seq=8048 ttl=63 time=1059 ms
[Sun Jul 19 23:34:15 2020808] 64 bytes from 10.0.0.246: icmp_seq=8049 ttl=63 time=19.0 ms

And this is during the Compute update in undercloud-0/home/stack//overcloud_update_run_Compute.log:

"Removed: network-scripts-10.00.4-1.el8.x86_64",
"Removed: unbound-libs-1.7.3-8.el8.x86_64",
"Removed: network-scripts-openvswitch2.11-2.11.0-35.el8fdp.x86_64",
"Removed: nftables-1:0.9.0-14.el8.x86_64"]}
2020-07-19 23:34:38 | 2020-07-19 23:34:38 | TASK [Ensure openvswitch is running after update] ******************************
2020-07-19 23:34:38 | Sunday 19 July 2020 23:34:10 +0000 (0:05:49.926) 0:06:26.658 ***********
2020-07-19 23:34:38 | changed: [compute-1] => {"changed": true, "enabled": true, "name": "openvswitch", "state": "started", "status": {"ActiveEnterTimestamp": "Sun 2020-07-19 16:10:05 UTC", "ActiveEnterTimestampMonotonic": "10155462", "ActiveExitTimestamp": "Sun 2020-07-19 23:32:15 UTC", "ActiveExitTimestampMonotonic": "26539527478", "ActiveState": "inactive", "After": "network-pre.targe

In compute-0/var/log/openvswitch/ovs-vswitchd.log we can see that all interfaces are deleted:

2020-07-19T23:32:14.188Z|00410|bridge|INFO|bridge br-tun: deleted interface patch-int on port 1
2020-07-19T23:32:14.188Z|00411|bridge|INFO|bridge br-tun: deleted interface br-tun on port 65534
2020-07-19T23:32:14.188Z|00412|bridge|INFO|bridge br-tun: deleted interface vxlan-ac110239 on port 2
2020-07-19T23:32:14.188Z|00413|bridge|INFO|bridge br-tun: deleted interface vxlan-ac110228 on port 4
2020-07-19T23:32:14.188Z|00414|bridge|INFO|bridge br-tun: deleted interface vxlan-ac110258 on port 7
2020-07-19T23:32:14.188Z|00415|bridge|INFO|bridge br-tun: deleted interface vxlan-ac110235 on port 5
2020-07-19T23:32:14.188Z|00416|bridge|INFO|bridge br-tun: deleted interface vxlan-ac110220 on port 6
2020-07-19T23:32:14.188Z|00417|bridge|INFO|bridge br-tun: deleted interface vxlan-ac11023e on port 3
2020-07-19T23:32:14.194Z|00418|bridge|INFO|bridge br-int: deleted interface qvob4b75086-38 on port 71
2020-07-19T23:32:14.194Z|00419|bridge|INFO|bridge br-int: deleted interface int-br-isolated on port 2
2020-07-19T23:32:14.194Z|00420|bridge|INFO|bridge br-int: deleted interface patch-tun on port 3
2020-07-19T23:32:14.195Z|00421|bridge|INFO|bridge br-int: deleted interface br-int on port 65534
2020-07-19T23:32:14.197Z|00422|bridge|INFO|bridge br-int: deleted interface int-br-ex on port 1
2020-07-19T23:32:14.200Z|00423|bridge|INFO|bridge br-ex: deleted interface ens5 on port 1
2020-07-19T23:32:14.200Z|00424|bridge|INFO|bridge br-ex: deleted interface br-ex on port 65534
2020-07-19T23:32:14.200Z|00425|bridge|INFO|bridge br-ex: deleted interface phy-br-ex on port 2
2020-07-19T23:32:14.204Z|00426|bridge|INFO|bridge br-isolated: deleted interface phy-br-isolated on port 5
2020-07-19T23:32:14.205Z|00427|bridge|INFO|bridge br-isolated: deleted interface vlan20 on port 2
2020-07-19T23:32:14.205Z|00428|bridge|INFO|bridge br-isolated: deleted interface vlan30 on port 4
2020-07-19T23:32:14.205Z|00429|bridge|INFO|bridge br-isolated: deleted interface vlan50 on port 3
2020-07-19T23:32:14.205Z|00430|bridge|INFO|bridge br-isolated: deleted interface br-isolated on port 65534
2020-07-19T23:32:14.205Z|00431|bridge|INFO|bridge br-isolated: deleted interface ens4 on port 1

and recreated later:

2020-07-19T23:34:12.185Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-07-19T23:34:12.199Z|00002|ovs_numa|INFO|Discovered 8 CPU cores on NUMA node 0
2020-07-19T23:34:12.199Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes and 8 CPU cores
2020-07-19T23:34:12.201Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-07-19T23:34:12.201Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-07-19T23:34:12.206Z|00006|dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable
2020-07-19T23:34:12.216Z|00007|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2020-07-19T23:34:12.217Z|00008|ofproto_dpif|INFO|system@ovs-system: VLAN header stack length probed as 2
2020-07-19T23:34:12.217Z|00009|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
2020-07-19T23:34:12.217Z|00010|ofproto_dpif|INFO|system@ovs-system: Datapath supports truncate action
2020-07-19T23:34:12.217Z|00011|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2020-07-19T23:34:12.217Z|00012|ofproto_dpif|INFO|system@ovs-system: Datapath supports clone action
2020-07-19T23:34:12.217Z|00013|ofproto_dpif|INFO|system@ovs-system: Max sample nesting level probed as 10
2020-07-19T23:34:12.217Z|00014|ofproto_dpif|INFO|system@ovs-system: Datapath supports eventmask in conntrack action
2020-07-19T23:34:12.217Z|00015|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_clear action
2020-07-19T23:34:12.217Z|00016|ofproto_dpif|INFO|system@ovs-system: Max dp_hash algorithm probed to be 0
2020-07-19T23:34:12.217Z|00017|ofproto_dpif|INFO|system@ovs-system: Datapath does not support check_pkt_len action
2020-07-19T23:34:12.217Z|00018|ofproto_dpif|INFO|system@ovs-system: Datapath does not support timeout policy in conntrack action
2020-07-19T23:34:12.217Z|00019|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
2020-07-19T23:34:12.217Z|00020|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
2020-07-19T23:34:12.217Z|00021|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
2020-07-19T23:34:12.217Z|00022|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2020-07-19T23:34:12.217Z|00023|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2020-07-19T23:34:12.217Z|00024|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2020-07-19T23:34:12.217Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2020-07-19T23:34:12.217Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2020-07-19T23:34:12.253Z|00027|bridge|INFO|bridge br-ex: added interface ens5 on port 1
2020-07-19T23:34:12.253Z|00028|bridge|INFO|bridge br-ex: added interface br-ex on port 65534
2020-07-19T23:34:12.253Z|00029|bridge|INFO|bridge br-ex: added interface phy-br-ex on port 2
2020-07-19T23:34:12.253Z|00030|bridge|INFO|bridge br-tun: added interface patch-int on port 1
2020-07-19T23:34:12.253Z|00031|bridge|INFO|bridge br-tun: added interface br-tun on port 65534
2020-07-19T23:34:12.254Z|00032|bridge|INFO|bridge br-tun: added interface vxlan-ac110228 on port 4
2020-07-19T23:34:12.254Z|00033|bridge|INFO|bridge br-tun: added interface vxlan-ac110239 on port 2
2020-07-19T23:34:12.255Z|00034|bridge|INFO|bridge br-tun: added interface vxlan-ac110258 on port 7
2020-07-19T23:34:12.255Z|00035|bridge|INFO|bridge br-tun: added interface vxlan-ac110235 on port 5
2020-07-19T23:34:12.255Z|00036|bridge|INFO|bridge br-tun: added interface vxlan-ac110220 on port 6
2020-07-19T23:34:12.255Z|00037|bridge|INFO|bridge br-tun: added interface vxlan-ac11023e on port 3
2020-07-19T23:34:12.255Z|00038|bridge|INFO|bridge br-int: added interface qvob4b75086-38 on port 71

And then in compute-0/var/log/messages we can see that openvswitch is stopped:

Jul 19 23:32:14 compute-0 systemd[1]: Stopping Open vSwitch...
Jul 19 23:32:14 compute-0 systemd[1]: Stopped Open vSwitch.
Jul 19 23:32:14 compute-0 systemd[1]: Stopping Open vSwitch Forwarding Unit...
Jul 19 23:32:14 compute-0 ovs-ctl[141379]: Exiting ovs-vswitchd (2731) [  OK  ]
Jul 19 23:32:14 compute-0 kernel: device vxlan_sys_4789 left promiscuous mode
Jul 19 23:32:14 compute-0 NetworkManager[2523]: <info> [1595201534.2180] device (vxlan_sys_4789): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Jul 19 23:32:14 compute-0 systemd[1]: Stopped Open vSwitch Forwarding Unit.
Jul 19 23:32:14 compute-0 systemd[1]: Stopping Open vSwitch Database Unit...
Jul 19 23:32:14 compute-0 ovs-ctl[141473]: Exiting ovsdb-server (2600) [  OK  ]
Jul 19 23:32:14 compute-0 systemd[1]: Stopped Open vSwitch Database Unit.

and then started again 2 min later:

Jul 19 23:34:11 compute-0 systemd[1]: Reloading.
Jul 19 23:34:11 compute-0 systemd[1]: Starting Open vSwitch Database Unit...
...
Jul 19 23:34:12 compute-0 systemd[1]: Starting Open vSwitch Forwarding Unit...
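Side note on the TZ confusion from earlier in this comment: converting the epoch timestamps explicitly in UTC avoids the ambiguity altogether, since the systemd/ovs logs are in UTC. A minimal sketch, assuming GNU date; the epoch value 1595201534 is taken from the NetworkManager line above:

```shell
# Convert the epoch timestamp from the NetworkManager log entry
# ([1595201534.2180]) to UTC, so it lines up with the ovs-vswitchd
# and systemd timestamps without any TZ guesswork.
ts=1595201534
date -u -d "@${ts}" '+%Y-%m-%d %H:%M:%S UTC'   # → 2020-07-19 23:32:14 UTC
```

This matches the "Jul 19 23:32:14" systemd lines above, confirming both logs are on the same clock.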
Bottom line, all this happens during the yum update, and the recovery is an explicit task:

2020-07-19 23:34:07 | TASK [Update all packages] *****************************************************
2020-07-19 23:34:07 | Sunday 19 July 2020 23:28:20 +0000 (0:00:00.179) 0:00:36.732 ***********

and we get the service started again when we run this task:

2020-07-19 23:34:38 | TASK [Ensure openvswitch is running after update] ******************************
2020-07-19 23:34:38 | Sunday 19 July 2020 23:34:10 +0000 (0:05:49.926) 0:06:26.658 ***********

I need Networking help to understand why updating packages would stop the service and why we need an explicit restart task to have it working again.

In the template this matches these tasks: https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/tripleo-packages/tripleo-packages-baremetal-puppet.yaml#L582-L596
Openvswitch packaging is known to possibly disrupt the data plane by restarting services, which is why the aforementioned template (starting at around https://github.com/openstack/tripleo-heat-templates/blob/3c49cc8281196829882b1342501c6ba78213a40c/deployment/tripleo-packages/tripleo-packages-baremetal-puppet.yaml#L386) handles openvswitch-related packaging differently to avoid outages on upgrade. Since this seems to be happening on an update, these tasks should be referenced in the update_tasks as well.
> I need Networking help to understand why updating packages would stop the
> service and why we need an explicit restart task to have it working again.

This is the result of openvswitch package versioning: the actual package name is openvswitch2.13 (for example), and there is a wrapper package, rhosp-openvswitch, to handle that. So updating the major version means removing the previous major version package (stopping the service) and installing the newer version (which needs an explicit start to re-enable it). This is why the upgrade has specific code to remove the old version without stopping ovs itself, and to enable the service from the new package.

(hopefully I remember it correctly, Brent feel free to correct if details were wrong)
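For illustration, the mechanism described above can be sketched as a rough command sequence. This is a hedged sketch only: the flags shown are illustrative, and the authoritative logic lives in the tripleo-heat-templates upgrade tasks linked earlier in this bug. It is not meant to be run as-is on a production node.

```shell
# Illustrative sketch of a non-disruptive OVS major-version swap
# (NOT the exact tripleo implementation; see the linked
# tripleo-packages-baremetal-puppet.yaml for the real tasks).

# 1. Erase the old versioned package without running its scriptlets,
#    so the running ovs daemons (and the data plane) are left alone:
rpm -e --nodeps --noscripts openvswitch2.11

# 2. Install the wrapper package, which pulls in the new versioned package:
dnf install -y rhosp-openvswitch

# 3. Nothing has restarted the service yet; explicitly (re)enable and
#    start it from the new package at a controlled moment:
systemctl daemon-reload
systemctl enable --now openvswitch
```

Without the --noscripts erase in step 1, removing openvswitch2.11 runs its preun scriptlet, which stops the service and takes the data plane down until step 3, which is the 2-minute cut seen in this bug.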
(In reply to Bernard Cafarelli from comment #7) > > I need Networking help to understand why updating packages would stop the > > service and why we need an explicit restart task to have it working again. > > This is the result of openvswitch package versioning, actual package name is > openvswitch2.13 (for example), and there is a wrapper package > rhosp-openvswitch to handle that. So updating major version means removing > the previous major version package (stopping the service) and installing the > newer version (need an explicit start to re-enable it). This is why upgrade > has specific code to remove old version without stopping ovs itself, and > enable on new package. > > (hopefully I remember it correctly, Brent feel free to correct if details > were wrong) I think this is correct. While we initially thought that Y versions of OVS would be tied to major versions of OSP, this is not true anymore and we'll keep seeing it so looks like the fix is accounting for it during the update task as Brent pointed out. We faced this in the past and possibly was left unnoticed. For example in OSP 13 we moved from OVS 2.9 to 2.11 at some point (z10 IIRC). Did we experience the same downtime during that update? It looks like the answer is yes but the way we measure the downtime in % of the total job duration possibly hid it. IMO, we must change the downtime SLA from a % of the total job duration to an absolute value in seconds. nit.: I think that the BZ title might be misleading as it only happens on updates that upgrades OVS (which should be rare compared to all possible updates). Is this right Sofer?
(In reply to Daniel Alvarez Sanchez from comment #8)
> IMO, we must change the downtime SLA from a % of the total
> job duration to an absolute value in seconds.

Here is the review for tripleo-upgrade to change from % to seconds[1]. We're aiming at 0 seconds of ping loss and will see how it goes.

[1] https://review.opendev.org/742626

> nit.: I think that the BZ title might be misleading as it only happens on
> updates that upgrades OVS (which should be rare compared to all possible
> updates). Is this right Sofer?

The ping loss is linked to the update of ovs, and this happens when we're coming from 16.0. For a 16.1-to-16.1 update we don't really update anything, so there should be no cut. So maybe what should be pointed out is that:
- this happens when coming from 16.0
- this may happen in osp13 as well: we need to check the jobs there.
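To make the %-vs-seconds argument above concrete, here is a small sketch showing why a percentage threshold hides a multi-minute cut in a long update job (the job duration is a rough figure taken from the timestamps in this bug; the 2-minute cut is the one reported in the description):

```shell
# A 2-minute dataplane cut inside a ~3.5-hour update job is under 1%
# of the job duration, so a %-based SLA would never flag it.
job_seconds=$((3 * 3600 + 30 * 60))   # ~3.5 h update job
cut_seconds=120                        # the 2-minute ovs outage
pct=$(awk -v c="$cut_seconds" -v j="$job_seconds" 'BEGIN { printf "%.2f", 100 * c / j }')
echo "${cut_seconds}s loss = ${pct}% of the job"   # → 120s loss = 0.95% of the job
```

An absolute threshold in seconds catches this immediately, which is what the tripleo-upgrade review above switches to.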
To round up the osp13 question: we have checked it, and the workaround is still there for the osp13 update, so nothing has to be done there.
Hi,

According to the last run it looks like we have a regression and the process failed due to packet loss:

core_puddle: RHOS_TRUNK-16.0-RHEL-8-20200204.n.1
core_puddle: RHOS-16.1-RHEL-8-20200813.n.0

TASK [tripleo-upgrade : stop l3 agent connectivity check] **********************
task path: /home/rhos-ci/jenkins/workspace/DFG-network-neutron-16-to-16.1-from-GA-composable-ipv4/infrared/plugins/tripleo-upgrade/infrared_plugin/roles/tripleo-upgrade/tasks/common/l3_agent_connectivity_check_stop_script.yml:2
Monday 17 August 2020  17:42:07 +0000 (0:28:50.778)       3:26:29.490 *********
fatal: [undercloud-0]: FAILED! => {
    "changed": true,
    "cmd": "source /home/stack/overcloudrc\n /home/stack/l3_agent_stop_ping.sh 0",
    "delta": "0:00:00.116762",
    "end": "2020-08-17 17:42:08.244176",
    "rc": 1,
    "start": "2020-08-17 17:42:08.127414"
}

STDOUT:

11183 packets transmitted, 11176 received, +3 errors, 0.062595% packet loss, time 11818ms
rtt min/avg/max/mdev = 0.613/1.432/150.010/1.570 ms, pipe 4
Ping loss higher than 0 seconds detected (7 seconds)

MSG:

non-zero return code

to retry, use: --limit @/home/rhos-ci/jenkins/workspace/DFG-network-neutron-16-to-16.1-from-GA-composable-ipv4/infrared/plugins/tripleo-upgrade/infrared_plugin/main.retry

PLAY RECAP *********************************************************************
undercloud-0               : ok=93   changed=38   unreachable=0    failed=1
All 7 packets were lost in the same timeframe, putting it here to ease the troubleshooting:

[1597677871.932560] 64 bytes from 10.0.0.243: icmp_seq=2940 ttl=63 time=1.22 ms
[1597677872.934097] 64 bytes from 10.0.0.243: icmp_seq=2941 ttl=63 time=1.29 ms
[1597677881.118841] From 10.0.0.28 icmp_seq=2946 Destination Host Unreachable
[1597677881.118957] From 10.0.0.28 icmp_seq=2947 Destination Host Unreachable
[1597677881.118962] From 10.0.0.28 icmp_seq=2948 Destination Host Unreachable
[1597677881.121360] 64 bytes from 10.0.0.243: icmp_seq=2949 ttl=63 time=2.56 ms
[1597677882.121932] 64 bytes from 10.0.0.243: icmp_seq=2950 ttl=63 time=1.62 ms
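With ping's default 1-second interval, lost packets translate directly into seconds of outage, which is how the 7-second figure in the job failure above is derived. A small sketch of that conversion (the summary line is copied from the job output above):

```shell
# Derive seconds of dataplane loss from a ping summary line,
# assuming the default 1 packet/second interval.
summary="11183 packets transmitted, 11176 received, +3 errors, 0.062595% packet loss, time 11818ms"
sent=$(echo "$summary" | awk '{print $1}')   # transmitted count (field 1)
recv=$(echo "$summary" | awk '{print $4}')   # received count (field 4)
echo "$((sent - recv)) seconds of ping loss"   # → 7 seconds of ping loss
```

Those 7 lost packets match the icmp_seq gap in the excerpt above (2942-2948, with 2946-2948 reported as Destination Host Unreachable).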
OK, we spent some time looking at a job with similar errors; we do see a ~9 second dataplane downtime. On one specific test setup, it also caused connectivity loss to the node, which required a reboot.

It is specific to ML2/OVS deployments and happens when we start the new neutron_ovs_agent container. At that time in ovs-vswitchd.log we get:

2020-08-19T19:36:25.866Z|00976|vconn|WARN|unix#2: version negotiation failed (we support versions 0x04, 0x06, peer supports version 0x01)
2020-08-19T19:36:25.866Z|00977|rconn|WARN|br-int<->unix#2: connection dropped (Protocol error)

followed by:

2020-08-19T19:36:33.969Z|00991|bridge|INFO|bridge br-int: deleted interface int-br-ex on port 1
2020-08-19T19:36:33.973Z|00992|bridge|INFO|bridge br-ex: deleted interface phy-br-ex on port 2
2020-08-19T19:36:33.976Z|00993|bridge|INFO|bridge br-int: deleted interface int-br-isolated on port 2
2020-08-19T19:36:33.979Z|00994|bridge|INFO|bridge br-isolated: deleted interface phy-br-isolated on port 6
[...]
2020-08-19T19:36:39.076Z|01003|bridge|INFO|bridge br-int: added interface int-br-ex on port 346
2020-08-19T19:36:39.081Z|01004|bridge|INFO|bridge br-ex: added interface phy-br-ex on port 3

This is apparently caused by the destroy_patch_ports.py script, which is run on container start. Usually it should not do anything when just restarting a container, as ovs is up and a canary check is performed, but for updates from versions with ovs 2.11 (so 16 to 16.1) this check seems to fail, which causes the patch ports to be recreated.

As we can see from the version errors in the logs, this comes from a workaround needed by ovs 2.11: https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/ansible_plugins/modules/tripleo_ovs_upgrade.py#L162
We set the OpenFlow versions to 1.3 and 1.5 only, while the simple script uses 1.0.
Manually testing with OVS 2.11 shows that after running "ovs-vsctl set bridge br-int protocols=OpenFlow13,OpenFlow15" and restarting neutron_ovs_agent, we see similar logs in ovs-vswitchd.log. After adding OpenFlow10 to the list, we do not see them.

The fix therefore seems to be to also set OpenFlow10 in the tripleo_ovs_upgrade.py workaround. Any potential side effects here? (I think it was not added in the initial workaround as OVN does not need this old version.)
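Based on the manual test above, the proposed change amounts to keeping OpenFlow10 in the bridge's protocol list. As a configuration fragment (to be run on a node with a live OVS; the bridge name matches the logs above):

```shell
# Keep OpenFlow 1.0 alongside 1.3/1.5 so clients that only speak
# OF 1.0 (such as the destroy_patch_ports.py canary check) can
# still negotiate a connection after the upgrade workaround.
ovs-vsctl set bridge br-int protocols=OpenFlow10,OpenFlow13,OpenFlow15

# Confirm what the bridge now advertises:
ovs-vsctl get bridge br-int protocols
```

With OpenFlow10 missing, the canary check fails with the "version negotiation failed (we support versions 0x04, 0x06, peer supports version 0x01)" error quoted above, and the script falls back to recreating the patch ports, causing the downtime.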
Submitted https://review.opendev.org/#/c/747270/ in case this is the correct fix.
The bug is fixed and verified:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-neutron-16-to-16.1-from-GA-composable-ipv4/12/

core_puddle: RHOS-16.1-RHEL-8-20200821.n.0

[stack@undercloud-0 ~]$ rpm -qa | grep tripleo-ansible-0.5.1-0.202
tripleo-ansible-0.5.1-0.20200611113659.34b8fcc.el8ost.noarch
[stack@undercloud-0 ~]$ rpm -qa | grep openstack-tripleo-heat-templates-11.3.2-0.
openstack-tripleo-heat-templates-11.3.2-0.20200616081539.396affd.el8ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3542