Bug 1933625

Summary: Booting VM with a Floating IP and Pinging it fails with OVSDB Error: transaction failed messages
Product: Red Hat OpenStack
Reporter: Asma Syed Hameed <asyedham>
Component: python-networking-ovn
Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED DUPLICATE
QA Contact: Eran Kuris <ekuris>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: apevec, dalvarez, dceara, egarciar, jlibosva, jraju, lhh, majopela, rkhan, scohen
Target Milestone: ---
Keywords: TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-12-22 20:00:05 UTC

Description Asma Syed Hameed 2021-03-01 09:50:36 UTC
Description of problem:
When running Neutron scale tests on OSP 16.1, we see "OVSDB Error: transaction failed" messages in neutron/server.log, and OVN itself ends up dead.

https://gist.github.com/asyedham/ad441ce7ce7c46c2dd0e6d6aec1bd096

The following Rally plugin (create a network and subnet, boot a server with a floating IP, and ping it) was run for a total of 2000 iterations at a concurrency of 16:
https://github.com/openstack/browbeat/blob/master/rally/rally-plugins/netcreate-boot-ping/netcreate_nova-boot-fip-ping.py

Note: Neutron resources persist after the workload has run, whereas Nova is able to delete its resources.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.4 GA (Train)

How reproducible:
100%

Steps to Reproduce:
1. Run the netcreate-boot-ping scenario.
2. Observe the results and the Neutron logs.
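To make step 2 easier to compare across runs, the OVSDB errors in neutron/server.log can be bucketed per minute and lined up against the Grafana snapshot linked in this report. This is a minimal sketch, not part of Neutron or Browbeat: the helper name ovsdb_errors_per_minute is hypothetical, and it assumes that the failing lines contain the literal text "OVSDB Error" and start with an oslo.log-style timestamp such as "2021-02-25 09:47:25.123". Adjust both patterns if the real log format differs.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical helper, not part of Neutron, ovsdbapp or Browbeat.
# Assumption: the failing lines contain the literal text "OVSDB Error".
OVSDB_ERROR = re.compile(r"OVSDB Error")

# Assumption: lines start with an oslo.log-style timestamp such as
# "2021-02-25 09:47:25.123"; only the "YYYY-MM-DD HH:MM" prefix is kept.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2})")


def ovsdb_errors_per_minute(lines: Iterable[str]) -> Counter:
    """Return a Counter mapping minute -> number of OVSDB error lines."""
    per_minute: Counter = Counter()
    for line in lines:
        if not OVSDB_ERROR.search(line):
            continue  # not an OVSDB error line
        match = TIMESTAMP.match(line)
        # Error lines without a recognizable timestamp are grouped under "unknown".
        per_minute[match.group(1) if match else "unknown"] += 1
    return per_minute
```

For example, passing an open file handle (`ovsdb_errors_per_minute(open("server.log", errors="replace"))`) and printing `sorted(counts.items())` gives a per-minute error timeline that can be checked against the point at which OVN stops responding.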


The Rally test and error log is at http://perf1.perf.lab.eng.bos.redhat.com/pub/asyedham/neutron-scale-testing/results/20210225-094725/rally/simple-plugins/netcreate-boot-ping/20210225-094725-browbeat-netcreate-boot-ping-16-1-iteration-0.log

The Neutron, Open vSwitch, and Nova logs captured during the Rally run are available at:
http://perf1.perf.lab.eng.bos.redhat.com/pub/asyedham/neutron-scale-testing/neutron-scale-test/

Grafana snapshot: https://snapshot.raintank.io/dashboard/snapshot/VaU3D8L7o5HPnCYkim0iXrNblkRYJlZT

Additional info:
ovn2.13 scratch build based on ovn2.13-20.12.0-19 with logical datapath groups enabled by default:

http://brew-task-repos.usersys.redhat.com/repos/scratch/dceara/ovn2.13/20.12.0/20pvt_dp_groups.el8fdp/


We also increased the timeouts from 300 s to 3000 s:
    nova_server_boot_timeout = 3000.0
    vm_ping_timeout = 3000.0