Bug 2069668

Summary: [PerfCI][ovn][16.2] dynamic-workload job fails randomly while pinging test vms
Product: Red Hat OpenStack
Reporter: Yatin Karel <ykarel>
Component: python-networking-ovn
Assignee: Arnau Verdaguer <averdagu>
Status: CLOSED CURRENTRELEASE
QA Contact: Bharath M V <bmv>
Severity: high
Docs Contact:
Priority: high
Version: 16.2 (Train)
CC: amusil, apevec, dalvarez, egarciar, lhh, majopela, mburns, scohen
Target Milestone: z3
Keywords: Regression, Reopened, TestOnly, Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2069711 2069714
Environment:
Last Closed: 2024-03-28 13:40:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2069714, 2069783    
Bug Blocks:    

Description Yatin Karel 2022-03-29 12:40:07 UTC
Description of problem:
A Browbeat scenario has been failing consistently for some time in the job DFG-perfscale-PerfCI-OSP16.2-dynamic-workloads-ovn.

Version-Release number of selected component (if applicable):
ovn-2021-21.12.0-11.el8fdp.x86_64

How reproducible:
Always in https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/perfscale/view/PerfCI/job/DFG-perfscale-PerfCI-OSP16.2-dynamic-workloads-ovn/ at least since 1st March; before that the job was hitting a different failure. The last successful run of the job was on 30th Jan, 2022.

This job runs 5 iterations of Browbeat's dynamic_workload_min[1] scenario. In an iteration, a ping to a VM fails and that iteration aborts; the next iteration then starts, pings to some VMs succeed, and then one fails again.

The equivalent job passes on 16.1, so this looks like a regression in 16.2.


Steps to Reproduce:
1. Can be reproduced with https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/perfscale/view/PerfCI/job/DFG-perfscale-PerfCI-custom/65/parameters/

Actual results:
Pings to test VMs fail randomly.


Expected results:
All scenarios should succeed.


Additional info:


[1] https://opendev.org/x/browbeat/src/branch/master/rally/rally-plugins/dynamic-workloads/dynamic_workload_min.py#L70-L73

Comment 1 Daniel Alvarez Sanchez 2022-03-29 12:52:51 UTC

*** This bug has been marked as a duplicate of bug 2066413 ***

Comment 2 Daniel Alvarez Sanchez 2022-03-29 13:03:05 UTC
What we saw is that, when pinging a FIP (non-DVR) from an external host, the ICMP reply packets coming out of the VM were dropped in the integration bridge of the compute node.
Triggering a recompute on ovn-controller fixed the issue.
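
For reference, a minimal sketch of how such a recompute could be triggered on a compute node. It assumes the `recompute` command that ovn-controller exposes through ovn-appctl is available in this OVN build, and that ovn-controller runs in a container reachable with `podman exec`. The container name `ovn_controller` and the helper names are illustrative assumptions, not taken from this bug. It defaults to a dry run, since a full recompute is CPU intensive.

```python
import shlex
import subprocess
from typing import List, Optional


def recompute_command(container: Optional[str] = None,
                      runtime: str = "podman") -> List[str]:
    """Build the argv asking ovn-controller for a full recompute.

    `container` and `runtime` are assumptions about the deployment;
    pass container=None when ovn-controller runs directly on the host.
    """
    cmd = ["ovn-appctl", "-t", "ovn-controller", "recompute"]
    if container:
        # Run the command inside the (assumed) ovn-controller container.
        cmd = [runtime, "exec", container] + cmd
    return cmd


def trigger_recompute(container: Optional[str] = None,
                      dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    cmd = recompute_command(container)
    if not dry_run:
        # Raises CalledProcessError if the command exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run: only print what would be executed.
    print(trigger_recompute(container="ovn_controller"))
```

Calling `trigger_recompute(container="ovn_controller", dry_run=False)` on the affected compute node would actually run the command. This is only a workaround to confirm the symptom, not a fix.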

Yatin, can you please upload the OVN databases to the BZ so that the core OVN team can try to reproduce it?

Comment 9 Ales Musil 2022-04-21 13:19:56 UTC

*** This bug has been marked as a duplicate of bug 2069783 ***

Comment 13 Elvira 2022-09-07 13:55:01 UTC
ovn-2021-21.12.0-82.el8fdp.x86_64 is now available in puddle RHOS-16.2-RHEL-8-20220902.n.1. Moving to MODIFIED.