Bug 2017532

Summary: [ovn-controller] OVN controller is not binding ports
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Surya Seetharaman <surya>
Component: OVN
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED WONTFIX
QA Contact: Jianlin Shi <jishi>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: FDP 21.C
CC: ctrautma, jiji, jlema, mmichels
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-07 14:21:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Surya Seetharaman 2021-10-26 18:21:31 UTC
Description of problem:

On a 500-node scale cluster, we saw one worker node where pods were not getting created. They were failing with the error:


failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:81:10:07 [10.129.16.7/23]


We could see the logical switch port (LSP) being created in the northbound database (nbdb):


sh-4.4# ovn-nbctl find logical-switch-port name=default_client-on-ovn-worker3
_uuid               : b8d7381c-391e-4adb-b69d-2b1b5584a970
addresses           : ["0a:58:0a:81:10:30 10.129.16.48"]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : []
external_ids        : {namespace=default, pod="true"}
ha_chassis_group    : []
name                : default_client-on-ovn-worker3
options             : {iface-id-ver="62170fe5-07bf-486d-a851-2d1938dd32f2", requested-chassis=worker007-fc640}                                                              
parent_name         : []
port_security       : ["0a:58:0a:81:10:30 10.129.16.48"]
tag                 : []
tag_request         : []
type                : ""
up                  : false
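
The field that matters in this record is `up : false`. As far as I understand it, this column only becomes true once a chassis has bound the port, so it is a quick way to confirm the symptom from the NB side. Below is a minimal Python sketch for checking it from saved `ovn-nbctl` output. The helper names `parse_record` and `is_port_up` are hypothetical, and the sample is an excerpt of the record above.

```python
def parse_record(text):
    """Turn one 'column : value' record printed by ovn-nbctl/ovs-vsctl into a dict."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as MAC addresses contain colons.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


def is_port_up(record):
    """True only when the 'up' column reads 'true'."""
    return record.get("up") == "true"


# Excerpt of the logical switch port record from this report.
SAMPLE = """\
name                : default_client-on-ovn-worker3
addresses           : ["0a:58:0a:81:10:30 10.129.16.48"]
up                  : false
"""

record = parse_record(SAMPLE)
print(record["name"], "up" if is_port_up(record) else "not up")
```

The parser handles a single record; output containing several records separated by blank lines would need to be split first.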

The OVS interface was also being created (and recreated after every CNI_ADD failure, because OVN was not binding the port):

sh-4.4# ovs-vsctl list interface | grep default_client-on-ovn-worker3 -C20                                                                                                   
ofport_request      : []                                                                                                                                                     
options             : {csum="true", key=flow, remote_ip="192.168.217.57"}                                                                                                    
other_config        : {}                                                                                                                                                     
statistics          : {rx_bytes=13319, rx_packets=141, tx_bytes=13415, tx_packets=141}                                                                                       
status              : {tunnel_egress_iface=br-ex, tunnel_egress_iface_carrier=up}                                                                                            
type                : geneve                                                                                                                                                 
                                                                                                                                                                             
_uuid               : a2592a95-6667-45d8-9bf9-0d4f5b91b384                                                                                                                   
admin_state         : up                                                                                                                                                     
bfd                 : {}                                                                                                                                                     
bfd_status          : {}  
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : full
error               : []
external_ids        : {attached_mac="0a:58:0a:81:10:30", iface-id=default_client-on-ovn-worker3, iface-id-ver="62170fe5-07bf-486d-a851-2d1938dd32f2", ip_addresses="10.129.16.48/23", sandbox="2670dcaf7ea8444df4feb02226a8f2a695b33b15cf796cbcac84656c0fd3751b"}
ifindex             : 11573
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : 10000000000
link_state          : up
lldp                : {}
mac                 : []
mac_in_use          : "ce:85:47:39:f2:c7"
mtu                 : 1400
mtu_request         : []
name                : "2670dcaf7ea8444"
ofport              : 11932
ofport_request      : []
options             : {}
other_config        : {}
statistics          : {collisions=0, rx_bytes=516, rx_crc_err=0, rx_dropped=1, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=6, tx_bytes=836, tx_dropped=0, tx_errors=0, tx_packets=8}
status              : {driver_name=veth, driver_version="1.0", firmware_version=""}
type                : ""
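
To cross-check the two records, here is a minimal Python sketch of the conditions I would expect to hold before the port counts as bound. It assumes (from my reading of the OVN documentation, not from anything verified on this cluster) that the interface's `external_ids:iface-id` must equal the LSP name, that `iface-id-ver` must match when the LSP sets it, and that ovn-controller marks the interface with `ovn-installed` once it has claimed the port, which is the key named in the timeout error. The function names and problem strings are hypothetical.

```python
import re

# key=value pairs inside an OVSDB map; values are either quoted or bare.
_PAIR = re.compile(r'([\w.-]+)=("(?:[^"\\]|\\.)*"|[^,}]*)')


def parse_ovsdb_map(text):
    """Parse the {key=value, ...} syntax printed for OVSDB map columns."""
    return {key: value.strip().strip('"') for key, value in _PAIR.findall(text)}


def binding_problems(lsp_name, lsp_options, iface_external_ids):
    """Return reasons (hypothetical wording) why the port does not look bound."""
    problems = []
    if iface_external_ids.get("iface-id") != lsp_name:
        problems.append("iface-id does not match the LSP name")
    wanted_ver = lsp_options.get("iface-id-ver")
    if wanted_ver is not None and iface_external_ids.get("iface-id-ver") != wanted_ver:
        problems.append("iface-id-ver does not match the LSP options")
    if iface_external_ids.get("ovn-installed") != "true":
        problems.append("ovn-installed is not set on the interface")
    return problems


# Values copied from the two records in this report.
lsp_options = parse_ovsdb_map(
    '{iface-id-ver="62170fe5-07bf-486d-a851-2d1938dd32f2", '
    'requested-chassis=worker007-fc640}'
)
external_ids = parse_ovsdb_map(
    '{attached_mac="0a:58:0a:81:10:30", iface-id=default_client-on-ovn-worker3, '
    'iface-id-ver="62170fe5-07bf-486d-a851-2d1938dd32f2", '
    'ip_addresses="10.129.16.48/23", '
    'sandbox="2670dcaf7ea8444df4feb02226a8f2a695b33b15cf796cbcac84656c0fd3751b"}'
)

for problem in binding_problems("default_client-on-ovn-worker3", lsp_options, external_ids):
    print(problem)
```

With the values from this report, the `iface-id` and `iface-id-ver` checks pass and only the missing `ovn-installed` key is reported, which suggests the CNI side set up the interface as expected and the claim on the ovn-controller side never happened.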




Version-Release number of selected component (if applicable):
OCP version: 4.10.0-0.nightly-2021-10-21-105053
ARG ovsver=2.16.0-15.el8fdp
ARG ovnver=21.09.0-20.el8fdp

ovn-controller simply was not claiming any OVS interfaces, so pod creation was timing out.

How reproducible: not always


Steps to Reproduce:
1. N/A, unfortunately. Out of the 500 nodes, only one ovn-controller showed this issue.