Bug 1969397

Summary: OVN bug causing subports to stay DOWN fails installations
Product: OpenShift Container Platform Reporter: Michał Dulko <mdulko>
Component: Networking Assignee: Michał Dulko <mdulko>
Networking sub component: kuryr QA Contact: Itzik Brown <itbrown>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: itbrown
Version: 4.8   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1972631 Environment:
Last Closed: 2021-07-27 23:12:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1972631    

Description Michał Dulko 2021-06-08 11:13:49 UTC
Description of problem:
Due to an OVN bug [1], OCP installations with Kuryr fail: some pods are unable to start because their subports incorrectly stay in DOWN status after being plugged into a trunk port.

Detaching the subport from the trunk and attaching it again normally helps, and Kuryr should be able to apply that workaround when needed.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1937851
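
The detach-and-reattach workaround described above can be sketched roughly as follows. This is an illustration only, not the code that went into Kuryr: it assumes a `network` object that behaves like the openstacksdk network proxy (`get_port`, `delete_trunk_subports`, `add_trunk_subports`), and the helper names, `retries`, and `wait` values are hypothetical.

```python
import time


def reattach_subport(network, trunk_id, port_id, vlan_id):
    """Detach a subport from its trunk and attach it again with the same VLAN ID."""
    # `network` is assumed to behave like openstacksdk's `conn.network` proxy.
    network.delete_trunk_subports(trunk_id, [{"port_id": port_id}])
    network.add_trunk_subports(
        trunk_id,
        [{
            "port_id": port_id,
            "segmentation_type": "vlan",
            "segmentation_id": vlan_id,
        }],
    )


def ensure_subport_active(network, trunk_id, port_id, vlan_id,
                          retries=3, wait=5.0, sleep=time.sleep):
    """Return True once the port reports ACTIVE, reattaching it while it is DOWN."""
    # `retries` and `wait` are illustrative defaults, not values from this bug.
    for _ in range(retries):
        if network.get_port(port_id).status == "ACTIVE":
            return True
        reattach_subport(network, trunk_id, port_id, vlan_id)
        sleep(wait)
    # Final check after the last reattach attempt.
    return network.get_port(port_id).status == "ACTIVE"
```

The subport is reattached with its original VLAN ID so that the pod's VLAN interface inside the VM keeps matching the trunk configuration.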

Version-Release number of selected component (if applicable):


How reproducible:
~50% of cases?

Steps to Reproduce:
1. Run OCP installation with Kuryr on OSP 16.1 with OVN.

Actual results:
Some pods will randomly fail to start and will stay in ContainerCreating. kuryr-controller will keep getting restarted. The port associated with the problematic pod will be in DOWN status.

Expected results:
Everything works smoothly and all subports associated with the pods are in ACTIVE status.

Comment 1 Michał Dulko 2021-06-08 14:56:13 UTC
I'm marking this as a blocker; it's affecting many Kuryr installations on OSP 16.1 with OVN. The code of the workaround seems to be done.

Comment 4 Itzik Brown 2021-06-17 14:53:55 UTC
Verified on 4.8.0-0.nightly-2021-06-16-190035

Seen in the kuryr-controller log:
2021-06-17 14:33:47.964 1 WARNING kuryr_kubernetes.controller.drivers.nested_vlan_vif [-] Subport ed214b6b-cad7-4be0-a3c6-df7324e00317 is in DOWN status for more than 137 seconds. This is a Neutron issue. Attempting to reattach the subport to trunk 77da3289-5191-4504-9bb3-cb7390fcc50e using VLAN ID 1682 to fix it.: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: VIFVlanNested(active=False,address=fa:16:3e:87:89:cf,has_traffic_filtering=False,id=ed214b6b-cad7-4be0-a3c6-df7324e00317,network=Network(f18aae10-8e51-498d-a01c-02daa71e4f84),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='taped214b6b-ca',vlan_id=1682)
2021-06-17 14:33:49.167 1 WARNING kuryr_kubernetes.controller.drivers.nested_vlan_vif [-] Reattached subport ed214b6b-cad7-4be0-a3c6-df7324e00317, its state will be rechecked when event will be retried.: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: VIFVlanNested(active=False,address=fa:16:3e:87:89:cf,has_traffic_filtering=False,id=ed214b6b-cad7-4be0-a3c6-df7324e00317,network=Network(f18aae10-8e51-498d-a01c-02daa71e4f84),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='taped214b6b-ca',vlan_id=1682)
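
When checking a longer controller log for this workaround, the subport, trunk, and VLAN ID can be pulled out of the first warning above with a small script. This is a minimal sketch that assumes the message wording stays exactly as in the log excerpt; the function name and the returned dictionary keys are hypothetical.

```python
import re

# Pattern based on the WARNING text quoted in this comment; it will not
# match if the message wording differs in another Kuryr version.
SUBPORT_DOWN = re.compile(
    r"Subport (?P<subport>[0-9a-f-]{36}) is in DOWN status for more than "
    r"(?P<seconds>\d+) seconds\..*?"
    r"reattach the subport to trunk (?P<trunk>[0-9a-f-]{36}) "
    r"using VLAN ID (?P<vlan>\d+)"
)


def parse_subport_warning(line):
    """Return the IDs from a 'Subport ... is in DOWN status' warning, or None."""
    match = SUBPORT_DOWN.search(line)
    if match is None:
        return None
    return {
        "subport": match["subport"],
        "trunk": match["trunk"],
        "vlan_id": int(match["vlan"]),
        "down_seconds": int(match["seconds"]),
    }
```

The extracted subport ID can then be used to confirm the port status in Neutron.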

Installation finished successfully and the port is ACTIVE:

$ openstack port list | grep ed214b6b-cad7-4be0-a3c6-df7324e00317
| ed214b6b-cad7-4be0-a3c6-df7324e00317 |                                                                  | fa:16:3e:87:89:cf | ip_address='10.128.85.86', subnet_id='2c4ed61f-29db-45f5-aa25-c3804ba68884'   | ACTIVE |

Comment 6 errata-xmlrpc 2021-07-27 23:12:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438