Bug 1969397 - OVN bug causing subports to stay DOWN fails installations
Summary: OVN bug causing subports to stay DOWN fails installations
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Michał Dulko
QA Contact: Itzik Brown
URL:
Whiteboard:
Depends On:
Blocks: 1972631
Reported: 2021-06-08 11:13 UTC by Michał Dulko
Modified: 2021-07-27 23:12 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1972631
Environment:
Last Closed: 2021-07-27 23:12:04 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kuryr-kubernetes pull 521 | 0 | None | open | Bug 1969397: Workaround OVN bug causing subports to be DOWN | 2021-06-09 10:31:40 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:12:17 UTC

Description Michał Dulko 2021-06-08 11:13:49 UTC
Description of problem:
Due to an OVN bug [1], OCP installations with Kuryr fail: some pods are unable to start because their subports incorrectly stay in DOWN status after they're plugged into a trunk port.

Detaching the subport from the trunk and attaching it again normally helps, so Kuryr should be able to apply that workaround when needed.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1937851
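The detach-and-reattach workaround could be sketched roughly as below. This is only an illustration under assumptions, not the code from the Kuryr fix: `reattach_if_stuck`, its parameters, and the 120-second default are hypothetical, and `network` is assumed to be an object shaped like the openstacksdk network proxy (`get_port`, `delete_trunk_subports`, `add_trunk_subports`).

```python
import time


def reattach_if_stuck(network, trunk_id, port_id, vlan_id, down_since,
                      timeout=120.0, now=None):
    """Reattach a subport that has stayed non-ACTIVE longer than `timeout`.

    Hypothetical helper. `network` is assumed to behave like the
    openstacksdk network proxy (e.g. `openstack.connect().network`).
    `down_since` is the timestamp at which the subport was first seen
    as not ACTIVE. Returns True if a reattach was attempted.
    """
    now = time.time() if now is None else now

    port = network.get_port(port_id)
    if port.status == "ACTIVE":
        return False  # nothing to fix
    if now - down_since <= timeout:
        return False  # give Neutron more time before intervening

    # Detach the subport from the trunk ...
    network.delete_trunk_subports(trunk_id, [{"port_id": port_id}])
    # ... and attach it again with the same VLAN ID.
    network.add_trunk_subports(trunk_id, [{
        "port_id": port_id,
        "segmentation_type": "vlan",
        "segmentation_id": vlan_id,
    }])
    return True
```

After a reattach, the caller would be expected to recheck the port status on a later retry instead of assuming the subport went ACTIVE immediately.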

Version-Release number of selected component (if applicable):


How reproducible:
~50% of cases?

Steps to Reproduce:
1. Run OCP installation with Kuryr on OSP 16.1 with OVN.

Actual results:
Some pods randomly fail to start and remain in ContainerCreating. kuryr-controller keeps getting restarted. The port associated with the problematic pod is in DOWN status.

Expected results:
Everything works smoothly and all subports associated with the pods are in ACTIVE state.

Comment 1 Michał Dulko 2021-06-08 14:56:13 UTC
I'm marking this as a blocker; it's affecting many Kuryr installations with OSP 16.1 and OVN. The code of the workaround seems to be done.

Comment 4 Itzik Brown 2021-06-17 14:53:55 UTC
Verified on 4.8.0-0.nightly-2021-06-16-190035

Seen in the kuryr-controller log:
2021-06-17 14:33:47.964 1 WARNING kuryr_kubernetes.controller.drivers.nested_vlan_vif [-] Subport ed214b6b-cad7-4be0-a3c6-df7324e00317 is in DOWN status for more than 137 seconds. This is a Neutron issue. Attempting to reattach the subport to trunk 77da3289-5191-4504-9bb3-cb7390fcc50e using VLAN ID 1682 to fix it.: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: VIFVlanNested(active=False,address=fa:16:3e:87:89:cf,has_traffic_filtering=False,id=ed214b6b-cad7-4be0-a3c6-df7324e00317,network=Network(f18aae10-8e51-498d-a01c-02daa71e4f84),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='taped214b6b-ca',vlan_id=1682)
2021-06-17 14:33:49.167 1 WARNING kuryr_kubernetes.controller.drivers.nested_vlan_vif [-] Reattached subport ed214b6b-cad7-4be0-a3c6-df7324e00317, its state will be rechecked when event will be retried.: kuryr_kubernetes.exceptions.ResourceNotReady: Resource not ready: VIFVlanNested(active=False,address=fa:16:3e:87:89:cf,has_traffic_filtering=False,id=ed214b6b-cad7-4be0-a3c6-df7324e00317,network=Network(f18aae10-8e51-498d-a01c-02daa71e4f84),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='taped214b6b-ca',vlan_id=1682)

Installation finished successfully and the port is ACTIVE:

$ openstack port list |grep ed214b6b-cad7-4be0-a3c6-df7324e00317
 
| ed214b6b-cad7-4be0-a3c6-df7324e00317 |                                                                  | fa:16:3e:87:89:cf | ip_address='10.128.85.86', subnet_id='2c4ed61f-29db-45f5-aa25-c3804ba68884'   | ACTIVE |

Comment 6 errata-xmlrpc 2021-07-27 23:12:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

