Bug 2002010 - ovn-kube may never attempt to retry a pod creation
Summary: ovn-kube may never attempt to retry a pod creation
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Tim Rozet
QA Contact: Anurag saxena
Blocks: 2005462
Reported: 2021-09-07 17:19 UTC by Tim Rozet
Modified: 2022-03-10 16:08 UTC
CC List: 2 users

Clones: 2005462
Last Closed: 2022-03-10 16:08:18 UTC


Links
GitHub openshift/ovn-kubernetes pull 739 (status: open; last updated 2021-09-13 23:20:59 UTC): Bug 2002010: Fixes skipping pods accidentally in retry
GitHub ovn-org/ovn-kubernetes pull 2472 (last updated 2021-09-07 17:21:03 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:08:42 UTC)

Description Tim Rozet 2021-09-07 17:19:56 UTC
Description of problem:
In ovnkube-master there is a cache of pods whose corresponding OVN logical port creation needs to be retried. If the initial pod add fails in ovnkube-master, say because the pod has not been scheduled yet, we add the pod to the cache. However, subsequent retries never happen, because we always check whether the pod is scheduled using the version of the pod stored in the cache rather than the latest version of the pod.

As a result, the pod never gets a logical switch port in OVN and never comes up.
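To make the failure mode concrete, here is a minimal Go sketch of a retry cache that exhibits it. Everything in it (the `Pod` struct, `retryCache`, `addLogicalPort`, `retryStale`, `retryLatest`, and the `getLatest` lookup) is hypothetical and far simpler than the actual ovnkube-master code. It assumes, as the description implies, that the fix amounts to re-reading the latest version of the pod before the "is it scheduled?" check; it is not taken from the linked pull requests.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes Pod.
type Pod struct {
	Namespace string
	Name      string
	NodeName  string // empty until the scheduler binds the pod to a node
}

func key(p *Pod) string { return p.Namespace + "/" + p.Name }

// retryCache is a hypothetical cache of pods whose logical port add failed.
type retryCache map[string]*Pod

// addLogicalPort is a hypothetical stand-in for creating the pod's OVN
// logical switch port; it cannot succeed for an unscheduled pod.
func addLogicalPort(p *Pod) error {
	if p.NodeName == "" {
		return fmt.Errorf("pod %s is not scheduled yet", key(p))
	}
	return nil
}

// retryStale mirrors the buggy behavior: the scheduled check uses the copy
// stored in the cache, captured before scheduling, so it never changes.
func retryStale(cache retryCache) {
	for k, cached := range cache {
		if cached.NodeName == "" {
			continue // always true for the stale copy: skipped forever
		}
		if addLogicalPort(cached) == nil {
			delete(cache, k)
		}
	}
}

// retryLatest sketches the assumed fix: fetch the latest pod (getLatest
// stands in for an informer lister) and refresh the cache entry first.
func retryLatest(cache retryCache, getLatest func(k string) (*Pod, bool)) {
	for k := range cache {
		latest, ok := getLatest(k)
		if !ok {
			delete(cache, k) // pod was deleted; nothing left to retry
			continue
		}
		cache[k] = latest
		if latest.NodeName == "" {
			continue // still unscheduled; try again on the next pass
		}
		if addLogicalPort(latest) == nil {
			delete(cache, k)
		}
	}
}

func main() {
	// Step 1: the pod is created while it cannot be scheduled; the add fails.
	pod := &Pod{Namespace: "default", Name: "web"}
	apiServer := map[string]*Pod{key(pod): pod} // hypothetical "latest" view
	getLatest := func(k string) (*Pod, bool) {
		p, ok := apiServer[k]
		return p, ok
	}

	stale, fixed := retryCache{}, retryCache{}
	if err := addLogicalPort(pod); err != nil {
		s1, s2 := *pod, *pod // each cache keeps its own snapshot
		stale[key(pod)] = &s1
		fixed[key(pod)] = &s2
	}

	// Step 2: the pod is scheduled; a new version of the object appears.
	apiServer[key(pod)] = &Pod{Namespace: "default", Name: "web", NodeName: "worker-0"}

	// Step 3: the periodic retry runs.
	retryStale(stale)
	retryLatest(fixed, getLatest)
	fmt.Println("stale cache still pending:", len(stale))
	fmt.Println("fixed cache still pending:", len(fixed))
}
```

Running it prints `stale cache still pending: 1` followed by `fixed cache still pending: 0`: the check against the cached copy keeps skipping the pod, while the check against the latest version sees the assigned `NodeName` and lets the port be created.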



Steps to Reproduce:
1. Create a pod that cannot be scheduled (for example, by marking the workers as not ready). ovnkube-master still receives the pod add event and fails to create the pod's logical switch port.
2. Remove the taint and allow the pod to be scheduled.
3. Check whether the pod comes up, or whether it stays stuck indefinitely until ovnkube-master is restarted.

Comment 5 errata-xmlrpc 2022-03-10 16:08:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update) and on where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

