Bug 2058672 - whereabouts IPAM CNI ip-reconciler cronjob specification requires hostnetwork, api-int lb usage & proper backoff
Summary: whereabouts IPAM CNI ip-reconciler cronjob specification requires hostnetwork...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.z
Assignee: Douglas Smith
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On: 2058671
Blocks: 2058673
Reported: 2022-02-25 15:17 UTC by Douglas Smith
Modified: 2022-04-12 08:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2058671
Clones: 2058673
Environment:
Last Closed: 2022-04-12 08:10:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 1322 0 None open Bug 2058672: ip-reconciler cronjob specification requires hostnetwork, api-int lb usage & proper backoff [backport 4.10] 2022-03-16 05:49:14 UTC
Red Hat Product Errata RHBA-2022:1241 0 None None None 2022-04-12 08:11:01 UTC

Description Douglas Smith 2022-02-25 15:17:06 UTC
+++ This bug was initially created as a clone of Bug #2058671 +++

Description of problem: A number of changes related to the ip-reconciler need to be properly implemented; they are listed under the fixes below.

Impact: Without the proper backoff and replacement policies, many failed jobs can build up. Additionally, without host networking and use of the api-int load balancer, network connectivity problems can cause errors.

Note: A set of changes to the ip-reconciler itself

Fixes to include in this (and subsequent backports) include:

* Auto-clean failed jobs (https://github.com/openshift/cluster-network-operator/pull/1318)
* Use host network and api-int (https://github.com/openshift/cluster-network-operator/pull/1302)
* Disable retries on failure (https://github.com/openshift/cluster-network-operator/pull/1290)
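
As a rough sketch of where these three fixes would show up on a running cluster, the fields named in the verification command in Comment 4 can be read one at a time with JSONPath queries. The helper name show_reconciler_fields is hypothetical, and the field paths are assumed from the standard Kubernetes CronJob schema rather than taken from the linked pull requests.

```shell
#!/bin/sh
# Sketch only: show_reconciler_fields is a hypothetical helper, not part of oc.
# Field paths are assumed from the standard CronJob schema.
show_reconciler_fields() {
    ns=openshift-multus
    for path in \
        '{.spec.failedJobsHistoryLimit}' \
        '{.spec.jobTemplate.spec.backoffLimit}' \
        '{.spec.jobTemplate.spec.template.spec.hostNetwork}' \
        '{.spec.jobTemplate.spec.template.spec.containers[*].env[*].name}'
    do
        # Print the query, then the value the cluster returns for it.
        printf '%s = ' "$path"
        oc get cronjob ip-reconciler -n "$ns" -o "jsonpath=$path"
        printf '\n'
    done
}
```

Calling show_reconciler_fields while logged in to a cluster prints one line per field, which may be easier to read than grepping the full YAML.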

Comment 4 Douglas Smith 2022-04-05 16:46:45 UTC
To verify:

run: oc get cronjob ip-reconciler -o yaml -n openshift-multus | grep -Pi "KUBERNETES_SERVICE_PORT|KUBERNETES_SERVICE_HOST|failedJobsHistoryLimit|backoffLimit|hostNetwork"

which should result in:

  failedJobsHistoryLimit: 1
      backoffLimit: 0
            - name: KUBERNETES_SERVICE_PORT
            - name: KUBERNETES_SERVICE_HOST
          hostNetwork: true
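
The grep above only prints matching lines, so a wrong value (for example a non-zero backoffLimit) still has to be spotted by eye. A minimal sketch of a stricter check follows; check_reconciler_spec is a hypothetical helper that reads the CronJob YAML on standard input and fails if any of the five expected lines is absent.

```shell
#!/bin/sh
# Sketch only: check_reconciler_spec is a hypothetical helper, not part of oc
# or whereabouts. It reads a CronJob manifest from stdin and returns non-zero
# if any expected line (with its expected value) is missing.
check_reconciler_spec() {
    manifest=$(cat)
    status=0
    for pattern in \
        'failedJobsHistoryLimit: 1' \
        'backoffLimit: 0' \
        'name: KUBERNETES_SERVICE_PORT' \
        'name: KUBERNETES_SERVICE_HOST' \
        'hostNetwork: true'
    do
        # Anchor at end of line so "backoffLimit: 0" does not match "backoffLimit: 06".
        if ! printf '%s\n' "$manifest" | grep -Eq -- "${pattern}[[:space:]]*\$"; then
            echo "missing: $pattern" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage, assuming a logged-in oc session: oc get cronjob ip-reconciler -o yaml -n openshift-multus | check_reconciler_spec && echo "ip-reconciler spec OK"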

Thank you!

Comment 5 Weibin Liang 2022-04-05 17:59:51 UTC
[weliang@weliang ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-04-05-063640   True        False         3h53m   Cluster version is 4.10.0-0.nightly-2022-04-05-063640
[weliang@weliang ~]$ oc get cronjob ip-reconciler -o yaml -n openshift-multus | grep -Pi "KUBERNETES_SERVICE_PORT|KUBERNETES_SERVICE_HOST|failedJobsHistoryLimit|backoffLimit|hostNetwork"
  failedJobsHistoryLimit: 1
      backoffLimit: 0
            - name: KUBERNETES_SERVICE_PORT
            - name: KUBERNETES_SERVICE_HOST
          hostNetwork: true
[weliang@weliang ~]$

Comment 8 errata-xmlrpc 2022-04-12 08:10:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.9 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1241

