
Bug 2058675

Summary:	whereabouts IPAM CNI ip-reconciler cronjob specification requires hostnetwork, api-int lb usage & proper backoff
Product:	OpenShift Container Platform
Component:	Networking
Networking sub component:	multus
Reporter:	Douglas Smith <dosmith>
Assignee:	Douglas Smith <dosmith>
QA Contact:	Weibin Liang <weliang>
Status:	CLOSED ERRATA
Severity:	high
Priority:	high
CC:	weliang, wking
Version:	4.10
Target Release:	4.7.z
Hardware:	Unspecified
OS:	Unspecified
Doc Type:	If docs needed, set a value
Clone Of:	2058674
Clones:	2058677
Last Closed:	2022-06-10 05:37:32 UTC
Bug Depends On:	2058674
Bug Blocks:	2058677

Description Douglas Smith 2022-02-25 15:21:06 UTC

+++ This bug was initially created as a clone of Bug #2058674 +++

+++ This bug was initially created as a clone of Bug #2058673 +++

+++ This bug was initially created as a clone of Bug #2058672 +++

+++ This bug was initially created as a clone of Bug #2058671 +++

Description of problem: A number of changes related to the ip-reconciler need to be properly implemented. They are listed under the fixes below.

Impact: Without the proper backoff and replacement policies, many failed jobs can build up. Additionally, without host networking and use of the api-int load balancer, network connectivity problems can occur and cause errors.

Note: A set of changes to the ip-reconciler itself

Fixes to include in this bug (and subsequent backports):

* Auto-clean failed jobs (https://github.com/openshift/cluster-network-operator/pull/1318)
* Use host network and api-int (https://github.com/openshift/cluster-network-operator/pull/1302)
* Disable retries on failure (https://github.com/openshift/cluster-network-operator/pull/1290)

Comment 3 Weibin Liang 2022-05-25 19:23:35 UTC

Tested and verified in 4.7.0-0.nightly-2022-05-25-155733

[weliang@weliang ~]$ oc get cronjob ip-reconciler -o yaml -n openshift-multus | grep -Pi "KUBERNETES_SERVICE_PORT|KUBERNETES_SERVICE_HOST|failedJobsHistoryLimit|backoffLimit|hostNetwork"
  failedJobsHistoryLimit: 1
      backoffLimit: 0
            - name: KUBERNETES_SERVICE_PORT
            - name: KUBERNETES_SERVICE_HOST
          hostNetwork: true
[weliang@weliang ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2022-05-25-155733   True        False         7m17s   Cluster version is 4.7.0-0.nightly-2022-05-25-155733
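
The grep check above can also be expressed as a structured check against the JSON form of the CronJob. The following is a minimal sketch and not part of the original verification: the function names check_ip_reconciler and check_from_json are hypothetical, and the expected values (failedJobsHistoryLimit 1, backoffLimit 0, hostNetwork true, and the two KUBERNETES_SERVICE_* variables) are taken from the output above. It only checks that the environment variables are present, since their values are not shown in this bug.

```python
import json
from typing import List

# Expected settings, taken from the verification output in this comment.
EXPECTED_FAILED_JOBS_HISTORY_LIMIT = 1
EXPECTED_BACKOFF_LIMIT = 0
REQUIRED_ENV = {"KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT"}


def check_ip_reconciler(cronjob: dict) -> List[str]:
    """Return a list of problems found; an empty list means all checks pass."""
    problems = []
    spec = cronjob.get("spec") or {}
    job_spec = (spec.get("jobTemplate") or {}).get("spec") or {}
    pod_spec = (job_spec.get("template") or {}).get("spec") or {}

    # Auto-clean failed jobs: keep only a bounded history of failures.
    if spec.get("failedJobsHistoryLimit") != EXPECTED_FAILED_JOBS_HISTORY_LIMIT:
        problems.append("failedJobsHistoryLimit is not 1")

    # Disable retries on failure.
    if job_spec.get("backoffLimit") != EXPECTED_BACKOFF_LIMIT:
        problems.append("backoffLimit is not 0")

    # Use the host network.
    if pod_spec.get("hostNetwork") is not True:
        problems.append("hostNetwork is not true")

    # The API endpoint override is passed through these environment variables.
    env_names = {
        env.get("name")
        for container in pod_spec.get("containers") or []
        for env in container.get("env") or []
    }
    missing = REQUIRED_ENV - env_names
    if missing:
        problems.append("missing env: " + ", ".join(sorted(missing)))

    return problems


def check_from_json(text: str) -> List[str]:
    """Parse a JSON document describing the CronJob and run the checks on it."""
    return check_ip_reconciler(json.loads(text))
```

The input could be the output of `oc get cronjob ip-reconciler -n openshift-multus -o json`, passed to check_from_json as a string. An empty result means all four settings were found.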

Comment 6 errata-xmlrpc 2022-06-10 05:37:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.52 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4910