Bug 2065488
Summary: | ip-reconciler job does not complete, halts node drain | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Matt Bargenquast <mbargenq> |
Component: | Networking | Assignee: | Douglas Smith <dosmith> |
Networking sub component: | multus | QA Contact: | Weibin Liang <weliang> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | | |
Priority: | high | CC: | wking |
Version: | 4.10 | Keywords: | ServiceDeliveryImpact |
Target Milestone: | --- | | |
Target Release: | 4.10.z | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 2065785 | Environment: | |
Last Closed: | 2022-04-21 13:16:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 2065785 | | |
Bug Blocks: | | | |
Description
Matt Bargenquast
2022-03-18 00:26:31 UTC
Thanks for the report. There are a couple of parts to this, as we're in the midst of backporting a number of fixes for this cronjob. I've gone ahead and added this to our known errors so that, hopefully, the process exits cleanly in the future. It's also been noted that some of the other pending backports may help in this situation as well.

The PR to watch is: https://github.com/openshift/whereabouts-cni/pull/88 for 4.10.

Following the verification steps from https://gist.github.com/dougbtv/b84c5dec4953f4b85048d16ddcf72c15, testing passes in 4.10.10:

[weliang@weliang ~]$ oc get cronjob ip-reconciler -o yaml | grep -vP "creationTimestamp|\- apiVersion|ownerReferences|blockOwnerDeletion|controller|kind\: Network|name\: cluster|uid\:|resourceVersion" | sed 's/name: ip-reconciler/name: test-reconciler/' | sed '/ - -log-level=verbose/a \ \ \ \ \ \ \ \ \ \ \ \ - -timeout=invalid' > /tmp/reconcile.yml
Error from server (NotFound): cronjobs.batch "ip-reconciler" not found
[weliang@weliang ~]$ oc project openshift-multus
Now using project "openshift-multus" on server "https://api.weliang-4142.qe.gcp.devcluster.openshift.com:6443".
[weliang@weliang ~]$ oc get cronjob ip-reconciler -o yaml | grep -vP "creationTimestamp|\- apiVersion|ownerReferences|blockOwnerDeletion|controller|kind\: Network|name\: cluster|uid\:|resourceVersion" | sed 's/name: ip-reconciler/name: test-reconciler/' | sed '/ - -log-level=verbose/a \ \ \ \ \ \ \ \ \ \ \ \ - -timeout=invalid' > /tmp/reconcile.yml
[weliang@weliang ~]$ oc create -f /tmp/reconcile.yml
cronjob.batch/test-reconciler created
[weliang@weliang ~]$ oc create job --from=cronjob/test-reconciler -n openshift-multus testrun-ip-reconciler
job.batch/testrun-ip-reconciler created
[weliang@weliang ~]$ oc get pods | grep testrun
testrun-ip-reconciler-pmzs6   0/1   Error   0   6s
[weliang@weliang ~]$ oc logs testrun-ip-reconciler-pmzs6
invalid value "invalid" for flag -timeout: parse error
Usage of /ip-reconciler:
  -kubeconfig string
        the path to the Kubernetes configuration file
  -log-level ip-reconciler
        the logging level for the ip-reconciler app. Valid values are: "debug", "verbose", "error", and "panic". (default "error")
  -timeout int
        the value for a request timeout in seconds. (default 30)
[weliang@weliang ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.10   True        False         18m     Cluster version is 4.10.10
[weliang@weliang ~]$

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.10 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1356
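
For anyone repeating the 4.10.10 verification run, the two manual checks (pod status and pod log) could be scripted roughly as below. This is only a sketch: the function names `reconciler_rejected_timeout` and `testrun_pod_errored` are hypothetical helpers, not part of the gist or of `oc`. It assumes the expected outcome is the one shown in the run, namely that the test pod terminates with status `Error` and logs the `-timeout` parse error.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the bug report or the gist).

# Reads pod logs on stdin; succeeds only if they contain the flag parse
# error that the verification run shows for "-timeout=invalid".
reconciler_rejected_timeout() {
  grep -qF 'invalid value "invalid" for flag -timeout: parse error'
}

# Reads `oc get pods` output on stdin; succeeds only if a
# testrun-ip-reconciler pod is listed with STATUS "Error" (third column),
# i.e. the pod terminated instead of continuing to run.
testrun_pod_errored() {
  awk '$1 ~ /^testrun-ip-reconciler-/ && $3 == "Error" { found = 1 }
       END { exit !found }'
}

# Possible usage against a live cluster (pod name taken from the run above;
# it will differ on other clusters):
#   oc get pods -n openshift-multus | testrun_pod_errored && echo "pod terminated"
#   oc logs -n openshift-multus testrun-ip-reconciler-pmzs6 | reconciler_rejected_timeout && echo "parse error logged"
```

Both helpers only inspect text on stdin, so they can be tried against saved output before being pointed at a cluster.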