Bug 1736293
| Summary: | GCP machine-controller does not honor requeue delay | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Michael Gugino <mgugino> |
| Component: | Cloud Compute | Assignee: | Michael Gugino <mgugino> |
| Status: | CLOSED ERRATA | QA Contact: | Jianwei Hou <jhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.2.0 | CC: | agarcial, zhsun |
| Target Milestone: | --- | ||
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-10-16 06:34:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Michael Gugino
2019-08-01 16:47:10 UTC
PR for GCP in master: https://github.com/openshift/cluster-api-provider-gcp/pull/44 This has merged. I looked into AWS, didn't seem like it was affected, but I'll need to spin up a cluster to verify for sure. In case of GCP, how to QE: Before fix: Delete a node that has a pod with PDB on it (such as a master node). Lot at logs of machine-controller oc get pods -n openshift-machine-api #record pod name for machine-api-controllers-* oc logs -n openshift-machine-api $PODNAME -c machine-controller Verify that there is less than 20 seconds between deletion attempts, despite the fact it says 'requeue in: 20s' After fix: Do above, verify the timestamp between delete attempts is 20 sec. Verified. clusterversion: 4.2.0-0.nightly-2019-08-08-103722 delete a machine. check machine-controller logs. The timestamp between delete attempts is 20 sec I0809 07:21:19.794235 1 controller.go:205] Reconciling machine "zhsun0-nks2r-w-c-4hjtt" triggers delete I0809 07:21:19.866608 1 info.go:16] ignoring DaemonSet-managed pods: tuned-kr5sv, dns-default-r6xwk, node-ca-h2w22, machine-config-daemon-9jjpw, node-exporter-zxn72, multus-rmmd2, ovs-wz9jp, sdn-2sgkx I0809 07:21:19.866653 1 info.go:20] drained node "zhsun0-nks2r-w-c-4hjtt.c.openshift-gce-devel.internal" I0809 07:21:19.866666 1 controller.go:302] drain successful for machine "zhsun0-nks2r-w-c-4hjtt" I0809 07:21:19.866696 1 actuator.go:116] zhsun0-nks2r-w-c-4hjtt: Deleting machine I0809 07:21:20.345029 1 reconciler.go:253] Machine "zhsun0-nks2r-w-c-4hjtt" already exists I0809 07:21:20.969860 1 reconciler.go:281] zhsun0-nks2r-w-c-4hjtt: machine status is exists, requeuing... E0809 07:21:20.969940 1 controller.go:220] Failed to delete machine "zhsun0-nks2r-w-c-4hjtt": requeue in: 20s I0809 07:21:20.969949 1 controller.go:364] Actuator returned requeue-after error: requeue in: 20s I0809 07:21:40.970274 1 controller.go:141] Reconciling Machine "zhsun0-nks2r-w-c-4hjtt" I0809 07:21:40.970332 1 controller.go:310] Machine "zhsun0-nks2r-w-c-4hjtt" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster I0809 07:21:40.970351 1 controller.go:205] Reconciling machine "zhsun0-nks2r-w-c-4hjtt" triggers delete I0809 07:21:41.051958 1 info.go:16] ignoring DaemonSet-managed pods: tuned-kr5sv, dns-default-r6xwk, node-ca-h2w22, machine-config-daemon-9jjpw, node-exporter-zxn72, multus-rmmd2, ovs-wz9jp, sdn-2sgkx I0809 07:21:41.052261 1 info.go:20] drained node "zhsun0-nks2r-w-c-4hjtt.c.openshift-gce-devel.internal" I0809 07:21:41.052286 1 controller.go:302] drain successful for machine "zhsun0-nks2r-w-c-4hjtt" I0809 07:21:41.052331 1 actuator.go:116] zhsun0-nks2r-w-c-4hjtt: Deleting machine I0809 07:21:41.434263 1 reconciler.go:253] Machine "zhsun0-nks2r-w-c-4hjtt" already exists I0809 07:21:41.941530 1 reconciler.go:281] zhsun0-nks2r-w-c-4hjtt: machine status is exists, requeuing... E0809 07:21:41.941872 1 controller.go:220] Failed to delete machine "zhsun0-nks2r-w-c-4hjtt": requeue in: 20s I0809 07:21:41.942631 1 controller.go:364] Actuator returned requeue-after error: requeue in: 20s I0809 07:22:01.943175 1 controller.go:141] Reconciling Machine "zhsun0-nks2r-w-c-4hjtt" I0809 07:22:01.943227 1 controller.go:310] Machine "zhsun0-nks2r-w-c-4hjtt" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster I0809 07:22:01.943241 1 controller.go:205] Reconciling machine "zhsun0-nks2r-w-c-4hjtt" triggers delete I0809 07:22:02.015331 1 info.go:16] ignoring DaemonSet-managed pods: tuned-kr5sv, dns-default-r6xwk, node-ca-h2w22, machine-config-daemon-9jjpw, node-exporter-zxn72, multus-rmmd2, ovs-wz9jp, sdn-2sgkx I0809 07:22:02.015358 1 info.go:20] drained node "zhsun0-nks2r-w-c-4hjtt.c.openshift-gce-devel.internal" I0809 07:22:02.015370 1 controller.go:302] drain successful for machine "zhsun0-nks2r-w-c-4hjtt" I0809 07:22:02.015407 1 actuator.go:116] zhsun0-nks2r-w-c-4hjtt: Deleting machine I0809 07:22:02.382710 1 reconciler.go:253] Machine "zhsun0-nks2r-w-c-4hjtt" already exists I0809 07:22:02.840781 1 reconciler.go:281] zhsun0-nks2r-w-c-4hjtt: machine status is exists, requeuing... E0809 07:22:02.840858 1 controller.go:220] Failed to delete machine "zhsun0-nks2r-w-c-4hjtt": requeue in: 20s I0809 07:22:02.840870 1 controller.go:364] Actuator returned requeue-after error: requeue in: 20s I0809 07:22:22.841145 1 controller.go:141] Reconciling Machine "zhsun0-nks2r-w-c-4hjtt" Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922 |