Bug 1800314
| Summary: | Machine controller must wait for node to be fully terminated before deleting Kube Node object | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Cloud Compute | Assignee: | Alexander Demicev <ademicev> |
| Cloud Compute sub component: | Other Providers | QA Contact: | Jianwei Hou <jhou> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | agarcial, mgugino, zhsun |
| Version: | 4.4 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1800315 (view as bug list) | Environment: | |
| Last Closed: | 2020-05-13 21:56:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1800315, 1800316, 1800317 | | |
|
Description
Clayton Coleman
2020-02-06 20:54:24 UTC
Can you specify how to reproduce this? The machine controller is expected to drain the node, then ask the provider to delete the instance, and only then delete the Node object: https://github.com/openshift/machine-api-operator/blob/master/pkg/controller/machine/controller.go#L209-L229

---

Moving back, as this needs to be verified only after the following are merged:

- https://github.com/openshift/cluster-api-provider-azure/pull/105
- https://github.com/openshift/cluster-api-provider-aws/pull/296
- https://github.com/openshift/cluster-api-provider-gcp/pull/74

These also fix https://bugzilla.redhat.com/show_bug.cgi?id=1733474.

---

Tested on Azure: the timestamps show the node is deleted after the instance is deleted.

```
I0219 10:10:51.591121 1 controller.go:332] Could not find node from noderef, it may have already been deleted: qe-jhou-rqfpj-worker-centralus2-tqdvz
I0219 10:10:51.591141 1 actuator.go:128] Deleting machine qe-jhou-rqfpj-worker-centralus2-tqdvz
I0219 10:10:51.591810 1 virtualmachines.go:225] deleting vm qe-jhou-rqfpj-worker-centralus2-tqdvz
I0219 10:10:51.815731 1 virtualmachines.go:242] successfully deleted vm qe-jhou-rqfpj-worker-centralus2-tqdvz
I0219 10:10:51.815758 1 disks.go:49] deleting disk qe-jhou-rqfpj-worker-centralus2-tqdvz_OSDisk
I0219 10:10:51.868334 1 disks.go:65] successfully deleted disk qe-jhou-rqfpj-worker-centralus2-tqdvz_OSDisk
I0219 10:10:51.868370 1 networkinterfaces.go:215] deleting nic qe-jhou-rqfpj-worker-centralus2-tqdvz-nic
I0219 10:11:02.111140 1 networkinterfaces.go:234] successfully deleted nic qe-jhou-rqfpj-worker-centralus2-tqdvz-nic
I0219 10:11:02.111240 1 machine_scope.go:170] qe-jhou-rqfpj-worker-centralus2-tqdvz: status unchanged
I0219 10:11:02.111284 1 machine_scope.go:170] qe-jhou-rqfpj-worker-centralus2-tqdvz: status unchanged
I0219 10:11:02.111293 1 machine_scope.go:190] qe-jhou-rqfpj-worker-centralus2-tqdvz: patching machine
I0219 10:11:02.125843 1 actuator.go:196] Checking if machine qe-jhou-rqfpj-worker-centralus2-tqdvz exists
I0219 10:11:02.365445 1 controller.go:241] qe-jhou-rqfpj-worker-centralus2-tqdvz: deleting node "qe-jhou-rqfpj-worker-centralus2-tqdvz" for machine
I0219 10:11:02.365493 1 controller.go:388] Node "qe-jhou-rqfpj-worker-centralus2-tqdvz" not found
I0219 10:11:02.395353 1 controller.go:255] qe-jhou-rqfpj-worker-centralus2-tqdvz: machine deletion successful
```

Will also cover the rest of the providers before moving this to verified.

---

Verified on AWS: 4.4.0-0.nightly-2020-02-18-005549. The machine controller does not delete the Node object until the instance is terminated. Key messages in the logs below:

- can't proceed deleting machine while cloud instance is being terminated
- Failed to find existing instance by id i-08621c75e167a4f68: instance i-08621c75e167a4f68 state "terminated" is not in running, pending, stopped, stopping, shutting-down
- Instance does not exist
- deleting node "zhsun4-s2gcg-worker-us-east-2c-ftk9g" for machine
- machine deletion successful

```
I0219 03:09:21.531611 1 actuator.go:538] zhsun4-s2gcg-worker-us-east-2c-ftk9g: Found instance by id: i-08621c75e167a4f68
I0219 03:09:21.531646 1 controller.go:236] zhsun4-s2gcg-worker-us-east-2c-ftk9g: can't proceed deleting machine while cloud instance is being terminated, requeuing
I0219 03:09:21.531712 1 controller.go:163] zhsun4-s2gcg-worker-us-east-2c-jph8g: reconciling Machine
I0219 ...
I0219 03:09:51.284449 1 actuator.go:315] zhsun4-s2gcg-worker-us-east-2c-ftk9g: deleting machine
W0219 03:09:51.374462 1 actuator.go:535] zhsun4-s2gcg-worker-us-east-2c-ftk9g: Failed to find existing instance by id i-08621c75e167a4f68: instance i-08621c75e167a4f68 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E0219 03:09:51.468978 1 utils.go:165] Excluding instance matching zhsun4-s2gcg-worker-us-east-2c-ftk9g: instance i-08621c75e167a4f68 state "terminated" is not in running, pending, stopped, stopping, shutting-down
I0219 03:09:51.469018 1 actuator.go:357] zhsun4-s2gcg-worker-us-east-2c-ftk9g: found 0 existing instances for machine
W0219 03:09:51.469032 1 actuator.go:359] zhsun4-s2gcg-worker-us-east-2c-ftk9g: no instances found to delete for machine
I0219 03:09:51.469042 1 actuator.go:489] zhsun4-s2gcg-worker-us-east-2c-ftk9g: Checking if machine exists
W0219 03:09:51.531060 1 actuator.go:535] zhsun4-s2gcg-worker-us-east-2c-ftk9g: Failed to find existing instance by id i-08621c75e167a4f68: instance i-08621c75e167a4f68 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E0219 03:09:51.591483 1 utils.go:165] Excluding instance matching zhsun4-s2gcg-worker-us-east-2c-ftk9g: instance i-08621c75e167a4f68 state "terminated" is not in running, pending, stopped, stopping, shutting-down
I0219 03:09:51.591514 1 actuator.go:502] zhsun4-s2gcg-worker-us-east-2c-ftk9g: Instance does not exist
I0219 03:09:51.591529 1 controller.go:241] ip-10-0-169-219.us-east-2.compute.internal: deleting node "zhsun4-s2gcg-worker-us-east-2c-ftk9g" for machine
I0219 03:09:51.631704 1 controller.go:255] zhsun4-s2gcg-worker-us-east-2c-ftk9g: machine deletion successful
```

---

Verified on GCP: judging by the timestamps, the node is deleted after the instance is deleted.

clusterversion: 4.4.0-0.nightly-2020-02-19-173908

```
I0221 02:34:07.226447 1 controller.go:321] Could not find node from noderef, it may have already been deleted: zhsung-nrjcj-w-a-fksnz.c.openshift-qe.internal
I0221 02:34:07.226487 1 actuator.go:118] zhsung-nrjcj-w-a-fksnz: Deleting machine
I0221 02:34:07.441236 1 reconciler.go:302] zhsung-nrjcj-w-a-fksnz: Machine does not exist
I0221 02:34:07.441442 1 reconciler.go:319] zhsung-nrjcj-w-a-fksnz: Machine not found during delete, skipping
I0221 02:34:07.441547 1 controller.go:230] zhsung-nrjcj-w-a-fksnz.c.openshift-qe.internal: deleting node "zhsung-nrjcj-w-a-fksnz" for machine
I0221 02:34:07.441666 1 controller.go:377] Node "zhsung-nrjcj-w-a-fksnz.c.openshift-qe.internal" not found
I0221 02:34:07.472003 1 controller.go:244] zhsung-nrjcj-w-a-fksnz: machine deletion successful
```

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
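For reference, the ordering this bug asks for (drain, request instance deletion, requeue while the instance still exists, and only then delete the Node object) can be sketched roughly as below. This is a minimal illustration, not the real machine-api-operator code: the `cloud`, `reconcileDelete`, and state names are hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// errRequeue stands in for the controller's "requeue and retry later" result.
var errRequeue = errors.New("requeue: instance still terminating")

type instanceState string

const (
	stateTerminating instanceState = "shutting-down"
	stateGone        instanceState = "" // provider no longer reports the instance
)

// cloud is a hypothetical stand-in for a provider actuator.
type cloud struct{ state instanceState }

func (c *cloud) exists() bool { return c.state != stateGone }
func (c *cloud) delete()      { /* ask the provider to terminate the instance */ }

// reconcileDelete sketches the desired ordering: the Node object is
// only deleted once the provider no longer reports the instance.
func reconcileDelete(c *cloud, deleteNode func()) error {
	// 1. Draining the node would happen here (omitted in this sketch).
	// 2. Ask the provider to delete the instance.
	c.delete()
	// 3. While the provider still reports the instance, requeue
	//    instead of removing the Node object.
	if c.exists() {
		return errRequeue
	}
	// 4. Only once the instance is fully gone, delete the Node.
	deleteNode()
	return nil
}

func main() {
	nodeDeleted := false
	c := &cloud{state: stateTerminating}

	// First pass: instance is still shutting down, so we requeue.
	err := reconcileDelete(c, func() { nodeDeleted = true })
	fmt.Println(err != nil, nodeDeleted) // true false

	// A later pass: the instance has disappeared from the provider.
	c.state = stateGone
	err = reconcileDelete(c, func() { nodeDeleted = true })
	fmt.Println(err == nil, nodeDeleted) // true true
}
```

The point of the requeue in step 3 is exactly what the AWS logs above show: "can't proceed deleting machine while cloud instance is being terminated, requeuing" repeats until the instance reaches a terminal state.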
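The repeated AWS log message `state "terminated" is not in running, pending, stopped, stopping, shutting-down` suggests the actuator treats an instance as "existing" only while it is in one of those EC2 lifecycle states. A hedged sketch of that filter, with `instanceExists` as a hypothetical helper name rather than the actual actuator function:

```go
package main

import "fmt"

// existingStates mirrors the EC2 states listed in the log message:
// an instance in any of these still counts as existing, so the
// controller keeps requeuing. A "terminated" instance is excluded,
// which is what finally allows the Node object to be deleted.
var existingStates = map[string]bool{
	"running":       true,
	"pending":       true,
	"stopped":       true,
	"stopping":      true,
	"shutting-down": true,
}

// instanceExists is a hypothetical helper, not the real actuator API.
func instanceExists(state string) bool { return existingStates[state] }

func main() {
	fmt.Println(instanceExists("shutting-down")) // true: keep requeuing
	fmt.Println(instanceExists("terminated"))    // false: safe to delete the Node
}
```

Note that "shutting-down" still counts as existing: deletion of the Node object must wait through that state until the instance reaches "terminated", matching the timestamps in the verification logs.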