Bug 1782176
| Summary: | AWS: terminated Machine adopted by replacement Machine with the same name | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alberto <agarcial> |
| Component: | Cloud Compute | Assignee: | Brad Ison <brad.ison> |
| Status: | CLOSED ERRATA | QA Contact: | Jianwei Hou <jhou> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3.0 | CC: | brad.ison, jhou, mgugino, wking |
| Target Milestone: | --- | | |
| Target Release: | 4.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1781339 | Environment: | |
| Last Closed: | 2020-01-23 11:18:55 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1781339 | | |
| Bug Blocks: | | | |
Description
Alberto
2019-12-11 10:48:09 UTC
According to the most recent disruptive test for 4.3, it's still failing [1]. The two machines that were deleted and recreated by the disruptive test were:
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-disruptive-4.3/74/artifacts/e2e-aws-disruptive/must-gather/registry-svc-ci-openshift-org-ocp-4-3-2019-12-05-183852-sha256-64c63eedf863406fbc6c7515026f909a7221472cf70283708fb7010dd5e6139e/namespaces/openshift-machine-api/machine.openshift.io/machines/ci-op-vxh8i8vn-2770b-v8htb-master-0.yaml
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-disruptive-4.3/74/artifacts/e2e-aws-disruptive/must-gather/registry-svc-ci-openshift-org-ocp-4-3-2019-12-05-183852-sha256-64c63eedf863406fbc6c7515026f909a7221472cf70283708fb7010dd5e6139e/namespaces/openshift-machine-api/machine.openshift.io/machines/ci-op-vxh8i8vn-2770b-v8htb-master-1.yaml
The machine controller seems to produce the same message [2] as the one reported in this bug.
```
2019-12-12T18:09:12.213366746Z I1212 18:09:12.212375 1 actuator.go:499] ci-op-vxh8i8vn-2770b-v8htb-master-0: Checking if machine exists
2019-12-12T18:09:12.355415018Z E1212 18:09:12.355370 1 utils.go:186] Excluding instance matching ci-op-vxh8i8vn-2770b-v8htb-master-0: instance i-03bc8d3394d9ac2a1 state "terminated" is not in running, pending, stopped, stopping, shutting-down
2019-12-12T18:09:12.35548575Z I1212 18:09:12.355466 1 actuator.go:508] ci-op-vxh8i8vn-2770b-v8htb-master-0: Possible eventual-consistency discrepancy; returning an error to requeue
2019-12-12T18:09:12.355588804Z E1212 18:09:12.355522 1 controller.go:279] Failed to check if machine "ci-op-vxh8i8vn-2770b-v8htb-master-0" exists: requeue in: 20s
...
2019-12-12T18:38:32.316281711Z I1212 18:38:32.316267 1 actuator.go:499] ci-op-vxh8i8vn-2770b-v8htb-master-0: Checking if machine exists
2019-12-12T18:38:32.39923898Z E1212 18:38:32.399198 1 utils.go:186] Excluding instance matching ci-op-vxh8i8vn-2770b-v8htb-master-0: instance i-03bc8d3394d9ac2a1 state "terminated" is not in running, pending, stopped, stopping, shutting-down
2019-12-12T18:38:32.39923898Z I1212 18:38:32.399221 1 actuator.go:508] ci-op-vxh8i8vn-2770b-v8htb-master-0: Possible eventual-consistency discrepancy; returning an error to requeue
2019-12-12T18:38:32.399279482Z E1212 18:38:32.399234 1 controller.go:279] Failed to check if machine "ci-op-vxh8i8vn-2770b-v8htb-master-0" exists: requeue in: 20s
```
[1] https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-disruptive-4.3/73
[2] https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-disruptive-4.3/74/artifacts/e2e-aws-disruptive/must-gather/registry-svc-ci-openshift-org-ocp-4-3-2019-12-05-183852-sha256-64c63eedf863406fbc6c7515026f909a7221472cf70283708fb7010dd5e6139e/namespaces/openshift-machine-api/pods/machine-api-controllers-6c667d98d5-v4kwr/machine-controller/machine-controller/logs/current.log
I don't think either of the linked runs actually used an image that contained the fix:
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-disruptive-4.3/74/artifacts/release-images-latest/release-images-latest
That shows:
"name": "aws-machine-controllers",
"annotations": {
"io.openshift.build.commit.id": "a4be8350439b1da03902a65f1399b1269141c137",
"io.openshift.build.commit.ref": "release-4.3",
"io.openshift.build.source-location": "https://github.com/openshift/cluster-api-provider-aws"
}
That commit (a4be835) is from November 26th and doesn't contain the fix. Can we try again with a newer release image?
Thank you Brad, sorry for the false alarm.
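The `utils.go` message quoted in the log above lists the instance states the controller accepts (running, pending, stopped, stopping, shutting-down) and shows a `terminated` instance being excluded. A minimal Go sketch of that kind of filter follows. The `instance` type, `liveStates` map, and `filterLive` function are hypothetical names chosen for illustration, and `i-hypothetical` is a made-up ID; this is not the code from cluster-api-provider-aws.

```go
package main

import "fmt"

// instance is a hypothetical, simplified stand-in for a cloud instance record.
type instance struct {
	ID    string
	State string
}

// liveStates holds the states named in the log message quoted above.
// Any other state (for example "terminated") is excluded.
var liveStates = map[string]bool{
	"running":       true,
	"pending":       true,
	"stopped":       true,
	"stopping":      true,
	"shutting-down": true,
}

// filterLive returns only the instances whose state is in liveStates.
func filterLive(all []instance) []instance {
	var live []instance
	for _, inst := range all {
		if liveStates[inst.State] {
			live = append(live, inst)
		}
	}
	return live
}

func main() {
	candidates := []instance{
		{ID: "i-03bc8d3394d9ac2a1", State: "terminated"}, // ID taken from the log above
		{ID: "i-hypothetical", State: "running"},         // made-up ID for illustration
	}
	for _, inst := range filterLive(candidates) {
		fmt.Printf("keeping %s (%s)\n", inst.ID, inst.State)
	}
}
```

With a filter like this, a Machine whose only matching instance is terminated ends up with an empty candidate list, which is the situation the "Possible eventual-consistency discrepancy" line reports.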
I inspected a new disruptive test run that had picked up the fix [1]. The above issue has been fixed; the eventual-consistency error did not occur [2].
"name": "aws-machine-controllers",
"annotations": {
"io.openshift.build.commit.id": "21713459ae063978bac32b8ea35dc6d68998ed89",
"io.openshift.build.commit.ref": "release-4.3",
"io.openshift.build.source-location": "https://github.com/openshift/cluster-api-provider-aws"
},
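Checking which commit an image was built from, as done above, can be scripted. The Go sketch below reads the `io.openshift.build.commit.id` annotation from a single tag entry. It assumes the entry is a standalone JSON object with `name` and `annotations` fields, as in the fragment above; the surrounding structure of the `release-images-latest` file is not shown in this report, so locating the entry within that file is left out. The `imageTag` type and `buildCommit` function are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// commitKey is the annotation shown in the fragment above.
const commitKey = "io.openshift.build.commit.id"

// imageTag models only the two fields visible in this report (assumed shape).
type imageTag struct {
	Name        string            `json:"name"`
	Annotations map[string]string `json:"annotations"`
}

// buildCommit decodes one tag entry and returns its build commit ID.
func buildCommit(tagJSON []byte) (string, error) {
	var tag imageTag
	if err := json.Unmarshal(tagJSON, &tag); err != nil {
		return "", fmt.Errorf("decoding tag: %w", err)
	}
	commit, ok := tag.Annotations[commitKey]
	if !ok {
		return "", fmt.Errorf("tag %q has no %s annotation", tag.Name, commitKey)
	}
	return commit, nil
}

func main() {
	// Values copied from the fragment above, wrapped in braces to form an object.
	sample := []byte(`{
		"name": "aws-machine-controllers",
		"annotations": {
			"io.openshift.build.commit.id": "21713459ae063978bac32b8ea35dc6d68998ed89",
			"io.openshift.build.commit.ref": "release-4.3",
			"io.openshift.build.source-location": "https://github.com/openshift/cluster-api-provider-aws"
		}
	}`)
	commit, err := buildCommit(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("built from commit", commit)
}
```

The returned ID can then be compared against the merge commit that carries the fix.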
commit 21713459ae063978bac32b8ea35dc6d68998ed89 (HEAD -> release-4.3, origin/release-4.3)
Merge: 1555366a ba72c04d
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date: Wed Dec 11 14:51:41 2019 +0100
Merge pull request #281 from openshift-cherrypick-robot/cherry-pick-280-to-release-4.3
[release-4.3] Bug 1782176: Ensure Spec.ProviderID is not empty string
[1] https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-disruptive-4.3/77/artifacts/release-images-latest/release-images-latest
[2] https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-disruptive-4.3/77/artifacts/e2e-aws-disruptive/must-gather/registry-svc-ci-openshift-org-ocp-4-3-2019-12-14-042216-sha256-64c63eedf863406fbc6c7515026f909a7221472cf70283708fb7010dd5e6139e/namespaces/openshift-machine-api/pods/machine-api-controllers-75d55f5c97-ccn9k/machine-controller/machine-controller/logs/current.log
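The merge commit above is titled "Ensure Spec.ProviderID is not empty string". This report does not include the patch itself, so the Go sketch below is only an illustration of the idea under stated assumptions: that the provider ID is held as a `*string`, that an empty string should be treated the same as an unset value, and that the requeue seen in the earlier logs is only warranted when a provider ID has actually been recorded. `hasProviderID` and `shouldRequeue` are hypothetical helpers, and the provider ID value is made up.

```go
package main

import "fmt"

// hasProviderID reports whether id points at a non-empty string.
// A nil pointer and a pointer to "" are both treated as "not set".
func hasProviderID(id *string) bool {
	return id != nil && *id != ""
}

// shouldRequeue is a hypothetical decision helper: requeue only when no live
// instance was found but a provider ID was recorded (assumed to indicate a
// possible eventual-consistency discrepancy rather than a missing instance).
func shouldRequeue(id *string, liveInstances int) bool {
	return liveInstances == 0 && hasProviderID(id)
}

func main() {
	empty := ""
	recorded := "aws:///hypothetical-zone/i-hypothetical" // made-up value

	fmt.Println(shouldRequeue(nil, 0))       // no provider ID recorded
	fmt.Println(shouldRequeue(&empty, 0))    // empty string counts as not recorded
	fmt.Println(shouldRequeue(&recorded, 0)) // recorded ID, no live instance
}
```

Checking only for a nil pointer would treat a pointer to `""` as a recorded ID, which is the case the commit title appears to guard against.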
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062