Bug 2106733
Summary: | Machine Controller stuck with Terminated Instances while Provisioning on AWS | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Gabriel Stein <gferrazs> |
Component: | Cloud Compute | Assignee: | Radek Maňák <rmanak> |
Cloud Compute sub component: | Cloud Controller Manager | QA Contact: | Huali Liu <huliu> |
Status: | CLOSED ERRATA | Docs Contact: | Jeana Routh <jrouth> |
Severity: | high | ||
Priority: | high | CC: | rmanak, skumari, tsedovic |
Version: | 4.8 | ||
Target Milestone: | --- | ||
Target Release: | 4.12.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
* Previously, there was no check for nil values in the annotations of a machine object before attempting to access the object. This situation was rare, but caused the machine controller to panic when reconciling the machine. With this release, nil values are checked and the machine controller is able to reconcile machines without annotations. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2106733[*BZ#2106733*])
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2023-01-17 19:52:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2108021 |
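The Doc Text above describes a missing nil check on a machine's annotations. In Go, reading from a nil map safely returns the zero value, but assigning to an entry in a nil map panics, so a guard of the following shape is one plausible form of such a fix. This is a minimal sketch under stated assumptions, not the code from machine-api-provider-aws: the `Machine` type, the `setInstanceStateAnnotation` helper, and the annotation key are hypothetical, and the sketch assumes the panic came from writing into a nil annotations map.

```go
package main

import "fmt"

// Machine is a hypothetical, trimmed-down stand-in for a Machine API object;
// only the annotations map matters for this sketch.
type Machine struct {
	Name        string
	Annotations map[string]string
}

// instanceStateAnnotation is an assumed annotation key, used for illustration only.
const instanceStateAnnotation = "machine.openshift.io/instance-state"

// setInstanceStateAnnotation records the instance state on the machine.
// Assigning to an entry in a nil map panics in Go, so the map is
// initialized first when the machine has no annotations.
func setInstanceStateAnnotation(m *Machine, state string) {
	if m.Annotations == nil {
		m.Annotations = make(map[string]string)
	}
	m.Annotations[instanceStateAnnotation] = state
}

func main() {
	// A machine created without any annotations: its map is nil.
	m := &Machine{Name: "worker-0"}

	// Reading from the nil map is safe and yields "".
	fmt.Printf("before: %q\n", m.Annotations[instanceStateAnnotation])

	// Without the nil check inside the helper, this call would panic.
	setInstanceStateAnnotation(m, "terminated")
	fmt.Printf("after: %q\n", m.Annotations[instanceStateAnnotation])
}
```

Initializing the map lazily, rather than requiring every caller to pre-populate annotations, keeps machines created without annotations reconcilable while leaving any existing annotations untouched.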
Description
Gabriel Stein
2022-07-13 12:00:34 UTC
Probably the issue is here:

- https://github.com/openshift/machine-api-provider-aws/blob/d701bcb720a12bd7d169d79699962c447a1f026d/pkg/actuators/machine/reconciler.go#L416-L426 (the fields referenced are in the file below; probably duplicate those lines or move them here).
- https://github.com/openshift/machine-api-provider-aws/blob/d701bcb720a12bd7d169d79699962c447a1f026d/pkg/actuators/machine/reconciler.go#L165

Since the issue is in machine-api, moving it to the correct team.

I am working on a fix for this.

Tried replacing worker nodes several times on 4.12.0-0.nightly-2022-07-17-215842, and there is no panic. Moving this to Verified.

`liuhuali@Lius-MacBook-Pro huali-test % oc get machine`

| NAME | PHASE | TYPE | REGION | ZONE | AGE |
|---|---|---|---|---|---|
| huliu-aws412-945hh-master-0 | Running | m6i.xlarge | us-east-2 | us-east-2a | 99m |
| huliu-aws412-945hh-master-1 | Running | m6i.xlarge | us-east-2 | us-east-2b | 99m |
| huliu-aws412-945hh-master-2 | Running | m6i.xlarge | us-east-2 | us-east-2c | 99m |
| huliu-aws412-945hh-worker-us-east-2a-4gwxb | Running | m6i.xlarge | us-east-2 | us-east-2a | 6m9s |
| huliu-aws412-945hh-worker-us-east-2a-cndjb | Running | m6i.xlarge | us-east-2 | us-east-2a | 6m22s |
| huliu-aws412-945hh-worker-us-east-2b-t2rvp | Running | m6i.xlarge | us-east-2 | us-east-2b | 5m55s |
| huliu-aws412-945hh-worker-us-east-2c-t98h4 | Running | m6i.xlarge | us-east-2 | us-east-2c | 5m39s |

`liuhuali@Lius-MacBook-Pro huali-test % oc get pod`

| NAME | READY | STATUS | RESTARTS | AGE |
|---|---|---|---|---|
| cluster-autoscaler-operator-77d49d497d-rpwlx | 2/2 | Running | 0 | 99m |
| cluster-baremetal-operator-8b7bfdf74-2r6g6 | 2/2 | Running | 0 | 99m |
| machine-api-controllers-6f89cc4dcf-vn24l | 7/7 | Running | 0 | 96m |
| machine-api-operator-675494c444-9l4mn | 2/2 | Running | 0 | 99m |

`liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-6f89cc4dcf-vn24l -c machine-controller |grep panic`

(no output)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399