Description of problem:
Creating a machineset with an invalid profile leaves the machine stuck in the "Provisioning" phase.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-03-09-162729

How reproducible:
Always

Steps to Reproduce:
1. Create a machineset with an invalid profile, for example "profile: invalid".

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-ibm410-n5gb4-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                PHASE          TYPE       REGION   ZONE      AGE
huliu-ibm410-n5gb4-1-smfsb          Provisioning                                 3h59m
huliu-ibm410-n5gb4-master-0         Running        bx2-4x16   eu-gb    eu-gb-1   4h37m
huliu-ibm410-n5gb4-master-1         Running        bx2-4x16   eu-gb    eu-gb-2   4h37m
huliu-ibm410-n5gb4-master-2         Running        bx2-4x16   eu-gb    eu-gb-3   4h37m
huliu-ibm410-n5gb4-worker-1-72q6n   Running        bx2-4x16   eu-gb    eu-gb-1   4h30m
huliu-ibm410-n5gb4-worker-2-fppjj   Running        bx2-4x16   eu-gb    eu-gb-2   4h30m
huliu-ibm410-n5gb4-worker-3-x2ps7   Running        bx2-4x16   eu-gb    eu-gb-3   4h30m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-ibm410-n5gb4-1-smfsb -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: "2022-03-10T02:58:03Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: huliu-ibm410-n5gb4-1-
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-ibm410-n5gb4
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: huliu-ibm410-n5gb4-1
  name: huliu-ibm410-n5gb4-1-smfsb
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: huliu-ibm410-n5gb4-1
    uid: 26c09889-d82e-4452-a48c-d6c41f734d57
  resourceVersion: "34579"
  uid: a43a0f9b-f8c5-4448-b38d-5f2886f65f5d
spec:
  lifecycleHooks: {}
  metadata: {}
  providerSpec:
    value:
      apiVersion: ibmcloudproviderconfig.openshift.io/v1beta1
      credentialsSecret:
        name: ibmcloud-credentials
      image: huliu-ibm410-n5gb4-rhcos
      kind: IBMCloudMachineProviderSpec
      metadata:
        creationTimestamp: null
      primaryNetworkInterface:
        securityGroups:
        - huliu-ibm410-n5gb4-sg-cluster-wide
        - huliu-ibm410-n5gb4-sg-openshift-net
        subnet: huliu-ibm410-n5gb4-subnet-compute-eu-gb-1
      profile: invalid
      region: eu-gb
      resourceGroup: huliu-ibm410-n5gb4
      userDataSecret:
        name: worker-user-data
      vpc: huliu-ibm410-n5gb4-vpc
      zone: eu-gb-1
status:
  conditions:
  - lastTransitionTime: "2022-03-10T02:58:05Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-03-10T02:58:05Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-03-10T02:58:05Z"
    status: "True"
    type: Terminable
  lastUpdated: "2022-03-10T02:59:11Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2022-03-10T02:59:11Z"
      lastTransitionTime: "2022-03-10T02:59:11Z"
      message: the provided instance profile ID does not exist
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreated

Actual results:
The machine is stuck in the "Provisioning" phase, with no InvalidConfiguration error.

Expected results:
The machine should be in the "Failed" phase.

Additional info:
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-5f9956cd56-xx7bl -c machine-controller |grep huliu-ibm410-n5gb4-1-smfsb
I0310 02:58:03.752644 1 controller.go:175] huliu-ibm410-n5gb4-1-smfsb: reconciling Machine
I0310 02:58:03.781492 1 controller.go:175] huliu-ibm410-n5gb4-1-smfsb: reconciling Machine
I0310 02:58:03.781583 1 actuator.go:122] huliu-ibm410-n5gb4-1-smfsb: Checking if machine exists
I0310 02:58:05.239453 1 controller.go:379] huliu-ibm410-n5gb4-1-smfsb: setting phase to Provisioning and requeuing
I0310 02:58:05.239489 1 controller.go:504] huliu-ibm410-n5gb4-1-smfsb: going into phase "Provisioning"
I0310 02:58:05.261134 1 controller.go:175] huliu-ibm410-n5gb4-1-smfsb: reconciling Machine
I0310 02:58:05.261179 1 actuator.go:122] huliu-ibm410-n5gb4-1-smfsb: Checking if machine exists
I0310 02:58:06.447862 1 controller.go:386] huliu-ibm410-n5gb4-1-smfsb: reconciling machine triggers idempotent create
I0310 02:58:06.447892 1 actuator.go:75] huliu-ibm410-n5gb4-1-smfsb: Creating machine
E0310 02:59:02.931441 1 reconciler.go:69] huliu-ibm410-n5gb4-1-smfsb: error occured while creating machine: %!w(*errors.errorString=&{the provided instance profile ID does not exist})
I0310 02:59:02.931580 1 machine_scope.go:156] "huliu-ibm410-n5gb4-1-smfsb": patching machine
E0310 02:59:02.968716 1 actuator.go:66] huliu-ibm410-n5gb4-1-smfsb error: huliu-ibm410-n5gb4-1-smfsb: reconciler failed to Create machine: failed to create instance via ibm vpc client: the provided instance profile ID does not exist
W0310 02:59:02.968756 1 controller.go:388] huliu-ibm410-n5gb4-1-smfsb: failed to create machine: huliu-ibm410-n5gb4-1-smfsb: reconciler failed to Create machine: failed to create instance via ibm vpc client: the provided instance profile ID does not exist
E0310 02:59:02.968803 1 controller.go:317] controller/machine_controller "msg"="Reconciler error" "error"="huliu-ibm410-n5gb4-1-smfsb: reconciler failed to Create machine: failed to create instance via ibm vpc client: the provided instance profile ID does not exist" "name"="huliu-ibm410-n5gb4-1-smfsb" "namespace"="openshift-machine-api"
I0310 02:59:02.968855 1 controller.go:175] huliu-ibm410-n5gb4-1-smfsb: reconciling Machine
I0310 02:59:02.968865 1 actuator.go:122] huliu-ibm410-n5gb4-1-smfsb: Checking if machine exists
I0310 02:59:02.969358 1 logr.go:252] events "msg"="Warning" "message"="huliu-ibm410-n5gb4-1-smfsb: reconciler failed to Create machine: failed to create instance via ibm vpc client: the provided instance profile ID does not exist" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"huliu-ibm410-n5gb4-1-smfsb","uid":"a43a0f9b-f8c5-4448-b38d-5f2886f65f5d","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"34218"} "reason"="FailedCreate"
I0310 02:59:04.189279 1 controller.go:386] huliu-ibm410-n5gb4-1-smfsb: reconciling machine triggers idempotent create
By the way, on other cloud providers (AWS, Azure, GCP, Alibaba Cloud) the machine goes into the Failed phase in this case.
We (IBM) have added an internal tracking issue for this and will attempt to get this addressed for 4.11.
I checked in with Chris, and this is still on the backlog for the IBM team.
I tested a fix, and it properly raises the necessary InvalidConfigurationMachineError, so the machine is reported as Failed rather than Provisioning.

A PR with this update has been opened:
https://github.com/openshift/machine-api-provider-ibmcloud/pull/4

# oc get machines -n openshift-machine-api; oc get -n openshift-machine-api machine.machine.openshift.io/bz2062579-us-east-3-hzb8t-worker-us-east-tsc48 -o yaml
NAME                                             PHASE     TYPE       REGION    ZONE        AGE
bz2062579-us-east-3-hzb8t-master-0               Running   bx2-4x16   us-east   us-east-1   41m
bz2062579-us-east-3-hzb8t-master-1               Running   bx2-4x16   us-east   us-east-2   41m
bz2062579-us-east-3-hzb8t-master-2               Running   bx2-4x16   us-east   us-east-3   41m
bz2062579-us-east-3-hzb8t-worker-1-bttbr         Running   bx2-4x16   us-east   us-east-1   33m
bz2062579-us-east-3-hzb8t-worker-2-5cvvh         Running   bx2-4x16   us-east   us-east-2   33m
bz2062579-us-east-3-hzb8t-worker-3-6pw58         Running   bx2-4x16   us-east   us-east-3   33m
bz2062579-us-east-3-hzb8t-worker-us-east-tsc48   Failed                                     3m59s
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: Unknown
  creationTimestamp: "2022-08-17T20:39:14Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: bz2062579-us-east-3-hzb8t-worker-us-east-
  generation: 2
  labels:
    machine.openshift.io/cluster-api-cluster: bz2062579-us-east-3-hzb8t
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: bz2062579-us-east-3-hzb8t-worker-us-east
  name: bz2062579-us-east-3-hzb8t-worker-us-east-tsc48
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: bz2062579-us-east-3-hzb8t-worker-us-east
    uid: 37b5b59a-0e77-4b4b-a101-894843b7603b
  resourceVersion: "32873"
  uid: 23e0e79f-fb98-4757-ba1b-d61b301819e1
spec:
  lifecycleHooks: {}
  metadata:
    labels:
      node-role.kubernetes.io/infra: ""
  providerSpec:
    value:
      apiVersion: ibmcloudproviderconfig.openshift.io/v1beta1
      credentialsSecret:
        name: ibmcloud-credentials
      image: bz2062579-us-east-3-hzb8t-rhcos
      kind: IBMCloudMachineProviderSpec
      metadata: {}
      primaryNetworkInterface:
        securityGroups:
        - bz2062579-us-east-3-hzb8t-sg-cluster-wide
        - bz2062579-us-east-3-hzb8t-sg-openshift-net
        subnet: bz2062579-us-east-3-hzb8t-subnet-compute-us-east-1
      profile: bad-profile
      region: us-east
      resourceGroup: bz2062579-us-east-3-hzb8t
      userDataSecret:
        name: worker-user-data
      vpc: bz2062579-us-east-3-hzb8t-vpc
      zone: us-east-1
status:
  conditions:
  - lastTransitionTime: "2022-08-17T20:39:16Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-08-17T20:39:16Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-08-17T20:39:16Z"
    status: "True"
    type: Terminable
  errorMessage: 'could not find instance profile: bad-profile'
  errorReason: InvalidConfiguration
  lastUpdated: "2022-08-17T20:39:22Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastProbeTime: "2022-08-17T20:39:22Z"
      lastTransitionTime: "2022-08-17T20:39:22Z"
      message: 'could not find instance profile: bad-profile'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreated
I am trying to verify this on the newest nightly build, 4.12.0-0.nightly-2022-08-22-143022, but the issue still exists.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE          TYPE       REGION   ZONE      AGE
zhsunibm823-s9l4j-1-llh6d          Provisioning                                 41m
zhsunibm823-s9l4j-master-0         Running        bx2-4x16   eu-gb    eu-gb-1   86m
zhsunibm823-s9l4j-master-1         Running        bx2-4x16   eu-gb    eu-gb-2   86m
zhsunibm823-s9l4j-master-2         Running        bx2-4x16   eu-gb    eu-gb-3   86m
zhsunibm823-s9l4j-worker-1-rkqjl   Running        bx2-4x16   eu-gb    eu-gb-1   79m
zhsunibm823-s9l4j-worker-2-gcwz6   Running        bx2-4x16   eu-gb    eu-gb-2   79m
zhsunibm823-s9l4j-worker-3-z24vq   Running        bx2-4x16   eu-gb    eu-gb-3   79m

I then checked the release image and found that the repo https://github.com/openshift/machine-api-provider-ibmcloud is not included in it. Does this repo need to be included in the image? If so, how is that done?

$ oc adm release info registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-08-22-143022 --commits | grep ibm
ibm-cloud-controller-manager        https://github.com/openshift/cloud-provider-ibm                  3d22fae892174bcdd99ddf679826b48ee7023a85
ibm-vpc-block-csi-driver            https://github.com/openshift/ibm-vpc-block-csi-driver            80636ef324f4500b132a7cc236cc9d6caf95f5d0
ibm-vpc-block-csi-driver-operator   https://github.com/openshift/ibm-vpc-block-csi-driver-operator   233dedbe6fd89dd14e3efa15462c0053f7d92ff5
ibm-vpc-node-label-updater          https://github.com/openshift/ibm-vpc-node-label-updater          64c1820764f8a7065b03b08a70673b8c125876c1
ibmcloud-machine-controllers        https://github.com/openshift/cluster-api-provider-ibmcloud       3bde969f2e83ca3dc2a57ef4001a194916e7fdd9
Waiting on https://issues.redhat.com/browse/ART-4438 to be completed before the image will be available within the payload.
Verified on 4.12.0-0.nightly-2022-08-27-164831

1. Create a machineset with an invalid profile, for example "profile: invalid".

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-ibm122-46s4v-test created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                PHASE     TYPE       REGION   ZONE      AGE
huliu-ibm122-46s4v-master-0         Running   bx2-4x16   eu-gb    eu-gb-1   4h21m
huliu-ibm122-46s4v-master-1         Running   bx2-4x16   eu-gb    eu-gb-2   4h21m
huliu-ibm122-46s4v-master-2         Running   bx2-4x16   eu-gb    eu-gb-3   4h21m
huliu-ibm122-46s4v-test-zlsfw       Failed                                  9s
huliu-ibm122-46s4v-worker-1-ftfsz   Running   bx2-4x16   eu-gb    eu-gb-1   16m
huliu-ibm122-46s4v-worker-2-9szft   Running   bx2-4x16   eu-gb    eu-gb-2   4h14m
huliu-ibm122-46s4v-worker-3-bvk9k   Running   bx2-4x16   eu-gb    eu-gb-3   12m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-ibm122-46s4v-test-zlsfw -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: Unknown
  creationTimestamp: "2022-08-29T05:49:56Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: huliu-ibm122-46s4v-test-
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-ibm122-46s4v
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: huliu-ibm122-46s4v-test
  name: huliu-ibm122-46s4v-test-zlsfw
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: huliu-ibm122-46s4v-test
    uid: bfbf39ca-281a-4ed5-8440-be6101089931
  resourceVersion: "106337"
  uid: 3c8e84ec-bb88-438f-94e1-fc94e1270654
spec:
  lifecycleHooks: {}
  metadata: {}
  providerSpec:
    value:
      apiVersion: ibmcloudproviderconfig.openshift.io/v1beta1
      credentialsSecret:
        name: ibmcloud-credentials
      image: huliu-ibm122-46s4v-rhcos
      kind: IBMCloudMachineProviderSpec
      metadata:
        creationTimestamp: null
      primaryNetworkInterface:
        securityGroups:
        - huliu-ibm122-46s4v-sg-cluster-wide
        - huliu-ibm122-46s4v-sg-openshift-net
        subnet: huliu-ibm122-46s4v-subnet-compute-eu-gb-1
      profile: invalid
      region: eu-gb
      resourceGroup: huliu-ibm122-46s4v
      userDataSecret:
        name: worker-user-data
      vpc: huliu-ibm122-46s4v-vpc
      zone: eu-gb-1
status:
  conditions:
  - lastTransitionTime: "2022-08-29T05:49:58Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-08-29T05:49:58Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-08-29T05:49:58Z"
    status: "True"
    type: Terminable
  errorMessage: 'could not find instance profile: invalid'
  errorReason: InvalidConfiguration
  lastUpdated: "2022-08-29T05:50:02Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastProbeTime: "2022-08-29T05:50:02Z"
      lastTransitionTime: "2022-08-29T05:50:02Z"
      message: 'could not find instance profile: invalid'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreated
Doc Notes:
Validation was added to the IBM Cloud Machine-API provider to make sure that the supplied MachineProvider profile matches an existing profile in the IBM Cloud region.
https://github.com/openshift/machine-api-provider-ibmcloud/commit/fdaa78f59c69630466effd9d212f066409fb93e8

The fix landed in OCP 4.12. I don't expect it to be backported to 4.11 or 4.10, as IBM Cloud IPI was still a Tech Preview in those releases. To pick up this fix, use 4.12 or later.
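As a rough illustration of the kind of validation described in the doc notes, the sketch below checks a requested profile name against a list of profiles for a region and returns an invalid-configuration style error when there is no match. It is an assumption-laden sketch, not the code from the linked commit: validateProfile and invalidConfiguration are hypothetical names, and the list of available profiles is hardcoded here, whereas a real provider would obtain it from IBM Cloud.

```go
package main

import "fmt"

// invalidConfiguration is a hypothetical error type standing in for the
// Machine API's invalid-configuration error.
type invalidConfiguration struct{ msg string }

func (e *invalidConfiguration) Error() string { return e.msg }

// validateProfile returns nil when the requested profile is present in the
// list of profiles available in the region, and an invalidConfiguration
// error otherwise. The error text mirrors the errorMessage in this thread.
func validateProfile(requested string, available []string) error {
	for _, name := range available {
		if name == requested {
			return nil
		}
	}
	return &invalidConfiguration{
		msg: fmt.Sprintf("could not find instance profile: %s", requested),
	}
}

func main() {
	// Sample data only; a real provider would query the region's profiles.
	available := []string{"bx2-4x16", "bx2-8x32"}

	fmt.Println(validateProfile("bx2-4x16", available)) // <nil>
	fmt.Println(validateProfile("invalid", available))  // could not find instance profile: invalid
}
```

Checking the profile before attempting to create the instance is what allows the failure to be reported as a configuration problem instead of a generic create error that keeps being retried.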
Thanks Christopher, I updated the doc text.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399