Bug 2062579 - [IBMCloud] Provide invalid profile machine stuck in "Provisioning" phase
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.12.0
Assignee: Christopher J Schaefer
QA Contact: Huali Liu
Docs Contact: Jeana Routh
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-10 07:59 UTC by Huali Liu
Modified: 2023-01-17 19:48 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, when creating a new `Machine` resource using a machine profile that did not exist in IBM Cloud, the machines became stuck in the `Provisioning` phase. With this release, validation is added to the IBM Cloud Machine API provider to ensure that a machine profile exists, and machines with an invalid machine profile are rejected by the Machine API. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2062579[*BZ#2062579*])
Clone Of:
Environment:
Last Closed: 2023-01-17 19:47:48 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-api-provider-ibmcloud pull 4 | 0 | None | open | Bug 2062579: IBMCloud: Verify machine profile | 2022-08-17 20:52:00 UTC
Red Hat Product Errata | RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:48:02 UTC

Description Huali Liu 2022-03-10 07:59:09 UTC
Description of problem:
Creating a machineset with an invalid profile leaves the machine stuck in the "Provisioning" phase.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-03-09-162729

How reproducible:
Always

Steps to Reproduce:
1. Create a machineset with an invalid profile, for example:
          profile: invalid

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
machineset.machine.openshift.io/huliu-ibm410-n5gb4-1 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                PHASE          TYPE       REGION   ZONE      AGE
huliu-ibm410-n5gb4-1-smfsb          Provisioning                                 3h59m
huliu-ibm410-n5gb4-master-0         Running        bx2-4x16   eu-gb    eu-gb-1   4h37m
huliu-ibm410-n5gb4-master-1         Running        bx2-4x16   eu-gb    eu-gb-2   4h37m
huliu-ibm410-n5gb4-master-2         Running        bx2-4x16   eu-gb    eu-gb-3   4h37m
huliu-ibm410-n5gb4-worker-1-72q6n   Running        bx2-4x16   eu-gb    eu-gb-1   4h30m
huliu-ibm410-n5gb4-worker-2-fppjj   Running        bx2-4x16   eu-gb    eu-gb-2   4h30m
huliu-ibm410-n5gb4-worker-3-x2ps7   Running        bx2-4x16   eu-gb    eu-gb-3   4h30m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-ibm410-n5gb4-1-smfsb -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: "2022-03-10T02:58:03Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: huliu-ibm410-n5gb4-1-
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-ibm410-n5gb4
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: huliu-ibm410-n5gb4-1
  name: huliu-ibm410-n5gb4-1-smfsb
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: huliu-ibm410-n5gb4-1
    uid: 26c09889-d82e-4452-a48c-d6c41f734d57
  resourceVersion: "34579"
  uid: a43a0f9b-f8c5-4448-b38d-5f2886f65f5d
spec:
  lifecycleHooks: {}
  metadata: {}
  providerSpec:
    value:
      apiVersion: ibmcloudproviderconfig.openshift.io/v1beta1
      credentialsSecret:
        name: ibmcloud-credentials
      image: huliu-ibm410-n5gb4-rhcos
      kind: IBMCloudMachineProviderSpec
      metadata:
        creationTimestamp: null
      primaryNetworkInterface:
        securityGroups:
        - huliu-ibm410-n5gb4-sg-cluster-wide
        - huliu-ibm410-n5gb4-sg-openshift-net
        subnet: huliu-ibm410-n5gb4-subnet-compute-eu-gb-1
      profile: invalid
      region: eu-gb
      resourceGroup: huliu-ibm410-n5gb4
      userDataSecret:
        name: worker-user-data
      vpc: huliu-ibm410-n5gb4-vpc
      zone: eu-gb-1
status:
  conditions:
  - lastTransitionTime: "2022-03-10T02:58:05Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-03-10T02:58:05Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-03-10T02:58:05Z"
    status: "True"
    type: Terminable
  lastUpdated: "2022-03-10T02:59:11Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastProbeTime: "2022-03-10T02:59:11Z"
      lastTransitionTime: "2022-03-10T02:59:11Z"
      message: the provided instance profile ID does not exist
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreated

Actual results:
The machine is stuck in the "Provisioning" phase, and no InvalidConfiguration error is reported.

Expected results:
The machine should be in the "Failed" phase.

Additional info:
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-5f9956cd56-xx7bl -c machine-controller |grep huliu-ibm410-n5gb4-1-smfsb
I0310 02:58:03.752644       1 controller.go:175] huliu-ibm410-n5gb4-1-smfsb: reconciling Machine
I0310 02:58:03.781492       1 controller.go:175] huliu-ibm410-n5gb4-1-smfsb: reconciling Machine
I0310 02:58:03.781583       1 actuator.go:122] huliu-ibm410-n5gb4-1-smfsb: Checking if machine exists
I0310 02:58:05.239453       1 controller.go:379] huliu-ibm410-n5gb4-1-smfsb: setting phase to Provisioning and requeuing
I0310 02:58:05.239489       1 controller.go:504] huliu-ibm410-n5gb4-1-smfsb: going into phase "Provisioning"
I0310 02:58:05.261134       1 controller.go:175] huliu-ibm410-n5gb4-1-smfsb: reconciling Machine
I0310 02:58:05.261179       1 actuator.go:122] huliu-ibm410-n5gb4-1-smfsb: Checking if machine exists
I0310 02:58:06.447862       1 controller.go:386] huliu-ibm410-n5gb4-1-smfsb: reconciling machine triggers idempotent create
I0310 02:58:06.447892       1 actuator.go:75] huliu-ibm410-n5gb4-1-smfsb: Creating machine
E0310 02:59:02.931441       1 reconciler.go:69] huliu-ibm410-n5gb4-1-smfsb: error occured while creating machine: %!w(*errors.errorString=&{the provided instance profile ID does not exist})
I0310 02:59:02.931580       1 machine_scope.go:156] "huliu-ibm410-n5gb4-1-smfsb": patching machine
E0310 02:59:02.968716       1 actuator.go:66] huliu-ibm410-n5gb4-1-smfsb error: huliu-ibm410-n5gb4-1-smfsb: reconciler failed to Create machine: failed to create instance via ibm vpc client: the provided instance profile ID does not exist
W0310 02:59:02.968756       1 controller.go:388] huliu-ibm410-n5gb4-1-smfsb: failed to create machine: huliu-ibm410-n5gb4-1-smfsb: reconciler failed to Create machine: failed to create instance via ibm vpc client: the provided instance profile ID does not exist
E0310 02:59:02.968803       1 controller.go:317] controller/machine_controller "msg"="Reconciler error" "error"="huliu-ibm410-n5gb4-1-smfsb: reconciler failed to Create machine: failed to create instance via ibm vpc client: the provided instance profile ID does not exist" "name"="huliu-ibm410-n5gb4-1-smfsb" "namespace"="openshift-machine-api" 
I0310 02:59:02.968855       1 controller.go:175] huliu-ibm410-n5gb4-1-smfsb: reconciling Machine
I0310 02:59:02.968865       1 actuator.go:122] huliu-ibm410-n5gb4-1-smfsb: Checking if machine exists
I0310 02:59:02.969358       1 logr.go:252] events "msg"="Warning"  "message"="huliu-ibm410-n5gb4-1-smfsb: reconciler failed to Create machine: failed to create instance via ibm vpc client: the provided instance profile ID does not exist" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"huliu-ibm410-n5gb4-1-smfsb","uid":"a43a0f9b-f8c5-4448-b38d-5f2886f65f5d","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"34218"} "reason"="FailedCreate"
I0310 02:59:04.189279       1 controller.go:386] huliu-ibm410-n5gb4-1-smfsb: reconciling machine triggers idempotent create

Comment 2 Huali Liu 2022-03-11 03:55:11 UTC
By the way, machines on other cloud providers (aws, azure, gcp, alicloud) go into the Failed phase in this case.

Comment 4 Christopher J Schaefer 2022-05-06 14:12:38 UTC
We (IBM) have added an internal tracking issue for this and will attempt to get this addressed for 4.11.

Comment 5 Joel Speed 2022-08-08 16:57:26 UTC
I checked in with Chris, and this is still on the backlog for the IBM team.

Comment 7 Christopher J Schaefer 2022-08-17 20:51:18 UTC
I tested a fix and it properly raises the necessary InvalidConfigurationMachineError so the deployment is labeled as Failed, versus Provisioning.

A PR with this update has been opened:
https://github.com/openshift/machine-api-provider-ibmcloud/pull/4
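
The effect of raising a typed configuration error can be sketched as follows. This is a simplified Python illustration, not the provider's actual Go code. The names `InvalidMachineConfiguration`, `phase_after_create`, `create_before_fix`, and `create_after_fix` are hypothetical, and the sketch assumes the controller marks a machine as Failed only when creation fails with an invalid-configuration error, and otherwise treats the failure as transient and retries. The two error messages are the ones that appear in this bug.

```python
class InvalidMachineConfiguration(Exception):
    """Hypothetical stand-in for an invalid-configuration machine error."""


def phase_after_create(create):
    """Return the phase one reconcile pass leaves the machine in (simplified)."""
    try:
        create()
    except InvalidMachineConfiguration:
        # Terminal: the spec can never succeed, so stop retrying.
        return "Failed"
    except Exception:
        # Looks transient: keep the machine in Provisioning and requeue.
        return "Provisioning"
    return "Provisioned"


def create_before_fix():
    # A generic error: the controller cannot tell it is a bad spec.
    raise RuntimeError("the provided instance profile ID does not exist")


def create_after_fix():
    # A typed error: the controller can tell retrying is pointless.
    raise InvalidMachineConfiguration(
        "could not find instance profile: bad-profile"
    )


print(phase_after_create(create_before_fix))  # Provisioning
print(phase_after_create(create_after_fix))   # Failed
```

The order of the `except` clauses matters in this sketch: the typed error has to be checked before the generic fallback, or every failure would be retried.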


# oc get machines -n openshift-machine-api; oc get -n openshift-machine-api machine.machine.openshift.io/bz2062579-us-east-3-hzb8t-worker-us-east-tsc48 -o yaml
NAME                                             PHASE     TYPE       REGION    ZONE        AGE
bz2062579-us-east-3-hzb8t-master-0               Running   bx2-4x16   us-east   us-east-1   41m
bz2062579-us-east-3-hzb8t-master-1               Running   bx2-4x16   us-east   us-east-2   41m
bz2062579-us-east-3-hzb8t-master-2               Running   bx2-4x16   us-east   us-east-3   41m
bz2062579-us-east-3-hzb8t-worker-1-bttbr         Running   bx2-4x16   us-east   us-east-1   33m
bz2062579-us-east-3-hzb8t-worker-2-5cvvh         Running   bx2-4x16   us-east   us-east-2   33m
bz2062579-us-east-3-hzb8t-worker-3-6pw58         Running   bx2-4x16   us-east   us-east-3   33m
bz2062579-us-east-3-hzb8t-worker-us-east-tsc48   Failed                                     3m59s
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: Unknown
  creationTimestamp: "2022-08-17T20:39:14Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: bz2062579-us-east-3-hzb8t-worker-us-east-
  generation: 2
  labels:
    machine.openshift.io/cluster-api-cluster: bz2062579-us-east-3-hzb8t
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: bz2062579-us-east-3-hzb8t-worker-us-east
  name: bz2062579-us-east-3-hzb8t-worker-us-east-tsc48
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: bz2062579-us-east-3-hzb8t-worker-us-east
    uid: 37b5b59a-0e77-4b4b-a101-894843b7603b
  resourceVersion: "32873"
  uid: 23e0e79f-fb98-4757-ba1b-d61b301819e1
spec:
  lifecycleHooks: {}
  metadata:
    labels:
      node-role.kubernetes.io/infra: ""
  providerSpec:
    value:
      apiVersion: ibmcloudproviderconfig.openshift.io/v1beta1
      credentialsSecret:
        name: ibmcloud-credentials
      image: bz2062579-us-east-3-hzb8t-rhcos
      kind: IBMCloudMachineProviderSpec
      metadata: {}
      primaryNetworkInterface:
        securityGroups:
        - bz2062579-us-east-3-hzb8t-sg-cluster-wide
        - bz2062579-us-east-3-hzb8t-sg-openshift-net
        subnet: bz2062579-us-east-3-hzb8t-subnet-compute-us-east-1
      profile: bad-profile
      region: us-east
      resourceGroup: bz2062579-us-east-3-hzb8t
      userDataSecret:
        name: worker-user-data
      vpc: bz2062579-us-east-3-hzb8t-vpc
      zone: us-east-1
status:
  conditions:
  - lastTransitionTime: "2022-08-17T20:39:16Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-08-17T20:39:16Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-08-17T20:39:16Z"
    status: "True"
    type: Terminable
  errorMessage: 'could not find instance profile: bad-profile'
  errorReason: InvalidConfiguration
  lastUpdated: "2022-08-17T20:39:22Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastProbeTime: "2022-08-17T20:39:22Z"
      lastTransitionTime: "2022-08-17T20:39:22Z"
      message: 'could not find instance profile: bad-profile'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreated

Comment 9 Huali Liu 2022-08-23 05:41:25 UTC
I tried to verify this on the newest nightly build, 4.12.0-0.nightly-2022-08-22-143022, but the issue still exists.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine                                                                                        
NAME                               PHASE          TYPE       REGION   ZONE      AGE
zhsunibm823-s9l4j-1-llh6d          Provisioning                                 41m
zhsunibm823-s9l4j-master-0         Running        bx2-4x16   eu-gb    eu-gb-1   86m
zhsunibm823-s9l4j-master-1         Running        bx2-4x16   eu-gb    eu-gb-2   86m
zhsunibm823-s9l4j-master-2         Running        bx2-4x16   eu-gb    eu-gb-3   86m
zhsunibm823-s9l4j-worker-1-rkqjl   Running        bx2-4x16   eu-gb    eu-gb-1   79m
zhsunibm823-s9l4j-worker-2-gcwz6   Running        bx2-4x16   eu-gb    eu-gb-2   79m
zhsunibm823-s9l4j-worker-3-z24vq   Running        bx2-4x16   eu-gb    eu-gb-3   79m

I then checked the release image and found that the repo https://github.com/openshift/machine-api-provider-ibmcloud is not included in it. Does this repo need to be included in the image? If so, how is that done?

$ oc adm release info registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-08-22-143022 --commits  | grep ibm                     
  ibm-cloud-controller-manager                   https://github.com/openshift/cloud-provider-ibm                             3d22fae892174bcdd99ddf679826b48ee7023a85
  ibm-vpc-block-csi-driver                       https://github.com/openshift/ibm-vpc-block-csi-driver                       80636ef324f4500b132a7cc236cc9d6caf95f5d0
  ibm-vpc-block-csi-driver-operator              https://github.com/openshift/ibm-vpc-block-csi-driver-operator              233dedbe6fd89dd14e3efa15462c0053f7d92ff5
  ibm-vpc-node-label-updater                     https://github.com/openshift/ibm-vpc-node-label-updater                     64c1820764f8a7065b03b08a70673b8c125876c1
  ibmcloud-machine-controllers                   https://github.com/openshift/cluster-api-provider-ibmcloud                  3bde969f2e83ca3dc2a57ef4001a194916e7fdd9
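
The check above can be scripted. The sketch below is a minimal Python example that parses text in the three-column layout shown above (component name, repository URL, commit) and reports whether a given repository is part of the payload. The function names `payload_repos` and `repo_in_payload` are hypothetical, and the sample lines are copied from the output above. In that output, the `ibmcloud-machine-controllers` component points at a different repository than the one that received the fix.

```python
SAMPLE = """\
  ibm-cloud-controller-manager                   https://github.com/openshift/cloud-provider-ibm                             3d22fae892174bcdd99ddf679826b48ee7023a85
  ibmcloud-machine-controllers                   https://github.com/openshift/cluster-api-provider-ibmcloud                  3bde969f2e83ca3dc2a57ef4001a194916e7fdd9
"""


def payload_repos(text):
    """Map component name to repository URL for each three-column line."""
    repos = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only lines shaped like: <component> <https URL> <commit>
        if len(fields) == 3 and fields[1].startswith("https://"):
            repos[fields[0]] = fields[1]
    return repos


def repo_in_payload(text, repo_url):
    """Return True if any component in the listing is built from repo_url."""
    return repo_url in payload_repos(text).values()


fix_repo = "https://github.com/openshift/machine-api-provider-ibmcloud"
print(payload_repos(SAMPLE)["ibmcloud-machine-controllers"])
# https://github.com/openshift/cluster-api-provider-ibmcloud
print(repo_in_payload(SAMPLE, fix_repo))  # False
```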

Comment 10 Joel Speed 2022-08-23 14:36:23 UTC
We are waiting on https://issues.redhat.com/browse/ART-4438 to be completed before the image is available within the payload.

Comment 11 Huali Liu 2022-08-29 06:02:06 UTC
Verified on 4.12.0-0.nightly-2022-08-27-164831

1. Create a machineset with an invalid profile, for example:
          profile: invalid

liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
machineset.machine.openshift.io/huliu-ibm122-46s4v-test created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                PHASE     TYPE       REGION   ZONE      AGE
huliu-ibm122-46s4v-master-0         Running   bx2-4x16   eu-gb    eu-gb-1   4h21m
huliu-ibm122-46s4v-master-1         Running   bx2-4x16   eu-gb    eu-gb-2   4h21m
huliu-ibm122-46s4v-master-2         Running   bx2-4x16   eu-gb    eu-gb-3   4h21m
huliu-ibm122-46s4v-test-zlsfw       Failed                                  9s
huliu-ibm122-46s4v-worker-1-ftfsz   Running   bx2-4x16   eu-gb    eu-gb-1   16m
huliu-ibm122-46s4v-worker-2-9szft   Running   bx2-4x16   eu-gb    eu-gb-2   4h14m
huliu-ibm122-46s4v-worker-3-bvk9k   Running   bx2-4x16   eu-gb    eu-gb-3   12m

liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-ibm122-46s4v-test-zlsfw -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: Unknown
  creationTimestamp: "2022-08-29T05:49:56Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: huliu-ibm122-46s4v-test-
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-ibm122-46s4v
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: huliu-ibm122-46s4v-test
  name: huliu-ibm122-46s4v-test-zlsfw
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: huliu-ibm122-46s4v-test
    uid: bfbf39ca-281a-4ed5-8440-be6101089931
  resourceVersion: "106337"
  uid: 3c8e84ec-bb88-438f-94e1-fc94e1270654
spec:
  lifecycleHooks: {}
  metadata: {}
  providerSpec:
    value:
      apiVersion: ibmcloudproviderconfig.openshift.io/v1beta1
      credentialsSecret:
        name: ibmcloud-credentials
      image: huliu-ibm122-46s4v-rhcos
      kind: IBMCloudMachineProviderSpec
      metadata:
        creationTimestamp: null
      primaryNetworkInterface:
        securityGroups:
        - huliu-ibm122-46s4v-sg-cluster-wide
        - huliu-ibm122-46s4v-sg-openshift-net
        subnet: huliu-ibm122-46s4v-subnet-compute-eu-gb-1
      profile: invalid
      region: eu-gb
      resourceGroup: huliu-ibm122-46s4v
      userDataSecret:
        name: worker-user-data
      vpc: huliu-ibm122-46s4v-vpc
      zone: eu-gb-1
status:
  conditions:
  - lastTransitionTime: "2022-08-29T05:49:58Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-08-29T05:49:58Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-08-29T05:49:58Z"
    status: "True"
    type: Terminable
  errorMessage: 'could not find instance profile: invalid'
  errorReason: InvalidConfiguration
  lastUpdated: "2022-08-29T05:50:02Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastProbeTime: "2022-08-29T05:50:02Z"
      lastTransitionTime: "2022-08-29T05:50:02Z"
      message: 'could not find instance profile: invalid'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreated
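
The verification above comes down to two status fields. As a minimal sketch, assuming the machine object is available as a Python dict (for example, by parsing the output of `oc get machine <name> -o json`), a helper like the hypothetical `rejected_for_invalid_config` below captures the expected outcome. The sample values are taken from the status shown above.

```python
def rejected_for_invalid_config(machine):
    """True if the machine failed with an InvalidConfiguration error reason."""
    status = machine.get("status", {})
    return (
        status.get("phase") == "Failed"
        and status.get("errorReason") == "InvalidConfiguration"
    )


# Status fields as reported after the fix.
fixed = {
    "status": {
        "phase": "Failed",
        "errorReason": "InvalidConfiguration",
        "errorMessage": "could not find instance profile: invalid",
    }
}

# Status fields as reported before the fix: no errorReason, still Provisioning.
stuck = {"status": {"phase": "Provisioning"}}

print(rejected_for_invalid_config(fixed))  # True
print(rejected_for_invalid_config(stuck))  # False
```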

Comment 13 Christopher J Schaefer 2022-11-23 18:54:30 UTC
Doc Notes:

Validation was added to the IBM Cloud Machine API provider to make sure that the supplied MachineProvider profile matches an existing profile in the IBM Cloud region.
https://github.com/openshift/machine-api-provider-ibmcloud/commit/fdaa78f59c69630466effd9d212f066409fb93e8
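
As a rough illustration of that validation (not the code in the linked commit), the Python sketch below checks a profile name against the set of profile names known for a region before any instance is created. The names `InvalidMachineConfiguration` and `validate_profile` are hypothetical, the region data is an assumed example, and how the real provider obtains the list of profiles is not shown here.

```python
class InvalidMachineConfiguration(Exception):
    """Hypothetical error type meaning the machine spec itself is invalid."""


def validate_profile(profile, region_profiles):
    """Raise if `profile` is not among the profile names known for the region."""
    if profile not in region_profiles:
        raise InvalidMachineConfiguration(
            f"could not find instance profile: {profile}"
        )


# Assumed example data; a real region offers many more profiles.
region_profiles = {"bx2-4x16"}

validate_profile("bx2-4x16", region_profiles)  # valid: returns without error

try:
    validate_profile("invalid", region_profiles)
except InvalidMachineConfiguration as err:
    print(err)  # could not find instance profile: invalid
```

Validating up front means the failure is reported as a configuration problem rather than surfacing later as a generic instance-creation error.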

The fix landed in OCP 4.12, but I don't expect the fix to be backported to 4.11 or 4.10, as IBM Cloud IPI was still a Tech Preview in those releases.

To pick up this fix, it is recommended to use 4.12 or later.

Comment 14 Michael McCune 2022-11-30 14:07:00 UTC
Thanks Christopher, I updated the doc text.

Comment 16 errata-xmlrpc 2023-01-17 19:47:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

