Bug 2090182 - [Nutanix]Create a machineset with invalid image, machine stuck in "Provisioning" phase
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Yanhua Li
QA Contact: Huali Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-25 10:21 UTC by Huali Liu
Modified: 2022-08-10 11:14 UTC (History)
0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:14:09 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-api-provider-nutanix pull 17 | 0 | None | open | Bug 2090182: [Nutanix]Create a machineset with invalid image, machine stuck in "Provisioning" phase | 2022-06-03 13:49:07 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:14:28 UTC

Description Huali Liu 2022-05-25 10:21:56 UTC
Description of problem:
After creating a machineset with an invalid image, the machine is stuck in the "Provisioning" phase.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-05-20-213928

How reproducible:
Always

Steps to Reproduce:
1. Create a machineset with an invalid image, for example:
          image:
            name: invalid
            type: name
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
machineset.machine.openshift.io/huliu-n9-659pd-t5 created

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                          PHASE          TYPE   REGION   ZONE   AGE
huliu-n9-659pd-master-0       Running                               7h55m
huliu-n9-659pd-master-1       Running                               7h55m
huliu-n9-659pd-master-2       Running                               7h55m
huliu-n9-659pd-t5-q45k8       Provisioning                          7m29s
huliu-n9-659pd-worker-4747l   Running                               7h52m
huliu-n9-659pd-worker-w4s9r   Running                               7h52m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-n9-659pd-t5-q45k8 -o yaml 
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: "2022-05-25T09:37:39Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: huliu-n9-659pd-t5-
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: huliu-n9-659pd
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: huliu-n9-659pd-t5
  name: huliu-n9-659pd-t5-q45k8
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: huliu-n9-659pd-t5
    uid: fdc7a156-2e88-44de-b738-81b93fdd433b
  resourceVersion: "186615"
  uid: eb1336f7-2955-4fd8-ac77-c8c2949f586a
spec:
  lifecycleHooks: {}
  metadata: {}
  providerSpec:
    value:
      apiVersion: machine.openshift.io/v1
      cluster:
        type: uuid
        uuid: 0005d9a4-8e4f-7c33-58d1-e9d0e2d48853
      credentialsSecret:
        name: nutanix-credentials
      image:
        name: invalid
        type: name
      kind: NutanixMachineProviderConfig
      memorySize: 16Gi
      metadata:
        creationTimestamp: null
      subnets:
      - type: uuid
        uuid: ae6e2fd8-79fe-4a88-a0d0-7d66cc45bdb1
      systemDiskSize: 120Gi
      userDataSecret:
        name: worker-user-data
      vcpuSockets: 4
      vcpusPerSocket: 1
status:
  conditions:
  - lastTransitionTime: "2022-05-25T09:37:40Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-05-25T09:37:40Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-05-25T09:37:40Z"
    status: "True"
    type: Terminable
  lastUpdated: "2022-05-25T09:37:40Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - message: 'failed to create VM: Failed to find image by name "invalid". error:
        %!w(<nil>)'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation
    - message: Machine instance is not ready
      reason: Machine instance is not ready
      status: "False"
      type: MachineInstanceReady
liuhuali@Lius-MacBook-Pro huali-test % 


Actual results:
The machine is stuck in the "Provisioning" phase, and no InvalidConfiguration error is reported.

Expected results:
The machine should go into the "Failed" phase.

Additional info:
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-8678477b8c-gfdt9  -c machine-controller |grep huliu-n9-659pd-t5-q45k8 
I0525 09:37:39.660291       1 controller.go:175] huliu-n9-659pd-t5-q45k8: reconciling Machine
I0525 09:37:39.670899       1 controller.go:175] huliu-n9-659pd-t5-q45k8: reconciling Machine
I0525 09:37:39.670963       1 actuator.go:114] huliu-n9-659pd-t5-q45k8: actuator checking if machine exists
I0525 09:37:39.671217       1 vm.go:190] Checking if VM with name "huliu-n9-659pd-t5-q45k8" exists.
{"filter":"vm_name==huliu-n9-659pd-t5-q45k8"}
{"api_version":"3.1","metadata":{"filter": "vm_name==huliu-n9-659pd-t5-q45k8", "total_matches": 0, "kind": "vm", "length": 0, "offset": 0},"entities":[]}
E0525 09:37:40.007433       1 vm.go:202] Not Found VM by name "huliu-n9-659pd-t5-q45k8". error: VM_NOT_FOUND
I0525 09:37:40.007453       1 controller.go:379] huliu-n9-659pd-t5-q45k8: setting phase to Provisioning and requeuing
I0525 09:37:40.007467       1 controller.go:504] huliu-n9-659pd-t5-q45k8: going into phase "Provisioning"
I0525 09:37:40.018433       1 controller.go:175] huliu-n9-659pd-t5-q45k8: reconciling Machine
I0525 09:37:40.018456       1 actuator.go:114] huliu-n9-659pd-t5-q45k8: actuator checking if machine exists
I0525 09:37:40.018565       1 vm.go:190] Checking if VM with name "huliu-n9-659pd-t5-q45k8" exists.
{"filter":"vm_name==huliu-n9-659pd-t5-q45k8"}
{"api_version":"3.1","metadata":{"filter": "vm_name==huliu-n9-659pd-t5-q45k8", "total_matches": 0, "kind": "vm", "length": 0, "offset": 0},"entities":[]}
E0525 09:37:40.318424       1 vm.go:202] Not Found VM by name "huliu-n9-659pd-t5-q45k8". error: VM_NOT_FOUND
I0525 09:37:40.318447       1 controller.go:386] huliu-n9-659pd-t5-q45k8: reconciling machine triggers idempotent create
I0525 09:37:40.318455       1 actuator.go:76] huliu-n9-659pd-t5-q45k8: actuator creating machine
I0525 09:37:40.318619       1 reconciler.go:41] huliu-n9-659pd-t5-q45k8: creating machine
E0525 09:37:40.911558       1 reconciler.go:56] huliu-n9-659pd-t5-q45k8: error creating machine vm. error: Failed to find image by name "invalid". error: %!w(<nil>)
I0525 09:37:40.911565       1 machine_scope.go:210] huliu-n9-659pd-t5-q45k8: Updating providerStatus
I0525 09:37:40.911576       1 machine_scope.go:153] huliu-n9-659pd-t5-q45k8: patching machine
E0525 09:37:40.930520       1 actuator.go:67] error: huliu-n9-659pd-t5-q45k8: reconciler failed to Create machine: failed to create VM: Failed to find image by name "invalid". error: %!w(<nil>)
W0525 09:37:40.930586       1 controller.go:388] huliu-n9-659pd-t5-q45k8: failed to create machine: huliu-n9-659pd-t5-q45k8: reconciler failed to Create machine: failed to create VM: Failed to find image by name "invalid". error: %!w(<nil>)
E0525 09:37:40.930635       1 controller.go:317] controller/machine_controller "msg"="Reconciler error" "error"="huliu-n9-659pd-t5-q45k8: reconciler failed to Create machine: failed to create VM: Failed to find image by name \"invalid\". error: %!w(<nil>)" "name"="huliu-n9-659pd-t5-q45k8" "namespace"="openshift-machine-api" 
I0525 09:37:40.930729       1 logr.go:252] events "msg"="Warning"  "message"="huliu-n9-659pd-t5-q45k8: reconciler failed to Create machine: failed to create VM: Failed to find image by name \"invalid\". error: %!w(<nil>)" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"huliu-n9-659pd-t5-q45k8","uid":"eb1336f7-2955-4fd8-ac77-c8c2949f586a","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"186615"} "reason"="FailedCreate"
I0525 09:37:40.931004       1 controller.go:175] huliu-n9-659pd-t5-q45k8: reconciling Machine
I0525 09:37:40.931025       1 actuator.go:114] huliu-n9-659pd-t5-q45k8: actuator checking if machine exists

Similar to https://bugzilla.redhat.com/show_bug.cgi?id=2062579

Comment 1 Yanhua Li 2022-06-07 22:36:21 UTC
The fix adds validation of the VM configuration fields in machine.spec.providerSpec before the Prism API is called to create a VM. If validation fails, an InvalidMachineConfiguration error is returned that lists all of the VM configuration errors.

Comment 4 Huali Liu 2022-06-16 02:46:20 UTC
Verified on 4.11.0-0.nightly-2022-06-15-161625

Steps:
1. Create a machineset with an invalid image
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
machineset.machine.openshift.io/huliu-n19-xtt9d-1 created

2. Check that the machine goes into the Failed phase and shows an InvalidConfiguration error
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
huliu-n19-xtt9d-1-ppvpf        Failed                           4s
huliu-n19-xtt9d-master-0       Running                          62m
huliu-n19-xtt9d-master-1       Running                          62m
huliu-n19-xtt9d-master-2       Running                          62m
huliu-n19-xtt9d-worker-96w2l   Running                          57m
huliu-n19-xtt9d-worker-x44fk   Running                          57m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-n19-xtt9d-1-ppvpf  -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2022-06-16T02:38:37Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2022-06-16T02:38:37Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2022-06-16T02:38:37Z"
    status: "True"
    type: Terminable
  errorMessage: 'huliu-n19-xtt9d-1-ppvpf: failed in validating machine providerSpec:
    spec.providerSpec.value.image.name: Invalid value: "huliu-n19-xtt9d-rhcosqqq":
    Failed to find image with name "huliu-n19-xtt9d-rhcosqqq". error: Failed to find
    image by name "huliu-n19-xtt9d-rhcosqqq". error: %!w(<nil>)'
  errorReason: InvalidConfiguration
  lastUpdated: "2022-06-16T02:38:38Z"
  phase: Failed
  providerStatus:
    conditions:
    - message: 'huliu-n19-xtt9d-1-ppvpf: failed in validating machine providerSpec:
        spec.providerSpec.value.image.name: Invalid value: "huliu-n19-xtt9d-rhcosqqq":
        Failed to find image with name "huliu-n19-xtt9d-rhcosqqq". error: Failed to
        find image by name "huliu-n19-xtt9d-rhcosqqq". error: %!w(<nil>)'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation
    - message: Machine instance is not ready
      reason: Machine instance is not ready
      status: "False"
      type: MachineInstanceReady

Comment 6 errata-xmlrpc 2022-08-10 11:14:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

