Bug 1815435 - [RHV] New machine stuck at 'Provisioned' phase
Summary: [RHV] New machine stuck at 'Provisioned' phase
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.4
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: ---
: 4.5.0
Assignee: Roy Golan
QA Contact: Jianwei Hou
URL:
Whiteboard: ovirt
Depends On:
Blocks: 1817853
TreeView+ depends on / blocked
 
Reported: 2020-03-20 09:27 UTC by Jianwei Hou
Modified: 2020-06-09 14:06 UTC (History)
0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1817853 (view as bug list)
Environment:
Last Closed: 2020-06-09 14:06:30 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Machine approver log (141.27 KB, text/plain)
2020-03-20 09:29 UTC, Jianwei Hou
no flags Details
machine- ontroller log (2.88 MB, text/plain)
2020-03-20 09:42 UTC, Jianwei Hou
no flags Details

Description Jianwei Hou 2020-03-20 09:27:27 UTC
Description of problem:
Scale up a machineset on RHV, new machine is stuck at 'Provisioned' phase and can not become running.

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-03-18-102708

How reproducible:
Always

Steps to Reproduce:
1. On an IPI on RHV deployment, clone(create) a machineset, with replicas=1
2. Machine phase can not become 'Running', it's stuck at 'Provisioned' state.
In the yaml output of the machine, it has a status.addresses with type=internalDNS, that is the node being provisioned.
3. List cluster nodes, found that node is already added to cluster with 'Ready' status before machine becomes 'Running'.
4. oc get csr, found that the csr is still pending, however its node already becomes 'Ready'. 

oc get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-2dblb   23m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-l5495   8m21s   system:node:machineset-clone-25436-dz8zk                                    Pending
csr-xcv9q   23m     system:node:machineset-clone-25436-dz8zk                                    Pending

machine-approver log:
I0320 08:34:06.591051       1 main.go:146] CSR csr-xcv9q added
I0320 08:34:06.643897       1 csr_check.go:418] retrieving serving cert from machineset-clone-25436-dz8zk (10.37.142.239:10250)
W0320 08:34:06.648505       1 csr_check.go:178] Failed to retrieve current serving cert: remote error: tls: internal error
I0320 08:34:06.649046       1 csr_check.go:183] Falling back to machine-api authorization for machineset-clone-25436-dz8zk
I0320 08:34:06.649116       1 main.go:181] CSR csr-xcv9q not authorized: No target machine for node "machineset-clone-25436-dz8zk"
I0320 08:34:06.649282       1 main.go:217] Error syncing csr csr-xcv9q: No target machine for node "machineset-clone-25436-dz8zk"

Actual results:
After step 2
# oc get machine machineset-clone-25436-dz8zk -o yaml -n openshift-machine-api

```
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    instance-status: |
      {"kind":"Machine","apiVersion":"machine.openshift.io/v1beta1","metadata":{"name":"machineset-clone-25436-dz8zk","generateName":"machineset-clone-25436-","namespace":"openshift-machine-api","selfLink":"/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/machineset-clone-25436-dz8zk","uid":"268636af-2605-4940-808b-4c93af9f4439","resourceVersion":"1696820","generation":1,"creationTimestamp":"2020-03-20T08:24:00Z","labels":{"machine.openshift.io/cluster-api-cluster":"six-g7gng","machine.openshift.io/cluster-api-machine-role":"worker","machine.openshift.io/cluster-api-machine-type":"worker","machine.openshift.io/cluster-api-machineset":"machineset-clone-25436"},"annotations":{"VmId":"7c32b1e1-2d64-4d8f-a2bc-b8ebd3effa93","instance-status":""},"ownerReferences":[{"apiVersion":"machine.openshift.io/v1beta1","kind":"MachineSet","name":"machineset-clone-25436","uid":"30021136-c1ad-4a3b-afd5-b80aec5454fb","controller":true,"blockOwnerDeletion":true}],"finalizers":["machine.machine.openshift.io"]},"spec":{"metadata":{"creationTimestamp":null},"providerSpec":{"value":{"apiVersion":"ovirtproviderconfig.openshift.io/v1beta1","cluster_id":"7321c6b7-9abb-4de6-affa-b4ce9b4124a0","credentialsSecret":{"name":"ovirt-credentials"},"id":"","kind":"OvirtMachineProviderSpec","metadata":{"creationTimestamp":null},"name":"","template_name":"six-g7gng-rhcos","userDataSecret":{"name":"worker-user-data"}}},"providerID":"7c32b1e1-2d64-4d8f-a2bc-b8ebd3effa93"},"status":{"lastUpdated":"2020-03-20T08:40:49Z","providerStatus":{"metadata":{"creationTimestamp":null},"instanceId":"machineset-clone-25436-dz8zk","instanceState":"up","conditions":[{"type":"MachineCreated","status":"True","lastProbeTime":"2020-03-20T08:40:49Z","lastTransitionTime":"2020-03-20T08:40:49Z","reason":"machineCreationSucceedReason","message":"machineCreationSucceedMessage"}]},"addresses":[{"type":"InternalDNS","address":"machineset-clone-25436-dz8zk"},{"type":"InternalIP","address":"10.37.142.239"}],"phase":"Provisioned"}}
  creationTimestamp: "2020-03-20T08:24:00Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: machineset-clone-25436-
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: six-g7gng
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: machineset-clone-25436
  name: machineset-clone-25436-dz8zk
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: machine.openshift.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: machineset-clone-25436
    uid: 30021136-c1ad-4a3b-afd5-b80aec5454fb
  resourceVersion: "1696850"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/machineset-clone-25436-dz8zk
  uid: 268636af-2605-4940-808b-4c93af9f4439
spec:
  metadata:
    creationTimestamp: null
  providerSpec:
    value:
      apiVersion: ovirtproviderconfig.openshift.io/v1beta1
      cluster_id: 7321c6b7-9abb-4de6-affa-b4ce9b4124a0
      credentialsSecret:
        name: ovirt-credentials
      id: ""
      kind: OvirtMachineProviderSpec
      metadata:
        creationTimestamp: null
      name: ""
      template_name: six-g7gng-rhcos
      userDataSecret:
        name: worker-user-data
status:
  addresses:
  - address: machineset-clone-25436-dz8zk
    type: InternalDNS
  - address: 10.37.142.239
    type: InternalIP
  lastUpdated: "2020-03-20T08:40:49Z"
  phase: Provisioned
  providerStatus:
    conditions:
    - lastProbeTime: "2020-03-20T08:40:49Z"
      lastTransitionTime: "2020-03-20T08:40:49Z"
      message: machineCreationSucceedMessage
      reason: machineCreationSucceedReason
      status: "True"
      type: MachineCreated
    instanceId: machineset-clone-25436-dz8zk
    instanceState: up
    metadata:
      creationTimestamp: null
```

After step 3:
# oc get node machineset-clone-25436-dz8zk -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-worker-3a7892df1405153af5843ced2eb2124f
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-3a7892df1405153af5843ced2eb2124f
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2020-03-20T08:28:36Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: machineset-clone-25436-dz8zk
    kubernetes.io/os: linux
    node-role.kubernetes.io/worker: ""
    node.openshift.io/os_id: rhcos
  name: machineset-clone-25436-dz8zk
  resourceVersion: "1700847"
  selfLink: /api/v1/nodes/machineset-clone-25436-dz8zk
  uid: 8191284b-25fd-44a7-8ecd-1d7839c94e0c
spec: {}
status:
  addresses:
  - address: 10.37.142.239
    type: InternalIP
  - address: machineset-clone-25436-dz8zk
    type: Hostname
  allocatable:
    cpu: 3500m
    ephemeral-storage: "13879457768"
    hugepages-2Mi: "0"
    memory: 15258944Ki
    pods: "250"
  capacity:
    cpu: "4"
    ephemeral-storage: 16225260Ki
    hugepages-2Mi: "0"
    memory: 16409920Ki
    pods: "250"
  conditions:
  - lastHeartbeatTime: "2020-03-20T08:46:18Z"
    lastTransitionTime: "2020-03-20T08:28:36Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2020-03-20T08:46:18Z"
    lastTransitionTime: "2020-03-20T08:28:36Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2020-03-20T08:46:18Z"
    lastTransitionTime: "2020-03-20T08:28:36Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2020-03-20T08:46:18Z"
    lastTransitionTime: "2020-03-20T08:30:27Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:683de47331b97c6b9fda23e5e4d30ca6ed204cb1af255d92fee0609beb35e4e8
    - <none>:<none>
    sizeBytes: 819364305
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7a2d829994da699c14da9ec713c697ad81ae4667c6f33b3842246fb644ffb5a2
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 467168186
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1be47db5f7ade2960581367a23c2574b774b0f60a5672bdafd8589b1c87779f6
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 443387144
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:108cef723b7ea6e1322e7addc6d8ede14200b4801d2f6acb1d4c2c446376421c
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 429699040
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:feab9b86f8829fc5e98a5e04ee5dfd2169e165fdefed183d544f56829409dd32
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 364316978
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d71d621566014b622fcc830e390d77fd831d28cf51770309f50aa1a541aec3ee
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 342462123
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f2a1a49204ec502c0c567688faa921f1165d8b130e1360c8fe2916932a6ad7c8
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 337457196
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3b9c1bd3aea24770c1a6daf3448d84d46f3e76433041d1ba21132c240eb49ef9
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 325417773
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4d1150eeb6e587161b278595f90580e79f21479a3fed95a56b7f9b3165e2bb58
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 318520021
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:480153ace2b8733a3cfc1c3f521f972b605efd1ea89ccd5786bfd9d7667af2d2
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 317938245
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1f0de2e2f25da5df864b7ccf993d84a4032c912ad6d8bf4fccd62d3ee6877a8b
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 312586096
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8b1ef25de733758ec14635f4e4a4951cd0d11cfbe031ae3fb030bada9db4abc2
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 305426192
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6662962a1e3cf0867e520206bf679609d59a9ec3b6c257cd9eeb4bf331323bda
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 283277682
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eea59d098d0a88657c3e268adc6c117a4050a377ec373d4806e21e2b6cb39dbb
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 267495412
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5782342a322303a684ac6de9bd05f931b7422e866285a8ba869c2ccaed209914
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 264047880
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dc45c8a26642cee797c6daa32ff10f9059192bc7c0bbab3ac955a6c53935e21d
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 255672168
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b31063f8cb53a137f515394838dd4ede503bcf94125e64c2e87401b5243bd067
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 250871136
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a79b8c47f97205626b95fba4d54f356c4f8f30491becd33155582744530aba2
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none>
    sizeBytes: 237179055
  nodeInfo:
    architecture: amd64
    bootID: 3eed5f48-772d-4f41-8e17-e519ae2eef80
    containerRuntimeVersion: cri-o://1.17.0-9.dev.rhaos4.4.gitdfc8414.el8
    kernelVersion: 4.18.0-147.5.1.el8_1.x86_64
    kubeProxyVersion: v1.17.1
    kubeletVersion: v1.17.1
    machineID: 7c32b1e12d644d8fa2bcb8ebd3effa93
    operatingSystem: linux
    osImage: Red Hat Enterprise Linux CoreOS 44.81.202003180730-0 (Ootpa)
    systemUUID: 7c32b1e1-2d64-4d8f-a2bc-b8ebd3effa93
Expected results:


Additional info:

Comment 1 Jianwei Hou 2020-03-20 09:29:54 UTC
Created attachment 1671807 [details]
Machine approver log

Comment 2 Jianwei Hou 2020-03-20 09:42:56 UTC
Created attachment 1671821 [details]
machine- ontroller log

On deleting the machineset and machine, node can not be removed automatically.

# oc get nodes
NAME                           STATUS     ROLES    AGE   VERSION
machineset-clone-25436-dz8zk   NotReady   worker   70m   v1.17.1
six-g7gng-master-0             Ready      master   41h   v1.17.1
six-g7gng-master-1             Ready      master   41h   v1.17.1
six-g7gng-master-2             NotReady   master   41h   v1.17.1
six-g7gng-worker-0-4lfbj       Ready      worker   41h   v1.17.1
six-g7gng-worker-0-569xv       Ready      worker   41h   v1.17.1

Comment 3 Sandro Bonazzola 2020-06-09 14:06:30 UTC
Looking at https://bugzilla.redhat.com/show_bug.cgi?id=1817853#c8 I'm closing this as well as not a bug for the same reason.


Note You need to log in before you can comment on or make changes to this bug.