Bug 1861773

Summary: [UPI vSphere] Worker nodes' CSRs are not automatically approved
Product: OpenShift Container Platform
Component: Cloud Compute (sub-component: Other Providers)
Reporter: Alberto <agarcial>
Assignee: Alberto <agarcial>
QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA
Severity: high
Priority: medium
CC: ademicev, agarcial, zhsun
Version: 4.5
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 1843384
Bug Depends On: 1843384
Last Closed: 2020-10-26 15:11:50 UTC

Comment 3 sunzhaohua 2020-08-12 05:59:34 UTC
Verification failed.
clusterversion: 4.5.0-0.nightly-2020-08-11-174348
The machine got stuck in the Provisioned phase and never reported an InternalIP.
Steps:
1. Set up a UPI vSphere cluster.
2. Modified the MachineSet's "replicas", "networkName", "template", and "resourcePool", and added one tag in vCenter:
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 120
          kind: VSphereMachineProviderSpec
          memoryMiB: 8192
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: VM Network
          numCPUs: 2
          numCoresPerSocket: 1
          snapshot: ""
          template: rhcos-46.82.202008111140-0
          userDataSecret:
            name: worker-user-data
          workspace:
            datacenter: dc1
            datastore: 10TB-GOLD
            folder: /dc1/vm/zhsunvsphere1-dmh68
            resourcePool: /dc1/host/devel/Resources
            server: vcsa2-qe.vmware.devcluster.openshift.com
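
Step 2's edits amount to a merge patch against the MachineSet. A sketch of the patch body, built from the providerSpec fields shown above (an illustration of the shape only, not the exact commands the tester ran; the MachineSet name and `oc patch` invocation in the comment are assumptions):

```python
import json

# Shape of the step-2 edit: set replicas and point the providerSpec at a
# different template, network, and resource pool. In practice this would be
# applied with something like:
#   oc patch machineset <name> -n openshift-machine-api --type merge -p "$(cat patch.json)"
patch = {
    "spec": {
        "replicas": 1,
        "template": {"spec": {"providerSpec": {"value": {
            "template": "rhcos-46.82.202008111140-0",
            "network": {"devices": [{"networkName": "VM Network"}]},
            "workspace": {"resourcePool": "/dc1/host/devel/Resources"},
        }}}},
    }
}
print(json.dumps(patch, indent=2))
```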

3. Checked machine status and logs:
$ oc get machine
NAME                               PHASE         TYPE   REGION   ZONE   AGE
zhsunvsphere1-dmh68-worker-w77pd   Provisioned                          35m

status:
  addresses:
  - address: zhsunvsphere1-dmh68-worker-w77pd
    type: InternalDNS
  lastUpdated: "2020-08-12T05:16:24Z"
  phase: Provisioned
  providerStatus:
    conditions:
    - lastProbeTime: "2020-08-12T05:16:14Z"
      lastTransitionTime: "2020-08-12T05:16:14Z"
      message: Machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: 422ba03b-9481-be4e-e52f-8123455fad2b
    instanceState: poweredOn
    taskRef: task-30999

I0812 05:52:39.480735       1 controller.go:169] zhsunvsphere1-dmh68-worker-w77pd: reconciling Machine
I0812 05:52:39.480892       1 actuator.go:80] zhsunvsphere1-dmh68-worker-w77pd: actuator checking if machine exists
I0812 05:52:39.499941       1 session.go:113] Find template by instance uuid: 9a22317a-a103-4b84-a494-7abb75e6db77
I0812 05:52:39.502783       1 reconciler.go:158] zhsunvsphere1-dmh68-worker-w77pd: already exists
I0812 05:52:39.502820       1 controller.go:277] zhsunvsphere1-dmh68-worker-w77pd: reconciling machine triggers idempotent update
I0812 05:52:39.502828       1 actuator.go:94] zhsunvsphere1-dmh68-worker-w77pd: actuator updating machine
I0812 05:52:39.521554       1 session.go:113] Find template by instance uuid: 9a22317a-a103-4b84-a494-7abb75e6db77
I0812 05:52:39.793092       1 reconciler.go:801] zhsunvsphere1-dmh68-worker-w77pd: Reconciling attached tags
I0812 05:52:39.906388       1 reconciler.go:211] zhsunvsphere1-dmh68-worker-w77pd: reconciling machine with cloud state
I0812 05:52:40.426696       1 reconciler.go:219] zhsunvsphere1-dmh68-worker-w77pd: reconciling providerID
I0812 05:52:40.431219       1 reconciler.go:224] zhsunvsphere1-dmh68-worker-w77pd: reconciling network
I0812 05:52:40.436215       1 reconciler.go:870] Getting network status: object reference: vm-4367
I0812 05:52:40.436277       1 reconciler.go:879] Getting network status: device: VM Network, macAddress: 00:50:56:ab:97:ff
I0812 05:52:40.436287       1 reconciler.go:884] Getting network status: getting guest info
I0812 05:52:40.438919       1 reconciler.go:329] zhsunvsphere1-dmh68-worker-w77pd: reconciling network: IP addresses: [{InternalDNS zhsunvsphere1-dmh68-worker-w77pd}]
I0812 05:52:40.438984       1 reconciler.go:229] zhsunvsphere1-dmh68-worker-w77pd: reconciling powerstate annotation
I0812 05:52:40.443037       1 reconciler.go:653] zhsunvsphere1-dmh68-worker-w77pd: Updating provider status
I0812 05:52:40.452930       1 machine_scope.go:101] zhsunvsphere1-dmh68-worker-w77pd: patching machine
I0812 05:52:40.477115       1 controller.go:293] zhsunvsphere1-dmh68-worker-w77pd: has no node yet, requeuing
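
The "has no node yet, requeuing" line follows from the status above: the machine reports only an InternalDNS entry, and without an InternalIP the controller cannot link the Machine to a Node, so it stays Provisioned. A minimal sketch of that address lookup (a hypothetical helper for illustration, not the actual machine-controller code):

```python
# Sketch: pick the first InternalIP from a machine's status.addresses.
# Returns None when the VM has not reported an IP yet -- the situation in
# the failed verification above, where only an InternalDNS entry exists.
def find_internal_ip(addresses):
    for addr in addresses:
        if addr.get("type") == "InternalIP":
            return addr.get("address")
    return None

# Addresses exactly as reported in the stuck machine's status:
stuck = [{"address": "zhsunvsphere1-dmh68-worker-w77pd", "type": "InternalDNS"}]
print(find_internal_ip(stuck))  # no InternalIP present -> None
```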

$ oc logs -f machine-config-server-6958n -n openshift-machine-config-operator
I0812 04:49:21.758658       1 start.go:38] Version: v4.5.0-202008100413.p0-dirty (6b77b94f2ca25d6619ca2c686232b920039c4684)
I0812 04:49:21.779496       1 api.go:56] Launching server on :22624
I0812 04:49:21.779831       1 api.go:56] Launching server on :22623
I0812 04:49:25.547013       1 api.go:102] Pool worker requested by 136.144.52.223:44378
E0812 04:49:25.581968       1 api.go:108] couldn't get config for req: {worker}, error: could not fetch config , err: resource name may not be empty
I0812 04:49:30.583685       1 api.go:102] Pool worker requested by 136.144.52.223:44378

Comment 6 Joel Speed 2020-10-01 14:58:19 UTC
No one got round to this during this sprint. @Alberto, are you keen to work on this one in particular, or should we re-assign it now?

Comment 7 sunzhaohua 2020-10-12 07:57:15 UTC
Tested again with clusterversion 4.5.0-0.nightly-2020-10-10-030038; it works well. Moving to Verified.

Steps:
1. Set up a UPI vSphere cluster.
2. Modified the MachineSet's "replicas", "networkName", "template", and "resourcePool", and added one tag in vCenter:
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 120
          kind: VSphereMachineProviderSpec
          memoryMiB: 8192
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: VM Network
          numCPUs: 2
          numCoresPerSocket: 1
          snapshot: ""
          template: jimatest14-x5z4m-rhcos
          userDataSecret:
            name: worker-user-data
          workspace:
            datacenter: dc1
            datastore: 10TB-GOLD
            folder: /dc1/vm/zhsun45vs-zv6ln
            resourcePool: /dc1/host/devel/Resources
            server: vcsa2-qe.vmware.devcluster.openshift.com

3. Checked machine status and logs:
$ oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
zhsun45vs-zv6ln-worker-hvnvx   Running                          16m

$ oc get node
NAME                           STATUS   ROLES    AGE   VERSION
compute-0                      Ready    worker   35m   v1.18.3+2fbd7c7
control-plane-0                Ready    master   46m   v1.18.3+2fbd7c7
control-plane-1                Ready    master   46m   v1.18.3+2fbd7c7
control-plane-2                Ready    master   46m   v1.18.3+2fbd7c7
zhsun45vs-zv6ln-worker-hvnvx   Ready    worker   13m   v1.18.3+2fbd7c7
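
Contrasting the two runs: in the failed run the machine sat in Provisioned with no linked node; in the verified run it reaches Running once the node registers and becomes Ready. A simplified model of that phase transition (an assumption for illustration, not the actual lifecycle code):

```python
# Simplified model of the machine phase seen in the two verification runs:
# an existing instance with no linked Node stays "Provisioned"; once the
# node registers (CSR approved, kubelet up), the phase moves to "Running".
def machine_phase(instance_exists, node_ref):
    if not instance_exists:
        return "Provisioning"
    return "Running" if node_ref else "Provisioned"

print(machine_phase(True, None))                            # failed run
print(machine_phase(True, "zhsun45vs-zv6ln-worker-hvnvx"))  # verified run
```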

Comment 10 errata-xmlrpc 2020-10-26 15:11:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.16 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4268