Bug 1861773 - [upi vsphere] Worker node CSRs are not automatically approved
Summary: [upi vsphere] Worker node CSRs are not automatically approved
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Alberto
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On: 1843384
Blocks:
 
Reported: 2020-07-29 14:11 UTC by Alberto
Modified: 2020-10-26 15:12 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1843384
Environment:
Last Closed: 2020-10-26 15:11:50 UTC
Target Upstream Version:
Embargoed:


Links
Github openshift/machine-config-operator pull 1976 (closed): bug 1861773: Remove IPI checks for vsphere hostname script and systemd unit (last updated 2021-01-22 15:34:10 UTC)
Red Hat Product Errata RHBA-2020:4268 (last updated 2020-10-26 15:12:16 UTC)
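
While automatic approval is broken, the usual manual workaround is to approve the pending node CSRs by hand. This is the standard OpenShift procedure rather than anything taken from this bug, and <csr_name> is a placeholder:

$ oc get csr
$ oc adm certificate approve <csr_name>
# Or approve every CSR that is still pending in one go:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve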

Comment 3 sunzhaohua 2020-08-12 05:59:34 UTC
Verification failed.
clusterversion: 4.5.0-0.nightly-2020-08-11-174348
The machine got stuck in the Provisioned phase and does not have an InternalIP address.
Steps:
1. Set up a UPI vSphere cluster.
2. Modify the machineset's "replicas", "networkName", "template", and "resourcePool" fields and add one tag in vCenter (a CLI sketch follows the logs below):
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 120
          kind: VSphereMachineProviderSpec
          memoryMiB: 8192
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: VM Network
          numCPUs: 2
          numCoresPerSocket: 1
          snapshot: ""
          template: rhcos-46.82.202008111140-0
          userDataSecret:
            name: worker-user-data
          workspace:
            datacenter: dc1
            datastore: 10TB-GOLD
            folder: /dc1/vm/zhsunvsphere1-dmh68
            resourcePool: /dc1/host/devel/Resources
            server: vcsa2-qe.vmware.devcluster.openshift.com

3. Check the machine status and logs:
$ oc get machine
NAME                               PHASE         TYPE   REGION   ZONE   AGE
zhsunvsphere1-dmh68-worker-w77pd   Provisioned                          35m

status:
  addresses:
  - address: zhsunvsphere1-dmh68-worker-w77pd
    type: InternalDNS
  lastUpdated: "2020-08-12T05:16:24Z"
  phase: Provisioned
  providerStatus:
    conditions:
    - lastProbeTime: "2020-08-12T05:16:14Z"
      lastTransitionTime: "2020-08-12T05:16:14Z"
      message: Machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: 422ba03b-9481-be4e-e52f-8123455fad2b
    instanceState: poweredOn
    taskRef: task-30999

I0812 05:52:39.480735       1 controller.go:169] zhsunvsphere1-dmh68-worker-w77pd: reconciling Machine
I0812 05:52:39.480892       1 actuator.go:80] zhsunvsphere1-dmh68-worker-w77pd: actuator checking if machine exists
I0812 05:52:39.499941       1 session.go:113] Find template by instance uuid: 9a22317a-a103-4b84-a494-7abb75e6db77
I0812 05:52:39.502783       1 reconciler.go:158] zhsunvsphere1-dmh68-worker-w77pd: already exists
I0812 05:52:39.502820       1 controller.go:277] zhsunvsphere1-dmh68-worker-w77pd: reconciling machine triggers idempotent update
I0812 05:52:39.502828       1 actuator.go:94] zhsunvsphere1-dmh68-worker-w77pd: actuator updating machine
I0812 05:52:39.521554       1 session.go:113] Find template by instance uuid: 9a22317a-a103-4b84-a494-7abb75e6db77
I0812 05:52:39.793092       1 reconciler.go:801] zhsunvsphere1-dmh68-worker-w77pd: Reconciling attached tags
I0812 05:52:39.906388       1 reconciler.go:211] zhsunvsphere1-dmh68-worker-w77pd: reconciling machine with cloud state
I0812 05:52:40.426696       1 reconciler.go:219] zhsunvsphere1-dmh68-worker-w77pd: reconciling providerID
I0812 05:52:40.431219       1 reconciler.go:224] zhsunvsphere1-dmh68-worker-w77pd: reconciling network
I0812 05:52:40.436215       1 reconciler.go:870] Getting network status: object reference: vm-4367
I0812 05:52:40.436277       1 reconciler.go:879] Getting network status: device: VM Network, macAddress: 00:50:56:ab:97:ff
I0812 05:52:40.436287       1 reconciler.go:884] Getting network status: getting guest info
I0812 05:52:40.438919       1 reconciler.go:329] zhsunvsphere1-dmh68-worker-w77pd: reconciling network: IP addresses: [{InternalDNS zhsunvsphere1-dmh68-worker-w77pd}]
I0812 05:52:40.438984       1 reconciler.go:229] zhsunvsphere1-dmh68-worker-w77pd: reconciling powerstate annotation
I0812 05:52:40.443037       1 reconciler.go:653] zhsunvsphere1-dmh68-worker-w77pd: Updating provider status
I0812 05:52:40.452930       1 machine_scope.go:101] zhsunvsphere1-dmh68-worker-w77pd: patching machine
I0812 05:52:40.477115       1 controller.go:293] zhsunvsphere1-dmh68-worker-w77pd: has no node yet, requeuing

$ oc logs -f machine-config-server-6958n -n openshift-machine-config-operator
I0812 04:49:21.758658       1 start.go:38] Version: v4.5.0-202008100413.p0-dirty (6b77b94f2ca25d6619ca2c686232b920039c4684)
I0812 04:49:21.779496       1 api.go:56] Launching server on :22624
I0812 04:49:21.779831       1 api.go:56] Launching server on :22623
I0812 04:49:25.547013       1 api.go:102] Pool worker requested by 136.144.52.223:44378
E0812 04:49:25.581968       1 api.go:108] couldn't get config for req: {worker}, error: could not fetch config , err: resource name may not be empty
I0812 04:49:30.583685       1 api.go:102] Pool worker requested by 136.144.52.223:44378
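
A rough sketch of the CLI side of steps 2-3 above, plus checks that narrow down whether the node is blocked on CSR approval. The machineset name is a placeholder, and the machine-approver namespace/deployment names assume a standard 4.5 installation:

$ oc -n openshift-machine-api edit machineset <machineset_name>                 # adjust networkName, template, resourcePool
$ oc -n openshift-machine-api scale machineset <machineset_name> --replicas=1
$ oc get csr                                                                    # any node CSRs stuck in Pending?
$ oc -n openshift-cluster-machine-approver logs deploy/machine-approver --all-containers | tail
$ oc -n openshift-machine-api get machine zhsunvsphere1-dmh68-worker-w77pd -o yaml   # addresses / providerStatus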

Comment 6 Joel Speed 2020-10-01 14:58:19 UTC
No one got round to this during this sprint. @Alberto, are you keen to work on this one in particular, or should we re-assign it now?

Comment 7 sunzhaohua 2020-10-12 07:57:15 UTC
Tested this again with clusterversion 4.5.0-0.nightly-2020-10-10-030038; it works well. Moving to verified.

Steps:
1. Set up a UPI vSphere cluster.
2. Modify the machineset's "replicas", "networkName", "template", and "resourcePool" fields and add one tag in vCenter:
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 120
          kind: VSphereMachineProviderSpec
          memoryMiB: 8192
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: VM Network
          numCPUs: 2
          numCoresPerSocket: 1
          snapshot: ""
          template: jimatest14-x5z4m-rhcos
          userDataSecret:
            name: worker-user-data
          workspace:
            datacenter: dc1
            datastore: 10TB-GOLD
            folder: /dc1/vm/zhsun45vs-zv6ln
            resourcePool: /dc1/host/devel/Resources
            server: vcsa2-qe.vmware.devcluster.openshift.com

3. Check the machine status and logs:
$ oc get machine
NAME                           PHASE     TYPE   REGION   ZONE   AGE
zhsun45vs-zv6ln-worker-hvnvx   Running                          16m

$ oc get node
NAME                           STATUS   ROLES    AGE   VERSION
compute-0                      Ready    worker   35m   v1.18.3+2fbd7c7
control-plane-0                Ready    master   46m   v1.18.3+2fbd7c7
control-plane-1                Ready    master   46m   v1.18.3+2fbd7c7
control-plane-2                Ready    master   46m   v1.18.3+2fbd7c7
zhsun45vs-zv6ln-worker-hvnvx   Ready    worker   13m   v1.18.3+2fbd7c7
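
To confirm that the approvals really happened automatically during this verification, one could additionally check that no node CSR was left pending and that the new machine's node registered. The commands are illustrative and not part of the original comment:

$ oc get csr                                # node CSRs should show the condition Approved,Issued
$ oc -n openshift-machine-api get machine   # the new machine should be Running
$ oc get node zhsun45vs-zv6ln-worker-hvnvx -o wide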

Comment 10 errata-xmlrpc 2020-10-26 15:11:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.16 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4268

