Bug 1890362 - workers may fail to join the cluster during an update from 4.5
Summary: workers may fail to join the cluster during an update from 4.5
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
Depends On: 1890250
TreeView+ depends on / blocked
Reported: 2020-10-22 01:33 UTC by Eric Paris
Modified: 2021-04-15 07:57 UTC (History)
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1890250
Last Closed: 2020-10-27 16:47:52 UTC
Target Upstream Version:

Attachments (Terms of Use)

System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2168 0 None closed Bug 1890362: mcs: Ensure that the encapsulated config is spec 2 if requested 2020-12-17 03:42:06 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:48:11 UTC

Comment 1 sunzhaohua 2020-10-22 08:13:14 UTC
Verified this by upgrading from 4.5.0-0.nightly-2020-10-21-224736 to 4.6.0-0.nightly-2020-10-22-034051; the worker node could join the cluster.
1. Provision a 4.5 cluster
2. Create a PDB for the deployment so that node drain cannot succeed (the PDB blocks eviction)
3. Run `oc adm upgrade`
4. Wait for the new MCS to roll out, watching `oc -n openshift-machine-config-operator get ds/machine-config-server`; you can use e.g. `oc -n openshift-machine-config-operator logs pod/machine-config-server-xyz` to verify that it reports its version as 4.6.
5. Verify via `oc get machineconfigpool/worker` that the pool is still progressing (you have at least one worker blocked)
6. Try scaling up a worker machineset, e.g. `oc -n openshift-machine-api scale --replicas=2 machineset/worker-xyz`
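The PDB in step 2 can be any PodDisruptionBudget that selects the deployment's pods and disallows eviction; a minimal sketch (the name, namespace, and label values below are illustrative, not taken from this report):

```yaml
# Illustrative PodDisruptionBudget that blocks node drain.
# With maxUnavailable: 0, no selected pod may be evicted,
# so `oc adm drain` on the node hosting these pods cannot complete.
apiVersion: policy/v1beta1    # PDB API version served by OCP 4.5/4.6
kind: PodDisruptionBudget
metadata:
  name: block-drain           # hypothetical name
  namespace: default          # hypothetical namespace
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: my-deployment      # must match the test deployment's pod labels
```

Apply it with `oc apply -f pdb.yaml` before starting the upgrade in step 3, so the worker pool stalls mid-update as required by step 5.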

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-10-22-034051   True        False         7m17s   Cluster version is 4.6.0-0.nightly-2020-10-22-034051

$ oc get machine
NAME                                         PHASE     TYPE         REGION      ZONE         AGE
zhsun22aws1-4d6b9-master-0                   Running   m4.xlarge    us-east-2   us-east-2a   3h39m
zhsun22aws1-4d6b9-master-1                   Running   m4.xlarge    us-east-2   us-east-2b   3h39m
zhsun22aws1-4d6b9-master-2                   Running   m4.xlarge    us-east-2   us-east-2c   3h39m
zhsun22aws1-4d6b9-worker-us-east-2a-6vblc    Running   m4.large     us-east-2   us-east-2a   9m27s
zhsun22aws1-4d6b9-worker-us-east-2aa-m8d4t   Running   m5.4xlarge   us-east-2   us-east-2a   78m
zhsun22aws1-4d6b9-worker-us-east-2b-2r695    Running   m4.large     us-east-2   us-east-2b   9m27s
zhsun22aws1-4d6b9-worker-us-east-2b-s6rmx    Running   m4.large     us-east-2   us-east-2b   3h26m
zhsun22aws1-4d6b9-worker-us-east-2c-44fkq    Running   m4.large     us-east-2   us-east-2c   3h26m

$ oc get node
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-144-169.us-east-2.compute.internal   Ready    worker   24m     v1.19.0+d59ce34
ip-10-0-154-83.us-east-2.compute.internal    Ready    master   3h50m   v1.19.0+d59ce34
ip-10-0-159-91.us-east-2.compute.internal    Ready    worker   94m     v1.19.0+d59ce34
ip-10-0-177-122.us-east-2.compute.internal   Ready    worker   3h41m   v1.19.0+d59ce34
ip-10-0-177-223.us-east-2.compute.internal   Ready    worker   23m     v1.19.0+d59ce34
ip-10-0-180-169.us-east-2.compute.internal   Ready    master   3h50m   v1.19.0+d59ce34
ip-10-0-200-202.us-east-2.compute.internal   Ready    master   3h50m   v1.19.0+d59ce34
ip-10-0-209-66.us-east-2.compute.internal    Ready    worker   3h41m   v1.19.0+d59ce34

Comment 6 errata-xmlrpc 2020-10-27 16:47:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

