Bug 1838984

Summary: 4.4 MachineSet with 4.2 or earlier bootimages fails to scale up because old CRI-O chokes on new CRI-O config
Product: OpenShift Container Platform
Component: Machine Config Operator
Version: 4.3.z
Target Release: 4.3.z
Status: CLOSED ERRATA
Severity: medium
Priority: urgent
Hardware: Unspecified
OS: Unspecified
Keywords: Upgrades
Reporter: Sinny Kumari <skumari>
Assignee: Sinny Kumari <skumari>
QA Contact: Michael Nguyen <mnguyen>
CC: bbreard, imcleod, jhou, jligon, jupierce, kgarriso, lmohanty, miabbott, mifiedle, mnguyen, mpatel, nstielau, scuppett, sdodson, smilner, walters, wking
Clone Of: 1830102
Bug Depends On: 1830102
Last Closed: 2020-06-17 20:28:32 UTC

Comment 3 Micah Abbott 2020-06-02 22:05:14 UTC
Verified with 4.3.0-0.nightly-2020-06-01-225519

1. Booted 4.2.34 cluster

```
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS                                                                                  
version   4.2.34    True        False         5m20s   Cluster version is 4.2.34           
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION                                         
ip-10-0-129-85.us-west-2.compute.internal    Ready    master   23m   v1.14.6-152-g117ba1f 
ip-10-0-133-59.us-west-2.compute.internal    Ready    worker   15m   v1.14.6-152-g117ba1f
ip-10-0-146-102.us-west-2.compute.internal   Ready    worker   15m   v1.14.6-152-g117ba1f                                                
ip-10-0-153-172.us-west-2.compute.internal   Ready    master   22m   v1.14.6-152-g117ba1f
ip-10-0-173-143.us-west-2.compute.internal   Ready    master   22m   v1.14.6-152-g117ba1f  
ip-10-0-173-154.us-west-2.compute.internal   Ready    worker   14m   v1.14.6-152-g117ba1f
```
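The readiness check in this step can also be done mechanically rather than by eye. The following is a minimal sketch and was not part of the original verification; the helper name `all_nodes_ready` is hypothetical, and it assumes the default `oc get nodes` column layout (STATUS in the second column).

```shell
# Hypothetical helper: reads `oc get nodes` output on stdin and succeeds
# only if at least one node is listed and every node's STATUS is exactly "Ready".
all_nodes_ready() {
  awk '
    NR > 1 && NF > 0 { seen = 1; if ($2 != "Ready") bad = 1 }
    END { if (seen && !bad) exit 0; exit 1 }'
}

# Usage against a live cluster (assumes oc is already logged in):
#   oc get nodes | all_nodes_ready && echo "all nodes Ready"
```

A node reporting `Ready,SchedulingDisabled` (for example while it is being drained) fails the check, which is the conservative choice before starting an upgrade.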

2. Upgraded to 4.3.0-0.nightly-2020-06-01-225519

```
$ oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
clusterversion.config.openshift.io/version patched

$ oc adm upgrade --allow-explicit-upgrade=true --force=true --to-image=registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2020-06-01-225519
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2020-06-01-225519

$ oc get clusterversion                                                                       
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-06-01-225519   True        False         3m35s   Cluster version is 4.3.0-0.nightly-2020-06-01-225519
$ oc get nodes                                       
NAME                                         STATUS   ROLES    AGE   VERSION                       
ip-10-0-129-85.us-west-2.compute.internal    Ready    master   81m   v1.16.2+18cfcc9                                                          
ip-10-0-133-59.us-west-2.compute.internal    Ready    worker   73m   v1.16.2+18cfcc9                               
ip-10-0-146-102.us-west-2.compute.internal   Ready    worker   73m   v1.16.2+18cfcc9                             
ip-10-0-153-172.us-west-2.compute.internal   Ready    master   81m   v1.16.2+18cfcc9                             
ip-10-0-173-143.us-west-2.compute.internal   Ready    master   81m   v1.16.2+18cfcc9 
ip-10-0-173-154.us-west-2.compute.internal   Ready    worker   73m   v1.16.2+18cfcc9

$ oc debug node/ip-10-0-129-85.us-west-2.compute.internal
Starting pod/ip-10-0-129-85us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.129.85
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a20fcdb0e02b8bfd610711c9231789c54b403d4fcf91c9eb8a89a31bb52d0b87
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.202006011853.0 (2020-06-01T18:58:42Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b64e472b57538ebd6808a1e0528d9ea83877207d29c091634f41627a609f9b04
              CustomOrigin: Managed by machine-config-operator
                   Version: 42.81.20200525.0 (2020-05-25T20:53:09Z)
```
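Whether the upgrade in this step has finished can be read off the `oc get clusterversion` row: VERSION matches the target, AVAILABLE is `True`, and PROGRESSING is `False`. A small sketch of that check follows; the function name `cluster_at_version` is hypothetical and the column positions are assumed to match the output shown in this comment.

```shell
# Hypothetical helper: reads `oc get clusterversion` output on stdin and
# succeeds only if the cluster reports the wanted version as fully rolled out.
# $1 is the target version string.
cluster_at_version() {
  awk -v want="$1" '
    NR > 1 && $2 == want && $3 == "True" && $4 == "False" { ok = 1 }
    END { if (ok) exit 0; exit 1 }'
}

# Usage against a live cluster (assumes oc is already logged in):
#   oc get clusterversion | cluster_at_version 4.3.0-0.nightly-2020-06-01-225519
```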

3. Checked the MachineSets and scaled up one worker MachineSet (us-west-2a)

```
$ oc get machinesets -n openshift-machine-api                                                               
NAME                                      DESIRED   CURRENT   READY   AVAILABLE   AGE 
miabbott-4-2-34-84khx-worker-us-west-2a   1         1         1       1           81m 
miabbott-4-2-34-84khx-worker-us-west-2b   1         1         1       1           81m                                 
miabbott-4-2-34-84khx-worker-us-west-2c   1         1         1       1           81m                                 
miabbott-4-2-34-84khx-worker-us-west-2d   0         0                             81m    
$ oc scale --replicas=2 machineset miabbott-4-2-34-84khx-worker-us-west-2a -n openshift-machine-api
machineset.machine.openshift.io/miabbott-4-2-34-84khx-worker-us-west-2a scaled

$ oc get machinesets -n openshift-machine-api
NAME                                      DESIRED   CURRENT   READY   AVAILABLE   AGE
miabbott-4-2-34-84khx-worker-us-west-2a   2         2         2       2           93m
miabbott-4-2-34-84khx-worker-us-west-2b   1         1         1       1           93m
miabbott-4-2-34-84khx-worker-us-west-2c   1         1         1       1           93m
miabbott-4-2-34-84khx-worker-us-west-2d   0         0                             93m
$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-129-85.us-west-2.compute.internal    Ready    master   94m     v1.16.2+18cfcc9
ip-10-0-132-188.us-west-2.compute.internal   Ready    worker   7m38s   v1.16.2+18cfcc9
ip-10-0-133-59.us-west-2.compute.internal    Ready    worker   86m     v1.16.2+18cfcc9
ip-10-0-146-102.us-west-2.compute.internal   Ready    worker   86m     v1.16.2+18cfcc9
ip-10-0-153-172.us-west-2.compute.internal   Ready    master   94m     v1.16.2+18cfcc9
ip-10-0-173-143.us-west-2.compute.internal   Ready    master   94m     v1.16.2+18cfcc9
ip-10-0-173-154.us-west-2.compute.internal   Ready    worker   86m     v1.16.2+18cfcc9

$ oc debug node/ip-10-0-132-188.us-west-2.compute.internal
Starting pod/ip-10-0-132-188us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.132.188
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a20fcdb0e02b8bfd610711c9231789c54b403d4fcf91c9eb8a89a31bb52d0b87
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.202006011853.0 (2020-06-01T18:58:42Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cc71fbd134f063d9fc0ccc78933b89c8dd2b1418b7a7b85bb70de87bc80486d7
              CustomOrigin: Image generated via coreos-assembler
                   Version: 42.80.20191002.0 (2019-10-02T13:31:28Z)
```
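The scale-up is considered successful once READY equals DESIRED for every MachineSet. As a hedged sketch of that comparison (the helper name `machinesets_ready` is hypothetical, and the column layout is assumed to be the one printed above):

```shell
# Hypothetical helper: reads `oc get machinesets` output on stdin and
# succeeds only if READY equals DESIRED for every MachineSet with DESIRED > 0.
machinesets_ready() {
  awk '
    NR > 1 && NF > 0 {
      # Rows with DESIRED=0 print blank READY/AVAILABLE columns, so skip them.
      if ($2 == 0) next
      if ($4 != $2) bad = 1
    }
    END { if (bad) exit 1; exit 0 }'
}

# Usage against a live cluster (assumes oc is already logged in):
#   oc get machinesets -n openshift-machine-api | machinesets_ready
```

While a new machine is still provisioning, the READY column is blank and the AGE value shifts into its place; the comparison then fails, which is the intended result.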

4. Upgraded to 4.4.0-0.nightly-2020-06-02-093230

```
$ oc adm upgrade --allow-explicit-upgrade=true --force=true --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-06-02-093230
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-06-02-093230

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-06-02-093230   True        False         15m     Cluster version is 4.4.0-0.nightly-2020-06-02-093230
$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-129-85.us-west-2.compute.internal    Ready    master   145m   v1.17.1+f5fb168
ip-10-0-132-188.us-west-2.compute.internal   Ready    worker   58m    v1.17.1+f5fb168
ip-10-0-133-59.us-west-2.compute.internal    Ready    worker   137m   v1.17.1+f5fb168
ip-10-0-146-102.us-west-2.compute.internal   Ready    worker   137m   v1.17.1+f5fb168
ip-10-0-153-172.us-west-2.compute.internal   Ready    master   145m   v1.17.1+f5fb168
ip-10-0-173-143.us-west-2.compute.internal   Ready    master   145m   v1.17.1+f5fb168
ip-10-0-173-154.us-west-2.compute.internal   Ready    worker   137m   v1.17.1+f5fb168

$ oc debug node/ip-10-0-132-188.us-west-2.compute.internal
Starting pod/ip-10-0-132-188us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.132.188
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6ce91fcf3b244f86b2fbda6daa6a80c76ea99b0d4640dd64e10469999b540be2
              CustomOrigin: Managed by machine-config-operator
                   Version: 44.81.202006011547-0 (2020-06-01T15:52:16Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a20fcdb0e02b8bfd610711c9231789c54b403d4fcf91c9eb8a89a31bb52d0b87
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.202006011853.0 (2020-06-01T18:58:42Z)
```
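The line that matters in the `rpm-ostree status` output is the Version of the booted deployment, which is the entry marked with `*`. A minimal sketch for extracting it follows; the function name `booted_version` is hypothetical, and it assumes the `*` marker format printed by the rpm-ostree build on these nodes (other builds may use a different marker).

```shell
# Hypothetical helper: reads `rpm-ostree status` output on stdin and prints
# the Version of the booted deployment (the entry whose line starts with "*").
booted_version() {
  awk '
    /^\*/ { booted = 1; next }
    booted && $1 == "Version:" { print $2; exit }'
}

# Usage against a live cluster (assumes oc is already logged in):
#   oc debug node/<node-name> -- chroot /host rpm-ostree status | booted_version
```

Fed the output above, this prints `44.81.202006011547-0`, confirming that the node scaled up in step 3 followed the cluster to the 4.4 OS image.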

5. Checked the MachineSets and scaled up a different worker MachineSet (us-west-2b)

```
$ oc get machinesets -n openshift-machine-api
NAME                                      DESIRED   CURRENT   READY   AVAILABLE   AGE
miabbott-4-2-34-84khx-worker-us-west-2a   2         2         2       2           145m
miabbott-4-2-34-84khx-worker-us-west-2b   1         1         1       1           145m
miabbott-4-2-34-84khx-worker-us-west-2c   1         1         1       1           145m
miabbott-4-2-34-84khx-worker-us-west-2d   0         0                             145m
$ oc scale --replicas=2 machineset miabbott-4-2-34-84khx-worker-us-west-2b -n openshift-machine-api
machineset.machine.openshift.io/miabbott-4-2-34-84khx-worker-us-west-2b scaled

$ oc get machinesets -n openshift-machine-api
NAME                                      DESIRED   CURRENT   READY   AVAILABLE   AGE
miabbott-4-2-34-84khx-worker-us-west-2a   2         2         2       2           154m
miabbott-4-2-34-84khx-worker-us-west-2b   2         2         2       2           154m
miabbott-4-2-34-84khx-worker-us-west-2c   1         1         1       1           154m
miabbott-4-2-34-84khx-worker-us-west-2d   0         0                             154m
$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-129-85.us-west-2.compute.internal    Ready    master   155m    v1.17.1+f5fb168
ip-10-0-132-188.us-west-2.compute.internal   Ready    worker   68m     v1.17.1+f5fb168
ip-10-0-133-59.us-west-2.compute.internal    Ready    worker   147m    v1.17.1+f5fb168
ip-10-0-146-102.us-west-2.compute.internal   Ready    worker   147m    v1.17.1+f5fb168
ip-10-0-153-172.us-west-2.compute.internal   Ready    master   155m    v1.17.1+f5fb168
ip-10-0-159-46.us-west-2.compute.internal    Ready    worker   3m59s   v1.17.1+f5fb168
ip-10-0-173-143.us-west-2.compute.internal   Ready    master   155m    v1.17.1+f5fb168
ip-10-0-173-154.us-west-2.compute.internal   Ready    worker   147m    v1.17.1+f5fb168
$ oc debug node/ip-10-0-159-46.us-west-2.compute.internal
Starting pod/ip-10-0-159-46us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.159.46
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6ce91fcf3b244f86b2fbda6daa6a80c76ea99b0d4640dd64e10469999b540be2
              CustomOrigin: Managed by machine-config-operator
                   Version: 44.81.202006011547-0 (2020-06-01T15:52:16Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cc71fbd134f063d9fc0ccc78933b89c8dd2b1418b7a7b85bb70de87bc80486d7
              CustomOrigin: Image generated via coreos-assembler
                   Version: 42.80.20191002.0 (2019-10-02T13:31:28Z)
```
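On the newly scaled node, the non-booted deployment has the CustomOrigin "Image generated via coreos-assembler" and version 42.80.20191002.0, which appears to be the original 4.2-era bootimage the machine first booted from before moving to the 4.4 OS image. Under that assumption, the bootimage version can be pulled out of the status output with a sketch like the one below; the function name `bootimage_version` is hypothetical.

```shell
# Hypothetical helper: reads `rpm-ostree status` output on stdin and prints
# the Version of the deployment whose CustomOrigin says it was generated by
# coreos-assembler (assumed here to be the bootimage the node started from).
bootimage_version() {
  awk '
    /CustomOrigin: Image generated via coreos-assembler/ { found = 1; next }
    found && $1 == "Version:" { print $2; exit }'
}

# Usage against a live cluster (assumes oc is already logged in):
#   oc debug node/<node-name> -- chroot /host rpm-ostree status | bootimage_version
```

On a node that has already been through two OS updates, such as the one inspected in step 4, the bootimage deployment has been replaced and the helper prints nothing.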

Comment 7 errata-xmlrpc 2020-06-17 20:28:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2436