Bug 1877070

Summary: [4.4.z] RHCOS nodes' cri-o version is not consistent with RHEL
Product: OpenShift Container Platform
Reporter: Micah Abbott <miabbott>
Component: RHCOS
Assignee: Micah Abbott <miabbott>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Docs Contact:
Priority: high
Version: 4.4
CC: bbreard, imcleod, jhou, jligon, mnguyen, nstielau, tsze, wjiang, wking, yunjiang, yuxzhu
Target Milestone: ---
Keywords: Regression
Target Release: 4.4.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1877069
Environment:
Last Closed: 2020-09-15 17:32:44 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1876724, 1877069
Bug Blocks:

Description Micah Abbott 2020-09-08 19:18:51 UTC
+++ This bug was initially created as a clone of Bug #1877069 +++

+++ This bug was initially created as a clone of Bug #1876724 +++

Description of problem:

Scaled up a 4.4 cluster with RHEL 7.8 workers and compared the cri-o version on the RHCOS and RHEL nodes:

Red Hat Enterprise Linux CoreOS 44.82.202009030930-0 (Ootpa)   4.18.0-193.14.3.el8_2.x86_64   cri-o://1.16.6-18.rhaos4.3.git538d861.el8
Red Hat Enterprise Linux Server 7.8 (Maipo)                    3.10.0-1127.19.1.el7.x86_64    cri-o://1.17.5-4.rhaos4.4.git7f0085b.el7


TASK [Get the detailed cri-o version from RHCOS worker] ************************
Monday 07 September 2020  18:41:17 +0800 (0:00:00.799)       0:00:08.459 ****** 
changed: [localhost] => {"changed": true, "cmd": "oc get node ip-10-0-56-49.us-east-2.compute.internal  --output=jsonpath='{.status.nodeInfo.containerRuntimeVersion}' | awk -F '//' '{print $2}' | awk -F '-' '{print $1}'\n", "delta": "0:00:00.213235", "end": "2020-09-07 18:41:17.873516", "rc": 0, "start": "2020-09-07 18:41:17.660281", "stderr": "", "stderr_lines": [], "stdout": "1.16.6", "stdout_lines": ["1.16.6"]}

TASK [Compare the RHEL nodes' cri-o major version is consistent with RHCOS] ****
Monday 07 September 2020  18:41:17 +0800 (0:00:00.379)       0:00:08.839 ****** 
failed: [localhost] (item=ip-10-0-51-48.us-east-2.compute.internal) => {"ansible_loop_var": "item", "changed": true, "cmd": "oc get node ip-10-0-51-48.us-east-2.compute.internal --output=jsonpath='{.status.nodeInfo.containerRuntimeVersion}'  | awk -F '//' '{print $2}' | awk -F '-' '{print $1}'\n", "delta": "0:00:00.207157", "end": "2020-09-07 18:41:18.284359", "failed_when_result": true, "item": "ip-10-0-51-48.us-east-2.compute.internal", "rc": 0, "start": "2020-09-07 18:41:18.077202", "stderr": "", "stderr_lines": [], "stdout": "1.17.5", "stdout_lines": ["1.17.5"]}

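The two Ansible tasks above reduce each node's `containerRuntimeVersion` (for example `cri-o://1.16.6-18.rhaos4.3.git538d861.el8`) to a bare version string with two `awk` splits. A minimal Python sketch of the same extraction and comparison is below. The function names are hypothetical. Comparing only the X.Y stream (1.16 vs 1.17) is also an assumption, because the log does not show the playbook's exact `failed_when` condition.

```python
from __future__ import annotations


def runtime_version(container_runtime: str) -> str:
    """Mirror the playbook's pipeline:
    awk -F '//' '{print $2}' | awk -F '-' '{print $1}'
    e.g. 'cri-o://1.16.6-18.rhaos4.3.git538d861.el8' -> '1.16.6'
    """
    after_scheme = container_runtime.split("//")[1]  # text after 'cri-o://'
    return after_scheme.split("-")[0]  # version field, before the release


def stream(version: str) -> tuple[int, int]:
    """Return the (major, minor) stream, e.g. '1.17.5' -> (1, 17)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_stream(rhcos_runtime: str, rhel_runtime: str) -> bool:
    """Assumed check: both nodes run the same cri-o X.Y stream."""
    return stream(runtime_version(rhcos_runtime)) == stream(runtime_version(rhel_runtime))


if __name__ == "__main__":
    rhcos = "cri-o://1.16.6-18.rhaos4.3.git538d861.el8"
    rhel = "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el7"
    print(runtime_version(rhcos), runtime_version(rhel), same_stream(rhcos, rhel))
    # prints: 1.16.6 1.17.5 False
```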

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-09-07-061145

How reproducible:
100%

Steps to Reproduce:
1. Install a 4.4 cluster on AWS and scale it up with RHEL 7.8 workers.
2. Compare the cri-o version reported by the RHCOS and RHEL nodes.

Actual results:
The cri-o version on RHCOS nodes (1.16.6) does not match the version on RHEL nodes (1.17.5).

Expected results:
The cri-o version on RHCOS nodes matches the version on RHEL nodes.

Additional info:

--- Additional comment from Micah Abbott on 2020-09-08 17:27:41 UTC ---

Between Aug 25 and Sep 2, the correct version of `cri-o` was no longer available in the RHAOS 4.4 repo.  The RHCOS build process looked for the best match for the `cri-o` package and selected the version from the RHAOS 4.3 repos.
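The fallback described here can be illustrated with a deliberately simplified resolver that picks the highest available build of a package across all enabled repos. This is a hypothetical model, not the actual RHCOS build tooling. The repo names, the hand-written version keys, and the "highest version wins" policy are all assumptions made for the sketch.

```python
from __future__ import annotations

# Hypothetical snapshot of the window described above: cri-o is missing
# from the RHAOS 4.4 repo, so only the 4.3 build is visible.
# Entries are (sort_key, nvr); the keys are hand-written (version..., release)
# tuples, whereas real RPM ordering uses rpmvercmp.
BROKEN = {
    "rhaos-4.4": [],
    "rhaos-4.3": [((1, 16, 6, 18), "cri-o-1.16.6-18.rhaos4.3.git538d861.el8")],
}

# Hypothetical snapshot after the 4.4 repo was rebuilt with cri-o included.
FIXED = {
    "rhaos-4.4": [((1, 17, 5, 4), "cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8")],
    "rhaos-4.3": [((1, 16, 6, 18), "cri-o-1.16.6-18.rhaos4.3.git538d861.el8")],
}


def best_match(repos: dict[str, list[tuple[tuple[int, ...], str]]],
               enabled: list[str]) -> str | None:
    """Pick the highest-keyed build across all enabled repos (assumed policy)."""
    candidates = [entry for name in enabled for entry in repos.get(name, [])]
    if not candidates:
        return None
    return max(candidates)[1]  # tuples compare by sort key first


if __name__ == "__main__":
    enabled = ["rhaos-4.4", "rhaos-4.3"]
    print(best_match(BROKEN, enabled))  # cri-o-1.16.6-18.rhaos4.3.git538d861.el8
    print(best_match(FIXED, enabled))   # cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8
```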

I'll work with ART to make sure that the correct version of `cri-o` is re-included in the RHAOS 4.4 repo.

--- Additional comment from Eric Paris on 2020-09-08 18:00:26 UTC ---

This bug sets Target Release equal to a z-stream but has no bug in the 'Depends On' field. As such this is not a valid bug state and the target release is being unset.

Any bug targeting 4.1.z must have a bug targeting 4.2 in 'Depends On.'
Similarly, any bug targeting 4.2.z must have a bug with Target Release of 4.3 in 'Depends On.'
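The two examples in this comment suggest a general rule: a bug targeting 4.N.z needs a 'Depends On' bug targeting 4.(N+1). That generalization is an assumption. The helper names in this small validator sketch are hypothetical.

```python
from __future__ import annotations

import re


def required_dependency_release(target_release: str) -> str | None:
    """'4.2.z' -> '4.3'; None when the target is not a z-stream."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.z", target_release)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    return f"{major}.{minor + 1}"


def target_release_is_valid(target_release: str, depends_on_releases: list[str]) -> bool:
    """A z-stream target needs a 'Depends On' bug targeting the next minor."""
    needed = required_dependency_release(target_release)
    return needed is None or needed in depends_on_releases


if __name__ == "__main__":
    print(required_dependency_release("4.4.z"))       # 4.5
    print(target_release_is_valid("4.2.z", ["4.3"]))  # True
    print(target_release_is_valid("4.2.z", []))       # False
```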

Comment 1 Yuxiang Zhu 2020-09-09 12:42:10 UTC
This happened because two older cri-o builds (cri-o-1.17.5-2.rhaos4.4.git7f0085b.el8 and cri-o-1.17.5-3.rhaos4.4.git6b97f81.el8) were tagged into rhaos-4.4-rhel-8-candidate after cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8 had already been tagged in:

$ brew list-history --package=cri-o --tag=rhaos-4.4-rhel-8-candidate | tail
Mon Aug 10 16:49:31 2020 cri-o-1.17.4-24.rhaos4.4.git73658e6.el8 tagged into rhaos-4.4-rhel-8-candidate by lmandvek [still active]
Wed Aug 12 04:23:51 2020 cri-o-1.17.4-15.dev.rhaos4.4.git3572ab6.el8 untagged from rhaos-4.4-rhel-8-candidate by garbage-collector
Tue Aug 18 00:44:32 2020 cri-o-1.17.4-25.rhaos4.4.git462bd29.el8 tagged into rhaos-4.4-rhel-8-candidate by lmandvek [still active]
Tue Aug 18 11:21:45 2020 cri-o-1.17.5-2.rhaos4.4.git34a1ed2.el8 tagged into rhaos-4.4-rhel-8-candidate by lmandvek [still active]
Fri Aug 21 06:47:45 2020 cri-o-1.17.5-3.rhaos4.4.git75b5183.el8 tagged into rhaos-4.4-rhel-8-candidate by lmandvek [still active]
Mon Aug 24 18:48:22 2020 cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8 tagged into rhaos-4.4-rhel-8-candidate by lmandvek [still active]
Sun Aug 30 07:15:07 2020 cri-o-1.17.5-2.rhaos4.4.git7f0085b.el8 tagged into rhaos-4.4-rhel-8-candidate by pehunt
Tue Sep  1 13:27:36 2020 cri-o-1.17.5-3.rhaos4.4.git6b97f81.el8 tagged into rhaos-4.4-rhel-8-candidate by pehunt
Mon Sep  7 23:18:47 2020 cri-o-1.17.5-3.rhaos4.4.git6b97f81.el8 untagged from rhaos-4.4-rhel-8-candidate by yuxzhu
Mon Sep  7 23:19:25 2020 cri-o-1.17.5-2.rhaos4.4.git7f0085b.el8 untagged from rhaos-4.4-rhel-8-candidate by yuxzhu

On Sep 7 I untagged those two older builds, but apparently that was too late.
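Replaying the tag history shows why the older builds mattered. The sketch assumes Koji semantics (Brew is built on Koji), where the "latest" build of a package in a tag is the one tagged most recently, not the one with the highest version. It replays only the events visible in the `tail` output. `latest_tagged` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime

# (timestamp, nvr, action) transcribed from the brew list-history output in
# Comment 1; earlier events were cut off by `tail` and are not modeled.
HISTORY = [
    ("2020-08-10 16:49:31", "cri-o-1.17.4-24.rhaos4.4.git73658e6.el8", "tagged"),
    ("2020-08-12 04:23:51", "cri-o-1.17.4-15.dev.rhaos4.4.git3572ab6.el8", "untagged"),
    ("2020-08-18 00:44:32", "cri-o-1.17.4-25.rhaos4.4.git462bd29.el8", "tagged"),
    ("2020-08-18 11:21:45", "cri-o-1.17.5-2.rhaos4.4.git34a1ed2.el8", "tagged"),
    ("2020-08-21 06:47:45", "cri-o-1.17.5-3.rhaos4.4.git75b5183.el8", "tagged"),
    ("2020-08-24 18:48:22", "cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8", "tagged"),
    ("2020-08-30 07:15:07", "cri-o-1.17.5-2.rhaos4.4.git7f0085b.el8", "tagged"),
    ("2020-09-01 13:27:36", "cri-o-1.17.5-3.rhaos4.4.git6b97f81.el8", "tagged"),
    ("2020-09-07 23:18:47", "cri-o-1.17.5-3.rhaos4.4.git6b97f81.el8", "untagged"),
    ("2020-09-07 23:19:25", "cri-o-1.17.5-2.rhaos4.4.git7f0085b.el8", "untagged"),
]


def latest_tagged(history: list[tuple[str, str, str]], as_of: str) -> str | None:
    """Return the most recently tagged build still in the tag at `as_of`.

    Assumes Koji-style semantics (latest = last tagged, not highest NVR).
    `history` must be in chronological order.
    """
    cutoff = datetime.fromisoformat(as_of)
    active: dict[str, datetime] = {}
    for stamp, nvr, action in history:
        when = datetime.fromisoformat(stamp)
        if when > cutoff:
            break
        if action == "tagged":
            active[nvr] = when
        else:
            active.pop(nvr, None)  # untagging a build not seen here is a no-op
    if not active:
        return None
    return max(active, key=active.__getitem__)


if __name__ == "__main__":
    print(latest_tagged(HISTORY, "2020-08-25 00:00:00"))  # cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8
    print(latest_tagged(HISTORY, "2020-09-02 00:00:00"))  # cri-o-1.17.5-3.rhaos4.4.git6b97f81.el8
    print(latest_tagged(HISTORY, "2020-09-08 00:00:00"))  # cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8
```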

The root cause is that ART uses Errata Tool to sign RPMs. However, Errata Tool does not allow us to attach an older build after a newer one has been released. As a result, cri-o was excluded from the RHAOS repo created by our pipeline.
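The exclusion can be modeled as a filter applied while the repo is built. If a package's latest tagged build is older than the build already released, the advisory will not accept it, so the package is dropped from the repo. This is a hedged sketch, not ART's pipeline code. `vr_key` and `build_repo` are hypothetical, and treating cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8 as the released build is an assumption.

```python
from __future__ import annotations


def vr_key(nvr: str) -> tuple[int, ...]:
    """Hypothetical, crude ordering key for these cri-o NVRs only:
    'cri-o-1.17.5-3.rhaos4.4.git6b97f81.el8' -> (1, 17, 5, 3).
    Real RPM comparison uses rpmvercmp over epoch, version and release."""
    _name, version, release = nvr.rsplit("-", 2)
    release_number = release.split(".")[0]
    return tuple(int(part) for part in version.split(".")) + (int(release_number),)


def build_repo(latest: dict[str, str], released: dict[str, str]) -> dict[str, str]:
    """Keep each package's latest tagged build unless it is older than the
    build already released, which the advisory would refuse to attach."""
    repo = {}
    for package, nvr in latest.items():
        shipped = released.get(package)
        if shipped is not None and vr_key(nvr) < vr_key(shipped):
            continue  # cannot be attached, so the package is left out
        repo[package] = nvr
    return repo


if __name__ == "__main__":
    released = {"cri-o": "cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8"}  # assumed
    print(build_repo({"cri-o": "cri-o-1.17.5-3.rhaos4.4.git6b97f81.el8"}, released))
    # prints: {}
    print(build_repo({"cri-o": "cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8"}, released))
    # prints: {'cri-o': 'cri-o-1.17.5-4.rhaos4.4.git7f0085b.el8'}
```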

I've triggered a forced rebuild of the repo. Hopefully cri-o will now be included and ready for RHCOS to consume.

Comment 3 Micah Abbott 2020-09-09 14:16:35 UTC
The 4.4 plashet was updated this morning, and a new RHCOS build (44.82.202009091324-0) now includes the correct version of `cri-o`.

Moving to MODIFIED

Comment 7 weiwei jiang 2020-09-10 07:51:46 UTC
Checked with 4.4.0-0.nightly-2020-09-09-153044, and it's fixed now. 

qeci-8067-545wx-m-0.c.openshift-qe.internal     Ready    master   60m   v1.17.1+6af3663   10.0.0.5                    Red Hat Enterprise Linux CoreOS 44.82.202009091324-0 (Ootpa)   4.18.0-193.19.1.el8_2.x86_64   cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/arch=amd64,kubernetes.io/hostname=qeci-8067-545wx-m-0.c.openshift-qe.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-a
qeci-8067-545wx-m-1.c.openshift-qe.internal     Ready    master   59m   v1.17.1+6af3663   10.0.0.6                    Red Hat Enterprise Linux CoreOS 44.82.202009091324-0 (Ootpa)   4.18.0-193.19.1.el8_2.x86_64   cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/arch=amd64,kubernetes.io/hostname=qeci-8067-545wx-m-1.c.openshift-qe.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-b
qeci-8067-545wx-m-2.c.openshift-qe.internal     Ready    master   59m   v1.17.1+6af3663   10.0.0.4                    Red Hat Enterprise Linux CoreOS 44.82.202009091324-0 (Ootpa)   4.18.0-193.19.1.el8_2.x86_64   cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-c,kubernetes.io/arch=amd64,kubernetes.io/hostname=qeci-8067-545wx-m-2.c.openshift-qe.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-c
qeci-8067-545wx-w-a-0.c.openshift-qe.internal   Ready    worker   45m   v1.17.1+6af3663   10.0.32.2                   Red Hat Enterprise Linux CoreOS 44.82.202009091324-0 (Ootpa)   4.18.0-193.19.1.el8_2.x86_64   cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/arch=amd64,kubernetes.io/hostname=qeci-8067-545wx-w-a-0.c.openshift-qe.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-a
qeci-8067-545wx-w-a-l-rhel-0                    Ready    worker   54s   v1.17.1+6af3663   10.0.32.5                   Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1127.19.1.el7.x86_64    cri-o://1.17.5-4.rhaos4.4.git7f0085b.el7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/arch=amd64,kubernetes.io/hostname=qeci-8067-545wx-w-a-l-rhel-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhel,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-a
qeci-8067-545wx-w-a-l-rhel-1                    Ready    worker   48s   v1.17.1+6af3663   10.0.32.6                   Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1127.19.1.el7.x86_64    cri-o://1.17.5-4.rhaos4.4.git7f0085b.el7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/arch=amd64,kubernetes.io/hostname=qeci-8067-545wx-w-a-l-rhel-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhel,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-a
qeci-8067-545wx-w-b-1.c.openshift-qe.internal   Ready    worker   44m   v1.17.1+6af3663   10.0.32.3                   Red Hat Enterprise Linux CoreOS 44.82.202009091324-0 (Ootpa)   4.18.0-193.19.1.el8_2.x86_64   cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/arch=amd64,kubernetes.io/hostname=qeci-8067-545wx-w-b-1.c.openshift-qe.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-b
qeci-8067-545wx-w-c-2.c.openshift-qe.internal   Ready    worker   41m   v1.17.1+6af3663   10.0.32.4                   Red Hat Enterprise Linux CoreOS 44.82.202009091324-0 (Ootpa)   4.18.0-193.19.1.el8_2.x86_64   cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-4,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-c,kubernetes.io/arch=amd64,kubernetes.io/hostname=qeci-8067-545wx-w-c-2.c.openshift-qe.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=n1-standard-4,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-c

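This node listing can be checked mechanically. Strip the `cri-o://` prefix and the `.el7`/`.el8` dist tag from each node's container runtime, then confirm that exactly one version-release remains. The helper names and the dist-tag regex in this sketch are assumptions.

```python
from __future__ import annotations

import re

# Container runtime per node, copied from the node listing in Comment 7.
RUNTIMES = {
    "qeci-8067-545wx-m-0.c.openshift-qe.internal": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8",
    "qeci-8067-545wx-m-1.c.openshift-qe.internal": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8",
    "qeci-8067-545wx-m-2.c.openshift-qe.internal": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8",
    "qeci-8067-545wx-w-a-0.c.openshift-qe.internal": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8",
    "qeci-8067-545wx-w-a-l-rhel-0": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el7",
    "qeci-8067-545wx-w-a-l-rhel-1": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el7",
    "qeci-8067-545wx-w-b-1.c.openshift-qe.internal": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8",
    "qeci-8067-545wx-w-c-2.c.openshift-qe.internal": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8",
}

# Assumed dist-tag shape: a trailing '.el7', '.el8', '.el8_2', and so on.
DIST_TAG = re.compile(r"\.el\d+(_\d+)?$")


def runtime_without_dist(container_runtime: str) -> str:
    """'cri-o://1.17.5-4.rhaos4.4.git7f0085b.el8' -> '1.17.5-4.rhaos4.4.git7f0085b'."""
    version_release = container_runtime.split("//", 1)[1]
    return DIST_TAG.sub("", version_release)


def all_consistent(runtimes: dict[str, str]) -> bool:
    """True when every node reports the same cri-o version-release."""
    return len({runtime_without_dist(r) for r in runtimes.values()}) == 1


if __name__ == "__main__":
    print(all_consistent(RUNTIMES))  # True
    before = {
        "rhcos": "cri-o://1.16.6-18.rhaos4.3.git538d861.el8",
        "rhel": "cri-o://1.17.5-4.rhaos4.4.git7f0085b.el7",
    }
    print(all_consistent(before))  # False
```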
Comment 9 errata-xmlrpc 2020-09-15 17:32:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.21 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3605