Bug 1852357 - RHEL worker scale-up failed for OCP 4.6 due to "No package matching 'cri-o-1.18.*' found available"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: aos-install
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On: 1861097
Blocks:
Reported: 2020-06-30 08:59 UTC by xiyuan
Modified: 2020-10-27 16:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:10:28 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:10:55 UTC

Description xiyuan 2020-06-30 08:59:19 UTC
Description of problem:
RHEL worker scale-up failed with "No package matching 'cri-o-1.18.*' found available"

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-06-26-035408

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.6 (4.6.0-0.nightly-2020-06-26-035408).
2. Scale up a RHEL worker.

Actual results:
RHEL worker scale-up failed in the playbook at this task:
TASK [openshift_node : Install openshift packages] {"ansible_job_id": "107165796205.7457", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-1.18.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-1.18.*' found available, installed or updated"]}
Checked the latest OCP 4.6 puddle repo; cri-o-1.19 is there.
http://download.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/4.6/latest/x86_64/os/Packages/
$ oc get node -o wide
NAME                                        STATUS   ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION          CONTAINER-RUNTIME
ip-10-0-54-108.us-east-2.compute.internal   Ready    worker   88m   v1.18.3+ba54539   10.0.54.108   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-56-154.us-east-2.compute.internal   Ready    master   97m   v1.18.3+ba54539   10.0.56.154   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-57-68.us-east-2.compute.internal    Ready    master   97m   v1.18.3+ba54539   10.0.57.68    <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-58-9.us-east-2.compute.internal     Ready    worker   89m   v1.18.3+ba54539   10.0.58.9     <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-65-138.us-east-2.compute.internal   Ready    worker   88m   v1.18.3+ba54539   10.0.65.138   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-77-178.us-east-2.compute.internal   Ready    master   96m   v1.18.3+ba54539   10.0.77.178   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev

Expected results:
1. RHEL scale-up succeeds.

Additional info:

Comment 1 Peter Hunt 2020-07-08 18:16:01 UTC
The scale-up scripts should be using 1.19, as that's what's in the 4.6 puddle now. Moving to Installer.

Comment 2 Gaoyun Pei 2020-07-09 01:25:02 UTC
The Kubernetes version is still 1.18 in the OCP 4.6 nightly; we need the rebase of OCP on top of k8s 1.19. What the openshift-ansible playbook did was expected.

TASK [openshift_node : Set fact l_kubernetes_version] **************************
Tuesday 30 June 2020  15:55:55 +0800 (0:00:00.577)       0:05:39.098 ********** 
ok: [ip-10-0-58-77.us-east-2.compute.internal] => {"ansible_facts": {"l_kubernetes_version": "1.18"}, "changed": false}
ok: [ip-10-0-50-230.us-east-2.compute.internal] => {"ansible_facts": {"l_kubernetes_version": "1.18"}, "changed": false}

Comment 3 Russell Teague 2020-07-09 13:17:09 UTC
By design, the openshift-ansible scale-up playbooks install cri-o based on the cluster Kubernetes version [1]. In CI, we have an override (ci_version_override) for this behavior because during development cycles the available package versions are not always in step with Kubernetes rebases or release branching. This override should not be used outside of CI, so that we continue to validate installs with proper versions.

This is not a bug.


[1] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/defaults/main.yml#L13
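
To make the behavior described above concrete, here is a minimal Python sketch of deriving a cri-o package pattern from a cluster Kubernetes version string. It is an illustration only, not the openshift-ansible implementation; the function names kube_minor_version and crio_package_glob are hypothetical.

```python
import re


def kube_minor_version(git_version: str) -> str:
    """Return 'MAJOR.MINOR' from a version string such as 'v1.18.3+ba54539'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def crio_package_glob(git_version: str) -> str:
    """Build a package pattern that pins cri-o to the cluster's minor version."""
    return f"cri-o-{kube_minor_version(git_version)}.*"


# Version strings taken from the node listings in this report.
print(crio_package_glob("v1.18.3+ba54539"))
print(crio_package_glob("v1.19.0-rc.2+5241b27-dirty"))
```

With a cluster still reporting 1.18, a pattern built this way asks for cri-o 1.18 even when the repo only carries 1.19, which is the mismatch reported here.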

Comment 4 Abhinav Dahiya 2020-07-10 17:54:18 UTC
We are waiting for kube rebase and when that happens scaleup should start picking up the correct version. So this bug will have to wait until that happens.

Comment 5 Johnny Liu 2020-07-21 07:16:09 UTC
In today's scale-up testing with openshift-ansible-4.6.0-202007161549.p0.git.0.8f2f0c3.el7.noarch, the error message seems to have changed: it now expects cri-o-4.6 rather than cri-o 1.18/1.19.

TASK [openshift_node : Install openshift packages] *****************************
Monday 20 July 2020  21:02:01 +0800 (0:00:00.085)       0:04:40.552 *********** 
FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).
fatal: [10.0.32.6]: FAILED! => {"ansible_job_id": "152086687514.8199", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-4.6.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-4.6.*' found available, installed or updated"]}
fatal: [10.0.32.5]: FAILED! => {"ansible_job_id": "713708506834.8148", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-4.6.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-4.6.*' found available, installed or updated"]}

Comment 6 liujia 2020-07-22 04:37:41 UTC
Also hit this issue in the upgrade CI test from 4.5.2-x86_64 to 4.6.0-0.nightly-2020-07-20-093546.

The cluster upgrade succeeded, but the subsequent RHEL worker upgrade failed.

TASK [openshift_node : Install openshift packages] *****************************
Monday 20 July 2020  21:19:54 +0800 (0:00:00.120)       0:04:21.097 *********** 
FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).
fatal: [10.0.32.62]: FAILED! => {"ansible_job_id": "446048133576.57238", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-4.6.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-4.6.*' found available, installed or updated"]}

TASK [openshift_node : Package install failure message] ************************
Monday 20 July 2020  21:22:19 +0800 (0:02:24.795)       0:06:45.893 *********** 
fatal: [10.0.32.62]: FAILED! => {"changed": false, "msg": "Unable to install cri-o-4.6.*, openshift-clients-4.6*, openshift-hyperkube-4.6*, podman. Please ensure repos are configured properly to provide these packages and indicated versions.\n"}
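
As a rough way to see which of the requested package specs a repo cannot satisfy, the following Python sketch matches the specs from the failure message against a package list using shell-style globbing. This only approximates how yum resolves package specs; the helper names are hypothetical, and apart from the cri-o build string (which appears later in this report) the package versions below are made up for illustration.

```python
from fnmatch import fnmatchcase


def spec_matches(spec: str, name: str, nvr: str) -> bool:
    """True if the spec globs either the bare name or the name-version-release."""
    return fnmatchcase(name, spec) or fnmatchcase(nvr, spec)


def unmatched_specs(specs, available):
    """Return the specs that match none of the available packages."""
    return [
        spec
        for spec in specs
        if not any(spec_matches(spec, name, nvr) for name, nvr in available.items())
    ]


# Hypothetical repo contents: package name -> name-version-release.
available = {
    "cri-o": "cri-o-1.19.0-69.rhaos4.6.git707b4b9.el7",
    "openshift-clients": "openshift-clients-4.6.0-1.el7",      # assumed version
    "openshift-hyperkube": "openshift-hyperkube-4.6.0-1.el7",  # assumed version
    "podman": "podman-1.6.4-1.el7",                            # assumed version
}

requested = ["cri-o-4.6.*", "openshift-clients-4.6*", "openshift-hyperkube-4.6*", "podman"]
print(unmatched_specs(requested, available))
```

Under these assumptions only the cri-o spec goes unmatched, because cri-o packages are versioned after Kubernetes (1.19) rather than after OCP (4.6).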

Comment 7 Russell Teague 2020-07-27 19:56:53 UTC
A known regression (1861097) is impacting the install due to the kube version not being reported correctly.

Comment 8 Gaoyun Pei 2020-08-10 12:27:35 UTC
As the Kube API now returns version 1.19, this no longer blocks RHEL scale-up. Thanks.

TASK [openshift_node : Install openshift packages] *****************************
Monday 10 August 2020  20:22:06 +0800 (0:00:00.069)       0:06:37.777 ********* 
changed: [10.0.96.117] => {"ansible_job_id": "317997613290.7515", "attempts": 1, "changed": true, "changes": {"installed": ["cri-o-1.19.*", "openshift-clients-4.6*", "openshift-hyperkube-4.6*", "podman"], "updated": []}, "finished": 1, "msg": "", "obsoletes": {"python-urllib3": {"dist": "noarch", "repo": "installed", "version": "1.10.2-7.el7"}}, "rc": 0, "results": ["Loaded plugins: search-disabled-repos\nResolving Dependencies\n--> Running transaction check\n---> Package cri-o.x86_64 0:1.19.0-69.rhaos4.6.git707b4b9.el7 will be installed
...


# oc get node -o wide
NAME                         STATUS   ROLES    AGE     VERSION                      INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
...
wj46uos810z-hz29f-rhel3-0    Ready    worker   2m7s    v1.19.0-rc.2+5241b27-dirty   192.168.3.185   10.0.96.117   Red Hat Enterprise Linux Server 7.8 (Maipo)                    3.10.0-1127.18.2.el7.x86_64   cri-o://1.19.0-69.rhaos4.6.git707b4b9.el7-dev
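
For anyone checking a similar listing, this small Python sketch compares the minor version in the VERSION column with the one in the CONTAINER-RUNTIME column. It is a convenience check written for this report, not part of oc or openshift-ansible; the function names are hypothetical, and the input strings are copied from the listings in this bug.

```python
import re


def minor_version(text: str) -> str:
    """Return the first 'MAJOR.MINOR' pair found in a version-like string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return f"{match.group(1)}.{match.group(2)}"


def versions_aligned(kubelet_version: str, container_runtime: str) -> bool:
    """True if kubelet and cri-o report the same MAJOR.MINOR."""
    return minor_version(kubelet_version) == minor_version(container_runtime)


# RHEL worker after the rebase (this comment).
print(versions_aligned("v1.19.0-rc.2+5241b27-dirty",
                       "cri-o://1.19.0-69.rhaos4.6.git707b4b9.el7-dev"))

# RHCOS node from the original description, before the rebase.
print(versions_aligned("v1.18.3+ba54539",
                       "cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev"))
```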

Comment 11 errata-xmlrpc 2020-10-27 16:10:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

