Bug 1895309 - [OCP v47] The RHEL node scaleup fails due to "No package matching 'cri-o-1.19.*' found available" on OCP 4.7 cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 1870490
Reported: 2020-11-06 10:13 UTC by Prashant Dhamdhere
Modified: 2021-02-24 15:32 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Package install logic required an exact match of Kube API version to CRI-O version. Consequence: Newer versions of CRI-O could not be installed, although they should still function correctly. Fix: Changed package install logic to allow newer versions of CRI-O to be installed while still requiring a minimum of the current Kube API version. Result: Newer versions of CRI-O can be installed for older Kube API versions.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:31:26 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift openshift-ansible pull 12267 | 0 | None | closed | Bug 1895309: roles/openshift_node: Allow newer cri-o version | 2021-01-13 14:05:53 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:32:00 UTC

Description Prashant Dhamdhere 2020-11-06 10:13:04 UTC
Description of problem:

The RHEL node scaleup fails due to "No package matching 'cri-o-1.19.*' found available" on an OCP 4.7 cluster.

TASK [openshift_node : Install openshift packages] *****************************
Friday 06 November 2020  17:02:45 +0800 (0:00:00.089)       0:08:14.213 ******* 

FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (3 retries left).

FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).

FAILED - RETRYING: Install openshift packages (1 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).

fatal: [ip-10-0-53-18.us-east-2.compute.internal]: FAILED! => {"ansible_job_id": "224354404120.3488", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-1.19.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-1.19.*' found available, installed or updated"]}
fatal: [ip-10-0-48-123.us-east-2.compute.internal]: FAILED! => {"ansible_job_id": "372095033110.3575", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-1.19.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-1.19.*' found available, installed or updated"]}

TASK [openshift_node : Package install failure message] ************************
Friday 06 November 2020  17:06:07 +0800 (0:03:22.442)       0:11:36.656 ******* 
fatal: [ip-10-0-53-18.us-east-2.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to install cri-o-1.19.*, openshift-clients-4.7*, openshift-hyperkube-4.7*, podman. Please ensure repos are configured properly to provide these packages and indicated versions.\n"}
fatal: [ip-10-0-48-123.us-east-2.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to install cri-o-1.19.*, openshift-clients-4.7*, openshift-hyperkube-4.7*, podman. Please ensure repos are configured properly to provide these packages and indicated versions.\n"}


Version-Release number of selected component (if applicable):

4.7.0-0.nightly-2020-10-27-051128

How reproducible:

Always

Steps to Reproduce:

1. Deploy OCP 4.7 cluster
2. Scale up RHEL worker nodes


Actual results:

The RHEL scaleup failed to add RHEL worker nodes to the OCP 4.7 cluster due to "No package matching 'cri-o-1.19.*' found available".

Expected results:

The RHEL node scaleup should not fail with the above error, and it should add RHEL worker nodes to the OCP 4.7 cluster without issue.

Additional info:

This is blocking verification of BZ 1870490.

Comment 1 Stefan Schimanski 2020-11-06 10:46:14 UTC
What is the reason this is assigned to the kube-apiserver component?

Comment 2 Russell Teague 2020-11-06 13:17:39 UTC
It looks like cri-o-1.20 has been built and tagged for 4.7 (which is ultimately correct); however, 4.7 has not yet been rebased to Kube 1.20, so the scaleup playbook is still looking for 1.19.

We've had a check in the playbooks to install the same version of cri-o as the Kube API version since the start of OCP 4, and we seem to hit some combination of version-mismatch issues during every release cycle. Is there something better we can do to make sure we test the right versions, but also allow for the skew that happens each release?
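The failure mode described above can be illustrated with a small Python sketch. It approximates yum's package-name globbing with `fnmatch.fnmatchcase` (an assumption: yum matches patterns against several name forms, not only this one), and `pinned_crio_pattern` / `pattern_matches` are hypothetical helpers standing in for the playbook's exact-minor pin. With only cri-o 1.20 available and Kube still at 1.19, the pinned pattern matches nothing.

```python
from fnmatch import fnmatchcase
from typing import List


def pinned_crio_pattern(kube_version: str) -> str:
    """Build an exact-minor package glob, e.g. '1.19' -> 'cri-o-1.19.*' (hypothetical helper)."""
    return f"cri-o-{kube_version}.*"


def pattern_matches(available: List[str], pattern: str) -> List[str]:
    """Return the available package strings that the shell-style glob matches.

    This only approximates yum's matching rules; it is for illustration.
    """
    return [pkg for pkg in available if fnmatchcase(pkg, pattern)]


# Only cri-o 1.20 was tagged for 4.7, while the cluster still reported Kube 1.19.
available = ["cri-o-1.20.0-0.rhaos4.7.git8e23406.el7.9"]
pattern = pinned_crio_pattern("1.19")
print(pattern)                              # cri-o-1.19.*
print(pattern_matches(available, pattern))  # []
```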

Comment 3 Peter Hunt 2020-11-06 14:18:58 UTC
I would say it'd be better to first look for version N, then look for version N-1, for both kube and cri-o. 

We like to get cri-o 1.20 in early to catch any regressions early.

From my experience, having cri-o on version N and kube on N-1 has not caused any issues in recent memory. The CRI has been fairly stable and backward compatible.
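One possible reading of the "N, then N-1" suggestion is sketched below in Python. It assumes N is the minor version a release is expected to ship (for example 20 for 4.7) and that version strings look like `1.<minor>[.patch]`; `minor_of` and `first_available_minor` are hypothetical helpers, not code from openshift-ansible. Applied separately to cri-o and kube, the lookup tolerates either component being one minor release behind.

```python
from typing import Iterable, Optional


def minor_of(version: str) -> int:
    """'1.20.0' -> 20; assumes a '1.<minor>[.patch]' version string."""
    return int(version.split(".")[1])


def first_available_minor(available: Iterable[str], n: int) -> Optional[int]:
    """Prefer minor version N, fall back to N-1; return None if neither is available."""
    minors = {minor_of(v) for v in available}
    for candidate in (n, n - 1):
        if candidate in minors:
            return candidate
    return None


N = 20  # assumed target minor version for the release
print(first_available_minor(["1.20.0"], N))  # 20 (e.g. cri-o already at N)
print(first_available_minor(["1.19.0"], N))  # 19 (e.g. kube still at N-1)
```

Under this reading, a cluster with cri-o at 1.20 and kube at 1.19 would pass, which matches the skew described in this comment.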

Comment 4 Russell Teague 2020-11-06 14:57:48 UTC
I will put together a change in openshift-ansible to verify the available cri-o version is at or above the current kubernetes version.
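A minimal Python sketch of the check proposed here, assuming the input has the shape of the `results` entries from the "Get available cri-o RPM versions" task output shown later in this bug (`name` and `version` keys). `major_minor` and `pick_crio_version` are hypothetical names; the actual fix in openshift-ansible PR 12267 is implemented as Ansible tasks and may differ. A package qualifies when its major.minor is at or above the Kubernetes server version, and the newest qualifying version is preferred.

```python
from typing import Dict, List, Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """'1.20.0' -> (1, 20); '1.19' -> (1, 19)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pick_crio_version(results: List[Dict[str, str]], kube_version: str) -> Optional[str]:
    """Return the newest available cri-o version whose major.minor is >= kube_version, or None.

    Assumes purely numeric dotted versions such as '1.20.0'.
    """
    floor = major_minor(kube_version)
    candidates = [
        r["version"]
        for r in results
        if r.get("name") == "cri-o" and major_minor(r["version"]) >= floor
    ]
    if not candidates:
        return None
    # Compare every numeric component so that 1.20.10 sorts after 1.20.9.
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))


results = [{"name": "cri-o", "version": "1.20.0", "yumstate": "available"}]
print(pick_crio_version(results, "1.19"))  # 1.20.0
print(pick_crio_version(results, "1.21"))  # None
```

Using a minimum rather than an exact match keeps the guard against installing a cri-o older than the cluster's Kube version while still allowing the newer build tagged early in the release cycle.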

Comment 5 Russell Teague 2020-11-09 14:33:05 UTC
This is not a release blocker because the current code would work correctly once OCP is rebased to Kube 1.20. The open PR allows versions of cri-o newer than the current Kube version to be installed, enabling testing during the development cycle.

Comment 7 Gaoyun Pei 2020-11-10 04:14:34 UTC
Verified this bug with openshift-ansible-4.7.0-202011092117.p0.git.0.ec2dd4f.el7.noarch.rpm

cri-o 1.20 can be installed when the k8s version is 1.19.


TASK [openshift_node : Set fact l_kubernetes_server_version] *******************
Tuesday 10 November 2020  11:59:49 +0800 (0:00:00.471)       0:07:14.303 ****** 
ok: [10.0.32.4] => {"ansible_facts": {"l_kubernetes_server_version": "1.19"}, "changed": false}

TASK [openshift_node : Get available cri-o RPM versions] ***********************
Tuesday 10 November 2020  11:59:49 +0800 (0:00:00.077)       0:07:14.380 ****** 
ok: [10.0.32.4] => {"changed": false, "results": [{"arch": "x86_64", "envra": "0:cri-o-1.20.0-0.rhaos4.7.git8e23406.el7.9.x86_64", "epoch": "0", "name": "cri-o", "release": "0.rhaos4.7.git8e23406.el7.9", "repo": "aos-v4-devel-install", "version": "1.20.0", "yumstate": "available"}]}
...

TASK [openshift_node : Install openshift packages] *****************************

Installed:\n  cri-o.x86_64 0:1.20.0-0.rhaos4.7.git8e23406.el7.9

Comment 10 errata-xmlrpc 2021-02-24 15:31:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

