Description of problem:
The RHEL node scaleup fails due to "No package matching 'cri-o-1.19.*' found available" on an OCP 4.7 cluster.

TASK [openshift_node : Install openshift packages] *****************************
Friday 06 November 2020 17:02:45 +0800 (0:00:00.089) 0:08:14.213 *******
FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).
fatal: [ip-10-0-53-18.us-east-2.compute.internal]: FAILED! => {"ansible_job_id": "224354404120.3488", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-1.19.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-1.19.*' found available, installed or updated"]}
fatal: [ip-10-0-48-123.us-east-2.compute.internal]: FAILED! => {"ansible_job_id": "372095033110.3575", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-1.19.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-1.19.*' found available, installed or updated"]}

TASK [openshift_node : Package install failure message] ************************
Friday 06 November 2020 17:06:07 +0800 (0:03:22.442) 0:11:36.656 *******
fatal: [ip-10-0-53-18.us-east-2.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to install cri-o-1.19.*, openshift-clients-4.7*, openshift-hyperkube-4.7*, podman. Please ensure repos are configured properly to provide these packages and indicated versions.\n"}
fatal: [ip-10-0-48-123.us-east-2.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to install cri-o-1.19.*, openshift-clients-4.7*, openshift-hyperkube-4.7*, podman. Please ensure repos are configured properly to provide these packages and indicated versions.\n"}

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-10-27-051128

How reproducible:
Always

Steps to Reproduce:
1. Deploy OCP 4.7 cluster
2. Scale up RHEL worker nodes

Actual results:
The RHEL scaleup failed to add RHEL worker nodes to the OCP 4.7 cluster due to "No package matching 'cri-o-1.19.*' found available"

Expected results:
The RHEL node scaleup should not fail with the above error, and it should add RHEL worker nodes to the OCP 4.7 cluster without an issue.

Additional info:
This is blocking BZ 1870490 verification
What is the reason this is assigned to the kube-apiserver component?
It looks like cri-o 1.20 has been built and tagged for 4.7 (which is ultimately correct); however, 4.7 has not yet been rebased to Kube 1.20, so the scaleup playbook is looking for 1.19. We've had a check in the playbooks to install the same version of cri-o as the kube api version since we started OCP 4, and we seem to hit some combination of version mismatch issues during every release cycle. Is there something better we can do to make sure we test the right versions, but also allow for this skew that happens each release?
I would say it'd be better to first look for version N, then look for version N-1, for both kube and cri-o. We like to get cri-o 1.20 in early to catch any regressions early. From my experience, having cri-o on version N and kube on N-1 has not caused any issues in recent memory. The CRI has been fairly stable and backward compatible.
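To make the "N, then N-1" lookup order concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not code from openshift-ansible: the function name fallback_patterns and its parameters are hypothetical, and the pattern format simply mirrors the 'cri-o-1.19.*' glob seen in the failure message.

```python
def fallback_patterns(package, newest_minor):
    """Return package globs to try in order: version N first, then N-1.

    `package` and `newest_minor` are hypothetical inputs for illustration,
    e.g. "cri-o" and "1.20".
    """
    major, minor = (int(part) for part in newest_minor.split("."))
    patterns = [f"{package}-{major}.{minor}.*"]
    # Only add the N-1 fallback when a previous minor version exists.
    if minor > 0:
        patterns.append(f"{package}-{major}.{minor - 1}.*")
    return patterns


if __name__ == "__main__":
    for pattern in fallback_patterns("cri-o", "1.20"):
        print(pattern)
```

A caller would try each pattern in turn and stop at the first one the repos can satisfy.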
I will put together a change in openshift-ansible to verify the available cri-o version is at or above the current kubernetes version.
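As a rough sketch of what an "at or above the current kubernetes version" check could look like, the Python below picks the newest available cri-o version whose major.minor is not older than the kube version. This is an assumption-laden illustration, not the actual openshift-ansible change: the helper names minor_tuple and pick_crio_version are hypothetical, and the list of available versions is assumed to have been collected already from the package repos.

```python
import re


def minor_tuple(version):
    """Return (major, minor) as ints from a string such as '1.20.0' or '1.19'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pick_crio_version(available, kube_version):
    """Pick the newest available cri-o version at or above kube's major.minor.

    `available` is a hypothetical list of version strings (e.g. ["1.20.0"]);
    `kube_version` is the cluster's kube version (e.g. "1.19").
    """
    floor = minor_tuple(kube_version)
    candidates = [v for v in available if minor_tuple(v) >= floor]
    if not candidates:
        raise LookupError(
            f"no cri-o version at or above {kube_version} in {available}"
        )
    # Compare numerically on every numeric component, so 1.20.0 > 1.19.5.
    return max(candidates, key=lambda v: tuple(int(p) for p in re.findall(r"\d+", v)))


if __name__ == "__main__":
    print(pick_crio_version(["1.20.0"], "1.19"))
```

With this kind of check, a repo that only carries cri-o 1.20 still satisfies a cluster reporting kube 1.19, while a repo that only carries something older than the kube version fails early with a clear error.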
This is not a release blocker because the current code would work correctly once OCP is rebased to Kube 1.20. The open PR will allow versions of cri-o newer than the current kube version to be installed for testing during the development cycle.
Verified this bug with openshift-ansible-4.7.0-202011092117.p0.git.0.ec2dd4f.el7.noarch.rpm.
cri-o 1.20 could be installed when the k8s version is 1.19.

TASK [openshift_node : Set fact l_kubernetes_server_version] *******************
Tuesday 10 November 2020 11:59:49 +0800 (0:00:00.471) 0:07:14.303 ******
ok: [10.0.32.4] => {"ansible_facts": {"l_kubernetes_server_version": "1.19"}, "changed": false}

TASK [openshift_node : Get available cri-o RPM versions] ***********************
Tuesday 10 November 2020 11:59:49 +0800 (0:00:00.077) 0:07:14.380 ******
ok: [10.0.32.4] => {"changed": false, "results": [{"arch": "x86_64", "envra": "0:cri-o-1.20.0-0.rhaos4.7.git8e23406.el7.9.x86_64", "epoch": "0", "name": "cri-o", "release": "0.rhaos4.7.git8e23406.el7.9", "repo": "aos-v4-devel-install", "version": "1.20.0", "yumstate": "available"}]}
...
TASK [openshift_node : Install openshift packages] *****************************
Installed:\n cri-o.x86_64 0:1.20.0-0.rhaos4.7.git8e23406.el7.9
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633