Description of problem:
RHEL worker scaleup failed with "No package matching 'cri-o-1.18.*' found available".

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-06-26-035408

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.6 (4.6.0-0.nightly-2020-06-26-035408).
2. Scale up a RHEL worker.

Actual results:
RHEL worker scaleup failed at the following playbook task:

TASK [openshift_node : Install openshift packages]
{"ansible_job_id": "107165796205.7457", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-1.18.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-1.18.*' found available, installed or updated"]}

Checked the latest OCP 4.6 puddle repo; cri-o-1.19 is there.
http://download.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/4.6/latest/x86_64/os/Packages/

$ oc get node -o wide
NAME                                        STATUS   ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION          CONTAINER-RUNTIME
ip-10-0-54-108.us-east-2.compute.internal   Ready    worker   88m   v1.18.3+ba54539   10.0.54.108   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-56-154.us-east-2.compute.internal   Ready    master   97m   v1.18.3+ba54539   10.0.56.154   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-57-68.us-east-2.compute.internal    Ready    master   97m   v1.18.3+ba54539   10.0.57.68    <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-58-9.us-east-2.compute.internal     Ready    worker   89m   v1.18.3+ba54539   10.0.58.9     <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-65-138.us-east-2.compute.internal   Ready    worker   88m   v1.18.3+ba54539   10.0.65.138   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev
ip-10-0-77-178.us-east-2.compute.internal   Ready    master   96m   v1.18.3+ba54539   10.0.77.178   <none>        Red Hat Enterprise Linux CoreOS 46.82.202006260140-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev

Expected results:
1. RHEL scaleup succeeds.

Additional info:
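The skew visible in the node listing (kubelet at 1.18, cri-o at 1.19) can be checked mechanically. The following is a minimal sketch in Python, not part of the playbooks: the helper names `minor` and `version_skew` are hypothetical, and the node tuples are copied by hand from `oc get node -o wide` output rather than fetched from the cluster.

```python
import re


def minor(version: str) -> str:
    """Return the first 'MAJOR.MINOR' pair found in a version string."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no version number in {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def version_skew(nodes):
    """List (name, kubelet_minor, runtime_minor) for nodes where the two differ.

    `nodes` is an iterable of (name, VERSION, CONTAINER-RUNTIME) tuples,
    i.e. three columns taken from `oc get node -o wide`.
    """
    skewed = []
    for name, kubelet, runtime in nodes:
        k, r = minor(kubelet), minor(runtime)
        if k != r:
            skewed.append((name, k, r))
    return skewed


# Values copied from the node listing in this report (illustrative input only).
nodes = [
    ("ip-10-0-54-108.us-east-2.compute.internal",
     "v1.18.3+ba54539",
     "cri-o://1.19.0-30.dev.rhaos4.6.git0a84af5.el8-dev"),
]

for name, k, r in version_skew(nodes):
    print(f"{name}: kubelet {k}, cri-o {r}")
```

A non-empty result here only indicates that the reported Kubernetes version and the shipped cri-o build are out of step; it does not say which side is wrong.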
The scaleup scripts should be using 1.19, as that's what's in the 4.6 puddle now. Moving to installer.
The Kubernetes version is still 1.18 in the OCP 4.6 nightly; we need the rebase of OCP on top of k8s 1.19. What the openshift-ansible playbook did was expected.

TASK [openshift_node : Set fact l_kubernetes_version] **************************
Tuesday 30 June 2020 15:55:55 +0800 (0:00:00.577) 0:05:39.098 **********
ok: [ip-10-0-58-77.us-east-2.compute.internal] => {"ansible_facts": {"l_kubernetes_version": "1.18"}, "changed": false}
ok: [ip-10-0-50-230.us-east-2.compute.internal] => {"ansible_facts": {"l_kubernetes_version": "1.18"}, "changed": false}
By design, the openshift-ansible scaleup playbooks install cri-o based on the cluster Kubernetes version [1]. In CI, we have an override (ci_version_override) for this behavior because during development cycles the package versions available are not always in step with Kubernetes rebases or release branching. This override should not be used outside of CI, to ensure we continue to validate installs with proper versions. This is not a bug.

[1] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/defaults/main.yml#L13
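To illustrate why the install fails under this design, here is a minimal sketch in Python of the idea, not the playbook's actual implementation. The function names, the `override` parameter (a stand-in for a CI-style version override), and the sample repo package name are all assumptions made for illustration; the package glob format is the one shown in the error message.

```python
import fnmatch
import re
from typing import List, Optional


def kube_minor(git_version: str) -> str:
    """Return 'MAJOR.MINOR' from a version string such as 'v1.18.3+ba54539'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized version: {git_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def crio_package_spec(git_version: str, override: Optional[str] = None) -> str:
    """Build the cri-o package glob from the cluster version.

    `override` is a hypothetical stand-in for a CI-only version override.
    """
    version = override if override else kube_minor(git_version)
    return f"cri-o-{version}.*"


def matching_packages(spec: str, repo_packages: List[str]) -> List[str]:
    """Return the repo packages whose name-version-release matches the glob."""
    return [p for p in repo_packages if fnmatch.fnmatchcase(p, spec)]


# Illustrative repo contents: only a 1.19 build of cri-o is available.
repo = ["cri-o-1.19.0-69.rhaos4.6.git707b4b9.el7"]

spec = crio_package_spec("v1.18.3+ba54539")
print(spec, matching_packages(spec, repo))

spec = crio_package_spec("v1.18.3+ba54539", override="1.19")
print(spec, matching_packages(spec, repo))
```

With the cluster still reporting 1.18 and the repo carrying only 1.19, the first glob matches nothing, which corresponds to the "No package matching" failure; the override path matches, which is why CI can pass while a default scaleup cannot.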
We are waiting for kube rebase and when that happens scaleup should start picking up the correct version. So this bug will have to wait until that happens.
In today's scaleup testing with openshift-ansible-4.6.0-202007161549.p0.git.0.8f2f0c3.el7.noarch, the error message seems to have changed: it now expects cri-o-4.6 rather than cri-o 1.18/1.19.

TASK [openshift_node : Install openshift packages] *****************************
Monday 20 July 2020 21:02:01 +0800 (0:00:00.085) 0:04:40.552 ***********
FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).
fatal: [10.0.32.6]: FAILED! => {"ansible_job_id": "152086687514.8199", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-4.6.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-4.6.*' found available, installed or updated"]}
fatal: [10.0.32.5]: FAILED! => {"ansible_job_id": "713708506834.8148", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-4.6.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-4.6.*' found available, installed or updated"]}
Also hit the issue in an upgrade CI test from 4.5.2-x86_64 to 4.6.0-0.nightly-2020-07-20-093546. The cluster upgrade succeeded, but the subsequent RHEL worker upgrade failed.

TASK [openshift_node : Install openshift packages] *****************************
Monday 20 July 2020 21:19:54 +0800 (0:00:00.120) 0:04:21.097 ***********
FAILED - RETRYING: Install openshift packages (3 retries left).
FAILED - RETRYING: Install openshift packages (2 retries left).
FAILED - RETRYING: Install openshift packages (1 retries left).
fatal: [10.0.32.62]: FAILED! => {"ansible_job_id": "446048133576.57238", "attempts": 3, "changed": false, "finished": 1, "msg": "No package matching 'cri-o-4.6.*' found available, installed or updated", "rc": 126, "results": ["No package matching 'cri-o-4.6.*' found available, installed or updated"]}

TASK [openshift_node : Package install failure message] ************************
Monday 20 July 2020 21:22:19 +0800 (0:02:24.795) 0:06:45.893 ***********
fatal: [10.0.32.62]: FAILED! => {"changed": false, "msg": "Unable to install cri-o-4.6.*, openshift-clients-4.6*, openshift-hyperkube-4.6*, podman. Please ensure repos are configured properly to provide these packages and indicated versions.\n"}
A known regression (1861097) is impacting the install due to the kube version not being reported correctly.
The Kube API now returns version 1.19, so this no longer blocks RHEL scaleup. Thanks.

TASK [openshift_node : Install openshift packages] *****************************
Monday 10 August 2020 20:22:06 +0800 (0:00:00.069) 0:06:37.777 *********
changed: [10.0.96.117] => {"ansible_job_id": "317997613290.7515", "attempts": 1, "changed": true, "changes": {"installed": ["cri-o-1.19.*", "openshift-clients-4.6*", "openshift-hyperkube-4.6*", "podman"], "updated": []}, "finished": 1, "msg": "", "obsoletes": {"python-urllib3": {"dist": "noarch", "repo": "installed", "version": "1.10.2-7.el7"}}, "rc": 0, "results": ["Loaded plugins: search-disabled-repos\nResolving Dependencies\n--> Running transaction check\n---> Package cri-o.x86_64 0:1.19.0-69.rhaos4.6.git707b4b9.el7 will be installed ...

# oc get node -o wide
NAME                        STATUS   ROLES    AGE    VERSION                      INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                      KERNEL-VERSION                CONTAINER-RUNTIME
...
wj46uos810z-hz29f-rhel3-0   Ready    worker   2m7s   v1.19.0-rc.2+5241b27-dirty   192.168.3.185   10.0.96.117   Red Hat Enterprise Linux Server 7.8 (Maipo)   3.10.0-1127.18.2.el7.x86_64   cri-o://1.19.0-69.rhaos4.6.git707b4b9.el7-dev
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196