Bug 2093357
| Summary: | Upgrading an SNO spoke with acm-ice causes the SNO to become unreachable | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Constantin Vultur <cvultur> |
| Component: | Special Resource Operator | Assignee: | Pablo Acevedo <pacevedo> |
| Status: | CLOSED ERRATA | QA Contact: | Constantin Vultur <cvultur> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.11 | CC: | bblock, bthurber, mlammon |
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:16:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

Testing with the new version of acm-ice; the build was OK:

    # oc get all
    NAME                         READY   STATUS      RESTARTS   AGE
    pod/acm-ice-4-8-20-1-build   0/1     Completed   0          148m

    NAME                                            TYPE     FROM         LATEST
    buildconfig.build.openshift.io/acm-ice-4-8-20   Docker   Dockerfile   1

    NAME                                        TYPE     FROM         STATUS     STARTED       DURATION
    build.build.openshift.io/acm-ice-4-8-20-1   Docker   Dockerfile   Complete   2 hours ago   2m43s

Then started the spoke cluster upgrade from 4.8.20 to 4.8.24 (kernel 4.18.0-305.25 to 4.18.0-305.28). The upgrade progressed to 77% and then the cluster became unreachable. Checking the spoke system, `journalctl -xef` showed:

    Jun 14 15:23:52 sno1-0-0 bash[2265]: Trying to pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-1306/acm-ice-driver-container:4.18.0-305.25.1.el8_4.x86_64...
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Getting image source signatures
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Copying blob sha256:34c2415aebfcf7c0bc4e8fe2063061c614f7819f68b786c19332d009732fafe1
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Copying blob sha256:dddc255e8c1694957778335dc22356798286868501a76e53e5ac328ed9d0e0c8
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Copying blob sha256:87b7bd227a863470eb564222dff5ab56d5d86dd8446103505f646afb5fc2c827
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Copying blob sha256:aba7e1b5cddd91442924d86159bbc012f115d8f2bedc8e8c1eed835c09a8da14
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Copying blob sha256:4752687a61a97d6f352ae62c381c87564bcb2f5b6523a05510ca1fb60d640216
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Copying blob sha256:0344366a246a0f7590c2bae4536c01f15f20c6d802b4654ce96ac81047bc23f3
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Copying config sha256:48dcd048d16dd7e389afe01483265d38cb27dcd99a0af233f50f0ca5143a416a
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Writing manifest to image destination
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Storing signatures
    Jun 14 15:23:52 sno1-0-0 bash[2265]: 48dcd048d16dd7e389afe01483265d38cb27dcd99a0af233f50f0ca5143a416a
    Jun 14 15:23:52 sno1-0-0 systemd[6319]: var-lib-containers-storage-overlay.mount: Succeeded.
    -- Subject: Unit succeeded
    -- Defined-By: systemd
    -- Support: https://access.redhat.com/support
    --
    -- The unit UNIT has successfully entered the 'dead' state.
    Jun 14 15:23:52 sno1-0-0 systemd[1]: var-lib-containers-storage-overlay.mount: Succeeded.
    -- Subject: Unit succeeded
    -- Defined-By: systemd
    -- Support: https://access.redhat.com/support
    --
    -- The unit var-lib-containers-storage-overlay.mount has successfully entered the 'dead' state.
    Jun 14 15:23:52 sno1-0-0 bash[2265]: Error: statfs /lib/modules/4.18.0-305.25.1.el8_4.x86_64/kernel/drivers: no such file or directory

On the spoke node, `/lib/modules` contains only the new kernel:

    $ ll /lib/modules/
    total 4
    drwxr-xr-x. 7 root root 4096 Jan  1  1970 4.18.0-305.28.1.el8_4.x86_64

Verified the new example, and the upgrade now works as expected.

Noting here the requirement that the clusterclaim must be created before the test is started. Also, existing or statically created clusterclaims could affect the outcome of the ice driver installation.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
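
The `statfs` error in the first comment reflects a mismatch between the kernel release in the pulled driver image tag (`4.18.0-305.25.1.el8_4.x86_64`) and the only module tree on the node (`4.18.0-305.28.1.el8_4.x86_64`). The Python sketch below makes that comparison explicit. It is illustrative only: `tag_kernel_release` and `modules_tree_present` are hypothetical helpers, not part of acm-ice, and the sketch assumes, as the pull attempts suggest, that the driver image tag is the kernel release string.

```python
from typing import Iterable, Optional


def tag_kernel_release(image_ref: str) -> Optional[str]:
    """Return the tag of a tag-form image reference, or None if it has no tag.

    Assumption (hypothetical helper): the driver image tag is the kernel
    release string. Digest references (name@sha256:...) are not handled.
    """
    # Split on the LAST ':' so a registry port such as ':5000' is not taken
    # as the tag; if the remainder still contains '/', there was no tag.
    _, sep, tag = image_ref.rpartition(":")
    if not sep or "/" in tag:
        return None
    return tag


def modules_tree_present(kernel_release: str, installed: Iterable[str]) -> bool:
    """True if /lib/modules/<kernel_release> is one of the installed module trees."""
    return kernel_release in set(installed)


if __name__ == "__main__":
    image = ("registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/"
             "sro-1306/acm-ice-driver-container:4.18.0-305.25.1.el8_4.x86_64")
    # The only entry `ll /lib/modules/` showed on the spoke after the upgrade.
    installed = ["4.18.0-305.28.1.el8_4.x86_64"]
    release = tag_kernel_release(image)
    if release is not None:
        print(release, modules_tree_present(release, installed))
        # prints: 4.18.0-305.25.1.el8_4.x86_64 False
```

On a live node, `os.listdir("/lib/modules")` could be passed as `installed` instead of the hard-coded list.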
Description of problem:
The upgrade of an SNO cluster never finishes because the acm-ice driver image for the new kernel is missing from the registry. The SNO cluster remains unreachable indefinitely.

Version-Release number of selected component (if applicable):
bundle / release-4.11

How reproducible:

Steps to Reproduce:
1. Deploy acm-ice on the SNO spokes.
2. Upgrade the SNOs, making sure that a new kernel is deployed.
3.

Actual results:
- The SNO cluster never comes up.
- The acm-ice service stays blocked in "activating" because the image for the new kernel is missing.
- The kubelet service never starts.

    [core@sno2-0-0 ~]$ systemctl status acm-ice
    ● acm-ice.service - out-of-tree driver loader
       Loaded: loaded (/etc/systemd/system/acm-ice.service; enabled; vendor preset: disabled)
       Active: activating (start) since Fri 2022-06-03 13:29:48 UTC; 43min ago
     Main PID: 2272 (bash)
        Tasks: 2 (limit: 153437)
       Memory: 39.0M
          CPU: 1min 38.345s
       CGroup: /system.slice/acm-ice.service
               ├─ 2272 /usr/bin/bash -c while ! /usr/local/bin/acm-ice load registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container; do sleep 10; done
               └─18684 sleep 10

    Jun 03 14:12:53 sno2-0-0 bash[2272]: Trying to pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64...
    Jun 03 14:12:53 sno2-0-0 bash[2272]: Error: Error initializing source docker://registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64: Error re>
    Jun 03 14:13:04 sno2-0-0 bash[2272]: Trying to pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64...
    Jun 03 14:13:04 sno2-0-0 bash[2272]: Error: Error initializing source docker://registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64: Error re>
    Jun 03 14:13:14 sno2-0-0 bash[2272]: Trying to pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64...
    Jun 03 14:13:14 sno2-0-0 bash[2272]: Error: Error initializing source docker://registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64: Error re>
    Jun 03 14:13:24 sno2-0-0 bash[2272]: Trying to pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64...
    Jun 03 14:13:24 sno2-0-0 bash[2272]: Error: Error initializing source docker://registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64: Error re>
    Jun 03 14:13:35 sno2-0-0 bash[2272]: Trying to pull registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64...
    Jun 03 14:13:35 sno2-0-0 bash[2272]: Error: Error initializing source docker://registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/sro-106/acm-ice-driver-container:4.18.0-305.28.1.el8_4.x86_64: Error re>

Expected results:
The upgrade does not get stuck.

Additional info:
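
The `CGroup` line shows why the node does not recover: the unit runs `while ! acm-ice load <image>; do sleep 10; done` with no exit condition, and the pull attempts show the image resolved to a per-kernel tag (`...:4.18.0-305.28.1.el8_4.x86_64`). If no tag exists for the kernel the upgrade installs, the loop retries every 10 seconds forever and, per the report, kubelet never starts. As a hedged illustration (not the fix shipped in RHSA-2022:5069), the Python sketch below checks ahead of an upgrade whether a driver image tag exists for each target kernel. `driver_image_ref` and `missing_driver_tags` are hypothetical helpers, the tag-equals-kernel-release convention is an assumption drawn from the logs, and fetching the real tag list from the registry is left out.

```python
from typing import Iterable, List


def driver_image_ref(base: str, kernel_release: str) -> str:
    """Build a per-kernel image reference (assumes tag == kernel release)."""
    return f"{base}:{kernel_release}"


def missing_driver_tags(available_tags: Iterable[str],
                        target_kernels: Iterable[str]) -> List[str]:
    """Return the target kernel releases that have no matching driver image tag."""
    available = set(available_tags)
    return [k for k in target_kernels if k not in available]


if __name__ == "__main__":
    base = ("registry.ocp-edge-cluster-assisted-0.qe.lab.redhat.com:5000/"
            "sro-106/acm-ice-driver-container")
    # Hypothetical registry contents: only the pre-upgrade kernel has an image.
    tags = ["4.18.0-305.25.1.el8_4.x86_64"]
    for kernel in missing_driver_tags(tags, ["4.18.0-305.28.1.el8_4.x86_64"]):
        print("no driver image:", driver_image_ref(base, kernel))
```

An empty result would mean every target kernel has an image to pull; a non-empty one lists the kernels for which the load loop above would never succeed.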