Bug 2018542
| Summary: | Kernel upgrade does not reconcile DaemonSet | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Pablo Acevedo <pacevedo> |
| Component: | Special Resource Operator | Assignee: | Pablo Acevedo <pacevedo> |
| Status: | CLOSED ERRATA | QA Contact: | liqcui |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.10 | CC: | aos-bugs, bthurber |
| Target Milestone: | --- | ||
| Target Release: | 4.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:23:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Pablo Acevedo
2021-10-29 16:06:12 UTC
Verified Results:
######################################################
When worker nodes run different kernel versions, the pod on the new worker node fails to start with ImagePullBackOff. The image tag includes the kernel version, and the image is tagged with the kernel version of the node on which the build config job ran.
######################################################
[ocpadmin@ec2-18-217-45-133 k]$ oc describe pod simple-kmod-driver-container-396f682197e94c38-rjn95 -n simple-kmod
Name: simple-kmod-driver-container-396f682197e94c38-rjn95
Namespace: simple-kmod
Priority: 0
Node: ip-10-0-54-185.us-east-2.compute.internal/10.0.54.185
Start Time: Wed, 29 Dec 2021 06:23:13 +0000
Labels: app=simple-kmod-driver-container-396f682197e94c38
controller-revision-hash=df8d695dc
pod-template-generation=1
specialresource.openshift.io/owned=true
Annotations: k8s.ovn.org/pod-networks:
.....................................................
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37m default-scheduler Successfully assigned simple-kmod/simple-kmod-driver-container-396f682197e94c38-rjn95 to ip-10-0-54-185.us-east-2.compute.internal
Normal AddedInterface 37m multus Add eth0 [10.130.2.10/23] from ovn-kubernetes
Warning Failed 35m (x6 over 37m) kubelet Error: ImagePullBackOff
Normal Pulling 35m (x4 over 37m) kubelet Pulling image "image-registry.openshift-image-registry.svc:5000/simple-kmod/simple-kmod-driver-container:v4.18.0-305.el8.x86_64"
Warning Failed 35m (x4 over 37m) kubelet Failed to pull image "image-registry.openshift-image-registry.svc:5000/simple-kmod/simple-kmod-driver-container:v4.18.0-305.el8.x86_64": rpc error: code = Unknown desc = reading manifest v4.18.0-305.el8.x86_64 in image-registry.openshift-image-registry.svc:5000/simple-kmod/simple-kmod-driver-container: manifest unknown: manifest unknown
Warning Failed 35m (x4 over 37m) kubelet Error: ErrImagePull
Normal BackOff 2m9s (x153 over 37m) kubelet Back-off pulling image "image-registry.openshift-image-registry.svc:5000/simple-kmod/simple-kmod-driver-container:v4.18.0-305.el8.x86_64"
[ocpadmin@ec2-18-217-45-133 k]$ oc get pods -n simple-kmod
NAME READY STATUS RESTARTS AGE
simple-kmod-driver-build-396f682197e94c38-1-build 0/1 Error 0 36m
simple-kmod-driver-build-7a2fc1535ea1b11f-1-build 0/1 Completed 0 36m
simple-kmod-driver-container-396f682197e94c38-rjn95 0/1 ImagePullBackOff 0 38m
simple-kmod-driver-container-7a2fc1535ea1b11f-ffd87 1/1 Running 0 37m
simple-kmod-driver-container-7a2fc1535ea1b11f-gxsc7 1/1 Running 0 37m
simple-kmod-driver-container-7a2fc1535ea1b11f-qd82z 1/1 Running 0 37m
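
The pull errors above suggest that the image tag is the node's kernel version prefixed with "v". The following is a minimal sketch of that relationship, not the operator's actual logic: `expected_tag` and `tag_is_built` are hypothetical helpers, and the list of built tags is an illustrative value rather than data read from the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the image tag a driver-container pod would pull
# for a given kernel version, following the "v<kernel>" pattern seen in the
# pull errors above.
expected_tag() {
  printf 'v%s\n' "$1"
}

# Hypothetical helper: succeed only if the expected tag for kernel $1
# appears in the whitespace-separated tag list passed as $2.
tag_is_built() {
  local want tag
  want="$(expected_tag "$1")"
  for tag in $2; do
    [ "$tag" = "$want" ] && return 0
  done
  return 1
}

# Illustrative inventory (assumption): only one image tag exists.
built_tags="v4.18.0-348.7.1.el8_5.x86_64"
for kernel in 4.18.0-305.el8.x86_64 4.18.0-348.7.1.el8_5.x86_64; do
  if tag_is_built "$kernel" "$built_tags"; then
    echo "$kernel: image tag $(expected_tag "$kernel") available"
  else
    echo "$kernel: image tag $(expected_tag "$kernel") missing, pull would fail"
  fi
done
```

A node whose kernel has no matching tag would end up in the ImagePullBackOff state shown above until an image for that kernel is built.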
######################################################
No simple-kmod pod is scheduled to the node that has a higher kernel version:
######################################################
[ocpadmin@ec2-18-217-45-133 k]$ oc get pods -n simple-kmod
NAME READY STATUS RESTARTS AGE
simple-kmod-driver-build-396f682197e94c38-1-build 0/1 Error 0 116m
simple-kmod-driver-build-7a2fc1535ea1b11f-1-build 0/1 Completed 0 115m
simple-kmod-driver-container-396f682197e94c38-ngwnp 1/1 Running 0 67m
simple-kmod-driver-container-7a2fc1535ea1b11f-ffd87 1/1 Running 0 117m
simple-kmod-driver-container-7a2fc1535ea1b11f-gxsc7 1/1 Running 0 117m
simple-kmod-driver-container-7a2fc1535ea1b11f-qd82z 1/1 Running 0 117m
[ocpadmin@ec2-18-217-45-133 k]$ oc get pods -n simple-kmod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
simple-kmod-driver-build-396f682197e94c38-1-build 0/1 Error 0 116m 10.130.2.11 ip-10-0-54-185.us-east-2.compute.internal <none> <none>
simple-kmod-driver-build-7a2fc1535ea1b11f-1-build 0/1 Completed 0 116m 10.129.2.148 ip-10-0-59-7.us-east-2.compute.internal <none> <none>
simple-kmod-driver-container-396f682197e94c38-ngwnp 1/1 Running 0 67m 10.130.2.23 ip-10-0-54-185.us-east-2.compute.internal <none> <none>
simple-kmod-driver-container-7a2fc1535ea1b11f-ffd87 1/1 Running 0 117m 10.128.2.32 ip-10-0-61-240.us-east-2.compute.internal <none> <none>
simple-kmod-driver-container-7a2fc1535ea1b11f-gxsc7 1/1 Running 0 117m 10.131.0.25 ip-10-0-68-29.us-east-2.compute.internal <none> <none>
simple-kmod-driver-container-7a2fc1535ea1b11f-qd82z 1/1 Running 0 117m 10.129.2.147 ip-10-0-59-7.us-east-2.compute.internal <none> <none>
[ocpadmin@ec2-18-217-45-133 k]$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-48-229.us-east-2.compute.internal Ready master 7h3m v1.22.3+e790d7f
ip-10-0-49-124.us-east-2.compute.internal Ready master 7h3m v1.22.3+e790d7f
ip-10-0-54-185.us-east-2.compute.internal Ready worker 4h19m v1.22.3+ffbb954
ip-10-0-59-7.us-east-2.compute.internal Ready worker 6h45m v1.22.3+e790d7f
ip-10-0-60-73.us-east-2.compute.internal Ready worker 2m26s v1.22.3+ffbb954
ip-10-0-61-240.us-east-2.compute.internal Ready worker 6h46m v1.22.3+e790d7f
ip-10-0-68-29.us-east-2.compute.internal Ready worker 6h46m v1.22.3+e790d7f
ip-10-0-69-143.us-east-2.compute.internal Ready master 7h3m v1.22.3+e790d7f
[ocpadmin@ec2-18-217-45-133 k]$ oc debug node/ip-10-0-60-73.us-east-2.compute.internal
Starting pod/ip-10-0-60-73us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.60.73
If you don't see a command prompt, try pressing enter.
sh-4.4#
sh-4.4# chroot /host
sh-4.4# uname -a
Linux ip-10-0-60-73.us-east-2.compute.internal 4.18.0-348.7.1.el8_5.x86_64 #1 SMP Wed Dec 8 21:51:17 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4#
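
Checking kernels one node at a time with `oc debug`, as above, can be slow on larger clusters. The sketch below summarizes how many nodes run each kernel version. `kernel_counts` is a hypothetical helper, the node names in the sample input are placeholders, and the `oc` invocation in the comment is only a suggestion for producing the input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "NODE KERNEL" lines on stdin and print one
# "KERNEL COUNT" line per distinct kernel version, sorted by kernel.
kernel_counts() {
  awk 'NF >= 2 { n[$2]++ } END { for (k in n) print k, n[k] }' | sort
}

# On a live cluster the input could come from (not run here):
#   oc get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
# The sample below uses placeholder node names.
kernel_counts <<'EOF'
node-a 4.18.0-305.el8.x86_64
node-b 4.18.0-305.el8.x86_64
node-c 4.18.0-348.7.1.el8_5.x86_64
EOF
```

More than one line of output means the cluster has mixed kernels, which is the situation this bug exercises.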
######################################################
After upgrading one worker node, the pod on the upgraded node terminates automatically and no new pod is created.
######################################################
[ec2-user@ip-10-0-60-73 ~]$ uname -a
Linux ip-10-0-60-73.us-east-2.compute.internal 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@ip-10-0-60-73 ec2-user]# yum -y update kernel
Updating Subscription Management repositories.
Red Hat Update Infrastructure 3 Client Configuration Server 8 9.6 kB/s | 2.1 kB 00:00
Red Hat Enterprise Linux 8 for x86_64 - AppStream from RHUI (RPMs) 14 kB/s | 2.8 kB 00:00
Red Hat Enterprise Linux 8 for x86_64 - BaseOS from RHUI (RPMs) 13 kB/s | 2.4 kB 00:00
Dependencies resolved.
====================================================================================================================================================================
Package Architecture Version Repository Size
====================================================================================================================================================================
Installing:
kernel x86_64 4.18.0-348.7.1.el8_5 rhel-8-baseos-rhui-rpms 7.0 M
Installing dependencies:
kernel-core x86_64 4.18.0-348.7.1.el8_5 rhel-8-baseos-rhui-rpms 38 M
kernel-modules x86_64 4.18.0-348.7.1.el8_5 rhel-8-baseos-rhui-rpms 30 M
Transaction Summary
====================================================================================================================================================================
Install 3 Packages
Total size: 74 M
Installed size: 90 M
Downloading Packages:
[SKIPPED] kernel-core-4.18.0-348.7.1.el8_5.x86_64.rpm: Already downloaded
[SKIPPED] kernel-4.18.0-348.7.1.el8_5.x86_64.rpm: Already downloaded
[SKIPPED] kernel-modules-4.18.0-348.7.1.el8_5.x86_64.rpm: Already downloaded
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : kernel-core-4.18.0-348.7.1.el8_5.x86_64 1/3
Running scriptlet: kernel-core-4.18.0-348.7.1.el8_5.x86_64 1/3
Installing : kernel-modules-4.18.0-348.7.1.el8_5.x86_64 2/3
Running scriptlet: kernel-modules-4.18.0-348.7.1.el8_5.x86_64 2/3
Installing : kernel-4.18.0-348.7.1.el8_5.x86_64 3/3
Running scriptlet: kernel-core-4.18.0-348.7.1.el8_5.x86_64 3/3
Running scriptlet: kernel-4.18.0-348.7.1.el8_5.x86_64 3/3
Verifying : kernel-core-4.18.0-348.7.1.el8_5.x86_64 1/3
Verifying : kernel-4.18.0-348.7.1.el8_5.x86_64 2/3
Verifying : kernel-modules-4.18.0-348.7.1.el8_5.x86_64 3/3
Installed products updated.
Installed:
kernel-4.18.0-348.7.1.el8_5.x86_64 kernel-core-4.18.0-348.7.1.el8_5.x86_64 kernel-modules-4.18.0-348.7.1.el8_5.x86_64
Complete!
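
Installing the kernel packages does not change the running kernel until the node boots into the new one. As a rough sketch, the running version can be compared with the newest installed version to tell whether the upgrade has taken effect. `newest_kernel` and `reboot_pending` are hypothetical helpers, the version strings are taken from the transcript above, and GNU `sort -V` is assumed to be available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the highest version read from stdin
# (one version per line). Relies on GNU sort's -V version ordering.
newest_kernel() {
  sort -V | tail -n 1
}

# Hypothetical helper: succeed if the running kernel $1 differs from the
# newest kernel in the newline-separated list $2, i.e. a reboot is pending.
reboot_pending() {
  local newest
  newest="$(printf '%s\n' "$2" | newest_kernel)"
  [ "$1" != "$newest" ]
}

# On a node these values could come from `uname -r` and
# `rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n'` (not run here).
running="4.18.0-305.el8.x86_64"
installed="4.18.0-305.el8.x86_64
4.18.0-348.7.1.el8_5.x86_64"

if reboot_pending "$running" "$installed"; then
  echo "running $running, newest installed $(printf '%s\n' "$installed" | newest_kernel): reboot pending"
fi
```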
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056