Bug 1952187
| Summary: | Pods stuck in ImagePullBackOff with errors like rpc error: code = Unknown desc = Error committing the finished image: image with ID "SomeLongID" already exists, but uses a different top layer: that ID | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Kedar Kulkarni <kkulkarn> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Node sub component: | CRI-O | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | aos-bugs, nagrawal, nalin |
| Version: | 4.8 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 23:02:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description

Kedar Kulkarni 2021-04-21 17:36:41 UTC

Created attachment 1774123 [details]
sosreport-node2

The image being pulled (quay.io/cloud-bulldozer/backpack:latest) has a schema1 manifest with a couple of "throwaway" entries in its history. If I'm reading https://github.com/containers/image/pull/1205 correctly, the version of github.com/containers/image that CRI-O commit bc1ef35 uses changed how it applied layer blobs when saving images, in a way that affected the IDs it assigned to those layers. I think bumping CRI-O's containers/image to v5.11.1 will fix it.

4.8 PR attached. Fix merged!

Verified on 4.8.0-0.nightly-2021-04-28-090319 (full verification output follows the manifest-check sketch below):
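As an aside on the analysis above, a minimal sketch of how the schema1/"throwaway" claim could be checked, assuming skopeo and jq are available on a workstation with pull access to quay.io (this check is not part of the original report):

$ skopeo inspect --raw docker://quay.io/cloud-bulldozer/backpack:latest > manifest.json   # fetch the manifest as served by the registry
$ jq -r .schemaVersion manifest.json   # 1 indicates a Docker schema1 manifest
# Each schema1 history entry embeds a v1Compatibility JSON string; "throwaway": true marks an empty layer,
# the kind of entry whose assigned layer ID was affected by the containers/image change.
$ jq -r '.history[].v1Compatibility | fromjson | .throwaway // false' manifest.json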
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.8.0-0.nightly-2021-04-28-090319 True False 4h9m Cluster version is 4.8.0-0.nightly-2021-04-28-090319
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-142-4.us-east-2.compute.internal Ready worker 4h54m v1.21.0-rc.0+cd6d8b0
ip-10-0-150-69.us-east-2.compute.internal Ready master 5h3m v1.21.0-rc.0+cd6d8b0
ip-10-0-167-40.us-east-2.compute.internal Ready master 5h2m v1.21.0-rc.0+cd6d8b0
ip-10-0-186-203.us-east-2.compute.internal Ready worker 4h54m v1.21.0-rc.0+cd6d8b0
ip-10-0-216-205.us-east-2.compute.internal Ready master 5h5m v1.21.0-rc.0+cd6d8b0
ip-10-0-223-10.us-east-2.compute.internal Ready worker 4h54m v1.21.0-rc.0+cd6d8b0
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
postgresql-1-deploy 0/1 Completed 0 44m 10.129.2.112 ip-10-0-223-10.us-east-2.compute.internal <none> <none>
postgresql-1-nwjnm 1/1 Running 0 44m 10.129.2.113 ip-10-0-223-10.us-east-2.compute.internal <none> <none>
rails-postgresql-example-1-82xxs 1/1 Running 0 42m 10.129.2.117 ip-10-0-223-10.us-east-2.compute.internal <none> <none>
$ oc get pods
NAME READY STATUS RESTARTS AGE
postgresql-1-nwjnm 0/1 Terminating 0 47m
rails-postgresql-example-1-82xxs 1/1 Terminating 0 45m
rails-postgresql-example-1-build 0/1 Completed 0 47m
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rails-postgresql-example-1-build 0/1 Completed 0 57m 10.129.2.111 ip-10-0-223-10.us-east-2.compute.internal <none> <none>
rails-postgresql-example-55cb7fc669-z9526 1/1 Running 0 5m34s 10.129.2.139 ip-10-0-223-10.us-east-2.compute.internal <none> <none>
$ oc new-app --docker-image=quay.io/cloud-bulldozer/backpack:latest
...
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
backpack-56c78c6-fwb4r 0/1 Completed 2 52s 10.129.2.143 ip-10-0-223-10.us-east-2.compute.internal <none> <none>
$ oc describe pod backpack-56c78c6-fwb4r
Name: backpack-56c78c6-fwb4r
Namespace: app1
Priority: 0
Node: ip-10-0-223-10.us-east-2.compute.internal/10.0.223.10
Start Time: Wed, 28 Apr 2021 20:34:36 +0530
Labels: deployment=backpack
pod-template-hash=56c78c6
Annotations: k8s.v1.cni.cncf.io/network-status:
[{
"name": "",
"interface": "eth0",
"ips": [
"10.129.2.143"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "",
"interface": "eth0",
"ips": [
"10.129.2.143"
],
"default": true,
"dns": {}
}]
openshift.io/generated-by: OpenShiftNewApp
openshift.io/scc: restricted
Status: Running
IP: 10.129.2.143
IPs:
IP: 10.129.2.143
Controlled By: ReplicaSet/backpack-56c78c6
Containers:
backpack:
Container ID: cri-o://b7153f37469f3b1d3b1266a0afbbd8f82de6acb83bcb6a8f341b65e361939f6b
Image: quay.io/cloud-bulldozer/backpack@sha256:fe14b444967419b1cef078f5f2f4ff3b4e6238f1e2ab775d1e0d4e9c7a9685f7
Image ID: quay.io/cloud-bulldozer/backpack@sha256:fe14b444967419b1cef078f5f2f4ff3b4e6238f1e2ab775d1e0d4e9c7a9685f7
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 28 Apr 2021 20:35:19 +0530
Finished: Wed, 28 Apr 2021 20:35:19 +0530
Ready: False
Restart Count: 2
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bgnrs (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-bgnrs:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-bgnrs
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 65s default-scheduler Successfully assigned app1/backpack-56c78c6-fwb4r to ip-10-0-223-10.us-east-2.compute.internal
Normal AddedInterface 64s multus Add eth0 [10.129.2.143/23]
Normal Pulling 64s kubelet Pulling image "quay.io/cloud-bulldozer/backpack@sha256:fe14b444967419b1cef078f5f2f4ff3b4e6238f1e2ab775d1e0d4e9c7a9685f7"
Normal Pulled 38s kubelet Successfully pulled image "quay.io/cloud-bulldozer/backpack@sha256:fe14b444967419b1cef078f5f2f4ff3b4e6238f1e2ab775d1e0d4e9c7a9685f7" in 26.356752673s
Normal Created 23s (x3 over 37s) kubelet Created container backpack
Normal Started 23s (x3 over 37s) kubelet Started container backpack
Normal Pulled 23s (x2 over 37s) kubelet Container image "quay.io/cloud-bulldozer/backpack@sha256:fe14b444967419b1cef078f5f2f4ff3b4e6238f1e2ab775d1e0d4e9c7a9685f7" already present on machine
Warning BackOff 11s (x4 over 36s) kubelet Back-off restarting failed container
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
backpack-56c78c6-nn6c6 0/1 Completed 3 47s 10.129.2.145 ip-10-0-223-10.us-east-2.compute.internal <none> <none>
$ oc describe pod backpack-56c78c6-nn6c6
Name: backpack-56c78c6-nn6c6
Namespace: app1
Priority: 0
Node: ip-10-0-223-10.us-east-2.compute.internal/10.0.223.10
Start Time: Wed, 28 Apr 2021 20:37:46 +0530
Labels: deployment=backpack
pod-template-hash=56c78c6
Annotations: k8s.v1.cni.cncf.io/network-status:
[{
"name": "",
"interface": "eth0",
"ips": [
"10.129.2.145"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "",
"interface": "eth0",
"ips": [
"10.129.2.145"
],
"default": true,
"dns": {}
}]
openshift.io/generated-by: OpenShiftNewApp
openshift.io/scc: restricted
Status: Running
IP: 10.129.2.145
IPs:
IP: 10.129.2.145
Controlled By: ReplicaSet/backpack-56c78c6
Containers:
backpack:
Container ID: cri-o://a9fbdedb6887aef73213cbbbdfcc8de17e815fb0de62e9bd1d2e7aa0519197ce
Image: quay.io/cloud-bulldozer/backpack@sha256:fe14b444967419b1cef078f5f2f4ff3b4e6238f1e2ab775d1e0d4e9c7a9685f7
Image ID: quay.io/cloud-bulldozer/backpack@sha256:fe14b444967419b1cef078f5f2f4ff3b4e6238f1e2ab775d1e0d4e9c7a9685f7
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 28 Apr 2021 20:38:32 +0530
Finished: Wed, 28 Apr 2021 20:38:32 +0530
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 28 Apr 2021 20:38:03 +0530
Finished: Wed, 28 Apr 2021 20:38:03 +0530
Ready: False
Restart Count: 3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bgnrs (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-bgnrs:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-bgnrs
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50s default-scheduler Successfully assigned app1/backpack-56c78c6-nn6c6 to ip-10-0-223-10.us-east-2.compute.internal
Normal AddedInterface 49s multus Add eth0 [10.129.2.145/23]
Normal Pulled 5s (x4 over 49s) kubelet Container image "quay.io/cloud-bulldozer/backpack@sha256:fe14b444967419b1cef078f5f2f4ff3b4e6238f1e2ab775d1e0d4e9c7a9685f7" already present on machine
Normal Created 5s (x4 over 49s) kubelet Created container backpack
Normal Started 5s (x4 over 49s) kubelet Started container backpack
Warning BackOff 5s (x5 over 47s) kubelet Back-off restarting failed container
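As a supplementary check that is not part of the original verification, the CRI-O build running on the affected node could be confirmed from a debug shell (a sketch assuming cluster-admin access and the standard oc debug node workflow):

$ oc debug node/ip-10-0-223-10.us-east-2.compute.internal
sh-4.4# chroot /host
sh-4.4# crio version   # prints the CRI-O version and git commit running on the node
sh-4.4# rpm -q cri-o   # on the fixed nightly this build should include the containers/image v5.11.1 bump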
I'm seeing this issue as well in 4.6/4.7. Can this fix be backported to those versions as well?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438