Bug 2102510
| Summary: | [OCP 4.11] CRI-O failing with: error reserving ctr name when an image-registry pod failed with CreateContainerError | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | MinLi <minmli> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Node sub component: | CRI-O | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | qiwan, xiuwang |
| Version: | 4.11 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-09-13 13:31:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
From the pod's description, the container ID is 27ae76a217d9c:
[cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d]
Yet according to the crictl command, the container ID is 19cdcd51353a2:
[CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
19cdcd51353a2 321acf81cbf85b4d748fceee879b0c651fefeeb923dca3649d9ab6d4c68fc68a 2 hours ago Running registry 2 ddb4eaa79175c image-registry-67b7b8989c-bxr9r]
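The two IDs differ: the pod status reports 27ae76a217d9c, while crictl lists 19cdcd51353a2 as the running registry container (attempt 2).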
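The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `same_container` is hypothetical, and it assumes that the short ID printed by crictl is a prefix of the full ID that the pod status reports after the `cri-o://` scheme.

```python
def same_container(status_container_id: str, crictl_id: str) -> bool:
    """Return True if a (possibly truncated) crictl ID refers to the
    container named by a pod-status containerID such as 'cri-o://<hex>'."""
    # Drop the runtime scheme prefix ("cri-o://") if present.
    full_id = status_container_id.split("://", 1)[-1]
    # Assumption: crictl's short ID is a prefix of the full container ID.
    return bool(crictl_id) and full_id.startswith(crictl_id)


# IDs copied from this report.
status_id = "cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d"
print(same_container(status_id, "19cdcd51353a2"))  # ID listed by crictl
print(same_container(status_id, "27ae76a217d9c"))  # short form of the status ID
```

With the values from this report, the first call returns False and the second returns True, which matches the mismatch described above.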
% oc describe pod image-registry-67b7b8989c-bxr9r -n openshift-image-registry
Name: image-registry-67b7b8989c-bxr9r
Namespace: openshift-image-registry
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: xiyuan29-1-cdwbk-worker-us-east-1a-x62rr/10.0.152.106
Start Time: Wed, 29 Jun 2022 22:40:23 +0800
Labels: docker-registry=default
pod-template-hash=67b7b8989c
Annotations: imageregistry.operator.openshift.io/dependencies-checksum: sha256:68a7d1da976ade4883af2f220ff9b0b521d308ed9ccbf5b43bd4d7fc4fafa1e5
...
Status: Running
IP: 10.131.0.6
IPs:
IP: 10.131.0.6
Controlled By: ReplicaSet/image-registry-67b7b8989c
Containers:
registry:
Container ID: cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
Port: 5000/TCP
Host Port: 0/TCP
Command:
/bin/sh
-c
mkdir -p /etc/pki/ca-trust/extracted/edk2 /etc/pki/ca-trust/extracted/java /etc/pki/ca-trust/extracted/openssl /etc/pki/ca-trust/extracted/pem && update-ca-trust extract && exec /usr/bin/dockerregistry
State: Waiting
Reason: CreateContainerError
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Wed, 29 Jun 2022 22:40:33 +0800
Finished: Wed, 29 Jun 2022 22:40:35 +0800
Ready: False
Restart Count: 1
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get https://:5000/healthz delay=5s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get https://:5000/healthz delay=15s timeout=5s period=10s #success=1 #failure=3
Environment:
REGISTRY_STORAGE_OSS_ENDPOINT: xiyuan29-1-cdwbk-image-registry-us-east-1-xiodehipentpoddktbsb.oss-us-east-1-internal.aliyuncs.com
REGISTRY_STORAGE: oss
REGISTRY_STORAGE_OSS_BUCKET: xiyuan29-1-cdwbk-image-registry-us-east-1-xiodehipentpoddktbsb
REGISTRY_STORAGE_OSS_REGION: oss-us-east-1
REGISTRY_STORAGE_OSS_INTERNAL: true
REGISTRY_STORAGE_OSS_ENCRYPT: true
REGISTRY_STORAGE_OSS_CREDENTIALSCONFIGPATH: /var/run/secrets/cloud/credentials
REGISTRY_STORAGE_OSS_ACCESSKEYID: LTAI5tAa9KyRURVfKAc8qVQA
REGISTRY_STORAGE_OSS_ACCESSKEYSECRET: 7bZW2mcMMTcZlNSictBNShElolRNiJ
REGISTRY_HTTP_ADDR: :5000
REGISTRY_HTTP_NET: tcp
REGISTRY_HTTP_SECRET: bc8fc21e845460d5643e47dbf01b8f62dec82933dc248e8c21d5f8a5a05108f4ffda9b6858e60ea83d2da5859bd3267f52b7b925ee36764daa9004959e4d45cc
REGISTRY_LOG_LEVEL: info
REGISTRY_OPENSHIFT_QUOTA_ENABLED: true
REGISTRY_STORAGE_CACHE_BLOBDESCRIPTOR: inmemory
REGISTRY_STORAGE_DELETE_ENABLED: true
REGISTRY_HEALTH_STORAGEDRIVER_ENABLED: true
REGISTRY_HEALTH_STORAGEDRIVER_INTERVAL: 10s
REGISTRY_HEALTH_STORAGEDRIVER_THRESHOLD: 1
REGISTRY_OPENSHIFT_METRICS_ENABLED: true
REGISTRY_OPENSHIFT_SERVER_ADDR: image-registry.openshift-image-registry.svc:5000
REGISTRY_HTTP_TLS_CERTIFICATE: /etc/secrets/tls.crt
REGISTRY_HTTP_TLS_KEY: /etc/secrets/tls.key
Mounts:
/etc/pki/ca-trust/extracted from ca-trust-extracted (rw)
/etc/pki/ca-trust/source/anchors from registry-certificates (rw)
/etc/secrets from registry-tls (rw)
/usr/share/pki/ca-trust-source from trusted-ca (rw)
/var/lib/kubelet/ from installation-pull-secrets (rw)
/var/run/secrets/cloud from image-registry-private-configuration (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xsrq9 (ro)
/var/run/secrets/openshift/serviceaccount from bound-sa-token (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
image-registry-private-configuration:
Type: Secret (a volume populated by a Secret)
SecretName: image-registry-private-configuration
Optional: false
registry-tls:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: image-registry-tls
SecretOptionalName: <nil>
ca-trust-extracted:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
registry-certificates:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: image-registry-certificates
Optional: false
trusted-ca:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: trusted-ca
Optional: true
installation-pull-secrets:
Type: Secret (a volume populated by a Secret)
SecretName: installation-pull-secrets
Optional: true
bound-sa-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3600
kube-api-access-xsrq9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 125m default-scheduler 0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 125m default-scheduler 0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 124m default-scheduler 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 121m default-scheduler 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
Normal Scheduled 120m default-scheduler Successfully assigned openshift-image-registry/image-registry-67b7b8989c-bxr9r to xiyuan29-1-cdwbk-worker-us-east-1a-x62rr by xiyuan29-1-cdwbk-master-2
Warning BackOff 110m (x47 over <invalid>) kubelet Back-off restarting failed container
Normal Pulled 31s (x509 over <invalid>) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282" already present on machine
Normal AddedInterface <invalid> multus Add eth0 [10.131.0.6/23] from openshift-sdn
Normal Created <invalid> (x2 over <invalid>) kubelet Created container registry
Normal Started <invalid> (x2 over <invalid>) kubelet Started container registry
$ oc get pod/image-registry-67b7b8989c-bxr9r -n openshift-image-registry -o yaml | yq -y '.status.containerStatuses'
- containerID: cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
lastState:
terminated:
containerID: cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
exitCode: 2
finishedAt: '2022-06-29T14:40:35Z'
reason: Error
startedAt: '2022-06-29T14:40:33Z'
name: registry
ready: false
restartCount: 1
started: false
state:
waiting:
message: 'error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2
for id c220a9c99abb876d637f78a814daa68556e5aaa7c908204a9e5d299066cdbf99: name
is reserved'
reason: CreateContainerError
The must-gather can be found here: https://drive.google.com/file/d/1wu8GUVMGwaGOhE2g0hjnwsdqt7wcTLMh/view
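The waiting message in the container status packs several identifiers into the reserved name. As a rough aid for reading it, the sketch below splits the message into its parts. It assumes the name has the layout `k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>`, which is inferred from the message in this report rather than taken from CRI-O documentation; `ReservedName` and `parse_reserved_name` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class ReservedName(NamedTuple):
    container: str
    pod: str
    namespace: str
    pod_uid: str
    attempt: int
    container_id: str


# Assumed layout: k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>.
# \s+ tolerates the line wraps that YAML output puts inside the message.
_PATTERN = re.compile(
    r"error reserving ctr name\s+"
    r"k8s_(?P<container>[^_\s]+)_(?P<pod>[^_\s]+)_(?P<namespace>[^_\s]+)_"
    r"(?P<uid>[^_\s]+)_(?P<attempt>\d+)\s+"
    r"for id\s+(?P<id>[0-9a-f]+):\s+name\s+is\s+reserved"
)


def parse_reserved_name(message: str) -> Optional[ReservedName]:
    """Split a 'name is reserved' message into its fields, or return None."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return ReservedName(
        container=match["container"],
        pod=match["pod"],
        namespace=match["namespace"],
        pod_uid=match["uid"],
        attempt=int(match["attempt"]),
        container_id=match["id"],
    )


# Message copied from the container status in this report.
message = (
    "error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_"
    "openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2\n"
    "for id c220a9c99abb876d637f78a814daa68556e5aaa7c908204a9e5d299066cdbf99: name\n"
    "is reserved"
)
print(parse_reserved_name(message))
```

For the message in this report, the parsed attempt is 2, the same attempt number that crictl shows for the running container 19cdcd51353a2.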
Description of problem:
An image-registry pod got stuck in CreateContainerError, and the kubelet and CRI-O logs show:
err="failedto \"StartContainer\" for \"registry\" with CreateContainerError: \"error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_263e89b-1e84-42d7-b443-49afd3908603_2 for id 7f1a96c4f0fa18d13b9a8abfe8a500a0ebd9a02598414177cdda8736f95641f4: name is reserved\"" pod="openshift-image-registry/imag-registry-67b7b8989c-bxr9r" podUID=2763e89b-1e84-42d7-b443-49afd3908603

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-28-160049

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
An image-registry pod stuck in CreateContainerError caused one node drain to fail.

% oc get pod -o wide -n openshift-image-registry
image-registry-67b7b8989c-bxr9r 0/1 CreateContainerError 1 (<invalid> ago) 132m 10.131.0.6 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr <none>

The kubelet and CRI-O logs show (some lines were truncated when they were copied):

Jun 29 08:27:17 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr hyperkube[1558]: E0629 08:27:17.151349 1558 pod_workers.go:951] "Error syncing pod, skipping" err="failedto \"StartContainer\" for \"registry\" with CreateContainerError: \"error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_263e89b-1e84-42d7-b443-49afd3908603_2 for id 7f1a96c4f0fa18d13b9a8abfe8a500a0ebd9a02598414177cdda8736f95641f4: name is reserved\"" pod="openshift-image-registry/imag-registry-67b7b8989c-bxr9r" podUID=2763e89b-1e84-42d7-b443-49afd3908603
Jun 29 08:27:28 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr hyperkube[1558]: I0629 08:27:28.145030 1558 scope.go:110] "RemoveContainer" containerID="27ae76a217d9cd9ce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d"
me service failed" err="rpc error: code = Unknown desc = error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e8-42d7-b443-49afd3908603_2 for id 077001d265a14c47e24c5ca416acaab113970b5e0dc1214a2586436a14379c9f: name is reserved" podSandboxID="ddb4eaa79175cfe8ec59d27084ccbe143b7e6b7da2a5556345225255a171e79"
k-worker-us-east-1a-x62rr hyperkube[1558]: E0629 08:27:28.150112 1558 kuberuntime_manager.go:905] container &Container{Name:regist
Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.150254485Z" level=info msg="Image status: &ImageStatusResponse{Image:&mage{Id:321acf81cbf85b4d748fceee879b0c651fefeeb923dca3649d9ab6d4c68fc68a,RepoTags:[],RepoDigests:[quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7ccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282],Size_:395917103,Uid:&Int64Value{Value:1001,},Username:,Spec:nil,},Info:map[string]string{},}" id=f845b6e5-f2f4-404-9f11-8f92534328f0 name=/runtime.v1.ImageService/ImageStatus
Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.151525445Z" level=info msg="Creating container: openshift-image-registy/image-registry-67b7b8989c-bxr9r/registry" id=74e15498-4185-466c-b289-416209463455 name=/runtime.v1.RuntimeService/CreateContainer
Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.151588903Z" level=warning msg="error reserving ctr name k8s_registry_iage-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id cdb11504a09fe1948e8d578fdde5262849bea4618604092a33675858809df4a: name is reserved"

Expected results:
The pod's container is created successfully.

Additional info: