Description of problem:
An image-registry pod got stuck in CreateContainerError, and the kubelet log shows:
err="failed to \"StartContainer\" for \"registry\" with CreateContainerError: \"error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id 7f1a96c4f0fa18d13b9a8abfe8a500a0ebd9a02598414177cdda8736f95641f4: name is reserved\"" pod="openshift-image-registry/image-registry-67b7b8989c-bxr9r" podUID=2763e89b-1e84-42d7-b443-49afd3908603

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-28-160049

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The image-registry pod's CreateContainerError caused one node drain to fail.

% oc get pod -o wide -n openshift-image-registry
image-registry-67b7b8989c-bxr9r 0/1 CreateContainerError 1 (<invalid> ago) 132m 10.131.0.6 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr <none>

The kubelet and crio logs show (some lines were truncated when they were captured):
Jun 29 08:27:17 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr hyperkube[1558]: E0629 08:27:17.151349 1558 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"registry\" with CreateContainerError: \"error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id 7f1a96c4f0fa18d13b9a8abfe8a500a0ebd9a02598414177cdda8736f95641f4: name is reserved\"" pod="openshift-image-registry/image-registry-67b7b8989c-bxr9r" podUID=2763e89b-1e84-42d7-b443-49afd3908603
Jun 29 08:27:28 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr hyperkube[1558]: I0629 08:27:28.145030 1558 scope.go:110] "RemoveContainer" containerID="27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d"
me service failed" err="rpc error: code = Unknown desc = error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id 077001d265a14c47e24c5ca416acaab113970b5e0dc1214a2586436a14379c9f: name is reserved" podSandboxID="ddb4eaa79175cfe8ec59d27084ccbe143b7e6b7da2a5556345225255a171e79"
k-worker-us-east-1a-x62rr hyperkube[1558]: E0629 08:27:28.150112 1558 kuberuntime_manager.go:905] container &Container{Name:regist
Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.150254485Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:321acf81cbf85b4d748fceee879b0c651fefeeb923dca3649d9ab6d4c68fc68a,RepoTags:[],RepoDigests:[quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282],Size_:395917103,Uid:&Int64Value{Value:1001,},Username:,Spec:nil,},Info:map[string]string{},}" id=f845b6e5-f2f4-404-9f11-8f92534328f0 name=/runtime.v1.ImageService/ImageStatus
Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.151525445Z" level=info msg="Creating container: openshift-image-registry/image-registry-67b7b8989c-bxr9r/registry" id=74e15498-4185-466c-b289-416209463455 name=/runtime.v1.RuntimeService/CreateContainer
Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.151588903Z" level=warning msg="error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id cdb11504a09fe1948e8d578fdde5262849bea4618604092a33675858809df4a: name is reserved"

Expected results:
The pod's container is created successfully.

Additional info:
According to the pod's description, the container ID is 27ae76a217d9c (cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d), yet according to crictl the container ID is 19cdcd51353a2:

CONTAINER      IMAGE                                                             CREATED      STATE    NAME      ATTEMPT  POD ID         POD
19cdcd51353a2  321acf81cbf85b4d748fceee879b0c651fefeeb923dca3649d9ab6d4c68fc68a  2 hours ago  Running  registry  2        ddb4eaa79175c  image-registry-67b7b8989c-bxr9r

They are different.

% oc describe pod image-registry-67b7b8989c-bxr9r -n openshift-image-registry
Name:                 image-registry-67b7b8989c-bxr9r
Namespace:            openshift-image-registry
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr/10.0.152.106
Start Time:           Wed, 29 Jun 2022 22:40:23 +0800
Labels:               docker-registry=default
                      pod-template-hash=67b7b8989c
Annotations:          imageregistry.operator.openshift.io/dependencies-checksum: sha256:68a7d1da976ade4883af2f220ff9b0b521d308ed9ccbf5b43bd4d7fc4fafa1e5
                      ...
Status:               Running
IP:                   10.131.0.6
IPs:
  IP:           10.131.0.6
Controlled By:  ReplicaSet/image-registry-67b7b8989c
Containers:
  registry:
    Container ID:  cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
    Port:          5000/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      mkdir -p /etc/pki/ca-trust/extracted/edk2 /etc/pki/ca-trust/extracted/java /etc/pki/ca-trust/extracted/openssl /etc/pki/ca-trust/extracted/pem && update-ca-trust extract && exec /usr/bin/dockerregistry
    State:          Waiting
      Reason:       CreateContainerError
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 29 Jun 2022 22:40:33 +0800
      Finished:     Wed, 29 Jun 2022 22:40:35 +0800
    Ready:          False
    Restart Count:  1
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get https://:5000/healthz delay=5s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get https://:5000/healthz delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:
      REGISTRY_STORAGE_OSS_ENDPOINT:               xiyuan29-1-cdwbk-image-registry-us-east-1-xiodehipentpoddktbsb.oss-us-east-1-internal.aliyuncs.com
      REGISTRY_STORAGE:                            oss
      REGISTRY_STORAGE_OSS_BUCKET:                 xiyuan29-1-cdwbk-image-registry-us-east-1-xiodehipentpoddktbsb
      REGISTRY_STORAGE_OSS_REGION:                 oss-us-east-1
      REGISTRY_STORAGE_OSS_INTERNAL:               true
      REGISTRY_STORAGE_OSS_ENCRYPT:                true
      REGISTRY_STORAGE_OSS_CREDENTIALSCONFIGPATH:  /var/run/secrets/cloud/credentials
      REGISTRY_STORAGE_OSS_ACCESSKEYID:            LTAI5tAa9KyRURVfKAc8qVQA
      REGISTRY_STORAGE_OSS_ACCESSKEYSECRET:        7bZW2mcMMTcZlNSictBNShElolRNiJ
      REGISTRY_HTTP_ADDR:                          :5000
      REGISTRY_HTTP_NET:                           tcp
      REGISTRY_HTTP_SECRET:                        bc8fc21e845460d5643e47dbf01b8f62dec82933dc248e8c21d5f8a5a05108f4ffda9b6858e60ea83d2da5859bd3267f52b7b925ee36764daa9004959e4d45cc
      REGISTRY_LOG_LEVEL:                          info
      REGISTRY_OPENSHIFT_QUOTA_ENABLED:            true
      REGISTRY_STORAGE_CACHE_BLOBDESCRIPTOR:       inmemory
      REGISTRY_STORAGE_DELETE_ENABLED:             true
      REGISTRY_HEALTH_STORAGEDRIVER_ENABLED:       true
      REGISTRY_HEALTH_STORAGEDRIVER_INTERVAL:      10s
      REGISTRY_HEALTH_STORAGEDRIVER_THRESHOLD:     1
      REGISTRY_OPENSHIFT_METRICS_ENABLED:          true
      REGISTRY_OPENSHIFT_SERVER_ADDR:              image-registry.openshift-image-registry.svc:5000
      REGISTRY_HTTP_TLS_CERTIFICATE:               /etc/secrets/tls.crt
      REGISTRY_HTTP_TLS_KEY:                       /etc/secrets/tls.key
    Mounts:
      /etc/pki/ca-trust/extracted from ca-trust-extracted (rw)
      /etc/pki/ca-trust/source/anchors from registry-certificates (rw)
      /etc/secrets from registry-tls (rw)
      /usr/share/pki/ca-trust-source from trusted-ca (rw)
      /var/lib/kubelet/ from installation-pull-secrets (rw)
      /var/run/secrets/cloud from image-registry-private-configuration (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xsrq9 (ro)
      /var/run/secrets/openshift/serviceaccount from bound-sa-token (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  image-registry-private-configuration:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  image-registry-private-configuration
    Optional:    false
  registry-tls:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          image-registry-tls
    SecretOptionalName:  <nil>
  ca-trust-extracted:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  registry-certificates:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      image-registry-certificates
    Optional:  false
  trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      trusted-ca
    Optional:  true
  installation-pull-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  installation-pull-secrets
    Optional:    true
  bound-sa-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600
  kube-api-access-xsrq9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                            From               Message
  ----     ------            ----                           ----               -------
  Warning  FailedScheduling  125m                           default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  125m                           default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  124m                           default-scheduler  0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  121m                           default-scheduler  0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
  Normal   Scheduled         120m                           default-scheduler  Successfully assigned openshift-image-registry/image-registry-67b7b8989c-bxr9r to xiyuan29-1-cdwbk-worker-us-east-1a-x62rr by xiyuan29-1-cdwbk-master-2
  Warning  BackOff           110m (x47 over <invalid>)      kubelet            Back-off restarting failed container
  Normal   Pulled            31s (x509 over <invalid>)      kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282" already present on machine
  Normal   AddedInterface    <invalid>                      multus             Add eth0 [10.131.0.6/23] from openshift-sdn
  Normal   Created           <invalid> (x2 over <invalid>)  kubelet            Created container registry
  Normal   Started           <invalid> (x2 over <invalid>)  kubelet            Started container registry

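The container ID mismatch reported under "Additional info" can be checked mechanically. The following is only a minimal sketch, assuming that crictl prints a truncated prefix of the full container ID and that the pod status prefixes the ID with the runtime scheme (cri-o://); the helper name same_container is hypothetical.

```python
def same_container(status_id: str, crictl_id: str) -> bool:
    """Return True if a pod-status container ID and a (possibly truncated)
    crictl container ID appear to refer to the same container."""
    # Drop the "cri-o://" scheme that the pod status puts in front of the ID.
    full_id = status_id.split("://", 1)[-1]
    # Assumption: crictl shows a prefix of the full ID, so compare by prefix.
    return full_id.startswith(crictl_id)


# Values copied from this report.
status_id = "cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d"
crictl_id = "19cdcd51353a2"

print(same_container(status_id, crictl_id))
```

With the values from this report the check fails, which matches the observation that the API server still records the terminated container while crictl lists a different, running one.
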
$ oc get pod/image-registry-67b7b8989c-bxr9r -n openshift-image-registry -o yaml | yq -y '.status.containerStatuses'
- containerID: cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
  image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
  imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
  lastState:
    terminated:
      containerID: cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
      exitCode: 2
      finishedAt: '2022-06-29T14:40:35Z'
      reason: Error
      startedAt: '2022-06-29T14:40:33Z'
  name: registry
  ready: false
  restartCount: 1
  started: false
  state:
    waiting:
      message: 'error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id c220a9c99abb876d637f78a814daa68556e5aaa7c908204a9e5d299066cdbf99: name is reserved'
      reason: CreateContainerError

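To make the "name is reserved" message easier to compare against the crictl output, the reserved name can be split into its parts. This is a rough sketch that assumes the name follows the layout k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>, as the message in this report suggests; ReservedName and parse_reserved_name are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class ReservedName(NamedTuple):
    container: str
    pod: str
    namespace: str
    pod_uid: str
    attempt: int
    requested_id: str


# Assumed layout: k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>.
# Kubernetes object names cannot contain "_", so "_" is a safe separator.
PATTERN = re.compile(
    r"error reserving ctr name "
    r"k8s_(?P<container>[^_]+)_(?P<pod>[^_]+)_(?P<namespace>[^_]+)"
    r"_(?P<uid>[^_]+)_(?P<attempt>\d+)"
    r" for id (?P<id>[0-9a-f]+): name is reserved"
)


def parse_reserved_name(message: str) -> Optional[ReservedName]:
    """Extract the parts of the reserved container name, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return ReservedName(
        container=match["container"],
        pod=match["pod"],
        namespace=match["namespace"],
        pod_uid=match["uid"],
        attempt=int(match["attempt"]),
        requested_id=match["id"],
    )


# Message copied from .status.containerStatuses in this report.
message = (
    "error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_"
    "openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id "
    "c220a9c99abb876d637f78a814daa68556e5aaa7c908204a9e5d299066cdbf99: name is reserved"
)

parsed = parse_reserved_name(message)
print(parsed)
```

For this report the parsed attempt is 2, the same ATTEMPT value crictl shows for the running registry container 19cdcd51353a2, while the pod status still reports restartCount: 1.
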
The must-gather can be found here: https://drive.google.com/file/d/1wu8GUVMGwaGOhE2g0hjnwsdqt7wcTLMh/view