Bug 2102510 - [OCP 4.11] CRI-O failing with: error reserving ctr name when an image-registry pod failed with CreateContainerError
Summary: [OCP 4.11] CRI-O failing with: error reserving ctr name when an image-registry pod failed with CreateContainerError
Keywords:
Status: CLOSED DUPLICATE of bug 2074052
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.11
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-06-30 05:26 UTC by MinLi
Modified: 2022-09-13 13:31 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-13 13:31:29 UTC
Target Upstream Version:
Embargoed:



Description MinLi 2022-06-30 05:26:31 UTC
Description of problem:
An image-registry pod got stuck in CreateContainerError, and the CRI-O log shows:

err="failedto \"StartContainer\" for \"registry\" with CreateContainerError: \"error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_263e89b-1e84-42d7-b443-49afd3908603_2 for id 7f1a96c4f0fa18d13b9a8abfe8a500a0ebd9a02598414177cdda8736f95641f4: name is reserved\"" pod="openshift-image-registry/imag-registry-67b7b8989c-bxr9r" podUID=2763e89b-1e84-42d7-b443-49afd3908603
 

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-28-160049

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
An image-registry pod stuck in CreateContainerError caused a node drain to fail.
% oc get pod -o wide
openshift-image-registry                           image-registry-67b7b8989c-bxr9r                              0/1     CreateContainerError   1 (<invalid> ago)   132m    10.131.0.6     xiyuan29-1-cdwbk-worker-us-east-1a-x62rr   <none> 

The CRI-O log shows:
Jun 29 08:27:17 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr hyperkube[1558]: E0629 08:27:17.151349    1558 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"registry\" with CreateContainerError: \"error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id 7f1a96c4f0fa18d13b9a8abfe8a500a0ebd9a02598414177cdda8736f95641f4: name is reserved\"" pod="openshift-image-registry/image-registry-67b7b8989c-bxr9r" podUID=2763e89b-1e84-42d7-b443-49afd3908603
Jun 29 08:27:28 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr hyperkube[1558]: I0629 08:27:28.145030    1558 scope.go:110] "RemoveContainer" containerID="27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d"
Jun 29 08:27:28 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr hyperkube[1558]: [...] "CreateContainer in sandbox from runtime service failed" err="rpc error: code = Unknown desc = error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id 077001d265a14c47e24c5ca416acaab113970b5e0dc1214a2586436a14379c9f: name is reserved" podSandboxID="ddb4eaa79175cfe8ec59d27084ccbe143b7e6b7da2a5556345225255a171e79"
Jun 29 08:27:28 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr hyperkube[1558]: E0629 08:27:28.150112    1558 kuberuntime_manager.go:905] container &Container{Name:registry,... [truncated]


Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.150254485Z" level=info msg="Image status: &ImageStatusResponse{Image:&Image{Id:321acf81cbf85b4d748fceee879b0c651fefeeb923dca3649d9ab6d4c68fc68a,RepoTags:[],RepoDigests:[quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282],Size_:395917103,Uid:&Int64Value{Value:1001,},Username:,Spec:nil,},Info:map[string]string{},}" id=f845b6e5-f2f4-404-9f11-8f92534328f0 name=/runtime.v1.ImageService/ImageStatus
Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.151525445Z" level=info msg="Creating container: openshift-image-registry/image-registry-67b7b8989c-bxr9r/registry" id=74e15498-4185-466c-b289-416209463455 name=/runtime.v1.RuntimeService/CreateContainer
Jun 29 08:29:53 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr crio[1520]: time="2022-06-29 08:29:53.151588903Z" level=warning msg="error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2 for id cdb11504a09fe1948e8d578fdde5262849bea4618604092a33675858809df4a: name is reserved"




Expected results:
The container is created successfully and the pod runs.

Additional info:

Comment 1 MinLi 2022-06-30 05:28:00 UTC
From the pod's description, the containerID is 27ae76a217d9c:
[cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d]

Yet from the crictl command, the containerID is 19cdcd51353a2:
[CONTAINER           IMAGE                                                                                                                    CREATED                  STATE               NAME                                    ATTEMPT             POD ID              POD
19cdcd51353a2       321acf81cbf85b4d748fceee879b0c651fefeeb923dca3649d9ab6d4c68fc68a                                                         2 hours ago              Running             registry                                2                   ddb4eaa79175c       image-registry-67b7b8989c-bxr9r]

They are different.
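The two IDs are directly comparable once the runtime prefix is stripped: the pod status reports "cri-o://<full-id>", while crictl ps prints a truncated ID prefix. A small illustrative Go check (the helper name is hypothetical):

package main

import (
	"fmt"
	"strings"
)

// sameContainer is a hypothetical helper: the pod status reports
// "<runtime>://<full-id>", while crictl ps prints a truncated ID prefix.
func sameContainer(statusID, truncatedID string) bool {
	if i := strings.Index(statusID, "://"); i >= 0 {
		statusID = statusID[i+3:]
	}
	return strings.HasPrefix(statusID, truncatedID)
}

func main() {
	// Prints false: the pod status still references the terminated attempt,
	// while crictl shows the container CRI-O is actually tracking.
	fmt.Println(sameContainer(
		"cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d",
		"19cdcd51353a2"))
}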



% oc describe pod image-registry-67b7b8989c-bxr9r -n openshift-image-registry
Name:                 image-registry-67b7b8989c-bxr9r
Namespace:            openshift-image-registry
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 xiyuan29-1-cdwbk-worker-us-east-1a-x62rr/10.0.152.106
Start Time:           Wed, 29 Jun 2022 22:40:23 +0800
Labels:               docker-registry=default
                      pod-template-hash=67b7b8989c
Annotations:          imageregistry.operator.openshift.io/dependencies-checksum: sha256:68a7d1da976ade4883af2f220ff9b0b521d308ed9ccbf5b43bd4d7fc4fafa1e5
...
Status:               Running
IP:                   10.131.0.6
IPs:
  IP:           10.131.0.6
Controlled By:  ReplicaSet/image-registry-67b7b8989c
Containers:
  registry:
    Container ID:  cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
    Port:          5000/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      mkdir -p /etc/pki/ca-trust/extracted/edk2 /etc/pki/ca-trust/extracted/java /etc/pki/ca-trust/extracted/openssl /etc/pki/ca-trust/extracted/pem && update-ca-trust extract && exec /usr/bin/dockerregistry
    State:          Waiting
      Reason:       CreateContainerError
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Wed, 29 Jun 2022 22:40:33 +0800
      Finished:     Wed, 29 Jun 2022 22:40:35 +0800
    Ready:          False
    Restart Count:  1
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get https://:5000/healthz delay=5s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get https://:5000/healthz delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:
      REGISTRY_STORAGE_OSS_ENDPOINT:               xiyuan29-1-cdwbk-image-registry-us-east-1-xiodehipentpoddktbsb.oss-us-east-1-internal.aliyuncs.com
      REGISTRY_STORAGE:                            oss
      REGISTRY_STORAGE_OSS_BUCKET:                 xiyuan29-1-cdwbk-image-registry-us-east-1-xiodehipentpoddktbsb
      REGISTRY_STORAGE_OSS_REGION:                 oss-us-east-1
      REGISTRY_STORAGE_OSS_INTERNAL:               true
      REGISTRY_STORAGE_OSS_ENCRYPT:                true
      REGISTRY_STORAGE_OSS_CREDENTIALSCONFIGPATH:  /var/run/secrets/cloud/credentials
      REGISTRY_STORAGE_OSS_ACCESSKEYID:            LTAI5tAa9KyRURVfKAc8qVQA
      REGISTRY_STORAGE_OSS_ACCESSKEYSECRET:        7bZW2mcMMTcZlNSictBNShElolRNiJ
      REGISTRY_HTTP_ADDR:                          :5000
      REGISTRY_HTTP_NET:                           tcp
      REGISTRY_HTTP_SECRET:                        bc8fc21e845460d5643e47dbf01b8f62dec82933dc248e8c21d5f8a5a05108f4ffda9b6858e60ea83d2da5859bd3267f52b7b925ee36764daa9004959e4d45cc
      REGISTRY_LOG_LEVEL:                          info
      REGISTRY_OPENSHIFT_QUOTA_ENABLED:            true
      REGISTRY_STORAGE_CACHE_BLOBDESCRIPTOR:       inmemory
      REGISTRY_STORAGE_DELETE_ENABLED:             true
      REGISTRY_HEALTH_STORAGEDRIVER_ENABLED:       true
      REGISTRY_HEALTH_STORAGEDRIVER_INTERVAL:      10s
      REGISTRY_HEALTH_STORAGEDRIVER_THRESHOLD:     1
      REGISTRY_OPENSHIFT_METRICS_ENABLED:          true
      REGISTRY_OPENSHIFT_SERVER_ADDR:              image-registry.openshift-image-registry.svc:5000
      REGISTRY_HTTP_TLS_CERTIFICATE:               /etc/secrets/tls.crt
      REGISTRY_HTTP_TLS_KEY:                       /etc/secrets/tls.key
    Mounts:
      /etc/pki/ca-trust/extracted from ca-trust-extracted (rw)
      /etc/pki/ca-trust/source/anchors from registry-certificates (rw)
      /etc/secrets from registry-tls (rw)
      /usr/share/pki/ca-trust-source from trusted-ca (rw)
      /var/lib/kubelet/ from installation-pull-secrets (rw)
      /var/run/secrets/cloud from image-registry-private-configuration (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xsrq9 (ro)
      /var/run/secrets/openshift/serviceaccount from bound-sa-token (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  image-registry-private-configuration:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  image-registry-private-configuration
    Optional:    false
  registry-tls:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          image-registry-tls
    SecretOptionalName:  <nil>
  ca-trust-extracted:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  registry-certificates:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      image-registry-certificates
    Optional:  false
  trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      trusted-ca
    Optional:  true
  installation-pull-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  installation-pull-secrets
    Optional:    true
  bound-sa-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600
  kube-api-access-xsrq9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                            From               Message
  ----     ------            ----                           ----               -------
  Warning  FailedScheduling  125m                           default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  125m                           default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  124m                           default-scheduler  0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  121m                           default-scheduler  0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) didn't match pod topology spread constraints, 2 node(s) were unschedulable, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't match pod topology spread constraints, 4 Preemption is not helpful for scheduling.
  Normal   Scheduled         120m                           default-scheduler  Successfully assigned openshift-image-registry/image-registry-67b7b8989c-bxr9r to xiyuan29-1-cdwbk-worker-us-east-1a-x62rr by xiyuan29-1-cdwbk-master-2
  Warning  BackOff           110m (x47 over <invalid>)      kubelet            Back-off restarting failed container
  Normal   Pulled            31s (x509 over <invalid>)      kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282" already present on machine
  Normal   AddedInterface    <invalid>                      multus             Add eth0 [10.131.0.6/23] from openshift-sdn
  Normal   Created           <invalid> (x2 over <invalid>)  kubelet            Created container registry
  Normal   Started           <invalid> (x2 over <invalid>)  kubelet            Started container registry

Comment 2 MinLi 2022-06-30 05:30:32 UTC
$ oc get pod/image-registry-67b7b8989c-bxr9r -n openshift-image-registry -o yaml | yq -y '.status.containerStatuses'
- containerID: cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
  image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
  imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41f7277413c7dccb5c7073943e6a701d596ebed75fd9f3476159b728c813d282
  lastState:
    terminated:
      containerID: cri-o://27ae76a217d9cd9cce1011889789ed71f99955ba089def3ba4e5c65b7bb6580d
      exitCode: 2
      finishedAt: '2022-06-29T14:40:35Z'
      reason: Error
      startedAt: '2022-06-29T14:40:33Z'
  name: registry
  ready: false
  restartCount: 1
  started: false
  state:
    waiting:
      message: 'error reserving ctr name k8s_registry_image-registry-67b7b8989c-bxr9r_openshift-image-registry_2763e89b-1e84-42d7-b443-49afd3908603_2
        for id c220a9c99abb876d637f78a814daa68556e5aaa7c908204a9e5d299066cdbf99: name
        is reserved'
      reason: CreateContainerError
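The same yq query can be scripted end to end; below is a minimal Go sketch (the struct mirrors the containerStatuses fields above; this is an illustrative helper, not an existing tool) that reads the output of "oc get pod ... -o json" from stdin and prints containers stuck in CreateContainerError.

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// pod mirrors only the containerStatuses fields shown above.
type pod struct {
	Status struct {
		ContainerStatuses []struct {
			Name        string `json:"name"`
			ContainerID string `json:"containerID"`
			State       struct {
				Waiting *struct {
					Reason  string `json:"reason"`
					Message string `json:"message"`
				} `json:"waiting"`
			} `json:"state"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

func main() {
	var p pod
	if err := json.NewDecoder(os.Stdin).Decode(&p); err != nil {
		fmt.Fprintln(os.Stderr, "decode pod JSON:", err)
		os.Exit(1)
	}
	for _, cs := range p.Status.ContainerStatuses {
		if w := cs.State.Waiting; w != nil && w.Reason == "CreateContainerError" {
			fmt.Printf("%s (%s): %s\n", cs.Name, cs.ContainerID, w.Message)
		}
	}
}

Run it, for example, as: oc get pod/image-registry-67b7b8989c-bxr9r -n openshift-image-registry -o json | go run checkwaiting.go (the file name is hypothetical).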

Comment 3 MinLi 2022-06-30 05:31:37 UTC
the must-gather can be found here: https://drive.google.com/file/d/1wu8GUVMGwaGOhE2g0hjnwsdqt7wcTLMh/view

