Description of problem:
NFS volume recycle fails with ErrImagePull.

Version-Release number of selected component (if applicable):
openshift v3.9.1
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Always

Steps to Reproduce:
1. Create a PV with persistentVolumeReclaimPolicy=Recycle
2. Create a PVC that binds to the PV created above
3. Delete the PVC
4. Check the PV status

Actual results:
PV status is Released.

Expected results:
PV status is Available.

Master Log:

Node Log (of failed PODs):

Feb 28 03:04:38 qe-gpei-test4master-etcd-zone1-1 atomic-openshift-node[14615]: I0228 03:04:38.947835 14625 kuberuntime_manager.go:514] Container {Name:pv-recycler Image:registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1 Command:[/usr/bin/openshift-recycle] Args:[/scrub] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:vol ReadOnly:false MountPath:/scrub SubPath: MountPropagation:<nil>} {Name:pv-recycler-controller-token-9mdw7 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[],Drop:[MKNOD],},Privileged:nil,SELinuxOptions:nil,RunAsUser:*0,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Feb 28 03:04:38 qe-gpei-test4master-etcd-zone1-1 atomic-openshift-node[14615]: I0228 03:04:38.948013 14625 kuberuntime_manager.go:725] Creating container &Container{Name:pv-recycler,Image:registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1,Command:[/usr/bin/openshift-recycle],Args:[/scrub],WorkingDir:,Ports:[],Env:[],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[{vol false /scrub  <nil>} {pv-recycler-controller-token-9mdw7 true /var/run/secrets/kubernetes.io/serviceaccount  <nil>}],LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[],Drop:[MKNOD],},Privileged:nil,SELinuxOptions:nil,RunAsUser:*0,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[],TerminationMessagePolicy:File,VolumeDevices:[],} in pod recycler-for-nfs-hkbss_openshift-infra(b27a548b-1c5d-11e8-86a3-42010af0006b)
Feb 28 03:04:38 qe-gpei-test4master-etcd-zone1-1 atomic-openshift-node[14615]: I0228 03:04:38.950448 14625 kuberuntime_manager.go:732] container start failed: ImagePullBackOff: Back-off pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1"
Feb 28 03:04:38 qe-gpei-test4master-etcd-zone1-1 atomic-openshift-node[14615]: E0228 03:04:38.950488 14625 pod_workers.go:186] Error syncing pod b27a548b-1c5d-11e8-86a3-42010af0006b ("recycler-for-nfs-hkbss_openshift-infra(b27a548b-1c5d-11e8-86a3-42010af0006b)"), skipping: failed to "StartContainer" for "pv-recycler" with ImagePullBackOff: "Back-off pulling image \"registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1\""
Feb 28 03:04:38 qe-gpei-test4master-etcd-zone1-1 atomic-openshift-node[14615]: I0228 03:04:38.950982 14625 server.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-infra", Name:"recycler-for-nfs-hkbss", UID:"b27a548b-1c5d-11e8-86a3-42010af0006b", APIVersion:"v1", ResourceVersion:"54243", FieldPath:"spec.containers{pv-recycler}"}): type: 'Normal' reason: 'BackOff' Back-off pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1"
Feb 28 03:04:38 qe-gpei-test4master-etcd-zone1-1 runc[12519]: time="2018-02-28T03:04:38.949591095-05:00" level=error msg="Handler for GET /v1.26/images/registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1/json returned error: No such image: registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1"
Feb 28 03:04:38 qe-gpei-test4master-etcd-zone1-1 runc[12519]: time="2018-02-28T03:04:38.949997350-05:00" level=error msg="Handler for GET /v1.26/images/registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1/json returned error: No such image: registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1"

PV Dump:
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": {
    "name": "nfs",
    "labels": {
      "usedFor": "tc522215"
    }
  },
  "spec": {
    "capacity": {
      "storage": "5Gi"
    },
    "accessModes": [
      "ReadWriteMany"
    ],
    "nfs": {
      "path": "/",
      "server": "172.30.163.146"
    },
    "persistentVolumeReclaimPolicy": "Recycle"
  }
}

PVC Dump:
{
  "apiVersion": "v1",
  "kind": "PersistentVolumeClaim",
  "metadata": {
    "name": "nfsc1",
    "labels": {
      "usedFor": "tc522215"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteMany"
    ],
    "resources": {
      "requests": {
        "storage": "5Gi"
      }
    }
  }
}

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
# oc describe pod recycler-for-c8a69 -n openshift-infra
Name:           recycler-for-c8a69
Namespace:      openshift-infra
Node:           172.16.120.78/
Start Time:     Thu, 01 Mar 2018 01:15:24 -0500
Labels:         <none>
Annotations:    openshift.io/scc=hostmount-anyuid
Status:         Failed
Reason:         DeadlineExceeded
Message:        Pod was active on the node longer than the specified deadline
IP:
Containers:
  pv-recycler:
    Image:       registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1
    Port:        <none>
    Command:     /usr/bin/openshift-recycle
    Args:        /scrub
    Environment: <none>
    Mounts:
      /scrub from vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from pv-recycler-controller-token-qfkqr (ro)
Volumes:
  vol:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    172.30.163.146
    Path:      /
    ReadOnly:  false
  pv-recycler-controller-token-qfkqr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pv-recycler-controller-token-qfkqr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age               From                    Message
  ----     ------                 ----              ----                    -------
  Normal   SuccessfulMountVolume  9m                kubelet, 172.16.120.78  MountVolume.SetUp succeeded for volume "pv-recycler-controller-token-qfkqr"
  Normal   SuccessfulMountVolume  9m                kubelet, 172.16.120.78  MountVolume.SetUp succeeded for volume "vol"
  Normal   Scheduled              9m                default-scheduler       Successfully assigned recycler-for-c8a69 to 172.16.120.78
  Normal   Pulling                9m                kubelet, 172.16.120.78  pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1"
  Warning  Failed                 9m                kubelet, 172.16.120.78  Failed to pull image "registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1": rpc error: code = Unknown desc = Error: image openshift3/ose-recycler:v1.9.1 not found
  Warning  Failed                 9m                kubelet, 172.16.120.78  Error: ErrImagePull
  Normal   SandboxChanged         8m (x20 over 9m)  kubelet, 172.16.120.78  Pod sandbox changed, it will be killed and re-created.
  Normal   DeadlineExceeded       4m (x2 over 4m)   kubelet, 172.16.120.78  Pod was active on the node longer than the specified deadline
Why is the image tag v1.9.1? The recycler shipped with 3.9 under the tags listed here: https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/ose-recycler (e.g. v3.9, v3.9.14)
I see the same issue in our v3.9.14 instance. It's pulling ose-recycler:v1.9.1 for some reason. Is there any workaround for this? Can I update some template so it would pull ose-recycler:latest instead?
It looks like this would require a change to the recycler pod template you are using, which is configured through controller arguments: https://docs.openshift.com/container-platform/3.6/architecture/additional_concepts/storage.html. Using :latest should be relatively safe, since I don't believe this image varies between releases. See the sketch below for where that setting lives.
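
Concretely, the controller manager accepts the upstream --pv-recycler-pod-template-filepath-nfs flag, which on OCP 3.x can be passed via controllerArguments in /etc/origin/master/master-config.yaml. A hedged sketch (the template path below is a hypothetical example; the template file is where the recycler image, e.g. :latest, can be pinned):

kubernetesMasterConfig:
  controllerArguments:
    # upstream kube-controller-manager flag; points at a custom recycler pod template
    pv-recycler-pod-template-filepath-nfs:
      - /etc/origin/master/recycler_pod.yaml

The master controllers need a restart after changing master-config.yaml for this to take effect.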
Verified this issue in OCP v3.9.20; the same error still occurs. The image tag v3.9.20 for ose-recycler exists, and the following command runs successfully:

docker pull registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v3.9.20
Another workaround available to users is defining an environment variable:

export OPENSHIFT_RECYCLER_IMAGE="openshift/origin-recycler:v3.9.0"

or

export OPENSHIFT_RECYCLER_IMAGE="openshift3/ose-recycler:v3.9.20"
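
Once the controllers are restarted with that variable set, deleting a bound PVC again should spawn a recycler pod referencing the overridden image; something like the following should confirm it (pod name taken from the node log above, yours will differ):

oc get pods -n openshift-infra                                  # find the new recycler-for-* pod
oc describe pod recycler-for-nfs-hkbss -n openshift-infra | grep 'Image:'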
Opened a PR to fix this in 3.9 so that users don't have to use the environment variable: https://github.com/openshift/origin/pull/19374
The v1.9.1 tag workaround works for me, but a real fix is still needed; since the PR is not merged, I'm changing the bug status to "ASSIGNED".
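
For anyone else reaching for the same stopgap: since the recycler pod uses ImagePullPolicy: IfNotPresent (see the node log above), having a local v1.9.1 tag on each node is enough to unblock it. Presumably the tag workaround amounts to something like this on every node (a sketch, not an official fix):

docker pull registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v3.9.20
# re-tag the image locally under the name/tag the kubelet is looking for
docker tag registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v3.9.20 \
    registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v1.9.1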
https://github.com/openshift/origin/pull/19406
Verified in OCP:

oc v3.9.24
openshift v3.9.24
kubernetes v1.9.1+a0ce1bc657

# uname -a
Linux host-172-16-120-35 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1566
(In reply to Hemant Kumar from comment #14)
> Another workaround available to users is defining an environment variable:
>
> export OPENSHIFT_RECYCLER_IMAGE="openshift/origin-recycler:v3.9.0"
>
> or
> export OPENSHIFT_RECYCLER_IMAGE="openshift3/ose-recycler:v3.9.20"

@Hemant, does one define this environment variable on every app node or only somewhere central, and does it need to be set under the root account?
This should be defined on the master node, where the controller manager runs.
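
For reference, a sketch of how that could look on an OCP 3.9 master, assuming the usual atomic-openshift-master-controllers service layout (the sysconfig file and service names can differ between single-master and HA installs):

# make the variable part of the controllers' service environment, then restart
echo 'OPENSHIFT_RECYCLER_IMAGE="registry.reg-aws.openshift.com:443/openshift3/ose-recycler:v3.9.20"' \
    >> /etc/sysconfig/atomic-openshift-master-controllers
systemctl restart atomic-openshift-master-controllers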