Bug 1583500
| Summary: | Unqualified image is completed with "docker.io" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | weiwei jiang <wjiang> |
| Component: | Node | Assignee: | Seth Jennings <sjenning> |
| Status: | CLOSED ERRATA | QA Contact: | weiwei jiang <wjiang> |
| Severity: | urgent | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.10.0 | CC: | amcdermo, andcosta, aos-bugs, apizarro, bkozdemb, boris.ruppert, bpritche, byount, chrkim, cshereme, dma, dmoessne, farandac, fbrychta, gmarcote, hekumar, jhocutt, jmalde, jokerman, jonathan.moore, jrosenta, kjartan.paulsen, lstanton, lxia, mkinasz, mmccomas, mrobson, mruzicka, msomasun, mzali, nberry, nils.ketelsen, pdwyer, pkanthal, pprakash, rkant, rpenta, sjenning, szobair, tcarlin, weliang, william_mowery, wjiang, xtian, xxia |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 3.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | | |
| Story Points: | --- | | |
| Clone Of: | | Cloned As: | 1588768 (view as bug list) |
| Environment: | | | |
| Last Closed: | 2018-07-30 19:16:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1572182, 1581622, 1588768 | | |
Description (weiwei jiang, 2018-05-29 06:54:39 UTC)
*** Bug 1584494 has been marked as a duplicate of this bug. ***

Same thing happened when deploying an openshift3/ose-egress-router pod in openshift v3.10.0-0.56.0. The same steps below worked fine several weeks ago. This issue blocks our further testing.

```
[root@ip-172-18-11-114 ~]# oc create -f test.yaml
pod "egress-dns-proxy" created
[root@ip-172-18-11-114 ~]# oc get pods
NAME               READY     STATUS     RESTARTS   AGE
egress-dns-proxy   0/1       Init:0/1   0          4s
[root@ip-172-18-11-114 ~]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
egress-dns-proxy   0/1       Init:ErrImagePull   0          9s
[root@ip-172-18-11-114 ~]# oc describe pod egress-dns-proxy
Name:         egress-dns-proxy
Namespace:    p1
Node:         ip-172-18-6-68.ec2.internal/172.18.6.68
Start Time:   Fri, 01 Jun 2018 11:14:11 -0400
Labels:       name=egress-dns-proxy
Annotations:  openshift.io/scc=privileged
              pod.network.openshift.io/assign-macvlan=true
Status:       Pending
IP:
Init Containers:
  egress-router-setup:
    Container ID:
    Image:          openshift3/ose-egress-router
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:
      EGRESS_SOURCE:       172.18.6.68
      EGRESS_GATEWAY:      172.18.0.1
      EGRESS_ROUTER_MODE:  dns-proxy
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6dsqk (ro)
Containers:
  egress-dns-proxy:
    Container ID:
    Image:          openshift3/ose-egress-dns-proxy
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:
      EGRESS_DNS_PROXY_DEBUG:        1
      EGRESS_DNS_PROXY_DESTINATION:  80 www.baidu.com
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6dsqk (ro)
Conditions:
  Type           Status
  Initialized    False
  Ready          False
  PodScheduled   True
Volumes:
  default-token-6dsqk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6dsqk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason          Age                From                                  Message
  ----     ------          ----               ----                                  -------
  Normal   Scheduled       29s                default-scheduler                     Successfully assigned egress-dns-proxy to ip-172-18-6-68.ec2.internal
  Normal   Pulling         13s (x2 over 24s)  kubelet, ip-172-18-6-68.ec2.internal  pulling image "openshift3/ose-egress-router"
  Warning  Failed          13s (x2 over 24s)  kubelet, ip-172-18-6-68.ec2.internal  Failed to pull image "openshift3/ose-egress-router": rpc error: code = Unknown desc = repository docker.io/openshift3/ose-egress-router not found: does not exist or no pull access
  Warning  Failed          13s (x2 over 24s)  kubelet, ip-172-18-6-68.ec2.internal  Error: ErrImagePull
  Normal   BackOff         4s (x3 over 18s)   kubelet, ip-172-18-6-68.ec2.internal  Back-off pulling image "openshift3/ose-egress-router"
  Warning  Failed          4s (x3 over 18s)   kubelet, ip-172-18-6-68.ec2.internal  Error: ImagePullBackOff
  Normal   SandboxChanged  3s (x5 over 23s)   kubelet, ip-172-18-6-68.ec2.internal  Pod sandbox changed, it will be killed and re-created.
[root@ip-172-18-11-114 ~]# cat test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: egress-dns-proxy
  labels:
    name: egress-dns-proxy
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  initContainers:
  - name: egress-router-setup
    imagePullPolicy: IfNotPresent
    image: openshift3/ose-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE
      value: 172.18.6.68
    - name: EGRESS_GATEWAY
      value: 172.18.0.1
    - name: EGRESS_ROUTER_MODE
      value: dns-proxy
  containers:
  - name: egress-dns-proxy
    image: openshift3/ose-egress-dns-proxy
    imagePullPolicy: IfNotPresent
    env:
    - name: EGRESS_DNS_PROXY_DEBUG
      value: "1"
    - name: EGRESS_DNS_PROXY_DESTINATION
      value: |
        80 www.baidu.com
[root@ip-172-18-11-114 ~]# oc version
oc v3.10.0-0.56.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-11-114.ec2.internal:8443
openshift v3.10.0-0.56.0
kubernetes v1.10.0+b81c8f8
```

WIP fix: https://github.com/openshift/origin/pull/19903

But as discussed with Seth we may go about this a
different way (which is mentioned in the PR).

(In reply to Weibin Liang from comment #11)

You can use the completed one like `registry.reg-aws.openshift.com:443/openshift3/ose-egress-router`, so this bug should not be a blocker, I think.

*** Bug 1588435 has been marked as a duplicate of this bug. ***

*** Bug 1588467 has been marked as a duplicate of this bug. ***

For my 3.7 > 3.9 upgrade, I was able to work around *this* issue by running the following on each OCP node:

```
docker pull registry.access.redhat.com/openshift3/ose-pod:v3.9.30
```

I was trying a new OpenShift 3.9.30 installation and hit this bug. As a workaround, I set these values in the Ansible inventory file:

```
oreg_url=registry.access.redhat.com/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true
```

After this change, I could install properly.

Alfredo: That was brilliant! I was able to complete upgrading the control plane after adding your two values to /etc/ansible/hosts. Thank you.
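For reference, the inventory workaround from the comments above collected in one place. This is a sketch of the relevant fragment; the `[OSEv3:vars]` group header is an assumption about where these lines live in a typical openshift-ansible inventory, and the two variable values are taken verbatim from the comments:

```ini
[OSEv3:vars]
; Pin image pulls to the Red Hat registry instead of relying on
; unqualified-name completion (workaround reported in the comments).
oreg_url=registry.access.redhat.com/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true
```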
Do we need to update the OpenShift docs to reflect this? I got stuck on the same thing and am going to retry with an updated inventory file.

@Alfredo, you are a life saver. Encountered the same issue on a 3.9.30 advanced installation, and adding the two lines to the inventory file looks like a solid workaround. Thanks for posting!

Hi wjiang, please verify the bug, thanks.

Checked with:

```
# oc version
oc v3.10.0-0.67.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-pod-310f2-master-etcd-1:8443
openshift v3.10.0-0.67.0
kubernetes v1.10.0+b81c8f8
```

And the issue can not be reproduced.

(In reply to weiwei jiang from comment #33)

Checked with 3.10.0-0.67.0 again, and found that the issue is still not fixed. My test scenario used the latest tag, which resolves via registry.access.redhat.com rather than the internal registry. Since the latest tag goes to registry.access.redhat.com, it seems docker.io is no longer used for completion. So I expected the image in registry.reg-aws.redhat.com:443 to be used, but it is not. What is the current status of this issue? Or did something else come in and affect the test result?
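Stepping back, the core of this bug is the default-domain completion applied to unqualified image references. A minimal sketch of that normalization rule as shell string handling; `qualify_image` is a hypothetical helper, not OpenShift or Docker code, and only approximates the real heuristic (a first path component containing a dot, a colon, or `localhost` is treated as a registry host):

```shell
# Sketch of default-domain completion for unqualified image references.
# A reference whose first path component looks like a hostname is left
# alone; anything else is prefixed with docker.io (bare names also get
# the implied "library/" namespace for official images).
qualify_image() {
  local ref="$1"
  local first="${ref%%/*}"
  case "$first" in
    *.*|*:*|localhost) echo "$ref" ;;                    # already qualified
    "$ref")            echo "docker.io/library/$ref" ;;  # bare name, no slash
    *)                 echo "docker.io/$ref" ;;          # org/name form
  esac
}

qualify_image "openshift3/ose-egress-router"
# -> docker.io/openshift3/ose-egress-router  (the failing pull in this bug)
qualify_image "registry.reg-aws.openshift.com:443/openshift3/ose-egress-router"
# -> unchanged: fully qualified references bypass completion
```

This is why the fully qualified `registry.reg-aws.openshift.com:443/...` references in the comments pull fine while the short `openshift3/...` form ends up at docker.io.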
My scenarios:

```
// Pod is running, image uses registry.reg-aws.openshift.com:443
# oc run aoeuaoeuaa --image=registry.reg-aws.openshift.com:443/openshift3/metrics-heapster:v3.10.0-0.67.0 --command sleep 10d

// Pod is ImagePullBackOff
# oc run aoeuaoaoeueuaa --image=openshift3/metrics-heapster:v3.10.0-0.67.0 --command sleep 10d

// Pod is running, image uses registry.access.redhat.com
# oc run h --image=openshift3/metrics-heapster:latest --commmand sleep 10d

// Pod is running, image uses registry.reg-aws.openshift.com:443
# oc run ha --image=registry.reg-aws.openshift.com:443/openshift3/metrics-heapster:latest --command sleep 10d
```

(In reply to weiwei jiang from comment #36)

Scenario 1 (fully qualified, tag v3.10.0-0.67.0): this I expect to work because we use a fully qualified domain.

Scenario 2 (unqualified, tag v3.10.0-0.67.0): assuming "docker.io" is not getting added to this unqualified image reference, I would expect this to fail because in the kubelet there will be no matching authorisation config that can ever match the unqualified image. The request will then use the lookup order in (projectatomic)/docker and fails because of the registry order in /etc/sysconfig/docker, which by default is:

```
ADD_REGISTRY='--add-registry registry.reg-aws.openshift.com --add-registry registry.access.redhat.com'
```

If you change the config to (i.e., reverse the entries):

```
ADD_REGISTRY='--add-registry registry.access.redhat.com --add-registry registry.reg-aws.openshift.com'
```

it will pull successfully, as it will try 'registry.reg-aws.openshift.com' first and the tag 'v3.10.0-0.67.0' exists there.

Scenario 3 (unqualified, tag latest): this works because the default registry order is:

```
ADD_REGISTRY='--add-registry registry.reg-aws.openshift.com --add-registry registry.access.redhat.com'
```

and the image will be pulled from registry.access.redhat.com. As above, the actual order seems to be the reverse of what is specified in the config file.

Scenario 4 (fully qualified, tag latest): expected to work because it is fully qualified and there will be an auth config match for "registry.reg-aws.openshift.com" in the kubelet, which will be passed to dockerd.

Yes, this bug has gotten confusing. Verification for this bz should simply be: does everything that worked in v3.10.0-0.47.0 (the release before the carry patch went in) still work in a build after https://github.com/openshift/origin/pull/19938/commits goes in? It doesn't seem there has been a build that includes https://github.com/openshift/origin/pull/19938 yet, so this should still be MODIFIED.

This workaround does not work for CNS/Gluster images, as they do not seem to refer to the oreg_url value. You must specify them explicitly:

```
openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-server-rhel7
openshift_storage_glusterfs_heketi_image=/registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7
openshift_storage_glusterfs_block_image=registry.access.redhat.com/rhgs3/rhgs-gluster-block-prov-rhel7
```

It seems that bug 1516534 is related.
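The search-order behavior analysed in the reply above can be simulated offline. The sketch below uses hypothetical helpers with stubbed image availability (reg-aws carries the v3.10.0-0.67.0 tag, registry.access.redhat.com only :latest, matching the scenarios); it is an illustration of the `--add-registry` resolution idea, not projectatomic/docker code:

```shell
# Availability stub: only these registry/image pairs "exist" in this
# sketch. A real implementation would probe /v2/<name>/manifests/<tag>.
image_exists() {
  case "$1/$2" in
    registry.reg-aws.openshift.com/openshift3/metrics-heapster:v3.10.0-0.67.0) return 0 ;;
    registry.access.redhat.com/openshift3/metrics-heapster:latest)             return 0 ;;
    *) return 1 ;;
  esac
}

# Try each --add-registry entry in order; the first registry that has
# the image wins, mimicking unqualified-name resolution.
resolve_pull() {
  local image="$1"; shift
  local reg
  for reg in "$@"; do
    if image_exists "$reg" "$image"; then
      echo "$reg/$image"
      return 0
    fi
  done
  echo "not found: $image" >&2
  return 1
}

# Default order from /etc/sysconfig/docker: reg-aws first, then RH.
resolve_pull "openshift3/metrics-heapster:latest" \
  registry.reg-aws.openshift.com registry.access.redhat.com
# -> registry.access.redhat.com/openshift3/metrics-heapster:latest
```

With the same order, `openshift3/metrics-heapster:v3.10.0-0.67.0` resolves to reg-aws instead, because that is the only registry in the stub that carries the tag, which mirrors why reversing the `ADD_REGISTRY` entries changed the outcome in scenario 2.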
Looking at test case #2 on v3.10.1 I see:

```
Jun 19 19:54:55 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:55.975687856Z" level=debug msg="Calling POST /v1.26/images/create?fromImage=openshift3%2Fose-pod&tag=v3.10.1"
Jun 19 19:54:55 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:55.978711558Z" level=debug msg="Trying to pull registry.reg-aws.openshift.com/openshift3/ose-pod from https://registry.reg-aws.openshift.com v2"
Jun 19 19:54:56 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:56.099076645Z" level=debug msg="Trying to pull registry.reg-aws.openshift.com/openshift3/ose-pod from https://registry.reg-aws.openshift.com v1"
Jun 19 19:54:56 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:56.126659677Z" level=debug msg="[registry] Calling GET https://registry.reg-aws.openshift.com/v1/repositories/openshift3/ose-pod/images"
Jun 19 19:54:56 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:56.140041719Z" level=error msg="Not continuing with pull after error: Error: image openshift3/ose-pod:v3.10.1 not found"
Jun 19 19:54:56 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:56.206532663Z" level=debug msg="Trying to pull registry.access.redhat.com/openshift3/ose-pod from https://registry.access.redhat.com v2"
Jun 19 19:54:57 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:57.272963146Z" level=debug msg="Trying to pull registry.access.redhat.com/openshift3/ose-pod from https://registry.access.redhat.com v2"
Jun 19 19:54:57 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:57.839253462Z" level=debug msg="Trying to pull docker.io/openshift3/ose-pod from https://registry-1.docker.io v2"
Jun 19 19:54:57 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:57.891373379Z" level=info msg="Translating \"denied: requested access to the resource is denied\" to \"repository docker.io/openshift3/ose-pod not found: does not exist or no pull access\""
```

It is trying to pull from the v1 endpoint for reg-aws; reg-aws uses v2.

```
# docker info | grep Registries
Insecure Registries:
Registries: registry.reg-aws.openshift.com (secure), registry.access.redhat.com (secure), registry.access.redhat.com (secure), docker.io (secure)
```

There is an attempt to pull from reg-aws; it just doesn't work when using an unqualified name. When I change the pod to a fully qualified name, it uses v2:

```
Jun 19 20:05:38 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T20:05:38.939687267Z" level=debug msg="Calling POST /v1.26/images/create?fromImage=registry.reg-aws.openshift.com%2Fopenshift3%2Fose-pod&tag=v3.10.1"
Jun 19 20:05:38 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T20:05:38.979055029Z" level=debug msg="Trying to pull registry.reg-aws.openshift.com/openshift3/ose-pod from https://registry.reg-aws.openshift.com v2"
Jun 19 20:05:39 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T20:05:39.172655681Z" level=debug msg="Pulling ref from V2 registry: registry.reg-aws.openshift.com/openshift3/ose-pod:v3.10.1"
Jun 19 20:05:39 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T20:05:39.174723581Z" level=debug msg="Calling GET /v1.26/images/registry.reg-aws.openshift.com/openshift3/ose-pod:v3.10.1/json"
```

I am going to install .47 (pre any docker.io changes) and see if this happens there as well. If so, then this bz can be verified and we'll open a separate bz for this v1 vs v2 issue.

This is a revert and cannot introduce new issues.

I'm moving this back to ON_QA. Please do not failedQA unless it can be demonstrated that a test succeeds for .47 or earlier but fails for .67 or later. If a test fails on both, then that is a new bug, not a reason to failedQA on this bug.
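Note the duplicated registry.access.redhat.com entry in the `docker info` output above, which explains the double v2 attempt against that registry in the log. The effective search order can be pulled out of that line mechanically; `search_order` is a hypothetical helper operating on the line text quoted from the comment:

```shell
# "Registries:" line exactly as printed by `docker info` in the comment.
registries_line='Registries: registry.reg-aws.openshift.com (secure), registry.access.redhat.com (secure), registry.access.redhat.com (secure), docker.io (secure)'

# Strip the label and the (secure)/(insecure) markers, then split on
# commas to list the registries in the order they will be tried.
search_order() {
  echo "$1" \
    | sed -e 's/^Registries: //' -e 's/ (secure)//g' -e 's/ (insecure)//g' \
    | tr ',' '\n' \
    | sed 's/^ *//'
}

search_order "$registries_line"
# -> registry.reg-aws.openshift.com
#    registry.access.redhat.com
#    registry.access.redhat.com
#    docker.io
```

The duplicate entry is harmless but accounts for the repeated "Trying to pull ... registry.access.redhat.com v2" lines before the fallback to docker.io.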
(In reply to Seth Jennings from comment #47)

Checked with v3.10.1, and can not reproduce the original issue, so moving to verified. Created a new bug to track the issue in https://bugzilla.redhat.com/show_bug.cgi?id=1583500#c46

(In reply to Jared Hocutt from comment #45)

I accidentally had a "/" in front of registry.access.redhat.com. Here's the correct value for openshift_storage_glusterfs_heketi_image:

```
openshift_storage_glusterfs_heketi_image=registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816