Bug 1583500

Summary:	Unqualified image is completed with "docker.io"
Product:	OpenShift Container Platform	Reporter:	weiwei jiang <wjiang>
Component:	Node	Assignee:	Seth Jennings <sjenning>
Status:	CLOSED ERRATA	QA Contact:	weiwei jiang <wjiang>
Severity:	urgent	Docs Contact:
Priority:	medium
Version:	3.10.0	CC:	amcdermo, andcosta, aos-bugs, apizarro, bkozdemb, boris.ruppert, bpritche, byount, chrkim, cshereme, dma, dmoessne, farandac, fbrychta, gmarcote, hekumar, jhocutt, jmalde, jokerman, jonathan.moore, jrosenta, kjartan.paulsen, lstanton, lxia, mkinasz, mmccomas, mrobson, mruzicka, msomasun, mzali, nberry, nils.ketelsen, pdwyer, pkanthal, pprakash, rkant, rpenta, sjenning, szobair, tcarlin, weliang, william_mowery, wjiang, xtian, xxia
Target Milestone:	---	Keywords:	Regression
Target Release:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:	undefined	Story Points:	---
Clone Of:
Clones:	1588768 (view as bug list)		Environment:
Last Closed:	2018-07-30 19:16:51 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1572182, 1581622, 1588768

Description weiwei jiang 2018-05-29 06:54:39 UTC

Description of problem:
When try to create a pod with image in internal registry on a docker runtime cluster, got ImagePullBackOff.


Version-Release number of selected component (if applicable):
oc v3.10.0-0.53.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-9-3.ec2.internal:8443
openshift v3.10.0-0.53.0
kubernetes v1.10.0+b81c8f8

How reproducible:
always

Steps to Reproduce:
1. Create pods with unqualified image name
oc run test --image=openshift3/metrics-heapster:v3.10.0-0.53.0
2. Check the pods status
3.

Actual results:
# oc describe pods heapster-5df69f45c8-ds7wx -n 8feb4 --config=/home/wjiang/workdir/wjiang-wjiang/ose_user2.kubeconfig
Name:           heapster-5df69f45c8-ds7wx
Namespace:      8feb4               
Node:           ip-172-18-0-252.ec2.internal/172.18.0.252
Start Time:     Tue, 29 May 2018 14:27:56 +0800
Labels:         k8s-app=heapster     
                pod-template-hash=1892590174
                task=monitoring
Annotations:    openshift.io/scc=restricted
Status:         Pending
IP:             10.129.0.65       
Controlled By:  ReplicaSet/heapster-5df69f45c8
Containers:                          
  heapster:                                               
    Container ID:     
    Image:         openshift3/metrics-heapster:latest
    Image ID:
    Port:          <none>            
    Host Port:     <none>            
    Command:                                                                                                                                     
      heapster-wrapper.sh                                                        
      --api-server                                                               
      --bind-address=0.0.0.0                                                     
      --secure-port=8443                                                         
      --requestheader-client-ca-file=/var/run/kubernetes/request-header-ca.crt                                                                            
      --tls-ca-file=/var/run/kubernetes/client-ca.crt
      --source=kubernetes:https://kubernetes.default.svc?kubeletPort=10250&kubeletHttps=true
      --sink=influxdb:http://monitoring-influxdb.8feb4.svc:8086
      --requestheader-username-headers=X-Remote-User
      --requestheader-group-headers=X-Remote-Group
      --requestheader-extra-headers-prefix=X-Remote-Extra-
      --cert-dir=/tmp          
    State:          Waiting                
      Reason:       ImagePullBackOff
    Ready:          False  
    Restart Count:  0                         
    Environment:    <none>
    Mounts:
      /var/run/kubernetes from ca (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heapster-token-f6lkr (ro)
Conditions:  
  Type           Status  
  Initialized    True    
  Ready          False
  PodScheduled   True    
Volumes:          
  ca:                       
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cert-configmap                                                 
    Optional:  false                                 
  heapster-token-f6lkr:                                                                     
    Type:        Secret (a volume populated by a Secret)       
    SecretName:  heapster-token-f6lkr               
    Optional:    false                            
QoS Class:       BestEffort                               
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>    
Events:                             
  Type     Reason          Age                 From                                   Message
  ----     ------          ----                ----                                   -------
  Normal   Scheduled       23m                 default-scheduler                      Successfully assigned heapster-5df69f45c8-ds7wx to ip-172-18-0-252.ec2.internal
  Normal   Pulling         23m (x2 over 23m)   kubelet, ip-172-18-0-252.ec2.internal  pulling image "openshift3/metrics-heapster:latest"
  Warning  Failed          23m (x2 over 23m)   kubelet, ip-172-18-0-252.ec2.internal  Failed to pull image "openshift3/metrics-heapster:latest": rpc error: code = Unknown desc = repository docker.io/openshift3/metrics-heapster not found: does not exist or no pull access
  Warning  Failed          23m (x2 over 23m)   kubelet, ip-172-18-0-252.ec2.internal  Error: ErrImagePull
  Normal   SandboxChanged  22m (x7 over 23m)   kubelet, ip-172-18-0-252.ec2.internal  Pod sandbox changed, it will be killed and re-created.
  Normal   BackOff         8m (x91 over 23m)   kubelet, ip-172-18-0-252.ec2.internal  Back-off pulling image "openshift3/metrics-heapster:latest"
  Warning  Failed          3m (x112 over 23m)  kubelet, ip-172-18-0-252.ec2.internal  Error: ImagePullBackOff


Expected results:
Should pull the correct image

Additional info:

Comment 9 Seth Jennings 2018-05-31 17:48:48 UTC

*** Bug 1584494 has been marked as a duplicate of this bug. ***

Comment 11 Weibin Liang 2018-06-01 15:32:50 UTC

Same thing happened for deploying openshift3/ose-egress-router pod in openshift v3.10.0-0.56.0.

Same steps below worked fine several weeks ago.

This issue block our further testing.


[root@ip-172-18-11-114 ~]# oc create -f test.yaml 
pod "egress-dns-proxy" created
[root@ip-172-18-11-114 ~]# oc get pods
NAME               READY     STATUS     RESTARTS   AGE
egress-dns-proxy   0/1       Init:0/1   0          4s
[root@ip-172-18-11-114 ~]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
egress-dns-proxy   0/1       Init:ErrImagePull   0          9s
[root@ip-172-18-11-114 ~]# oc describe pod egress-dns-proxy
Name:         egress-dns-proxy
Namespace:    p1
Node:         ip-172-18-6-68.ec2.internal/172.18.6.68
Start Time:   Fri, 01 Jun 2018 11:14:11 -0400
Labels:       name=egress-dns-proxy
Annotations:  openshift.io/scc=privileged
              pod.network.openshift.io/assign-macvlan=true
Status:       Pending
IP:           
Init Containers:
  egress-router-setup:
    Container ID:   
    Image:          openshift3/ose-egress-router
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:
      EGRESS_SOURCE:       172.18.6.68
      EGRESS_GATEWAY:      172.18.0.1
      EGRESS_ROUTER_MODE:  dns-proxy
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6dsqk (ro)
Containers:
  egress-dns-proxy:
    Container ID:   
    Image:          openshift3/ose-egress-dns-proxy
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:
      EGRESS_DNS_PROXY_DEBUG:        1
      EGRESS_DNS_PROXY_DESTINATION:  80  www.baidu.com

    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6dsqk (ro)
Conditions:
  Type           Status
  Initialized    False 
  Ready          False 
  PodScheduled   True 
Volumes:
  default-token-6dsqk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6dsqk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason          Age                From                                  Message
  ----     ------          ----               ----                                  -------
  Normal   Scheduled       29s                default-scheduler                     Successfully assigned egress-dns-proxy to ip-172-18-6-68.ec2.internal
  Normal   Pulling         13s (x2 over 24s)  kubelet, ip-172-18-6-68.ec2.internal  pulling image "openshift3/ose-egress-router"
  Warning  Failed          13s (x2 over 24s)  kubelet, ip-172-18-6-68.ec2.internal  Failed to pull image "openshift3/ose-egress-router": rpc error: code = Unknown desc = repository docker.io/openshift3/ose-egress-router not found: does not exist or no pull access
  Warning  Failed          13s (x2 over 24s)  kubelet, ip-172-18-6-68.ec2.internal  Error: ErrImagePull
  Normal   BackOff         4s (x3 over 18s)   kubelet, ip-172-18-6-68.ec2.internal  Back-off pulling image "openshift3/ose-egress-router"
  Warning  Failed          4s (x3 over 18s)   kubelet, ip-172-18-6-68.ec2.internal  Error: ImagePullBackOff
  Normal   SandboxChanged  3s (x5 over 23s)   kubelet, ip-172-18-6-68.ec2.internal  Pod sandbox changed, it will be killed and re-created.
[root@ip-172-18-11-114 ~]# cat test.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: egress-dns-proxy
  labels:
    name: egress-dns-proxy
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" 
spec:
  initContainers:
  - name: egress-router-setup
    imagePullPolicy:  IfNotPresent
    image: openshift3/ose-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE
      value: 172.18.6.68
    - name: EGRESS_GATEWAY
      value: 172.18.0.1
    - name: EGRESS_ROUTER_MODE
      value: dns-proxy
  containers:
  - name: egress-dns-proxy
    image: openshift3/ose-egress-dns-proxy
    imagePullPolicy:  IfNotPresent
    env:
    - name: EGRESS_DNS_PROXY_DEBUG
      value: "1"
    - name: EGRESS_DNS_PROXY_DESTINATION
      value: |
        80  www.baidu.com
[root@ip-172-18-11-114 ~]# 
[root@ip-172-18-11-114 ~]# 
[root@ip-172-18-11-114 ~]# oc version
oc v3.10.0-0.56.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-11-114.ec2.internal:8443
openshift v3.10.0-0.56.0
kubernetes v1.10.0+b81c8f8
[root@ip-172-18-11-114 ~]#

Comment 12 Andrew McDermott 2018-06-01 18:17:04 UTC

WIP fix: https://github.com/openshift/origin/pull/19903

But as discussed with Seth we may go about this a different way (which is mentioned in the PR).

Comment 13 weiwei jiang 2018-06-04 07:29:05 UTC

(In reply to Weibin Liang from comment #11)
> Same thing happened for deploying openshift3/ose-egress-router pod in
> openshift v3.10.0-0.56.0.
> 
> Same steps below worked fine several weeks ago.
> 
> This issue block our further testing.
> 
> 
> [root@ip-172-18-11-114 ~]# oc create -f test.yaml 
> pod "egress-dns-proxy" created
> [root@ip-172-18-11-114 ~]# oc get pods
> NAME               READY     STATUS     RESTARTS   AGE
> egress-dns-proxy   0/1       Init:0/1   0          4s
> [root@ip-172-18-11-114 ~]# oc get pods
> NAME               READY     STATUS              RESTARTS   AGE
> egress-dns-proxy   0/1       Init:ErrImagePull   0          9s
> [root@ip-172-18-11-114 ~]# oc describe pod egress-dns-proxy
> Name:         egress-dns-proxy
> Namespace:    p1
> Node:         ip-172-18-6-68.ec2.internal/172.18.6.68
> Start Time:   Fri, 01 Jun 2018 11:14:11 -0400
> Labels:       name=egress-dns-proxy
> Annotations:  openshift.io/scc=privileged
>               pod.network.openshift.io/assign-macvlan=true
> Status:       Pending
> IP:           
> Init Containers:
>   egress-router-setup:
>     Container ID:   
>     Image:          openshift3/ose-egress-router
>     Image ID:       
>     Port:           <none>
>     Host Port:      <none>
>     State:          Waiting
>       Reason:       ImagePullBackOff
>     Ready:          False
>     Restart Count:  0
>     Environment:
>       EGRESS_SOURCE:       172.18.6.68
>       EGRESS_GATEWAY:      172.18.0.1
>       EGRESS_ROUTER_MODE:  dns-proxy
>     Mounts:
>       /var/run/secrets/kubernetes.io/serviceaccount from default-token-6dsqk
> (ro)
> Containers:
>   egress-dns-proxy:
>     Container ID:   
>     Image:          openshift3/ose-egress-dns-proxy
>     Image ID:       
>     Port:           <none>
>     Host Port:      <none>
>     State:          Waiting
>       Reason:       PodInitializing
>     Ready:          False
>     Restart Count:  0
>     Environment:
>       EGRESS_DNS_PROXY_DEBUG:        1
>       EGRESS_DNS_PROXY_DESTINATION:  80  www.baidu.com
> 
>     Mounts:
>       /var/run/secrets/kubernetes.io/serviceaccount from default-token-6dsqk
> (ro)
> Conditions:
>   Type           Status
>   Initialized    False 
>   Ready          False 
>   PodScheduled   True 
> Volumes:
>   default-token-6dsqk:
>     Type:        Secret (a volume populated by a Secret)
>     SecretName:  default-token-6dsqk
>     Optional:    false
> QoS Class:       BestEffort
> Node-Selectors:  node-role.kubernetes.io/compute=true
> Tolerations:     <none>
> Events:
>   Type     Reason          Age                From                          
> Message
>   ----     ------          ----               ----                          
> -------
>   Normal   Scheduled       29s                default-scheduler             
> Successfully assigned egress-dns-proxy to ip-172-18-6-68.ec2.internal
>   Normal   Pulling         13s (x2 over 24s)  kubelet,
> ip-172-18-6-68.ec2.internal  pulling image "openshift3/ose-egress-router"
>   Warning  Failed          13s (x2 over 24s)  kubelet,
> ip-172-18-6-68.ec2.internal  Failed to pull image
> "openshift3/ose-egress-router": rpc error: code = Unknown desc = repository
> docker.io/openshift3/ose-egress-router not found: does not exist or no pull
> access
>   Warning  Failed          13s (x2 over 24s)  kubelet,
> ip-172-18-6-68.ec2.internal  Error: ErrImagePull
>   Normal   BackOff         4s (x3 over 18s)   kubelet,
> ip-172-18-6-68.ec2.internal  Back-off pulling image
> "openshift3/ose-egress-router"
>   Warning  Failed          4s (x3 over 18s)   kubelet,
> ip-172-18-6-68.ec2.internal  Error: ImagePullBackOff
>   Normal   SandboxChanged  3s (x5 over 23s)   kubelet,
> ip-172-18-6-68.ec2.internal  Pod sandbox changed, it will be killed and
> re-created.
> [root@ip-172-18-11-114 ~]# cat test.yaml 
> apiVersion: v1
> kind: Pod
> metadata:
>   name: egress-dns-proxy
>   labels:
>     name: egress-dns-proxy
>   annotations:
>     pod.network.openshift.io/assign-macvlan: "true" 
> spec:
>   initContainers:
>   - name: egress-router-setup
>     imagePullPolicy:  IfNotPresent
>     image: openshift3/ose-egress-router
>     securityContext:
>       privileged: true
>     env:
>     - name: EGRESS_SOURCE
>       value: 172.18.6.68
>     - name: EGRESS_GATEWAY
>       value: 172.18.0.1
>     - name: EGRESS_ROUTER_MODE
>       value: dns-proxy
>   containers:
>   - name: egress-dns-proxy
>     image: openshift3/ose-egress-dns-proxy
>     imagePullPolicy:  IfNotPresent
>     env:
>     - name: EGRESS_DNS_PROXY_DEBUG
>       value: "1"
>     - name: EGRESS_DNS_PROXY_DESTINATION
>       value: |
>         80  www.baidu.com
> [root@ip-172-18-11-114 ~]# 
> [root@ip-172-18-11-114 ~]# 
> [root@ip-172-18-11-114 ~]# oc version
> oc v3.10.0-0.56.0
> kubernetes v1.10.0+b81c8f8
> features: Basic-Auth GSSAPI Kerberos SPNEGO
> 
> Server https://ip-172-18-11-114.ec2.internal:8443
> openshift v3.10.0-0.56.0
> kubernetes v1.10.0+b81c8f8
> [root@ip-172-18-11-114 ~]#

You can use the completed one like `registry.reg-aws.openshift.com:443/openshift3/ose-egress-router`, so this bug should not be a blocker I think.

Comment 16 Seth Jennings 2018-06-07 13:39:51 UTC

*** Bug 1588435 has been marked as a duplicate of this bug. ***

Comment 17 Seth Jennings 2018-06-07 15:29:22 UTC

*** Bug 1588467 has been marked as a duplicate of this bug. ***

Comment 20 Seth Jennings 2018-06-07 20:47:38 UTC

Origin PR:
https://github.com/openshift/origin/pull/19938

Comment 23 Thom Carlin 2018-06-09 01:24:56 UTC

For my 3.7 > 3.9 upgrade, I was able to workaround *this* issue by running on each OCP node:

docker pull registry.access.redhat.com/openshift3/ose-pod:v3.9.30

Comment 24 Alfredo Pizarro 2018-06-09 20:21:33 UTC

I was trying a new Openshift 3.9.30 installation and hit this bug. As a workarround, I set this values in the ansible inventory file:

oreg_url=registry.access.redhat.com/openshift3/ose-${component}:${version}
openshift_examples_modify_imagestreams=true

After this change, I could install properly.

Comment 25 Thom Carlin 2018-06-10 22:30:59 UTC

Alfredo: That was brilliant!  I was able to complete upgrading the control plane after adding your 2 values to /etc/ansible/hosts.  Thank you.

Comment 26 Hemant Kumar 2018-06-11 20:47:52 UTC

Do we need to update openshift docs to reflect this? I got stuck on same thing and going to retry with updated inventory file.

Comment 30 Mike Kinasz 2018-06-12 19:09:15 UTC

@Alfredo, you are a life saver.  Encountered the same issue on 3.9.30 advanced installation and adding the two lines to the inventory file looks like a solid workaround.  Thanks for posting!

Comment 32 DeShuai Ma 2018-06-13 14:51:29 UTC

Hi wjiang, Please verify the bug, thanks.

Comment 33 weiwei jiang 2018-06-14 03:24:22 UTC

Checked with 
# oc version 
oc v3.10.0-0.67.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-pod-310f2-master-etcd-1:8443
openshift v3.10.0-0.67.0
kubernetes v1.10.0+b81c8f8

And the issue can not be reproduced.

Comment 34 weiwei jiang 2018-06-14 09:08:55 UTC

(In reply to weiwei jiang from comment #33)
> Checked with 
> # oc version 
> oc v3.10.0-0.67.0
> kubernetes v1.10.0+b81c8f8
> features: Basic-Auth GSSAPI Kerberos SPNEGO
> 
> Server https://qe-pod-310f2-master-etcd-1:8443
> openshift v3.10.0-0.67.0
> kubernetes v1.10.0+b81c8f8
> 
> And the issue can not be reproduced.

Checked with 3.10.0-0.67.0 again, and found that the issue is still not fixed

As my test scenario use latest tag, so go into registry.access.redhat.com, and not internal registry.

Comment 35 weiwei jiang 2018-06-14 09:58:19 UTC

Since latest tag will go into registry.access.redhat.com, so seems docker.io will not be completed now. So I expect that image in registry.reg-aws.redhat.com:443 will be used but actually not now. So what's the current status for this issue.

Or something else come in and affect the test result?

Comment 36 weiwei jiang 2018-06-14 10:16:49 UTC

My scenarios:

// Pod is running, Image use registry.reg-aws.openshift.com:443
# oc run aoeuaoeuaa --image=registry.reg-aws.openshift.com:443/openshift3/metrics-heapster:v3.10.0-0.67.0 --command sleep 10d 

// Pod is ImagePullBackOff
# oc run aoeuaoaoeueuaa --image=openshift3/metrics-heapster:v3.10.0-0.67.0 --command sleep 10d

// Pod is running, Image use registry.access.redhat.com 
# oc run h --image=openshift3/metrics-heapster:latest --commmand sleep 10d

// Pod is running, Image use registry.reg-aws.openshift.com:443 
# oc run ha --image=registry.reg-aws.openshift.com:443/openshift3/metrics-heapster:latest  --command sleep 10d

Comment 39 Andrew McDermott 2018-06-14 13:17:00 UTC

(In reply to weiwei jiang from comment #36)
> My scenarios:
> 
> // Pod is running, Image use registry.reg-aws.openshift.com:443
> # oc run aoeuaoeuaa
> --image=registry.reg-aws.openshift.com:443/openshift3/metrics-heapster:v3.10.
> 0-0.67.0 --command sleep 10d 

This I expect to work because we use a fully qualified domain.

> 
> // Pod is ImagePullBackOff
> # oc run aoeuaoaoeueuaa --image=openshift3/metrics-heapster:v3.10.0-0.67.0
> --command sleep 10d

Assuming "docker.io" is not getting added to this unqualified image reference
I would expect this to fail because in the kubelet there will be no matching authorisation config that can ever match the unqualified image. The request will then use the lookup order in (projectatomic)/docker and fails because of the registry order in /etc/sysconfig/docker, which by default is:

  ADD_REGISTRY='--add-registry registry.reg-aws.openshift.com --add-registry registry.access.redhat.com'

If you change the config to (i.e., reverse the entries):

  ADD_REGISTRY='--add-registry registry.access.redhat.com --add-registry registry.reg-aws.openshift.com'

it will pull successfully as it will try from 'registry.reg-aws.openshift.com' first and the tag 'v3.10.0-0.67.0' exists there.

> 
> // Pod is running, Image use registry.access.redhat.com 
> # oc run h --image=openshift3/metrics-heapster:latest --commmand sleep 10d

This works because the default registry order is:

ADD_REGISTRY='--add-registry registry.reg-aws.openshift.com --add-registry registry.access.redhat.com'

and the image will be pulled from registry.access.redhat.com. As above, the acutal order seems to be the reverse of what is specified in the config file.

> 
> // Pod is running, Image use registry.reg-aws.openshift.com:443 
> # oc run ha
> --image=registry.reg-aws.openshift.com:443/openshift3/metrics-heapster:
> latest  --command sleep 10d

Expected to work because it is fully qualified and there will an auth config match for "registry.reg-aws.openshift.com" in the kubelet which will be passed to dockerd.

Comment 40 Seth Jennings 2018-06-14 15:11:17 UTC

Yes, this bug has gotten confusing.

Verification for this bz should simply be "does everything that worked in v3.10.0-0.47.0 (the release before the carry patch went in) still work in build after https://github.com/openshift/origin/pull/19938/commits goes in.

It doesn't seem there has been a build that includes https://github.com/openshift/origin/pull/19938 yet so this should still be MODIFIED

Comment 45 Jared Hocutt 2018-06-19 14:58:37 UTC

This workaround does not work for CNS/Gluster images as they do not seem to refer to the oreg_url value.

You must specify them explicitly:

openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-server-rhel7
openshift_storage_glusterfs_heketi_image=/registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7
openshift_storage_glusterfs_block_image=registry.access.redhat.com/rhgs3/rhgs-gluster-block-prov-rhel7

It seems that bug 1516534 is related.

Comment 46 Seth Jennings 2018-06-19 20:09:46 UTC

Looking at test case #2 on v3.10.1 I see:

Jun 19 19:54:55 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:55.975687856Z" level=debug msg="Calling POST /v1.26/images/create?fromImage=openshift3%2Fose-pod&tag=v3.10.1"
Jun 19 19:54:55 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:55.978711558Z" level=debug msg="Trying to pull registry.reg-aws.openshift.com/openshift3/ose-pod from https://registry.reg-aws.openshift.com v2"
Jun 19 19:54:56 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:56.099076645Z" level=debug msg="Trying to pull registry.reg-aws.openshift.com/openshift3/ose-pod from https://registry.reg-aws.openshift.com v1"
Jun 19 19:54:56 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:56.126659677Z" level=debug msg="[registry] Calling GET https://registry.reg-aws.openshift.com/v1/repositories/openshift3/ose-pod/images"
Jun 19 19:54:56 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:56.140041719Z" level=error msg="Not continuing with pull after error: Error: image openshift3/ose-pod:v3.10.1 not found"
Jun 19 19:54:56 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:56.206532663Z" level=debug msg="Trying to pull registry.access.redhat.com/openshift3/ose-pod from https://registry.access.redhat.com v2"
Jun 19 19:54:57 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:57.272963146Z" level=debug msg="Trying to pull registry.access.redhat.com/openshift3/ose-pod from https://registry.access.redhat.com v2"
Jun 19 19:54:57 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:57.839253462Z" level=debug msg="Trying to pull docker.io/openshift3/ose-pod from https://registry-1.docker.io v2"
Jun 19 19:54:57 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T19:54:57.891373379Z" level=info msg="Translating \"denied: requested access to the resource is denied\" to \"repository docker.io/openshift3/ose-pod not found: does not exist or no pull access\""

It is trying to pull from the v1 endpoint for reg-aws.  reg-aws uses v2.

# docker info | grep Registries
Insecure Registries:
Registries: registry.reg-aws.openshift.com (secure), registry.access.redhat.com (secure), registry.access.redhat.com (secure), docker.io (secure)

There is an attempt to pull from reg-aws, it just doesn't work when using an unqualified name.

When I change the pod to a fully qualified name, it uses v2:

Jun 19 20:05:38 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T20:05:38.939687267Z" level=debug msg="Calling POST /v1.26/images/create?fromImage=registry.reg-aws.openshift.com%2Fopenshift3%2Fose-pod&tag=v3.10.1"
Jun 19 20:05:38 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T20:05:38.979055029Z" level=debug msg="Trying to pull registry.reg-aws.openshift.com/openshift3/ose-pod from https://registry.reg-aws.openshift.com v2"
Jun 19 20:05:39 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T20:05:39.172655681Z" level=debug msg="Pulling ref from V2 registry: registry.reg-aws.openshift.com/openshift3/ose-pod:v3.10.1"
Jun 19 20:05:39 ip-172-18-12-11.ec2.internal dockerd-current[28297]: time="2018-06-19T20:05:39.174723581Z" level=debug msg="Calling GET /v1.26/images/registry.reg-aws.openshift.com/openshift3/ose-pod:v3.10.1/json"

I am going to install .47 (pre any docker.io changes) and see if this happens there as well.  If so, then this bz can be verified and we'll open a separate bz for this v1 vs v2 issue.

Comment 47 Seth Jennings 2018-06-19 22:03:41 UTC

This is a revert and can not introduce new issues.

I'm moving this back to ON_QA and please do not failedQA unless it can be demonstrated that a test succeeds for .47 or earlier but fails for .67 or later.

If a test fails both, then that is a new bug, not a reason to failedQA on this bug.

Comment 48 weiwei jiang 2018-06-20 05:46:33 UTC

(In reply to Seth Jennings from comment #47)
> This is a revert and can not introduce new issues.
> 
> I'm moving this back to ON_QA and please do not failedQA unless it can be
> demonstrated that a test succeeds for .47 or earlier but fails for .67 or
> later.
> 
> If a test fails both, then that is a new bug, not a reason to failedQA on
> this bug.

Checked with v3.10.1, and can not reproduce the original issue, so move to verified.

Comment 49 weiwei jiang 2018-06-20 07:46:14 UTC

Create a new bug to track the issue in https://bugzilla.redhat.com/show_bug.cgi?id=1583500#c46

Comment 50 weiwei jiang 2018-06-20 07:46:44 UTC

new bug https://bugzilla.redhat.com/show_bug.cgi?id=1593139

Comment 51 Jared Hocutt 2018-06-20 12:59:24 UTC

(In reply to Jared Hocutt from comment #45)
> This workaround does not work for CNS/Gluster images as they do not seem to
> refer to the oreg_url value.
> 
> You must specify them explicitly:
> 
> openshift_storage_glusterfs_image=registry.access.redhat.com/rhgs3/rhgs-
> server-rhel7
> openshift_storage_glusterfs_heketi_image=/registry.access.redhat.com/rhgs3/
> rhgs-volmanager-rhel7
> openshift_storage_glusterfs_block_image=registry.access.redhat.com/rhgs3/
> rhgs-gluster-block-prov-rhel7
> 
> It seems that bug 1516534 is related.

Accidentally had a "/" in front of registry.access.redhat.com. Here's the correct value for openshift_storage_glusterfs_heketi_image.

openshift_storage_glusterfs_heketi_image=registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7

Comment 53 errata-xmlrpc 2018-07-30 19:16:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816