Bug 1588467 - OCP 3.9.30 deployment(deploy_cluster.yaml) tries to pull images from docker.io instead of registry.access.redhat.com
Keywords:
Status: CLOSED DUPLICATE of bug 1583500
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.9.z
Assignee: Seth Jennings
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-07 12:00 UTC by Neha Berry
Modified: 2018-06-14 00:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-07 15:29:22 UTC
Target Upstream Version:
Embargoed:



Description Neha Berry 2018-06-07 12:00:25 UTC
Description of problem:
++++++++++++++++++++++++++++++

I was trying to install the latest LIVE version of OCP 3.9, i.e. 3.9.30, using the advanced installation. Although the deploy_cluster.yml playbook run completed with no error message, none of the following pods were in Running state. All 3 pods in the default namespace were in an image pull error state (ErrImagePull or ImagePullBackOff), as they were trying to pull images from docker.io instead of registry.access.redhat.com.


[root@dhcp42-137 ~]# oc get pods
NAME                        READY     STATUS             RESTARTS   AGE
docker-registry-1-deploy    0/1       ErrImagePull       0          1h
registry-console-1-deploy   0/1       ImagePullBackOff   0          1h
router-1-deploy             0/1       ImagePullBackOff   0          1h
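
When triaging output like the above, it can help to pick out the pods whose STATUS is an image pull failure. The following is only a minimal sketch: the function name `pods_with_pull_failures` is hypothetical, and it assumes the plain column layout printed by `oc get pods` as pasted in this report.

```python
# Sample copied from the `oc get pods` output in this report.
OC_GET_PODS = """\
NAME                        READY     STATUS             RESTARTS   AGE
docker-registry-1-deploy    0/1       ErrImagePull       0          1h
registry-console-1-deploy   0/1       ImagePullBackOff   0          1h
router-1-deploy             0/1       ImagePullBackOff   0          1h
"""

# Statuses that indicate the image could not be pulled.
PULL_FAILURES = {"ErrImagePull", "ImagePullBackOff"}


def pods_with_pull_failures(table):
    """Return {pod name: status} for pods whose STATUS is a pull failure.

    Hypothetical helper: assumes whitespace-separated columns with a header
    row containing NAME and STATUS, as in the sample above.
    """
    lines = table.strip().splitlines()
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    failing = {}
    for line in lines[1:]:
        fields = line.split()
        if fields[status_col] in PULL_FAILURES:
            failing[fields[name_col]] = fields[status_col]
    return failing


failing = pods_with_pull_failures(OC_GET_PODS)
for name, status in sorted(failing.items()):
    print(f"{name}: {status}")
```

With the sample above, all three deployer pods are reported.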

error message
+++++++++++++++

#oc describe pod docker-registry-1-deploy

Events:
  Type     Reason                  Age                  From                                       Message
  ----     ------                  ----                 ----                                       -------
  Normal   Scheduled               48m                  default-scheduler                          Successfully assigned docker-registry-1-deploy to dhcp42-86.lab.eng.blr.redhat.com
  Normal   SuccessfulMountVolume   48m                  kubelet, dhcp42-86.lab.eng.blr.redhat.com  MountVolume.SetUp succeeded for volume "deployer-token-fc2gz"
  Warning  FailedCreatePodSandBox  18m (x109 over 48m)  kubelet, dhcp42-86.lab.eng.blr.redhat.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "openshift3/ose-pod:v3.9.30": repository docker.io/openshift3/ose-pod not found: does not exist or no pull access
  Normal   SandboxChanged          13m (x18 over 14m)   kubelet, dhcp42-86.lab.eng.blr.redhat.com  Pod sandbox changed, it will be killed and re-created.
  Normal   BackOff                 7m (x19 over 12m)    kubelet, dhcp42-86.lab.eng.blr.redhat.com  Back-off pulling image "openshift3/ose-deployer:v3.9.30"
  Warning  Failed                  3m (x40 over 12m)    kubelet, dhcp42-86.lab.eng.blr.redhat.com  Error: ImagePullBackOff
[root@dhcp42-137 ~]# oc describe pod registry-console-1-deploy

----------------------

# oc describe pod router-1-deploy

  Normal   Scheduled               48m                  default-scheduler                          Successfully assigned router-1-deploy to dhcp42-86.lab.eng.blr.redhat.com
  Normal   SuccessfulMountVolume   48m                  kubelet, dhcp42-86.lab.eng.blr.redhat.com  MountVolume.SetUp succeeded for volume "deployer-token-fc2gz"
  Warning  FailedCreatePodSandBox  18m (x110 over 48m)  kubelet, dhcp42-86.lab.eng.blr.redhat.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "openshift3/ose-pod:v3.9.30": repository docker.io/openshift3/ose-pod not found: does not exist or no pull access
  Normal   Pulling                 13m (x3 over 14m)    kubelet, dhcp42-86.lab.eng.blr.redhat.com  pulling image "openshift3/ose-deployer:v3.9.30"
  Warning  Failed                  3m (x44 over 14m)    kubelet, dhcp42-86.lab.eng.blr.redhat.com  Error: ImagePullBackOff
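
The key detail in these events is the registry host that appears in the FailedCreatePodSandBox message. As a rough sketch (the regular expression and the function name `attempted_registry` are assumptions written for the message wording pasted above, not part of any OpenShift tooling), the requested image and the registry actually contacted can be extracted like this:

```python
import re

# Message text copied from the FailedCreatePodSandBox event in this report.
EVENT_MESSAGE = (
    'Failed create pod sandbox: rpc error: code = Unknown desc = '
    'failed pulling image "openshift3/ose-pod:v3.9.30": '
    'repository docker.io/openshift3/ose-pod not found: '
    'does not exist or no pull access'
)

# Assumed pattern, matching only the wording seen in this report.
PULL_ERROR = re.compile(
    r'failed pulling image "(?P<requested>[^"]+)": '
    r'repository (?P<resolved>\S+) not found'
)


def attempted_registry(message):
    """Return (requested image, registry host contacted), or None if no match."""
    match = PULL_ERROR.search(message)
    if match is None:
        return None
    registry = match.group("resolved").split("/", 1)[0]
    return match.group("requested"), registry


result = attempted_registry(EVENT_MESSAGE)
print(result)
```

For the event above, the requested image is the unqualified `openshift3/ose-pod:v3.9.30` and the registry contacted is `docker.io`.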


Note:
++++++++++

1. We used the same Ansible inventory file that had led to a successful OCP deployment with v3.9.25. In the current scenario, however, the pods try to pull their images from docker.io instead of registry.access.redhat.com (which is listed in /etc/containers/registries.conf).

2. By default, a manual "docker pull" of an image pulls it from registry.access.redhat.com and not from docker.io. Hence, it is not clear why the playbook tries to pull it from a different registry.

E.g. 
++++++++
#docker pull openshift3/registry-console:v3.9
Trying to pull repository registry.access.redhat.com/openshift3/registry-console ... 
v3.9: Pulling from registry.access.redhat.com/openshift3/registry-console
e0f71f706c2a: Already exists 
121ab4741000: Already exists 
9be651b88329: Pull complete 
Digest: sha256:e32ea83f298df4615592c9f96aba95e4415ea99dc2c4466fc480590b1baddd57
Status: Downloaded newer image for registry.access.redhat.com/openshift3/registry-console:v3.9

-------------------

# docker pull openshift3/ose-pod:v3.9.30
Trying to pull repository registry.access.redhat.com/openshift3/ose-pod ... 
v3.9.30: Pulling from registry.access.redhat.com/openshift3/ose-pod
e0f71f706c2a: Pull complete 
121ab4741000: Pull complete 
9988e1f7ff11: Pull complete 
Digest: sha256:388ede198262b7fb97afd7ab04235e4cb3f841ad2e5cbe2de452a0db16a5d973
Status: Downloaded newer image for registry.access.redhat.com/openshift3/ose-pod:v3.9.30
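
The manual pulls and the pod events differ only in which registry the short name `openshift3/ose-pod:v3.9.30` gets expanded to. The sketch below is purely illustrative and does not claim to be what the docker daemon or the kubelet does: `has_registry_host` and `candidate_pulls` are hypothetical helpers, and the only convention assumed is the common Docker one that a first path component containing "." or ":" (or equal to "localhost") is treated as a registry host.

```python
def has_registry_host(image):
    """True if the first path component looks like a registry host.

    Assumed convention: the first component is a host if it contains '.'
    or ':' or is exactly 'localhost'.
    """
    first, sep, _ = image.partition("/")
    return bool(sep) and ("." in first or ":" in first or first == "localhost")


def candidate_pulls(image, search_registries, default_registry="docker.io"):
    """List the fully qualified names that would be tried, in order.

    Illustrative model only: search registries first, then the default.
    """
    if has_registry_host(image):
        return [image]
    registries = list(search_registries) + [default_registry]
    return [f"{registry}/{image}" for registry in registries]


short_name = "openshift3/ose-pod:v3.9.30"

# With a search list, the Red Hat registry is tried first.
with_search = candidate_pulls(short_name, ["registry.access.redhat.com"])
# Without one, only the default registry remains.
without_search = candidate_pulls(short_name, [])

print(with_search[0])
print(without_search[0])
```

In this model, the same short name resolves to registry.access.redhat.com when a search list is consulted and to docker.io when it is not, which matches the two behaviours observed in this report.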


---------------------------------

Version-Release number of selected component (if applicable):
+++++++++++++++++++

[root@dhcp42-137 ~]# rpm -qa| grep openshift
openshift-ansible-3.9.30-1.git.7.46f8678.el7.noarch
atomic-openshift-docker-excluder-3.9.30-1.git.0.dec1ba7.el7.noarch
atomic-openshift-master-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-node-3.9.30-1.git.0.dec1ba7.el7.x86_64
openshift-ansible-roles-3.9.30-1.git.7.46f8678.el7.noarch
openshift-ansible-docs-3.9.30-1.git.7.46f8678.el7.noarch
atomic-openshift-utils-3.9.30-1.git.7.46f8678.el7.noarch
atomic-openshift-clients-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-excluder-3.9.30-1.git.0.dec1ba7.el7.noarch
openshift-ansible-playbooks-3.9.30-1.git.7.46f8678.el7.noarch
atomic-openshift-3.9.30-1.git.0.dec1ba7.el7.x86_64
atomic-openshift-sdn-ovs-3.9.30-1.git.0.dec1ba7.el7.x86_64

-------------------------

[root@dhcp42-137 ~]# oc version
oc v3.9.30
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp42-137.lab.eng.blr.redhat.com:8443
openshift v3.9.30
kubernetes v1.9.1+a0ce1bc657
[root@dhcp42-137 ~]# 

---------------------------


How reproducible:
1x1

Steps to Reproduce:
1. Configure setup for OCP 3.9 live installation

subscription-manager repos     --enable="rhel-7-server-rpms"     --enable="rhel-7-server-extras-rpms"     --enable="rhel-7-server-ose-3.9-rpms"     --enable="rhel-7-fast-datapath-rpms"     --enable="rhel-7-server-ansible-2.4-rpms"

yum install atomic-openshift-utils

2. Ran the prerequisites.yml playbook
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml

3. Started the deploy_cluster.yml playbook, and it completed
ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

4. ansible.log shows that the task "TASK [openshift_hosted : Poll for OpenShift pod deployment success]" kept retrying until it timed out; the last message was "FAILED - RETRYING: Poll for OpenShift pod deployment success (1 retries left)"

5. Once OCP was installed, checked the pod status in the default namespace. All pods were in an error state.

[root@dhcp42-137 ~]# oc get pods
NAME                        READY     STATUS             RESTARTS   AGE
docker-registry-1-deploy    0/1       ImagePullBackOff   0          1h
registry-console-1-deploy   0/1       ImagePullBackOff   0          46m
router-1-deploy             0/1       ImagePullBackOff   0          1h


error message
++++++++++++++
  Normal   SuccessfulMountVolume   45m                  kubelet, dhcp42-86.lab.eng.blr.redhat.com  MountVolume.SetUp succeeded for volume "deployer-token-fc2gz"
  Warning  FailedCreatePodSandBox  14m (x109 over 44m)  kubelet, dhcp42-86.lab.eng.blr.redhat.com  Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "openshift3/ose-pod:v3.9.30": repository docker.io/openshift3/ose-pod not found: does not exist or no pull access



Actual results:
++++++++++++++++++++

The deployment tries to pull the images for the registry-console, docker-registry and router pods from docker.io.

Expected results:
++++++++++++++++++++++++

As seen in previous builds, the images should be pulled from registry.access.redhat.com.



Additional info
++++++++++++++++++
Registries.conf file from the nodes
++++++++++++++++++

[root@dhcp42-137 ~]# cat /etc/containers/registries.conf
# This is a system-wide configuration file used to
# keep track of registries for various container backends.
# It adheres to TOML format and does not support recursive
# lists of registries.

# The default location for this configuration file is /etc/containers/registries.conf.

# The only valid categories are: 'registries.search', 'registries.insecure', 
# and 'registries.block'.

[registries.search]
registries = ['registry.access.redhat.com','brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888']

# If you need to access insecure registries, add the registry's fully-qualified name.
# An insecure registry is one that does not have a valid SSL certificate or only does HTTP.
[registries.insecure]
registries = ['brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888']


# If you need to block pull access from a registry, uncomment the section below
# and add the registries fully-qualified name.
#
# Docker only
[registries.block]
registries = []
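
To confirm which registries this file configures, the three sections can be read programmatically. The sketch below is a deliberately minimal reader, not a full TOML parser: `parse_registries` is a hypothetical helper that only understands `[section]` headers and single-line `registries = [...]` entries as they appear in the file above (comments trimmed in the sample).

```python
import ast

# Contents copied from the registries.conf above, with most comments removed.
REGISTRIES_CONF = """\
[registries.search]
registries = ['registry.access.redhat.com','brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888']

[registries.insecure]
registries = ['brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888']

# Docker only
[registries.block]
registries = []
"""


def parse_registries(text):
    """Return {section name: list of registries}.

    Minimal sketch: handles only section headers and one-line
    `registries = [...]` lists whose syntax is also a valid Python literal.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None and line.startswith("registries"):
            _, _, value = line.partition("=")
            sections[current] = ast.literal_eval(value.strip())
    return sections


conf = parse_registries(REGISTRIES_CONF)
for section, registries in conf.items():
    print(section, registries)
```

For this file, registry.access.redhat.com is the first search registry, docker.io is not listed in any section, and the block list is empty.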

