Bug 1503860

Summary: system container install seems broken for 3.7
Product: OpenShift Container Platform
Reporter: Ben Breard <bbreard>
Component: Installer
Assignee: Steve Milner <smilner>
Status: CLOSED ERRATA
QA Contact: Gan Huang <ghuang>
Severity: medium
Priority: medium
Version: 3.7.0
CC: aos-bugs, ghuang, gscrivan, jokerman, mmccomas, mpatel
Target Release: 3.7.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The logic for selecting the enterprise registry was moved to a location that was never read when installing system containers.
Consequence: Enterprise installs using system containers failed because the ose image could not be found in the Docker Hub registry.
Fix: The enterprise registry logic was moved into a higher-level playbook so that it is set for all runtime setups.
Result: The enterprise images can be found and the installation works.
Last Closed: 2017-11-28 22:17:38 UTC
Type: Bug
Attachments: installer log

Description Ben Breard 2017-10-18 23:01:21 UTC
Created attachment 1340412
installer log

Description of problem:

I'm trying to use the system container install with CRI-O, and there seem to be some issues recognizing the system container variables.

The CRI-O and container-engine images pull and run fine.

The installer fails on `docker run --rm openshift3/ose:v3.7 version`, which doesn't make sense, because this image needs to be pulled with `atomic pull` so that it is stored in OSTree.


Version-Release number of the following components:

commit 8ef9ee879216d3ce0309daf980f06817a8489f30 (HEAD -> master, origin/master, origin/HEAD)

Same result on RHEL and Fedora.

rpm -q openshift-ansible
rpm -q ansible
ansible-2.4.0.0-1.fc26.noarch

ansible --version
ansible 2.4.0.0
  config file = /home/bbreard/src/atomic_demo/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/bbreard/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.13 (default, Sep  5 2017, 08:53:59) [GCC 7.1.1 20170622 (Red Hat 7.1.1-3)]



ansible-2.3.2.0-2.el7.noarch
ansible --version
ansible 2.3.2.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]



How reproducible:

Steps to Reproduce:
1. Set up two RHEL Atomic Host VMs (one master, one node).
2. Use the following inventory:
# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root

# If ansible_ssh_user is not root, ansible_become must be set to true
#ansible_become=true
openshift_release=v3.7
openshift_disable_check=disk_availability,memory_availability,docker_storage
#openshift_deployment_type=origin
deployment_type=openshift-enterprise
openshift_deployment_type=openshift-enterprise

containerized=False

openshift_docker_insecure_registries="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"
openshift_use_system_containers=True
openshift_crio_systemcontainer_image_override="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/cri-o:1.0.0"
system_images_registry="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"
openshift_use_crio=True
openshift_use_etcd_system_container=True
openshift_use_openvswitch_system_container=True
openshift_use_node_system_container=True
openshift_use_master_system_container=True
openshift_docker_use_system_container=True
openshift_docker_systemcontainer_image_override="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/container-engine:v3.7"

openshift_router_selector='region=primary'
openshift_registry_selector='region=primary'

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
[etcd]
atomic-node0.example.com

# host group for masters
[masters]
atomic-node0.example.com

# host group for nodes, includes region info
[nodes]
atomic-node0.example.com schedulable=True
atomic-node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}"


3. Run the installer; it fails (see Actual results).



Actual results:

fatal: [atomic-node0.example.com]: FAILED! => {
    "changed": true, 
    "cmd": [
        "docker", 
        "run", 
        "--rm", 
        "openshift3/ose:v3.7", 
        "version"
    ], 
    "delta": "0:00:00.984461", 
    "end": "2017-10-18 21:48:12.220495", 
    "failed": true, 
    "invocation": {
        "module_args": {
            "_raw_params": "docker run --rm openshift3/ose:v3.7 version", 
            "_uses_shell": false, 
            "chdir": null, 
            "creates": null, 
            "executable": null, 
            "removes": null, 
            "stdin": null, 
            "warn": true
        }
    }, 
    "msg": "non-zero return code", 
    "rc": 125, 
    "start": "2017-10-18 21:48:11.236034", 
    "stderr": "Unable to find image 'openshift3/ose:v3.7' locally\nTrying to pull repository docker.io/openshift3/ose ... \n/usr/bin/docker-current: unauthorized: authentication required.\nSee '/usr/bin/docker-current run --help'.", 
    "stderr_lines": [
        "Unable to find image 'openshift3/ose:v3.7' locally", 
        "Trying to pull repository docker.io/openshift3/ose ... ", 
        "/usr/bin/docker-current: unauthorized: authentication required.", 
        "See '/usr/bin/docker-current run --help'."
    ], 
    "stdout": "", 
    "stdout_lines": []
}

Expected results:

`atomic pull` should be used, and the image should be pulled from the system_images_registry defined in the inventory file.

What am I missing?
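The expected behavior can be sketched as building an `atomic pull` command that stores the image in OSTree, using the registry from the inventory. This is a minimal sketch under assumptions: the helper name `system_container_pull_cmd` is hypothetical, and the `--storage ostree` form reflects our understanding of the `atomic` CLI, not what openshift-ansible actually runs.

```python
from typing import List


def system_container_pull_cmd(system_images_registry: str, image: str, tag: str) -> List[str]:
    """Hypothetical helper: build the argv for pulling a system container image.

    Assumption: `atomic pull --storage ostree <ref>` stores the image in OSTree,
    as the reporter expected, instead of going through `docker run`.
    """
    # Strip quotes the inventory value may carry and any trailing slash.
    registry = system_images_registry.strip('"').rstrip("/")
    ref = f"{registry}/{image}:{tag}"
    return ["atomic", "pull", "--storage", "ostree", ref]


if __name__ == "__main__":
    cmd = system_container_pull_cmd(
        '"brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"', "openshift3/ose", "v3.7"
    )
    print(" ".join(cmd))
```

The registry comes from system_images_registry rather than from docker's own search list, which is the distinction the reporter is drawing.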


Comment 1 Steve Milner 2017-10-19 15:45:51 UTC
Setting openshift_image_tag (e.g. openshift_image_tag=v3.6.0) seems to get past the version check. Normally that use of docker is expected: it is part of openshift_version, which tries to detect the latest image tag when one isn't provided.
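Why setting openshift_image_tag avoids the docker call can be sketched as follows: when the tag is given, the image is never probed. This is a simplified model, not the openshift_version code; the function names are hypothetical, and the assumed `version` output format is one "component version" pair per line.

```python
from typing import Callable, Optional


def parse_openshift_version(output: str) -> str:
    """Return the version from the first line of the form 'openshift vX.Y.Z'.

    Assumption: the image's `version` command prints one 'component version' pair per line.
    """
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "openshift":
            return parts[1]
    raise ValueError("no 'openshift' line found in version output")


def resolve_image_tag(openshift_image_tag: Optional[str], probe: Callable[[], str]) -> str:
    """Use the inventory's openshift_image_tag when set; otherwise probe the image.

    In this bug, the probe corresponds to `docker run --rm openshift3/ose:v3.7 version`.
    """
    if openshift_image_tag:
        return openshift_image_tag  # probe is never called, so no image pull happens
    return parse_openshift_version(probe())


if __name__ == "__main__":
    print(resolve_image_tag("v3.6.0", lambda: "openshift v9.9.9"))  # v3.6.0
    print(resolve_image_tag(None, lambda: "openshift v3.7.0\nkubernetes v1.7.6\n"))  # v3.7.0
```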

However, the real problem appears to be that the ose image is being looked up in the wrong registry. This isn't specific to system containers, but I'll look into why it happens.
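The "wrong registry" symptom follows from how an unqualified image name such as `openshift3/ose:v3.7` is resolved. A minimal sketch, assuming docker's rule that the first path component is a registry only if it contains '.' or ':' or is 'localhost', and assuming (as a simplification of the RHEL docker `--add-registry` behavior) that additional registries are searched before docker.io. The function names are hypothetical.

```python
from typing import List

DEFAULT_REGISTRY = "docker.io"


def has_registry(image: str) -> bool:
    """True if the first path component looks like a registry host.

    It counts as a registry only if it contains '.' or ':' or is 'localhost';
    otherwise the name is unqualified.
    """
    first, sep, _ = image.partition("/")
    return bool(sep) and ("." in first or ":" in first or first == "localhost")


def candidate_pulls(image: str, additional_registries: List[str]) -> List[str]:
    """Simplified model of where an image is looked up, in order.

    Assumption: additional registries (the search list that
    openshift_docker_additional_registries feeds) are tried before docker.io.
    The implicit 'library/' namespace for single-component names is ignored.
    """
    if has_registry(image):
        return [image]
    searched = [f"{reg.rstrip('/')}/{image}" for reg in additional_registries]
    return searched + [f"{DEFAULT_REGISTRY}/{image}"]


if __name__ == "__main__":
    # With no additional registries, the name falls through to docker.io,
    # matching "Trying to pull repository docker.io/openshift3/ose" in the log.
    print(candidate_pulls("openshift3/ose:v3.7", []))
```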

Comment 2 Steve Milner 2017-10-19 16:31:56 UTC
PR: https://github.com/openshift/openshift-ansible/pull/5818

Michael Gugino noted that the enterprise registry was only being added for package installs. The above PR moves that logic into the higher-level main file, which runs for both package and container installs.
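The before/after behavior described in the PR comment can be sketched as two small functions. This is an illustration, not the PR's code: the function names are hypothetical, `registry.access.redhat.com` as the enterprise registry host is an assumption, and so is appending it after any user-configured registries.

```python
from typing import List

# Assumption: the enterprise registry host the installer adds.
ENTERPRISE_REGISTRY = "registry.access.redhat.com"


def registries_before_fix(deployment_type: str, package_install: bool,
                          configured: List[str]) -> List[str]:
    """Pre-fix behavior as described in this bug: only package installs got the registry."""
    registries = list(configured)
    if deployment_type == "openshift-enterprise" and package_install:
        registries.append(ENTERPRISE_REGISTRY)
    return registries


def registries_after_fix(deployment_type: str, configured: List[str]) -> List[str]:
    """Post-fix behavior: the same logic runs for package and container installs alike."""
    registries = list(configured)
    if deployment_type == "openshift-enterprise" and ENTERPRISE_REGISTRY not in registries:
        registries.append(ENTERPRISE_REGISTRY)
    return registries


if __name__ == "__main__":
    # The reporter's case: enterprise deployment with system containers (not a package install).
    print(registries_before_fix("openshift-enterprise", False, []))  # []
    print(registries_after_fix("openshift-enterprise", []))  # ['registry.access.redhat.com']
```

Dropping the install-method condition is the essence of the fix: the registry no longer depends on which code path the install takes.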

Comment 3 Gan Huang 2017-10-20 05:50:05 UTC
We (QE) always set `openshift_docker_additional_registries` in the inventory host file:

## for docker related configurations
openshift_docker_additional_registries=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
openshift_docker_insecure_registries=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888

## for etcd/node/master/openvswitch system containers configurations
openshift_use_system_containers=True
system_images_registry="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"

## for cri-o system container configurations
openshift_use_crio=True
openshift_crio_systemcontainer_image_override="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/cri-o:1.0.0"

## for docker system container configurations
openshift_docker_use_system_container=True
openshift_docker_systemcontainer_image_override="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/container-engine:v3.7"

Currently docker still has to be running even when CRI-O is enabled, so I think the openshift_docker* settings are still needed if you're using a non-enterprise registry.

Once we completely remove docker from OpenShift, we could safely remove openshift_docker*.
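The point above (docker still runs alongside CRI-O, so a non-enterprise registry must also be known to docker) could be expressed as a pre-flight check on the inventory variables. This is a hypothetical helper, not part of openshift-ansible; it assumes openshift_docker_additional_registries is a comma-separated string and only checks the additional-registries list, not openshift_docker_insecure_registries.

```python
from typing import Dict, List


def split_registries(value: str) -> List[str]:
    """Split a comma-separated inventory value, dropping quotes and blanks (assumed format)."""
    return [r.strip() for r in value.strip('"').split(",") if r.strip()]


def docker_registry_warnings(inventory_vars: Dict[str, str]) -> List[str]:
    """Hypothetical check: docker still runs alongside CRI-O, so a non-enterprise
    system_images_registry should also appear in openshift_docker_additional_registries.
    """
    registry = inventory_vars.get("system_images_registry", "").strip('"')
    if not registry:
        return []
    additional = split_registries(inventory_vars.get("openshift_docker_additional_registries", ""))
    if registry in additional:
        return []
    return [f"{registry} is not listed in openshift_docker_additional_registries"]


if __name__ == "__main__":
    # The reporter's original inventory set system_images_registry but no additional registries.
    reporter_vars = {"system_images_registry": '"brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"'}
    print(docker_registry_warnings(reporter_vars))
```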

Comment 4 Steve Milner 2017-10-23 13:32:39 UTC
Merged.

Comment 6 Ben Breard 2017-10-24 22:52:34 UTC
Confirmed. This fix, combined with the following options, resolves the issue.

## for docker related configurations
openshift_docker_additional_registries=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
openshift_docker_insecure_registries=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888

Thanks

Comment 7 Gan Huang 2017-10-25 08:49:27 UTC
Thanks for confirming.

Tested in openshift-ansible-3.7.0-0.176.0.git.0.eec12b8.el7.noarch.rp

The enterprise registry is now added by default when pulling the `openshift3/ose3` image.

Comment 10 errata-xmlrpc 2017-11-28 22:17:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188