Bug 1491171

Summary: Failed to install metrics/logging 3.7 because "--add-registry registry.ops.openshift.com" was removed from the ADD_REGISTRY line in /etc/sysconfig/docker during installation
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Installer
Assignee: Michael Gugino <mgugino>
Status: CLOSED NOTABUG
QA Contact: Junqi Zhao <juzhao>
Severity: high
Docs Contact:
Priority: high
Version: 3.7.0
CC: aos-bugs, jokerman, juzhao, mmccomas
Target Milestone: ---
Keywords: TestBlocker
Target Release: 3.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-15 03:20:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
ansible running log (flags: none)
/etc/sysconfig/docker file before and after metrics is deployed (flags: none)
journal log (flags: none)

Description Junqi Zhao 2017-09-13 08:48:11 UTC
Created attachment 1325275 [details]
ansible running log

Description of problem:
On GCE IaaS, deploying metrics 3.7 via ansible failed at the following step:
STDERR:

The connection to the server ${MASTER_URL}:8443 was refused - did you specify the right host or port?
    to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.retry

I checked, and the root cause is that while deploying metrics, "--add-registry registry.ops.openshift.com" was removed from the ADD_REGISTRY line in /etc/sysconfig/docker. When docker was restarted during the installation, atomic-openshift-master-api.service failed to start, which in turn caused the metrics installation to fail.

The ansible log is attached.

Version-Release number of selected component (if applicable):
docker version: docker-1.12.6-55.gitc4618fb.el7.x86_64

# rpm -qa | grep openshift-ansible
openshift-ansible-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-roles-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-docs-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-filter-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-lookup-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-playbooks-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-callback-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch


How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.7 via ansible on GCE IaaS, using the inventory file shown under "Additional info".

Actual results:
Failed to install metrics 3.7

Expected results:
Should install metrics 3.7 successfully.

Additional info:
# Inventory file
[OSEv3:children]
masters
etcd

[masters]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[etcd]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise

# Metrics
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.${SUB_DOMAIN}
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=v3.7

Comment 1 Junqi Zhao 2017-09-13 08:49:42 UTC
Created attachment 1325276 [details]
/etc/sysconfig/docker file before and after metrics is deployed

Comment 2 Junqi Zhao 2017-09-13 08:51:37 UTC
Blocks all metrics installation.

Comment 3 Matt Wringe 2017-09-13 18:42:59 UTC
The error message doesn't have anything to do with the metrics components; it points to an issue with the registry installation. I am re-assigning this to the installer component.

Comment 4 Scott Dodson 2017-09-13 19:04:46 UTC
Junqi,

I believe based on the description you set them in /etc/sysconfig/docker manually?

In general the installer manages the list of additional, insecure, and blocked registries so you'll want to set variables like the following in your inventory.

openshift_docker_additional_registries=registry.ops.openshift.com
openshift_docker_insecure_registries=registry.ops.openshift.com

I recognize that it may seem odd that docker configuration is applied when calling the metrics installation playbooks, but the role dependencies today are such that we ensure docker is configured properly no matter which playbook you call, so the proper inventory variables need to be set in all cases.

Does that help?

Comment 5 Junqi Zhao 2017-09-14 00:22:09 UTC
(In reply to Scott Dodson from comment #4)
> Junqi,
> 
> I believe based on the description you set them in /etc/sysconfig/docker
> manually?
No, I did not set it manually.
> In general the installer manages the list of additional, insecure, and
> blocked registries so you'll want to set variables like the following in
> your inventory.
> 
> openshift_docker_additional_registries=registry.ops.openshift.com
> openshift_docker_insecure_registries=registry.ops.openshift.com

I used a template to build my jobs:
http://git.app.eng.bos.redhat.com/git/openshift-misc.git/plain/v3-launch-templates/functionality-testing/aos-37/vars-gce/vars.ose37-container-ah7-gcs_registry-gce-cloudprovider

registry.ops.openshift.com is already in openshift_docker_additional_registries and openshift_docker_insecure_registries.

After OCP was built successfully, I checked that "--add-registry registry.ops.openshift.com" was in ADD_REGISTRY in /etc/sysconfig/docker, but when I deployed metrics via ansible, it was removed from ADD_REGISTRY. I think it's related to the metrics playbooks.

I checked the attached "ansible running log" and found the following. Do you think
("line": "ADD_REGISTRY='--add-registry registry.access.redhat.com'") is the cause?
    "invocation": {
        "module_args": {
            "attributes": null, 
            "backrefs": false, 
            "backup": false, 
            "content": null, 
            "create": false, 
            "delimiter": null, 
            "dest": "/etc/sysconfig/docker", 
            "directory_mode": null, 
            "follow": false, 
            "force": null, 
            "group": null, 
            "insertafter": null, 
            "insertbefore": null, 
            "line": "ADD_REGISTRY='--add-registry registry.access.redhat.com'", 
            "mode": null, 
            "owner": null, 
            "path": "/etc/sysconfig/docker", 
            "regexp": "^ADD_REGISTRY=.*$", 
            "remote_src": null, 
            "selevel": null, 
            "serole": null, 
            "setype": null, 
            "seuser": null, 
            "src": null, 
            "state": "present", 
            "unsafe_writes": null, 
            "validate": null
        }
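
To illustrate why the module arguments above would drop the registry, here is a rough Python simulation of a lineinfile-style replacement with state=present and backrefs=false: the last line matching the regexp is replaced by the given line, or the line is appended if nothing matches. This is only a sketch, not the openshift-ansible code. The helper name apply_lineinfile and the "before" file contents are hypothetical; the real file is in attachment 1325276.

```python
import re


def apply_lineinfile(text: str, regexp: str, line: str) -> str:
    """Roughly mimic lineinfile with state=present and backrefs=false."""
    lines = text.splitlines()
    pattern = re.compile(regexp)
    # Collect the indexes of every line that matches the regexp.
    matches = [i for i, existing in enumerate(lines) if pattern.search(existing)]
    if matches:
        # The last matching line is replaced wholesale by the new line.
        lines[matches[-1]] = line
    else:
        # With no match, the line is appended at the end of the file.
        lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical "before" contents, for illustration only.
before = (
    "OPTIONS='--selinux-enabled'\n"
    "ADD_REGISTRY='--add-registry registry.ops.openshift.com "
    "--add-registry registry.access.redhat.com'\n"
)

# regexp and line are copied from the module_args in the log excerpt.
after = apply_lineinfile(
    before,
    regexp=r"^ADD_REGISTRY=.*$",
    line="ADD_REGISTRY='--add-registry registry.access.redhat.com'",
)
print(after)
```

Because the whole ADD_REGISTRY line is rewritten rather than merged, any registry that is not part of the new line is lost, regardless of how it got into the file.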

Comment 7 Junqi Zhao 2017-09-14 04:16:01 UTC
Same error with logging; journal logs attached.

Comment 8 Junqi Zhao 2017-09-14 04:17:17 UTC
Created attachment 1325703 [details]
journal log

Comment 9 Michael Gugino 2017-09-14 14:07:19 UTC
I can confirm that openshift_docker_additional_registries works as before.

openshift_docker_additional_registries must be specified each time you run plays. It appears that the lack of this variable in your inventory caused the registry to be removed, which is the current expected behavior.
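
As a sketch of the behavior described here, the following Python function rebuilds the ADD_REGISTRY line from scratch on every run using only the registries passed in. It assumes that registry.access.redhat.com is always included as a default (suggested by the log in comment 5, not confirmed) and that additional registries are listed first. The function name render_add_registry is hypothetical and this is not the installer's actual template.

```python
def render_add_registry(additional_registries,
                        default="registry.access.redhat.com"):
    """Build the ADD_REGISTRY line from inventory values only.

    Nothing is read back from the existing /etc/sysconfig/docker, so a
    registry that is absent from the inventory is absent from the result.
    """
    registries = list(additional_registries)
    # Assumption: the default registry is always present in the output.
    if default not in registries:
        registries.append(default)
    flags = " ".join(f"--add-registry {name}" for name in registries)
    return f"ADD_REGISTRY='{flags}'"


# Variable missing from the inventory: only the default registry remains.
print(render_add_registry([]))

# Variable set as suggested in comment 4: the ops registry is kept.
print(render_add_registry(["registry.ops.openshift.com"]))
```

With an empty list, the result matches the "line" value seen in the ansible log, which is consistent with the variable being unset for that run.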

Comment 10 Junqi Zhao 2017-09-15 03:20:35 UTC
Starting with OCP 3.7, openshift_docker_additional_registries must be set in the inventory file; otherwise the installation fails with the error reported in this bug.

Added this parameter in inventory, and logging and metrics installation were successful.

Closing as NOTABUG.
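
For anyone hitting the same problem, a small pre-flight check along these lines could catch a missing variable before running a playbook. It is a minimal sketch that scans the [OSEv3:vars] section of an inventory for required variable names. The function name missing_vars and the abbreviated sample inventory are hypothetical, and the parser only handles the simple key=value layout used in this report.

```python
REQUIRED = ("openshift_docker_additional_registries",)


def missing_vars(inventory_text, section="OSEv3:vars", required=REQUIRED):
    """Return the required variable names not set in the given section."""
    current = None
    found = set()
    for raw in inventory_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Track which [section] we are in.
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        # Record variable names defined as key=value in the target section.
        if current == section and "=" in line:
            found.add(line.split("=", 1)[0].strip())
    return [name for name in required if name not in found]


# Abbreviated version of the inventory from the description.
inventory = """\
[OSEv3:children]
masters
etcd

[OSEv3:vars]
ansible_ssh_user=root
deployment_type=openshift-enterprise
openshift_metrics_install_metrics=true
"""

print(missing_vars(inventory))

# Same inventory with the variable added, as in comment 10.
fixed = inventory + "openshift_docker_additional_registries=registry.ops.openshift.com\n"
print(missing_vars(fixed))
```

Pass a longer tuple as the required argument to also check openshift_docker_insecure_registries, which comment 4 mentions alongside the additional registries.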