Bug 1491171 - Failed to install metrics/logging 3.7, due to "--add-registry registry.ops.openshift.com " was removed from ADD_REGISTRY part in /etc/sysconfig/docker during installation
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assignee: Michael Gugino
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-13 08:48 UTC by Junqi Zhao
Modified: 2017-09-15 03:20 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-15 03:20:35 UTC
Target Upstream Version:
Embargoed:


Attachments
ansible running log (558.42 KB, text/plain)
2017-09-13 08:48 UTC, Junqi Zhao
/etc/sysconfig/docker file before and after metrics is deployed (3.12 KB, text/plain)
2017-09-13 08:49 UTC, Junqi Zhao
journal log (14.54 MB, text/plain)
2017-09-14 04:17 UTC, Junqi Zhao

Description Junqi Zhao 2017-09-13 08:48:11 UTC
Created attachment 1325275 [details]
ansible running log

Description of problem:
On GCE IaaS, deploying metrics 3.7 via ansible failed at the following step:
STDERR:

The connection to the server ${MASTER_URL}:8443 was refused - did you specify the right host or port?
    to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.retry

Checked: the root cause is that while deploying metrics, "--add-registry registry.ops.openshift.com " was removed from the ADD_REGISTRY line in /etc/sysconfig/docker. When docker was restarted during the installation, atomic-openshift-master-api.service failed to start, which in turn caused the metrics installation to fail.

Attached ansible log

Version-Release number of selected component (if applicable):
docker version: docker-1.12.6-55.gitc4618fb.el7.x86_64

# rpm -qa | grep openshift-ansible
openshift-ansible-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-roles-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-docs-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-filter-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-lookup-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-playbooks-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-callback-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch


How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.7 via ansible on GCE IaaS; for the inventory file, see the [Additional info] section
2.
3.

Actual results:
Failed to install metrics 3.7

Expected results:
Should install metrics 3.7 successfully.

Additional info:
# Inventory file
[OSEv3:children]
masters
etcd

[masters]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[etcd]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise

# Metrics
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.${SUB_DOMAIN}
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=v3.7

Comment 1 Junqi Zhao 2017-09-13 08:49:42 UTC
Created attachment 1325276 [details]
/etc/sysconfig/docker file before and after metrics is deployed

Comment 2 Junqi Zhao 2017-09-13 08:51:37 UTC
Blocks all metrics installation.

Comment 3 Matt Wringe 2017-09-13 18:42:59 UTC
The error message doesn't have anything to do with metrics components but an issue with the registry installation. I am re-assigning this to the installer component.

Comment 4 Scott Dodson 2017-09-13 19:04:46 UTC
Junqi,

I believe based on the description you set them in /etc/sysconfig/docker manually?

In general the installer manages the list of additional, insecure, and blocked registries so you'll want to set variables like the following in your inventory.

openshift_docker_additional_registries=registry.ops.openshift.com
openshift_docker_insecure_registries=registry.ops.openshift.com

I recognize that it may seem odd that docker configuration is applied while calling the metrics installation playbooks, but the role dependencies today are such that we ensure docker is configured properly no matter which playbook you call, so we need to make sure that the proper inventory variables are set in all cases.

Does that help?

Comment 5 Junqi Zhao 2017-09-14 00:22:09 UTC
(In reply to Scott Dodson from comment #4)
> Junqi,
> 
> I believe based on the description you set them in /etc/sysconfig/docker
> manually?
  No, I did not set it manually.
> In general the installer manages the list of additional, insecure, and
> blocked registries so you'll want to set variables like the following in
> your inventory.
> 
> openshift_docker_additional_registries=registry.ops.openshift.com
> openshift_docker_insecure_registries=registry.ops.openshift.com

I used a template to build my jobs:
http://git.app.eng.bos.redhat.com/git/openshift-misc.git/plain/v3-launch-templates/functionality-testing/aos-37/vars-gce/vars.ose37-container-ah7-gcs_registry-gce-cloudprovider

registry.ops.openshift.com is already in openshift_docker_additional_registries and
openshift_docker_insecure_registries.

After OCP was built successfully, I checked that "--add-registry registry.ops.openshift.com" was in ADD_REGISTRY in /etc/sysconfig/docker, but when I deployed metrics via ansible, it was removed from ADD_REGISTRY. I think it's related to the metrics playbooks.

I checked the logs in the attached file "ansible running log" and found the following. Do you think
("line": "ADD_REGISTRY='--add-registry registry.access.redhat.com') is the cause?
    "invocation": {
        "module_args": {
            "attributes": null, 
            "backrefs": false, 
            "backup": false, 
            "content": null, 
            "create": false, 
            "delimiter": null, 
            "dest": "/etc/sysconfig/docker", 
            "directory_mode": null, 
            "follow": false, 
            "force": null, 
            "group": null, 
            "insertafter": null, 
            "insertbefore": null, 
            "line": "ADD_REGISTRY='--add-registry registry.access.redhat.com'", 
            "mode": null, 
            "owner": null, 
            "path": "/etc/sysconfig/docker", 
            "regexp": "^ADD_REGISTRY=.*$", 
            "remote_src": null, 
            "selevel": null, 
            "serole": null, 
            "setype": null, 
            "seuser": null, 
            "src": null, 
            "state": "present", 
            "unsafe_writes": null, 
            "validate": null
        }

Comment 7 Junqi Zhao 2017-09-14 04:16:01 UTC
Same error with logging; journal logs attached.

Comment 8 Junqi Zhao 2017-09-14 04:17:17 UTC
Created attachment 1325703 [details]
journal log

Comment 9 Michael Gugino 2017-09-14 14:07:19 UTC
I can confirm that openshift_docker_additional_registries works as before.

openshift_docker_additional_registries must be specified each time you run plays. It appears that the lack of this variable in your inventory caused the registry to be removed, which is the current expected behavior.
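
To illustrate the behavior described above, here is a minimal Python sketch of what a whole-line replacement does to /etc/sysconfig/docker. It is not the installer's code. The regexp and the default registry are copied from the module_args in comment 5; the helper names render_add_registry and apply_line are hypothetical, and the position of the default registry relative to user-supplied ones is an assumption. It also assumes the file has a single ADD_REGISTRY line.

```python
import re

# Regexp and default registry as they appear in the module_args logged in comment 5.
ADD_REGISTRY_RE = re.compile(r"^ADD_REGISTRY=.*$", re.MULTILINE)
DEFAULT_REGISTRY = "registry.access.redhat.com"


def render_add_registry(additional_registries):
    """Hypothetical helper: build the ADD_REGISTRY line from inventory values."""
    # Assumption: the default registry is listed after the user-supplied ones.
    registries = list(additional_registries) + [DEFAULT_REGISTRY]
    flags = " ".join(f"--add-registry {name}" for name in registries)
    return f"ADD_REGISTRY='{flags}'"


def apply_line(config_text, additional_registries):
    """Replace the matching line wholesale; append the line if nothing matches."""
    new_line = render_add_registry(additional_registries)
    if ADD_REGISTRY_RE.search(config_text):
        # The old value is discarded entirely, not merged with the new one.
        return ADD_REGISTRY_RE.sub(lambda _match: new_line, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + new_line + "\n"
```

Because the old line is discarded rather than merged, calling apply_line with an empty list (the variable missing from the inventory) leaves only the default registry, which matches the "line" value seen in the log.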

Comment 10 Junqi Zhao 2017-09-15 03:20:35 UTC
From OCP 3.7 on, openshift_docker_additional_registries must be set in the inventory file; otherwise the installation fails with the error reported in this defect.

After adding this parameter to the inventory, the logging and metrics installations were successful.

Closing as NOTABUG.
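
Since the variable has to be present on every playbook run, a pre-flight check on the inventory may help catch this before docker is reconfigured. The following is a rough Python sketch, not part of openshift-ansible: the function name missing_registry_var is hypothetical, and it assumes the inventory is simple enough for configparser to read, which is not true of every Ansible INI inventory.

```python
import configparser

REQUIRED_VAR = "openshift_docker_additional_registries"


def missing_registry_var(inventory_text, section="OSEv3:vars"):
    """Return True when the inventory does not set REQUIRED_VAR in the section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # group members such as "masters" have no "=value"
        interpolation=None,   # leave "${...}" placeholders and "%" untouched
        delimiters=("=",),    # inventory variables are written as name=value
        strict=False,         # tolerate repeated host lines
    )
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_string(inventory_text)
    if not parser.has_section(section):
        return True
    # An absent or empty value both count as "missing".
    return not parser.get(section, REQUIRED_VAR, fallback="")
```

Running this against the inventory from the description would return True, because only the metrics variables are set there.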

