Created attachment 1325275: ansible running log

Description of problem:
On GCE IaaS, deploying metrics 3.7 via ansible failed at the following step:

STDERR: The connection to the server ${MASTER_URL}:8443 was refused - did you specify the right host or port?
to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.retry

I checked, and the root cause is that while deploying metrics, "--add-registry registry.ops.openshift.com" was removed from the ADD_REGISTRY setting in /etc/sysconfig/docker. When docker was restarted during the installation, atomic-openshift-master-api.service failed to start, which in turn caused the metrics installation to fail. The ansible log is attached.

Version-Release number of selected component (if applicable):
docker version: docker-1.12.6-55.gitc4618fb.el7.x86_64

# rpm -qa | grep openshift-ansible
openshift-ansible-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-roles-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-docs-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-filter-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-lookup-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-playbooks-3.7.0-0.126.0.git.0.33d254a.el7.noarch
openshift-ansible-callback-plugins-3.7.0-0.126.0.git.0.33d254a.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.7 via ansible on GCE IaaS; for the inventory file, see the [Additional info] part.

Actual results:
Failed to install metrics 3.7.

Expected results:
Metrics 3.7 should install successfully.
Additional info:
# Inventory file
[OSEv3:children]
masters
etcd

[masters]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[etcd]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise

# Metrics
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.${SUB_DOMAIN}
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=v3.7
Created attachment 1325276: /etc/sysconfig/docker file before and after metrics was deployed
Blocks all metrics installation.
The error message has nothing to do with the metrics components; it points to an issue with the registry installation. I am reassigning this to the installer component.
Junqi, based on the description I believe you set them in /etc/sysconfig/docker manually?

In general, the installer manages the lists of additional, insecure, and blocked registries, so you'll want to set variables like the following in your inventory:

openshift_docker_additional_registries=registry.ops.openshift.com
openshift_docker_insecure_registries=registry.ops.openshift.com

I recognize that it may seem odd for the docker configuration to be applied while calling the metrics installation playbooks, but the role dependencies today ensure that docker is configured properly no matter which playbook you call, so the proper inventory variables need to be set in all cases. Does that help?
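To illustrate why the inventory variables matter on every run, here is a minimal Python sketch, assuming the installer rebuilds the ADD_REGISTRY line from scratch out of openshift_docker_additional_registries plus a default registry. The function name `render_add_registry`, the default list, and the ordering are hypothetical and not openshift-ansible's actual implementation; only the output format mirrors the `line` value recorded in the ansible log.

```python
# Hypothetical sketch: an installer-managed ADD_REGISTRY line rebuilt from
# inventory variables on every run. The default registry list and the
# ordering are assumptions, not openshift-ansible's actual code.
from typing import List, Optional

DEFAULT_REGISTRIES = ["registry.access.redhat.com"]  # assumed enterprise default


def render_add_registry(additional: Optional[List[str]]) -> str:
    """Build the ADD_REGISTRY line for /etc/sysconfig/docker."""
    registries: List[str] = []
    # Inventory-supplied registries first, then the default, without duplicates.
    for reg in (additional or []) + DEFAULT_REGISTRIES:
        if reg not in registries:
            registries.append(reg)
    flags = " ".join(f"--add-registry {reg}" for reg in registries)
    return f"ADD_REGISTRY='{flags}'"


if __name__ == "__main__":
    # Variable set in the inventory: the custom registry is kept.
    print(render_add_registry(["registry.ops.openshift.com"]))
    # Variable missing from the inventory: only the default survives.
    print(render_add_registry(None))
```

Under these assumptions, leaving the variable unset renders exactly `ADD_REGISTRY='--add-registry registry.access.redhat.com'`, so any registry added by an earlier run is silently dropped.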
(In reply to Scott Dodson from comment #4)
> Junqi,
>
> I believe based on the description you set them in /etc/sysconfig/docker
> manually?

No, I did not set it manually.

> In general the installer manages the list of additional, insecure, and
> blocked registries so you'll want to set variables like the following in
> your inventory.
>
> openshift_docker_additional_registries=registry.ops.openshift.com
> openshift_docker_insecure_registries=registry.ops.openshift.com

I used a template to build my jobs:
http://git.app.eng.bos.redhat.com/git/openshift-misc.git/plain/v3-launch-templates/functionality-testing/aos-37/vars-gce/vars.ose37-container-ah7-gcs_registry-gce-cloudprovider

registry.ops.openshift.com is already in openshift_docker_additional_registries and openshift_docker_insecure_registries. After OCP was built successfully, I confirmed that "--add-registry registry.ops.openshift.com" was in ADD_REGISTRY in /etc/sysconfig/docker, but when I deployed metrics via ansible, it was removed from ADD_REGISTRY, so I think it's related to the metrics playbooks.

I checked the attached "ansible running log" and found the following. Do you think "line": "ADD_REGISTRY='--add-registry registry.access.redhat.com'" is the cause?

"invocation": {
    "module_args": {
        "attributes": null,
        "backrefs": false,
        "backup": false,
        "content": null,
        "create": false,
        "delimiter": null,
        "dest": "/etc/sysconfig/docker",
        "directory_mode": null,
        "follow": false,
        "force": null,
        "group": null,
        "insertafter": null,
        "insertbefore": null,
        "line": "ADD_REGISTRY='--add-registry registry.access.redhat.com'",
        "mode": null,
        "owner": null,
        "path": "/etc/sysconfig/docker",
        "regexp": "^ADD_REGISTRY=.*$",
        "remote_src": null,
        "selevel": null,
        "serole": null,
        "setype": null,
        "seuser": null,
        "src": null,
        "state": "present",
        "unsafe_writes": null,
        "validate": null
    }
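The `regexp`/`line` pair in that task explains the removal: lineinfile overwrites the whole matching line instead of merging flags into it. The following Python sketch approximates lineinfile's `state=present` behavior (replace the last line matching `regexp`, or append `line` if nothing matches) and deliberately ignores `backrefs`, `insertafter`, and `insertbefore`. The helper name `lineinfile_present` and the sample file contents are hypothetical; the real before/after file is in attachment 1325276.

```python
# Simplified approximation of Ansible lineinfile with regexp and state=present.
import re
from typing import List


def lineinfile_present(lines: List[str], regexp: str, line: str) -> List[str]:
    """Replace the last line matching `regexp` with `line`; append if none match."""
    pattern = re.compile(regexp)
    result = list(lines)
    matches = [i for i, current in enumerate(result) if pattern.search(current)]
    if matches:
        # The entire line is overwritten; existing --add-registry flags are lost.
        result[matches[-1]] = line
    else:
        result.append(line)
    return result


# Hypothetical excerpt of /etc/sysconfig/docker after the initial OCP install.
BEFORE = [
    "# /etc/sysconfig/docker (hypothetical excerpt)",
    "ADD_REGISTRY='--add-registry registry.ops.openshift.com "
    "--add-registry registry.access.redhat.com'",
]

# regexp and line values taken from the module_args in the ansible log.
AFTER = lineinfile_present(
    BEFORE,
    r"^ADD_REGISTRY=.*$",
    "ADD_REGISTRY='--add-registry registry.access.redhat.com'",
)

if __name__ == "__main__":
    print("\n".join(AFTER))
```

Because the regexp `^ADD_REGISTRY=.*$` matches the entire existing line, whatever `line` the task computes fully replaces it; registry.ops.openshift.com survives only if it is part of that computed value.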
Same error with logging; journal logs attached.
Created attachment 1325703: journal log
I can confirm that openshift_docker_additional_registries works as it did previously: it must be specified every time you run the plays. The absence of this variable from your inventory appears to have caused the registry to be removed, which is the expected current behavior.
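Since the variable must be present on every run, a pre-flight check of the inventory could catch the omission before any playbook rewrites /etc/sysconfig/docker. This is a hypothetical Python sketch, not part of openshift-ansible: `missing_registries` parses an INI inventory with configparser and reports registries referenced by `openshift_metrics_image_prefix` or `openshift_logging_image_prefix` that are absent from `openshift_docker_additional_registries`. The comma-separated value format and the choice of prefix variables to inspect are assumptions.

```python
# Hypothetical pre-flight check for an INI-style openshift-ansible inventory.
import configparser
from typing import List

# Image-prefix variables whose registry host should be allow-listed (assumed set).
PREFIX_VARS = ("openshift_metrics_image_prefix", "openshift_logging_image_prefix")


def missing_registries(inventory_text: str) -> List[str]:
    """Return registry hosts used by image prefixes but not listed in
    openshift_docker_additional_registries."""
    parser = configparser.ConfigParser(
        allow_no_value=True, delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # keep variable names exactly as written
    parser.read_string(inventory_text)
    if not parser.has_section("OSEv3:vars"):
        return []
    section = parser["OSEv3:vars"]
    raw = section.get("openshift_docker_additional_registries") or ""
    # Assumes a comma-separated list; strips stray quotes/brackets.
    listed = {r.strip(" '\"[]") for r in raw.split(",") if r.strip(" '\"[]")}
    missing: List[str] = []
    for var in PREFIX_VARS:
        prefix = section.get(var)
        if prefix:
            host = prefix.strip(" '\"").split("/")[0]
            if host not in listed and host not in missing:
                missing.append(host)
    return missing


# Trimmed copy of the reporter's inventory from this bug.
INVENTORY = """\
[OSEv3:children]
masters
etcd

[masters]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[OSEv3:vars]
deployment_type=openshift-enterprise
openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
"""

if __name__ == "__main__":
    print(missing_registries(INVENTORY))
    fixed = INVENTORY + "openshift_docker_additional_registries=registry.ops.openshift.com\n"
    print(missing_registries(fixed))
```

Run against the trimmed inventory, the check flags registry.ops.openshift.com; after appending the openshift_docker_additional_registries line to [OSEv3:vars], it reports nothing.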
Starting with OCP 3.7, openshift_docker_additional_registries should be added to the inventory file; otherwise the installation fails with the error reported in this bug. After adding this parameter to the inventory, the logging and metrics installations succeeded. Closing as NOTABUG.