Description of problem:
A containerized installation always fails at the "Deploy OpenShift Registry" step because /etc/origin/master/openshift-registry.kubeconfig is never generated. Details below.

Version-Release number of selected component (if applicable):
openshift3/ose:v3.1.1.11 (374c01d0b0c4)
openshift-ansible-playbooks-3.0.98-1.git.0.248113a.el7aos.noarch

How reproducible:
Always

Steps to Reproduce:
1. Prepare an inventory host file to trigger a containerized install
2. Trigger the installation

Actual results:
The installer exits at the following step:

TASK: [openshift_registry | Deploy OpenShift Registry] ************************
failed: [xxxx.redhat.com] => {"changed": true, "cmd": ["/usr/local/bin/oadm", "registry", "--create", "--replicas=1", "--service-account=registry", "--selector=region=infra", "--credentials=/etc/origin/master/openshift-registry.kubeconfig", "--images=registry.access.stage.redhat.com/openshift3/ose-${component}:${version}"], "delta": "0:00:00.554250", "end": "2017-04-14 03:26:22.395568", "rc": 1, "start": "2017-04-14 03:26:21.841318", "stdout_lines": [], "warnings": []}
stderr:
================================================================================
ATTENTION: You are running oadm via a wrapper around 'docker run openshift3/ose:v3.1.1.11'.
This wrapper is intended only to be used to bootstrap an environment. Please
install client tools on another host once you have granted cluster-admin
privileges to a user.
See https://docs.openshift.com/enterprise/latest/cli_reference/get_started_cli.html
=================================================================================
error: registry does not exist; the provided credentials "/etc/origin/master/openshift-registry.kubeconfig" could not be loaded: stat /etc/origin/master/openshift-registry.kubeconfig: no such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/home/slave3/config.retry

Checking /etc/origin/master confirms that openshift-registry.kubeconfig does indeed not exist there.
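The ATTENTION banner in the stderr above comes from the bootstrap wrapper that oadm resolves to on a containerized install. As a hedged illustration (the helper name and the "docker run" marker string are my own, not part of openshift-ansible), the wrapper can be told apart from a native client binary like this:

```shell
# Hypothetical helper: detect whether a CLI entry point is the containerized
# bootstrap wrapper (a shell script invoking "docker run", as described in the
# ATTENTION banner above) rather than a native client binary.
is_cli_wrapper() {
    # $1: path to the oadm/oc entry point
    grep -q 'docker run' "$1" 2>/dev/null
}

if is_cli_wrapper /usr/local/bin/oadm; then
    echo "oadm is the docker-run bootstrap wrapper"
else
    echo "oadm is a native binary (or not installed)"
fi
```

This matters for the bug: every oadm call in the playbook actually runs inside a container spawned from the image, so the image version the wrapper points at determines which files get generated on the host.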
Going through the whole installation, the registry kubeconfig file should be generated by the "Create the master certificates if they do not already exist" task in the openshift_master_ca role. I added one extra debug step to openshift-ansible to list the files generated by that task:

TASK: [openshift_master_ca | Create the master certificates if they do not already exist] ***
changed: [xxxx.rdu2.redhat.com] => {"changed": true, "cmd": ["/usr/local/bin/oadm", "create-master-certs", "--hostnames=10.8.175.224,172.30.0.1,172.16.120.205,kubernetes.default.svc.cluster.local,kubernetes,openshift.default,openshift.default.svc,host-8-175-224.host.centralci.eng.rdu2.redhat.com,kubernetes.default,openshift.default.svc.cluster.local,kubernetes.default.svc,openshift", "--master=https://host-8-175-224.host.centralci.eng.rdu2.redhat.com:8443", "--public-master=https://host-8-175-224.host.centralci.eng.rdu2.redhat.com:8443", "--cert-dir=/etc/origin/master", "--overwrite=false"], "delta": "0:00:02.287135", "end": "2017-04-14 03:21:41.481822", "rc": 0, "start": "2017-04-14 03:21:39.194687", "stderr": "\n================================================================================\nATTENTION: You are running oadm via a wrapper around 'docker run openshift3/ose:'.\nThis wrapper is intended only to be used to bootstrap an environment. Please\ninstall client tools on another host once you have granted cluster-admin\nprivileges to a user.\nSee https://docs.openshift.com/enterprise/latest/cli_reference/get_started_cli.html\n=================================================================================\n\nCommand \"create-master-certs\" is deprecated, Use 'oadm ca' instead.", "stdout": "Generated new key pair as /etc/origin/master/serviceaccounts.public.key and /etc/origin/master/serviceaccounts.private.key", "warnings": []}

TASK: [openshift_master_ca | jialiu checking] *********************************
changed: [xxxx.rdu2.redhat.com] => {"changed": true, "cmd": "ls \"/etc/origin/master\"", "delta": "0:00:00.002712", "end": "2017-04-14 03:21:41.899971", "rc": 0, "start": "2017-04-14 03:21:41.897259", "stderr": "", "stdout": "admin.crt\nadmin.key\nadmin.kubeconfig\nca-bundle.crt\nca.crt\nca.key\nca.serial.txt\netcd.server.crt\netcd.server.key\nmaster.etcd-client.crt\nmaster.etcd-client.key\nmaster.kubelet-client.crt\nmaster.kubelet-client.key\nmaster.proxy-client.crt\nmaster.proxy-client.key\nmaster.server.crt\nmaster.server.key\nopenshift-master.crt\nopenshift-master.key\nopenshift-master.kubeconfig\nservice-signer.crt\nservice-signer.key\nserviceaccounts.private.key\nserviceaccounts.public.key", "warnings": []}

The output proves that neither the registry nor the router certificates and kubeconfig files were generated. The weird thing is that manually running the same "/usr/local/bin/oadm create-master-certs" command the playbook ran generates the registry and router certificates and kubeconfig files successfully on the failed master.

Expected results:
The installation completes successfully.

Additional info:
1. An RPM install using the same version of openshift-ansible does not hit this issue.
2. This is blocking QE's testing.
3. Compared with 3.2, 3.1 uses a CLI wrapper to run oc/oadm commands. Backporting the 3.2 change that copies the CLI binary from the image to the host should have a chance to fix this issue; of course, this is just my guess.
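The debug listing above can be tightened into an explicit assertion. A minimal sketch (the helper name is mine; the two file names are the registry/router artifacts absent from the ls output):

```shell
# Hypothetical check: report which expected certificate artifacts are absent
# from a cert directory. On the failed master this would flag the registry and
# router kubeconfigs that "oadm create-master-certs" did not produce.
report_missing_certs() {
    # $1: cert dir; remaining args: files that must exist
    dir=$1; shift
    for f in "$@"; do
        [ -f "$dir/$f" ] || echo "missing: $f"
    done
}

report_missing_certs /etc/origin/master \
    openshift-registry.kubeconfig openshift-router.kubeconfig
```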
I backported those changes, but in my testing /etc/origin/master/openshift-{router,registry}.kubeconfig are still not created during installation.
https://github.com/openshift/openshift-ansible/pull/3946 backports those changes and sets the default image to v3.1.
The problem was that the installer pulled the latest image, which was a 3.5 image, and used it to generate certificates. That version of OpenShift no longer generates the openshift-{registry,router}.kubeconfig files. Later in the install, after the version had been detected, it would update the image to the correct version. The fix is to extract the CLI from the image and default the image version to 'v3.1'. This will break Origin installs, but Origin 1.1 is unsupported at this point.
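The defaulting half of the fix boils down to never letting the image tag fall through to :latest. A sketch of that behavior (the function name and shape are illustrative, not the actual role code in PR 3946):

```shell
# Illustrative sketch of the defaulting described above: when no explicit
# openshift_image_tag is supplied, fall back to "v3.1" instead of implicitly
# pulling :latest (which resolved to a 3.5 image in this bug).
cli_image_tag() {
    # $1: user-supplied openshift_image_tag, or "" when unset
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "v3.1"
    fi
}

cli_image_tag ""            # -> v3.1
cli_image_tag v3.1.1.11     # -> v3.1.1.11
```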
Verified this bug with openshift-ansible-3.0.100-1.git.0.95611a0.el7aos.noarch, and PASS. Note that the user has to set "openshift_image_tag=v3.1.1.11" to get the whole installation to complete. If the user does not set openshift_image_tag and the latest image is a 3.5 image, the install hits an error at another step, even though a default v3.1 image is pulled at the "Pull CLI Image" step.

<--snip-->
TASK: [openshift_cli | Pull CLI Image] ****************************************
changed: [openshift-104.lab.sjc.redhat.com] => {"changed": true, "cmd": ["docker", "pull", "openshift3/ose:v3.1"], "delta": "0:00:18.234724", "end": "2017-04-19 03:11:05.118800", "rc": 0, "start": "2017-04-19 03:10:46.884076", "stderr": "", "stdout": "Trying to pull repository registry.access.stage.redhat.com/openshift3/ose ... v3.1: Pulling from openshift3/ose\n1154060a6b29: Pulling fs layer\n374c01d0b0c4: Pulling fs layer\nef612c978d09: Already exists\nd63a0d7e67ae: Already exists\n1154060a6b29: Verifying Checksum\n1154060a6b29: Download complete\n374c01d0b0c4: Verifying Checksum\n374c01d0b0c4: Download complete\n1154060a6b29: Pull complete\n374c01d0b0c4: Pull complete\nDigest: sha256:fff2d66555519a57169d3edc9895fc23c263512fdf868d1ec88a2fe677fc1c54\nStatus: Downloaded newer image for registry.access.stage.redhat.com/openshift3/ose:v3.1", "stdout_lines": ["Trying to pull repository registry.access.stage.redhat.com/openshift3/ose ... v3.1: Pulling from openshift3/ose", "1154060a6b29: Pulling fs layer", "374c01d0b0c4: Pulling fs layer", "ef612c978d09: Already exists", "d63a0d7e67ae: Already exists", "1154060a6b29: Verifying Checksum", "1154060a6b29: Download complete", "374c01d0b0c4: Verifying Checksum", "374c01d0b0c4: Download complete", "1154060a6b29: Pull complete", "374c01d0b0c4: Pull complete", "Digest: sha256:fff2d66555519a57169d3edc9895fc23c263512fdf868d1ec88a2fe677fc1c54", "Status: Downloaded newer image for registry.access.stage.redhat.com/openshift3/ose:v3.1"], "warnings": []}
<--snip-->
TASK: [openshift_docker | set_fact ] ******************************************
ok: [openshift-104.lab.sjc.redhat.com] => {"ansible_facts": {"has_image_tag_fact": "False", "is_containerized": "True"}}
TASK: [openshift_docker | Set version when containerized] *********************
changed: [openshift-104.lab.sjc.redhat.com] => {"changed": true, "cmd": ["docker", "run", "--rm", "openshift3/ose", "version"], "delta": "0:00:54.991404", "end": "2017-04-19 03:10:29.104796", "rc": 0, "start": "2017-04-19 03:09:34.113392", "stderr": "Unable to find image 'openshift3/ose:latest' locally\nTrying to pull repository registry.access.stage.redhat.com/openshift3/ose ... latest: Pulling from openshift3/ose\nef612c978d09: Pulling fs layer\nd63a0d7e67ae: Pulling fs layer\n5385c04c2690: Pulling fs layer\n78f0c10e3a6d: Pulling fs layer\nd63a0d7e67ae: Verifying Checksum\nd63a0d7e67ae: Download complete\nef612c978d09: Verifying Checksum\nef612c978d09: Download complete\n78f0c10e3a6d: Verifying Checksum\n78f0c10e3a6d: Download complete\nef612c978d09: Pull complete\nd63a0d7e67ae: Pull complete\n5385c04c2690: Verifying Checksum\n5385c04c2690: Download complete\n5385c04c2690: Pull complete\n78f0c10e3a6d: Pull complete\nDigest: sha256:d2141116caeb290d3b130cadd8b937a41356307054d697cc51836a2dfde70652\nStatus: Downloaded newer image for registry.access.stage.redhat.com/openshift3/ose:latest", "stdout": "openshift v3.5.5.5\nkubernetes v1.5.2+43a9be4\netcd 3.1.0", "warnings": []}
TASK: [openshift_docker | set_fact ] ******************************************
skipping: [openshift-104.lab.sjc.redhat.com]
TASK: [openshift_docker | set_fact ] ******************************************
ok: [openshift-104.lab.sjc.redhat.com] => {"ansible_facts": {"l_image_tag": "v3.5.5.5"}}
TASK: [openshift_docker | set_fact ] ******************************************
skipping: [openshift-104.lab.sjc.redhat.com]
<--snip-->
<--snip-->
TASK: [openshift_docker_facts | set_fact ] ************************************
ok: [openshift-104.lab.sjc.redhat.com] => {"ansible_facts": {"l_common_version": "3.5.5.5"}}
TASK: [openshift_docker_facts | set_fact ] ************************************
skipping: [openshift-104.lab.sjc.redhat.com]
TASK: [openshift_docker_facts | Set docker version to be installed] ***********
skipping: [openshift-104.lab.sjc.redhat.com]
TASK: [openshift_docker_facts | Set docker version to be installed] ***********
ok: [openshift-104.lab.sjc.redhat.com] => {"ansible_facts": {"docker_version": "1.9.1"}}
<--snip-->
<--snip-->
TASK: [docker | Install docker] ***********************************************
failed: [openshift-104.lab.sjc.redhat.com] => {"changed": false, "failed": true, "rc": 0, "results": []}
msg: No Package matching 'docker-1.9.1' found available, installed or updated

FATAL: all hosts have already failed -- aborting
<--snip-->

Note that before running the installation, docker-excluder had already been installed to exclude docker-1.9.1, because 3.1 only supports docker-1.8.2. That means the "Set version when containerized" step in roles/openshift_docker/tasks/main.yml also needs more polish. Because QE can now finish a containerized install by setting openshift_image_tag, I am moving this bug to VERIFIED and will open a new bug to track the installer issue when openshift_image_tag is not set.
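The "Set version when containerized" step above derives its facts from the output of "docker run --rm openshift3/ose version", which is why an unpinned :latest image leaks a 3.5 version (and hence docker_version 1.9.1) into the facts. A sketch of that version extraction (my own awk one-liner, not the role's actual filter logic):

```shell
# Illustrative parser for the "version" output shown in the log above, e.g.:
#   openshift v3.5.5.5
#   kubernetes v1.5.2+43a9be4
#   etcd 3.1.0
# Extracts the bare openshift version; when the image was :latest this yields
# 3.5.5.5, which then drives the wrong docker_version of 1.9.1.
parse_openshift_version() {
    awk '/^openshift /{sub(/^v/, "", $2); print $2}'
}

printf 'openshift v3.5.5.5\nkubernetes v1.5.2+43a9be4\netcd 3.1.0\n' \
    | parse_openshift_version   # -> 3.5.5.5
```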
New bug - https://bugzilla.redhat.com/show_bug.cgi?id=1443423 - opened to track the issues described in comment 5 when openshift_image_tag is not set.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0989