Bug 1442346 - [3.1] registry/router certificates are not generated in containerized install, causing the installer to exit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.1.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.1.1
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-14 08:21 UTC by Johnny Liu
Modified: 2017-04-19 19:44 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, 3.1 containerized installs may have incorrectly used the latest image when creating the initial certificates. Because of this, and a change in the 3.5 image, a 3.1 installation could fail: the 3.5 image no longer creates the openshift-registry.kubeconfig and openshift-router.kubeconfig files that a 3.1 install requires. The 3.1 installer has been updated to ensure that a 3.1 image is used when creating the initial certificates, so all the necessary files are created.
Clone Of:
Environment:
Last Closed: 2017-04-19 19:44:21 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0989 0 normal SHIPPED_LIVE OpenShift Container Platform 3.4, 3.3, 3.2, and 3.1 bug fix update 2017-04-19 23:42:19 UTC

Description Johnny Liu 2017-04-14 08:21:39 UTC
Description of problem:
See the details below.

Version-Release number of selected component (if applicable):
openshift3/ose:v3.1.1.11 (374c01d0b0c4)
openshift-ansible-playbooks-3.0.98-1.git.0.248113a.el7aos.noarch

How reproducible:
Always

Steps to Reproduce:
1. prepare inventory host file to trigger a containerized install
2. trigger installation

Actual results:
The installer exits at the following step:
TASK: [openshift_registry | Deploy OpenShift Registry] ************************ 
failed: [xxxx.redhat.com] => {"changed": true, "cmd": ["/usr/local/bin/oadm", "registry", "--create", "--replicas=1", "--service-account=registry", "--selector=region=infra", "--credentials=/etc/origin/master/openshift-registry.kubeconfig", "--images=registry.access.stage.redhat.com/openshift3/ose-${component}:${version}"], "delta": "0:00:00.554250", "end": "2017-04-14 03:26:22.395568", "rc": 1, "start": "2017-04-14 03:26:21.841318", "stdout_lines": [], "warnings": []}
stderr: 
================================================================================
ATTENTION: You are running oadm via a wrapper around 'docker run openshift3/ose:v3.1.1.11'.
This wrapper is intended only to be used to bootstrap an environment. Please
install client tools on another host once you have granted cluster-admin
privileges to a user. 
See https://docs.openshift.com/enterprise/latest/cli_reference/get_started_cli.html
=================================================================================

error: registry does not exist; the provided credentials "/etc/origin/master/openshift-registry.kubeconfig" could not be loaded: stat /etc/origin/master/openshift-registry.kubeconfig: no such file or directory

FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/home/slave3/config.retry


Checking /etc/origin/master confirms that openshift-registry.kubeconfig indeed does not exist.
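The missing-file check can be sketched as a small shell loop. The kubeconfig names and the real path /etc/origin/master come from this report; a throwaway directory is used below so the sketch can run anywhere:

```shell
# Sketch: list which of the kubeconfigs a 3.1 install expects are missing
# from a certificate directory. On a real master this would be
# /etc/origin/master; a temporary directory is used here for illustration.
certdir=$(mktemp -d)
touch "$certdir/admin.kubeconfig"   # simulate a file the image did create
missing=""
for f in openshift-registry.kubeconfig openshift-router.kubeconfig; do
  [ -e "$certdir/$f" ] || missing="$missing $f"
done
echo "missing:$missing"
```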

Tracing through the whole installation, the registry kubeconfig file should be generated by the "Create the master certificates if they do not already exist" task in the openshift_master_ca role.

For debugging, I added one more task to openshift-ansible to list the files generated by that step:
TASK: [openshift_master_ca | Create the master certificates if they do not already exist] *** 
changed: [xxxx.rdu2.redhat.com] => {"changed": true, "cmd": ["/usr/local/bin/oadm", "create-master-certs", "--hostnames=10.8.175.224,172.30.0.1,172.16.120.205,kubernetes.default.svc.cluster.local,kubernetes,openshift.default,openshift.default.svc,host-8-175-224.host.centralci.eng.rdu2.redhat.com,kubernetes.default,openshift.default.svc.cluster.local,kubernetes.default.svc,openshift", "--master=https://host-8-175-224.host.centralci.eng.rdu2.redhat.com:8443", "--public-master=https://host-8-175-224.host.centralci.eng.rdu2.redhat.com:8443", "--cert-dir=/etc/origin/master", "--overwrite=false"], "delta": "0:00:02.287135", "end": "2017-04-14 03:21:41.481822", "rc": 0, "start": "2017-04-14 03:21:39.194687", "stderr": "\n================================================================================\nATTENTION: You are running oadm via a wrapper around 'docker run openshift3/ose:'.\nThis wrapper is intended only to be used to bootstrap an environment. Please\ninstall client tools on another host once you have granted cluster-admin\nprivileges to a user. \nSee https://docs.openshift.com/enterprise/latest/cli_reference/get_started_cli.html\n=================================================================================\n\nCommand \"create-master-certs\" is deprecated, Use 'oadm ca' instead.", "stdout": "Generated new key pair as /etc/origin/master/serviceaccounts.public.key and /etc/origin/master/serviceaccounts.private.key", "warnings": []}

TASK: [openshift_master_ca | jialiu checking] ********************************* 
changed: [xxxx.rdu2.redhat.com] => {"changed": true, "cmd": "ls \"/etc/origin/master\"", "delta": "0:00:00.002712", "end": "2017-04-14 03:21:41.899971", "rc": 0, "start": "2017-04-14 03:21:41.897259", "stderr": "", "stdout": "admin.crt\nadmin.key\nadmin.kubeconfig\nca-bundle.crt\nca.crt\nca.key\nca.serial.txt\netcd.server.crt\netcd.server.key\nmaster.etcd-client.crt\nmaster.etcd-client.key\nmaster.kubelet-client.crt\nmaster.kubelet-client.key\nmaster.proxy-client.crt\nmaster.proxy-client.key\nmaster.server.crt\nmaster.server.key\nopenshift-master.crt\nopenshift-master.key\nopenshift-master.kubeconfig\nservice-signer.crt\nservice-signer.key\nserviceaccounts.private.key\nserviceaccounts.public.key", "warnings": []}


The output proves that neither the registry nor the router certificates and kubeconfig files are generated.

The odd thing is that manually running the same "/usr/local/bin/oadm create-master-certs" command the playbook ran generates the registry and router certificates and kubeconfig files successfully on the failed master.


Expected results:
Installation completes successfully.

Additional info:
1. An RPM install using the same version of openshift-ansible does not hit this issue.
2. This is blocking QE's testing.
3. Compared with 3.2, which copies the CLI binary from the image to the host, 3.1 uses a CLI wrapper to run oc/oadm commands. Backporting that 3.2 change might fix this issue, though this is just my guess.

Comment 1 Scott Dodson 2017-04-18 18:50:15 UTC
I backported those changes, but in my testing I'm not getting /etc/origin/master/openshift-{router,registry}.kubeconfig created during installation.

Comment 2 Scott Dodson 2017-04-19 00:03:02 UTC
https://github.com/openshift/openshift-ansible/pull/3946 backports those changes and sets the default image to v3.1.

Comment 3 Scott Dodson 2017-04-19 00:05:35 UTC
The problem was that it was pulling the latest image, which was a 3.5 image, and using that to generate certificates. That version of OpenShift no longer generates the openshift-{registry,router}.kubeconfig files. Later in the install, after the version had been detected, it would update the image to the correct version.

So, the fix is to extract the CLI and default the image version to 'v3.1'. This will break Origin installs, but Origin 1.1 is unsupported at this point.

Comment 5 Johnny Liu 2017-04-19 08:22:55 UTC
Verified this bug with openshift-ansible-3.0.100-1.git.0.95611a0.el7aos.noarch, and it passed.

Note that the user has to set "openshift_image_tag=v3.1.1.11" for the whole installation to complete.
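For reference, the workaround of setting openshift_image_tag amounts to pinning the tag in the inventory. A minimal sketch; only the openshift_image_tag value and containerized flag come from this report, the group layout is the standard one:

```ini
# Hypothetical minimal inventory snippet: pin the image tag so the 3.1
# installer never falls back to :latest (a 3.5 image).
[OSEv3:vars]
containerized=true
openshift_image_tag=v3.1.1.11
```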

If the user does not set openshift_image_tag and the latest image is a 3.5 image, another step hits an error, even though a default v3.1 image is pulled at the "Pull CLI Image" step.

<--snip-->
TASK: [openshift_cli | Pull CLI Image] **************************************** 
changed: [openshift-104.lab.sjc.redhat.com] => {"changed": true, "cmd": ["docker", "pull", "openshift3/ose:v3.1"], "delta": "0:00:18.234724", "end": "2017-04-19 03:11:05.118800", "rc": 0, "start": "2017-04-19 03:10:46.884076", "stderr": "", "stdout": "Trying to pull repository registry.access.stage.redhat.com/openshift3/ose ... v3.1: Pulling from openshift3/ose\n1154060a6b29: Pulling fs layer\n374c01d0b0c4: Pulling fs layer\nef612c978d09: Already exists\nd63a0d7e67ae: Already exists\n1154060a6b29: Verifying Checksum\n1154060a6b29: Download complete\n374c01d0b0c4: Verifying Checksum\n374c01d0b0c4: Download complete\n1154060a6b29: Pull complete\n374c01d0b0c4: Pull complete\nDigest: sha256:fff2d66555519a57169d3edc9895fc23c263512fdf868d1ec88a2fe677fc1c54\nStatus: Downloaded newer image for registry.access.stage.redhat.com/openshift3/ose:v3.1", "stdout_lines": ["Trying to pull repository registry.access.stage.redhat.com/openshift3/ose ... v3.1: Pulling from openshift3/ose", "1154060a6b29: Pulling fs layer", "374c01d0b0c4: Pulling fs layer", "ef612c978d09: Already exists", "d63a0d7e67ae: Already exists", "1154060a6b29: Verifying Checksum", "1154060a6b29: Download complete", "374c01d0b0c4: Verifying Checksum", "374c01d0b0c4: Download complete", "1154060a6b29: Pull complete", "374c01d0b0c4: Pull complete", "Digest: sha256:fff2d66555519a57169d3edc9895fc23c263512fdf868d1ec88a2fe677fc1c54", "Status: Downloaded newer image for registry.access.stage.redhat.com/openshift3/ose:v3.1"], "warnings": []}
<--snip-->
TASK: [openshift_docker | set_fact ] ****************************************** 
ok: [openshift-104.lab.sjc.redhat.com] => {"ansible_facts": {"has_image_tag_fact": "False", "is_containerized": "True"}}

TASK: [openshift_docker | Set version when containerized] ********************* 
changed: [openshift-104.lab.sjc.redhat.com] => {"changed": true, "cmd": ["docker", "run", "--rm", "openshift3/ose", "version"], "delta": "0:00:54.991404", "end": "2017-04-19 03:10:29.104796", "rc": 0, "start": "2017-04-19 03:09:34.113392", "stderr": "Unable to find image 'openshift3/ose:latest' locally\nTrying to pull repository registry.access.stage.redhat.com/openshift3/ose ... latest: Pulling from openshift3/ose\nef612c978d09: Pulling fs layer\nd63a0d7e67ae: Pulling fs layer\n5385c04c2690: Pulling fs layer\n78f0c10e3a6d: Pulling fs layer\nd63a0d7e67ae: Verifying Checksum\nd63a0d7e67ae: Download complete\nef612c978d09: Verifying Checksum\nef612c978d09: Download complete\n78f0c10e3a6d: Verifying Checksum\n78f0c10e3a6d: Download complete\nef612c978d09: Pull complete\nd63a0d7e67ae: Pull complete\n5385c04c2690: Verifying Checksum\n5385c04c2690: Download complete\n5385c04c2690: Pull complete\n78f0c10e3a6d: Pull complete\nDigest: sha256:d2141116caeb290d3b130cadd8b937a41356307054d697cc51836a2dfde70652\nStatus: Downloaded newer image for registry.access.stage.redhat.com/openshift3/ose:latest", "stdout": "openshift v3.5.5.5\nkubernetes v1.5.2+43a9be4\netcd 3.1.0", "warnings": []}

TASK: [openshift_docker | set_fact ] ****************************************** 
skipping: [openshift-104.lab.sjc.redhat.com]

TASK: [openshift_docker | set_fact ] ****************************************** 

ok: [openshift-104.lab.sjc.redhat.com] => {"ansible_facts": {"l_image_tag": "v3.5.5.5"}}

TASK: [openshift_docker | set_fact ] ****************************************** 
skipping: [openshift-104.lab.sjc.redhat.com]
<--snip-->
<--snip-->
TASK: [openshift_docker_facts | set_fact ] ************************************ 
ok: [openshift-104.lab.sjc.redhat.com] => {"ansible_facts": {"l_common_version": "3.5.5.5"}}

TASK: [openshift_docker_facts | set_fact ] ************************************ 
skipping: [openshift-104.lab.sjc.redhat.com]

TASK: [openshift_docker_facts | Set docker version to be installed] *********** 
skipping: [openshift-104.lab.sjc.redhat.com]

TASK: [openshift_docker_facts | Set docker version to be installed] *********** 
ok: [openshift-104.lab.sjc.redhat.com] => {"ansible_facts": {"docker_version": "1.9.1"}}
<--snip-->
<--snip-->
TASK: [docker | Install docker] *********************************************** 
failed: [openshift-104.lab.sjc.redhat.com] => {"changed": false, "failed": true, "rc": 0, "results": []}
msg: No Package matching 'docker-1.9.1' found available, installed or updated

FATAL: all hosts have already failed -- aborting
<--snip-->

Note that before running the installation, docker-excluder was already installed to exclude docker-1.9.1, because 3.1 only supports docker-1.8.2.

That means the "Set version when containerized" step in roles/openshift_docker/tasks/main.yml also needs more polish.
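The failure chain in this comment can be sketched as follows. The version-to-docker mapping below is a simplified assumption for illustration, not the actual openshift-ansible logic; the version strings are the ones from the logs above:

```shell
# What comment 5 observed: the unpinned `docker run --rm openshift3/ose version`
# reported the :latest (3.5) version, and the installer derived its docker
# requirement from that instead of from the 3.1 release actually being installed.
detected="v3.5.5.5"    # version printed by the :latest image in this report
case "$detected" in
  v3.1.*) wanted_docker="1.8.2" ;;  # 3.1 only supports docker-1.8.2 (per this comment)
  *)      wanted_docker="1.9.1" ;;  # value picked up from the 3.5 image's mapping
esac
echo "docker_version=$wanted_docker"
# With docker-1.9.1 excluded by docker-excluder, the "Install docker" task then fails.
```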

Because QE can now finish a containerized install by setting openshift_image_tag, I am moving this bug to VERIFIED and will open a new bug to track the installer issue when openshift_image_tag is not set.

Comment 6 Johnny Liu 2017-04-19 08:50:14 UTC
New bug - https://bugzilla.redhat.com/show_bug.cgi?id=1443423 to track the issues described in comment 5 when openshift_image_tag is not set.

Comment 8 errata-xmlrpc 2017-04-19 19:44:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0989

