Description of problem:
https://access.redhat.com/documentation/en/openshift-enterprise/version-3.2/installation-and-configuration/#multiple-masters

Using the Ansible hosts inventory file noted in section 2.4 incorrectly puts the etcdctl certs on the master nodes instead of the etcd nodes.

Also, the verification steps in section 2.5.7 tell you to run etcdctl from the master. That doesn't make sense, because etcd is installed on the defined etcd nodes, not on the masters.

Version-Release number of selected component (if applicable):

How reproducible:
Create separate systems similar to the following host inventory file:

[root@m01-useast1a-c001 ~]# cat /etc/ansible/hosts
# Create an OSEv3 group that contains the master, nodes, etcd, and lb groups.
# The lb group lets Ansible configure HAProxy as the load balancing solution.
# Comment lb out if your load balancer is pre-configured.
[OSEv3:children]
masters
nodes
etcd
lb

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_ssh_user=root
deployment_type=openshift-enterprise

# Uncomment the following to enable htpasswd authentication; defaults to
# DenyAllPasswordIdentityProvider.
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]

# Native high availability cluster method with optional load balancer.
# If no lb group is defined installer assumes that a load balancer has
# been preconfigured. For installation the value of
# openshift_master_cluster_hostname must resolve to the load balancer
# or to one or all of the masters defined in the inventory if no load
# balancer is present.
openshift_master_cluster_method=native
openshift_master_cluster_hostname=c001-useast1a.ose.sullyvon.com
openshift_master_cluster_public_hostname=c001-useast1a.ose.sullyvon.com

# override the default controller lease ttl
#osm_controller_lease_ttl=30

# host group for masters
[masters]
m01-useast1a-c001.ose.sullyvon.com
m02-useast1a-c001.ose.sullyvon.com
m03-useast1a-c001.ose.sullyvon.com

# host group for etcd
[etcd]
e01-useast1a-c001.ose.sullyvon.com
e02-useast1a-c001.ose.sullyvon.com
e03-useast1a-c001.ose.sullyvon.com

# Specify load balancer host
[lb]
lb01-useast1a-c001.ose.sullyvon.com

# host group for nodes, includes region info
[nodes]
m01-useast1a-c001.ose.sullyvon.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=false
m02-useast1a-c001.ose.sullyvon.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=false
m03-useast1a-c001.ose.sullyvon.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=false
n01-useast1a-c001.ose.sullyvon.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
n02-useast1a-c001.ose.sullyvon.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}"

Then do the Advanced Ansible Installation (see attached install log).

Steps to Reproduce:
1.
2.
3.

Actual results:
[root@m01-useast1a-c001 ~]# which etcdctl
/usr/bin/which: no etcdctl in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
[root@m01-useast1a-c001 ~]# ls /etc/origin/master/ | grep etcd
etcd.server.crt
etcd.server.key
master.etcd-ca.crt
master.etcd-client.crt
master.etcd-client.csr
master.etcd-client.key

There is no /etc/origin directory on the etcd hosts:
[root@e02-useast1a-c001 ec2-user]# cd /etc/origin
bash: cd: /etc/origin: No such file or directory

Expected results:
If etcd runs on separate systems, i.e. etcd is not installed on the same systems as the masters, then the certs need to be placed on the etcd nodes accordingly. Also, is /etc/origin really the right location for them?

Additional info:
One can rsync the certs from the master nodes to the etcd nodes and run etcdctl just fine from the etcd nodes:

[root@e01-useast1a-c001 ~]# hostname
e01-useast1a-c001.ose.sullyvon.com
[root@e01-useast1a-c001 ~]# cat etcd_check.sh
#!/bin/bash
etcdctl -C \
  https://10.0.2.9:2379 \
  --ca-file=/etc/origin/master/master.etcd-ca.crt \
  --cert-file=/etc/origin/master/master.etcd-client.crt \
  --key-file=/etc/origin/master/master.etcd-client.key cluster-health
etcdctl -C \
  https://10.0.2.9:2379 \
  --ca-file=/etc/origin/master/master.etcd-ca.crt \
  --cert-file=/etc/origin/master/master.etcd-client.crt \
  --key-file=/etc/origin/master/master.etcd-client.key member list
[root@e01-useast1a-c001 ~]# sh etcd_check.sh
member fd267e59c8a8dbb is healthy: got healthy result from https://10.0.2.32:2379
member 83604d68922c7ccd is healthy: got healthy result from https://10.0.2.9:2379
member b946b00eebc61494 is healthy: got healthy result from https://10.0.2.194:2379
cluster is healthy
fd267e59c8a8dbb: name=10.0.2.32 peerURLs=https://10.0.2.32:2380 clientURLs=https://10.0.2.32:2379 isLeader=true
83604d68922c7ccd: name=10.0.2.9 peerURLs=https://10.0.2.9:2380 clientURLs=https://10.0.2.9:2379 isLeader=false
b946b00eebc61494: name=10.0.2.194 peerURLs=https://10.0.2.194:2380 clientURLs=https://10.0.2.194:2379 isLeader=false
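The etcd_check.sh above hard-codes one endpoint and repeats the certificate flags for each command. A minimal sketch of a parameterized version follows, assuming etcdctl v2 flag syntax and the /etc/origin/master client cert paths shown above (either copied to an etcd host as described, or used in place on a master that has etcdctl installed). CERT_DIR, ETCDCTL_BIN and the function names are hypothetical; setting ETCDCTL_BIN=echo prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch: parameterized form of etcd_check.sh using etcdctl v2 flag syntax.
# Assumptions: client certs are the /etc/origin/master files from this report;
# CERT_DIR and ETCDCTL_BIN are hypothetical overrides (ETCDCTL_BIN=echo = dry run).

CERT_DIR="${CERT_DIR:-/etc/origin/master}"
ETCDCTL_BIN="${ETCDCTL_BIN:-etcdctl}"

# Join all arguments into one comma-delimited string for --endpoints.
join_endpoints() {
  local IFS=,
  echo "$*"
}

# Run a single etcdctl v2 subcommand against the given endpoint list.
etcd_v2() {
  local endpoints=$1
  shift
  "$ETCDCTL_BIN" --endpoints "$endpoints" \
    --ca-file="$CERT_DIR/master.etcd-ca.crt" \
    --cert-file="$CERT_DIR/master.etcd-client.crt" \
    --key-file="$CERT_DIR/master.etcd-client.key" \
    "$@"
}

# Check cluster health and list members.
# With no arguments, uses the three client URLs seen in the output above.
etcd_check() {
  if [ "$#" -eq 0 ]; then
    set -- https://10.0.2.9:2379 https://10.0.2.32:2379 https://10.0.2.194:2379
  fi
  local endpoints
  endpoints=$(join_endpoints "$@")
  etcd_v2 "$endpoints" cluster-health
  etcd_v2 "$endpoints" member list
}
```

Source the file and run `etcd_check` with no arguments, or pass other `https://host:2379` client URLs. Listing every member's client URL instead of only 10.0.2.9 is a choice made in this sketch, not something the report requires.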
Created attachment 1196947: ansible install log
This will only be a problem when etcd hosts are not the same as the master hosts. A workaround for this is to install etcdctl on the master and use it there.
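A minimal pre-flight check for that workaround is sketched below. It assumes, as on RHEL 7, that `etcdctl` is shipped in the `etcd` package, and that the master's etcd client certificates are the `master.etcd-*` files under /etc/origin/master listed in the description. The function name `check_master_etcdctl_prereqs` is hypothetical.

```shell
#!/bin/bash
# Sketch: pre-flight check before running etcdctl on a master whose etcd
# cluster lives on separate hosts. Assumptions: etcdctl is provided by the
# "etcd" RPM (as on RHEL 7) and the client certs live in /etc/origin/master.

# Prints each problem found; returns non-zero if etcdctl or a cert file is missing.
check_master_etcdctl_prereqs() {
  local cert_dir=${1:-/etc/origin/master}
  local missing=0 f

  # etcdctl must be on PATH (it was not, in the "Actual results" above).
  if ! command -v etcdctl >/dev/null 2>&1; then
    echo "etcdctl not found; install it with: yum install -y etcd"
    missing=1
  fi

  # The CA, client cert and client key must all be readable.
  for f in master.etcd-ca.crt master.etcd-client.crt master.etcd-client.key; do
    if [ ! -r "$cert_dir/$f" ]; then
      echo "missing client cert file: $cert_dir/$f"
      missing=1
    fi
  done

  return "$missing"
}
```

Once it returns 0, the etcdctl commands from etcd_check.sh can be run from the master against the etcd hosts' client URLs, using the certs already present in /etc/origin/master.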
At a minimum, we need to document how to handle environments where etcd is not running on the masters.
Moving down to medium; see the workaround in comment 3.
Docs PR to mention ensuring that etcd is installed: https://github.com/openshift/openshift-docs/pull/4560

Also, since this bug was filed, two helper functions named `etcdctl2` and `etcdctl3` are now deployed on etcd hosts. They call `etcdctl` with the appropriate flags for the cert locations on the etcd hosts. However, I think it is better for the documentation to keep the steps executed on the master. Moving to the docs component.
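The comment does not show how `etcdctl2` and `etcdctl3` are defined. The sketch below is a hypothetical illustration of why two wrappers are needed: the etcd v3 API (selected with ETCDCTL_API=3) uses different TLS flag names (`--cacert`, `--cert`, `--key`) than the v2 API (`--ca-file`, `--cert-file`, `--key-file`). The cert paths under /etc/etcd (ca.crt, peer.crt, peer.key), the use of `hostname` for the endpoint, and the ETCD_CERT_DIR, ETCD_ENDPOINT and ETCDCTL_BIN overrides are all assumptions, not taken from the deployed helpers.

```shell
#!/bin/bash
# Hypothetical sketch of etcdctl2/etcdctl3 wrappers for an etcd host.
# Assumed: CA, peer cert and key under /etc/etcd; client URL on port 2379 at
# this host's name. ETCD_CERT_DIR, ETCD_ENDPOINT, ETCDCTL_BIN are hypothetical.

ETCD_CERT_DIR="${ETCD_CERT_DIR:-/etc/etcd}"
ETCDCTL_BIN="${ETCDCTL_BIN:-/usr/bin/etcdctl}"

# etcd v2 API: TLS flags are --ca-file / --cert-file / --key-file.
etcdctl2() {
  "$ETCDCTL_BIN" \
    --ca-file "$ETCD_CERT_DIR/ca.crt" \
    --cert-file "$ETCD_CERT_DIR/peer.crt" \
    --key-file "$ETCD_CERT_DIR/peer.key" \
    --endpoints "${ETCD_ENDPOINT:-https://$(hostname):2379}" \
    "$@"
}

# etcd v3 API: selected with ETCDCTL_API=3; TLS flags are --cacert / --cert / --key.
etcdctl3() {
  ETCDCTL_API=3 "$ETCDCTL_BIN" \
    --cacert "$ETCD_CERT_DIR/ca.crt" \
    --cert "$ETCD_CERT_DIR/peer.crt" \
    --key "$ETCD_CERT_DIR/peer.key" \
    --endpoints "${ETCD_ENDPOINT:-https://$(hostname):2379}" \
    "$@"
}
```

After sourcing the file on an etcd host, `etcdctl2 cluster-health` and `etcdctl3 endpoint health` would run the v2 and v3 health checks against the local member.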
Docs PR is merged
Content is now published: https://access.redhat.com/documentation/en-us/openshift_container_platform/3.5/html-single/installation_and_configuration/#verifying-multiple-etcd-hosts