Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1624031

Summary: Missing htpasswd file causes OCP 3.10 install to fail because master api/controller cannot start.
Product: OpenShift Container Platform
Reporter: Jeremy Whiting <jwhiting>
Component: Installer
Assignee: Scott Dodson <sdodson>
Status: CLOSED DUPLICATE
QA Contact: Johnny Liu <jialiu>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, jokerman, mmccomas, vrutkovs
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-31 08:47:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Execution of cluster installation Ansible playbook with vvv logging enabled. (flags: none)

Description Jeremy Whiting 2018-08-30 17:05:21 UTC
Created attachment 1479872 [details]
Execution of cluster installation Ansible playbook with vvv logging enabled.

Description of problem:

The cluster installation fails to complete because the master api image (registry.access.redhat.com/openshift3/ose-control-plane v3.10) cannot find the required file /etc/origin/openshift-passwd on the file system. With the file missing, the service does not start and the cluster installation process fails.

What has been observed is that the Ansible playbooks generate an htpasswd file on the host at /etc/origin/master/htpasswd. I suspect that file needs to be copied into the master api image at /etc/origin/openshift-passwd.
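For context, in OCP 3.x the htpasswd identity provider in /etc/origin/master/master-config.yaml references the password file by absolute path. A minimal sketch of the relevant stanza, reconstructed from the validation error later in this report and the htpasswd_auth provider in the inventory (the exact surrounding config on a real master will differ):

```yaml
oauthConfig:
  identityProviders:
  - name: htpasswd_auth
    challenge: true
    login: true
    provider:
      apiVersion: v1
      kind: HTPasswdPasswordIdentityProvider
      # The path the api container tries to stat on startup; it must
      # exist where the container reads it, not just on the host.
      file: /etc/origin/openshift-passwd
```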

Version-Release number of the following components:

Using openshift-ansible git branch release-3.10

# git describe
openshift-ansible-3.10.37-1

# ansible --version
ansible 2.4.4.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 31 2018, 09:41:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]


How reproducible:

Every time.

Steps to Reproduce:
1. Configure Ansible inventory with 1 master and 1 compute node
2. Run the prerequisites script
3. Run the install cluster script

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml
ansible-playbook /home/benchuser/jwhiting/openshift-ansible/playbooks/prerequisites.yml
rm -f nohup.out
nohup ansible-playbook /home/benchuser/jwhiting/openshift-ansible/playbooks/deploy_cluster.yml -vvv &

Expected results:

Cluster installation completes successfully.

Actual results:

The master api service cannot start without the htpasswd file, and the cluster installation fails. Dumping the log of the master api container shows the file is missing:

# docker logs -f --tail 10 k8s_api_master-api-server7_kube-system_5946c1f644096161a1242b3de0ee5875_3085
Invalid MasterConfig /etc/origin/master/master-config.yaml
  oauthConfig.identityProvider[0].provider.file: Invalid value: "/etc/origin/openshift-passwd": could not read file: stat /etc/origin/openshift-passwd: no such file or directory
#

The same error is found in the master controller log.

Additional info:

This is for a single master and single compute node installation on bare metal RHEL 7.5. This issue was initially raised on GitHub.

Platform version:

# uname -a
Linux benchserver7 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

Ansible inventory file

# cat /etc/ansible/hosts

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
containerized=false
openshift_deployment_type=openshift-enterprise
debug_level=0
openshift_node_groups=[{'name': 'node-config-master', 'labels': ['node-role.kubernetes.io/master=true']}, {'name': 'node-config-infra', 'labels': ['node-role.kubernetes.io/infra=true',]}, {'name': 'node-config-compute', 'labels': ['node-role.kubernetes.io/compute=true'], 'edits': [{ 'key': 'kubeletArguments.pods-per-core','value': ['20']}]}]
openshift_master_cluster_hostname=server7
ansible_ssh_user=root
openshift_enable_service_catalog=false
disk_availability=false
openshift_disable_check=memory_availability,disk_availability
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'admin': 'admin:$apr1$7BqGtoN6$4m5Si/xcm1UjRiO/lA6cL0'}

[masters]
server7.acme.com

[etcd]
server7.acme.com

[nodes]
server7.acme.com openshift_node_group_name='node-config-master'
server2.acme.com openshift_node_group_name='node-config-compute'
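The `openshift_master_htpasswd_users` value above maps a user to an Apache `apr1` password hash. If you need to generate such a hash yourself, a sketch (the password `admin` is an illustrative assumption, not taken from this report; the salt is copied from the hash in the inventory but any short string works):

```shell
# Generate an apr1 (Apache MD5) hash in the format expected by
# openshift_master_htpasswd_users. "admin" is the plaintext password
# being hashed; -salt is optional (openssl picks a random one if omitted).
openssl passwd -apr1 -salt 7BqGtoN6 admin
```

The `htpasswd -nb <user> <password>` command from the httpd-tools package produces an equivalent `user:$apr1$...` line.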


Comment 1 Vadim Rutkovsky 2018-08-30 17:27:42 UTC
It seems `/etc/origin/openshift-passwd` was specified previously, but either the facts still have it or the uninstall playbook didn't clean it up.

Jeremy, let's try this:
* Remove ~/ansible on the machine you're running playbooks from
* Remove /etc/ansible/facts.d on the hosts
* Stop the node - `systemctl stop atomic-openshift-node` - on the machines you're running the node on
* Remove /etc/origin/master on the hosts
* Start prerequisites and deploy playbooks again

Comment 2 Jeremy Whiting 2018-08-31 08:47:37 UTC

*** This bug has been marked as a duplicate of bug 1565447 ***

Comment 3 Jeremy Whiting 2018-08-31 12:57:30 UTC
In the end this turned out to be residual images from a 3.9 OCP install.

This sequence of steps solved the problem:

1. uninstall everything
2. atomic storage reset
3. prerequisites
4. cluster install

The cluster install completed successfully.