Bug 1814133

Summary: Failed to install OCP 4.3 latest on the Bare Metals: Cluster operator authentication Degraded is True with RouteHealthDegradedFailedGet: RouteHealthDegraded: failed to GET route: dial tcp <ip>:443: connect: connection refused
Product: OpenShift Container Platform Reporter: spandura
Component: Networking Assignee: Dan Mace <dmace>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aos-bugs, mmasters
Version: 4.3.0   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-08-04 18:05:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description spandura 2020-03-17 07:02:08 UTC
Description of problem:
==========================
Installation of the latest OCP 4.3 build failed on bare-metal hosts.


Error message for the OCP 4.3 install failure on bare metal
=======================================================
fatal: [10.8.32.24]: FAILED! => {
    "changed": false,
    "cmd": [
        "openshift-install",
        "wait-for",
        "install-complete"
    ],
    "delta": "0:30:00.122576",
    "end": "2020-03-13 10:17:29.372341",
    "invocation": {
        "module_args": {
            "_raw_params": "openshift-install wait-for install-complete",
            "_uses_shell": false,
            "argv": null,
            "chdir": "/root/ocp-cluster-bm",
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2020-03-13 09:47:29.249765",
    "stderr": "level=info msg=\"Waiting up to 30m0s for the cluster at https://api.ocp-cluster-bm.ocpmig.css-qe.com:6443 to initialize...\"\nlevel=error msg=\"Cluster operator authentication Degraded is True with RouteHealthDegradedFailedGet: RouteHealthDegraded: failed to GET route: dial tcp 10.8.32.24:443: connect: connection refused\"\nlevel=info msg=\"Cluster operator authentication Progressing is Unknown with NoData: \"\nlevel=info msg=\"Cluster operator authentication Available is Unknown with NoData: \"\nlevel=info msg=\"Cluster operator console Progressing is True with SyncLoopRefreshProgressingInProgress: SyncLoopRefreshProgressing: Working toward version 4.3.3\"\nlevel=info msg=\"Cluster operator console Available is False with DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment\"\nlevel=info msg=\"Cluster operator insights Disabled is False with : \"\nlevel=fatal msg=\"failed to initialize the cluster: Some cluster operators are still updating: authentication, console\"",
    "stderr_lines": [
        "level=info msg=\"Waiting up to 30m0s for the cluster at https://api.ocp-cluster-bm.ocpmig.css-qe.com:6443 to initialize...\"",
        "level=error msg=\"Cluster operator authentication Degraded is True with RouteHealthDegradedFailedGet: RouteHealthDegraded: failed to GET route: dial tcp 10.8.32.24:443: connect: connection refused\"",
        "level=info msg=\"Cluster operator authentication Progressing is Unknown with NoData: \"",
        "level=info msg=\"Cluster operator authentication Available is Unknown with NoData: \"",
        "level=info msg=\"Cluster operator console Progressing is True with SyncLoopRefreshProgressingInProgress: SyncLoopRefreshProgressing: Working toward version 4.3.3\"",
        "level=info msg=\"Cluster operator console Available is False with DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment\"",
        "level=info msg=\"Cluster operator insights Disabled is False with : \"",
        "level=fatal msg=\"failed to initialize the cluster: Some cluster operators are still updating: authentication, console\""
    ],
    "stdout": "",
    "stdout_lines": []
}

PLAY RECAP ********************************************************************************************************************************************************************************************************
10.8.32.24                 : ok=89   changed=54   unreachable=0    failed=1    skipped=7    rescued=0    ignored=0   
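
The "Cluster operator ..." messages in the stderr above follow a regular shape (operator name, condition, status, reason, message), so the failing operators can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the regular expression is inferred only from the lines shown here, and the helper names `parse_operator_lines` and `unhealthy` are hypothetical.

```python
import re

# Pattern inferred from the log lines in this report (an assumption, not a
# documented format):
#   Cluster operator <name> <condition> is <status> with <reason>: <message>
PATTERN = re.compile(
    r'Cluster operator (?P<name>\S+) (?P<condition>\S+) is (?P<status>\S+)'
    r' with (?P<reason>[^:]*): ?(?P<message>[^"]*)'
)

# Three of the stderr lines from the failed run, copied from the report.
SAMPLE = [
    'level=error msg="Cluster operator authentication Degraded is True with '
    'RouteHealthDegradedFailedGet: RouteHealthDegraded: failed to GET route: '
    'dial tcp 10.8.32.24:443: connect: connection refused"',
    'level=info msg="Cluster operator console Available is False with '
    'DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods '
    'available for console deployment"',
    'level=info msg="Cluster operator insights Disabled is False with : "',
]


def parse_operator_lines(lines):
    """Return one dict per line that reports a cluster operator condition."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            results.append(match.groupdict())
    return results


def unhealthy(conditions):
    """Keep conditions that signal trouble: Degraded=True or Available=False."""
    return [
        c for c in conditions
        if (c["condition"] == "Degraded" and c["status"] == "True")
        or (c["condition"] == "Available" and c["status"] == "False")
    ]


if __name__ == "__main__":
    for cond in unhealthy(parse_operator_lines(SAMPLE)):
        print(cond["name"], cond["condition"], cond["reason"])
```

Run against the sample lines, this reports authentication (Degraded) and console (Available=False), which matches the two operators named in the final "still updating" message.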


Version-Release number of the following components:
==========================================================
[core@master0 ~]$ rpm -qa | grep openshift
openshift-hyperkube-4.3.3-202002140552.git.0.e38059c.el8.x86_64
openshift-clients-4.3.3-202002140552.git.1.ff73b47.el8.x86_64

ocp_installers_index_url: https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.3.3/

ocp_rhcos_index_url: https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/


How reproducible:
====================
1/1


Steps to Reproduce:
========================
1. Run this playbook to install OCP 4.3 on bare metal: https://code.engineering.redhat.com/gerrit/gitweb?p=CSS_OCP_OCS_Migration.git;a=blob;f=css_ocp_ocs_migration/ansible/playbooks/setup_ocp_4_x.yml;h=43770efd7f969edd65d28d8f4e876c74fe4135ef;hb=9e861bef5f85c95854ab76351a35613e1192c850

Actual results:
=======================
Failed to install OCP 4.3 on bare metal.

Expected results:
====================
OpenShift install should be successful.

Additional info:
=====================
cat ocp-cluster-bm/.openshift_install.log
http://pastebin.test.redhat.com/845283

cat /etc/haproxy/haproxy.cfg: 
http://pastebin.test.redhat.com/845285

iptables -L :
http://pastebin.test.redhat.com/845286


cat ocp-cluster-bm/install-config.yaml
http://pastebin.test.redhat.com/845289

Comment 1 spandura 2020-03-17 10:20:45 UTC
[root@dell-per730-09 ~]# export KUBECONFIG="ocp-cluster-bm/auth/kubeconfig" 
[root@dell-per730-09 ~]# 
[root@dell-per730-09 ~]# oc projects
You have access to the following projects and can switch between them with 'oc project <projectname>':

default
kube-node-lease
kube-public
kube-system
openshift
openshift-apiserver
openshift-apiserver-operator
openshift-authentication
openshift-authentication-operator
openshift-cloud-credential-operator
openshift-cluster-machine-approver
openshift-cluster-node-tuning-operator
openshift-cluster-samples-operator
openshift-cluster-storage-operator
openshift-cluster-version
openshift-config
openshift-config-managed
openshift-console
openshift-console-operator
openshift-controller-manager
openshift-controller-manager-operator
openshift-dns
openshift-dns-operator
openshift-etcd
openshift-image-registry
openshift-infra
openshift-ingress
openshift-ingress-operator
openshift-insights
openshift-kni-infra
openshift-kube-apiserver
openshift-kube-apiserver-operator
openshift-kube-controller-manager
openshift-kube-controller-manager-operator
openshift-kube-scheduler
openshift-kube-scheduler-operator
openshift-machine-api
openshift-machine-config-operator
openshift-marketplace
openshift-monitoring
openshift-multus
openshift-network-operator
openshift-node
openshift-openstack-infra
openshift-operator-lifecycle-manager
openshift-operators
openshift-sdn
openshift-service-ca
openshift-service-ca-operator
openshift-service-catalog-apiserver-operator
openshift-service-catalog-controller-manager-operator
[root@dell-per730-09 ~]# 
[root@dell-per730-09 ~]# 
[root@dell-per730-09 ~]# oc get pods -n openshift-console
NAME                        READY   STATUS             RESTARTS   AGE
console-5d49d56cd9-m8bdq    0/1     Running            15         66m
console-5f9c89ffc5-glwzf    0/1     Running            15         67m
console-5f9c89ffc5-rvbd7    0/1     CrashLoopBackOff   15         67m
downloads-f99ff6d4f-9ncj5   1/1     Running            0          69m
downloads-f99ff6d4f-jlndd   1/1     Running            0          69m
[root@dell-per730-09 ~]# 
[root@dell-per730-09 ~]# oc logs pod/console-5d49d56cd9-m8bdq -n openshift-console
2020/03/17 10:11:34 cmd/main: cookies are secure!
2020/03/17 10:11:34 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:11:44 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:11:54 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:12:04 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:12:14 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:12:24 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:12:34 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:12:44 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:12:54 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:13:04 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:13:14 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:13:24 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:13:34 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:13:44 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:13:54 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:14:04 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
2020/03/17 10:14:14 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com/oauth/token failed: Head https://oauth-openshift.apps.ocp-cluster-bm.ocpmig.css-qe.com: dial tcp 10.8.32.24:443: connect: connection refused
[root@dell-per730-09 ~]#

Comment 2 Scott Dodson 2020-03-17 12:51:31 UTC
1) Please test with the latest version, 4.3.5.
2) It's complaining about the route for various components, which is highly dependent on your provisioned load balancer. Please check that and the status of ingress.

This is most likely a problem in your infrastructure, but I'm moving it over to Routing.
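
One quick way to confirm the "connection refused" symptom independently of the cluster operators is to attempt a plain TCP connection to the load balancer on port 443. The snippet below is only a sketch under that assumption. The helper name `can_connect` is hypothetical, and a successful connection shows only that something is listening on the port, not that the router behind it is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call, using the address from the error message in this report:
#   can_connect("10.8.32.24", 443)
```

A False result for the address in the error message would be consistent with the load balancer (or the ingress router behind it) not accepting connections on 443.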

Comment 3 spandura 2020-03-17 13:07:50 UTC
[root@dell-per730-09 ~]# oc adm must-gather
http://pastebin.test.redhat.com/845417

Comment 4 spandura 2020-03-17 13:08:00 UTC
[root@dell-per730-09 ~]# oc adm must-gather
http://pastebin.test.redhat.com/845417

Comment 5 spandura 2020-03-17 16:20:32 UTC
I tried the same on the latest build and am still seeing the issue:

[core@master0 ~]$ rpm -qa | grep openshift
openshift-hyperkube-4.3.5-202002280657.git.0.b3bfb5a.el8.x86_64
openshift-clients-4.3.5-202002280657.git.1.55a9334.el8.x86_64

Comment 8 spandura 2020-03-17 19:51:29 UTC
Moving this bug to VERIFIED, as I made some changes to my environment and the install worked.

Comment 11 errata-xmlrpc 2020-08-04 18:05:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409