Please provide the logs from "oc adm must-gather"
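For reference, a typical invocation looks something like the sketch below (the --dest-dir flag is optional and the paths are illustrative):

$ oc adm must-gather --dest-dir=/tmp/must-gather    # collect cluster diagnostics locally
$ tar czf must-gather.tar.gz -C /tmp must-gather    # bundle the result for attaching here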
Moving back to assigned as I know there are at least two distinct bugs here.
Hello Mo, I tried with 4.2.0-0.nightly-2019-08-20-213632 and it seems to still have this issue.

# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 Unknown     Unknown       True       125m
cloud-credential                           4.2.0-0.nightly-2019-08-20-213632   True        False         False      129m
cluster-autoscaler                         4.2.0-0.nightly-2019-08-20-213632   True        False         False      124m
console                                    4.2.0-0.nightly-2019-08-20-213632   False       True          False      124m
dns                                        4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
image-registry                             4.2.0-0.nightly-2019-08-20-213632   True        False         False      123m
ingress                                    4.2.0-0.nightly-2019-08-20-213632   True        False         False      124m
insights                                   4.2.0-0.nightly-2019-08-20-213632   True        False         False      129m
kube-apiserver                             4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
kube-controller-manager                    4.2.0-0.nightly-2019-08-20-213632   True        False         False      126m
kube-scheduler                             4.2.0-0.nightly-2019-08-20-213632   True        False         False      126m
machine-api                                4.2.0-0.nightly-2019-08-20-213632   True        False         False      129m
machine-config                             4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
marketplace                                4.2.0-0.nightly-2019-08-20-213632   True        False         False      123m
monitoring                                 4.2.0-0.nightly-2019-08-20-213632   True        False         False      121m
network                                    4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
node-tuning                                4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
openshift-apiserver                        4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
openshift-controller-manager               4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
openshift-samples                          4.2.0-0.nightly-2019-08-20-213632   True        False         False      123m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-20-213632   True        False         False      128m
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
service-ca                                 4.2.0-0.nightly-2019-08-20-213632   True        False         False      129m
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-20-213632   True        False         False      125m
storage                                    4.2.0-0.nightly-2019-08-20-213632   True        False         False      124m

# oc describe co authentication
Name:         authentication
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-08-21T03:22:02Z
  Generation:          1
  Resource Version:    15984
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/authentication
  UID:                 d7b4f42f-c3c2-11e9-aa43-02a5eca1f9aa
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-08-21T03:25:51Z
    Message:               RouteHealthDegraded: failed to GET route: EOF
    Reason:                RouteHealthDegradedFailedGet
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-08-21T03:22:02Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Progressing
    Last Transition Time:  2019-08-21T03:22:02Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Available
    Last Transition Time:  2019-08-21T03:22:02Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:  <nil>
  Related Objects:
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  authentications
    Group:     config.openshift.io
    Name:      cluster
    Resource:  authentications
    Group:     config.openshift.io
    Name:      cluster
    Resource:  infrastructures
    Group:     config.openshift.io
    Name:      cluster
    Resource:  oauths
    Group:
    Name:      openshift-config
    Resource:  namespaces
    Group:
    Name:      openshift-config-managed
    Resource:  namespaces
    Group:
    Name:      openshift-authentication
    Resource:  namespaces
    Group:
    Name:      openshift-authentication-operator
    Resource:  namespaces
Events:  <none>

And the masters can still be scheduled; I deployed a pod on one of them successfully.
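Since the Degraded condition comes from "failed to GET route: EOF", it may help to probe the OAuth route directly; a minimal sketch, assuming the usual 4.x defaults of route oauth-openshift in namespace openshift-authentication:

$ HOST=$(oc get route oauth-openshift -n openshift-authentication -o jsonpath='{.spec.host}')
$ curl -kv "https://${HOST}/healthz"   # an EOF here too would point at the load balancer rather than the oauth pods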
Moving to the Routing component to debug why routes are not working on this cluster.
Never mind the AWS bug references; I misread the original report and missed a key point about the topology under test. The problem is that there are no instances assigned to the ELB. Looking at the cluster nodes:

$ oc get nodes
NAME                                         STATUS   ROLES           AGE   VERSION
ip-10-0-139-134.us-east-2.compute.internal   Ready    master,worker   21h   v1.14.0+17b784327
ip-10-0-157-164.us-east-2.compute.internal   Ready    master,worker   21h   v1.14.0+17b784327
ip-10-0-167-50.us-east-2.compute.internal    Ready    master,worker   21h   v1.14.0+17b784327

Notice that every worker node in the cluster is also labeled as a master. In Kubernetes, master nodes are not allowed to be load balancer targets; this is deliberate upstream behavior, not a bug. It follows that an ingress controller published by a load balancer depends on at least one non-master node on which to expose a port for the load balancer to connect to.

We should probably consider preventing ingress controllers from being scheduled on masters. If we did so, the ingress operator would report Degraded and the problem would be more visible (see the sketch below). I think we simply don't support this topology when using cloud load balancers. I'm going to close the bug and recommend we prune the test case as unsupported.
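A rough sketch of both halves of that, assuming the IngressController API on this version exposes spec.nodePlacement (worth verifying against the installed CRD before relying on it):

$ # every node carries the master role label, which the service
$ # controller treats as "exclude from load balancer backends"
$ oc get nodes -l node-role.kubernetes.io/master

$ # hypothetical mitigation: pin the default ingress controller to
$ # worker-only nodes so the degraded state surfaces immediately
$ oc patch ingresscontroller/default -n openshift-ingress-operator --type=merge \
    -p '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/worker":""}}}}}'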
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1744370 to track the scheduling and status reporting issue.
Installing the masters as schedulable nodes on the Azure platform succeeds, but no routes can be accessed (since there are no virtual machines in the Azure LB backend pools).

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-21-235427   True        False         33m     Cluster version is 4.2.0-0.nightly-2019-08-21-235427

$ oc get node
NAME                          STATUS   ROLES           AGE   VERSION
hongli-az427-hwp5f-master-0   Ready    master,worker   49m   v1.14.0+a80442411
hongli-az427-hwp5f-master-1   Ready    master,worker   49m   v1.14.0+a80442411
hongli-az427-hwp5f-master-2   Ready    master,worker   49m   v1.14.0+a80442411

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      34m
cloud-credential                           4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
cluster-autoscaler                         4.2.0-0.nightly-2019-08-21-235427   True        False         False      41m
console                                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      35m
dns                                        4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
image-registry                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      38m
ingress                                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      41m
insights                                   4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
kube-apiserver                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      45m
kube-controller-manager                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      45m
kube-scheduler                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      45m
machine-api                                4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
machine-config                             4.2.0-0.nightly-2019-08-21-235427   True        False         False      42m
marketplace                                4.2.0-0.nightly-2019-08-21-235427   True        False         False      41m
monitoring                                 4.2.0-0.nightly-2019-08-21-235427   True        False         False      39m
network                                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      47m
node-tuning                                4.2.0-0.nightly-2019-08-21-235427   True        False         False      43m
openshift-apiserver                        4.2.0-0.nightly-2019-08-21-235427   True        False         False      43m
openshift-controller-manager               4.2.0-0.nightly-2019-08-21-235427   True        False         False      46m
openshift-samples                          4.2.0-0.nightly-2019-08-21-235427   True        False         False      33m
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-21-235427   True        False         False      47m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-21-235427   True        False         False      47m
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-21-235427   True        False         False      44m
service-ca                                 4.2.0-0.nightly-2019-08-21-235427   True        False         False      48m
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-21-235427   True        False         False      43m
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-21-235427   True        False         False      43m
storage                                    4.2.0-0.nightly-2019-08-21-235427   True        False         False      42m

$ oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
router-default-86f85b897b-62rs2   1/1     Running   0          34m   10.128.0.30   hongli-az427-hwp5f-master-1   <none>           <none>
router-default-86f85b897b-8cr9p   1/1     Running   0          34m   10.129.0.37   hongli-az427-hwp5f-master-2   <none>           <none>

$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.180.223   13.89.142.235   80:31364/TCP,443:31093/TCP   42m
router-internal-default   ClusterIP      172.30.176.214   <none>          80/TCP,443/TCP,1936/TCP      42m

$ curl https://console-openshift-console.apps.hongli-az427.qe.azure.devcluster.openshift.com -k -vv
* Rebuilt URL to: https://console-openshift-console.apps.hongli-az427.qe.azure.devcluster.openshift.com/
*   Trying 13.89.142.235...
* TCP_NODELAY set
(time out)
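To confirm the LB side, something like the az CLI sketch below can list the backend pool members (resource group and LB names are placeholders, not taken from this cluster):

$ az network lb address-pool list \
    --resource-group <cluster-resource-group> \
    --lb-name <ingress-lb-name> \
    --query '[].backendIpConfigurations' -o table   # expect an empty list on this topology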
Created attachment 1606863 [details]
error message when adding vm to ingress LB

When I try to update the ingress LB and add a VM to its backend pools manually, the error message says: "This virtual machine and IP address is already added in another Public load balancer backend pool." So it seems that even though the cluster can be installed, the ingress LB is still unavailable on the Azure platform.
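For the record, the manual step that triggers that error looks roughly like this (all names are placeholders); the rejection is consistent with Basic SKU load balancers, where a VM's NIC can sit behind the backend pools of only one public LB at a time:

$ az network nic ip-config address-pool add \
    --resource-group <cluster-resource-group> \
    --nic-name <master-vm-nic> \
    --ip-config-name <ip-config-name> \
    --lb-name <ingress-lb-name> \
    --address-pool <backend-pool-name>   # fails with the error quoted above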
Per comment 10, there is a new bug tracking this issue, so closing this one.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922