Description of problem:

vSphere UPI installation is failing with the error below:

"failed to initialize the cluster: Some cluster operators are still updating: authentication, console"

Version-Release number of the following components:

$ ./bin/openshift-install version
./bin/openshift-install unreleased-master-1704-gb5dbb46b7e97d2c63333048f055dd518aa01eb10-dirty
built from commit b5dbb46b7e97d2c63333048f055dd518aa01eb10
release image registry.svc.ci.openshift.org/ocp/release@sha256:50e379837780325a517151a5edf61eb1689b8249a0e206731d9593e63f2e71d6

> openshift-install-linux-4.2.0-0.ci-2019-09-09-021340.tar.gz

How reproducible: 1/1

Steps to Reproduce:
1. Install OCP vSphere UPI as per the documentation; waiting for install-complete fails with the error below.

Actual results:

> Bootstrap completed without any issues

2019-09-09 11:37:03,319 - INFO - ocs_ci.utility.utils.run_cmd.369 - Executing command: ./bin/openshift-install wait-for bootstrap-complete --dir /home/vavuthu/VJ/installations/clusterdirs/vsp-test --log-level INFO
2019-09-09 12:05:22,630 - DEBUG - ocs_ci.utility.utils.run_cmd.379 - Command output:
2019-09-09 12:05:22,631 - WARNING - ocs_ci.utility.utils.run_cmd.381 - Command warning::
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.vsp-test.qe.rh-ocs.com:6443 ..."
level=info msg="API v1.14.6+f26aefa up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="It is now safe to remove the bootstrap resources"

> wait-for install-complete failed with the error below

$ ./bin/openshift-install wait-for install-complete --dir=/home/vavuthu/VJ/installations/clusterdirs/vsp-test/
INFO Waiting up to 30m0s for the cluster at https://api.vsp-test.qe.rh-ocs.com:6443 to initialize...
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console
$

Expected results:

Installation should complete without any errors.

Additional info:

> clusteroperator status

$ oc get clusteroperator
NAME                                       VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                            False       True          False      30m
cloud-credential                           4.2.0-0.ci-2019-09-09-021340   True        False         False      54m
cluster-autoscaler                         4.2.0-0.ci-2019-09-09-021340   True        False         False      27m
console                                    4.2.0-0.ci-2019-09-09-021340   False       True          False      34m
dns                                        4.2.0-0.ci-2019-09-09-021340   True        False         False      55m
image-registry                             4.2.0-0.ci-2019-09-09-021340   True        False         False      32m
ingress                                    4.2.0-0.ci-2019-09-09-021340   True        False         False      33m
insights                                   4.2.0-0.ci-2019-09-09-021340   True        False         False      56m
kube-apiserver                             4.2.0-0.ci-2019-09-09-021340   True        True          False      41m
kube-controller-manager                    4.2.0-0.ci-2019-09-09-021340   True        False         False      42m
kube-scheduler                             4.2.0-0.ci-2019-09-09-021340   True        True          False      41m
machine-api                                4.2.0-0.ci-2019-09-09-021340   True        False         False      54m
machine-config                             4.2.0-0.ci-2019-09-09-021340   True        False         False      42m
marketplace                                4.2.0-0.ci-2019-09-09-021340   True        False         False      35m
monitoring                                 4.2.0-0.ci-2019-09-09-021340   True        False         False      26m
network                                    4.2.0-0.ci-2019-09-09-021340   True        True          False      41m
node-tuning                                4.2.0-0.ci-2019-09-09-021340   True        False         False      37m
openshift-apiserver                        4.2.0-0.ci-2019-09-09-021340   True        False         False      39m
openshift-controller-manager               4.2.0-0.ci-2019-09-09-021340   True        False         False      42m
openshift-samples                          4.2.0-0.ci-2019-09-09-021340   True        False         False      22m
operator-lifecycle-manager                 4.2.0-0.ci-2019-09-09-021340   True        False         False      43m
operator-lifecycle-manager-catalog         4.2.0-0.ci-2019-09-09-021340   True        False         False      42m
operator-lifecycle-manager-packageserver   4.2.0-0.ci-2019-09-09-021340   True        False         False      40m
service-ca                                 4.2.0-0.ci-2019-09-09-021340   True        False         False      54m
service-catalog-apiserver                  4.2.0-0.ci-2019-09-09-021340   True        False         False      38m
service-catalog-controller-manager         4.2.0-0.ci-2019-09-09-021340   True        False         False      38m
storage                                    4.2.0-0.ci-2019-09-09-021340   True        False         False      35m

> describe of failed clusteroperators

$ oc describe co authentication
Name:         authentication
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-09-09T06:37:10Z
  Generation:          1
  Resource Version:    16335
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/authentication
  UID:                 400983d9-d2cc-11e9-8fc8-005056be0641
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-09-09T06:45:08Z
    Reason:                AsExpected
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2019-09-09T06:45:08Z
    Message:               Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://10.35.145.26:6443/.well-known/oauth-authorization-server endpoint data
    Reason:                ProgressingWellKnownNotReady
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-09-09T06:45:08Z
    Reason:                Available
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-09-09T06:37:10Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:     operator.openshift.io
    Name:      cluster
    Resource:  authentications
    Group:     config.openshift.io
    Name:      cluster
    Resource:  authentications
    Group:     config.openshift.io
    Name:      cluster
    Resource:  infrastructures
    Group:     config.openshift.io
    Name:      cluster
    Resource:  oauths
    Group:
    Name:      openshift-config
    Resource:  namespaces
    Group:
    Name:      openshift-config-managed
    Resource:  namespaces
    Group:
    Name:      openshift-authentication
    Resource:  namespaces
    Group:
    Name:      openshift-authentication-operator
    Resource:  namespaces
Events:        <none>

$ oc describe co console
Name:         console
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-09-09T06:41:22Z
  Generation:          1
  Resource Version:    20241
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/console
  UID:                 d63d41af-d2cc-11e9-8fc8-005056be0641
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-09-09T06:41:24Z
    Reason:                AsExpected
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2019-09-09T06:42:00Z
    Message:               SyncLoopRefreshProgressing: Working toward version 4.2.0-0.ci-2019-09-09-021340
    Reason:                SyncLoopRefreshProgressingInProgress
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-09-09T06:42:00Z
    Message:               DeploymentAvailable: 2 replicas ready at version 4.2.0-0.ci-2019-09-09-021340
    Reason:                DeploymentAvailableFailedUpdate
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-09-09T06:41:24Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   consoles
    Group:      config.openshift.io
    Name:       cluster
    Resource:   consoles
    Group:      config.openshift.io
    Name:       cluster
    Resource:   infrastructures
    Group:      config.openshift.io
    Name:       cluster
    Resource:   proxies
    Group:      oauth.openshift.io
    Name:       console
    Resource:   oauthclients
    Group:
    Name:       openshift-console-operator
    Resource:   namespaces
    Group:
    Name:       openshift-console
    Resource:   namespaces
    Group:
    Name:       console-public
    Namespace:  openshift-config-managed
    Resource:   configmaps
  Versions:
    Name:     operator
    Version:  4.2.0-0.ci-2019-09-09-021340
Events:       <none>
$

must-gather logs: http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/must-gather.local.6406989811990035970.tar.gz
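For triage, the clusteroperator table above can be filtered down to the operators that have not settled. The following is only a minimal sketch: the function name `not_settled_operators` is hypothetical, and it assumes the plain-text column layout of `oc get clusteroperator` shown in this report. Columns are counted from the right because the VERSION column can be empty (as in the authentication row), which shifts the fields on the left.

```shell
# Hypothetical helper: reads `oc get clusteroperator` output on stdin and
# prints the name of every operator that is not
# Available=True, Progressing=False, Degraded=False.
not_settled_operators() {
  # NR > 1 skips the header row; NF >= 5 skips blank or truncated lines.
  awk 'NR > 1 && NF >= 5 {
         # Count from the right: SINCE is $NF, DEGRADED is $(NF-1), and so on.
         available   = $(NF-3)
         progressing = $(NF-2)
         degraded    = $(NF-1)
         if (available != "True" || progressing != "False" || degraded != "False")
           print $1
       }'
}
```

Typical use would be `oc get clusteroperator | not_settled_operators`. Note that, besides authentication and console, this would also list operators that are Available but still Progressing, such as kube-apiserver, kube-scheduler and network in the table above.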
It seems that the kube-apiserver pod fails to be created because networking is not available on one of the nodes. Here's the reported error:

```
'Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-5-control-plane-1_openshift-kube-apiserver_900db7f3-d2ce-11e9-8fc8-005056be0641_0(6eb4a350ef81b980482f853dc2585bcac49e5b395ab03fb7472c0736833d91e3): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cniserver/socket: connect: connection refused'
```
sdn-7llq6 is failing with "rm: cannot remove '/etc/cni/net.d/80-openshift-network.conf': Permission denied" How!?!?
Is this cluster still up? Can we get the node journal?
(In reply to Casey Callendrello from comment #4)
> Is this cluster still up? Can we get the node journal?

Cluster is not available. It was removed after collecting all logs.
Unfortunately must-gather doesn't actually gather everything we need to debug this. Please try and reproduce, and keep the cluster up.
I suspect this is an SELinux issue. Running `ls -Z /etc/cni/net.d/80-openshift-network.conf` on all the nodes would tell us whether the file carries a different SELinux context on some of them.
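To make that comparison quicker once a cluster is available, here is a rough sketch. The function name `compare_selinux_contexts` is hypothetical, and it assumes each input line has the form `<node> <context> <path>`, that is, the node name prepended to `ls -Z` output that prints the context before the path. The commented collection loop further assumes `oc debug` works against the nodes in question.

```shell
# Hypothetical helper: reads "<node> <selinux-context> <path>" lines on stdin.
# Prints "consistent" and returns 0 when every node reports the same context;
# otherwise prints each context with the nodes that carry it and returns 1.
compare_selinux_contexts() {
  awk '{ nodes[$2] = nodes[$2] " " $1 }   # group node names by context
       END {
         n = 0
         for (ctx in nodes) n++
         if (n <= 1) { print "consistent"; exit 0 }
         for (ctx in nodes) print ctx ":" nodes[ctx]
         exit 1
       }'
}

# One possible way to collect the input (assumes `oc debug` access to nodes):
# for node in $(oc get nodes -o name); do
#   printf '%s ' "$node"
#   oc debug "$node" -- chroot /host ls -Z /etc/cni/net.d/80-openshift-network.conf
# done | compare_selinux_contexts
```

Any node listed under a context that differs from the others would be the first place to look for the "Permission denied" failure.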
> I have tried 3 times to reproduce the issue, but the installation succeeded every time. Below are the builds used for installation:

1st attempt: 4.2.0-0.ci-2019-09-10-121820
2nd attempt: 4.2.0-0.ci-2019-09-09-021340 (same build where I faced the issue previously)
3rd attempt: 4.2.0-0.ci-2019-09-09-021340

> $ date; oc --kubeconfig /home/vavuthu/VJ/installations/clusterdirs/qe1/auth/kubeconfig get ClusterOperator
Wed Sep 11 00:27:32 IST 2019
NAME                 VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication       4.2.0-0.ci-2019-09-09-021340   True        False         False      3m4s
cloud-credential     4.2.0-0.ci-2019-09-09-021340   True        False         False      40m
cluster-autoscaler   4.2.0-0.ci-2019-09-09-021340   True        False         False      12m
console              4.2.0-0.ci-2019-09-09-021340   True        False         False      6m38s
dns                  4.2.0-0.ci-2019-09-09-021340   True        False         False      28m
image-registry       4.2.0-0.ci-2019-09-09-021340   True        False         False      17m
>