This has been reported as https://github.com/openshift/installer/issues/1884 by a user, too.
Is this issue reproducible still?
(In reply to Abhinav Dahiya from comment #5)
> Is this issue reproducible still?

No, we did not hit it recently.
If this happens again just let us know.
QE is currently hitting this issue several times in the daily CI tests. The latest hit (3 out of 3 runs, i.e. always) is on 4.3.0-0.nightly-2019-11-27-041100 for a UPI/vSphere installation with the OVN network. The installation always fails to finish at the stage that patches the image registry, which comes after bootstrap succeeds. After waiting for 60 minutes, the image registry operator config is still not generated.

[root@preserve-jliu-worker tmp]# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                                                               True        False         False      74m
dns                                        unknown                             False       True          True       71m
insights                                   4.3.0-0.nightly-2019-11-27-041100   True        True          False      72m
kube-apiserver                             4.3.0-0.nightly-2019-11-27-041100   True        False         False      71m
kube-controller-manager                    4.3.0-0.nightly-2019-11-27-041100   False       True          False      72m
kube-scheduler                             4.3.0-0.nightly-2019-11-27-041100   False       True          False      72m
machine-api                                4.3.0-0.nightly-2019-11-27-041100   True        False         False      71m
machine-config                             4.3.0-0.nightly-2019-11-27-041100   False       True          False      72m
network                                    4.3.0-0.nightly-2019-11-27-041100   True        False         False      71m
openshift-apiserver                        4.3.0-0.nightly-2019-11-27-041100   Unknown     False         False      72m
openshift-controller-manager                                                   False       True          False      72m
operator-lifecycle-manager                 4.3.0-0.nightly-2019-11-27-041100   True        True          False      71m
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2019-11-27-041100   True        True          False      71m
operator-lifecycle-manager-packageserver                                       False       True          False      71m
service-ca                                 4.3.0-0.nightly-2019-11-27-041100   True        False         False      72m

[root@preserve-jliu-worker tmp]# oc get configs.imageregistry.operator.openshift.io cluster
Error from server (NotFound): configs.imageregistry.operator.openshift.io "cluster" not found

In this broken state, must-gather does not work either.

[root@preserve-jliu-worker tmp]# oc adm must-gather
[must-gather      ] OUT the server could not find the requested resource (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather      ] OUT namespace/openshift-must-gather-nqxkf created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-rfqgc created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/origin-must-gather:latest created
[must-gather-d2ft8] POD Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
...

So I am attaching the CVO log and the master/worker node logs for debugging. Some logs related to openshift-apiserver:
# oc describe co openshift-apiserver
Name:         openshift-apiserver
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-11-29T02:21:41Z
  Generation:          1
  Resource Version:    2595
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/openshift-apiserver
  UID:                 ee1def8d-dfdf-45ef-b222-b95342d653f7
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-11-29T02:21:42Z
    Message:               EncryptionPruneControllerDegraded: daemonset.apps "apiserver" not found
                           EncryptionMigrationControllerDegraded: daemonset.apps "apiserver" not found
                           EncryptionStateControllerDegraded: daemonset.apps "apiserver" not found
                           ResourceSyncControllerDegraded: namespaces "openshift-apiserver" not found
                           EncryptionKeyControllerDegraded: daemonset.apps "apiserver" not found
    Reason:                AsExpected
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2019-11-29T02:21:42Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-11-29T02:21:41Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Available
    Last Transition Time:  2019-11-29T02:21:42Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
...

# oc logs pod/openshift-apiserver-operator-5f7dcd8c88-lc9nf -n openshift-apiserver-operator
I1129 04:08:57.737594       1 cmd.go:188] Using service-serving-cert provided certificates
I1129 04:08:57.738165       1 observer_polling.go:136] Starting file observer
I1129 04:08:57.738273       1 observer_polling.go:97] Observed change: file:/var/run/secrets/serving-cert/tls.crt (current: "ee4e4285ab6420066fac19de6bafd4e52ee8d92f6d3be1e31be188904ab35cb6", lastKnown: "ee4e4285ab6420066fac19de6bafd4e52ee8d92f6d3be1e31be188904ab35cb6")
...
W1129 04:09:27.739583       1 builder.go:181] unable to get owner reference (falling back to namespace): Get https://172.30.0.1:443/api/v1/namespaces/openshift-apiserver-operator/pods: dial tcp 172.30.0.1:443: i/o timeout
...

If a preserved cluster is needed, please contact me and I can reproduce the issue and reserve the cluster.
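Both the must-gather pod and the openshift-apiserver operator time out dialing 172.30.0.1:443, i.e. the in-cluster kubernetes service VIP. On the next reproduction it may be worth checking whether that VIP is reachable from a master at all; a minimal sketch (the node name is a placeholder, and these checks were not actually run on this cluster):

  # is the default/kubernetes service backed by the expected endpoints?
  oc get endpoints kubernetes -n default
  # can a master node reach the service VIP that the pods fail to dial?
  oc debug node/<master-node> -- chroot /host curl -k --connect-timeout 5 https://172.30.0.1:443/healthz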
Hit it again on 4.3.0-0.nightly-2019-11-29-013902 when deploying a UPI/vSphere cluster with an HTTP proxy enabled.

http_proxy: "http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@139.178.76.57:3128"
https_proxy: "http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@139.178.76.57:3128"
no_proxy: "test.no-proxy.com"

[root@preserve-jliu-worker tmp]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          77m     Working towards 4.3.0-0.nightly-2019-11-29-013902: 72% complete

[root@preserve-jliu-worker tmp]# oc get configs.imageregistry.operator.openshift.io cluster
Error from server (NotFound): configs.imageregistry.operator.openshift.io "cluster" not found

[root@preserve-jliu-worker tmp]# oc get co
NAME                                 VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                                                         True        False         False      74m
dns                                  4.3.0-0.nightly-2019-11-29-013902   True        False         False      66m
insights                             4.3.0-0.nightly-2019-11-29-013902   True        False         False      70m
kube-apiserver                       4.3.0-0.nightly-2019-11-29-013902   True        True          True       67m
kube-controller-manager              4.3.0-0.nightly-2019-11-29-013902   True        True          True       66m
kube-scheduler                       4.3.0-0.nightly-2019-11-29-013902   True        True          True       66m
machine-api                          4.3.0-0.nightly-2019-11-29-013902   True        False         False      67m
machine-config                       4.3.0-0.nightly-2019-11-29-013902   False       True          True       70m
network                              4.3.0-0.nightly-2019-11-29-013902   True        False         False      61m
openshift-apiserver                  4.3.0-0.nightly-2019-11-29-013902   False       False         False      67m
openshift-controller-manager                                             False       True          False      70m
operator-lifecycle-manager-catalog   4.3.0-0.nightly-2019-11-29-013902   True        False         False      67m
service-ca                           4.3.0-0.nightly-2019-11-29-013902   True        False         False      70m

[root@preserve-jliu-worker tmp]# oc get machineconfig
NAME            GENERATEDBYCONTROLLER   IGNITIONVERSION   CREATED
99-master-ssh                           2.2.0             70m
99-worker-ssh                           2.2.0             70m

[root@preserve-jliu-worker tmp]# oc describe co machine-config
Name:         machine-config
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-11-29T06:32:57Z
  Generation:          1
  Resource Version:    16403
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/machine-config
  UID:                 02e1ecab-cdb9-4c34-80ee-4de528c4e7e6
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-11-29T06:32:57Z
    Message:               Cluster not available for 4.3.0-0.nightly-2019-11-29-013902
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-11-29T06:32:57Z
    Message:               Cluster is bootstrapping 4.3.0-0.nightly-2019-11-29-013902
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-11-29T06:46:36Z
    Message:               Failed to resync 4.3.0-0.nightly-2019-11-29-013902 because: timed out waiting for the condition during waitForDeploymentRollout: Deployment machine-config-controller is not ready.
                           status: (replicas: 1, updated: 1, ready: 0, unavailable: 1)
    Reason:                MachineConfigControllerFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-11-29T06:46:36Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:

[root@preserve-jliu-worker tmp]# oc describe pod machine-config-controller-65d4889785-2c9kc -n openshift-machine-config-operator
Name:                 machine-config-controller-65d4889785-2c9kc
Namespace:            openshift-machine-config-operator
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=machine-config-controller
                      pod-template-hash=65d4889785
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/machine-config-controller-65d4889785
Containers:
  machine-config-controller:
    Image:        quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4a439c4a128260accac47c791bed2a318f95bdd17d93b5903ab7f8780ef99baf
    Port:         <none>
    Host Port:    <none>
    Command:
      /usr/bin/machine-config-controller
    Args:
      start
      --resourcelock-namespace=openshift-machine-config-operator
      --v=2
    Requests:
      cpu:        20m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from machine-config-controller-token-zcfn5 (ro)
Volumes:
  machine-config-controller-token-zcfn5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-config-controller-token-zcfn5
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 120s
                 node.kubernetes.io/unreachable:NoExecute for 120s
Events:          <none>

# oc describe co openshift-apiserver
Name:         openshift-apiserver
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-11-29T06:33:30Z
  Generation:          1
  Resource Version:    5894
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/openshift-apiserver
  UID:                 079fc917-a6b4-4766-80d7-a4137f5471b5
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-11-29T06:36:19Z
    Reason:                AsExpected
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2019-11-29T06:36:39Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-11-29T06:36:26Z
    Message:               Available: no openshift-apiserver daemon pods available on any node.
    Reason:                AvailableNoAPIServerPod
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-11-29T06:33:31Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
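The machine-config-controller pod above is Pending with Node: <none>, so the failed deployment rollout looks like a scheduling problem rather than a container crash. If this reproduces again, the following scheduler-side data is worth capturing as well (a sketch only; none of this output was collected for the runs above):

  # are the masters Ready, and do they carry unexpected taints?
  oc get nodes -o wide
  oc describe nodes -l node-role.kubernetes.io/master= | grep -A 5 Taints
  # scheduler/replica-set events for the stuck pod
  oc get events -n openshift-machine-config-operator --sort-by=.lastTimestamp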
All previously reported failures here are different from one another. Rather than re-opening this one, please create a new bug with a complete set of standard Installer debugging data and we'll look into that.
> All previously reported failures here are different from one another. Rather
> than re-opening this one, please create a new bug with a complete set of
> standard Installer debugging data and we'll look into that.

This bug (bz1717257) is only used to track one issue: "no image registry generated after bootstrap completes". There are not many previous failures here, only the single failure that was split out from bug #1702615. Because it was not 100% reproducible in v4.1 and did not reproduce in v4.2, the bug was closed as INSUFFICIENT_DATA in v4.2. Now in v4.3, starting from 4.3.0-0.nightly-2019-11-27-041100, we always hit it, so the bug was reopened for the same issue. If it is not convenient to track the same issue in the same bug, we can open a new bug for it and restore this one to the correct status.
Tracked the v4.3 issue in https://bugzilla.redhat.com/show_bug.cgi?id=1779005.