Description of problem:
The ClusterAutoscalerOperatorDown alert fires after upgrading from 4.2.16 to 4.3.0.

Version-Release number of the following components:
4.3.0

How reproducible:
Unsure

Steps to Reproduce:
1. Install OCP 4.2.9 on 3 masters & 2 workers
2. Upgrade successfully to OCP 4.2.16, no issues
3. Upgrade to OCP 4.3.0

Actual results:
The UI shows 2 errors (see screenshots).

# failure1.png
Name: TargetDown
Message: 100% of the cluster-autoscaler-operator targets in openshift-machine-api namespace are down.

# failure2.png
Name: ClusterAutoscalerOperatorDown
Message: cluster-autoscaler-operator has disappeared from Prometheus target discovery.

Expected results:
There should be no issues upgrading from a clean environment.

Additional info:
must-gather fails as well.

# oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fa97dba5c09052c95c1f6471eb6a917cf18028368b2ce66dd95907aaa7bda4ac
[must-gather      ] OUT namespace/openshift-must-gather-lvwx7 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-wzs4k created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fa97dba5c09052c95c1f6471eb6a917cf18028368b2ce66dd95907aaa7bda4ac created
[must-gather-zkpdm] POD Wrote inspect data to must-gather.
[must-gather-zkpdm] POD Gathering data for ns/openshift-cluster-version...
[must-gather-zkpdm] POD Wrote inspect data to must-gather.
[must-gather-zkpdm] POD Gathering data for ns/openshift-config...
[must-gather-zkpdm] POD Gathering data for ns/openshift-config-managed...
[must-gather-zkpdm] POD Gathering data for ns/openshift-authentication...
[must-gather-zkpdm] POD Gathering data for ns/openshift-authentication-operator...
[must-gather-zkpdm] POD Gathering data for ns/openshift-ingress...
[must-gather-zkpdm] POD Gathering data for ns/openshift-cloud-credential-operator...
[must-gather-zkpdm] POD E0212 00:21:35.201118      48 portforward.go:400] an error occurred forwarding 37587 -> 9876: error forwarding port 9876 to pod 3c72c893b395ee6f72ba9b8cb489ce07fdf84515b8111317e55319aecc24718e, uid : exit status 1: 2020/02/12 00:21:35 socat[62883] E connect(5, AF=2 127.0.0.1:9876, 16): Connection refused
[must-gather-zkpdm] POD E0212 00:21:35.357299      48 portforward.go:400] an error occurred forwarding 37587 -> 9876: error forwarding port 9876 to pod 3c72c893b395ee6f72ba9b8cb489ce07fdf84515b8111317e55319aecc24718e, uid : exit status 1: 2020/02/12 00:21:35 socat[62892] E connect(5, AF=2 127.0.0.1:9876, 16): Connection refused
[must-gather-zkpdm] POD E0212 00:21:35.468147      48 portforward.go:400] an error occurred forwarding 37587 -> 9876: error forwarding port 9876 to pod 3c72c893b395ee6f72ba9b8cb489ce07fdf84515b8111317e55319aecc24718e, uid : exit status 1: 2020/02/12 00:21:35 socat[62899] E connect(5, AF=2 127.0.0.1:9876, 16): Connection refused
[must-gather-zkpdm] POD E0212 00:21:35.605111      48 portforward.go:400] an error occurred forwarding 37587 -> 9876: error forwarding port 9876 to pod 3c72c893b395ee6f72ba9b8cb489ce07fdf84515b8111317e55319aecc24718e, uid : exit status 1: 2020/02/12 00:21:35 socat[62922] E connect(5, AF=2 127.0.0.1:9876, 16): Connection refused
[must-gather-zkpdm] OUT gather logs unavailable: unexpected EOF
[must-gather-zkpdm] OUT waiting for gather to complete
[must-gather-zkpdm] OUT gather never finished: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-wzs4k deleted
[must-gather      ] OUT namespace/openshift-must-gather-lvwx7 deleted
error: gather never finished for pod must-gather-zkpdm: timed out waiting for the condition
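Since must-gather timed out before collecting logs, a fallback is to pull logs from the affected namespace directly. This is a sketch using standard `oc` commands, not steps actually taken in this report:

```shell
# must-gather never finished, so inspect the namespace from the alerts
# directly. Standard oc commands; the deployment name matches the
# operator name shown in the alert messages.
oc -n openshift-machine-api get pods
oc -n openshift-machine-api logs deployment/cluster-autoscaler-operator
```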
Created attachment 1662543 [details] TargetDown alert
Created attachment 1662544 [details] ClusterAutoscalerOperatorDown alert
# oc version
Client Version: openshift-clients-4.3.0-201910250623-88-g6a937dfe
Server Version: 4.3.0
Kubernetes Version: v1.16.2
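To confirm what the two alerts describe, one could check whether the operator pod is running and whether its monitoring objects still exist. A hypothetical sketch; the `k8s-app` label selector is an assumption about the deployment's labels, not taken from this report:

```shell
# Is the cluster-autoscaler-operator pod up? (label selector assumed)
oc -n openshift-machine-api get pods -l k8s-app=cluster-autoscaler-operator

# Does Prometheus still have a ServiceMonitor to discover it with?
oc -n openshift-machine-api get servicemonitors
```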
Looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1801300
Closing this, as I was able to work around the issue [1] and there is an existing BZ covering it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1801300#c18

*** This bug has been marked as a duplicate of bug 1801300 ***