Bug 1801960 - [Upgrade] ClusterAutoscalerOperatorDown shows down after upgrade from 4.2.16 to 4.3.0
Keywords:
Status: CLOSED DUPLICATE of bug 1801300
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Alberto
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-12 00:34 UTC by Sam Yangsao
Modified: 2020-02-13 14:50 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-13 14:50:59 UTC
Target Upstream Version:
Embargoed:


Attachments
TargetDown alert (121.90 KB, image/png)
2020-02-12 00:35 UTC, Sam Yangsao
ClusterAutoscalerOperatorDown alert (123.66 KB, image/png)
2020-02-12 00:35 UTC, Sam Yangsao

Description Sam Yangsao 2020-02-12 00:34:25 UTC
Description of problem:

ClusterAutoscalerOperatorDown shows down after upgrade from 4.2.16 to 4.3.0

Version-Release number of the following components:

4.3.0

How reproducible:

Unsure

Steps to Reproduce:

1.  Install OCP 4.2.9 on 3 masters & 2 workers
2.  Upgrade successfully to OCP 4.2.16, no issues
3.  Upgrade to OCP 4.3.0

Actual results:

UI shows 2 errors (see screenshots).

# failure1.png
Name: TargetDown
Message: 100% of the cluster-autoscaler-operator targets in openshift-machine-api namespace are down.

# failure2.png
Name: ClusterAutoscalerOperatorDown
Message: cluster-autoscaler-operator has disappeared from Prometheus target discovery.

Expected results:

There should be no issues upgrading from a clean environment.

Additional info:

must-gather fails as well.

# oc adm must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fa97dba5c09052c95c1f6471eb6a917cf18028368b2ce66dd95907aaa7bda4ac
[must-gather      ] OUT namespace/openshift-must-gather-lvwx7 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-wzs4k created
[must-gather      ] OUT pod for plug-in image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fa97dba5c09052c95c1f6471eb6a917cf18028368b2ce66dd95907aaa7bda4ac created
[must-gather-zkpdm] POD Wrote inspect data to must-gather.
[must-gather-zkpdm] POD Gathering data for ns/openshift-cluster-version...
[must-gather-zkpdm] POD Wrote inspect data to must-gather.
[must-gather-zkpdm] POD Gathering data for ns/openshift-config...
[must-gather-zkpdm] POD Gathering data for ns/openshift-config-managed...
[must-gather-zkpdm] POD Gathering data for ns/openshift-authentication...
[must-gather-zkpdm] POD Gathering data for ns/openshift-authentication-operator...
[must-gather-zkpdm] POD Gathering data for ns/openshift-ingress...
[must-gather-zkpdm] POD Gathering data for ns/openshift-cloud-credential-operator...
[must-gather-zkpdm] POD E0212 00:21:35.201118      48 portforward.go:400] an error occurred forwarding 37587 -> 9876: error forwarding port 9876 to pod 3c72c893b395ee6f72ba9b8cb489ce07fdf84515b8111317e55319aecc24718e, uid : exit status 1: 2020/02/12 00:21:35 socat[62883] E connect(5, AF=2 127.0.0.1:9876, 16): Connection refused
[must-gather-zkpdm] POD E0212 00:21:35.357299      48 portforward.go:400] an error occurred forwarding 37587 -> 9876: error forwarding port 9876 to pod 3c72c893b395ee6f72ba9b8cb489ce07fdf84515b8111317e55319aecc24718e, uid : exit status 1: 2020/02/12 00:21:35 socat[62892] E connect(5, AF=2 127.0.0.1:9876, 16): Connection refused
[must-gather-zkpdm] POD E0212 00:21:35.468147      48 portforward.go:400] an error occurred forwarding 37587 -> 9876: error forwarding port 9876 to pod 3c72c893b395ee6f72ba9b8cb489ce07fdf84515b8111317e55319aecc24718e, uid : exit status 1: 2020/02/12 00:21:35 socat[62899] E connect(5, AF=2 127.0.0.1:9876, 16): Connection refused
[must-gather-zkpdm] POD E0212 00:21:35.605111      48 portforward.go:400] an error occurred forwarding 37587 -> 9876: error forwarding port 9876 to pod 3c72c893b395ee6f72ba9b8cb489ce07fdf84515b8111317e55319aecc24718e, uid : exit status 1: 2020/02/12 00:21:35 socat[62922] E connect(5, AF=2 127.0.0.1:9876, 16): Connection refused
[must-gather-zkpdm] OUT gather logs unavailable: unexpected EOF
[must-gather-zkpdm] OUT waiting for gather to complete
[must-gather-zkpdm] OUT gather never finished: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-wzs4k deleted
[must-gather      ] OUT namespace/openshift-must-gather-lvwx7 deleted
error: gather never finished for pod must-gather-zkpdm: timed out waiting for the condition
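
The port-forward errors in the log above all share one shape: a refused connection to the same port on the same pod. The following is a minimal sketch, assuming the must-gather output has been saved as plain text, of how those lines could be tallied when triaging a longer log. The function name `count_refused_forwards` and the regular expression are illustrative assumptions and are not part of `oc` or any OpenShift tooling.

```python
import re
from collections import Counter

# Assumed pattern: mirrors the "error forwarding port N to pod <id> ...
# Connection refused" lines seen in the log above. Other oc versions may
# word this message differently.
PORT_FORWARD_ERROR = re.compile(
    r"error forwarding port (?P<port>\d+) to pod (?P<pod>[0-9a-f]+).*Connection refused"
)


def count_refused_forwards(log_text: str) -> Counter:
    """Count refused port-forward attempts per (port, shortened pod ID)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = PORT_FORWARD_ERROR.search(line)
        if match:
            # Keep only the first 12 characters of the pod ID for readability.
            counts[(int(match["port"]), match["pod"][:12])] += 1
    return counts
```

Applied to the output above, a single key with a count greater than one would indicate that every failure involves the same port and pod rather than several unrelated endpoints.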

Comment 1 Sam Yangsao 2020-02-12 00:35:10 UTC
Created attachment 1662543 [details]
TargetDown alert

Comment 2 Sam Yangsao 2020-02-12 00:35:50 UTC
Created attachment 1662544 [details]
ClusterAutoscalerOperatorDown alert

Comment 3 Sam Yangsao 2020-02-12 00:36:43 UTC
# oc version
Client Version: openshift-clients-4.3.0-201910250623-88-g6a937dfe
Server Version: 4.3.0
Kubernetes Version: v1.16.2

Comment 5 Abhinav Dahiya 2020-02-12 17:40:43 UTC
Looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1801300

Comment 6 Sam Yangsao 2020-02-13 14:50:59 UTC
Closing this, as I was able to work around the issue [1] and an existing BZ already covers it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1801300#c18

*** This bug has been marked as a duplicate of bug 1801300 ***

