1802214 – Upgrading from 4.2 to 4.3 creates new alerting issues

Summary: Upgrading from 4.2 to 4.3 creates new alerting issues

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Sergiusz Urbaniak
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-02-12 15:53 UTC by Matt Woodson
Modified:	2020-02-13 09:20 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-02-13 09:20:46 UTC
Target Upstream Version:
Embargoed:

Description Matt Woodson 2020-02-12 15:53:26 UTC

Description of problem:

After upgrading a cluster from 4.2.16 to 4.3.0, certain alerts start firing from alertmanager.

-----------------------------------------------------------
{"__name__":"ALERTS","alertname":"ClusterAutoscalerOperatorDown","alertstate":"firing","severity":"critical"}
{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"cluster-autoscaler-operator","namespace":"openshift-machine-api","service":"cluster-autoscaler-operator","severity":"warning"}
{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"metrics","namespace":"openshift-ingress-operator","service":"metrics","severity":"warning"}
{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"Go-http-client/2.0","code":"200","component":"apiserver","contentType":"application/json","endpoint":"https","group":"extensions","instance":"10.0.138.104:6443","job":"apiserver","namespace":"default","resource":"deployments","scope":"namespace","service":"kubernetes","severity":"warning","verb":"GET","version":"v1beta1"}
{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"velero-server/v1.1.0 (linux/amd64) a357f21aec6b39a8244dd23e469cc4519f1fe608","code":"200","component":"apiserver","contentType":"application/json","endpoint":"https","group":"extensions","instance":"10.0.138.104:6443","job":"apiserver","namespace":"default","resource":"podsecuritypolicies","scope":"cluster","service":"kubernetes","severity":"warning","verb":"LIST","version":"v1beta1"}

-----------------------------------------------------------
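For triage, the label sets above can be grouped by alert name and severity. The following is a minimal sketch, assuming the dump is available as one JSON object per line; the helper name `summarize_firing` is hypothetical and the two sample lines are copied from the output above.

```python
import json
from collections import Counter

# Two label sets copied from the alert dump above, one JSON object per line.
lines = [
    '{"__name__":"ALERTS","alertname":"ClusterAutoscalerOperatorDown","alertstate":"firing","severity":"critical"}',
    '{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"metrics","namespace":"openshift-ingress-operator","service":"metrics","severity":"warning"}',
]


def summarize_firing(raw_lines):
    """Count firing alerts per (alertname, severity) pair (hypothetical helper)."""
    counts = Counter()
    for raw in raw_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between entries
        labels = json.loads(raw)
        if labels.get("alertstate") != "firing":
            continue  # ignore pending or otherwise non-firing entries
        key = (labels["alertname"], labels.get("severity", "unknown"))
        counts[key] += 1
    return counts


summary = summarize_firing(lines)
for (name, severity), n in sorted(summary.items()):
    print(f"{severity:8} {name} x{n}")
```

Running the same helper over dumps taken before and after the upgrade would show which alert names are new in 4.3.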

One thing to note is that when installing a fresh 4.3.0 cluster, we do not see these alerts.


Version-Release number of selected component (if applicable):

4.3.0


How reproducible:

Every time we upgrade

Comment 1 Matt Woodson 2020-02-12 15:55:24 UTC

This is the internal issue OSD is tracking:

https://issues.redhat.com/browse/OSD-2861

Comment 5 brad.williams 2020-02-12 16:15:11 UTC

After upgrading to 4.3, we observed these same alerts in the Starter clusters as well. 

The ones that seem most concerning are these 2:

Critical alert is firing: {u'state': u'firing', u'labels': {u'severity': u'critical', u'alertname': u'ClusterAutoscalerOperatorDown'}, u'annotations': {u'message': u'cluster-autoscaler-operator has disappeared from Prometheus target discovery.'}, u'value': u'1e+00', u'activeAt': u'2020-01-28T23:29:41.800595892Z'}

Alert is firing: {u'state': u'firing', u'labels': {u'job': u'cluster-autoscaler-operator', u'namespace': u'openshift-machine-api', u'alertname': u'TargetDown', u'service': u'cluster-autoscaler-operator', u'severity': u'warning'}, u'annotations': {u'message': u'100% of the cluster-autoscaler-operator targets in openshift-machine-api namespace are down.'}, u'value': u'1e+02', u'activeAt': u'2020-01-28T23:29:30.163677339Z'}

Comment 6 Matt Woodson 2020-02-12 17:58:40 UTC

Following up here.

Here are the two bugs that are causing the issues:

Autoscaler:

https://bugzilla.redhat.com/show_bug.cgi?id=1801300

Ingress Operator:

https://bugzilla.redhat.com/show_bug.cgi?id=1802248

Comment 7 Lili Cosic 2020-02-13 09:20:46 UTC

Closing as discussed with Matt, as all the bugzillas are already open.
