Bug 1802214 - Upgrading from 4.2 to 4.3 creates new alerting issues
Summary: Upgrading from 4.2 to 4.3 creates new alerting issues
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-02-12 15:53 UTC by Matt Woodson
Modified: 2020-02-13 09:20 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-13 09:20:46 UTC
Target Upstream Version:
Embargoed:



Description Matt Woodson 2020-02-12 15:53:26 UTC
Description of problem:

After upgrading a cluster from 4.2.16 to 4.3.0, the following alerts start firing in Alertmanager:

-----------------------------------------------------------
{"__name__":"ALERTS","alertname":"ClusterAutoscalerOperatorDown","alertstate":"firing","severity":"critical"}
{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"cluster-autoscaler-operator","namespace":"openshift-machine-api","service":"cluster-autoscaler-operator","severity":"warning"}
{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"metrics","namespace":"openshift-ingress-operator","service":"metrics","severity":"warning"}
{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"Go-http-client/2.0","code":"200","component":"apiserver","contentType":"application/json","endpoint":"https","group":"extensions","instance":"10.0.138.104:6443","job":"apiserver","namespace":"default","resource":"deployments","scope":"namespace","service":"kubernetes","severity":"warning","verb":"GET","version":"v1beta1"}
{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"velero-server/v1.1.0 (linux/amd64) a357f21aec6b39a8244dd23e469cc4519f1fe608","code":"200","component":"apiserver","contentType":"application/json","endpoint":"https","group":"extensions","instance":"10.0.138.104:6443","job":"apiserver","namespace":"default","resource":"podsecuritypolicies","scope":"cluster","service":"kubernetes","severity":"warning","verb":"LIST","version":"v1beta1"}

-----------------------------------------------------------
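For triage, a dump like the one above can be grouped by alert name and severity. This is only a minimal sketch in Python, assuming the series are available as one JSON object per line; the helper name `summarize_alerts` is hypothetical and not part of any OpenShift or Prometheus tooling, and the sample lines are abbreviated copies of the series above.

```python
import json
from collections import Counter


def summarize_alerts(lines):
    """Count firing ALERTS series per (alertname, severity).

    Hypothetical helper: expects one JSON object per line, as in the
    dump above. Separator lines and blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON series line
        labels = json.loads(line)
        if labels.get("alertstate") != "firing":
            continue
        key = (labels["alertname"], labels.get("severity", "unknown"))
        counts[key] += 1
    return counts


# Abbreviated sample taken from the dump above (most labels omitted).
sample = [
    "-----------------------------------------------------------",
    '{"__name__":"ALERTS","alertname":"ClusterAutoscalerOperatorDown","alertstate":"firing","severity":"critical"}',
    '{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"cluster-autoscaler-operator","severity":"warning"}',
    '{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"metrics","severity":"warning"}',
    "",
]

summary = summarize_alerts(sample)
for (name, severity), count in sorted(summary.items()):
    print(f"{severity:8} {name}: {count}")
```

Grouping on (alertname, severity) keeps the critical ClusterAutoscalerOperatorDown alert separate from the warning-level TargetDown series, which is the distinction the comments below rely on.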

One thing to note is that when installing a fresh 4.3.0 cluster, we do not see these alerts.


Version-Release number of selected component (if applicable):

4.3.0


How reproducible:

Every time we upgrade

Comment 1 Matt Woodson 2020-02-12 15:55:24 UTC
This is the internal issue OSD is tracking:

https://issues.redhat.com/browse/OSD-2861

Comment 5 brad.williams 2020-02-12 16:15:11 UTC
After upgrading to 4.3, we observed these same alerts in the Starter clusters as well. 

The ones that seem most concerning are these 2:

Critical alert is firing: {u'state': u'firing', u'labels': {u'severity': u'critical', u'alertname': u'ClusterAutoscalerOperatorDown'}, u'annotations': {u'message': u'cluster-autoscaler-operator has disappeared from Prometheus target discovery.'}, u'value': u'1e+00', u'activeAt': u'2020-01-28T23:29:41.800595892Z'}

Alert is firing: {u'state': u'firing', u'labels': {u'job': u'cluster-autoscaler-operator', u'namespace': u'openshift-machine-api', u'alertname': u'TargetDown', u'service': u'cluster-autoscaler-operator', u'severity': u'warning'}, u'annotations': {u'message': u'100% of the cluster-autoscaler-operator targets in openshift-machine-api namespace are down.'}, u'value': u'1e+02', u'activeAt': u'2020-01-28T23:29:30.163677339Z'}
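The two alert payloads above are printed as Python 2 dict reprs (note the u'' prefixes), so they are not valid JSON. A minimal sketch of reading one back in Python 3, assuming the payload starts at the first `{` on the line; `parse_alert_repr` is a hypothetical helper and the sample string is an abbreviated copy of the TargetDown alert above.

```python
import ast


def parse_alert_repr(text):
    """Parse a logged alert dict repr into a Python dict.

    Hypothetical helper: drops the free-text prefix (for example
    "Alert is firing: ") and evaluates the remaining literal safely.
    Python 3 still accepts the u'' string prefix, so literal_eval works.
    """
    start = text.index("{")
    return ast.literal_eval(text[start:])


# Abbreviated sample based on the TargetDown alert above.
sample = (
    "Alert is firing: {u'state': u'firing', "
    "u'labels': {u'alertname': u'TargetDown', u'severity': u'warning'}, "
    "u'value': u'1e+02'}"
)

alert = parse_alert_repr(sample)
percent_down = float(alert["value"])  # the value is logged as a string
print(alert["labels"]["alertname"], alert["state"], percent_down)
```

`ast.literal_eval` is used instead of `eval` because it only accepts literals, which is the safer choice for text copied out of logs.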

Comment 6 Matt Woodson 2020-02-12 17:58:40 UTC
Following up here.

Here are the two bugs that are causing the issues:

Autoscaler:

https://bugzilla.redhat.com/show_bug.cgi?id=1801300

Ingress Operator:

https://bugzilla.redhat.com/show_bug.cgi?id=1802248

Comment 7 Lili Cosic 2020-02-13 09:20:46 UTC
Closing as discussed with Matt, since separate Bugzilla bugs are already open for the underlying issues.

