Description of problem:
After upgrading a cluster from 4.2.16 to 4.3.0, the following alerts start firing in Alertmanager:
-----------------------------------------------------------
{"__name__":"ALERTS","alertname":"ClusterAutoscalerOperatorDown","alertstate":"firing","severity":"critical"}
{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"cluster-autoscaler-operator","namespace":"openshift-machine-api","service":"cluster-autoscaler-operator","severity":"warning"}
{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"metrics","namespace":"openshift-ingress-operator","service":"metrics","severity":"warning"}
{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"Go-http-client/2.0","code":"200","component":"apiserver","contentType":"application/json","endpoint":"https","group":"extensions","instance":"10.0.138.104:6443","job":"apiserver","namespace":"default","resource":"deployments","scope":"namespace","service":"kubernetes","severity":"warning","verb":"GET","version":"v1beta1"}
{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"velero-server/v1.1.0 (linux/amd64) a357f21aec6b39a8244dd23e469cc4519f1fe608","code":"200","component":"apiserver","contentType":"application/json","endpoint":"https","group":"extensions","instance":"10.0.138.104:6443","job":"apiserver","namespace":"default","resource":"podsecuritypolicies","scope":"cluster","service":"kubernetes","severity":"warning","verb":"LIST","version":"v1beta1"}
-----------------------------------------------------------
Note that these alerts do not appear on a freshly installed 4.3.0 cluster.

Version-Release number of selected component (if applicable):
4.3.0

How reproducible:
Every time we upgrade.
This is the internal issue OSD is tracking: https://issues.redhat.com/browse/OSD-2861
After upgrading to 4.3, we observed the same alerts on the Starter clusters as well. The two that seem most concerning are:

Critical alert is firing: {u'state': u'firing', u'labels': {u'severity': u'critical', u'alertname': u'ClusterAutoscalerOperatorDown'}, u'annotations': {u'message': u'cluster-autoscaler-operator has disappeared from Prometheus target discovery.'}, u'value': u'1e+00', u'activeAt': u'2020-01-28T23:29:41.800595892Z'}

Alert is firing: {u'state': u'firing', u'labels': {u'job': u'cluster-autoscaler-operator', u'namespace': u'openshift-machine-api', u'alertname': u'TargetDown', u'service': u'cluster-autoscaler-operator', u'severity': u'warning'}, u'annotations': {u'message': u'100% of the cluster-autoscaler-operator targets in openshift-machine-api namespace are down.'}, u'value': u'1e+02', u'activeAt': u'2020-01-28T23:29:30.163677339Z'}
Following up here. These are the two bugs causing the issues:
Autoscaler: https://bugzilla.redhat.com/show_bug.cgi?id=1801300
Ingress Operator: https://bugzilla.redhat.com/show_bug.cgi?id=1802248
Closing, as discussed with Matt, because bugs are already open for all of these issues.