Bug 1802214
| Summary: | Upgrading from 4.2 to 4.3 creates new alerting issues | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Matt Woodson <mwoodson> |
| Component: | Monitoring | Assignee: | Sergiusz Urbaniak <surbania> |
| Status: | CLOSED NOTABUG | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.3.0 | CC: | alegrand, anpicker, brad.williams, eparis, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-02-13 09:20:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| Embargoed: | |||
This is the internal issue that OSD is tracking: https://issues.redhat.com/browse/OSD-2861
After upgrading to 4.3, we observed these same alerts in the Starter clusters as well.
The two that seem most concerning are:
Critical alert is firing: {u'state': u'firing', u'labels': {u'severity': u'critical', u'alertname': u'ClusterAutoscalerOperatorDown'}, u'annotations': {u'message': u'cluster-autoscaler-operator has disappeared from Prometheus target discovery.'}, u'value': u'1e+00', u'activeAt': u'2020-01-28T23:29:41.800595892Z'}
Alert is firing: {u'state': u'firing', u'labels': {u'job': u'cluster-autoscaler-operator', u'namespace': u'openshift-machine-api', u'alertname': u'TargetDown', u'service': u'cluster-autoscaler-operator', u'severity': u'warning'}, u'annotations': {u'message': u'100% of the cluster-autoscaler-operator targets in openshift-machine-api namespace are down.'}, u'value': u'1e+02', u'activeAt': u'2020-01-28T23:29:30.163677339Z'}
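The alert payloads above are printed as Python 2 style dict reprs (note the u'' prefixes), so they are not valid JSON. As a minimal sketch for anyone triaging similar output, the snippet below reads one of them with `ast.literal_eval`; the helper name `summarize_alert` is hypothetical and not part of any monitoring tooling.

```python
import ast

# Payload copied verbatim from the critical alert above (Python 2 style repr).
raw = (
    "{u'state': u'firing', u'labels': {u'severity': u'critical', "
    "u'alertname': u'ClusterAutoscalerOperatorDown'}, "
    "u'annotations': {u'message': u'cluster-autoscaler-operator has "
    "disappeared from Prometheus target discovery.'}, "
    "u'value': u'1e+00', u'activeAt': u'2020-01-28T23:29:41.800595892Z'}"
)


def summarize_alert(text):
    """Hypothetical helper: parse a dict repr and keep the triage-relevant fields."""
    # literal_eval accepts u'' string literals in Python 3 and never executes code.
    alert = ast.literal_eval(text)
    labels = alert["labels"]
    return {
        "alertname": labels["alertname"],
        "severity": labels["severity"],
        "state": alert["state"],
        "value": float(alert["value"]),
    }


summary = summarize_alert(raw)
print(summary)
```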
Following up here. These are the two bugs that are causing the issues:
Autoscaler: https://bugzilla.redhat.com/show_bug.cgi?id=1801300
Ingress Operator: https://bugzilla.redhat.com/show_bug.cgi?id=1802248
Closing as discussed with Matt, as all the Bugzillas are already open.
Description of problem:
After upgrading a cluster from 4.2.16 to 4.3.0, certain alerts start firing from Alertmanager.
-----------------------------------------------------------
{"__name__":"ALERTS","alertname":"ClusterAutoscalerOperatorDown","alertstate":"firing","severity":"critical"}
{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"cluster-autoscaler-operator","namespace":"openshift-machine-api","service":"cluster-autoscaler-operator","severity":"warning"}
{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"metrics","namespace":"openshift-ingress-operator","service":"metrics","severity":"warning"}
{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"Go-http-client/2.0","code":"200","component":"apiserver","contentType":"application/json","endpoint":"https","group":"extensions","instance":"10.0.138.104:6443","job":"apiserver","namespace":"default","resource":"deployments","scope":"namespace","service":"kubernetes","severity":"warning","verb":"GET","version":"v1beta1"}
{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"velero-server/v1.1.0 (linux/amd64) a357f21aec6b39a8244dd23e469cc4519f1fe608","code":"200","component":"apiserver","contentType":"application/json","endpoint":"https","group":"extensions","instance":"10.0.138.104:6443","job":"apiserver","namespace":"default","resource":"podsecuritypolicies","scope":"cluster","service":"kubernetes","severity":"warning","verb":"LIST","version":"v1beta1"}
-----------------------------------------------------------
One thing to note is that when installing a fresh 4.3.0 cluster, we do not see these alerts.

Version-Release number of selected component (if applicable):
4.3.0

How reproducible:
Every time we upgrade
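To compare the alert set before and after an upgrade, the ALERTS series in the description can be grouped by alert name. The following is a rough sketch only: it assumes the series have been saved one JSON object per line, the `SERIES` list abbreviates the label sets from this report (one of the two UsingDeprecatedAPIExtensionsV1Beta1 series is included, with most labels dropped), and `group_firing` is a hypothetical helper name.

```python
import json
from collections import defaultdict

# Abbreviated copies of the series from this report; labels not needed
# for grouping have been dropped from the last entry.
SERIES = [
    '{"__name__":"ALERTS","alertname":"ClusterAutoscalerOperatorDown","alertstate":"firing","severity":"critical"}',
    '{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"cluster-autoscaler-operator","namespace":"openshift-machine-api","service":"cluster-autoscaler-operator","severity":"warning"}',
    '{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"metrics","namespace":"openshift-ingress-operator","service":"metrics","severity":"warning"}',
    '{"__name__":"ALERTS","alertname":"UsingDeprecatedAPIExtensionsV1Beta1","alertstate":"firing","client":"Go-http-client/2.0","resource":"deployments","severity":"warning"}',
]


def group_firing(lines):
    """Hypothetical helper: map each firing alert name to the namespaces it fires in."""
    grouped = defaultdict(list)
    for line in lines:
        series = json.loads(line)
        # Skip anything that is not a firing ALERTS series.
        if series.get("__name__") != "ALERTS" or series.get("alertstate") != "firing":
            continue
        grouped[series["alertname"]].append(series.get("namespace", "(none)"))
    return dict(grouped)


grouped = group_firing(SERIES)
for name, namespaces in sorted(grouped.items()):
    print(name, len(namespaces), namespaces)
```

Running the same grouping against a freshly installed 4.3.0 cluster and against an upgraded one would make the difference noted in the description easy to see.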