Bug 2154250
| Summary: | NooBaa Bucket Quota alerts are not working | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Filip Balák <fbalak> |
| Component: | Multi-Cloud Object Gateway | Assignee: | Vinayak Hariharmath <vharihar> |
| Status: | CLOSED ERRATA | QA Contact: | Filip Balák <fbalak> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.12 | CC: | dzaken, ebenahar, muagarwa, nbecker, ocs-bugs, odf-bz-bot, vharihar |
| Target Milestone: | --- | Keywords: | AutomationBlocker |
| Target Release: | ODF 4.13.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.13.0-197 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-06-21 15:22:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Filip Balák
2022-12-16 11:22:42 UTC
Filip, is it a regression?

@nbecker What's the feasibility of fixing and backporting to an upcoming 4.12.z?

For a z? High feasibility :) Depending on timing, of course; wouldn't target this for 4.12.1, for example, but let's say 4.12.2, sure.

NooBaa bucket quota alerts are not raised: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7514/consoleFull

Tested with ocs-build 4.13.0-130

Hello Filip,

We need to modify the alert names in the quota tests according to https://github.com/noobaa/noobaa-operator/pull/1067 and https://github.com/noobaa/noobaa-operator/pull/1117

Regards,
Vinayak

No NooBaa alerts are raised with the given reproducer (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7984). Was there any change related to the RPC? What would be a valid test case to test the scenario?

Tested with ocs-registry:4.13.0-206

Hi Filip,

Did you modify the test according to comment 22?

Yes, the alert names were edited. We also gather all alerts raised from Prometheus during the time period of the test. This is the list of alerts for a test with OCS 4.13.0-207 (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/25106/). There are no NooBaa alerts:

[{'labels': {'alertname': 'InsightsRecommendationActive', 'container': 'insights-operator', 'description': 'Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated', 'endpoint': 'https', 'info_link': 'https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY', 'instance': '10.128.0.17:8443', 'job': 'metrics', 'namespace': 'openshift-insights', 'pod': 'insights-operator-f79f95cd7-qvsjw', 'service': 'metrics', 'severity': 'info', 'total_risk': 'Low'}, 'annotations': {'description': 'Insights recommendation "Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated" with total risk "Low" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY.', 'summary': 'An Insights recommendation is active for this cluster.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:50.82591032Z', 'value': '1e+00'}, {'labels': {'alertname': 'InsightsRecommendationActive', 'container': 'insights-operator', 'description': 'The Image Registry Operator fails to apply Image Registry configuration when multiple storage types are specified', 'endpoint': 'https', 'info_link': 'https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.image_registry_multiple_storage_types|IMAGE_REGISTRY_MULTIPLE_STORAGE_TYPES', 'instance': '10.128.0.17:8443', 'job': 'metrics', 'namespace': 'openshift-insights', 'pod': 'insights-operator-f79f95cd7-qvsjw', 'service': 'metrics', 'severity': 'info', 'total_risk': 'Moderate'}, 'annotations': {'description': 'Insights recommendation "The Image Registry Operator fails to apply Image Registry configuration when multiple storage types are specified" with total risk "Moderate" was detected on the cluster. 
More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.image_registry_multiple_storage_types|IMAGE_REGISTRY_MULTIPLE_STORAGE_TYPES.', 'summary': 'An Insights recommendation is active for this cluster.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:50.82591032Z', 'value': '1e+00'}, {'labels': {'alertname': 'AlertmanagerReceiversNotConfigured', 'namespace': 'openshift-monitoring', 'severity': 'warning'}, 'annotations': {'description': 'Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.', 'summary': 'Receivers (notification integrations) are not configured on Alertmanager'}, 'state': 'pending', 'activeAt': '2023-05-31T12:34:54.177208044Z', 'value': '0e+00'}, {'labels': {'alertname': 'Watchdog', 'namespace': 'openshift-monitoring', 'severity': 'none'}, 'annotations': {'description': 'This is an alert meant to ensure that the entire alerting pipeline is functional.\nThis alert is always firing, therefore it should always be firing in Alertmanager\nand always fire against a receiver. There are integrations with various notification\nmechanisms that send a notification when this alert is not firing. For example the\n"DeadMansSnitch" integration in PagerDuty.\n', 'summary': 'An alert that should always be firing to certify that Alertmanager is working properly.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:26.791051673Z', 'value': '1e+00'}, {'labels': {'alertname': 'AlertmanagerReceiversNotConfigured', 'namespace': 'openshift-monitoring', 'severity': 'warning'}, 'annotations': {'description': 'Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. 
Check the OpenShift documentation to learn how to configure notifications with Alertmanager.', 'summary': 'Receivers (notification integrations) are not configured on Alertmanager'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:54.177208044Z', 'value': '0e+00'}, {'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-storage', 'pod': 's3cli-0', 'severity': 'warning'}, 'annotations': {'description': 'Pod openshift-storage/s3cli-0 has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubeStatefulSetReplicasMismatch', 'container': 'kube-rbac-proxy-main', 'endpoint': 'https-main', 'job': 'kube-state-metrics', 'namespace': 'openshift-storage', 'service': 'kube-state-metrics', 'severity': 'warning', 'statefulset': 's3cli'}, 'annotations': {'description': 'StatefulSet openshift-storage/s3cli has not matched the expected number of replicas for longer than 15 minutes.', 'summary': 'Deployment has not matched the expected number of replicas.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '0e+00'}, {'labels': {'alertname': 'KubeContainerWaiting', 'container': 's3cli', 'namespace': 'openshift-storage', 'pod': 's3cli-0', 'severity': 'warning'}, 'annotations': {'description': 'pod/s3cli-0 in namespace openshift-storage on container s3cli has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubeContainerWaiting', 'container': 'collect-profiles', 'namespace': 'openshift-operator-lifecycle-manager', 'pod': 'collect-profiles-28092285-vmbr9', 'severity': 'warning'}, 'annotations': {'description': 'pod/collect-profiles-28092285-vmbr9 in namespace openshift-operator-lifecycle-manager on container collect-profiles has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:45:04.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-marketplace', 'pod': 'community-operators-v5cqb', 'severity': 'warning'}, 'annotations': {'description': 'Pod openshift-marketplace/community-operators-v5cqb has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:51:04.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubeContainerWaiting', 'container': 'registry-server', 'namespace': 'openshift-marketplace', 'pod': 'community-operators-v5cqb', 'severity': 'warning'}, 'annotations': {'description': 'pod/community-operators-v5cqb in namespace openshift-marketplace on container registry-server has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:51:04.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-marketplace', 'pod': 'redhat-operators-r4dw8', 'severity': 'warning'}, 'annotations': 
{'description': 'Pod openshift-marketplace/redhat-operators-r4dw8 has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:52:04.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubeContainerWaiting', 'container': 'registry-server', 'namespace': 'openshift-marketplace', 'pod': 'redhat-operators-r4dw8', 'severity': 'warning'}, 'annotations': {'description': 'pod/redhat-operators-r4dw8 in namespace openshift-marketplace on container registry-server has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:52:04.49920567Z', 'value': '1e+00'}]

Doc text is not required.

The collection of alerts from comment 26 took 14 minutes. During that time there should have been at least a pending alert.

Size quota alerts work when max-size is set via the noobaa CLI. --> VERIFIED

Tested with 4.13.0-0.nightly-2023-06-03-192019

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742
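For reference, below is a minimal sketch of the kind of alert check described in the comments above: it queries the alerts endpoint of the cluster's Prometheus/Thanos querier and keeps only NooBaa quota-related alerts. The route host, bearer token, and the name-based filter are placeholders and assumptions (the exact quota alert names follow the noobaa-operator changes in PR 1067 and PR 1117), not the actual ocs-ci test code.

```python
# Minimal sketch (not the actual ocs-ci test): list currently pending/firing
# alerts from OpenShift monitoring and keep only NooBaa quota-related ones.
# The route host and bearer token are placeholders and must be adapted to the
# cluster under test (e.g. token from `oc whoami -t`).
import requests

PROM_HOST = "https://thanos-querier-openshift-monitoring.apps.example.com"  # assumed route
TOKEN = "REPLACE_WITH_BEARER_TOKEN"


def get_noobaa_quota_alerts():
    """Return alerts whose alertname starts with 'NooBaa' and contains 'Quota'."""
    resp = requests.get(
        f"{PROM_HOST}/api/v1/alerts",
        headers={"Authorization": f"Bearer {TOKEN}"},
        verify=False,  # test clusters often use self-signed certificates
    )
    resp.raise_for_status()
    alerts = resp.json()["data"]["alerts"]
    return [
        a
        for a in alerts
        if a["labels"].get("alertname", "").startswith("NooBaa")
        and "Quota" in a["labels"].get("alertname", "")
    ]


if __name__ == "__main__":
    for alert in get_noobaa_quota_alerts():
        print(alert["labels"]["alertname"], alert["state"])
```

If the quota alerts are working, this list should contain at least one pending or firing entry once a bucket approaches or exceeds its configured quota.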
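Similarly, a rough sketch of the reproducer step of pushing a quota-limited bucket over its max-size through the NooBaa S3 endpoint. The endpoint URL, credentials, bucket name, and data sizes are placeholders, and it assumes a 1 GiB max-size quota has already been applied to the bucket (for example via the noobaa CLI, as in the verification above).

```python
# Rough sketch (placeholders and assumptions noted above): write enough objects
# to a quota-limited NooBaa bucket to first approach and then exceed its
# max-size, which is the condition under which the quota alerts should fire.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-openshift-storage.apps.example.com",  # placeholder S3 route
    aws_access_key_id="REPLACE_ME",
    aws_secret_access_key="REPLACE_ME",
    verify=False,  # test clusters often use self-signed certificates
)

BUCKET = "quota-test-bucket"        # placeholder: bucket with an assumed 1 GiB max-size quota
CHUNK = b"0" * (100 * 1024 * 1024)  # 100 MiB per object

# ~1.1 GiB in total: crosses the assumed 1 GiB quota on the last uploads.
for i in range(11):
    s3.put_object(Bucket=BUCKET, Key=f"quota-object-{i}", Body=CHUNK)
```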