Bug 2154250 - NooBaa Bucket Quota alerts are not working
Summary: NooBaa Bucket Quota alerts are not working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.13.0
Assignee: Vinayak Hariharmath
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-12-16 11:22 UTC by Filip Balák
Modified: 2023-09-26 11:44 UTC
CC List: 7 users

Fixed In Version: 4.13.0-197
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-21 15:22:55 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | noobaa/noobaa-operator pull 1067 | 0 | None | Merged | Noobaa bucket quota alerts are not working | 2023-03-06 13:57:44 UTC
Github | noobaa/noobaa-operator pull 1117 | 0 | None | Merged | Adding quantity quota alerts | 2023-05-11 11:26:23 UTC
Github | noobaa/noobaa-operator pull 1125 | 0 | None | Merged | changed the description of quota alerts to align with mixins | 2023-05-09 10:03:08 UTC
Github | noobaa/noobaa-operator pull 1126 | 0 | None | Merged | [Backport to 5.13] - Adding quantity quota alerts | 2023-05-11 11:26:20 UTC
Github | noobaa/noobaa-operator pull 1127 | 0 | None | Merged | Remove unwanted character "\" from the description | 2023-05-11 11:28:02 UTC
Red Hat Product Errata | RHBA-2023:3742 | 0 | None | None | None | 2023-06-21 15:23:42 UTC

Description Filip Balák 2022-12-16 11:22:42 UTC
Description of problem (please be as detailed as possible and provide log snippets):
Alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState are not triggered when a NooBaa bucket is fully utilized and its quota is reached.

Version of all relevant components (if applicable):
ocs-registry:4.12.0-114

Is this issue reproducible?
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6998/291593/291682/291683/log
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6898/286901/286995/286996/log
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6545/269251/269322/269323/log

Steps to Reproduce:
1. Create a bucket in NooBaa and set its capacity quota to 2 GB (RPC call {'name': '<bucket-name>', 'quota': {'unit': 'GIGABYTE', 'size': 2}}).
2. Upload 5 files of 500 MB each into the bucket (a sketch of these steps follows the list).
3. Check alerts in ODF Monitoring.
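
A minimal sketch of steps 1-2 in Python, assuming a reachable MCG S3 endpoint; the endpoint URL, credentials, and bucket name below are placeholders, and the quota RPC itself is sent out of band (through the NooBaa management API or CLI) with exactly the payload quoted in step 1:

import boto3

# Placeholders -- substitute the S3 route and credentials of your MCG deployment.
s3 = boto3.resource(
    "s3",
    endpoint_url="https://s3-openshift-storage.apps.example.com",
    aws_access_key_id="<NOOBAA_ACCESS_KEY>",
    aws_secret_access_key="<NOOBAA_SECRET_KEY>",
    verify=False,
)

bucket_name = "quota-test-bucket"
s3.create_bucket(Bucket=bucket_name)

# Step 2: upload 5 x 500 MB objects, taking the bucket past its 2 GB quota.
payload = b"\0" * (500 * 1024 * 1024)  # 500 MB of zeros
for i in range(5):
    s3.Object(bucket_name, f"object-{i}").put(Body=payload)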

Actual results:
Alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState are not triggered.

Expected results:
Alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState should be triggered.

Additional info:
Tested with automation: https://github.com/red-hat-storage/ocs-ci/blob/b28dfcd0e3f7fcf624d6590dd40255821058fbf7/tests/manage/monitoring/prometheus/test_noobaa.py#L21

Comment 3 Elad 2022-12-20 13:49:00 UTC
Filip, is it a regression?

Comment 10 Elad 2023-01-12 10:01:39 UTC
@nbecker What's the feasibility of fixing and backporting to an upcoming 4.12.z?

Comment 11 Nimrod Becker 2023-01-12 11:52:14 UTC
For a z-stream? High feasibility :)
Depending on timing, of course; I wouldn't target this for 4.12.1, for example, but for 4.12.2, sure.

Comment 21 Filip Balák 2023-04-28 09:46:20 UTC
NooBaa bucket quota alerts are not raised: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7514/consoleFull
Tested with ocs-build 4.13.0-130

Comment 22 Vinayak Hariharmath 2023-05-03 10:16:01 UTC
Hello Filip,

We need to modify the alert names in the quota tests according to https://github.com/noobaa/noobaa-operator/pull/1067 and https://github.com/noobaa/noobaa-operator/pull/1117

Regards
Vinayak
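
A quick way to discover the renamed quota alert rules without hard-coding them is to list the alerting rules from the in-cluster Prometheus API. A sketch in Python, where the thanos-querier route and bearer token are placeholders:

import requests

PROM = "https://thanos-querier-openshift-monitoring.apps.example.com"  # placeholder route
TOKEN = "<BEARER_TOKEN>"  # e.g. the output of 'oc whoami -t'

resp = requests.get(
    f"{PROM}/api/v1/rules",
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,
)
resp.raise_for_status()

# Print every alerting rule whose name mentions "Quota"; with the two PRs above
# merged, the NooBaa quota alerts should appear here under their new names.
for group in resp.json()["data"]["groups"]:
    for rule in group.get("rules", []):
        if rule.get("type") == "alerting" and "Quota" in rule.get("name", ""):
            print(group["name"], rule["name"], rule.get("state"))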

Comment 24 Filip Balák 2023-05-31 10:15:43 UTC
No noobaa alerts are raised with the given reproducer (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7984). Was there any change related to the RPC? What would be a valid test case for this scenario?

Tested with ocs-registry:4.13.0-206

Comment 25 Danny 2023-05-31 14:13:35 UTC
Hi Filip,

Did you modify the test according to comment 22?

Comment 26 Filip Balák 2023-06-01 08:15:18 UTC
Yes, the alert names are edited. We also gather all alerts raised by Prometheus during the time period of the test. This is the list of alerts for a test with OCS 4.13.0-207 (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/25106/). There are no noobaa alerts:

- InsightsRecommendationActive (firing, severity info, total risk Low, namespace openshift-insights, active since 2023-05-31T12:34:50Z): Insights recommendation "Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated".
- InsightsRecommendationActive (firing, severity info, total risk Moderate, namespace openshift-insights, active since 2023-05-31T12:34:50Z): Insights recommendation "The Image Registry Operator fails to apply Image Registry configuration when multiple storage types are specified".
- AlertmanagerReceiversNotConfigured (pending, severity warning, namespace openshift-monitoring, active since 2023-05-31T12:34:54Z): receivers (notification integrations) are not configured on Alertmanager.
- Watchdog (firing, severity none, namespace openshift-monitoring, active since 2023-05-31T12:34:26Z): an alert that should always be firing to certify that Alertmanager is working properly.
- AlertmanagerReceiversNotConfigured (firing, severity warning, namespace openshift-monitoring, active since 2023-05-31T12:34:54Z): receivers (notification integrations) are not configured on Alertmanager.
- KubePodNotReady (pending, severity warning, pod openshift-storage/s3cli-0, active since 2023-05-31T12:44:34Z): pod has been in a non-ready state for longer than 15 minutes.
- KubeStatefulSetReplicasMismatch (pending, severity warning, statefulset openshift-storage/s3cli, active since 2023-05-31T12:44:34Z): StatefulSet has not matched the expected number of replicas for longer than 15 minutes.
- KubeContainerWaiting (pending, severity warning, container s3cli of pod openshift-storage/s3cli-0, active since 2023-05-31T12:44:34Z): container has been in a waiting state for longer than 1 hour.
- KubeContainerWaiting (pending, severity warning, container collect-profiles of pod openshift-operator-lifecycle-manager/collect-profiles-28092285-vmbr9, active since 2023-05-31T12:45:04Z): container has been in a waiting state for longer than 1 hour.
- KubePodNotReady (pending, severity warning, pod openshift-marketplace/community-operators-v5cqb, active since 2023-05-31T12:51:04Z): pod has been in a non-ready state for longer than 15 minutes.
- KubeContainerWaiting (pending, severity warning, container registry-server of pod openshift-marketplace/community-operators-v5cqb, active since 2023-05-31T12:51:04Z): container has been in a waiting state for longer than 1 hour.
- KubePodNotReady (pending, severity warning, pod openshift-marketplace/redhat-operators-r4dw8, active since 2023-05-31T12:52:04Z): pod has been in a non-ready state for longer than 15 minutes.
- KubeContainerWaiting (pending, severity warning, container registry-server of pod openshift-marketplace/redhat-operators-r4dw8, active since 2023-05-31T12:52:04Z): container has been in a waiting state for longer than 1 hour.

Comment 29 Mudit Agarwal 2023-06-02 11:15:14 UTC
Doc text is not required

Comment 30 Filip Balák 2023-06-02 11:51:45 UTC
The collection of alerts from comment 26 took 14 minutes. During that time there should have been at least a pending alert.
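
The collection described here can be approximated by polling the Prometheus alerts endpoint for the duration of the test and keeping the union of everything seen, pending alerts included. A sketch with placeholder route and token:

import time
import requests

PROM = "https://thanos-querier-openshift-monitoring.apps.example.com"  # placeholder
HEADERS = {"Authorization": "Bearer <BEARER_TOKEN>"}  # placeholder token

seen = set()
deadline = time.time() + 14 * 60  # the 14-minute window mentioned above
while time.time() < deadline:
    data = requests.get(f"{PROM}/api/v1/alerts", headers=HEADERS, verify=False).json()
    for alert in data["data"]["alerts"]:
        # "pending" alerts are reported by this endpoint too, so a quota alert
        # that never leaves the pending state would still be captured.
        seen.add((alert["labels"]["alertname"], alert["state"]))
    time.sleep(30)

print(sorted(name for name, _ in seen if name.startswith("NooBaa")))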

Comment 33 Filip Balák 2023-06-06 11:14:15 UTC
Size quota alerts work when max-size is set via the noobaa CLI. --> VERIFIED

Tested with 4.13.0-0.nightly-2023-06-03-192019
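
For reference, a hypothetical way to drive the same quota update through the operator CLI's generic RPC passthrough (assuming the 'noobaa api' subcommand is available in the deployed CLI version; the bucket name and namespace are placeholders):

import json
import subprocess

# Hypothetical invocation: send bucket_api.update_bucket through the noobaa CLI
# instead of a direct management-API connection.
params = {"name": "quota-test-bucket", "quota": {"unit": "GIGABYTE", "size": 2}}
subprocess.run(
    ["noobaa", "api", "bucket_api", "update_bucket", json.dumps(params),
     "-n", "openshift-storage"],
    check=True,
)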

Comment 35 errata-xmlrpc 2023-06-21 15:22:55 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742

