Bug 2154250

Summary: NooBaa Bucket Quota alerts are not working
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Filip Balák <fbalak>
Component: Multi-Cloud Object Gateway
Assignee: Vinayak Hariharmath <vharihar>
Status: CLOSED ERRATA
QA Contact: Filip Balák <fbalak>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.12
CC: dzaken, ebenahar, muagarwa, nbecker, ocs-bugs, odf-bz-bot, vharihar
Target Milestone: ---
Keywords: AutomationBlocker
Target Release: ODF 4.13.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.13.0-197
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-06-21 15:22:55 UTC
Type: Bug
Regression: ---

Description Filip Balák 2022-12-16 11:22:42 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
The alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState are not triggered when a NooBaa bucket is fully utilized and its quota is reached.

Version of all relevant components (if applicable):
ocs-registry:4.12.0-114

Is this issue reproducible?
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6998/291593/291682/291683/log
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6898/286901/286995/286996/log
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6545/269251/269322/269323/log

Steps to Reproduce:
1. Create a bucket in NooBaa and set its capacity quota to 2 GB (RPC call {'name': '<bucket-name>', 'quota': {'unit': 'GIGABYTE', 'size': 2}}).
2. Upload 5 files of 500 MB each into the bucket, 2.5 GB in total (see the Python sketch after these steps).
3. Check alerts in ODF Monitoring.
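
An illustrative Python sketch of steps 1-2 (not the ocs-ci test itself). It assumes the 2 GB quota is applied through the NooBaa management RPC with the payload from step 1, and that the S3 endpoint and credentials for the NooBaa route are exported as environment variables; the bucket name and variable names are placeholders:

import os
import boto3

BUCKET = "quota-test-bucket"  # placeholder bucket name

# Quota payload from step 1, sent through the NooBaa management RPC
# (bucket_api/update_bucket); shown here for reference only.
quota_rpc_params = {"name": BUCKET, "quota": {"unit": "GIGABYTE", "size": 2}}

# Step 2: upload 5 objects of 500 MB each (2.5 GB total, exceeding the quota).
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT"],  # NooBaa S3 route (assumed env var)
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    verify=False,
)
chunk = os.urandom(500 * 1024 * 1024)
for i in range(5):
    s3.put_object(Bucket=BUCKET, Key=f"obj-{i}", Body=chunk)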

Actual results:
Alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState are not triggered.

Expected results:
Alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState should be triggered.

Additional info:
Tested with automation: https://github.com/red-hat-storage/ocs-ci/blob/b28dfcd0e3f7fcf624d6590dd40255821058fbf7/tests/manage/monitoring/prometheus/test_noobaa.py#L21
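
The alert check in step 3 can be approximated with a query against the in-cluster Prometheus alerts API, roughly as below (the route URL, bearer-token handling, and environment-variable names are simplified assumptions):

import os
import requests

resp = requests.get(
    f"{os.environ['PROM_URL']}/api/v1/alerts",
    headers={"Authorization": f"Bearer {os.environ['PROM_TOKEN']}"},
    verify=False,
)
resp.raise_for_status()
alerts = resp.json()["data"]["alerts"]
# The bug: this list never contains the NooBaa bucket quota alerts.
quota_alerts = [a for a in alerts if "Quota" in a["labels"].get("alertname", "")]
print(quota_alerts)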

Comment 3 Elad 2022-12-20 13:49:00 UTC
Filip, is it a regression?

Comment 10 Elad 2023-01-12 10:01:39 UTC
@nbecker What's the feasibility of fixing and backporting to an upcoming 4.12.z?

Comment 11 Nimrod Becker 2023-01-12 11:52:14 UTC
For a z-stream? High feasibility :)
Depending on timing, of course. I wouldn't target it for 4.12.1, for example, but 4.12.2, sure.

Comment 21 Filip Balák 2023-04-28 09:46:20 UTC
NooBaa bucket quota alerts are not raised: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7514/consoleFull
Tested with ocs-build 4.13.0-130

Comment 22 Vinayak Hariharmath 2023-05-03 10:16:01 UTC
Hello Filip,

We need to modify the alert names in the quota tests according to https://github.com/noobaa/noobaa-operator/pull/1067 and https://github.com/noobaa/noobaa-operator/pull/1117
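
The renamed alert definitions can be listed directly from Prometheus before updating the test, for example with a query against the rules API (URL and token handling assumed, as in the earlier sketch):

import os
import requests

resp = requests.get(
    f"{os.environ['PROM_URL']}/api/v1/rules",
    headers={"Authorization": f"Bearer {os.environ['PROM_TOKEN']}"},
    verify=False,
)
resp.raise_for_status()
for group in resp.json()["data"]["groups"]:
    for rule in group["rules"]:
        # Print only alerting rules whose names mention quota.
        if rule.get("type") == "alerting" and "Quota" in rule.get("name", ""):
            print(rule["name"])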

Regards
Vinayak

Comment 24 Filip Balák 2023-05-31 10:15:43 UTC
No NooBaa alerts are raised with the given reproducer (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7984). Was there any change related to the RPC? What would be a valid test case for this scenario?

Tested with ocs-registry:4.13.0-206

Comment 25 Danny 2023-05-31 14:13:35 UTC
Hi Filip,

Did you modify the test according to comment 22?

Comment 26 Filip Balák 2023-06-01 08:15:18 UTC
Yes, the alert names were updated. We also gather all alerts raised by Prometheus during the time period of the test. This is the list of alerts from a test run with ocs 4.13.0-207 (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/25106/). There are no NooBaa alerts among them:
[{'labels': {'alertname': 'InsightsRecommendationActive', 'container': 'insights-operator', 'description': 'Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated', 'endpoint': 'https', 'info_link': 'https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY', 'instance': '10.128.0.17:8443', 'job': 'metrics', 'namespace': 'openshift-insights', 'pod': 'insights-operator-f79f95cd7-qvsjw', 'service': 'metrics', 'severity': 'info', 'total_risk': 'Low'}, 'annotations': {'description': 'Insights recommendation "Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated" with total risk "Low" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY.', 'summary': 'An Insights recommendation is active for this cluster.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:50.82591032Z', 'value': '1e+00'}, {'labels': {'alertname': 'InsightsRecommendationActive', 'container': 'insights-operator', 'description': 'The Image Registry Operator fails to apply Image Registry configuration when multiple storage types are specified', 'endpoint': 'https', 'info_link': 'https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.image_registry_multiple_storage_types|IMAGE_REGISTRY_MULTIPLE_STORAGE_TYPES', 'instance': '10.128.0.17:8443', 'job': 'metrics', 'namespace': 'openshift-insights', 'pod': 'insights-operator-f79f95cd7-qvsjw', 'service': 'metrics', 'severity': 'info', 'total_risk': 'Moderate'}, 'annotations': {'description': 'Insights recommendation "The Image Registry Operator fails to apply Image Registry configuration when multiple storage types are specified" with total risk "Moderate" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.image_registry_multiple_storage_types|IMAGE_REGISTRY_MULTIPLE_STORAGE_TYPES.', 'summary': 'An Insights recommendation is active for this cluster.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:50.82591032Z', 'value': '1e+00'}, {'labels': {'alertname': 'AlertmanagerReceiversNotConfigured', 'namespace': 'openshift-monitoring', 'severity': 'warning'}, 'annotations': {'description': 'Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.', 'summary': 'Receivers (notification integrations) are not configured on Alertmanager'}, 'state': 'pending', 'activeAt': '2023-05-31T12:34:54.177208044Z', 'value': '0e+00'}, {'labels': {'alertname': 'Watchdog', 'namespace': 'openshift-monitoring', 'severity': 'none'}, 'annotations': {'description': 'This is an alert meant to ensure that the entire alerting pipeline is functional.\nThis alert is always firing, therefore it should always be firing in Alertmanager\nand always fire against a receiver. There are integrations with various notification\nmechanisms that send a notification when this alert is not firing. 
For example the\n"DeadMansSnitch" integration in PagerDuty.\n', 'summary': 'An alert that should always be firing to certify that Alertmanager is working properly.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:26.791051673Z', 'value': '1e+00'}, {'labels': {'alertname': 'AlertmanagerReceiversNotConfigured', 'namespace': 'openshift-monitoring', 'severity': 'warning'}, 'annotations': {'description': 'Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.', 'summary': 'Receivers (notification integrations) are not configured on Alertmanager'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:54.177208044Z', 'value': '0e+00'}, {'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-storage', 'pod': 's3cli-0', 'severity': 'warning'}, 'annotations': {'description': 'Pod openshift-storage/s3cli-0 has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubeStatefulSetReplicasMismatch', 'container': 'kube-rbac-proxy-main', 'endpoint': 'https-main', 'job': 'kube-state-metrics', 'namespace': 'openshift-storage', 'service': 'kube-state-metrics', 'severity': 'warning', 'statefulset': 's3cli'}, 'annotations': {'description': 'StatefulSet openshift-storage/s3cli has not matched the expected number of replicas for longer than 15 minutes.', 'summary': 'Deployment has not matched the expected number of replicas.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '0e+00'}, {'labels': {'alertname': 'KubeContainerWaiting', 'container': 's3cli', 'namespace': 'openshift-storage', 'pod': 's3cli-0', 'severity': 'warning'}, 'annotations': {'description': 'pod/s3cli-0 in namespace openshift-storage on container s3cli has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubeContainerWaiting', 'container': 'collect-profiles', 'namespace': 'openshift-operator-lifecycle-manager', 'pod': 'collect-profiles-28092285-vmbr9', 'severity': 'warning'}, 'annotations': {'description': 'pod/collect-profiles-28092285-vmbr9 in namespace openshift-operator-lifecycle-manager on container collect-profiles has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:45:04.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-marketplace', 'pod': 'community-operators-v5cqb', 'severity': 'warning'}, 'annotations': {'description': 'Pod openshift-marketplace/community-operators-v5cqb has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:51:04.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubeContainerWaiting', 'container': 'registry-server', 'namespace': 
'openshift-marketplace', 'pod': 'community-operators-v5cqb', 'severity': 'warning'}, 'annotations': {'description': 'pod/community-operators-v5cqb in namespace openshift-marketplace on container registry-server has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:51:04.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-marketplace', 'pod': 'redhat-operators-r4dw8', 'severity': 'warning'}, 'annotations': {'description': 'Pod openshift-marketplace/redhat-operators-r4dw8 has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:52:04.49920567Z', 'value': '1e+00'}, {'labels': {'alertname': 'KubeContainerWaiting', 'container': 'registry-server', 'namespace': 'openshift-marketplace', 'pod': 'redhat-operators-r4dw8', 'severity': 'warning'}, 'annotations': {'description': 'pod/redhat-operators-r4dw8 in namespace openshift-marketplace on container registry-server has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:52:04.49920567Z', 'value': '1e+00'}]
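
For reference, the absence of NooBaa alerts in the gathered list above (bound here to a hypothetical variable `alerts`) can be checked with:

noobaa_alerts = [a for a in alerts if a["labels"]["alertname"].startswith("NooBaa")]
assert not noobaa_alerts, f"unexpected NooBaa alerts: {noobaa_alerts}"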

Comment 29 Mudit Agarwal 2023-06-02 11:15:14 UTC
Doc text is not required

Comment 30 Filip Balák 2023-06-02 11:51:45 UTC
The collection of alerts from comment 26 took 14 minutes. During that time there should have been at least a pending alert.

Comment 33 Filip Balák 2023-06-06 11:14:15 UTC
Size quota alerts work when max-size is set via the noobaa CLI. --> VERIFIED

Tested with 4.13.0-0.nightly-2023-06-03-192019
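
The exact CLI invocation used during verification is not captured in this bug; one way to issue the equivalent size-quota update is the noobaa CLI's RPC passthrough, run here from Python for consistency with the earlier sketches (bucket name and namespace are placeholders):

import subprocess

# Issue the same bucket_api/update_bucket call as the reproducer, via the CLI.
subprocess.run(
    [
        "noobaa", "api", "bucket_api", "update_bucket",
        '{"name": "quota-test-bucket", "quota": {"unit": "GIGABYTE", "size": 2}}',
        "-n", "openshift-storage",
    ],
    check=True,
)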

Comment 35 errata-xmlrpc 2023-06-21 15:22:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742