Description of problem (please be as detailed as possible and provide log snippets):
Alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState are not triggered when a NooBaa bucket is fully utilized and its quota is reached.

Version of all relevant components (if applicable):
ocs-registry:4.12.0-114

Is this issue reproducible?
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6998/291593/291682/291683/log
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6898/286901/286995/286996/log
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/362/6545/269251/269322/269323/log

Steps to Reproduce:
1. Create a bucket in NooBaa and set a capacity quota of 2 GB (RPC call {'name': '<bucket-name>', 'quota': {'unit': 'GIGABYTE', 'size': 2}}).
2. Upload 5 files of 500 MB each into the bucket.
3. Check alerts in ODF Monitoring.

Actual results:
Alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState are not triggered.

Expected results:
Alerts NooBaaBucketReachingQuotaState and BucketExceedingQuotaState should be triggered.

Additional info:
Tested with automation: https://github.com/red-hat-storage/ocs-ci/blob/b28dfcd0e3f7fcf624d6590dd40255821058fbf7/tests/manage/monitoring/prometheus/test_noobaa.py#L21
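For reference, a minimal standalone sketch of steps 2 and 3 in Python (the linked ocs-ci test is the authoritative implementation). The endpoint URLs, bucket name, and credential environment variables below are placeholders, and step 1 (setting the 2 GB quota through the bucket_api update_bucket RPC call quoted above) is assumed to have already been done:

# Hypothetical repro sketch, not part of ocs-ci. Assumes the NooBaa S3 endpoint,
# S3 credentials, the Prometheus route, and a bearer token are already known,
# and that the 2 GB quota from step 1 is already set on the bucket.
import os
import boto3
import requests

BUCKET = "quota-test-bucket"                                                # placeholder
S3_ENDPOINT = "https://s3-openshift-storage.apps.example.com"               # placeholder
PROM_URL = "https://prometheus-k8s-openshift-monitoring.apps.example.com"   # placeholder
TOKEN = os.environ["PROM_TOKEN"]                                            # placeholder

s3 = boto3.resource(
    "s3",
    endpoint_url=S3_ENDPOINT,
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    verify=False,
)

# Step 2: upload 5 objects of 500 MB each, so usage exceeds the 2 GB quota.
payload = os.urandom(500 * 1024 * 1024)
for i in range(5):
    s3.Bucket(BUCKET).put_object(Key=f"quota-object-{i}", Body=payload)

# Step 3: list active alerts in Prometheus and look for the quota alerts.
resp = requests.get(
    f"{PROM_URL}/api/v1/alerts",
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,
)
alerts = resp.json()["data"]["alerts"]
quota_alerts = [a for a in alerts if "Quota" in a["labels"].get("alertname", "")]
print(quota_alerts)  # expected: the reaching/exceeding quota alerts; actual: empty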
Filip, is it a regression?
@nbecker What's the feasibility of fixing and backporting to an upcoming 4.12.z?
For a z? High feasibility :) Depending on timing, of course; I wouldn't target this for 4.12.1, for example, but 4.12.2, sure.
NooBaa bucket quota alerts are not raised: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7514/consoleFull
Tested with ocs-build 4.13.0-130
Hello Filip,
We need to modify the alert names in the quota tests according to https://github.com/noobaa/noobaa-operator/pull/1067 and https://github.com/noobaa/noobaa-operator/pull/1117.
Regards,
Vinayak
No NooBaa alerts are raised with the given reproducer (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7984). Was there any change related to the RPC? What would be a valid test case for this scenario?
Tested with ocs-registry:4.13.0-206
Hi Filip,
Did you modify the test according to comment 22?
Yes, the alert names were updated. We also gather all alerts raised by Prometheus during the time period of the test. This is the list of alerts for a test run with OCS 4.13.0-207 (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/25106/). There are no NooBaa alerts:
[{'labels': {'alertname': 'InsightsRecommendationActive', 'container': 'insights-operator', 'description': 'Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated', 'endpoint': 'https', 'info_link': 'https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY', 'instance': '10.128.0.17:8443', 'job': 'metrics', 'namespace': 'openshift-insights', 'pod': 'insights-operator-f79f95cd7-qvsjw', 'service': 'metrics', 'severity': 'info', 'total_risk': 'Low'}, 'annotations': {'description': 'Insights recommendation "Prometheus metrics data will be lost when the Prometheus pod is restarted or recreated" with total risk "Low" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.empty_prometheus_db_volume|PROMETHEUS_DB_VOLUME_IS_EMPTY.', 'summary': 'An Insights recommendation is active for this cluster.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:50.82591032Z', 'value': '1e+00'},
{'labels': {'alertname': 'InsightsRecommendationActive', 'container': 'insights-operator', 'description': 'The Image Registry Operator fails to apply Image Registry configuration when multiple storage types are specified', 'endpoint': 'https', 'info_link': 'https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.image_registry_multiple_storage_types|IMAGE_REGISTRY_MULTIPLE_STORAGE_TYPES', 'instance': '10.128.0.17:8443', 'job': 'metrics', 'namespace': 'openshift-insights', 'pod': 'insights-operator-f79f95cd7-qvsjw', 'service': 'metrics', 'severity': 'info', 'total_risk': 'Moderate'}, 'annotations': {'description': 'Insights recommendation "The Image Registry Operator fails to apply Image Registry configuration when multiple storage types are specified" with total risk "Moderate" was detected on the cluster. More information is available at https://console.redhat.com/openshift/insights/advisor/clusters/8b93be87-48b3-4db3-8d4d-c7b4383344b1?first=ccx_rules_ocp.external.rules.image_registry_multiple_storage_types|IMAGE_REGISTRY_MULTIPLE_STORAGE_TYPES.', 'summary': 'An Insights recommendation is active for this cluster.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:50.82591032Z', 'value': '1e+00'},
{'labels': {'alertname': 'AlertmanagerReceiversNotConfigured', 'namespace': 'openshift-monitoring', 'severity': 'warning'}, 'annotations': {'description': 'Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.', 'summary': 'Receivers (notification integrations) are not configured on Alertmanager'}, 'state': 'pending', 'activeAt': '2023-05-31T12:34:54.177208044Z', 'value': '0e+00'},
{'labels': {'alertname': 'Watchdog', 'namespace': 'openshift-monitoring', 'severity': 'none'}, 'annotations': {'description': 'This is an alert meant to ensure that the entire alerting pipeline is functional.\nThis alert is always firing, therefore it should always be firing in Alertmanager\nand always fire against a receiver. There are integrations with various notification\nmechanisms that send a notification when this alert is not firing. For example the\n"DeadMansSnitch" integration in PagerDuty.\n', 'summary': 'An alert that should always be firing to certify that Alertmanager is working properly.'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:26.791051673Z', 'value': '1e+00'},
{'labels': {'alertname': 'AlertmanagerReceiversNotConfigured', 'namespace': 'openshift-monitoring', 'severity': 'warning'}, 'annotations': {'description': 'Alerts are not configured to be sent to a notification system, meaning that you may not be notified in a timely fashion when important failures occur. Check the OpenShift documentation to learn how to configure notifications with Alertmanager.', 'summary': 'Receivers (notification integrations) are not configured on Alertmanager'}, 'state': 'firing', 'activeAt': '2023-05-31T12:34:54.177208044Z', 'value': '0e+00'},
{'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-storage', 'pod': 's3cli-0', 'severity': 'warning'}, 'annotations': {'description': 'Pod openshift-storage/s3cli-0 has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '1e+00'},
{'labels': {'alertname': 'KubeStatefulSetReplicasMismatch', 'container': 'kube-rbac-proxy-main', 'endpoint': 'https-main', 'job': 'kube-state-metrics', 'namespace': 'openshift-storage', 'service': 'kube-state-metrics', 'severity': 'warning', 'statefulset': 's3cli'}, 'annotations': {'description': 'StatefulSet openshift-storage/s3cli has not matched the expected number of replicas for longer than 15 minutes.', 'summary': 'Deployment has not matched the expected number of replicas.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '0e+00'},
{'labels': {'alertname': 'KubeContainerWaiting', 'container': 's3cli', 'namespace': 'openshift-storage', 'pod': 's3cli-0', 'severity': 'warning'}, 'annotations': {'description': 'pod/s3cli-0 in namespace openshift-storage on container s3cli has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:44:34.49920567Z', 'value': '1e+00'},
{'labels': {'alertname': 'KubeContainerWaiting', 'container': 'collect-profiles', 'namespace': 'openshift-operator-lifecycle-manager', 'pod': 'collect-profiles-28092285-vmbr9', 'severity': 'warning'}, 'annotations': {'description': 'pod/collect-profiles-28092285-vmbr9 in namespace openshift-operator-lifecycle-manager on container collect-profiles has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:45:04.49920567Z', 'value': '1e+00'},
{'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-marketplace', 'pod': 'community-operators-v5cqb', 'severity': 'warning'}, 'annotations': {'description': 'Pod openshift-marketplace/community-operators-v5cqb has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:51:04.49920567Z', 'value': '1e+00'},
{'labels': {'alertname': 'KubeContainerWaiting', 'container': 'registry-server', 'namespace': 'openshift-marketplace', 'pod': 'community-operators-v5cqb', 'severity': 'warning'}, 'annotations': {'description': 'pod/community-operators-v5cqb in namespace openshift-marketplace on container registry-server has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:51:04.49920567Z', 'value': '1e+00'},
{'labels': {'alertname': 'KubePodNotReady', 'namespace': 'openshift-marketplace', 'pod': 'redhat-operators-r4dw8', 'severity': 'warning'}, 'annotations': {'description': 'Pod openshift-marketplace/redhat-operators-r4dw8 has been in a non-ready state for longer than 15 minutes.', 'runbook_url': 'https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubePodNotReady.md', 'summary': 'Pod has been in a non-ready state for more than 15 minutes.'}, 'state': 'pending', 'activeAt': '2023-05-31T12:52:04.49920567Z', 'value': '1e+00'},
{'labels': {'alertname': 'KubeContainerWaiting', 'container': 'registry-server', 'namespace': 'openshift-marketplace', 'pod': 'redhat-operators-r4dw8', 'severity': 'warning'}, 'annotations': {'description': 'pod/redhat-operators-r4dw8 in namespace openshift-marketplace on container registry-server has been in waiting state for longer than 1 hour.', 'summary': 'Pod container waiting longer than 1 hour'}, 'state': 'pending', 'activeAt': '2023-05-31T12:52:04.49920567Z', 'value': '1e+00'}]
Doc text is not required
The collection of alerts in comment 26 took 14 minutes. During that time there should have been at least a pending alert.
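For what it's worth, a quick way to check whether any NooBaa quota alert was at least in the pending state during that window is to query the ALERTS metric directly. A minimal sketch, where the Prometheus route and bearer token are placeholders and the alertname regex is an assumption (the exact alert names changed in the noobaa-operator PRs linked in comment 22):

# Sketch: list NooBaa quota alerts (pending or firing) via the Prometheus query API.
import os
import requests

PROM_URL = "https://prometheus-k8s-openshift-monitoring.apps.example.com"   # placeholder
TOKEN = os.environ["PROM_TOKEN"]                                            # placeholder

resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": 'ALERTS{alertname=~"NooBaa.*Quota.*"}'},               # assumed name pattern
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,
)
for result in resp.json()["data"]["result"]:
    labels = result["metric"]
    print(labels.get("alertname"), labels.get("alertstate"))  # alertstate: pending or firing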
Size quota alerts work when max-size is set via the noobaa CLI. --> VERIFIED
Tested with 4.13.0-0.nightly-2023-06-03-192019
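For completeness, a rough sketch of the CLI-based path used for verification above; the exact subcommand and --max-size flag syntax are assumptions here, so check `noobaa bucket create --help` on the tested build for the authoritative form:

# Hedged sketch: create a bucket with a size quota through the noobaa CLI
# (subcommand and flag syntax assumed; verify against the installed CLI version).
import subprocess

subprocess.run(
    ["noobaa", "bucket", "create", "quota-test-bucket",
     "--max-size=2Gi", "-n", "openshift-storage"],
    check=True,
)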
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:3742