Bug 2314998

Summary: [ODF on ROSA HCP] MDSCacheUsageHigh not found with active node drained
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Daniel Osypenko <dosypenk>
Component: ocs-operator
Assignee: Mudit Agarwal <muagarwa>
Status: ASSIGNED
QA Contact: Elad <ebenahar>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.17
CC: kmajumde, nberry, odf-bz-bot
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Target Upstream Version:
Embargoed:

Description Daniel Osypenko 2024-09-26 20:06:56 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
During execution of the test test_mds_cache_alert_with_active_node_drain we ran metadata IO against CephFS with the following steps:
    1. Create a PVC with CephFS, access mode RWX
    2. Create a dc pod with the Fedora image
    3. Copy helper_scripts/meta_data_io.py to the Fedora dc pod
    4. Run meta_data_io.py on the Fedora pod
The script can be found at https://github.com/red-hat-storage/ocs-ci/blob/e4bcbb284280862d03b7f6b5ab2b40e2727482f3/ocs_ci/templates/workloads/helper_scripts/meta_data_io.py
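The metadata workload can be sketched roughly as follows. This is a minimal illustration of the idea (many small-file create/stat operations so that metadata traffic, not data, dominates), not the actual meta_data_io.py; directory and file counts are arbitrary:

```python
# Minimal sketch (NOT the real meta_data_io.py): generate metadata-heavy
# IO by creating and stat'ing many small files. On a CephFS mount this
# keeps many inodes hot and inflates the MDS cache.
import os
import tempfile

def run_metadata_io(root: str, dirs: int = 10, files_per_dir: int = 100) -> int:
    """Create many small files and stat them; return the file count."""
    created = 0
    for d in range(dirs):
        dpath = os.path.join(root, f"dir_{d}")
        os.makedirs(dpath, exist_ok=True)
        for f in range(files_per_dir):
            fpath = os.path.join(dpath, f"file_{f}.txt")
            with open(fpath, "w") as fh:
                fh.write("x")   # tiny payload; metadata ops dominate
            os.stat(fpath)      # stat keeps the inode in the MDS cache
            created += 1
    return created

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(run_metadata_io(root, dirs=2, files_per_dir=5))  # → 10
```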

The script triggers high cache usage in the scenario where the standby-replay MDS is scaled down, but does not trigger it when the active node is drained, which suggests the problem is related to disruption of the active MDS node.
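The disruption step amounts to finding the node hosting the active MDS pod and draining it. A sketch of the commands involved, assuming a typical Rook/ODF deployment (the `app=rook-ceph-mds` label selector, the `openshift-storage` namespace, and the drain flags are assumptions, not taken from the test code):

```python
# Build the `oc` commands for locating MDS pods and draining a node.
# Selectors/flags are assumptions based on common Rook/ODF setups.
import shlex

def get_mds_node_cmd(namespace: str = "openshift-storage") -> list[str]:
    """Command listing rook-ceph MDS pods together with their nodes."""
    return shlex.split(
        f"oc -n {namespace} get pods -l app=rook-ceph-mds -o wide"
    )

def drain_node_cmd(node: str) -> list[str]:
    """Command draining the node that hosts the active MDS pod."""
    return shlex.split(
        f"oc adm drain {node} --ignore-daemonsets --delete-emptydir-data --force"
    )

print(drain_node_cmd("worker-0")[:4])  # → ['oc', 'adm', 'drain', 'worker-0']
```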

Version of all relevant components (if applicable):
OC version:
Client Version: 4.16.11
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.12
Kubernetes Version: v1.29.8+f10c92d

OCS version:
ocs-operator.v4.16.2-rhodf              OpenShift Container Storage        4.16.2-rhodf   ocs-operator.v4.16.1-rhodf              Succeeded

ODF operator full version: 4.16.2-4

Cluster version:
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.12   True        False         12h     Error while reconciling 4.16.12: the cluster operator insights is not available


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Potentially.

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
1/1

Can this issue reproduce from the UI?
no

If this is a regression, please provide more details to justify this:
New deployment; Tech Preview.

Steps to Reproduce:
1. Deploy ROSA HCP cluster with ODF and run test_mds_cache_alert_with_active_node_drain
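Verifying the expected outcome comes down to checking whether MDSCacheUsageHigh is in the firing alerts. A minimal sketch of that check, operating on a payload in the shape returned by Prometheus's /api/v1/alerts endpoint (in a real run the payload would be fetched from the cluster's monitoring stack; the sample below is hypothetical):

```python
# Decide whether a named alert is firing, given a Prometheus
# /api/v1/alerts-style JSON payload. The sample payload is illustrative.
import json

def alert_is_firing(alerts_json: str, name: str) -> bool:
    """Return True if an alert with the given alertname is firing."""
    data = json.loads(alerts_json)
    for alert in data.get("data", {}).get("alerts", []):
        if (alert.get("labels", {}).get("alertname") == name
                and alert.get("state") == "firing"):
            return True
    return False

# Hypothetical sample payload for illustration only.
sample = json.dumps({
    "status": "success",
    "data": {"alerts": [
        {"labels": {"alertname": "MDSCacheUsageHigh"}, "state": "firing"},
    ]},
})

print(alert_is_firing(sample, "MDSCacheUsageHigh"))  # → True
```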


Actual results:
The MDSCacheUsageHigh alert was not fired.

Expected results:
The MDSCacheUsageHigh alert fires when its conditions are met.

Additional info:
A cluster to capture the necessary data will be created upon request to QE.