Bug 2261881
Summary: | Documentation needs to be corrected for MDSCacheUsageHigh alert. | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Nagendra Reddy <nagreddy> |
Component: | ocs-operator | Assignee: | Santosh Pillai <sapillai> |
Status: | ASSIGNED --- | QA Contact: | Elad <ebenahar> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.15 | CC: | hnallurv, muagarwa, nigoyal, odf-bz-bot, sapillai |
Target Milestone: | --- | Flags: | sapillai: needinfo? (nagreddy) |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
Cause: Ceph reports the `ceph_mds_mem_rss` metric in kilobytes (KB).
Consequence: When the user searches for the metric in the OCS UI, the graph shows the y-axis in MB. This can cause confusion when the user compares the results with the `MDSCacheUsageHigh` alert.
Workaround (if any): Use `ceph_mds_mem_rss * 1000` when searching for this metric in the OpenShift UI to see the graph y-axis in GB.
Result: Using `ceph_mds_mem_rss * 1000` shows the graph in GB, so the user can easily compare it with the results shown in the `MDSCacheUsageHigh` alert.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | Type: | Bug | |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2246375 |
Description
Nagendra Reddy
2024-01-30 08:41:58 UTC
The following changes are required:

1. ceph_mds_mem_rss gives the wrong output. When the actual cache usage is 3GB, it is shown as 3MB. Please fix either the query or the documentation. Based on our previous discussions, we used "ceph_mds_mem_rss*1000" for testing.

2. Default is 4GB, but recommended is minimum 8GB. --> The default was 4GB in 4.14, but after upgrading to 4.15 we observed that the default was reduced to 3GB. This needs to be corrected in the documentation.

3. The patch command needs to be corrected. --> When a minimum cache limit of 8GB is recommended, the MDS memory should be increased to 16GB; only then does the user get the 8GB cache limit that is recommended when the alert is firing. Give the patch command with the recommended values.

Given patch:

oc patch -n openshift-storage storagecluster ocs-storagecluster \
    --type merge \
    --patch '{"spec": {"resources": {"mds": {"limits": {"memory": "8Gi"},"requests": {"memory": "8Gi"}}}}}'

Expected patch, to get the recommended cache limit [8GB]:

oc patch -n openshift-storage storagecluster ocs-storagecluster \
    --type merge \
    --patch '{"spec": {"resources": {"mds": {"limits": {"memory": "16Gi"},"requests": {"memory": "16Gi"}}}}}'

Opened a PR for point number 2 - https://github.com/openshift/runbooks/pull/169

We don't need to change anything for point number 3.

We decided to discuss the changes for point number 1 in 4.16.

(In reply to Santosh Pillai from comment #3)
> Opened a PR for point number 2 -
> https://github.com/openshift/runbooks/pull/169
>
> We don't need to change anything for point number 3.
>
> We decided to discuss the changes for point number 1 in 4.16.

We discussed providing instructions or notes on using the metric correctly, such as "ceph_mds_mem_rss*1000", to get the accurate MDS memory usage. It can be fixed in 4.16; until then, we should provide instructions to use the metric with the multiplier 1000 to convert the data from MB to GB.

Please make changes accordingly.
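
For readers following the unit discussion in point 1, here is a minimal sketch of the arithmetic behind the `* 1000` multiplier. It assumes the metric is reported in KB and that the graph interprets the raw number as bytes with decimal (1000-based) units. The second part is an assumption made for illustration, not something confirmed in this bug. The helper names `scaled_rss` and `humanize` are hypothetical.

```shell
# Hypothetical helpers, for illustration only.
# Assumption: the raw ceph_mds_mem_rss sample is in KB, and the graph
# treats whatever number it receives as bytes with 1000-based units.

# Scale a raw sample the same way the workaround query does (value * 1000).
scaled_rss() {
  echo $(( $1 * 1000 ))
}

# Render a number of "bytes" with a decimal unit suffix (truncating division).
humanize() {
  if [ "$1" -ge 1000000000 ]; then
    echo "$(( $1 / 1000000000 ))GB"
  elif [ "$1" -ge 1000000 ]; then
    echo "$(( $1 / 1000000 ))MB"
  elif [ "$1" -ge 1000 ]; then
    echo "$(( $1 / 1000 ))KB"
  else
    echo "${1}B"
  fi
}

raw=3000000   # example sample: about 3GB of memory expressed in KB
echo "raw sample:    $(humanize "$raw")"
echo "scaled sample: $(humanize "$(scaled_rss "$raw")")"
```

Under these assumptions the raw sample is labelled in MB and the scaled sample in GB, which matches the 3GB versus 3MB mismatch described in point 1.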
(In reply to Nagendra Reddy from comment #4)
> (In reply to Santosh Pillai from comment #3)
> > Opened a PR for point number 2 -
> > https://github.com/openshift/runbooks/pull/169
> >
> > We don't need to change anything for point number 3.
> >
> > We decided to discuss the changes for point number 1 in 4.16.
>
> we discussed to give instructions/notes to use metric in a correct way like
> "ceph_mds_mem_rss*1000" to pull the accurate mds memory usage. It can be
> fixed in 4.16, till then we should provide instructions to use the metric
> with multiplier 1000 to convert the data MB to GB.

This will add more confusion to the customer. The customer can anyway see the correct units in the graph in the alert itself, correct?

>
> Please make changes accordingly.

(In reply to Santosh Pillai from comment #5)
> (In reply to Nagendra Reddy from comment #4)
> > (In reply to Santosh Pillai from comment #3)
> > > Opened a PR for point number 2 -
> > > https://github.com/openshift/runbooks/pull/169
> > >
> > > We don't need to change anything for point number 3.
> > >
> > > We decided to discuss the changes for point number 1 in 4.16.
> >
> > we discussed to give instructions/notes to use metric in a correct way like
> > "ceph_mds_mem_rss*1000" to pull the accurate mds memory usage. It can be
> > fixed in 4.16, till then we should provide instructions to use the metric
> > with multiplier 1000 to convert the data MB to GB.
>
> This will add more confusion to the customer. The customer can anyway see
> the correct units in the graph in the alert itself, correct?
> >
> > Please make changes accordingly.

Let's make this a known issue in 4.15 and work toward fixing it in 4.16.

Santosh, could you please provide the doc text for the known issue?

Since the documentation was fixed in 4.15 on how to use the query (ceph_mds_mem_rss * 1000), and changing the `ceph_mds_mem_rss` unit might require changes in Ceph, I'll move this to 4.17 for now.
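
For reference, the sizing reasoning behind point 3 of the description can be sketched as follows. This only illustrates the reporter's stated assumption that the cache limit ends up at half of the MDS memory; comment #3 concluded that no change was needed for that point, so treat this as illustration and not as a recommended procedure. The helper names `mds_memory_for_cache_gi` and `mds_patch_json` are hypothetical.

```shell
# Hypothetical helpers, for illustration only.
# Assumption (from point 3 of the description): the MDS cache limit is
# half of the MDS memory, so a target cache limit needs twice that memory.

# Return the MDS memory (in Gi) needed for a given cache limit (in Gi).
mds_memory_for_cache_gi() {
  echo $(( $1 * 2 ))
}

# Build the merge patch body in the same shape as the patches quoted in
# the description, with limits and requests set to the same value.
mds_patch_json() {
  printf '{"spec": {"resources": {"mds": {"limits": {"memory": "%sGi"},"requests": {"memory": "%sGi"}}}}}\n' "$1" "$1"
}

mem=$(mds_memory_for_cache_gi 8)
mds_patch_json "$mem"

# To apply it (not executed here), the body would be passed to the same
# command used in the description:
#   oc patch -n openshift-storage storagecluster ocs-storagecluster \
#       --type merge --patch "$(mds_patch_json "$mem")"
```

For a target cache limit of 8, this prints the same 16Gi patch body as the expected patch in the description.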