Bug 1944148
Summary: [GSS][CephFS] health warning "MDS cache is too large (3GB/1GB); 0 inodes in use by clients, 0 stray files" for the standby-replay
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Geo Jose <gjose>
Component: rook
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Petr Balogh <pbalogh>
Severity: high
Docs Contact:
Priority: high
Version: 4.7
CC: asriram, bkunal, ceph-eng-bugs, edonnell, etamir, madam, muagarwa, musoni, nravinas, ocs-bugs, pbalogh, pdonnell, shan, sweil, tdesala, tnielsen, vereddy
Target Milestone: ---
Keywords: AutomationBackLog, ZStream
Target Release: OCS 4.7.1
Hardware: All
OS: All
Whiteboard:
Fixed In Version: 4.7.1-403.ci
Doc Type: Bug Fix
Doc Text: Previously, Rook did not apply `mds_cache_memory_limit` during upgrades. As a result, OpenShift Container Storage clusters deployed at version 4.2, which never had that option applied, were not updated with the correct value, typically half the MDS pod's memory limit. MDS daemons in standby-replay mode could therefore report an oversized cache.
Story Points: ---
Clone Of:
: 1951348 (view as bug list)
Environment:
Last Closed: 2021-06-15 16:50:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1938134, 1951348
Comment 1
RHEL Program Management
2021-03-29 12:10:48 UTC
Patrick, if `mds_cache_memory_limit` is 1GB, this means the pod memory limit is 2GB. As far as I can tell, the ocs-operator resources have been set to 8GB since release-4.2: https://github.com/openshift/ocs-operator/blob/release-4.2/pkg/controller/defaults/resources.go

So I'm not sure why that pod got such a low memory allocation. Rook simply looks up the memory limit and applies a 50% ratio to it. Here are some notes from our code:

    // MDS cache memory limit should be set to 50-60% of RAM reserved for the MDS container
    // MDS uses approximately 125% of the value of mds_cache_memory_limit in RAM.
    // Eventually we will tune this automatically: http://tracker.ceph.com/issues/36663

`mds_cache_memory_limit` should appear in the "ceph config dump" output; 1GB seems to be its default value. Could you look at the audit logs (from the mons) and grep for "mds_cache_memory_limit"? I don't know why, but it seems that `mds_cache_memory_limit` was removed. Thanks.

Mudit, looks like the doc text is filled already.

(In reply to Sébastien Han from comment #30)
> Mudit, looks like the doc text is filled already.

That was filled in by the Ceph folks when the initial issue was reported. They fixed it, so the doc text type was "Bug Fix". But this is now a Rook issue, and we have decided not to fix it in 4.7, so we should provide doc text of type "Known Issue". If the existing doc text is still relevant, that's OK, but it talks about the Ceph fix.

Doc text was updated, so removing my needinfo.

Mudit, I edited the doc_text.

Went with OCP 4.3 / OCS 4.2 as the initial deployment and continued by upgrading OCS and OCP one version at a time. When I was on OCS 4.6.4, I upgraded OCP to 4.7 and then OCS directly to the 4.7.1-403.ci internal build.
    $ oc get csv -n openshift-storage
    NAME                            DISPLAY                       VERSION        REPLACES                        PHASE
    lib-bucket-provisioner.v2.0.0   lib-bucket-provisioner        2.0.0          lib-bucket-provisioner.v1.0.0   Succeeded
    ocs-operator.v4.7.1-403.ci      OpenShift Container Storage   4.7.1-403.ci   ocs-operator.v4.6.4             Succeeded

    $ oc rsh -n openshift-storage rook-ceph-tools-784547f7c7-qxfz7
    sh-4.4# ceph config dump | grep mds_cache_memory_limit
    mds.ocs-storagecluster-cephfilesystem-a   basic   mds_cache_memory_limit   4294967296
    mds.ocs-storagecluster-cephfilesystem-b   basic   mds_cache_memory_limit   4294967296

This looks OK, so I will mark the bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.7.1 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2449