Steps to reproduce:
* Deploy OCS 4.2.
* See that mds_cache_memory_limit is not set in the ceph config centralized mon store (ceph config dump | grep mds_cache_memory_limit) — a sketch of running this check from the toolbox pod follows below.
* Upgrade all the way up to the 4.6.5 release and see that mds_cache_memory_limit is set with a value of 4 GB.

Thanks!
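For anyone re-running the check, a minimal sketch of the config query from the OCS toolbox pod; the namespace and the app=rook-ceph-tools label are assumptions (the standard Rook toolbox convention), not taken from this report:

# open a shell in the ceph toolbox pod (label/namespace assumed)
oc -n openshift-storage rsh $(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name)
# inside the pod, query the centralized mon config store
sh-4.4# ceph config dump | grep mds_cache_memory_limit
# on the initial 4.2 deployment this returns nothing; after upgrading to 4.6.5
# each MDS daemon should show mds_cache_memory_limit = 4294967296 (4 GiB)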
This can be backported now.
Backported
Deployed OCP 4.3 - OCS 4.2 initially and then upgraded OCS and OCP one version at a time up to OCS 4.6.5 - OCP 4.6.32. The cluster was healthy after the multiple upgrades.

[root@localhost ocs4_2]# oc get csv -n openshift-storage
NAME                            DISPLAY                       VERSION        REPLACES                        PHASE
lib-bucket-provisioner.v2.0.0   lib-bucket-provisioner        2.0.0          lib-bucket-provisioner.v1.0.0   Succeeded
ocs-operator.v4.6.5-411.ci      OpenShift Container Storage   4.6.5-411.ci   ocs-operator.v4.6.4             Succeeded

[root@localhost ocs4_2]# oc version
Client Version: 4.5.6
Server Version: 4.6.32
Kubernetes Version: v1.20.0+a0b09eb

sh-4.4# ceph config dump
WHO     MASK  LEVEL     OPTION                              VALUE     RO
global        advanced  bluestore_warn_on_legacy_statfs     false
global        advanced  mon_allow_pool_delete               true
global        advanced  mon_pg_warn_min_per_osd             0
global        advanced  osd_pool_default_pg_autoscale_mode  on
global        advanced  rbd_default_features                3
mgr           advanced  mgr/balancer/active                 true
mgr           advanced  mgr/balancer/mode                   upmap
mgr           advanced  mgr/orchestrator_cli/orchestrator   rook      *
osd.0         advanced  osd_delete_sleep                    2.000000
osd.0         advanced  osd_recovery_sleep                  0.100000
osd.0         advanced  osd_snap_trim_sleep                 2.000000
osd.1         advanced  osd_delete_sleep                    2.000000
osd.1         advanced  osd_recovery_sleep                  0.100000
osd.1         advanced  osd_snap_trim_sleep                 2.000000
osd.2         advanced  osd_delete_sleep                    2.000000
osd.2         advanced  osd_recovery_sleep                  0.100000
osd.2         advanced  osd_snap_trim_sleep                 2.000000

mds_cache_memory_limit was not set in the ceph config centralized mon store.

--------------------------------------------------------------------------------

sh-4.4# ceph status
  cluster:
    id:     3aa16ada-a02c-4861-a57f-1d720cfc6f4e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 10h)
    mgr: a(active, since 10h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 10h), 3 in (since 2d)

  task status:
    scrub status:
        mds.ocs-storagecluster-cephfilesystem-a: idle
        mds.ocs-storagecluster-cephfilesystem-b: idle

  data:
    pools:   3 pools, 72 pgs
    objects: 7.65k objects, 29 GiB
    usage:   89 GiB used, 1.4 TiB / 1.5 TiB avail
    pgs:     72 active+clean

  io:
    client:   1.2 KiB/s rd, 60 KiB/s wr, 2 op/s rd, 6 op/s wr

sh-4.4# ceph health detail
HEALTH_OK

--------------------------------------------------------------------------------

sh-4.4# ceph fs status
ocs-storagecluster-cephfilesystem - 2 clients
=================================
+------+----------------+-------------------------------------+---------------+-------+-------+
| Rank |     State      |                 MDS                 |    Activity   |  dns  |  inos |
+------+----------------+-------------------------------------+---------------+-------+-------+
|  0   |     active     | ocs-storagecluster-cephfilesystem-a | Reqs:    0 /s |   15  |   18  |
| 0-s  | standby-replay | ocs-storagecluster-cephfilesystem-b | Evts:    0 /s |   5   |   8   |
+------+----------------+-------------------------------------+---------------+-------+-------+
+--------------------------------------------+----------+-------+-------+
|                    Pool                    |   type   |  used | avail |
+--------------------------------------------+----------+-------+-------+
| ocs-storagecluster-cephfilesystem-metadata | metadata |  672k |  455G |
|   ocs-storagecluster-cephfilesystem-data0  |   data   | 48.0k |  455G |
+--------------------------------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.11-147.el8cp (1f54d52f20d93c1b91f1ec6af4c67a4b81402800) nautilus (stable)

--------------------------------------------------------------------------------

sh-4.4# ceph config dump | grep mds_cache_memory_limit
mds.ocs-storagecluster-cephfilesystem-a   basic  mds_cache_memory_limit  4294967296
mds.ocs-storagecluster-cephfilesystem-b   basic  mds_cache_memory_limit  4294967296

The mds_cache_memory_limit is set with a value of 4 GB, which looks good to me. Hence moving this bug to Verified.

Thanks,
Mugdha Soni
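For completeness, the value shown by ceph config dump is in bytes, so 4294967296 corresponds exactly to 4 GiB. A quick sanity check from the same toolbox shell; the ceph config get form is just an alternative to grepping the full dump, and is assumed to be available in this Nautilus build:

# 4294967296 bytes / 1024^3 = 4 GiB
sh-4.4# echo $((4294967296 / 1024 / 1024 / 1024))
4
# query the per-daemon value directly instead of grepping the dump
sh-4.4# ceph config get mds.ocs-storagecluster-cephfilesystem-a mds_cache_memory_limit
4294967296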
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.5 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2479