Description of problem (please be as detailed as possible and provide log snippets):

Sorry, I was pulled into this case in the middle of it; our APAC folks are offline and can add additional details if needed.

An IBM customer recently upgraded from OCS 4.8 to ODF 4.10. OCP is currently on 4.12. This is IBM Cloud VPC.

Version of all relevant components (if applicable):

$ omc get csv
NAME                               DISPLAY                       VERSION   REPLACES                           PHASE
mcg-operator.v4.10.14              NooBaa Operator               4.10.14   mcg-operator.v4.9.15               Succeeded
ocs-operator.v4.10.14              OpenShift Container Storage   4.10.14   ocs-operator.v4.9.15               Succeeded
odf-csi-addons-operator.v4.10.14   CSI Addons                    4.10.14   odf-csi-addons-operator.v4.10.13   Succeeded
odf-operator.v4.10.14              OpenShift Data Foundation     4.10.14   odf-operator.v4.9.15               Succeeded

$ omc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.26   True        False         41h     Cluster version is 4.12.26

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Pre-upgrade (4.8), the customer had a large number of snapshots pushing their cluster up to high usage. Ashish Singh had them set the following options to help speed up the space recovery:

# ceph config set osd osd_max_trimming_pgs 5
# ceph config set osd osd_pg_max_concurrent_snap_trims 15
# ceph config set osd osd_snap_trim_sleep_hdd 0

The customer was able to recover 690 GB of the expected 1010 GB used by the snapshots.

After the upgrade to 4.10, the customer noticed OSD pods restarting, and upon further investigation the OSDs were flapping. A must-gather was uploaded - see must-gather-post-odf4.10upgrade.tar.gz.

Ashish S. reverted the above settings back to defaults (see the sketch under Additional info below), scaled down the rook-ceph-operator and ocs-operator, and removed the liveness probe from the OSDs to try to combat the flapping. He also increased the heartbeat timeouts osd_op_thread_suicide_timeout and osd_op_thread_timeout (these are currently the only non-default settings).

The cluster is still reporting slow requests. Logs were captured with full debug enabled across all 3 OSDs.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
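For reference, a minimal sketch of how the snap-trim overrides can be dropped back to their defaults and verified, assuming the commands are run from the rook-ceph toolbox pod; the exact revert procedure used on this cluster was not captured above:

# ceph config rm osd osd_max_trimming_pgs
# ceph config rm osd osd_pg_max_concurrent_snap_trims
# ceph config rm osd osd_snap_trim_sleep_hdd
# ceph config dump

ceph config rm removes the override so the option falls back to its built-in default, and ceph config dump lists whatever non-default settings remain (which should now be only the two timeout overrides mentioned above).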
> Also, what could be a possible reason for large omap for snap objects?

The DB can grow very significantly if one does snapshots and overwrites on objects that have large OMAP data. BlueStore COPIES the OMAP data when BlueStore::clone() is done on an object; there is no copy-on-write here.

I think the best we can do to help SnapMapper trim is to periodically inject a "compact" command. The problem with RocksDB, deletion, and slow iterators is that a deletion is a very compact operation, and the default triggers for auto compaction rely on L0 SST table size. It is possible to accumulate a significant number of key-remove operations without triggering compaction, yet still be significantly burdened by iterating over the deleted keys.

Compaction can be triggered by the admin command "compact":

# ceph tell osd.0 compact

If the OSD suicides before finishing compaction, one can take it offline and run ceph-kvstore-tool:

# ceph-kvstore-tool bluestore-kv path-to-data compact

or just compact the omap prefix:

# ceph-kvstore-tool bluestore-kv path-to-data compact-prefix p

I think as SnapMapper deletion progresses, it might be required to re-trigger compaction multiple times.
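A hedged sketch for tracking whether compaction is actually reclaiming the omap space, assuming a Ceph version recent enough that ceph osd df reports the OMAP and META columns (these checks are my suggestion, not from the case notes):

# ceph osd df
# ceph tell osd.0 compact
# ceph osd df

Comparing the OMAP and META columns before and after each compaction pass should show whether space from the removed SnapMapper keys is being reclaimed; if the numbers stop shrinking while snaptrim is still in progress, compaction can be re-triggered as described above.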