Bug 2003776

Summary: [Workload-DFG] MGR: The osd_memory_target_autotune option is not working as expected
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vikhyat Umrao <vumrao>
Component: Cephadm
Assignee: Adam King <adking>
Status: CLOSED ERRATA
QA Contact: Pawan <pdhiran>
Severity: high
Docs Contact: Mary Frances Hull <mhull>
Priority: unspecified
Version: 5.0
CC: adking, agunn, akupczyk, bhubbard, ceph-eng-bugs, ceph-qe-bugs, dpivonka, nojha, pdhiran, racpatel, rzarzyns, skanta, sseshasa, tserlin, twilkins, vashastr, vereddy, vumrao
Target Milestone: ---   
Target Release: 5.0z1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-16.2.0-140.el8cp
Doc Type: Enhancement
Doc Text:
.{storage-product} can now automatically tune the Ceph OSD memory target

With this release, the `osd_memory_target_autotune` option is fixed and works as expected. Users can enable {storage-product} to automatically tune the Ceph OSD memory target for the Ceph OSDs in the storage cluster, improving performance without explicitly setting a memory target for each Ceph OSD. {storage-product} sets the Ceph OSD memory target on a per-node basis by evaluating the total memory available and the daemons running on the node. Users can enable memory auto-tuning for the Ceph OSDs by running the following command:

----
ceph config set osd osd_memory_target_autotune true
----
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-02 16:39:21 UTC
Type: Bug
Bug Blocks: 1959686, 1973155    

Description Vikhyat Umrao 2021-09-13 16:54:13 UTC
Description of problem:
MGR: The osd_memory_target_autotune option is not working as expected

- Setting osd_memory_target_autotune = true should enable autotuning, and the autotuner should start adjusting the OSD memory target once the option is enabled (the commands used to enable and verify it are sketched below).
- However, on the workload-dfg RHCS 5 cluster, we are not seeing this happen.
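
For reference, a minimal sketch of how the option was enabled and how it can be checked (standard ceph CLI; the mgr/cephadm/autotune_memory_target_ratio option name and its 0.7 default are taken from upstream cephadm documentation and are an assumption for this build):

Enable autotuning of osd_memory_target for all OSDs:
# ceph config set osd osd_memory_target_autotune true

Confirm the flag in the cluster configuration database:
# ceph config get osd osd_memory_target_autotune

Check the per-host ratio cephadm should use when computing targets:
# ceph config get mgr mgr/cephadm/autotune_memory_target_ratio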

What we see right now:

- The osd_memory_target_autotune option is enabled for all 192 OSDs; the output for one of them is listed below.

- We dumped the osd_memory_target value with `config show` for all 192 OSDs, and they all still have the default value of 4G (a sketch of the loop used is shown after the output below).


# ceph tell osd.1 config show | grep osd_memory_target
    "osd_memory_target": "4294967296",
    "osd_memory_target_autotune": "true",
    "osd_memory_target_cgroup_limit_ratio": "0.800000",

- We then ran `ceph orch ps --daemon-type osd --format json-pretty` to check what cephadm reports for each OSD daemon (a one-liner to summarize this output is sketched after the JSON below).

  {
    "container_id": "fedc4933cd26",
    "container_image_digests": [
      "registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:55cb1de88341300daa1ab6d59e4897edc733
a3f90162149c21f18abe49ed87c7"
    ],
    "container_image_id": "2142b60d797408c7a0e9210489bd599cd7addb2d4c6e31da769eba248208ca44
",
    "container_image_name": "registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:55cb1de8834130
0daa1ab6d59e4897edc733a3f90162149c21f18abe49ed87c7",
    "created": "2021-09-04T12:31:42.062065Z",
    "daemon_id": "1",
    "daemon_type": "osd",
    "hostname": "f23-h05-000-6048r.rdu2.scalelab.redhat.com",
    "is_active": false,
    "last_refresh": "2021-09-13T16:40:43.590086Z",
    "memory_usage": 9093519507,
    "osdspec_affinity": "defaultDG",
    "ports": [],
    "started": "2021-09-04T12:31:49.409595Z",
    "status": 1,
    "status_desc": "running",
    "version": "16.2.0-117.el8cp"
  },
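
To summarize the reported usage across all OSD daemons without reading the full JSON, a one-liner along these lines can be used (a sketch; it assumes jq is available on the admin node):

# ceph orch ps --daemon-type osd --format json | jq -r '.[] | "\(.daemon_type).\(.daemon_id) \(.memory_usage)"'

This prints one "osd.<id> <memory_usage in bytes>" line per daemon, using the same fields shown above.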

- We ran the top command to check the actual memory usage on the system for this particular OSD:


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 576023 ceph      20   0 6609044   3.9g  15824 S   5.6   1.5 449:20.13 /usr/bin/ceph-osd -n osd.1 -f --setuser ceph --setgroup ceph --default-log-to-file=false --defau+

Looks like the currently set `osd_memory_target` is limiting the OSD to the default value of 4G, but in the `ceph orch ps` output we can clearly see the usage listed at around 9G:

"memory_usage": 9093519507


We will attach all the command dumps.


Version-Release number of selected component (if applicable):
RHCS 5 - 16.2.0-117.el8cp

How reproducible:
Always
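
Reproduction sketch (assuming a cephadm-managed RHCS 5 cluster on this build; osd.1 is just an example id):

# ceph config set osd osd_memory_target_autotune true
(wait several minutes for the cephadm module to refresh)
# ceph tell osd.1 config show | grep '"osd_memory_target"'

The value remains at the 4294967296 default instead of a host-derived target.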

Comment 27 errata-xmlrpc 2021-11-02 16:39:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4105