Description of problem:

Continually increasing ceph-osd memory usage. Currently using 8 GB+ of RAM and continuing to climb.

Version-Release number of selected component (if applicable):

RHCS 2.5.1 async 1
ceph-base-10.2.10-17.el7cp.x86_64                           Fri Apr 13 10:43:34 2018
ceph-common-10.2.10-17.el7cp.x86_64                         Fri Apr 13 10:39:54 2018
ceph-mon-10.2.10-17.el7cp.x86_64                            Fri Apr 13 10:43:37 2018
ceph-osd-10.2.10-17.el7cp.x86_64                            Fri Apr 13 10:43:36 2018
ceph-radosgw-10.2.10-17.el7cp.x86_64                        Fri Apr 13 10:43:37 2018
ceph-selinux-10.2.10-17.el7cp.x86_64                        Fri Apr 13 10:43:09 2018
libcephfs1-10.2.10-17.el7cp.x86_64                          Fri Apr 13 10:39:51 2018
librados2-10.2.10-17.el7cp.x86_64                           Fri Apr 13 10:39:47 2018
librbd1-10.2.10-17.el7cp.x86_64                             Fri Apr 13 10:39:50 2018
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7_4.7.x86_64   Fri Apr 13 11:27:22 2018
puppet-ceph-2.4.1-3.el7ost.noarch                           Fri Apr 13 11:27:24 2018
python-cephfs-10.2.10-17.el7cp.x86_64                       Fri Apr 13 10:39:51 2018
python-rados-10.2.10-17.el7cp.x86_64                        Fri Apr 13 10:39:49 2018
python-rbd-10.2.10-17.el7cp.x86_64                          Fri Apr 13 10:39:51 2018

How reproducible:

Memory begins climbing again after a restart of the OSD services.

Additional info:

- Somewhat high PG-to-OSD ratio (228 to 308 PGs per OSD), but that alone wouldn't be expected to cause this memory usage.
- leveldb compaction is only taking ~0.9% of OSD run time, with most compactions completing in < 1 second.

Heap stats for osd.38 from one day to the next (yesterday vs. today):

5/7 ~3pm:
osd.38 tcmalloc heap stats:------------------------------------------------
MALLOC:     8595705224 ( 8197.5 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     48206544 (   46.0 MiB) Bytes in central cache freelist
MALLOC: +       589824 (    0.6 MiB) Bytes in transfer cache freelist
MALLOC: +    172015528 (  164.0 MiB) Bytes in thread cache freelists
MALLOC: +     37171360 (   35.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   8853688480 ( 8443.5 MiB) Actual memory used (physical + swap)
MALLOC: +    380485632 (  362.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   9234174112 ( 8806.4 MiB) Virtual address space used
MALLOC:
MALLOC:         507443              Spans in use
MALLOC:           1104              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

5/8 12:18:
osd.38 tcmalloc heap stats:------------------------------------------------
MALLOC:     9195426544 ( 8769.4 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     48593344 (   46.3 MiB) Bytes in central cache freelist
MALLOC: +        61440 (    0.1 MiB) Bytes in transfer cache freelist
MALLOC: +    170845008 (  162.9 MiB) Bytes in thread cache freelists
MALLOC: +     39268512 (   37.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   9454194848 ( 9016.2 MiB) Actual memory used (physical + swap)
MALLOC: +    453730304 (  432.7 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   9907925152 ( 9448.9 MiB) Virtual address space used
MALLOC:
MALLOC:         532916              Spans in use
MALLOC:           1062              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
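For context, these dumps look like output from the tcmalloc heap admin commands (an assumption; the report doesn't say exactly how they were collected). A minimal sketch of how to reproduce them and rule out allocator freelist bloat, using osd.38 as in the dumps above:

    # Dump tcmalloc heap stats for one OSD (produces output like that shown above)
    ceph tell osd.38 heap stats

    # Ask tcmalloc to return free pages to the OS via madvise(); if RSS barely
    # drops afterwards, the growth is in "Bytes in use by application" rather
    # than in allocator caches
    ceph tell osd.38 heap release

Note that in the two dumps above the growth is almost entirely in "Bytes in use by application" (8197.5 MiB -> 8769.4 MiB), with only ~200 MiB sitting in the freelists, so a heap release would not be expected to recover much here.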
I have seen problems in the past with RHOSP12 + RHCS 2.4 where OSD memory increased during periods of heavy backfilling. There was a sort of chain reaction where OSDs grew too big, ran past their cgroup limit, and died, setting off more backfilling and more OSD memory growth. Here's the article about how we resolved it then, and how we might prevent it in the future:
https://docs.google.com/document/d/1e2jn8DbVbpwYcuhPG18tP3tDzaJ6DBHiacdzqiZcFow/edit#heading=h.x4uti0xeq736

Short summary:
- The workaround for this problem was: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/040030.html
- Josh Durgin's recommendation to prevent this problem was (see the config sketch below):
  osd_max_pg_log_entries = 3000 (default 10000)
  osd_min_pg_log_entries = 3000 (default 1500)

I have not yet tried this; we plan to try it in the scale lab in the next month or so, but I wanted you to know about it. Let me know if this helps, or if it needs clarification.
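In case it is useful, a minimal sketch of how those two settings could be applied; the values are the ones recommended above, and whether they are appropriate for this cluster is an assumption still to be validated in the scale lab:

    # ceph.conf on the OSD nodes (takes effect on OSD restart)
    [osd]
    osd_max_pg_log_entries = 3000
    osd_min_pg_log_entries = 3000

    # Or inject at runtime without restarting the OSDs (not persistent across restarts)
    ceph tell osd.* injectargs '--osd_max_pg_log_entries 3000 --osd_min_pg_log_entries 3000'

The trade-off is that a smaller PG log caps per-PG memory during backfill storms, but recovery will more often have to fall back to full backfill instead of log-based recovery.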
Alex Calhoun reported some OSD memory growth in his RHCS 3.2 Bluestore testing here:
https://docs.google.com/document/d/1yiYxsxSP__SWMm-FkJjn1kFu1rkVLwaaVnUV4G720hA/edit#heading=h.e8tdgt9to8b2

This is an admittedly extreme I/O workload, but it still shouldn't happen.