Bug 1576095

Summary: Continually increasing memory consumption in ceph-osd
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Michael J. Kidd <linuxkidd>
Component: RADOSAssignee: Kefu Chai <kchai>
Status: CLOSED CURRENTRELEASE QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 2.5CC: acalhoun, anharris, bengland, bhubbard, brian.fife, ceph-eng-bugs, dzafman, jdurgin, kchai, linuxkidd, mhackett, vumrao
Target Milestone: rc   
Target Release: 3.*   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-13 22:08:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michael J. Kidd 2018-05-08 19:48:20 UTC
Description of problem:
Continually increasing ceph-osd memory usage.
Currently using more than 8 GB of RAM and continuing to climb.

Version-Release number of selected component (if applicable):
RHCS 2.5.1 async 1
ceph-base-10.2.10-17.el7cp.x86_64                           Fri Apr 13 10:43:34 2018
ceph-common-10.2.10-17.el7cp.x86_64                         Fri Apr 13 10:39:54 2018
ceph-mon-10.2.10-17.el7cp.x86_64                            Fri Apr 13 10:43:37 2018
ceph-osd-10.2.10-17.el7cp.x86_64                            Fri Apr 13 10:43:36 2018
ceph-radosgw-10.2.10-17.el7cp.x86_64                        Fri Apr 13 10:43:37 2018
ceph-selinux-10.2.10-17.el7cp.x86_64                        Fri Apr 13 10:43:09 2018
libcephfs1-10.2.10-17.el7cp.x86_64                          Fri Apr 13 10:39:51 2018
librados2-10.2.10-17.el7cp.x86_64                           Fri Apr 13 10:39:47 2018
librbd1-10.2.10-17.el7cp.x86_64                             Fri Apr 13 10:39:50 2018
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7_4.7.x86_64   Fri Apr 13 11:27:22 2018
puppet-ceph-2.4.1-3.el7ost.noarch                           Fri Apr 13 11:27:24 2018
python-cephfs-10.2.10-17.el7cp.x86_64                       Fri Apr 13 10:39:51 2018
python-rados-10.2.10-17.el7cp.x86_64                        Fri Apr 13 10:39:49 2018
python-rbd-10.2.10-17.el7cp.x86_64                          Fri Apr 13 10:39:51 2018


How reproducible:
Memory begins climbing again after a restart of the OSD services.

Additional info:
- Somewhat high PG-to-OSD ratio (228 to 308 PGs per OSD), but that alone would not be expected to cause this memory usage; see the sketch after this list for re-checking per-OSD PG counts.
- leveldb compaction only taking ~0.9% of OSD run time, with most compactions completing in < 1 second.
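
For reference, a rough way to re-check the per-OSD PG counts and utilization (a sketch; the exact columns vary by Ceph release, and osd.38 below is simply the OSD from the heap dumps further down):

    # per-OSD utilization; the PGS column shows how many placement groups map to each OSD
    ceph osd df tree

    # per-daemon view via the admin socket on the OSD host (num_pgs in the output)
    ceph daemon osd.38 status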

Heap stats from one day to the next (yesterday and today):

5/7 3pm?
osd.38 tcmalloc heap stats:------------------------------------------------
MALLOC:     8595705224 ( 8197.5 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     48206544 (   46.0 MiB) Bytes in central cache freelist
MALLOC: +       589824 (    0.6 MiB) Bytes in transfer cache freelist
MALLOC: +    172015528 (  164.0 MiB) Bytes in thread cache freelists
MALLOC: +     37171360 (   35.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   8853688480 ( 8443.5 MiB) Actual memory used (physical + swap)
MALLOC: +    380485632 (  362.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   9234174112 ( 8806.4 MiB) Virtual address space used
MALLOC:
MALLOC:         507443              Spans in use
MALLOC:           1104              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

5/8 12:18
osd.38 tcmalloc heap stats:------------------------------------------------
MALLOC:     9195426544 ( 8769.4 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     48593344 (   46.3 MiB) Bytes in central cache freelist
MALLOC: +        61440 (    0.1 MiB) Bytes in transfer cache freelist
MALLOC: +    170845008 (  162.9 MiB) Bytes in thread cache freelists
MALLOC: +     39268512 (   37.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   9454194848 ( 9016.2 MiB) Actual memory used (physical + swap)
MALLOC: +    453730304 (  432.7 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   9907925152 ( 9448.9 MiB) Virtual address space used
MALLOC:
MALLOC:         532916              Spans in use
MALLOC:           1062              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
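
For context, these dumps come from the tcmalloc heap-stats interface; a minimal sketch of gathering them and asking tcmalloc to return freelist memory to the OS (osd.38 as the example daemon):

    # dump current tcmalloc heap statistics for osd.38
    ceph tell osd.38 heap stats

    # release unused tcmalloc freelist memory back to the OS
    ceph tell osd.38 heap release

    # optionally profile where the allocations are coming from
    ceph tell osd.38 heap start_profiler
    ceph tell osd.38 heap stop_profiler

Note that in the dumps above almost all of the memory is "Bytes in use by application" rather than freelist memory, so a heap release alone would not be expected to reclaim it.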

Comment 64 Ben England 2018-06-06 12:17:55 UTC
I have seen problems in the past with RHOSP12+RHCS2.4 where OSD memory increased during situations in which a lot of backfilling was occurring.  There was sort of a chain reaction: OSDs got too big, ran past their cgroup limit, and died, setting off more backfilling and more OSD memory growth.  Here's the article about how we resolved it then, and how we might prevent it in the future.

https://docs.google.com/document/d/1e2jn8DbVbpwYcuhPG18tP3tDzaJ6DBHiacdzqiZcFow/edit#heading=h.x4uti0xeq736
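
As a quick check for the cgroup-limit part of that chain reaction, something like the following (a sketch; the unit name ceph-osd@38 and whether a memory limit is set at all depend on how the deployment was done):

    # configured memory limit and current usage for the osd.38 unit, if systemd-managed
    systemctl show ceph-osd@38 --property=MemoryLimit,MemoryCurrent

    # resident and virtual size of the ceph-osd processes on the host
    ps -o pid,rss,vsz,cmd -C ceph-osd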

Short summary: The workaround for this problem was:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-May/040030.html

Josh Durgin's recommendation to prevent this problem was:

osd_max_pg_log_entries = 3000 (default 10000)
osd_min_pg_log_entries = 3000 (default 1500)

I have not yet tried this; we plan to try it in the scale lab in the next month or so, but I just wanted you to know about it.  Let me know if this helps, or if it needs clarification.
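
If we do try this, one way to apply and verify it (a sketch; injected values last only until the next daemon restart, so the ceph.conf change is still needed for persistence):

    # persist in /etc/ceph/ceph.conf under [osd]:
    #   osd_max_pg_log_entries = 3000
    #   osd_min_pg_log_entries = 3000

    # apply at runtime without restarting the OSDs
    ceph tell osd.* injectargs '--osd_max_pg_log_entries 3000 --osd_min_pg_log_entries 3000'

    # confirm on one daemon via its admin socket
    ceph daemon osd.38 config get osd_max_pg_log_entries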

Comment 82 Ben England 2019-03-29 11:46:20 UTC
Alex Calhoun reported some OSD memory growth in his RHCS 3.2 Bluestore testing here:

https://docs.google.com/document/d/1yiYxsxSP__SWMm-FkJjn1kFu1rkVLwaaVnUV4G720hA/edit#heading=h.e8tdgt9to8b2

This is an admittedly extreme I/O workload, but the memory growth still should not happen.
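
For the BlueStore case, a quick way to see which internal pools the memory is going to (a sketch, assuming admin-socket access on a RHCS 3.2 / Luminous-based OSD node; osd.38 is just an example id):

    # per-mempool byte and item counts for the OSD
    ceph daemon osd.38 dump_mempools

    # memory target and cache settings in effect for that OSD
    ceph daemon osd.38 config show | grep -E 'osd_memory_target|bluestore_cache'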