Bug 1825312
| Summary: | [ceph-mgr] memory growth leading to oom kill | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Preethi <pnataraj> |
| Component: | RADOS | Assignee: | Neha Ojha <nojha> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Preethi <pnataraj> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.1 | CC: | akupczyk, bhubbard, ceph-eng-bugs, dzafman, epuertat, hyelloji, jdurgin, kchai, mnelson, nojha, rzarzyns, sseshasa, tchandra |
| Target Milestone: | z1 | | |
| Target Release: | 4.1 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-06-10 05:17:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Preethi, 2020-04-17 16:26:35 UTC)
Josh Durgin:

That crash indicates the mgr ran out of memory. Looking at the host it's on, magna120, it appears the monitor there is using excessive memory (almost 7 GiB), but it is all in tcmalloc's freelist (out of the monitor's control). 'heap release' fixes it:

```
[root@magna120 ~]# ceph tell mon.magna120 heap stats
mon.magna120 tcmalloc heap stats:------------------------------------------------
MALLOC:      563092728 (  537.0 MiB) Bytes in use by application
MALLOC: +   6616170496 ( 6309.7 MiB) Bytes in page heap freelist
MALLOC: +     85302688 (   81.4 MiB) Bytes in central cache freelist
MALLOC: +      8931584 (    8.5 MiB) Bytes in transfer cache freelist
MALLOC: +     56794216 (   54.2 MiB) Bytes in thread cache freelists
MALLOC: +     43253760 (   41.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   7373545472 ( 7032.0 MiB) Actual memory used (physical + swap)
MALLOC: +     15990784 (   15.2 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   7389536256 ( 7047.2 MiB) Virtual address space used
MALLOC:
MALLOC:          28394 Spans in use
MALLOC:             24 Thread heaps in use
MALLOC:           8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

[root@magna120 ~]# ceph tell mon.magna120 heap release
mon.magna120 releasing free RAM back to system.

[root@magna120 ~]# ceph tell mon.magna120 heap stats
mon.magna120 tcmalloc heap stats:------------------------------------------------
MALLOC:      563129616 (  537.0 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     85320096 (   81.4 MiB) Bytes in central cache freelist
MALLOC: +      8558336 (    8.2 MiB) Bytes in transfer cache freelist
MALLOC: +     57334352 (   54.7 MiB) Bytes in thread cache freelists
MALLOC: +     43253760 (   41.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    757596160 (  722.5 MiB) Actual memory used (physical + swap)
MALLOC: +   6631940096 ( 6324.7 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   7389536256 ( 7047.2 MiB) Virtual address space used
MALLOC:
MALLOC:          28362 Spans in use
MALLOC:             24 Thread heaps in use
MALLOC:           8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
```

Mark, have you seen this kind of behavior with tcmalloc before?
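For reference, a minimal shell sketch of the check-and-release workflow shown above, suitable for a cron job. The daemon name, the 4096 MiB threshold, and the stream handling are assumptions for illustration, not settings taken from this cluster.

```bash
#!/bin/bash
# Hypothetical watchdog: ask a Ceph daemon for its tcmalloc heap stats
# and trigger "heap release" when the page heap freelist grows too large.
DAEMON="mon.magna120"   # assumed name; could equally be mgr.<host>
THRESHOLD_MIB=4096      # illustrative threshold, tune to the host

# Parse the "Bytes in page heap freelist" line ($3 is the byte count);
# some releases print the tell output on stderr, so capture both streams.
freelist_mib=$(ceph tell "$DAEMON" heap stats 2>&1 |
    awk '/Bytes in page heap freelist/ {print int($3 / 1048576)}')

if [ "${freelist_mib:-0}" -ge "$THRESHOLD_MIB" ]; then
    echo "$DAEMON page heap freelist at ${freelist_mib} MiB, releasing"
    ceph tell "$DAEMON" heap release
fi
```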
Ernesto Puerta (comment #2):

@Josh, I also filed this tracker issue (https://tracker.ceph.com/issues/45136) for this same issue, as reported by Preethi. If you have a look at the picture there, between Apr 16 20:30 and Apr 17 04:45 CEST (in the logs: 18:30-02:45 UTC), there's a ramp-up in used memory (excluding cache/buffers/slab) and the mgr logs show 6 warnings about tcmalloc large allocations of 2 GB.

Josh Durgin:

(In reply to Ernesto Puerta from comment #2)
> @Josh, I also filed this tracker issue
> (https://tracker.ceph.com/issues/45136) for this same issue, as reported by
> Preethi.

Thanks Ernesto, that shows the mgr process was definitely the culprit. Unfortunately abrt deleted the coredump, so we can't see what was using that memory within the process. I fixed the abrt settings on the machine per https://access.redhat.com/solutions/168603 and increased the abrt storage limit to 100G so this won't happen in the future. Was anything in particular being tested around this time? Any dashboard functionality, or CLI commands that could have impacted the mgr?

Josh Durgin (comment #5):

Let's see if this happens again; we'll be able to get a coredump this time at least. If you see the mgr memory usage growing before it crashes, it'd be helpful to increase its debug mgr log level.

Mark Nelson:

Hi Josh, it's been a long time since I've seen the mons go crazy with memory usage, and previously it was due to rocksdb compaction not being able to keep up with ingest. On the OSD side we are now automatically releasing memory periodically in the PriorityCacheManager. I believe we are using that for the mon now as well, right?

Yaniv Kaul (comment #7):

(In reply to Josh Durgin from comment #5)
> Let's see if this happens again; we'll be able to get a coredump this time
> at least. If you see the mgr memory usage growing before it crashes, it'd be
> helpful to increase its debug mgr log level.

Whose task is it? NEEDINFO on reporter?

Josh Durgin:

(In reply to Yaniv Kaul from comment #7)
> Whose task is it? NEEDINFO on reporter?

Yes, reproducing the problem is a task for QE. If we can reproduce, I agree it is a blocker. If we cannot, we won't have enough info to investigate and should move this to the 5.* backlog until we can reproduce.
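As a hedged illustration of the abrt change Josh describes: on RHEL 7 the crash-dump retention cap lives in /etc/abrt/abrt.conf, and raising it to roughly 100G could look like the sketch below. The sed one-liner and the exact value are assumptions; the linked KB article is the authoritative procedure.

```bash
# MaxCrashReportsSize caps abrt's dump storage, in MiB (0 = unlimited).
# 102400 MiB is roughly the 100G mentioned in the comment above.
sudo sed -i 's/^#\?MaxCrashReportsSize.*/MaxCrashReportsSize = 102400/' /etc/abrt/abrt.conf

# Restart the daemon so the new limit takes effect.
sudo systemctl restart abrtd
```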
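One way to act on Josh's suggestion in comment #5 is sketched below: raise the mgr debug level through the centralized config and log the mgr's RSS periodically so the growth is visible before a crash. The debug level of 20, the poll interval, and the process name are illustrative assumptions.

```bash
# Verbose mgr logging (level 20 assumed as "maximum useful detail").
ceph config set mgr debug_mgr 20

# Poll the local ceph-mgr RSS once a minute; pidof -s (single pid)
# assumes the default process name on this host.
while sleep 60; do
    pid=$(pidof -s ceph-mgr) || continue
    echo "$(date -u +%FT%TZ) rss_kb=$(awk '/VmRSS/ {print $2}' /proc/"$pid"/status)"
done
```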
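On Mark's question about the PriorityCacheManager: newer Nautilus-based releases (the base of 4.x) add mon cache auto-tuning against a memory target, analogous to osd_memory_target on the OSD side, though it is worth verifying the option exists in the running version. A sketch, assuming mon_memory_target is available; the 2 GiB value is just the upstream default, shown for illustration:

```bash
# Inspect the current targets (option names assume a Nautilus-era release).
ceph config get mon mon_memory_target
ceph config get osd osd_memory_target

# Pin the mon's target explicitly, here to the 2 GiB default (in bytes).
ceph config set mon mon_memory_target 2147483648
```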