Description of problem:
Each replicated image allocates its own librbd cache of up to 32 MB (the default). When replicating many images, this can quickly consume gigabytes of memory.

Version-Release number of selected component (if applicable):
10.2.1

How reproducible:
100%

Steps to Reproduce:
1. Replicate hundreds of images with data

Actual results:
The rbd-mirror daemon consumes gigabytes of memory.

Expected results:
The rbd-mirror daemon should not have the client-side cache enabled for each replicated image.

Additional info:
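A minimal workaround sketch until the daemon disables the cache itself, assuming the default ceph.conf path and using the generic [client] section (which affects every librbd client on that host, not just rbd-mirror):

# Turn off the librbd client-side cache for all librbd clients on this host.
cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
    rbd cache = false
EOF

# Restart rbd-mirror so the change takes effect (the systemd unit name may
# differ by release; ceph-rbd-mirror@<client-id> is the template in Jewel).
systemctl restart ceph-rbd-mirror@admin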
@Jason: Can you please let me know whether the stats below look OK?

Slave Node:

top - 10:03:26 up 5 days,  1:58,  8 users,  load average: 0.28, 0.41, 0.37
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 98.7 id,  1.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32747292 total,   558536 free,  7260360 used, 24928396 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 15675632 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25723 ceph      20   0 13.013g 6.448g   8704 S   2.0 20.6  15:13.35 rbd-mirror

free -m:
              total        used        free      shared  buff/cache   available
Mem:          31979        7089         542        9272       24347       15308
Swap:             0           0           0

A lot of memory is held by buff/cache (I did not change anything cache-related in ceph.conf). This data was taken while creating almost 100 images with data, plus replication.

On Master Node:

top - 15:36:42 up 7 days,  4:23,  3 users,  load average: 1.18, 0.71, 0.58
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13161976+total, 10420131+free,  2524472 used, 24893984 buff/cache
KiB Swap:  4194300 total,  4194300 free,        0 used. 11910392+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14779 ceph      20   0 7534376  93324   8848 S   3.3  0.1  16:18.18 rbd-mirror

free -m:
              total        used        free      shared  buff/cache   available
Mem:         128534        2467      101754        9270       24312      116309
Swap:          4095           0        4095
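For reference, figures like the above can be captured with standard tools (this assumes a single rbd-mirror process on the host, so pidof returns one PID):

# One batch iteration of top, limited to the rbd-mirror process.
top -b -n 1 -p "$(pidof rbd-mirror)"

# Resident set size (KiB) of the daemon, straight from procfs.
grep VmRSS /proc/"$(pidof rbd-mirror)"/status

# Host-wide view, including buff/cache, in MiB.
free -m

Note that buff/cache in free -m is the kernel page cache, not memory held by the rbd-mirror process itself; the daemon's own footprint is the RES column in top.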
Image cache is disabled, but the default journal object size also results in high memory usage (up to 128 MB per replicated image in the worst case). I opened an upstream ticket about this roughly a month ago.
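For anyone who needs to bound the journal footprint in the meantime, the journal object size is configurable. A sketch using the upstream journaling options (option names from the Jewel-era rbd configuration reference; the order value of 20, i.e. 1 MB objects, is illustrative, not a tested recommendation):

# Smaller journal objects reduce the per-image worst-case memory held by
# rbd-mirror during journal replay. Apply before enabling journaling on
# an image; existing journals keep the layout they were created with.
cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
    rbd journal order = 20        # object size as a power of two: 2^20 = 1 MB (default 24 = 16 MB)
    rbd journal splay width = 4   # number of active journal objects (default 4)
EOF

Per the rbd help output in Jewel, "rbd journal info --pool <pool> --image <image>" should show the layout an existing image's journal was created with.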
Marking this bug as Verified, as I am not seeing any leak or heavy memory consumption.

ceph version 10.2.2-27.el7cp
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html