Bug 1340489

Summary: [rbd-mirror] high memory usage when replicating hundreds of images
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Jason Dillaman <jdillama>
Component: RBD
Assignee: Jason Dillaman <jdillama>
Status: CLOSED ERRATA
QA Contact: Tanay Ganguly <tganguly>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 2.0
CC: amaredia, ceph-eng-bugs, hnallurv, jdillama, kurs
Target Milestone: rc
Target Release: 2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.1-12.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 19:39:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jason Dillaman 2016-05-27 14:16:33 UTC
Description of problem:
Each replicated image instantiates its own librbd client-side cache of up to 32MB (the default). When replicating hundreds of images, this quickly adds up to gigabytes of memory.
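
For reference, a sketch of the ceph.conf settings involved (these are the upstream Jewel defaults, shown only to illustrate the math):

  [client]
      # librbd writeback cache; enabled by default, one instance per open image
      rbd cache = true
      # per-image cache size limit in bytes (32 MiB default)
      rbd cache size = 33554432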

Version-Release number of selected component (if applicable):
10.2.1

How reproducible:
100%

Steps to Reproduce:
1. Replicate hundreds of images with data
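
A rough reproduction sketch, assuming two clusters with pool-mode mirroring and the default 'rbd' pool (image names, counts, and sizes are illustrative):

  # enable journal-based mirroring for the whole pool
  rbd mirror pool enable rbd pool

  # create a few hundred images with the journaling feature and write data to them
  for i in $(seq 1 200); do
      rbd create --size 1G --image-feature exclusive-lock,journaling image-$i
      rbd bench-write image-$i --io-total 104857600   # 100 MiB of writes
  done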

Actual results:
The rbd-mirror daemon ends up consuming gigabytes of memory.

Expected results:
The rbd-mirror daemon should not have the client-side cache enabled for each replicated image.
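
Until that lands, one possible manual workaround (a sketch; rbd-mirror honors the same [client] settings as any other librbd client on its host):

  [client]
      # force the per-image librbd writeback cache off
      rbd cache = false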

Additional info:

Comment 4 Tanay Ganguly 2016-07-04 10:15:44 UTC
@Jason:
Can you please let me know if the stats below look OK?

Slave Node:
top - 10:03:26 up 5 days,  1:58,  8 users,  load average: 0.28, 0.41, 0.37
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 98.7 id,  1.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32747292 total,   558536 free,  7260360 used, 24928396 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 15675632 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                             
25723 ceph      20   0 13.013g 6.448g   8704 S   2.0 20.6  15:13.35 rbd-mirror               

free -m
              total        used        free      shared  buff/cache   available
Mem:          31979        7089         542        9272       24347       15308
Swap:             0           0           0

A lot of memory is held by buff/cache (I didn't change anything in ceph.conf with respect to the cache).

This data was captured while creating almost 100 images with data, with replication on.
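
(Note: the buff/cache figure above is reclaimable kernel page cache rather than rbd-mirror's own heap; the RES column in top is the number to watch. A quick sanity check, as a sketch run as root:)

  # RSS is the daemon's own memory footprint
  ps -o pid,rss,vsz,comm -p $(pgrep rbd-mirror)

  # dropping the page cache should shrink buff/cache without affecting rbd-mirror
  sync && echo 3 > /proc/sys/vm/drop_caches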


Master Node:
top - 15:36:42 up 7 days,  4:23,  3 users,  load average: 1.18, 0.71, 0.58
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13161976+total, 10420131+free,  2524472 used, 24893984 buff/cache
KiB Swap:  4194300 total,  4194300 free,        0 used. 11910392+avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                            
 14779 ceph      20   0 7534376  93324   8848 S   3.3  0.1  16:18.18 rbd-mirror                                                                                                         
free -m
              total        used        free      shared  buff/cache   available
Mem:         128534        2467      101754        9270       24312      116309
Swap:          4095           0        4095

Comment 5 Jason Dillaman 2016-07-05 14:04:22 UTC
Image cache is disabled, but the default journal object size also results in high memory usage (up to 128MB per replicated image in the worst case). I opened an upstream ticket about this roughly a month ago.
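
The footprint follows from the journal geometry; a sketch of the knobs involved, assuming the Jewel defaults (and assuming both the remote fetch and local replay paths buffer a full splay set, i.e. 2 × 4 × 16 MiB = 128 MiB worst case):

  [client]
      # journal object size is 2^order bytes; default order is 24 (16 MiB)
      rbd journal order = 24
      # number of journal objects written/replayed in parallel (default 4)
      rbd journal splay width = 4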

Comment 6 Tanay Ganguly 2016-08-01 10:29:49 UTC
Marking this bug as Verified, as I am not seeing any leak or heavy memory consumption.

ceph version 10.2.2-27.el7cp

Comment 8 errata-xmlrpc 2016-08-23 19:39:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html