Description of problem:
Each replicated image allocates its own librbd cache of up to 32 MB (the default). When replicating many images, this can quickly consume gigabytes of memory.

Version-Release number of selected component (if applicable):
10.2.1

How reproducible:
100%

Steps to Reproduce:
1. Replicate hundreds of images with data

Actual results:
The rbd-mirror daemon consumes gigabytes of memory.

Expected results:
The rbd-mirror daemon should not have the client-side cache enabled for each replicated image.

Additional info:
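A minimal workaround sketch until the daemon disables the cache itself, assuming the default ceph.conf path and using the generic [client] section (which affects every librbd client on that host, not just rbd-mirror):

# Turn off the librbd client-side cache for all librbd clients on this host.
cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
    rbd cache = false
EOF

# Restart rbd-mirror so the change takes effect (the systemd unit name may
# differ by release; ceph-rbd-mirror@<client-id> is the template in Jewel).
systemctl restart ceph-rbd-mirror@admin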
@Jason: Can you please let me know whether the stats below look OK?

Slave Node:

top - 10:03:26 up 5 days,  1:58,  8 users,  load average: 0.28, 0.41, 0.37
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 98.7 id,  1.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32747292 total,   558536 free,  7260360 used, 24928396 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 15675632 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25723 ceph      20   0 13.013g 6.448g   8704 S   2.0 20.6  15:13.35 rbd-mirror

free -m:
              total        used        free      shared  buff/cache   available
Mem:          31979        7089         542        9272       24347       15308
Swap:             0           0           0

A lot of memory is held by buff/cache (I did not change anything cache-related in ceph.conf). This data was taken while creating almost 100 images with data, plus replication.

On Master Node:

top - 15:36:42 up 7 days,  4:23,  3 users,  load average: 1.18, 0.71, 0.58
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13161976+total, 10420131+free,  2524472 used, 24893984 buff/cache
KiB Swap:  4194300 total,  4194300 free,        0 used. 11910392+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14779 ceph      20   0 7534376  93324   8848 S   3.3  0.1  16:18.18 rbd-mirror

free -m:
              total        used        free      shared  buff/cache   available
Mem:         128534        2467      101754        9270       24312      116309
Swap:          4095           0        4095
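For reference, figures like the above can be captured with standard tools (this assumes a single rbd-mirror process on the host, so pidof returns one PID):

# One batch iteration of top, limited to the rbd-mirror process.
top -b -n 1 -p "$(pidof rbd-mirror)"

# Resident set size (KiB) of the daemon, straight from procfs.
grep VmRSS /proc/"$(pidof rbd-mirror)"/status

# Host-wide view, including buff/cache, in MiB.
free -m

Note that buff/cache in free -m is the kernel page cache, not memory held by the rbd-mirror process itself; the daemon's own footprint is the RES column in top.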
Image cache is disabled, but the default journal object size also results in high memory usage (up to 128 MB per replicated image in the worst case). I opened an upstream ticket about this roughly a month ago.
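For anyone who needs to bound the journal footprint in the meantime, the journal object size is configurable. A sketch using the upstream journaling options (option names from the Jewel-era rbd configuration reference; the order value of 20, i.e. 1 MB objects, is illustrative, not a tested recommendation):

# Smaller journal objects reduce the per-image worst-case memory held by
# rbd-mirror during journal replay. Apply before enabling journaling on
# an image; existing journals keep the layout they were created with.
cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
    rbd journal order = 20        # object size as a power of two: 2^20 = 1 MB (default 24 = 16 MB)
    rbd journal splay width = 4   # number of active journal objects (default 4)
EOF

Per the rbd help output in Jewel, "rbd journal info --pool <pool> --image <image>" should show the layout an existing image's journal was created with.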
Marking this bug as Verified, as I am not seeing any leak or heavy memory consumption.

ceph version 10.2.2-27.el7cp
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html