841617 – after geo-replication start: glusterfs process eats memory until OOM kills it

Bug 841617 - after geo-replication start: glusterfs process eats memory until OOM kills it

Summary: after geo-replication start: glusterfs process eats memory until OOM kills it

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	core
Sub Component:
Version:	3.3.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Raghavendra Bhat
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1060259
TreeView+	depends on / blocked

Reported:	2012-07-19 15:15 UTC by Andreas Kurz
Modified:	2014-04-17 13:12 UTC (History)
CC List:	8 users (show)
Fixed In Version:	glusterfs-3.4.3
Clone Of:
Environment:
Last Closed:	2014-04-17 13:12:13 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
State dump of glusterfs process consumig ~6GB memory (1.34 MB, application/octet-stream) 2012-07-19 15:15 UTC, Andreas Kurz	no flags	Details
glusterfs core dump of "stressed" local mount (9.42 MB, application/octet-stream) 2012-07-26 13:26 UTC, Andreas Kurz	no flags	Details
View All

Description Andreas Kurz 2012-07-19 15:15:41 UTC

Created attachment 599186 [details]
State dump of glusterfs process consumig ~6GB memory

Description of problem:

Once I start geo-replication the corresponding glusterfs process starts eating memory until killed by OOM after some hours.

Version-Release number of selected component (if applicable):

gluster 3.2.6
Centos 6.2 in AWS

How reproducible:

Always

Steps to Reproduce:
1.start geo-replication to a third node
2.do updates on data so gsyncd crawls through directories
3.wait
  
Actual results:

glusterfs process memory consumption increases beyond all limits (like performance.cache-size)

Expected results:

stable memory consumption of glusterfs once cache-size is reached


Additional info:
Bricks are on ecryptfs, sitting on XFS

Current volume configuration:

Volume Name: varema
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: ea-nas-01:/brick01/varema
Brick2: ea-nas-02:/brick01/varema
Brick3: ea-nas-01:/brick02/varema
Brick4: ea-nas-02:/brick02/varema
Brick5: ea-nas-01:/brick03/varema
Brick6: ea-nas-02:/brick03/varema
Brick7: ea-nas-01:/brick04/varema
Brick8: ea-nas-02:/brick04/varema
Options Reconfigured:
performance.io-cache: off
performance.cache-max-file-size: 1073741824
performance.cache-min-file-size: 0
performance.stat-prefetch: off
performance.quick-read: off
performance.cache-refresh-timeout: 1
performance.read-ahead: off
geo-replication.indexing: on
nfs.disable: on
auth.allow: 10.0.2.*,127.*
network.ping-timeout: 10
performance.cache-size: 1073741824

Comment 1 Amar Tumballi 2012-07-20 08:12:01 UTC

Can you help us with getting the statedump (by sending SIGUSR1 (kill -USR1 <pid of glusterfs process which consumes memory>)) and getting /tmp/glusterdump.<pid> file. That will help to pinpoint the reason for leak.


Also, now that we have 3.3.0 release out, is it an option to upgrade? if yes, we recommend you to upgrade. It has multiple fixes for memory leaks etc overall.

Comment 2 Andreas Kurz 2012-07-20 08:25:20 UTC

(In reply to comment #1)
> Can you help us with getting the statedump (by sending SIGUSR1 (kill -USR1
> <pid of glusterfs process which consumes memory>)) and getting
> /tmp/glusterdump.<pid> file. That will help to pinpoint the reason for leak.

The one I already attached is not ok?

> 
> 
> Also, now that we have 3.3.0 release out, is it an option to upgrade? if
> yes, we recommend you to upgrade. It has multiple fixes for memory leaks etc
> overall.

Well, this is a live system now and without good reasons I prefer to not update to a new major release that has already some know problems. Of course if you can identify the problem and definitely say this is fixed in 3.3.0 an update is an option. 

I also wonder if this is esspecially related to geo-replication. Unfortunately there is not a lot of information except source-code about how exactly geo-replication adds that extra load to the system. In the productive system I see nearly permanent 10Mbit traffic between nodes doing lookups and getxattr -- is self-healing triggered by gsyncd crawling?

Comment 3 Amar Tumballi 2012-07-20 10:07:55 UTC

Thanks for the statedump. I had missed mail on attachment and hence asked. That is ok.

Now, the question for you is, how many files are in filesystem (in terms of inodes used, so include directory depth also). It has an impact on geo replication mount because it will traverse through the whole filesystem entries and it will result in creating inode entries in inode table.

Next time you see spike in memory usage, try issuing below command on system and check if it has any impact.

# echo 2 > /proc/sys/vm/drop_caches

Comment 4 Andreas Kurz 2012-07-20 10:40:46 UTC

(In reply to comment #3)
> Thanks for the statedump. I had missed mail on attachment and hence asked.
> That is ok.
> 
> Now, the question for you is, how many files are in filesystem (in terms of
> inodes used, so include directory depth also). It has an impact on geo
> replication mount because it will traverse through the whole filesystem
> entries and it will result in creating inode entries in inode table.

There are 2366605 inodes used, maximum directory depth 21 (for code sources kept on the volume), depth 9 for the active files in use by application servers.

Would you expect that moving that src-files with this high directory depth in an extra volume being a good workaround to solve this situation? ... though these files are not accessed/changed that often, as application data is.

> 
> Next time you see spike in memory usage, try issuing below command on system
> and check if it has any impact.
> 
> # echo 2 > /proc/sys/vm/drop_caches

droping dentries and inodes when glusterfs process consumed more then 9GB freed about 1GB at the first time, then memory consumption continued and dropping caches had much lower impact.

Comment 5 Andreas Kurz 2012-07-26 13:20:09 UTC

(In reply to comment #1)
> Can you help us with getting the statedump (by sending SIGUSR1 (kill -USR1
> <pid of glusterfs process which consumes memory>)) and getting
> /tmp/glusterdump.<pid> file. That will help to pinpoint the reason for leak.
> 
> 
> Also, now that we have 3.3.0 release out, is it an option to upgrade? if
> yes, we recommend you to upgrade. It has multiple fixes for memory leaks etc
> overall.

I just installed Gluster 3.3.0 in our Lab to see if it also eats all memory. Doing a local Gluster mount on the server and starting a simple shell script to create directories and touch empty files. After about 900k inodes uses I get:

mktemp: failed to create directory via template `XXXXXX': Transport endpoint is not connected

and a core dump of a glusterfs prozess. I attach it to this bug.

Comment 6 Andreas Kurz 2012-07-26 13:26:02 UTC

Created attachment 600507 [details]
glusterfs core dump of "stressed" local mount

Comment 7 Amar Tumballi 2012-07-26 17:13:39 UTC

Andreas,

Core doesn't help a lot as we would need same binary bits to get the right information out of it... 

two possible work around and see if it helps:

before starting next mount, do 

----- 
gluster volume set varema quick-read off
gluster volume set varema io-cache off
gluster volume set varema stat-prefetch off
-----

Start the mount process after these commands are complete, and do a volume stop/start. 

Meantime, we are looking into this as mostly in our testbed, the leak is not this obviously hit.

If it is very frequent and if you can help more with the debugging, the best possible way is to do is running below command..

step 0: gluster volume stop varema --mode=script; gluster volume start varema
step 1: "bash# valgrind --leak-check=full --log-file=/tmp/valgrind.gluster-leak.log glusterfs -s <servername> --volfile-id varema /mount/point"

step 2: run your tests ; keep watching memory growth; once reaches above 6-7GB, do umount /mount/point.


That would pin-point the leak with the exact trace..

Comment 8 Andreas Kurz 2012-07-26 21:00:25 UTC

(In reply to comment #7)
> Andreas,
> 
> Core doesn't help a lot as we would need same binary bits to get the right
> information out of it... 
> 
> two possible work around and see if it helps:
> 
> before starting next mount, do 
> 
> ----- 
> gluster volume set varema quick-read off
> gluster volume set varema io-cache off
> gluster volume set varema stat-prefetch off
> -----
> 
> Start the mount process after these commands are complete, and do a volume
> stop/start. 
> 
> Meantime, we are looking into this as mostly in our testbed, the leak is not
> this obviously hit.
> 
> If it is very frequent and if you can help more with the debugging, the best
> possible way is to do is running below command..
> 
> step 0: gluster volume stop varema --mode=script; gluster volume start varema
> step 1: "bash# valgrind --leak-check=full
> --log-file=/tmp/valgrind.gluster-leak.log glusterfs -s <servername>
> --volfile-id varema /mount/point"

I can do this ... but how can this be done for the mount that geo-replication does locally?

> 
> step 2: run your tests ; keep watching memory growth; once reaches above
> 6-7GB, do umount /mount/point.
> 
> 
> That would pin-point the leak with the exact trace..

Comment 9 Amar Tumballi 2012-07-28 08:54:24 UTC

Andreas,

Do 'gluster volume set varema memory-accounting on' and then restart the geo-rep session. When the memory usage is high, take a statedump (kill -SIGUSR1). Don't leave it to OOM kill, because memory-accounting info will be in memory, and will be lost with the process (unlike valgrind, where valgrind itself starts glusterfs process).

As its hard to get the geo-replication's mount process attached to valgrind, the option is the best for now.

Meantime, did you see any improvement by switching off the performance translators?

Comment 10 Andreas Kurz 2012-07-30 06:49:20 UTC

(In reply to comment #9)
> Andreas,
> 
> Do 'gluster volume set varema memory-accounting on' and then restart the
> geo-rep session. When the memory usage is high, take a statedump (kill
> -SIGUSR1). Don't leave it to OOM kill, because memory-accounting info will
> be in memory, and will be lost with the process (unlike valgrind, where
> valgrind itself starts glusterfs process).

I assume this is a 3.3 feature? In the customer system where I see this problem, there is 3.2.6 running and it does not accept this option. I'll try this in the lab, unfortunately on the quite small VMs in the lab where I am testing Gluster, it is extremely unstable and core-dumps regularly on higher load ...  

> 
> As its hard to get the geo-replication's mount process attached to valgrind,
> the option is the best for now.
> 
> Meantime, did you see any improvement by switching off the performance
> translators?

I already tried that with 3.2.6 in the productive customer system without seeing any improvements, before opening this bug-report.

Comment 11 Amar Tumballi 2013-02-04 12:32:53 UTC

Andreas,

Wanted to check if 3.3.1 fixes the issue for you. (We had not seen any OOM kills in our testing inhouse for similar workload).

Also, if its possible to have a dry run of 3.4.0qa builds (now @ qa8), it would be great to evaluate the newer versions.

For now, closing the bug as WORKSFORME, as there are significant patches which went into handle leaks/crashes etc.

Comment 12 Kaleb KEITHLEY 2013-06-11 13:04:02 UTC

reopen, user reports they still get OOM killed during geo-rep with 3.3.1 and 3.3.2qa3

Comment 13 Kaleb KEITHLEY 2013-06-11 13:12:36 UTC

(09:11:39 AM) social__: btw quick reproducer : mount gluster volume somewhere and run on it  sysbench --num-threads=1 --max-requests=0 --init-rng=on --max-time=0 --test=fileio --file-total-size=$((1*10000))M --file-num=10000 --file-test-mode=seqwr run

Comment 14 Lukas Bezdicka 2013-06-11 13:16:53 UTC

Probably same issue but another way around:
[root@vrana:~] gluster volume info
 
Volume Name: kokot
Type: Replicate
Volume ID: 92f0a0cb-d552-4df5-8dac-54e7eb1be5c6
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: havran.int.na.kokot.com:/mnt/kokot
Brick2: vrana.int.na.kokot.com:/mnt/kokot
Options Reconfigured:
nfs.disable: On
diagnostics.client-sys-log-level: WARNING
diagnostics.brick-sys-log-level: WARNING
diagnostics.client-log-level: NONE
diagnostics.brick-log-level: NONE
server.statedump-path: /mnt/tmp

[root@havran:/media/kokot] mount
...
havran.int.na.kokot.com:/kokot on /media/kokot type fuse.glusterfs (rw,allow_other,max_read=131072)

[root@havran:/media/kokot] sysbench --num-threads=1 --max-requests=0 --init-rng=on --max-time=0 --test=fileio --file-total-size=$((1*10000))M --file-num=10000 --file-test-mode=seqwr run
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Initializing random number generator from timer.


Extra file open flags: 0
10000 files, 1Mb each
9.7656Gb total file size
Block size 16Kb
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing sequential write (creation) test
Threads started!
WARNING: Operation time (0.000000) is less than minimal counted value, counting as 1.000000
WARNING: Percentile statistics will be inaccurate
FATAL: Failed to write file! file: 8350 pos: 32768 errno = 103 ()
Done.

Operations performed:  0 Read, 534210 Write, 0 Other = 534210 Total
Read 0b  Written 8.1514Gb  Total transferred 8.1514Gb  (80.566Mb/sec)
 5156.20 Requests/sec executed

Test execution summary:
    total time:                          103.6053s
    total number of events:              534210
    total time taken by event execution: 74.6433
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.14ms
         max:                                177.90ms
         approx.  95 percentile:               0.20ms

Threads fairness:
    events (avg/stddev):           534210.0000/0.00
    execution time (avg/stddev):   74.6433/0.00





The error: FATAL: Failed to write file! file: 8350 pos: 32768 errno = 103 () is caused by: 
Out of memory: Kill process 26409 (glusterfs) score 906 or sacrifice child

Comment 15 Lukas Bezdicka 2013-08-21 15:19:37 UTC

Geo-replication memleak can be seen also in 3.4.0

Comment 16 Lukas Bezdicka 2014-01-29 10:14:38 UTC

==28620== 109,728 bytes in 1,143 blocks are definitely lost in loss record 216 of 232
==28620==    at 0x4A0577B: calloc (vg_replace_malloc.c:593)
==28620==    by 0x319A83B132: __gf_calloc (mem-pool.h:75)
==28620==    by 0x99F7126: cluster_getmarkerattr (libxlator.c:319)
==28620==    by 0x99EC340: dht_getxattr (dht-common.c:2281)
==28620==    by 0x319A81CB6A: default_getxattr (defaults.c:1083)
==28620==    by 0x319A81CB6A: default_getxattr (defaults.c:1083)
==28620==    by 0x319A81CB6A: default_getxattr (defaults.c:1083)
==28620==    by 0x319A81CB6A: default_getxattr (defaults.c:1083)
==28620==    by 0x319A81CB6A: default_getxattr (defaults.c:1083)
==28620==    by 0xA646A8D: mdc_getxattr (md-cache.c:1780)
==28620==    by 0xA851406: io_stats_getxattr (io-stats.c:2216)
==28620==    by 0x648D772: fuse_getxattr_resume (fuse-bridge.c:3194)

Comment 17 Lukas Bezdicka 2014-01-29 10:16:08 UTC

==5908== 387,120 (217,800 direct, 169,320 indirect) bytes in 605 blocks are definitely lost in loss record 304 of 322
==5908==    at 0x4A0577B: calloc (vg_replace_malloc.c:593)
==5908==    by 0x38B9A3B132: __gf_calloc (mem-pool.h:75)
==5908==    by 0x38B9A2B27A: __inode_create (inode.c:531)
==5908==    by 0x38B9A2B662: inode_new (inode.c:562)
==5908==    by 0x6482D85: fuse_resolve_entry (fuse-resolve.c:114)
==5908==    by 0x64831A7: fuse_resolve_parent (fuse-resolve.c:309)
==5908==    by 0x6483487: fuse_resolve (fuse-resolve.c:642)
==5908==    by 0x648374D: fuse_resolve_all (fuse-resolve.c:681)
==5908==    by 0x6483797: fuse_resolve_and_resume (fuse-resolve.c:721)
==5908==    by 0x649A117: fuse_thread_proc (fuse-bridge.c:4607)
==5908==    by 0x3B21E07850: start_thread (pthread_create.c:301)
==5908==    by 0xC8116FF: ???

Comment 18 Anand Avati 2014-01-29 10:22:09 UTC

REVIEW: http://review.gluster.org/6850 (libxlator: fix memleak in cluster_markerxtime_cbk and cluster_markeruuid_cbk) posted (#1) for review on master by Lukáš Bezdička (lukas.bezdicka)

Comment 19 Lukas Bezdicka 2014-01-30 11:32:01 UTC

==418== 635 (264 direct, 371 indirect) bytes in 1 blocks are definitely lost in loss record 194 of 239
==418==    at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==418==    by 0x4C577FC: __gf_default_calloc (mem-pool.h:75)
==418==    by 0x4C57C16: __gf_calloc (mem-pool.c:104)
==418==    by 0x4C41140: __inode_create (inode.c:531)
==418==    by 0x4C4122A: inode_new (inode.c:562)
==418==    by 0x6B5EA32: fuse_mkdir_resume (fuse-bridge.c:1450)
==418==    by 0x6B55D07: fuse_resolve_done (fuse-resolve.c:663)
==418==    by 0x6B55DDD: fuse_resolve_all (fuse-resolve.c:692)
==418==    by 0x6B55CD0: fuse_resolve (fuse-resolve.c:649)
==418==    by 0x6B55DB4: fuse_resolve_all (fuse-resolve.c:688)
==418==    by 0x6B55E3B: fuse_resolve_continue (fuse-resolve.c:708)
==418==    by 0x6B53FED: fuse_resolve_entry_cbk (fuse-resolve.c:97)
==418==    by 0xA2D4A7F: io_stats_lookup_cbk (io-stats.c:1479)
==418==    by 0xA0BCF0E: mdc_lookup_cbk (md-cache.c:780)
==418==    by 0x9CA96B0: qr_lookup_cbk (quick-read.c:438)
==418==    by 0x9A9597D: ioc_lookup_cbk (io-cache.c:255)
==418==    by 0x943A551: dht_lookup_everywhere_done (dht-common.c:805)
==418==    by 0x943B8FD: dht_lookup_everywhere_cbk (dht-common.c:1026)
==418==    by 0x91F6454: client3_3_lookup_cbk (client-rpc-fops.c:2630)
==418==    by 0x4EB5686: rpc_clnt_handle_reply (rpc-clnt.c:771)
==418==    by 0x4EB5A05: rpc_clnt_notify (rpc-clnt.c:891)
==418==    by 0x4EB2172: rpc_transport_notify (rpc-transport.c:497)
==418==    by 0x858E089: socket_event_poll_in (socket.c:2118)
==418==    by 0x858E51F: socket_event_handler (socket.c:2230)
==418==    by 0x4C83DFD: event_dispatch_epoll_handler (event-epoll.c:384)
==418==    by 0x4C83FE9: event_dispatch_epoll (event-epoll.c:445)
==418==    by 0x4C57694: event_dispatch (event.c:113)
==418==    by 0x4086DE: main (glusterfsd.c:1934)

Comment 20 Lukas Bezdicka 2014-01-30 12:17:47 UTC

==6400== 1,115,896 (627,840 direct, 488,056 indirect) bytes in 2,180 blocks are definitely lost in loss record 249 of 255
==6400==    at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6400==    by 0x4C577FC: __gf_default_calloc (mem-pool.h:75)
==6400==    by 0x4C57C16: __gf_calloc (mem-pool.c:104)
==6400==    by 0x4C41140: __inode_create (inode.c:531)
==6400==    by 0x4C4122A: inode_new (inode.c:562)
==6400==    by 0x6B54093: fuse_resolve_entry (fuse-resolve.c:114)
==6400==    by 0x6B551B7: fuse_resolve_parent (fuse-resolve.c:309)
==6400==    by 0x6B55CA0: fuse_resolve (fuse-resolve.c:642)
==6400==    by 0x6B55D5F: fuse_resolve_all (fuse-resolve.c:681)
==6400==    by 0x6B55E79: fuse_resolve_and_resume (fuse-resolve.c:721)
==6400==    by 0x6B5816D: fuse_lookup (fuse-bridge.c:573)
==6400==    by 0x6B72696: fuse_thread_proc (fuse-bridge.c:4607)
==6400==    by 0x3375207F32: start_thread (pthread_create.c:309)
==6400==    by 0x3374AF4EAC: clone (clone.S:111)


==6400== 10,394 bytes in 2,187 blocks are definitely lost in loss record 210 of 255
==6400==    at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6400==    by 0x4C577FC: __gf_default_calloc (mem-pool.h:75)
==6400==    by 0x4C57C16: __gf_calloc (mem-pool.c:104)
==6400==    by 0x4C3FF63: gf_strdup (mem-pool.h:140)
==6400==    by 0x4C40F80: __dentry_create (inode.c:489)
==6400==    by 0x4C41B26: __inode_link (inode.c:852)
==6400==    by 0x4C41C9C: inode_link (inode.c:892)
==6400==    by 0x6B53FBE: fuse_resolve_entry_cbk (fuse-resolve.c:89)
==6400==    by 0xA563A7F: io_stats_lookup_cbk (io-stats.c:1479)
==6400==    by 0xA34BF0E: mdc_lookup_cbk (md-cache.c:780)
==6400==    by 0x9F386B0: qr_lookup_cbk (quick-read.c:438)
==6400==    by 0x9D2497D: ioc_lookup_cbk (io-cache.c:255)
==6400==    by 0x96C7E56: dht_lookup_dir_cbk (dht-common.c:498)
==6400==    by 0x947E593: afr_lookup_done (afr-common.c:1977)
==6400==    by 0x947F144: afr_lookup_cbk (afr-common.c:2208)
==6400==    by 0x91F6454: client3_3_lookup_cbk (client-rpc-fops.c:2630)
==6400==    by 0x4EB5686: rpc_clnt_handle_reply (rpc-clnt.c:771)
==6400==    by 0x4EB5A05: rpc_clnt_notify (rpc-clnt.c:891)
==6400==    by 0x4EB2172: rpc_transport_notify (rpc-transport.c:497)
==6400==    by 0x858E089: socket_event_poll_in (socket.c:2118)
==6400==    by 0x858E51F: socket_event_handler (socket.c:2230)
==6400==    by 0x4C83DFD: event_dispatch_epoll_handler (event-epoll.c:384)
==6400==    by 0x4C83FE9: event_dispatch_epoll (event-epoll.c:445)
==6400==    by 0x4C57694: event_dispatch (event.c:113)
==6400==    by 0x4086DE: main (glusterfsd.c:1934)

Comment 21 Lukas Bezdicka 2014-01-30 14:53:48 UTC

==13552== 18,224 (7,056 direct, 11,168 indirect) bytes in 49 blocks are definitely lost in loss record 269 of 288
==13552==    at 0x4A081D4: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==13552==    by 0x4C4E655: __gf_calloc (mem-pool.h:75)
==13552==    by 0x4C3E59D: __inode_create (inode.c:531)
==13552==    by 0x4C3EB9A: inode_new (inode.c:562)
==13552==    by 0x6B527F4: fuse_create_resume (fuse-bridge.c:1942)
==13552==    by 0x6B48ED5: fuse_resolve_all (fuse-resolve.c:663)
==13552==    by 0x6B48C17: fuse_resolve (fuse-resolve.c:649)
==13552==    by 0x6B48F1D: fuse_resolve_all (fuse-resolve.c:688)
==13552==    by 0x6B483A2: fuse_resolve_continue (fuse-resolve.c:708)
==13552==    by 0x6B4860E: fuse_resolve_entry_cbk (fuse-resolve.c:97)
==13552==    by 0x98DBAC0: io_stats_lookup_cbk (io-stats.c:1505)
==13552==    by 0x96A224B: dht_lookup_everywhere_done (dht-common.c:817)

Comment 22 Lukas Bezdicka 2014-01-30 15:11:00 UTC

ignore all the inode_new leaks, they clean up by "echo 3 > /proc/sys/vm/drop_caches" so I was chasing my tail with that. only leak is the one with patch sent.

Comment 23 Anand Avati 2014-01-30 15:23:20 UTC

REVIEW: http://review.gluster.org/6872 (libxlator: fix memleak in cluster_markerxtime_cbk and cluster_markeruuid_cbk) posted (#1) for review on release-3.5 by Lukáš Bezdička (lukas.bezdicka)

Comment 24 Anand Avati 2014-01-30 15:24:38 UTC

REVIEW: http://review.gluster.org/6873 (libxlator: fix memleak in cluster_markerxtime_cbk and cluster_markeruuid_cbk) posted (#1) for review on release-3.4 by Lukáš Bezdička (lukas.bezdicka)

Comment 25 Anand Avati 2014-02-04 19:29:42 UTC

COMMIT: http://review.gluster.org/6850 committed in master by Anand Avati (avati) 
------
commit 2f8a8cfc27e6a73ff0238681ea40563c8768d4e6
Author: Lukas Bezdicka <lukas.bezdicka>
Date:   Wed Jan 29 11:20:08 2014 +0100

    libxlator: fix memleak in cluster_markerxtime_cbk and cluster_markeruuid_cbk
    
    Change-Id: Ide3574bf9f661b077f707990d47821feda18b1a3
    BUG: 841617
    Reviewed-on: http://review.gluster.org/6850
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 26 Anand Avati 2014-02-07 16:53:49 UTC

COMMIT: http://review.gluster.org/6872 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 358b88087051d3b2534446c62f6bbac472e9198b
Author: Lukas Bezdicka <lukas.bezdicka>
Date:   Wed Jan 29 11:20:08 2014 +0100

    libxlator: fix memleak in cluster_markerxtime_cbk and cluster_markeruuid_cbk
    
    Change-Id: Ide3574bf9f661b077f707990d47821feda18b1a3
    BUG: 841617
    Reviewed-on: http://review.gluster.org/6872
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 27 Anand Avati 2014-02-13 13:35:17 UTC

REVIEW: http://review.gluster.org/6873 (libxlator: fix memleak in cluster_markerxtime_cbk and cluster_markeruuid_cbk) posted (#2) for review on release-3.4 by Lukáš Bezdička (lukas.bezdicka)

Comment 28 Anand Avati 2014-02-20 18:06:12 UTC

COMMIT: http://review.gluster.org/6873 committed in release-3.4 by Kaleb KEITHLEY (kkeithle) 
------
commit e8fca12ee9cc51ac8a4265454753a3c27dae93f8
Author: Lukas Bezdicka <lukas.bezdicka>
Date:   Wed Jan 29 11:20:08 2014 +0100

    libxlator: fix memleak in cluster_markerxtime_cbk and cluster_markeruuid_cbk
    
    Change-Id: Ide3574bf9f661b077f707990d47821feda18b1a3
    BUG: 841617
    Reviewed-on: http://review.gluster.org/6873
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 29 Lukas Bezdicka 2014-02-21 09:39:39 UTC

After applying patch to production we no longer leak and it's running smoothly.

Comment 30 Niels de Vos 2014-04-17 13:12:13 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.4.3, please reopen this bug report.

glusterfs-3.4.3 has been announced on the Gluster Developers mailinglist [1], packages for several distributions should already be or become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

The fix for this bug likely to be included in all future GlusterFS releases i.e. release > 3.4.3. In the same line the recent release i.e. glusterfs-3.5.0 [3] likely to have the fix. You can verify this by reading the comments in this bug report and checking for comments mentioning "committed in release-3.5".

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137

Note You need to log in before you can comment on or make changes to this bug.