Bug 1126831

Summary: Memory leak in GlusterFS client
Product: [Community] GlusterFS
Reporter: Poornima G <pgurusid>
Component: quick-read
Assignee: Soumya Koduri <skoduri>
Status: CLOSED EOL
QA Contact:
Severity: medium
Docs Contact:
Priority: high
Version: 3.7.6
CC: bugs, kkeithle, krzysztof.pawlowski, kwade, markd, monotek23, pgurusid, primeroznl, rgowdapp, sama.bharat, wade.fitzpatrick
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-08 10:51:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Statedump of the process leaking memory (flags: none)

Description Poornima G 2014-08-05 11:51:05 UTC
Created attachment 924189 [details]
Statedump of the Process leaking memory

Description of problem:
A small cluster is exported to Windows clients via SMB through a local mount.
The GlusterFS client process uses a huge amount of memory:
 2464 root      20   0 16.8g  11g 1292 S    0 71.3  33:09.14
/usr/sbin/glusterfs --direct-io-mode=disable --volfile-id=/w-vol
--volfile-server=gl0 /W/Projects
  2212 root      20   0 1491m  25m  844 S    3  0.2  18:37.37
/usr/sbin/glusterfsd -s gl0 --volfile-id w-vol.gl0.mnt-brick1-export -p
/var/lib/glusterd/vols/w-v

The mount process log (W-Projects.log) is full of these errors:

/3.4.4/xlator/performance/quick-read.so(qr_readv+0x62)
[0x7fcb6d5852e2]))) 0-iobuf: invalid argument: iobref
[2014-07-30 17:04:24.082004] E [iobuf.c:733:iobuf_unref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/debug/io-stats.so(io_stats_readv+0x182)
[0x7fcb6d161022]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/md-cache.so(mdc_readv+0x16b)
[0x7fcb6d377d0b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/quick-read.so(qr_readv+0x62)
[0x7fcb6d5852e2]))) 0-iobuf: invalid argument: iobuf
[2014-07-30 17:04:24.082036] E [iobuf.c:828:iobref_unref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/debug/io-stats.so(io_stats_readv+0x182)
[0x7fcb6d161022]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/md-cache.so(mdc_readv+0x16b)
[0x7fcb6d377d0b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/quick-read.so(qr_readv+0x62)
[0x7fcb6d5852e2]))) 0-iobuf: invalid argument: iobref
[2014-07-30 17:04:24.085388] E [iobuf.c:733:iobuf_unref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/debug/io-stats.so(io_stats_readv+0x182)
[0x7fcb6d161022]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/md-cache.so(mdc_readv+0x16b)
[0x7fcb6d377d0b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/quick-read.so(qr_readv+0x62)
[0x7fcb6d5852e2]))) 0-iobuf: invalid argument: iobuf
[2014-07-30 17:04:24.085419] E [iobuf.c:828:iobref_unref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/debug/io-stats.so(io_stats_readv+0x182)
[0x7fcb6d161022]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/md-cache.so(mdc_readv+0x16b)
[0x7fcb6d377d0b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/quick-read.so(qr_readv+0x62)
[0x7fcb6d5852e2]))) 0-iobuf: invalid argument: iobref
[2014-07-30 17:04:24.246769] E [iobuf.c:733:iobuf_unref]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/debug/io-stats.so(io_stats_readv+0x182)
[0x7fcb6d161022]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/md-cache.so(mdc_readv+0x16b)
[0x7fcb6d377d0b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/quick-read.so(qr_readv+0x62)
[0x7fcb6d5852e2]))) 0-iobuf: invalid argument: iobuf


rebalance.log:

[2014-07-06 22:28:56.460558] I [dht-common.c:2614:dht_setxattr]
0-w-vol-dht: fixing the layout of
/_torolni/60_Elements/Prod/T06_APPL/T06_APPL_PACKSHOT/T06_APPL_PACKSHOT_010/co/cp
[2014-07-06 22:28:56.463787] I [dht-common.c:2614:dht_setxattr]
0-w-vol-dht: fixing the layout of
/_torolni/60_Elements/Prod/T06_APPL/T06_APPL_PACKSHOT/T06_APPL_PACKSHOT_010/co/cp/T06_APPL_060-co_cp_v001
[2014-07-06 22:28:56.466934] W [dht-common.c:419:dht_lookup_dir_cbk]
0-w-vol-dht:
/_torolni/60_Elements/Prod/T06_APPL/T06_APPL_PACKSHOT/T06_APPL_PACKSHOT_010/co/cp/T06_APPL_060-co_cp_v001/f_800x800:
gfid different on w-vol-client-1
[2014-07-06 22:28:56.467127] I [dht-common.c:2614:dht_setxattr]
0-w-vol-dht: fixing the layout of
/_torolni/60_Elements/Prod/T06_APPL/T06_APPL_PACKSHOT/T06_APPL_PACKSHOT_010/co/cp/T06_APPL_060-co_cp_v001/f_800x800
[2014-07-06 22:28:56.467553] W
[client-rpc-fops.c:1044:client3_3_setxattr_cbk] 0-w-vol-client-1: remote
operation failed: No such file or directory
[2014-07-06 22:28:56.468787] W
[client-rpc-fops.c:1994:client3_3_setattr_cbk] 0-w-vol-client-1: remote
operation failed: No such file or directory
[2014-07-06 22:28:56.468892] E [dht-selfheal.c:1015:dht_dir_attr_heal]
0-dht: Failed to set uid/gid on
/_torolni/60_Elements/Prod/T06_APPL/T06_APPL_PACKSHOT/T06_APPL_PACKSHOT_010/co/cp/T06_APPL_060-co_cp_v001/f_800x800
on w-vol-client-1 subvol (No such file or directory)
[2014-07-06 22:28:56.468906] W
[client-rpc-fops.c:2541:client3_3_opendir_cbk] 0-w-vol-client-1: remote
operation failed: No such file or directory. Path:
/_torolni/60_Elements/Prod/T06_APPL/T06_APPL_PACKSHOT/T06_APPL_PACKSHOT_010/co/cp/T06_APPL_060-co_cp_v001/f_800x800
(00000000-0000-0000-0000-000000000000)
[2014-07-06 22:28:56.470407] W
[client-rpc-fops.c:2329:client3_3_readdirp_cbk] 0-w-vol-client-1: remote
operation failed: No such file or directory
[2014-07-06 22:28:56.518470] I
[dht-rebalance.c:1729:gf_defrag_status_get] 0-glusterfs: Rebalance is
completed. Time taken is 1350.00 secs
[2014-07-06 22:28:56.518494] I
[dht-rebalance.c:1732:gf_defrag_status_get] 0-glusterfs: Files migrated:
0, size: 0, lookups: 0, failures: 0, skipped: 0
[2014-07-06 22:28:56.527984] W [glusterfsd.c:1002:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f71868543fd]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f7186b27e9a]
(-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x7f7187611e75])))
0-: received signum (15), shutting down



Version-Release number of selected component (if applicable):
3.5.2, also seen in 3.4.4

How reproducible:
No steps to reproduce.

Actual results:
The glusterfs client process leaks memory until it is eventually OOM-killed by the kernel.

Expected results:
No OOM kill; client memory usage should stay bounded.

Additional info:

Comment 1 Poornima G 2014-08-05 11:52:55 UTC
From the statedump it is evident that the iobufs are leaking.
Also, the hot count of pool-name=w-vol-io-cache:rbthash_entry_t is 10053, which implies the io-cache xlator could be the cause of the leak.
From the logs, it looks like the quick-read performance xlator is calling iobuf_free with NULL pointers, which implies quick-read could be leaking iobufs as well.
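
For anyone wanting to reproduce this analysis: a statedump of the fuse client can typically be captured by sending SIGUSR1 to the glusterfs process; the dump directory and file name below are the usual defaults and may differ depending on the build:

$ kill -USR1 <pid-of-glusterfs-client>
$ grep -A3 'pool-name=w-vol-io-cache:rbthash_entry_t' /var/run/gluster/glusterdump.<pid>.dump.*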

As a temporary workaround, could you disable io-cache and/or quick-read and see if the leak still persists?

$ gluster volume set <volname> performance.io-cache off
$ gluster volume set <volname> performance.quick-read off

This may reduce performance to a certain extent.

Regards,
Poornima

I too was experiencing this issue on my bricks. I am using version 3.5.2, and after setting io-cache and quick-read to "off" as suggested, I am no longer seeing glusterfs gobbling memory. I first noticed it when I enabled quotas: during the quota crawl the glusterfs process would be OOM-killed by the kernel. Before, my bricks would consume all available memory until swap was exhausted and the kernel OOM-killed the glusterfs process. There is a rebalance running right now and glusterfs is behaving. Here is some output of my current config. Let me know if I can provide anything else to help.

Comment 2 Justin Clift 2014-09-02 23:17:31 UTC
Raising the priority of this a bit, as it sounds like a memory leak we want to get rid of in the near future. :)

We should also add Tamas Papp to the CC list, as he reported this upstream and so may be able to test potential fixes for us. (I don't seem to be able to add him at the moment.)

Comment 3 Kaleb KEITHLEY 2014-09-11 10:57:27 UTC
I've been running a 4×2 dist/repl cluster with io-cache and quick-read set to off.

In less than 24 hours the RSZ and VSZ of the client-side glusterfs (FUSE bridge) process have grown continuously, and it appears to be leaking memory.

You can see the data at http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longevity/client.out

N.B. this is a new "longevity cluster" running here in the lab inside Red Hat.

Comment 4 Wade Fitzpatrick 2015-10-28 04:52:05 UTC
I think we have hit this bug too in gluster-3.7.5. There is nothing interesting in the client logs.

core@comet ~ $ top
top - 13:07:04 up 1 day, 19:03,  1 user,  load average: 0.30, 0.23, 0.22
Tasks: 160 total,   2 running, 158 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.7 us,  0.7 sy,  0.0 ni, 95.8 id,  0.1 wa,  0.1 hi,  0.6 si,  0.0 st
KiB Mem:  32978184 total, 32746628 used,   231556 free,   105032 buffers
KiB Swap: 16777212 total,   220340 used, 16556872 free.  1002344 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  742 root      20   0 29.179g 0.028t   3404 S  78.8 91.2 412:29.02 glusterfs
  706 root      20   0  810552 598904   4780 S  32.8  1.8  83:50.18 fleetd
  902 root      20   0  574400 402476   8552 R  26.3  1.2 193:03.93 ringcap
26527 root      20   0   18900   4104   1636 S  13.1  0.0   0:00.38 rsync
  755 root      20   0  869676  60408   6644 S   6.6  0.2   4:54.66 node

We have 93 identical client hosts, but only this one (which runs dropboxd on ext4 and rsyncs data to the gluster volume) has exhibited the memory leak. I have set quick-read and io-cache off on the volume, so we will see how it progresses.

# gluster volume info static

Volume Name: static
Type: Striped-Replicate
Volume ID: 3f9f810d-a988-4914-a5ca-5bd7b251a273
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: james:/data/gluster1/static/brick1
Brick2: cupid:/data/gluster1/static/brick2
Brick3: hilton:/data/gluster1/static/brick3
Brick4: present:/data/gluster1/static/brick4
Options Reconfigured:
performance.quick-read: off
performance.io-cache: off
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
changelog.rollover-time: 10
changelog.fsync-interval: 3

Comment 5 Jiffin 2016-03-01 12:40:46 UTC
There are a lot of memory-leak-related bugs due to be fixed in 3.7.9. Can you check again once that release is out?

Comment 6 Raghavendra G 2017-02-14 17:49:14 UTC
(In reply to Poornima G from comment #1)
> From the logs, it looks like the quick-read performance xlator is calling
> iobuf_free with NULL pointers, which implies quick-read could be leaking
> iobufs as well.

Patch [1] adds a check before the unref, so we should no longer see these messages. Also, the mere presence of these logs does not prove that quick-read is leaking iobufs. The reason is:

* quick-read prunes an inode even after its data is cached, and qr_inode_prune deletes the data from the qr_inode.
* qr_readv calls qr_readv_cached if it finds a qr_inode set in the inode ctx. However, this does not guarantee that data is present, as qr_inode_prune might have been called on this inode.
* qr_readv_cached used to (before [1]) unconditionally call iobuf_unref/iobref_unref, which resulted in the log messages. Note that no iobuf/iobref is allocated in this codepath, as there is no data in the qr_inode, hence it is not a leak. (A rough sketch of the kind of guard described for [1] follows.)
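
For illustration only (this is not the literal change from [1]; the helper name and variables are placeholders, assuming the declarations from iobuf.h are in scope):

static void
qr_unref_sketch (struct iobuf *iobuf, struct iobref *iobref)
{
        /* only drop references that were actually acquired; a NULL
         * pointer means qr_readv_cached never obtained cached data */
        if (iobuf)
                iobuf_unref (iobuf);

        if (iobref)
                iobref_unref (iobref);
}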

However, there is one leak of the allocated "content". See the definition of qr_content_update:

void
qr_content_update (xlator_t *this, qr_inode_t *qr_inode, void *data,
                   struct iatt *buf)
{
        qr_private_t      *priv  = NULL;
        qr_inode_table_t  *table = NULL;

        priv  = this->private;
        table = &priv->table;

        LOCK (&table->lock);
        {
                __qr_inode_prune (table, qr_inode);

                qr_inode->data = data;
As can be seen above, qr_inode->data is set unconditionally. So, if there is data already present, "data" will leak. However, qr_lookup won't set GF_CONTENT_KEY if a qr_inode is already set in the inode ctx, so there are not many cases where we find file content in xdata in qr_lookup_cbk (which calls this function). But if there are simultaneous parallel fresh lookups (where qr_lookup happens on the same file), we will get more than one lookup response containing data. This in turn can result in qr_content_update being invoked on a qr_inode which already has non-NULL data, which can result in a leak. Please note that this is a race that can be hit only with more than one parallel fresh lookup on a file, so this leak _MAY_ not amount to much memory.
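
A purely hypothetical sketch of how that overwrite could be guarded (this is not the actual upstream fix; GF_FREE is assumed to be the deallocator matching the cached content buffer):

        LOCK (&table->lock);
        {
                /* if a racing lookup response already populated the cache,
                 * release the older buffer instead of silently leaking it */
                if (qr_inode->data != NULL)
                        GF_FREE (qr_inode->data);

                qr_inode->data = data;
        }
        UNLOCK (&table->lock);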

[1] http://review.gluster.org/10206

Comment 7 Kaushal 2017-03-08 10:51:41 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

Comment 8 Red Hat Bugzilla 2023-09-14 02:45:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days