Bug 1657743

Summary: Very high memory usage (25GB) on Gluster FUSE mountpoint
Product: [Community] GlusterFS
Reporter: ryan
Component: fuse
Assignee: Sunny Kumar <sunkumar>
Status: CLOSED INSUFFICIENT_DATA
Severity: urgent
Priority: urgent
Version: 3.12
CC: bugs, gdeschner, jahernan, pasik, pkarampu, sunkumar
Hardware: x86_64
OS: Linux
Type: Bug
Mount Type: fuse
Last Closed: 2020-02-10 08:59:43 UTC
Attachments:
- PMAP for Gluster FUSE process
- Gluster dump of thread with high ram usage
- Gluster dump of thread with high ram usage - 10GB

Description ryan 2018-12-10 11:07:49 UTC
Description of problem:
Very high memory usage (25GB) on a Gluster FUSE mountpoint.
The process has been running for around 60 hours.
We're seeing this issue on multiple nodes and on multiple clusters.

Previously we had been using the glusterfs VFS module (vfs_glusterfs) for Samba; however, due to memory issues we moved to FUSE. More about the VFS issues can be found here:
- https://bugzilla.samba.org/show_bug.cgi?id=13694
- https://access.redhat.com/solutions/2969381

Version-Release number of selected component (if applicable):
3.12.14

How reproducible:
Occurs very frequently.
The application using the cluster is a DB-based video file ingest and transcoding system with multiple worker nodes.

Steps to Reproduce:
1. Mount the gluster volume via a FUSE mountpoint
2. Share the FUSE mountpoint via SMB (see the example below)
3. Wait for memory usage to rise steadily
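
For illustration, a minimal version of this setup looks roughly like the following; the hostname, share name and mountpoint are placeholders rather than our exact configuration:

  # mount the volume via the FUSE client
  mount -t glusterfs node-wip01:/mcv01 /mnt/mcv01

  # smb.conf share exporting that mountpoint
  [mcv01]
      path = /mnt/mcv01
      read only = no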

Actual results:
The process slowly uses more memory until all system memory is consumed; dmesg reports that the process was killed due to high memory usage (OOM).

Expected results:
Memory usage is consistent and does not consume all system memory


Additional info:
Statedumps for all bricks show around 300-400MB of usage per-brick process.
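
For reference, the brick statedumps were generated with the usual CLI, roughly as follows (the dumps land under /var/run/gluster/ on each brick node by default):

  gluster volume statedump mcv01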

Gluster volume info output:
Volume Name: mcv01
Type: Distribute
Volume ID: aa451513-0d2e-4216-97c2-966dc6ca8b1d
Status: Started
Snapshot Count: 0
Number of Bricks: 15
Transport-type: tcp
Bricks:
Brick1: node-wip01:/mnt/h1a/data
Brick2: node-wip02:/mnt/h1a/data
Brick3: node-wip03:/mnt/h1a/data
Brick4: node-wip01:/mnt/h2a/data
Brick5: node-wip02:/mnt/h2a/data
Brick6: node-wip03:/mnt/h2a/data
Brick7: node-wip01:/mnt/h3a/data
Brick8: node-wip02:/mnt/h3a/data
Brick9: node-wip03:/mnt/h3a/data
Brick10: node-wip01:/mnt/h4a/data
Brick11: node-wip02:/mnt/h4a/data
Brick12: node-wip03:/mnt/h4a/data
Brick13: node-wip01:/mnt/h5a/data
Brick14: node-wip02:/mnt/h5a/data
Brick15: node-wip03:/mnt/h5a/data
Options Reconfigured:
performance.parallel-readdir: on
performance.nl-cache: on
performance.nl-cache-timeout: 600
cluster.lookup-optimize: off
performance.client-io-threads: on
client.event-threads: 4
server.event-threads: 4
storage.batch-fsync-delay-usec: 0
performance.write-behind-window-size: 1MB
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.stat-prefetch: on
performance.cache-size: 100MB
performance.io-thread-count: 32
server.allow-insecure: on
transport.address-family: inet
nfs.disable: on

Comment 1 ryan 2018-12-10 11:10:54 UTC
Created attachment 1513027 [details]
PMAP for Gluster FUSE process

Comment 2 ryan 2018-12-11 09:42:09 UTC
We're getting to the point where the FUSE mount is being killed off every day due to OOM.
I've tried dropping the caches; however, the process is still using 26GB of RAM.
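
(For reference, the caches were dropped with something along these lines:

  sync
  echo 3 > /proc/sys/vm/drop_caches

so the usage appears to be resident in the glusterfs process itself rather than in the kernel page cache.)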

I will attach statedumps shortly.

Many thanks,
Ryan

Comment 3 Amar Tumballi 2018-12-11 13:16:36 UTC
Can you try 'gluster volume set mcv01 readdir-ahead off' and see if anything changes?

Comment 4 ryan 2018-12-11 16:54:16 UTC
Hi Amar,

I have made this change now and will update the ticket within 24 hours to let you know what sort of memory usage we're seeing.


Best,
Ryan

Comment 5 ryan 2018-12-12 16:44:41 UTC
Hi Amar,

Unfortunately the memory usage has increased to 32GB for that particular mountpoint.
There seems to be a pattern of high memory usage when databases are scanning/using the shares on that mount; not sure if that helps shed some light?

Please let me know what data you need to assist.
Many thanks,
Ryan

Comment 6 Amar Tumballi 2018-12-13 05:13:36 UTC
Thanks for the update!

Another way to look at the leak is via statedumps.


Send 'kill -USR1 <glusterfs PID>' to the client process at the beginning, and collect a statedump again when the leak is high.

The statedump files are located at /var/run/gluster/glusterdump.$pid.$timestamp.
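
For example, assuming a single glusterfs client process for the mcv01 mount (adjust the pgrep pattern to your setup):

  # find the FUSE client PID
  PID=$(pgrep -f 'glusterfs.*mcv01')
  # baseline statedump right away
  kill -USR1 "$PID"
  # ...and another once the memory usage is high
  kill -USR1 "$PID"
  ls -l /var/run/gluster/glusterdump."$PID".*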

Comment 7 ryan 2018-12-13 10:52:26 UTC
Created attachment 1513980 [details]
Gluster dump of thread with high ram usage

Comment 8 ryan 2018-12-13 10:54:23 UTC
Hi Amar,

I've posted the glusterdump you requested. This node had only been handling Samba requests for around an hour, so the usage isn't as high as it usually is.
The process that I initially posted about has been killed due to OOM. I will post another glusterdump later today when the usage is a bit higher, for comparison.
Apologies, I've had to redact file paths from this dump.

Many thanks,
Ryan

Comment 9 ryan 2018-12-13 15:05:34 UTC
Created attachment 1514091 [details]
Gluster dump of thread with high ram usage - 10GB

Around 4-5 hours have passed and the process is now at ~10GB of memory usage. For some reason there are no file paths in this dump.

Comment 10 ryan 2019-07-03 16:17:46 UTC
Hi All,

We're still seeing this issue.
Can we provide any more data to assist in resolving the FUSE mem leak?

Best,
Ryan

Comment 11 Nithya Balachandran 2019-07-04 05:51:29 UTC
(In reply to ryan from comment #10)
> Hi All,
> 
> We're still seeing this issue.
> Can we provide any more data to assist in resolving the FUSE mem leak?
> 
> Best,
> Ryan

Hi Ryan,


Do you see any errors in the fuse mount log? Would you be able to send it across?
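
(The client log is normally under /var/log/glusterfs/, named after the mount path, e.g. assuming the volume is mounted at /mnt/mcv01:

  less /var/log/glusterfs/mnt-mcv01.log

The exact path is an assumption on my part.)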

There have been some memory leak fixes in this code path:

https://review.gluster.org/#/c/glusterfs/+/20855/2/xlators/protocol/client/src/client-helpers.c
https://review.gluster.org/#/c/glusterfs/+/22741/12/xlators/protocol/client/src/client-helpers.c

These would fix any cleanup issues in case of errors.

Regards,
Nithya

Comment 12 ryan 2019-07-04 11:10:02 UTC
Hi Nithya,

Yes, there do seem to be a fair number of errors on the mount.
[2019-07-01 09:23:38.178838] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 2-mcv01-client-1: server 172.30.30.2:49157 has not responded in the last 5 seconds, disconnecting.
[2019-07-01 09:23:38.178931] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 2-mcv01-client-2: server 172.30.30.3:49155 has not responded in the last 5 seconds, disconnecting.
[2019-07-01 09:23:38.179780] E [rpc-clnt.c:348:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f878a0ff2eb] (--> /lib64/libgfrpc.so.0(+0xcd6e)[0x7f8789ec8d6e] (--> /lib64/libgfrpc.so.0(+0xce8e)[0x7f8789ec8e8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8d)[0x7f8789eca4dd] (--> /lib64/libgfrpc.so.0(+0xf048)[0x7f8789ecb048] ))))) 2-mcv01-client-1: forced unwinding frame type(GlusterFS 4.x v1) op(OPENDIR(20)) called at 2019-07-01 09:23:32.334731 (xid=0x267b3d0f)
[2019-07-01 09:23:38.179809] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard (b58b070a-0915-4950-ac7d-06ba95d2f098) [Transport endpoint is not connected]
[2019-07-01 09:23:38.179853] E [rpc-clnt.c:348:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f878a0ff2eb] (--> /lib64/libgfrpc.so.0(+0xcd6e)[0x7f8789ec8d6e] (--> /lib64/libgfrpc.so.0(+0xce8e)[0x7f8789ec8e8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8d)[0x7f8789eca4dd] (--> /lib64/libgfrpc.so.0(+0xf048)[0x7f8789ecb048] ))))) 2-mcv01-client-2: forced unwinding frame type(GlusterFS 4.x v1) op(OPENDIR(20)) called at 2019-07-01 09:23:32.334756 (xid=0x26fbbfee)
[2019-07-01 09:23:38.179886] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-2: remote operation failed. Path: /data/shard (b58b070a-0915-4950-ac7d-06ba95d2f098) [Transport endpoint is not connected]
[2019-07-01 09:23:38.179970] E [rpc-clnt.c:348:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f878a0ff2eb] (--> /lib64/libgfrpc.so.0(+0xcd6e)[0x7f8789ec8d6e] (--> /lib64/libgfrpc.so.0(+0xce8e)[0x7f8789ec8e8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8d)[0x7f8789eca4dd] (--> /lib64/libgfrpc.so.0(+0xf048)[0x7f8789ecb048] ))))) 2-mcv01-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-07-01 09:23:32.334740 (xid=0x267b3d10)
[2019-07-01 09:23:38.180114] E [rpc-clnt.c:348:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f878a0ff2eb] (--> /lib64/libgfrpc.so.0(+0xcd6e)[0x7f8789ec8d6e] (--> /lib64/libgfrpc.so.0(+0xce8e)[0x7f8789ec8e8e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8d)[0x7f8789eca4dd] (--> /lib64/libgfrpc.so.0(+0xf048)[0x7f8789ecb048] ))))) 2-mcv01-client-2: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-07-01 09:23:32.334764 (xid=0x26fbbfef)
[2019-06-27 19:16:41.981791] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/190130_HSBA-S061_Customer Films/Media/Media/30012019_Recordings (3d6f3e6a-16ee-452e-8d88-7051262c7f96) [Transport endpoint is not connected]
[2019-06-27 19:16:41.992288] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180703_HSBA-A456_John Flint Year in the Life_SM/Media/6. 1Q18 Results Call (9afd325f-952d-41ce-b3eb-f648e1b39ecf) [Transport endpoint is not connected]
[2019-06-27 19:16:42.004034] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/190313_HSB1-a487_The Edge GLC Films_jp/Media/Raw FIles/5279P HSBC GLC Films Day01 FS7 Card01/.Spotlight-V100/Store-V2/0B66EF32-49B7-476E-A0F4-6481613ECFA4/journals.live (b020e503-c4bc-4821-b75b-1fb37a2e3b0b) [Transport endpoint is not connected]
[2019-06-27 19:16:42.015580] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV Rushes/VANCOUVER/EXPORT (c8083b1d-d4e2-4a81-8fb4-dd76c92902e7) [Transport endpoint is not connected]
[2019-06-27 19:16:42.025179] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180703_HSBA-A456_John Flint Year in the Life_SM/Media/5. AGM 2018 (Jack Morton)/Main Cam (662050a4-505f-438c-8375-e7dfced3ff47) [Transport endpoint is not connected]
[2019-06-27 19:16:42.034689] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV Rushes/SHANGHAI/DCIM/100EOS5D (eb6d15e3-ae6f-4204-88b5-edfbea106d35) [Transport endpoint is not connected]
[2019-06-27 19:16:42.280538] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180703_HSBA-A456_John Flint Year in the Life_SM/Media/18_Extra clips - 071218/YITL L42 ALL SHOTS/JPEGS (cb59fca5-70d0-4f1a-9bf2-ec2f7c2dfc7d) [Transport endpoint is not connected]
[2019-06-27 19:16:42.502718] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV Rushes/SINGAPORE/DCIM/100EOS5D (471c59c9-8b18-4c71-99d3-607a469eaebd) [Transport endpoint is not connected]
[2019-06-27 19:16:42.626356] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV Rushes/SYDNEY/HSBC_SYDENY_OCT_2015/5D (060b7bd7-1f18-48bd-97f6-8b6b7f61b3af) [Transport endpoint is not connected]
[2019-06-27 19:16:42.648426] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180813_HSBA-S019_HSBC_NOW_GV Archive_SP/Media/OPOS GVs/GV Rushes/NEW YORK/HSBC_Broll/Card 4/CONTENTS (9044fb07-3895-4886-894f-2269a9d1db56) [Transport endpoint is not connected]
[2019-06-27 19:16:42.661578] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 2-mcv01-client-1: remote operation failed. Path: /data/shard/CLIENT WORK/HSBC/180703_HSBA-A456_John Flint Year in the Life_SM/Media/18_Extra clips - 071218/YITL L42 ALL SHOTS/DCIM/100EOS5D (7a0f5383-cb22-4a52-aa0a-19f004c29ee7) [Transport endpoint is not connected]

We haven't had any issues reported with the cluster, and when looking into the transport endpoint errors, I can't see any disconnects. I've grepped through the brick logs for events at the time of some of the remote operation failures, and the only thing I've found is:
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.265374] I [addr.c:55:compare_addr_and_update] 0-/mnt/ctdb/data: allowed = "*", received addr = "172.30.30.1"
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.265405] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 2af5fd84-b825-4df9-af5d-4c565930aec9
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.265417] I [MSGID: 115029] [server-handshake.c:495:server_setvolume] 0-ctdbv01-server: accepted client from CTX_ID:2b9c50e2-60d2-4973-832f-2e8eea597c83-GRAPH_ID:0-PID:176234-HOST:mc-ldn-mcn01-PC_NAME:ctdbv01-client-4-RECON_NO:-0 (version: 4.1.8)
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.287664] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-ctdbv01-server: disconnecting connection from CTX_ID:2b9c50e2-60d2-4973-832f-2e8eea597c83-GRAPH_ID:0-PID:176234-HOST:mc-ldn-mcn01-PC_NAME:ctdbv01-client-4-RECON_NO:-0
mnt-ctdb-data.log-20190630:[2019-06-27 19:16:32.287768] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-ctdbv01-server: Shutting down connection CTX_ID:2b9c50e2-60d2-4973-832f-2e8eea597c83-GRAPH_ID:0-PID:176234-HOST:mc-ldn-mcn01-PC_NAME:ctdbv01-client-4-RECON_NO:-0
mnt-h1a-data.log-20190630:[2019-06-27 19:16:31.741300] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x24c920bf, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (tcp.mcv01-server)

What version of Gluster are these code improvements available in? We're testing 6.3 at the moment.

Many thanks for your help,
Ryan

Comment 13 Nithya Balachandran 2019-07-10 06:26:02 UTC
Hi Ryan,


[2019-07-01 09:23:38.178838] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 2-mcv01-client-1: server 172.30.30.2:49157 has not responded in the last 5 seconds, disconnecting.

I think the ping timer is set way too low here. The default is 42 seconds.

Can you provide the gluster volume info for the volume?


As to the errors, if the issue is what I think it is, I would expect to see "failed to unserialize xattr dict" errors in the client log. Do you see any such errors?
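
If you do decide to go back to the default, that would be something along the lines of:

  gluster volume set mcv01 network.ping-timeout 42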

Comment 14 ryan 2019-07-10 07:05:31 UTC
Hi Nithya,

We set the value of the ping timer lower to reduce the delay in failover when a node or storage device fails; normally we only set it on clusters that are distribute-replicate, but it seems we've set it on a plain distribute volume too.
Am I misunderstanding the role of that option? Would you be able to quickly explain its usage?

I've grep'd all gluster logs for that string and cannot find any occurrences of it. I've also just grep'd for the string 'failed' and cannot find "failed to unserialize xattr dict" in the mount point logs.

Here's the gluster vol info you requested:
Volume Name: mcv01
Type: Distribute
Volume ID: 66d50a1a-7c87-4712-8d7b-bddb19d76498
Status: Started
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: node01:/mnt/h1a/data
Brick2: node02:/mnt/h1a/data
Brick3: node03:/mnt/h1a/data
Options Reconfigured:
server.outstanding-rpc-limit: 128
performance.readdir-ahead: off
cluster.rebal-throttle: lazy
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
transport.address-family: inet
auth.allow: 172.30.30.*
performance.client-io-threads: on
performance.write-behind-window-size: 1MB
performance.nl-cache-timeout: 600
performance.nl-cache: on
performance.io-thread-count: 16
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.stat-prefetch: on
performance.cache-size: 1000MB
storage.batch-fsync-delay-usec: 0
network.ping-timeout: 5
performance.md-cache-statfs: off
client.event-threads: 8
server.event-threads: 8
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR

Many thanks,
Ryan

Comment 15 Nithya Balachandran 2019-07-10 07:40:26 UTC
(In reply to ryan from comment #14)
> Hi Nithya,
> 
> We set the value of the ping timer lower to reduce the delay in failover
> when a node or storage device fails; normally we only set it on clusters
> that are distribute-replicate, but it seems we've set it on a plain
> distribute volume too.
> Am I misunderstanding the role of that option? Would you be able to quickly
> explain its usage?


I found an old email thread that explains this in detail. Please go through https://lists.gluster.org/pipermail/gluster-users/2017-December/033123.html



> 
> I've grep'd all gluster logs for that string and cannot find any occurrences
> of it. I've also just grep'd for the string 'failed' and cannot find "failed
> to unserialize xattr dict" in the mount point logs.


That is probably because these messages are logged at Warning level, but the client log level is set to ERROR.
Can you set it to WARNING, try to reproduce the issue, and search the logs again?
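
That would be something like:

  gluster volume set mcv01 diagnostics.client-log-level WARNING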


Any information that would help us reproduce this in-house would be appreciated. It is difficult to debug memory leaks otherwise.


> 
> Here's the gluster vol info you requested:
> Volume Name: mcv01
> Type: Distribute
> Volume ID: 66d50a1a-7c87-4712-8d7b-bddb19d76498
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: node01:/mnt/h1a/data
> Brick2: node02:/mnt/h1a/data
> Brick3: node03:/mnt/h1a/data
> Options Reconfigured:
> server.outstanding-rpc-limit: 128
> performance.readdir-ahead: off
> cluster.rebal-throttle: lazy
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 172.30.30.*
> performance.client-io-threads: on
> performance.write-behind-window-size: 1MB
> performance.nl-cache-timeout: 600
> performance.nl-cache: on
> performance.io-thread-count: 16
> performance.md-cache-timeout: 600
> performance.cache-samba-metadata: on
> performance.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> performance.stat-prefetch: on
> performance.cache-size: 1000MB
> storage.batch-fsync-delay-usec: 0
> network.ping-timeout: 5
> performance.md-cache-statfs: off
> client.event-threads: 8
> server.event-threads: 8
> diagnostics.client-log-level: ERROR
> diagnostics.brick-log-level: ERROR
> 
> Many thanks,
> Ryan

Comment 16 ryan 2019-07-10 08:00:22 UTC
Thanks for the link; I believe I may have read this previously and decided to set a lower ping timeout based on it. I've increased it to 20s now.

I've set the logging level to warning and will monitor the system in use to see if the issue re-occurs and prints anything of interest to the log.

The only thing I've found so far is the following error:
[2019-07-09 11:26:43.314018] E [MSGID: 101046] [dht-common.c:1895:dht_revalidate_cbk] 3-mcv01-dht: dict is null

However, I think I raised a bug for this before and that has been addressed as a separate issue.

Unfortunately we're still unsure how this issue is reproduced. The best I can do at the moment is explain how the FUSE mountpoint is used:
Most of the users access the volume via the vfs_glusterfs module in Samba; however, we had many issues getting OS X machines to work with SMB, mainly due to bugs between the fruit, streams_xattr and glusterfs VFS modules. As a result, we run Netatalk (a program that presents the mountpoint over the AFP protocol) to improve stability for OS X users. The users are mainly video editors accessing large files. The workload access pattern tends to be 80% sequential and around 80% read, with the remainder being random read/write IO for metadata, sidecar files, etc. The memory build-up occurs slowly over time, which makes testing for the bug significantly harder.

I can provide statedumps or anything else that may be able to assist.

I'll follow up by the end of today with an update on the logs.

Best,
Ryan

Comment 17 ryan 2019-07-10 15:47:45 UTC
No new entries in the log at the moment; I'll check again in a week or so.

Best,
Ryan

Comment 18 Nithya Balachandran 2019-07-11 05:16:35 UTC
Hi Ryan,

As the volume seems to have changed from that in the description (3 bricks instead of 15), I think it best we start over. Can you get us the following:

1. A statedump of the client process immediately after the volume is mounted (this will be the baseline).
2. Statedumps at intervals while the client is being used (see the sketch after this list).
3. Is there any point at which the client mount is idle (no operations happening)? If yes, a statedump at that time as well, to get a clearer picture of the memory usage when there are no fops in progress.
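
A rough way to collect these (sketch only; it assumes the client PID does not change between dumps):

  PID=$(pgrep -f 'glusterfs.*mcv01')
  kill -USR1 "$PID"                                # 1. baseline, right after mount
  while sleep 3600; do kill -USR1 "$PID"; done &   # 2. hourly dumps while in use
  # 3. send one more 'kill -USR1 "$PID"' during an idle window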


The earlier statedumps indicated an increasing number of gf_common_mt_memdup allocations, which is what I thought might be contributing to the leak. However, they could also mean that the client was doing a lot of file listings at the time. If this is the leak, the two patches posted in an earlier comment should fix it.
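
(For reference, what I am comparing between dumps is essentially the allocation counters for that memory type, roughly:

  grep -A5 'gf_common_mt_memdup' glusterdump.<pid>.<timestamp> | grep num_allocs

where the dump filename is whatever your client produced.)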

As you also have client-io-threads enabled, another possibility is that you might be running into https://review.gluster.org/#/c/glusterfs/+/22918.
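
If you want to rule that one out, client-io-threads can be turned off temporarily with something like:

  gluster volume set mcv01 performance.client-io-threads off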

Comment 19 ryan 2019-07-11 08:23:59 UTC
Hi Nithya,

Unfortunately the only place I can see this is a live customer system, so I won't be able to remount the volume without booking in downtime.
We're planning to upgrade to 6.3 once we've tested it and ensured some of the bugs are ironed out. Currently in testing with 6.3 we have not been able to see this issue.

I can get statedumps of a node in the cluster which doesn't show the memory issue and then a statedump of a node which does have the issue if that would help?
I'll get a statedump at night when there is very little IO.

Best,
Ryan

Comment 20 Nithya Balachandran 2019-07-12 04:55:09 UTC
(In reply to ryan from comment #19)
> Hi Nithya,
> 
> Unfortunately the only place I can see this is a live customer system, so I
> won't be able to remount the volume without booking in downtime.
> We're planning to upgrade to 6.3 once we've tested it and ensured some of
> the bugs are ironed out. Currently in testing with 6.3 we have not been able
> to see this issue.
> 
> I can get statedumps of a node in the cluster which doesn't show the memory
> issue and then a statedump of a node which does have the issue if that would
> help?
> I'll get a statedump at night when there is very little IO.
> 
> Best,
> Ryan

Thanks Ryan. I'll wait for the statedumps before proceeding.

Comment 21 Nithya Balachandran 2019-11-04 06:32:10 UTC
(In reply to Nithya Balachandran from comment #20)
> (In reply to ryan from comment #19)
> > Hi Nithya,
> > 
> > Unfortunately the only place I can see this is a live customer system, so I
> > won't be able to remount the volume without booking in downtime.
> > We're planning to upgrade to 6.3 once we've tested it and ensured some of
> > the bugs are ironed out. Currently in testing with 6.3 we have not been able
> > to see this issue.
> > 
> > I can get statedumps of a node in the cluster which doesn't show the memory
> > issue and then a statedump of a node which does have the issue if that would
> > help?
> > I'll get a statedump at night when there is very little IO.
> > 
> > Best,
> > Ryan
> 
> Thanks Ryan. I'll wait for the statedumps before proceeding.

Hi Ryan,

Any update on this?
Thanks,
Nithya

Comment 22 ryan 2020-02-10 08:59:43 UTC
Hi Nithya,

Unfortunately we were not able to replicate this anywhere other than on one customer's system.
We've now upgraded them to Gluster 6.6 and no longer see this issue.

Best,
Ryan