Bug 1636798

Summary: Memory consumption of the GlusterFS FUSE client mount point process keeps increasing if processes which have opened files get killed abruptly
Product: [Community] GlusterFS
Reporter: Vikas Rana <vikas.1.rana>
Component: fuse
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: mainline
CC: atumball, bugs, pasik, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-6.x
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-07-15 06:27:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description                                                           Flags
  glusterfs statedump before running the test script                    none
  glusterfs statedump immediately after running the test script         none
  glusterfs statedump after running the test script for next 10 mins.   none
  glusterfs statedump after running the test script for next 20 mins.   none
  glusterfs statedump after running the test script for next 30 mins.   none
  glusterfs statedump immediately after test script is stopped          none

Description Vikas Rana 2018-10-07 17:33:38 UTC
Created attachment 1491385 [details]
glusterfs statedump before running the test script

Description of problem: Memory consumption of the GlusterFS FUSE client mount process keeps increasing if any Linux process which has opened files for writing gets killed abruptly before closing them.


Version-Release number of selected component (if applicable): glusterfs 3.13.2


How reproducible:
This behavior is seen every time.


Steps to Reproduce:
1. Run a test script which spawns a child process that opens 10-15 files on the mount in write mode.
2. The parent test script kills the child process abruptly with SIGKILL before the files opened in write mode are closed.
3. Keep doing this in a loop for a period of 30 minutes. You will notice that the memory consumption of the glusterfs fuse mount point process keeps increasing until the test script is killed (a minimal reproducer sketch follows these steps).
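
Below is a minimal sketch of such a reproducer, assuming the volume is fuse-mounted at /mnt/pmdb; the actual test script used here may differ in its details.

#!/bin/bash
# Reproducer sketch: repeatedly spawn a child that opens files for writing
# on the fuse mount and SIGKILL it before the files are closed.
MOUNT=/mnt/pmdb                      # assumption: fuse mount point of the volume
END=$((SECONDS + 1800))              # run for ~30 minutes
while [ "$SECONDS" -lt "$END" ]; do
    (
        # Child: open ~15 files in write mode and hold them open.
        for i in $(seq 1 15); do
            exec {fd}>"$MOUNT/leaktest.$BASHPID.$i"
        done
        sleep 60                     # keep the descriptors open until killed
    ) &
    child=$!
    sleep 1
    kill -9 "$child"                 # SIGKILL before the files are closed
done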

Actual results: 
A continuous increase in the memory consumption of the glusterfs process is seen until the test script is killed.
Even after stopping the test script, the increased memory consumption of the glusterfs process does not reduce over a period of time.


Expected results:
A continuous increase in the memory consumption of the glusterfs process should not be seen. Also, after stopping the test script, the memory consumption of the process should reduce over a period of time.
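
One hedged way to observe this, assuming PID 19204 as in the outputs below (substitute the PID of your own glusterfs fuse mount process), is to sample its resident set size periodically while the reproducer runs:

# Sample the mount process RSS/VSZ once a minute while the test script runs.
while true; do
    date
    ps -o pid,rss,vsz,cmd -p 19204
    sleep 60
done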

Additional info:

gluster volume info output:
# gluster volume info pmdb

Volume Name: pmdb
Type: Replicate
Volume ID: 1906452a-6062-4ec7-91b9-caa9ba86d4cb
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 169.254.0.6:/mnt/pmdb1/brick
Brick2: 169.254.0.7:/mnt/pmdb1/brick
Options Reconfigured:
performance.client-io-threads: off
cluster.consistent-metadata: on
network.ping-timeout: 10
cluster.heal-timeout: 300
cluster.granular-entry-heal: enable
transport.address-family: inet
nfs.disable: on


gluster volume status output:
# gluster volume status pmdb
Status of volume: pmdb
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 169.254.0.6:/mnt/pmdb1/brick          49155     0          Y       12375
Brick 169.254.0.7:/mnt/pmdb1/brick          49155     0          Y       16390
Self-heal Daemon on localhost               N/A       N/A        Y       17062
Self-heal Daemon on 169.254.0.7             N/A       N/A        Y       19392

Task Status of Volume pmdb
------------------------------------------------------------------------------
There are no active volume tasks




gluster volume status detail output:
# gluster volume status pmdb detail
Status of volume: pmdb
------------------------------------------------------------------------------
Brick                : Brick 169.254.0.6:/mnt/pmdb1/brick
TCP Port             : 49155
RDMA Port            : 0
Online               : Y
Pid                  : 12375
File System          : ext4
Device               : /dev/sde1
Mount Options        : rw,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 9.8GB
Total Disk Space     : 9.8GB
Inode Count          : 655360
Free Inodes          : 655322
------------------------------------------------------------------------------
Brick                : Brick 169.254.0.7:/mnt/pmdb1/brick
TCP Port             : 49155
RDMA Port            : 0
Online               : Y
Pid                  : 16390
File System          : ext4
Device               : /dev/sde1
Mount Options        : rw,relatime,data=ordered
Inode Size           : 256
Disk Space Free      : 9.8GB
Total Disk Space     : 9.8GB
Inode Count          : 655360
Free Inodes          : 655322


Attached is the glusterfs statedump taken before running the test script.

Memory consumption of the process before running the test script:
ps -aux --sort -rss | grep "glusterfs" | grep -w pmdb
root     19204  0.0  0.1 571100 12456 ?        Ssl  11:46   0:00 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=169.254.0.6 --volfile-server=169.254.0.7 --volfile-id=/pmdb /mnt/pmdb

Top command output for the process before running the test script:
top - 12:11:22 up 29 min,  3 users,  load average: 0.59, 0.56, 0.47
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu0  :   2.7/2.7     5[||||||                                                                                              ]
%Cpu1  :   2.0/2.0     4[||||                                                                                                ]
GiB Mem :  8.5/7.801    [                                                                                                    ]
GiB Swap:  0.0/0.000    [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND
19204 root      20   0  621.7m  12.2m   0.0  0.2   0:00.24 S glusterfs



Memory consumption of the process immediately after running the test script:
ps -aux --sort -rss | grep "glusterfs" | grep -w pmdb
root     19204  0.4  0.4 638776 34560 ?        Ssl  11:46   0:12 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=169.254.0.6 --volfile-server=169.254.0.7 --volfile-id=/pmdb /mnt/pmdb

Top command output for the process immediately after running the test script:
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu0  :  26.9/22.1   49[|||||||||||||||||||||||||||||||||||||||||||||||||                                                   ]
%Cpu1  :  23.9/23.9   48[||||||||||||||||||||||||||||||||||||||||||||||||                                                    ]
GiB Mem :  9.7/7.801    [                                                                                                    ]
GiB Swap:  0.0/0.000    [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND
19204 root      20   0  623.8m  75.0m  55.3  0.9   0:36.38 S glusterfs


Memory consumption of the process after the test script had run for 10 minutes:
ps -aux --sort -rss | grep "glusterfs" | grep -w pmdb
root     19204  9.1  6.6 1097528 547504 ?      Ssl  11:46   4:53 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=169.254.0.6 --volfile-server=169.254.0.7 --volfile-id=/pmdb /mnt/pmdb

Top command output for the process after the test script had run for 10 minutes:
top - 12:40:46 up 59 min,  6 users,  load average: 1.83, 1.86, 1.30
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu0  :  20.3/16.8   37[|||||||||||||||||||||||||||||||||||||                                                               ]
%Cpu1  :  28.5/15.3   44[|||||||||||||||||||||||||||||||||||||||||||                                                         ]
GiB Mem : 17.8/7.801    [                                                                                                    ]
GiB Swap:  0.0/0.000    [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND
19204 root      20   0 1071.8m 566.6m  52.7  7.1   5:10.84 S glusterfs


Memory consumption of the process after another 10 minutes (i.e. after the test script had run for 20 minutes):
ps -aux --sort -rss | grep "glusterfs" | grep -w pmdb
root     19204 15.0 12.7 1621816 1046200 ?     Ssl  11:46   9:29 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=169.254.0.6 --volfile-server=169.254.0.7 --volfile-id=/pmdb /mnt/pmdb


Top command output for the process after another 10 minutes (i.e. after the test script had run for 20 minutes):
top - 12:50:34 up  1:08,  6 users,  load average: 2.70, 2.22, 1.71
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu0  :  22.6/14.4   37[|||||||||||||||||||||||||||||||||||||                                                               ]
%Cpu1  :  24.6/18.1   43[|||||||||||||||||||||||||||||||||||||||||||                                                         ]
GiB Mem : 26.0/7.801    [                                                                                                    ]
GiB Swap:  0.0/0.000    [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND
19204 root      20   0 1583.8m 1.025g  57.3 13.1   9:46.24 S glusterfs



Memory consumption of the process after another 10 minutes (i.e. after the test script had run for 30 minutes):
ps -aux --sort -rss | grep "glusterfs" | grep -w pmdb
root     19204 19.3 19.1 2080568 1564432 ?     Ssl  11:46  14:14 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=169.254.0.6 --volfile-server=169.254.0.7 --volfile-id=/pmdb /mnt/pmdb


Top command output for the process after another 10 minutes (i.e. after the test script had run for 30 minutes):
top - 13:00:58 up  1:19,  6 users,  load average: 1.57, 1.92, 1.83
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu0  :   8.7/16.8   26[||||||||||||||||||||||||||                                                                          ]
%Cpu1  :  13.1/15.9   29[|||||||||||||||||||||||||||||                                                                       ]
GiB Mem : 34.1/7.801    [                                                                                                    ]
GiB Swap:  0.0/0.000    [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND
19204 root      20   0 2031.8m 1.501g   0.0 19.2  14:26.96 S glusterfs


Memory consumption of the process after the test script has been stopped:
ps -aux --sort -rss | grep "glusterfs" | grep -w pmdb
root     19204 17.8 19.2 2080568 1574200 ?     Ssl  11:46  14:27 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=169.254.0.6 --volfile-server=169.254.0.7 --volfile-id=/pmdb /mnt/pmdb

Top command output for the process after the test script has been stopped:
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu0  :  25.5/57.0   83[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                 ]
%Cpu1  :  21.6/36.5   58[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||                                          ]
GiB Mem : 35.9/7.801    [                                                                                                    ]
GiB Swap:  0.0/0.000    [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND
19204 root      20   0 2031.8m 1.501g   0.0 19.2  14:34.73 S glusterfs


Memory consumption of the process more than 24 hours after the test script was stopped:
# ps -aux --sort -rss | grep 19204
root     19204  0.3 19.2 2146104 1576576 ?     Ssl  Oct04  16:28 /usr/local/sbin/glusterfs --process-name fuse --volfile-server=169.254.0.6 --volfile-server=169.254.0.7 --volfile-id=/pmdb /mnt/pmdb

Top command output of the process more than 24 hours after the test script was stopped:
top - 23:01:14 up 3 days, 11:19,  1 user,  load average: 0.83, 0.55, 0.49
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu0  :   8.2/15.0   23[|||||||||||||||||||||||                                                                             ]
%Cpu1  :   9.5/12.9   22[|||||||||||||||||||||||                                                                             ]
GiB Mem : 41.0/7.801    [                                                                                                    ]
GiB Swap:  0.0/0.000    [                                                                                                    ]

  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND
19204 root      20   0 2095.8m 1.504g   0.0 19.3  16:28.64 S glusterfs

For each interval mentioned above, a glusterfs statedump is available. Since only one file can be attached per comment in this Bugzilla, the remaining statedumps will be attached in separate comments.
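
For reference, one way to capture such client statedumps (a hedged sketch: the dump directory may vary by build and configuration) is to send SIGUSR1 to the fuse mount process; the dump is typically written under /var/run/gluster.

# Trigger a statedump of the fuse client process (PID 19204 in this setup).
kill -USR1 19204
# The dump file usually appears under /var/run/gluster.
ls -lt /var/run/gluster/ | head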

Comment 1 Vikas Rana 2018-10-08 03:15:01 UTC
Created attachment 1491500 [details]
glusterfs statedump immediately after running the test script

glusterfs statedump immediately after running the test script

Comment 2 Vikas Rana 2018-10-08 03:18:36 UTC
Created attachment 1491502 [details]
glusterfs statedump  after running the test script for next 10 mins.

glusterfs statedump  after running the test script for next 10 mins.

Comment 3 Vikas Rana 2018-10-08 03:20:54 UTC
Created attachment 1491503 [details]
glusterfs statedump  after running the test script for next 20 mins.

glusterfs statedump  after running the test script for next 20 mins.

Comment 4 Vikas Rana 2018-10-08 03:24:15 UTC
Created attachment 1491504 [details]
glusterfs statedump  after running the test script for next 30 mins.

glusterfs statedump  after running the test script for next 30 mins.

Comment 5 Vikas Rana 2018-10-08 03:25:56 UTC
Created attachment 1491505 [details]
glusterfs statedump immediately after test script is stopped

glusterfs statedump immediately after test script is stopped

Comment 6 Vikas Rana 2018-10-12 04:18:51 UTC
Hi,

Do we have any update on this?

Best Regards,
Vikas Rana

Comment 7 Vikas Rana 2018-10-15 03:13:44 UTC
Hi,

Can anyone please provide an update on this issue?

Best Regards,
Vikas Rana

Comment 8 Vikas Rana 2018-10-22 04:11:04 UTC
Hi,

Is it possible to let us know how to resolve this issue, or whether any workaround is available?
Thanks in advance.

Best Regards,
Vikas Rana

Comment 9 Shyamsundar 2018-10-23 14:55:08 UTC
Release 3.12 has been EOL'd and this bug was still found to be in the NEW state, hence the version is being moved to mainline so that it can be triaged and appropriate action taken.

Comment 10 Vikas Rana 2018-10-24 07:21:24 UTC
Hi Shyamsundar,

The above test case was tried with glusterfs version 3.13.2, as mentioned in the initial bug description.
We are pointing this out to avoid any confusion, and we hope that version 3.13.2 has not been EOL'd.

Best Regards,
Vikas Rana

Comment 11 Amar Tumballi 2019-07-15 06:27:45 UTC
Hi Vikas,

We fixed some of the memory issues in the above-mentioned scenarios with the glusterfs-5.x releases. Now that we have the glusterfs-6.x releases, and glusterfs-7.0 is coming out soon, we would like you to upgrade and test. I believe this is fixed by https://review.gluster.org/#/c/glusterfs/+/19778/.

Marking it as CLOSED with the above info. Please upgrade and test; if the issue persists, re-open the issue.