Created attachment 802042 [details]
Reproducer program

Description of problem:
When the attached program (oom_test.c) is run for a long time, the smbd process gets OOM killed.

Output of the top command for the smbd processes:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22658 root      20   0 2414m 1.7g 1980 D  0.0 84.6  8:39.59 smbd
22563 root      20   0  151m 1272  488 D  0.0  0.1  0:00.07 smbd
22562 root      20   0  151m 1608  828 S  0.0  0.1  0:00.02 smbd

Version-Release number of selected component (if applicable):

How reproducible:
Intermittent

Steps to Reproduce:
1. gcc -pthread -o oom_test oom_test.c
2. Do a CIFS mount of a gluster volume on /mnt/vfs (this path is hard-coded in the test program), then run:
   ./oom_test <file size in GB>
   Initially, try giving a file size larger than the volume size; this increases the chances of hitting the memory leak.
3. Run multiple instances of the test program, several times. With this, the memory used by smbd keeps increasing until it approaches its limit and the process gets OOM killed.

Actual results:
The smbd process consumes more and more memory and gets OOM killed.

Expected results:
The smbd process should not get OOM killed.

Additional info:
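The attachment itself is not reproduced in this comment. As a rough illustration only, a reproducer along these lines (several pthreads writing large files under the hard-coded /mnt/vfs mount until the size given on the command line is reached) exercises the same write path. The thread count, file names, and buffer size below are assumptions, not taken from the actual oom_test.c.

/*
 * Illustrative sketch only -- NOT the attached oom_test.c.
 * Assumption: the reproducer spawns a few pthreads that each write a
 * large file under /mnt/vfs until <file size in GB> bytes are written
 * (or the volume fills up and write() starts failing, e.g. ENOSPC).
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NUM_THREADS 4                 /* guess, not from oom_test.c */
#define BUF_SIZE    (1 << 20)         /* 1 MiB per write() */

static long long target_bytes;

static void *writer(void *arg)
{
    long id = (long)arg;
    char path[64];
    char *buf = malloc(BUF_SIZE);
    long long written = 0;

    snprintf(path, sizeof(path), "/mnt/vfs/oom_test_%ld", id);
    memset(buf, 'x', BUF_SIZE);

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        free(buf);
        return NULL;
    }

    /* Keep writing until the requested size is reached or write() fails. */
    while (written < target_bytes) {
        ssize_t n = write(fd, buf, BUF_SIZE);
        if (n <= 0) {
            perror("write");
            break;
        }
        written += n;
    }

    close(fd);
    free(buf);
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file size in GB>\n", argv[0]);
        return 1;
    }
    target_bytes = atoll(argv[1]) * 1024LL * 1024LL * 1024LL;

    pthread_t tids[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&tids[i], NULL, writer, (void *)i);
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}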
Is it the master smbd process that is leaking memory, or is it a child process? Like most daemons, smbd runs an initial instance that listens for connections and then spawns child processes to handle those connections. It appears that it is the main process that is leaking memory, but the main process doesn't actually do file I/O, so it doesn't pass through the VFS layer.
It's the child process that is leaking the memory. The leak appears to be in the iobuf_pool: a new iobuf_arena is allocated but never freed, and this looks to be the cause of the leak.
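For illustration only, a minimal sketch of this kind of leak pattern follows. The structure and function names are hypothetical, not the actual glusterfs iobuf code: the pool grows a new arena whenever it runs out of buffers, but an arena that becomes empty again is never purged, so every burst of I/O that forces a fresh arena grows the process permanently.

/* Simplified, hypothetical sketch of an arena-style pool leak --
 * names and layout are illustrative, not the real iobuf implementation. */
#include <stdlib.h>

struct arena {
    struct arena *next;
    void         *mem;      /* slab of fixed-size buffers */
    int           active;   /* buffers currently handed out */
};

struct pool {
    struct arena *arenas;
};

/* When the pool has no free buffer, a new arena is allocated and
 * linked into the pool... */
static struct arena *pool_add_arena(struct pool *p, size_t slab_size)
{
    struct arena *a = calloc(1, sizeof(*a));
    a->mem = malloc(slab_size);
    a->next = p->arenas;
    p->arenas = a;
    return a;
}

/* ...but when the last buffer of an arena is released, nothing purges
 * the now-empty arena. A fix would unlink and free (or cache a bounded
 * number of) empty arenas here. */
static void pool_put_buffer(struct pool *p, struct arena *a)
{
    (void)p;
    if (--a->active == 0) {
        /* missing: unlink 'a' from p->arenas, free(a->mem), free(a); */
    }
}

int main(void)
{
    struct pool p = { 0 };

    /* Simulate many I/O bursts: each one forces a fresh arena and none
     * is ever reclaimed, so resident memory keeps growing. */
    for (int i = 0; i < 1000; i++) {
        struct arena *a = pool_add_arena(&p, 1 << 20);
        a->active = 1;
        pool_put_buffer(&p, a);   /* arena now empty, but still linked */
    }
    return 0;
}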
Patches posted for review at:
https://code.engineering.redhat.com/gerrit/#/c/14184/
https://code.engineering.redhat.com/gerrit/#/c/14183/
Executed oom_test with multiple instances on an SMB mount. Memory leaks were not observed, and even after running it multiple times with multiple instances, memory usage did not reach a level that would cause the process to be OOM killed.

Verified in version:

[root@dhcp159-76 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.44.1u2rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.44.1u2rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.44.1u2rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.44.1u2rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.44.1u2rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.44.1u2rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.44.1u2rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.7.el6rhs.x86_64
glusterfs-server-3.4.0.44.1u2rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.44.1u2rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.44.1u2rhs-1.el6rhs.x86_64

Output of the top command:

 8769 root      20   0  503m  76m 2996 S  0.0  1.0  0:02.42 smbd
 9411 root      20   0  508m  76m 3052 S  0.0  1.0  1:35.89 smbd
 9449 root      20   0  508m  73m 2720 S  0.0  0.9  0:00.69 smbd
 9454 root      20   0  508m  73m 2608 S  0.0  0.9  0:00.73 smbd
 9551 root      20   0  227m 3032 1692 S  0.0  0.0  0:00.16 smbd
 9587 root      20   0  227m 2564 1048 S  0.0  0.0  0:11.65 smbd
Can you please verify the doc text for technical accuracy?
A comma after "performed I/O operations" would be better; other than that, the doc text looks fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html