Bug 1495161

Summary: [GSS] Few brick processes are consuming more memory after patching 3.2
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prerna Sony <psony>
Component: locks
Assignee: Xavi Hernandez <jahernan>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: abhishku, amukherj, bkunal, hgowtham, jahernan, nbalacha, nchilaka, psony, rcyriac, rhs-bugs, sankarshan, sheggodu, srmukher, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.12.2-2
Doc Type: Bug Fix
Doc Text:
Previously, workloads that used many POSIX locks, possibly in combination with the gluster clear-locks command, leaked memory in brick processes. This caused high memory consumption and in some cases triggered the OOM killer. This release fixes the leaks present in the affected translators.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:36:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503135, 1507361, 1526377    
Attachments:
state-dump from 3.1.3 (prod, prod-moodle) (Flags: none)

Description Prerna Sony 2017-09-25 11:09:30 UTC
Description of problem:

glusterfsd (brick) processes are using more memory after patching to 3.2, compared to the unpatched environment, i.e. 3.1.3.

Version-Release number of selected component (if applicable):

glusterfs-server-3.8.4-18.4.el6rhs.x86_64   

How reproducible:
In the customer environment

Actual results:
A few of the brick processes are consuming more memory after patching to 3.2.

Comment 2 Prerna Sony 2017-09-25 11:13:25 UTC
Created attachment 1330478 [details]
State dump

Comment 12 Prerna Sony 2017-09-27 05:08:36 UTC
Created attachment 1331309 [details]
state-dump from 3.1.3 (prod, prod-moodle)

Comment 22 hari gowtham 2017-10-09 07:45:01 UTC
Hi Atin,

Yes, it is a regression.

This was introduced in 3.8 for https://bugzilla.redhat.com/show_bug.cgi?id=1326085 

This code is not present in 3.1.3 but is present in 3.2.

Regards,
Hari.

Comment 56 Nag Pavan Chilakam 2018-08-14 09:36:58 UTC
Below is what I ran over a span of ~4 days on 3.12.2-15:
Created an 18x3 volume with performance.client-io-threads off and brick multiplexing off (as in the customer case).
Mounted the volume on 8 different clients and triggered different kinds of I/O as below:
1) a script to take locks on a file over multiple iterations (2 clients); see the sketch after this list
2) Linux untar from 2 clients for multiple iterations
3) from 2 clients, creating, renaming, and deleting files simultaneously as below:
for x in {1..10000};do for i in {1..10000};do dd if=/dev/urandom of=file.$x.$i bs=123 count=10000;done;for j in {1..10000};do mv -f file.$x.$j file.$x.$j.$j;done;rm -rf file.$x.*;done
4) different I/O from 2 clients using Crefi as below:
for x  in {1..1000};do for i in {create,chmod,chown,chgrp,symlink,truncate,rename,hardlink}; do ./crefi.py  --multi -n 15 -b 100 -d 20 --max=10K --min=50 --random -T 3 -t text --fop=$i /mnt/locks/IOs/Crefi/$HOSTNAME/  ; sleep 10 ; done;rm -rf /mnt/locks/IOs/Crefi/$HOSTNAME/*;done

5) creation of the same directory tree, in depth and breadth, from 2 clients simultaneously
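
For reference, the lock script in item 1 could look something like the minimal Python sketch below, which repeatedly takes and releases an exclusive POSIX (fcntl) lock on a file on the mount; the path, iteration count, and sleep are made-up placeholders, not the exact script that was run.

#!/usr/bin/env python
# Sketch only: repeatedly take and release an exclusive POSIX (fcntl) lock
# on a file that lives on the gluster mount. Path/count/sleep are placeholders.
import fcntl
import time

LOCK_FILE = "/mnt/locks/IOs/lockfile"   # assumed path on the gluster mount
ITERATIONS = 100000                     # assumed iteration count

with open(LOCK_FILE, "a+") as f:
    for _ in range(ITERATIONS):
        fcntl.lockf(f, fcntl.LOCK_EX)   # acquire exclusive POSIX lock
        time.sleep(0.01)                # hold the lock briefly
        fcntl.lockf(f, fcntl.LOCK_UN)   # release the lock

Running such a loop from 2 clients against the same file keeps the brick's locks translator granting and releasing contended POSIX locks, which is the kind of lock traffic described in the Doc Text above.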



Mounted the volume locally on one server and kept issuing lock clears as below:
for i in $(find  IOs);do gluster volume clear-locks locks /$i kind all posix; done




Over these 3 days I didn't see any significant memory consumption by the bricks.

Comment 57 Nag Pavan Chilakam 2018-08-14 14:02:15 UTC
sosreports and logs for my tests are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1495161/onqa_verification


Memory info was captured in the file "fresh_top.log" for each node (a sketch of how such sampling could be done is below).
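
For context, a minimal sketch of such per-node sampling, assuming a simple loop around ps; the interval, fields, and output path are assumptions, not necessarily how fresh_top.log was actually produced.

#!/usr/bin/env python
# Sketch only: append a timestamp and the RSS of every glusterfsd process
# to a log file at a fixed interval. Interval and output path are assumptions.
import subprocess
import time

LOG = "fresh_top.log"
INTERVAL = 300  # seconds between samples (assumed)

while True:
    with open(LOG, "a") as out:
        out.write(time.strftime("%F %T") + "\n")
        # ps prints PID, resident set size (KiB) and command for glusterfsd
        out.write(subprocess.check_output(
            ["ps", "-o", "pid,rss,cmd", "-C", "glusterfsd"]).decode())
    time.sleep(INTERVAL)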


I don't see any concern with the memory footprint.


Even after 4 days, resident memory has increased by only about 1% per glusterfsd, which is nowhere close to what the customer had seen.

Comment 59 Nag Pavan Chilakam 2018-08-16 09:09:31 UTC
I am moving the BZ to Verified based on my above comments from testing.
(If need be, I will raise a new BZ for c#58.)

Comment 64 errata-xmlrpc 2018-09-04 06:36:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607