Description of problem:
glusterfsd processes are using more memory after patching to 3.2 when compared to the unpatched environment, i.e., 3.1.3.

Version-Release number of selected component (if applicable):
glusterfs-server-3.8.4-18.4.el6rhs.x86_64

How reproducible:
In the customer environment.

Actual results:
A few of the brick processes are consuming more memory after patching to 3.2.
Created attachment 1330478 [details] State dump
Created attachment 1331309 [details] state-dump from 3.1.3 (prod, prod-moodle)
Hi Atin,

Yes, it is a regression. It was introduced in 3.8 for https://bugzilla.redhat.com/show_bug.cgi?id=1326085. This code is not present in 3.1.3 but is present in 3.2.

Regards,
Hari.
Below is what I ran for a span of ~4 days on 3.12.2-15:

Created an 18x3 volume with performance.client-io-threads off and brick multiplexing off (as in the customer case), mounted the volume on 8 different clients, and triggered different kinds of IO as below:

1) a script that takes locks on a file over multiple iterations, from 2 clients (a minimal stand-in for such a script is sketched after this list)
2) linux untar from 2 clients for multiple iterations
3) from 2 clients, creating, renaming, and deleting files simultaneously, as below:
   for x in {1..10000};do for i in {1..10000};do dd if=/dev/urandom of=file.$x.$i bs=123 count=10000;done;for j in {1..10000};do mv -f file.$x.$j file.$x.$j.$j;done;rm -rf file.$x.*;done
4) different IO from 2 clients using crefi, as below:
   for x in {1..1000};do for i in {create,chmod,chown,chgrp,symlink,truncate,rename,hardlink}; do ./crefi.py --multi -n 15 -b 100 -d 20 --max=10K --min=50 --random -T 3 -t text --fop=$i /mnt/locks/IOs/Crefi/$HOSTNAME/ ; sleep 10 ; done;rm -rf /mnt/locks/IOs/Crefi/$HOSTNAME/*;done
5) creation of the same directory tree, in depth and breadth, from 2 clients simultaneously

I also mounted the client locally on one server and issued clearing of locks, as below:
   for i in $(find IOs);do gluster volume clear-locks locks /$i kind all posix; done

Over these ~4 days I didn't see any significant memory consumption by the bricks.
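The lock-taking script from item 1 is not attached to the bug, so purely as an illustration, here is a minimal stand-in using flock(1). The lock file path, iteration count, and hold time are all assumptions, and the actual script may have taken fcntl/POSIX locks rather than flock locks:

   #!/bin/bash
   # Hypothetical reproducer: repeatedly take and release a lock on one
   # file on the gluster mount so the brick-side locks translator is
   # exercised. /mnt/locks/lockfile, the 10000 iterations, and the 0.1s
   # hold time are assumed values, not taken from the bug report.
   LOCKFILE=/mnt/locks/lockfile
   touch "$LOCKFILE"
   for i in $(seq 1 10000); do
       # flock(1) acquires the lock, runs the command, then releases it
       flock "$LOCKFILE" -c "sleep 0.1"
   done

Run concurrently from 2 clients, as in item 1, this keeps lock grant/release traffic flowing against the same file.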
sosreports and logs for my tests are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1495161/onqa_verification

Memory info was captured in the file "fresh_top.log" for each node. I don't see any concern with the memory footprint: even after 4 days, resident memory has increased by only about 1% per glusterfsd, which is nowhere close to what the customer has seen.
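The exact command used to produce fresh_top.log isn't recorded in this comment; a minimal sampler along the following lines would produce a comparable log. The 60-second interval and the chosen ps fields are assumptions:

   #!/bin/bash
   # Hypothetical sampler: append a timestamped snapshot of every brick
   # process's memory usage to fresh_top.log once a minute. The interval
   # and field list are assumed, not taken from the bug report.
   while true; do
       date >> fresh_top.log
       # RSS/VSZ in KiB for all glusterfsd (brick) processes
       ps -C glusterfsd -o pid,rss,vsz,etime,args >> fresh_top.log
       sleep 60
   done

Comparing the rss column for a given brick PID across snapshots is enough to see the ~1% resident-memory growth mentioned above.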
I am moving this BZ to VERIFIED based on my above comments from testing. (If need be, I will raise a new BZ for comment #58.)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607