Bug 1495161

Summary: [GSS] Few brick processes are consuming more memory after patching 3.2
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prerna Sony <psony>
Component: locks
Assignee: Xavi Hernandez <jahernan>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: abhishku, amukherj, bkunal, hgowtham, jahernan, nbalacha, nchilaka, psony, rcyriac, rhs-bugs, sankarshan, sheggodu, srmukher, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.12.2-2
Doc Type: Bug Fix
Doc Text:
Previously, workloads that used many POSIX locks, possibly in combination with the gluster clear-locks command, leaked memory in brick processes. This caused high memory consumption and in some cases triggered the OOM killer. This release fixes the leaks present in the affected translators.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:36:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503135, 1507361, 1526377    
Attachments:
state-dump from 3.1.3 (prod, prod-moodle) (Flags: none)

Description Prerna Sony 2017-09-25 11:09:30 UTC
Description of problem:

glusterfsd (brick) processes are using more memory after patching to 3.2, compared to the unpatched environment, i.e. 3.1.3.

Version-Release number of selected component (if applicable):

glusterfs-server-3.8.4-18.4.el6rhs.x86_64   

How reproducible:
In the customer environment

Actual results:
A few of the brick processes are consuming more memory after patching to 3.2.

Comment 2 Prerna Sony 2017-09-25 11:13:25 UTC
Created attachment 1330478 [details]
State dump

Comment 12 Prerna Sony 2017-09-27 05:08:36 UTC
Created attachment 1331309 [details]
state-dump from 3.1.3 (prod, prod-moodle)

Comment 22 hari gowtham 2017-10-09 07:45:01 UTC
Hi Atin,

Yes, it is a regression.

This was introduced in 3.8 for https://bugzilla.redhat.com/show_bug.cgi?id=1326085 

This code is not present in 3.1.3 but is present in 3.2.

Regards,
Hari.

Comment 56 Nag Pavan Chilakam 2018-08-14 09:36:58 UTC
Below is what I ran over a span of ~4 days on 3.12.2-15:
Created an 18x3 volume with performance.client-io-threads off and brick multiplexing off (as in the customer case).
Mounted the volume on 8 different clients and triggered different kinds of I/O as below:
1) a script to take locks on a file over multiple iterations (2 clients); see the sketch after this list
2) Linux untar from 2 clients for multiple iterations
3) from 2 clients, creating, renaming, and deleting files simultaneously as below:
for x in {1..10000};do for i in {1..10000};do dd if=/dev/urandom of=file.$x.$i bs=123 count=10000;done;for j in {1..10000};do mv -f file.$x.$j file.$x.$j.$j;done;rm -rf file.$x.*;done
4) different I/O from 2 clients using Crefi as below:
for x  in {1..1000};do for i in {create,chmod,chown,chgrp,symlink,truncate,rename,hardlink}; do ./crefi.py  --multi -n 15 -b 100 -d 20 --max=10K --min=50 --random -T 3 -t text --fop=$i /mnt/locks/IOs/Crefi/$HOSTNAME/  ; sleep 10 ; done;rm -rf /mnt/locks/IOs/Crefi/$HOSTNAME/*;done

5) creation of the same directory tree, in depth and breadth, from 2 clients simultaneously
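
For reference, the lock script in item 1 could look something like the minimal Python sketch below, which repeatedly takes and releases an exclusive POSIX (fcntl) lock on a file on the mount; the path, iteration count, and sleep are made-up placeholders, not the exact script that was run.

#!/usr/bin/env python
# Sketch only: repeatedly take and release an exclusive POSIX (fcntl) lock
# on a file that lives on the gluster mount. Path/count/sleep are placeholders.
import fcntl
import time

LOCK_FILE = "/mnt/locks/IOs/lockfile"   # assumed path on the gluster mount
ITERATIONS = 100000                     # assumed iteration count

with open(LOCK_FILE, "a+") as f:
    for _ in range(ITERATIONS):
        fcntl.lockf(f, fcntl.LOCK_EX)   # acquire exclusive POSIX lock
        time.sleep(0.01)                # hold the lock briefly
        fcntl.lockf(f, fcntl.LOCK_UN)   # release the lock

Running such a loop from 2 clients against the same file keeps the brick's locks translator granting and releasing contended POSIX locks, which is the kind of lock traffic described in the Doc Text above.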



Mounted the volume locally on one server and kept issuing lock clears as below:
for i in $(find  IOs);do gluster volume clear-locks locks /$i kind all posix; done




Over these 3 days I didn't see any significant memory consumption by the bricks.

Comment 57 Nag Pavan Chilakam 2018-08-14 14:02:15 UTC
sosreports and logs for my tests are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1495161/onqa_verification


Memory info was captured in the file "fresh_top.log" for each node (a sketch of how such sampling could be done is below).
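
For context, a minimal sketch of such per-node sampling, assuming a simple loop around ps; the interval, fields, and output path are assumptions, not necessarily how fresh_top.log was actually produced.

#!/usr/bin/env python
# Sketch only: append a timestamp and the RSS of every glusterfsd process
# to a log file at a fixed interval. Interval and output path are assumptions.
import subprocess
import time

LOG = "fresh_top.log"
INTERVAL = 300  # seconds between samples (assumed)

while True:
    with open(LOG, "a") as out:
        out.write(time.strftime("%F %T") + "\n")
        # ps prints PID, resident set size (KiB) and command for glusterfsd
        out.write(subprocess.check_output(
            ["ps", "-o", "pid,rss,cmd", "-C", "glusterfsd"]).decode())
    time.sleep(INTERVAL)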


I don't see any concern with the memory footprint.


Even after 4 days, resident memory has increased by only about 1% per glusterfsd, which is nowhere close to what the customer had seen.

Comment 59 Nag Pavan Chilakam 2018-08-16 09:09:31 UTC
I am moving the BZ to Verified based on my above comments from testing.
(If need be, I will raise a new BZ for c#58.)

Comment 64 errata-xmlrpc 2018-09-04 06:36:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607