Bug 1398311
Summary: | [compound FOPs]: in a replica pair with one brick down, the other brick process and the fuse client process consume high memory at an increasing pace | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
Component: | replicate | Assignee: | Krutika Dhananjay <kdhananj> |
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.2 | CC: | amukherj, kdhananj, mchangir, rhinduja, rhs-bugs, sasundar, storage-qa-internal |
Target Milestone: | --- | ||
Target Release: | RHGS 3.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.8.4-10 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-03-23 05:50:57 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1351528 |
Description
Nag Pavan Chilakam, 2016-11-24 12:24:51 UTC
Note: Bug 1398315 ("[compound FOPs]: Memory leak while doing FOPs with brick down") was raised for the memory leak; I would like to track them as two different issues. However, if on RCA we find that the root cause is the same, we can go ahead and dup one of them to the other. Hit this while validating RFE 1360978 ("[RFE] Reducing number of network round trips"). Requesting QE and PM ack on this BZ.

http://review.gluster.org/#/c/15965/ <--- patch posted in upstream master. Moving this bug to POST state.

NOTE: The fix for this bug is the same as the one for https://bugzilla.redhat.com/show_bug.cgi?id=1398315

-Krutika

ON_QATP: I have tried this scenario on 3.8.4-8; however, I still see this rapid memory consumption. May have to fail this bug.

Brick memory consumption:

```
  PID USER PR NI    VIRT    RES  SHR S %CPU %MEM    TIME+ COMMAND
(beginning of file create, post brick kill)
16200 root 20  0 1250076  53876 4052 S 33.3  0.7 10:36.66 gluster+
(end of file create, post brick kill)
16200 root 20  0 1578784 374512 3032 S  0.0  4.7 20:52.72 gluster+
```

Client memory consumption:

```
[root@rhs-client45 ~]# top -n 1 -b | egrep "RES|7699"
 PID USER PR NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
7699 root 20  0 708532 44260 3904 S 68.8  0.3 21:00.23 gluster+

(post brick kill)
 PID USER PR NI    VIRT    RES  SHR S %CPU %MEM    TIME+ COMMAND
7699 root 20  0 2021388 1.277g 4052 S  0.0  8.2 31:34.71 gluster+
```

This means I will have to fail bug 1398315 ("[compound FOPs]: Memory leak while doing FOPs with brick down") as well. I also saw a client error in the fuse log, as below (probably from when I killed the brick):

```
[2016-12-16 12:13:08.076069] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-cfops-client-1: remote operation failed [Transport endpoint is not connected]
```

```
[root@rhs-client45 gluster]# grep -iarT '^num_allocs=' glusterdump.7699.dump.1481891567 | sort | uniq
num_allocs=1
num_allocs=1024
num_allocs=1025
num_allocs=13
num_allocs=14
num_allocs=16
num_allocs=2
num_allocs=21
num_allocs=25
num_allocs=26
num_allocs=3
num_allocs=4
num_allocs=5
num_allocs=59
num_allocs=6
num_allocs=7
num_allocs=8
num_allocs=9
```

The highest values of num_allocs are 1024 and 1025, and they come from the FUSE event history buffers. Same with all other statedumps of PID 7699. As for PID 3942:

```
[root@rhs-client45 gluster]# grep -iarT '^num_allocs=' glusterdump.3942.dump.1481889016 | sort | uniq
num_allocs=1
num_allocs=10
num_allocs=1024
num_allocs=1025
num_allocs=104
num_allocs=13
num_allocs=176
num_allocs=19
num_allocs=2
num_allocs=24
num_allocs=3
num_allocs=32
num_allocs=34
num_allocs=394
num_allocs=4
num_allocs=404
num_allocs=415
num_allocs=5
num_allocs=6
num_allocs=66
num_allocs=7
num_allocs=8
num_allocs=9
```

The highest values are again 1024 and 1025, and both of them are related to event history buffers. Coming to the brick's statedump:

```
[root@dhcp35-239 gluster]# grep -iarT '^num_allocs=' rhs-brick4-cfops.16200.dump.1481891916 | sort | uniq
num_allocs=1
num_allocs=10
num_allocs=104
num_allocs=115
num_allocs=12
num_allocs=13
num_allocs=14
num_allocs=19
num_allocs=2
num_allocs=21
num_allocs=3
num_allocs=36
num_allocs=4
num_allocs=5
num_allocs=59
num_allocs=6
num_allocs=66
num_allocs=7
num_allocs=8
num_allocs=9
num_allocs=97
```

The highest value is 115 and seems to be coming from:

```
[global.glusterfs - usage-type gf_common_mt_strdup memusage]
size=2342
num_allocs=115
max_size=3281
max_num_allocs=135
total_allocs=323
```

I am not sure about the source of these allocations. With all that, I am guessing 115 strings would not consume so much memory as to lead to an OOM kill. This looks similar to the issue you are working on with Csaba.
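For reference, statedumps like the ones analyzed above can be captured by sending SIGUSR1 to the gluster process (dumps land under /var/run/gluster by default, with the filenames seen above). A minimal sketch of capturing one and surfacing the sections with the highest num_allocs; the PID is the example fuse client from this run:

```bash
#!/bin/bash
# Sketch: capture a statedump from a running gluster process and list the
# memory-accounting sections with the largest num_allocs. PID 7699 is the
# fuse client from this run; substitute the process you are inspecting.
PID=7699
kill -USR1 "$PID"   # SIGUSR1 asks a gluster process to write a statedump
sleep 2             # give it a moment to finish writing

DUMP=$(ls -t /var/run/gluster/glusterdump.${PID}.dump.* | head -n 1)

# Pair each num_allocs value with the section header above it, then sort
# numerically to surface the biggest consumers (e.g. the FUSE event
# history buffers identified above).
awk -F= '/^\[/ { section = $0 } /^num_allocs=/ { print $2, section }' "$DUMP" \
    | sort -rn | head -n 10
```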
I cannot, of course, claim with 100% certainty that the leaks are NOT in compound FOPs unless we capture a statedump with the same test but without compound FOPs and the statedump numbers look identical. At least the gluster-managed data structures don't look to be consumed at a large enough rate to be causing an OOM kill, as per the statedump.

-Krutika

I reran the same case on a similar new volume but without enabling compound FOPs. In this case I hardly saw any memory spike (a spike was seen on the online brick once the other brick was down, but from then on the consumption was not increasing or spiking). Hence I have to fail this bug.

Upstream mainline patch http://review.gluster.org/#/c/16210/ posted for review.

Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93488/

BZ added to erratum https://errata.devel.redhat.com/advisory/24866. Moving to ON_QA.

ON_QA validation:
=================
There was minimal memory consumption when compared to before the fix; hence moving the test case to pass, i.e. the BZ to VERIFIED.

Setup: a 1x2 volume with compound FOPs enabled, mounted on two clients, cli45 (el7.3) and cli24 (el6.7). From cli45, started to create a 10 GB file using dd; after about 3 GB was created, brought down b1. Kept tracking client-side and brick-process memory consumption in a loop every 30 seconds. The details are below.

cli45, before the file create started:

```
 PID USER PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
2985 root 20  0 643012 41632 3196 S  0.0  0.3 0:00.05 glusterfs
```

After the create was issued (it shot up by 5 MB, but from there on it stayed constant or increased minimally):

```
 PID USER PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
2985 root 20  0 643012 46764 3804 S 75.0  0.3 0:21.34 glusterfs
```

Before brick b1 was killed:

```
 PID USER PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
2985 root 20  0 643012 46848 3824 S 66.7  0.3 4:49.47 glusterfs
2985 root 20  0 643012 46848 3824 S 73.3  0.3 5:11.78 glusterfs
2985 root 20  0 643012 46848 3824 S 62.5  0.3 5:33.70 glusterfs
```

After b1 was killed:

```
 PID USER PR NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
2985 root 20  0 643012 46848 3824 S 68.8  0.3  5:55.95 glusterfs
2985 root 20  0 645148 47264 3936 S 68.8  0.3  6:17.59 glusterfs
2985 root 20  0 645148 47268 3936 S 73.3  0.3  6:38.93 glusterfs
2985 root 20  0 645148 47268 3936 S 68.8  0.3  7:00.40 glusterfs
2985 root 20  0 645148 47268 3936 S 81.2  0.3  7:21.79 glusterfs
2985 root 20  0 645148 47268 3936 S 75.0  0.3  7:42.81 glusterfs
2985 root 20  0 645148 47328 3936 S 68.8  0.3 12:40.12 glusterfs
```

After b1 was brought back up using vol start force:

```
 PID USER PR NI   VIRT   RES  SHR S  %CPU %MEM    TIME+ COMMAND
2985 root 20  0 778272 49512 4028 S  93.8  0.3 13:55.20 glusterfs
2985 root 20  0 778272 49516 4032 S 106.2  0.3 14:26.07 glusterfs
2985 root 20  0 778272 49516 4032 S 100.0  0.3 14:56.36 glusterfs
```

At the end of the file create (complete 10 GB file):

```
 PID USER PR NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
2985 root 20  0 776220 49512 4040 S  0.0  0.3 20:48.97 glusterfs
2985 root 20  0 776220 49512 4040 S  0.0  0.3 20:48.97 glusterfs
2985 root 20  0 776220 49512 4040 S  0.0  0.3 20:48.97 glusterfs
```
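The 30-second samples above were collected with top; a minimal sketch of such a loop, reusing the egrep pattern from this run (the default PID is an example; pass the real one as the first argument):

```bash
#!/bin/bash
# Sketch: sample a gluster process's memory (VIRT/RES) every 30 seconds,
# as in the tracking above. Run on the client host against the fuse mount
# PID (2985 here) or on the server against the brick PID (28670 here).
PID=${1:-2985}

while true; do
    date
    top -n 1 -b | egrep "RES|${PID}"   # header line plus the process row
    sleep 30
done
```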
Brick process (the replica brick b2, which stayed alive):

```
  PID USER PR NI    VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
28670 root 20  0 1117984 48020 4024 S 37.5  0.6 0:01.36 glusterfsd
Tue Dec 27 12:53:45 IST 2016 ============> started file1 create
28670 root 20  0 1117984 48272 4164 S 43.8  0.6 0:11.84 glusterfsd
Tue Dec 27 12:54:15 IST 2016 ===> killing b1 after about 3 GB created
28670 root 20  0 1250084 48520 4260 S 37.5  0.6 2:44.61 glusterfsd
Tue Dec 27 13:01:48 IST 2016
28670 root 20  0 1250084 48784 4264 S 56.2  0.6 3:00.91 glusterfsd
Tue Dec 27 13:02:18 IST 2016
28670 root 20  0 1250084 48784 4264 S 62.5  0.6 3:19.55 glusterfsd
28670 root 20  0 1250084 48784 4264 S 56.2  0.6 7:58.98 glusterfsd
Tue Dec 27 13:10:21 IST 2016
28670 root 20  0 1250084 48784 4264 S 52.9  0.6 8:17.68 glusterfsd
Tue Dec 27 13:10:51 IST 2016 ===> no memory leak seen even after 6 GB completed, hence restarting the brick
28670 root 20  0 1250084 48636 4116 S 62.5  0.6 8:36.46 glusterfsd
Tue Dec 27 13:11:21 IST 2016 ====> started vol using force
28670 root 20  0 1316648 48848 4132 S 61.1  0.6 8:59.89 glusterfsd
Tue Dec 27 13:11:52 IST 2016
28670 root 20  0 1316648 48848 4132 S 25.0  0.6 9:20.06 glusterfsd
Tue Dec 27 13:12:22 IST 2016
28670 root 20  0 1316648 49072 4152 S 37.5  0.6 9:33.38 glusterfsd
Tue Dec 27 13:12:52 IST 2016
```

So, as noted above, there was minimal memory consumption compared to before the fix; moving the test case to pass, i.e. the BZ to VERIFIED.
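For completeness, a rough sketch of the scenario used for this validation, assembled from the setup described above (volume "comp" with the brick paths and hosts from the status output below); treat the mount point, dd parameters, and example PID as illustrative assumptions, not exact commands from the run:

```bash
#!/bin/bash
# Sketch of the validation scenario: a 1x2 replica volume with compound
# FOPs enabled, a 10 GB dd from the client, a brick kill at ~3 GB, then a
# brick restart via 'start force'. Hosts/paths mirror this bug's setup.

# On a server node: create and start the volume, enable compound FOPs.
gluster volume create comp replica 2 \
    10.70.35.37:/rhs/brick4/comp 10.70.35.116:/rhs/brick4/comp
gluster volume start comp
gluster volume set comp cluster.use-compound-fops on

# On the client: mount the volume and start creating a 10 GB file
# (/mnt/comp is an assumed mount point).
mount -t glusterfs 10.70.35.37:/comp /mnt/comp
mkdir -p /mnt/comp/dir1
dd if=/dev/zero of=/mnt/comp/dir1/file1 bs=1M count=10240 &

# On the b1 server, once ~3 GB is written: kill that brick's glusterfsd
# (PID taken from 'gluster volume status comp'; 23814 is the example here).
kill -9 23814

# Later: bring the brick back and watch self-heal drain the pending entry.
gluster volume start comp force
gluster volume heal comp info
```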
Heal and volume state at the time of verification:

```
[root@dhcp35-37 ~]# gluster v heal comp info
Brick 10.70.35.37:/rhs/brick4/comp
Status: Connected
Number of entries: 0

Brick 10.70.35.116:/rhs/brick4/comp
/dir1/file1
Status: Connected
Number of entries: 1

[root@dhcp35-37 ~]# gluster v heal comp info
Brick 10.70.35.37:/rhs/brick4/comp
Status: Connected
Number of entries: 0

Brick 10.70.35.116:/rhs/brick4/comp
Status: Connected
Number of entries: 0

[root@dhcp35-37 ~]# gluster v info comp
Volume Name: comp
Type: Replicate
Volume ID: 1dae5832-1625-4287-b478-3be79be62d68
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick4/comp
Brick2: 10.70.35.116:/rhs/brick4/comp
Options Reconfigured:
cluster.use-compound-fops: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

[root@dhcp35-37 ~]# gluster v status comp
Status of volume: comp
Gluster process                                        TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick4/comp                     49155     0          Y       23814
Brick 10.70.35.116:/rhs/brick4/comp                    49155     0          Y       28670
Self-heal Daemon on localhost                          N/A       N/A        Y       23834
Self-heal Daemon on dhcp35-239.lab.eng.blr.redhat.com  N/A       N/A        Y       17583
Self-heal Daemon on dhcp35-196.lab.eng.blr.redhat.com  N/A       N/A        Y       22197
Self-heal Daemon on dhcp35-116.lab.eng.blr.redhat.com  N/A       N/A        Y       20028
Self-heal Daemon on dhcp35-135.lab.eng.blr.redhat.com  N/A       N/A        Y       16450
Self-heal Daemon on dhcp35-8.lab.eng.blr.redhat.com    N/A       N/A        Y       16470

Task Status of Volume comp
----------------------------------------------------------------------------------------
There are no active volume tasks
```

Refer to bug 1398315 (https://bugzilla.redhat.com/show_bug.cgi?id=1398315) for the validation log.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html