Bug 1398315 - [compound FOPs]: Memory leak while doing FOPs with brick down
Summary: [compound FOPs]: Memory leak while doing FOPs with brick down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Krutika Dhananjay
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks: 1351528 1399578 1399891
 
Reported: 2016-11-24 12:37 UTC by Nag Pavan Chilakam
Modified: 2017-03-23 05:51 UTC (History)
CC List: 7 users

Fixed In Version: glusterfs-3.8.4-10
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1399578 (view as bug list)
Environment:
Last Closed: 2017-03-23 05:51:04 UTC
Embargoed:


Attachments (Terms of Use)
on_qa top output validation log server and client (124.24 KB, text/plain)
2016-12-27 09:21 UTC, Nag Pavan Chilakam


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Nag Pavan Chilakam 2016-11-24 12:37:15 UTC
Description of problem:
======================
Firstly, I have raised bug 1398311 - [compound FOPs]: in replica pair one brick is down the other Brick process and fuse client process consume high memory at a increasing pace.
Both bugs are hit with the same procedure.
However, that bug was raised for the issue of "MEMORY CONSUMPTION INCREASING CONSISTENTLY".

I am raising this bug because even after the file is completely written to the brick (with one brick down), the memory is not released. Hence there is a very high chance of a memory leak. This is seen in both the brick process and the fuse client.

Fuse client: I checked at 10-minute intervals after the write completed and saw no change in the memory consumed:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2951 root      20   0 86.157g 0.014t      0 S   0.3 93.3  27:39.73 glusterfs

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2951 root      20   0 86.157g 0.014t      0 S   2.0 93.3  27:43.90 glusterfs



The same is seen with the brick process:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1431 root      20   0  608672  24324   4256 S   0.0  0.3   0:01.99 glusterd
 3914 root      20   0 4461344 3.097g   4348 S   0.0 40.5  15:00.45 glusterfsd
 3937 root      20   0  672724  31104   3092 S   0.0  0.4   0:01.82 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1431 root      20   0  608672  24324   4256 S   0.0  0.3   0:02.00 glusterd
 3914 root      20   0 4461344 3.097g   4348 S   0.0 40.5  15:00.45 glusterfsd
 3937 root      20   0  672724  31104   3092 S   0.0  0.4   0:01.84 glusterfs

Version-Release number of selected component (if applicable):
==========
3.8.4-5

Steps to Reproduce:
1. Create a 1x2 (replica 2) volume.
2. Enable compound FOPs and fuse-mount the volume on a client.
3. Keep track of the memory consumption of both the brick processes and the client process.
4. Create a 10 GB file with dd.
5. After about 5 GB has been written, bring down one brick.

After the file is completely written, note down the memory consumed by the brick process and the fuse client.

Then leave the setup idle and check again after 15 minutes:
no memory is freed. (A rough sketch of the commands is given below.)
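
For reference, a rough sketch of the commands behind these steps (the host names, brick paths, volume name and mount point are placeholders, not the ones from this setup; the compound-FOPs option name matches the one shown in 'gluster v info' in comment 17):

# on the servers
gluster volume create repvol replica 2 server1:/rhs/brick1/repvol server2:/rhs/brick1/repvol
gluster volume start repvol
gluster volume set repvol cluster.use-compound-fops on

# on the client
mount -t glusterfs server1:/repvol /mnt/repvol
dd if=/dev/zero of=/mnt/repvol/file1 bs=1M count=10240 &

# after ~5 GB is written, get the brick PID from 'gluster volume status repvol'
# and kill that glusterfsd process; the brick can later be brought back with
# 'gluster volume start repvol force'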


Note: I would like to track them as two different issues. However, if the RCA finds that the root cause is the same, we can go ahead and dup one of them to the other.

Comment 2 Nag Pavan Chilakam 2016-11-24 12:48:52 UTC
and here comes the OOM Kill :)
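(The trace below is from the kernel ring buffer on the client. Something along the lines of the following command, an assumption rather than the exact command used here, pulls such a trace out:)

dmesg -T | grep -i -A 120 'invoked oom-killer'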
[Thu Nov 24 18:13:50 2016] glusterfs invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[Thu Nov 24 18:13:50 2016] glusterfs cpuset=/ mems_allowed=0-1
[Thu Nov 24 18:13:50 2016] CPU: 3 PID: 2953 Comm: glusterfs Not tainted 3.10.0-510.el7.x86_64 #1
[Thu Nov 24 18:13:50 2016] Hardware name: Supermicro X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 1.0b 05/29/2012
[Thu Nov 24 18:13:50 2016]  ffff880475a93ec0 000000002e7f78c3 ffff88046bb23990 ffffffff81685ccc
[Thu Nov 24 18:13:50 2016]  ffff88046bb23a20 ffffffff81680c77 ffffffff812ae65b ffff880476e27d00
[Thu Nov 24 18:13:50 2016]  ffff880476e27d18 ffffffff00000202 fffeefff00000000 0000000000000001
[Thu Nov 24 18:13:50 2016] Call Trace:
[Thu Nov 24 18:13:50 2016]  [<ffffffff81685ccc>] dump_stack+0x19/0x1b
[Thu Nov 24 18:13:50 2016]  [<ffffffff81680c77>] dump_header+0x8e/0x225
[Thu Nov 24 18:13:50 2016]  [<ffffffff812ae65b>] ? cred_has_capability+0x6b/0x120
[Thu Nov 24 18:13:50 2016]  [<ffffffff8113cb03>] ? delayacct_end+0x33/0xb0
[Thu Nov 24 18:13:50 2016]  [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81184e46>] out_of_memory+0x4b6/0x4f0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81681780>] __alloc_pages_slowpath+0x5d7/0x725
[Thu Nov 24 18:13:50 2016]  [<ffffffff8118af55>] __alloc_pages_nodemask+0x405/0x420
[Thu Nov 24 18:13:50 2016]  [<ffffffff811d209a>] alloc_pages_vma+0x9a/0x150
[Thu Nov 24 18:13:50 2016]  [<ffffffff811c2e8b>] read_swap_cache_async+0xeb/0x160
[Thu Nov 24 18:13:50 2016]  [<ffffffff811c2fa8>] swapin_readahead+0xa8/0x110
[Thu Nov 24 18:13:50 2016]  [<ffffffff811b120c>] handle_mm_fault+0xb1c/0xfe0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81691794>] __do_page_fault+0x154/0x450
[Thu Nov 24 18:13:50 2016]  [<ffffffff81691ac5>] do_page_fault+0x35/0x90
[Thu Nov 24 18:13:50 2016]  [<ffffffff8168dfc0>] ? bstep_iret+0xf/0xf
[Thu Nov 24 18:13:50 2016]  [<ffffffff8168dd88>] page_fault+0x28/0x30
[Thu Nov 24 18:13:50 2016] Mem-Info:
[Thu Nov 24 18:13:50 2016] active_anon:3322839 inactive_anon:510929 isolated_anon:0
 active_file:174 inactive_file:754 isolated_file:0
 unevictable:0 dirty:0 writeback:136 unstable:0
 slab_reclaimable:11575 slab_unreclaimable:22836
 mapped:291 shmem:742 pagetables:45178 bounce:0
 free:32274 free_pcp:30 free_cma:0
[Thu Nov 24 18:13:50 2016] Node 0 DMA free:15848kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15932kB managed:15848kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 1763 7777 7777
[Thu Nov 24 18:13:50 2016] Node 0 DMA32 free:33960kB min:10020kB low:12524kB high:15028kB active_anon:1239140kB inactive_anon:445404kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052108kB managed:1807368kB mlocked:0kB dirty:0kB writeback:0kB mapped:612kB shmem:608kB slab_reclaimable:1464kB slab_unreclaimable:8624kB kernel_stack:336kB pagetables:3624kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 6014 6014
[Thu Nov 24 18:13:50 2016] Node 0 Normal free:34060kB min:34180kB low:42724kB high:51268kB active_anon:4993472kB inactive_anon:713816kB active_file:632kB inactive_file:3244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:6291456kB managed:6158340kB mlocked:0kB dirty:0kB writeback:344kB mapped:456kB shmem:2316kB slab_reclaimable:13080kB slab_unreclaimable:44936kB kernel_stack:3136kB pagetables:50900kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:20840 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 1 Normal free:45228kB min:45820kB low:57272kB high:68728kB active_anon:7058744kB inactive_anon:884496kB active_file:64kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:8388608kB managed:8255248kB mlocked:0kB dirty:0kB writeback:200kB mapped:96kB shmem:44kB slab_reclaimable:31756kB slab_unreclaimable:37784kB kernel_stack:2400kB pagetables:126188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8515 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15848kB
[Thu Nov 24 18:13:50 2016] Node 0 DMA32: 1013*4kB (UM) 688*8kB (UEM) 266*16kB (UEM) 54*32kB (UEM) 16*64kB (UEM) 6*128kB (EM) 7*256kB (EM) 5*512kB (M) 8*1024kB (UEM) 2*2048kB (M) 0*4096kB = 33972kB
[Thu Nov 24 18:13:50 2016] Node 0 Normal: 206*4kB (UEM) 158*8kB (UEM) 87*16kB (UEM) 59*32kB (UEM) 87*64kB (UEM) 38*128kB (UM) 24*256kB (UE) 6*512kB (UEM) 10*1024kB (M) 0*2048kB 0*4096kB = 35256kB
[Thu Nov 24 18:13:50 2016] Node 1 Normal: 164*4kB (UEM) 114*8kB (UEM) 68*16kB (UEM) 47*32kB (UEM) 17*64kB (UEM) 6*128kB (UM) 11*256kB (UEM) 35*512kB (UM) 19*1024kB (UM) 0*2048kB 0*4096kB = 46208kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] 28778 total pagecache pages
[Thu Nov 24 18:13:50 2016] 27044 pages in swap cache
[Thu Nov 24 18:13:50 2016] Swap cache stats: add 2112210, delete 2085166, find 18057/22414
[Thu Nov 24 18:13:50 2016] Free swap  = 0kB
[Thu Nov 24 18:13:50 2016] Total swap = 8257532kB
[Thu Nov 24 18:13:50 2016] 4187026 pages RAM
[Thu Nov 24 18:13:50 2016] 0 pages HighMem/MovableOnly
[Thu Nov 24 18:13:50 2016] 127825 pages reserved
[Thu Nov 24 18:13:50 2016] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[Thu Nov 24 18:13:50 2016] [  731]     0   731     9204      172      21       49             0 systemd-journal
[Thu Nov 24 18:13:50 2016] [  752]     0   752    67411        0      34      608             0 lvmetad
[Thu Nov 24 18:13:50 2016] [  768]     0   768    11319        1      23      245         -1000 systemd-udevd
[Thu Nov 24 18:13:50 2016] [ 1091]     0  1091    13854       23      28       87         -1000 auditd
[Thu Nov 24 18:13:50 2016] [ 1113]     0  1113     4860       81      14       38             0 irqbalance
[Thu Nov 24 18:13:50 2016] [ 1116]    81  1116     8207       95      17       52          -900 dbus-daemon
[Thu Nov 24 18:13:50 2016] [ 1119]   997  1119    28962       47      26       50             0 chronyd
[Thu Nov 24 18:13:50 2016] [ 1127]   998  1127   132067       81      55     1658             0 polkitd
[Thu Nov 24 18:13:50 2016] [ 1128]     0  1128     6048       43      16       30             0 systemd-logind
[Thu Nov 24 18:13:50 2016] [ 1131]     0  1131    31556       26      19      130             0 crond
[Thu Nov 24 18:13:50 2016] [ 1141]     0  1141    81800      261      82     4781             0 firewalld
[Thu Nov 24 18:13:50 2016] [ 1148]     0  1148    27509        1      10       31             0 agetty
[Thu Nov 24 18:13:50 2016] [ 1150]     0  1150   109534      294      68      345             0 NetworkManager
[Thu Nov 24 18:13:50 2016] [ 1250]     0  1250    28206        1      55     3122             0 dhclient
[Thu Nov 24 18:13:50 2016] [ 1508]     0  1508    54944      164      38      135             0 rsyslogd
[Thu Nov 24 18:13:50 2016] [ 1511]     0  1511   138288       91      89     2576             0 tuned
[Thu Nov 24 18:13:50 2016] [ 1516]     0  1516    28335        1      11       38             0 rhsmcertd
[Thu Nov 24 18:13:50 2016] [ 1538]     0  1538    20617       25      42      189         -1000 sshd
[Thu Nov 24 18:13:50 2016] [ 1552]     0  1552    26971        0       9       24             0 rhnsd
[Thu Nov 24 18:13:50 2016] [ 2331]     0  2331    22244       16      41      239             0 master
[Thu Nov 24 18:13:50 2016] [ 2363]    89  2363    22270       15      44      235             0 pickup
[Thu Nov 24 18:13:50 2016] [ 2365]    89  2365    22287       14      44      236             0 qmgr
[Thu Nov 24 18:13:50 2016] [ 2869]     0  2869    35726       28      71      291             0 sshd
[Thu Nov 24 18:13:50 2016] [ 2873]     0  2873    29316       81      15      492             0 bash
[Thu Nov 24 18:13:50 2016] [ 2951]     0  2951 22585439  3780372   43885  2041242             0 glusterfs
[Thu Nov 24 18:13:50 2016] [ 2969]     0  2969    35726       26      68      291             0 sshd
[Thu Nov 24 18:13:50 2016] [ 2973]     0  2973    28846       72      14       39             0 bash
[Thu Nov 24 18:13:50 2016] [ 2998]     0  2998    31927       68      17       70             0 screen
[Thu Nov 24 18:13:50 2016] [ 2999]     0  2999    38218     4753      32     4734             0 bash
[Thu Nov 24 18:13:50 2016] [ 3674]     0  3674    35726      316      72        0             0 sshd
[Thu Nov 24 18:13:50 2016] [ 3678]     0  3678    28846      109      12        0             0 bash
[Thu Nov 24 18:13:50 2016] [ 3815]     0  3815   130941    18547     177        0             0 yum
[Thu Nov 24 18:13:50 2016] Out of memory: Kill process 2951 (glusterfs) score 929 or sacrifice child
[Thu Nov 24 18:13:50 2016] Killed process 2951 (glusterfs) total-vm:90341756kB, anon-rss:15121488kB, file-rss:0kB, shmem-rss:0kB

Comment 4 Nag Pavan Chilakam 2016-11-29 06:56:18 UTC
hit this while validating RFE 1360978 - [RFE]Reducing number of network round trips

Comment 5 surabhi 2016-11-29 08:45:04 UTC
As discussed in the bug triage meeting, providing qa_ack.

Comment 6 Krutika Dhananjay 2016-11-29 10:44:07 UTC
http://review.gluster.org/#/c/15965/ <--- patch posted in upstream master.

Moving this bug to POST state.

-Krutika

Comment 11 Nag Pavan Chilakam 2016-12-19 10:04:50 UTC
I am failing this bug as the memory is not released even after the FOPs are completed.
Refer to the BZ below for more details on the validation:

1398311 - [compound FOPs]:in replica pair one brick is down the other Brick process and fuse client process consume high memory at a increasing pace

Comment 12 Atin Mukherjee 2016-12-20 07:42:47 UTC
upstream mainline patch http://review.gluster.org/#/c/16210/ posted for review.

Comment 13 Atin Mukherjee 2016-12-21 10:04:30 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/93488/

Comment 14 Milind Changire 2016-12-22 18:47:32 UTC
BZ added to erratum https://errata.devel.redhat.com/advisory/24866
Moving to ON_QA

Comment 15 Nag Pavan Chilakam 2016-12-27 09:16:46 UTC
on_qa validation:
=================

there was minimal mem consumption when compared to before fix
Hence moving the testcase to pass, ie the bz to verified

had a 1x2 vol with cfops enabled
mounted vol one two client cli45(el7.3) and cli24(el6.7)
From cli45, started to create a 10gb file using dd
after about 3gb was created brought down b1

kept tracking client side and brick process memeory consumption below are the details, in a loop for every 30sec
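
A minimal sketch of such a sampling loop (the grep pattern and the log path are assumptions of mine, not taken from this report; the 30-second interval and the date-stamped top snapshots match the output below):

while true; do
    top -b -n 1 | grep -E 'PID USER|gluster'
    date
    sleep 30
done | tee -a /tmp/gluster-mem.log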
cli45:
before file create started:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  643012  41632   3196 S   0.0  0.3   0:00.05 glusterfs
after create was issued:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  643012  46764   3804 S  75.0  0.3   0:21.34 glusterfs
==> it shot up by about 5 MB,
but from there on it was constant or showed only a minimal increase.
before brick b1 was killed:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  643012  46848   3824 S  66.7  0.3   4:49.47 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  643012  46848   3824 S  73.3  0.3   5:11.78 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  643012  46848   3824 S  62.5  0.3   5:33.70 glusterfs


after b1 was killed:
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  643012  46848   3824 S  68.8  0.3   5:55.95 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  645148  47264   3936 S  68.8  0.3   6:17.59 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  645148  47268   3936 S  73.3  0.3   6:38.93 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  645148  47268   3936 S  68.8  0.3   7:00.40 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  645148  47268   3936 S  81.2  0.3   7:21.79 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  645148  47268   3936 S  75.0  0.3   7:42.81 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  645148  47328   3936 S  68.8  0.3  12:40.12 glusterfs

after b1 was brought back up using 'gluster vol start force':
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  778272  49512   4028 S  93.8  0.3  13:55.20 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  778272  49516   4032 S 106.2  0.3  14:26.07 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  778272  49516   4032 S 100.0  0.3  14:56.36 glusterfs


At the end of the file creation (complete 10 GB file):
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  776220  49512   4040 S   0.0  0.3  20:48.97 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  776220  49512   4040 S   0.0  0.3  20:48.97 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2985 root      20   0  776220  49512   4040 S   0.0  0.3  20:48.97 glusterfs




Brick process (the replica brick b2, which stayed up):

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1117984  48020   4024 S  37.5  0.6   0:01.36 glusterfsd
Tue Dec 27 12:53:45 IST 2016
============>started file1 create
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1117984  48272   4164 S  43.8  0.6   0:11.84 glusterfsd
Tue Dec 27 12:54:15 IST 2016
===> killing b1 after about 3 GB created
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1250084  48520   4260 S  37.5  0.6   2:44.61 glusterfsd
Tue Dec 27 13:01:48 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1250084  48784   4264 S  56.2  0.6   3:00.91 glusterfsd
Tue Dec 27 13:02:18 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1250084  48784   4264 S  62.5  0.6   3:19.55 glusterfsd

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1250084  48784   4264 S  56.2  0.6   7:58.98 glusterfsd
Tue Dec 27 13:10:21 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1250084  48784   4264 S  52.9  0.6   8:17.68 glusterfsd
Tue Dec 27 13:10:51 IST 2016
===> I don't see any memory leak even after 6 GB is completed; hence restarting the brick
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1250084  48636   4116 S  62.5  0.6   8:36.46 glusterfsd
Tue Dec 27 13:11:21 IST 2016
====>started vol using force
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1316648  48848   4132 S  61.1  0.6   8:59.89 glusterfsd
Tue Dec 27 13:11:52 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1316648  48848   4132 S  25.0  0.6   9:20.06 glusterfsd
Tue Dec 27 13:12:22 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28670 root      20   0 1316648  49072   4152 S  37.5  0.6   9:33.38 glusterfsd
Tue Dec 27 13:12:52 IST 2016



So there was minimal memory consumption compared to before the fix; hence moving the test case to pass, i.e. the BZ to verified.

Comment 16 Nag Pavan Chilakam 2016-12-27 09:21:50 UTC
Created attachment 1235486 [details]
on_qa top output validation log server and client

Comment 17 Nag Pavan Chilakam 2016-12-27 09:22:07 UTC
[root@dhcp35-37 ~]# gluster v heal comp info
Brick 10.70.35.37:/rhs/brick4/comp
Status: Connected
Number of entries: 0

Brick 10.70.35.116:/rhs/brick4/comp
/dir1/file1 
Status: Connected
Number of entries: 1

[root@dhcp35-37 ~]# gluster v heal comp info
Brick 10.70.35.37:/rhs/brick4/comp
Status: Connected
Number of entries: 0

Brick 10.70.35.116:/rhs/brick4/comp
Status: Connected
Number of entries: 0

[root@dhcp35-37 ~]# gluster v info comp
 
Volume Name: comp
Type: Replicate
Volume ID: 1dae5832-1625-4287-b478-3be79be62d68
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick4/comp
Brick2: 10.70.35.116:/rhs/brick4/comp
Options Reconfigured:
cluster.use-compound-fops: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@dhcp35-37 ~]# gluster v status comp
Status of volume: comp
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick4/comp          49155     0          Y       23814
Brick 10.70.35.116:/rhs/brick4/comp         49155     0          Y       28670
Self-heal Daemon on localhost               N/A       N/A        Y       23834
Self-heal Daemon on dhcp35-239.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       17583
Self-heal Daemon on dhcp35-196.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       22197
Self-heal Daemon on dhcp35-116.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       20028
Self-heal Daemon on dhcp35-135.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       16450
Self-heal Daemon on dhcp35-8.lab.eng.blr.re
dhat.com                                    N/A       N/A        Y       16470
 
Task Status of Volume comp
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-37 ~]#

Comment 19 errata-xmlrpc 2017-03-23 05:51:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

