Bug 1381140

Summary: OOM kill of glusterfs fuse mount process seen on both the clients with one doing rename and the other doing delete of same files
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: readdir-ahead
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, nchilaka, rgowdapp, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-10
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-23 06:07:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351528

Description Nag Pavan Chilakam 2016-10-03 08:49:55 UTC
Description of problem:
======================
In my systemic setup https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885

I created about 20 million files.

About 3 days later, I started deleting the files from one client while, from another client, renaming the same files.

On both clients, the glusterfs fuse mount process was OOM-killed.

Version-Release number of selected component (if applicable):
===========================================================
[root@dhcp35-153 ~]# rpm -qa|grep gluster
glusterfs-3.8.4-1.el7rhgs.x86_64
glusterfs-server-3.8.4-1.el7rhgs.x86_64
glusterfs-api-3.8.4-1.el7rhgs.x86_64
python-gluster-3.8.4-1.el7rhgs.noarch
glusterfs-events-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-1.el7rhgs.x86_64
glusterfs-rdma-3.8.4-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-1.el7rhgs.x86_64
glusterfs-devel-3.8.4-1.el7rhgs.x86_64



Steps to Reproduce:
===================
1. Created 20 million 10 KB files (a sketch of how such a data set might be generated follows the client commands below).
2. After about 3 days, did the following from different clients:
Client1:
=======
cd /mnt/distrepvol/rootdir1/dhcp35-153.lab.eng.blr.redhat.com/dd
 for i in {1..20000000};do mv -f tinyfile.$i renamed_tinyfile.35_153.$i |& tee -a error_while_rename.log;done

Client2:
=======
cd /mnt/distrepvol/rootdir1/dhcp35-153.lab.eng.blr.redhat.com/dd
rm -rf tinyf* 2>>error_while_delete.log

Client3: (octal dump of the tiny files)
=======
[root@dhcp35-79 ~]# cd /mnt/distrepvol-new/rootdir1/dhcp35-153.lab.eng.blr.redhat.com/dd
[root@dhcp35-79 dd]# for i in {20000000..1};do od tinyfile.$i 2>>odproblem.log;echo "####### this is loop $i" >>odproblem.log;done

However, on client3 I didn't see any OOM kill; instead I saw the error message below in the screen session, so I guess the read through od (octal dump) itself failed to trigger it:
[root@dhcp35-79 dd]# for i in {20000000..1};do od tinyfile.$i 2>>odproblem.log;echo "####### this is loop $i" >>odproblem.log;done
bash: fork: Cannot allocate memory
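
A minimal sketch of how the 10 KB data set from step 1 might have been generated (the use of dd here is an assumption; only the file names, size, and path come from the commands above):

cd /mnt/distrepvol/rootdir1/dhcp35-153.lab.eng.blr.redhat.com/dd
# create 20 million files of 10 KB each, named tinyfile.1 .. tinyfile.20000000
for i in {1..20000000}; do dd if=/dev/urandom of=tinyfile.$i bs=10k count=1 2>/dev/null; done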





Client1 (where I was doing the renames): memory consumption of the fuse mount process spiked from 19% to 90% in about 42 minutes (the top log collection runs every minute).
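
For reference, a minimal sketch of such a per-minute top capture (the interval, log file name, and use of batch mode are assumptions, not the exact collection script used):

# append one batch-mode top snapshot to the log every minute
while true; do
    top -b -n 1 >> top_30sep.log
    sleep 60
done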

[root@dhcp35-86 profiling]# cat top_30sep.log |grep glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND  
14321 root      20   0 2489036 1.512g   4376 S   0.0 19.8 542:00.09 glusterfs
14321 root      20   0 2489036 1.522g   4376 S  18.8 19.9 542:01.80 glusterfs
14321 root      20   0 2751180 1.769g   4376 S   0.0 23.1 542:09.51 glusterfs
14321 root      20   0 3473104 2.426g   4376 S   6.2 31.7 542:21.95 glusterfs
14321 root      20   0 4062928 3.021g   4376 S   0.0 39.5 542:28.81 glusterfs
14321 root      20   0 4325072 3.288g   4376 S   0.0 43.0 542:32.60 glusterfs
14321 root      20   0 4783824 3.725g   4380 S   0.0 48.7 542:38.87 glusterfs
14321 root      20   0 5045968 3.908g   4380 S   0.0 51.1 542:41.49 glusterfs
14321 root      20   0 5177040 4.039g   4380 S   0.0 52.8 542:43.58 glusterfs
14321 root      20   0 5308112 4.207g   4380 S   0.0 55.0 542:46.16 glusterfs
14321 root      20   0 5439184 4.381g   4380 S   0.0 57.3 542:49.14 glusterfs
14321 root      20   0 5635792 4.514g   4380 S   0.0 59.0 542:51.46 glusterfs
14321 root      20   0 5701328 4.576g   4380 S   0.0 59.8 543:29.11 glusterfs
14321 root      20   0 5963472 4.798g   4380 S   0.0 62.8 543:32.48 glusterfs
14321 root      20   0 6160080 5.005g   4380 S  50.0 65.5 543:41.55 glusterfs
14321 root      20   0 6553296 5.380g   4100 S   0.0 70.4 543:58.35 glusterfs
14321 root      20   0 6880976 5.698g   2968 S   0.0 74.5 544:04.79 glusterfs
14321 root      20   0 7208656 5.917g   2792 S   0.0 77.4 544:13.45 glusterfs
14321 root      20   0 7470800 6.150g   2816 S   0.0 80.4 544:17.48 glusterfs
14321 root      20   0 7798480 6.149g   2076 S  55.0 80.4 544:30.85 glusterfs
14321 root      20   0 8126160 6.316g   2272 S  18.8 82.6 544:37.36 glusterfs
14321 root      20   0 8715984 6.442g   2044 S  11.8 84.3 544:47.85 glusterfs
14321 root      20   0 9109200 6.500g   1896 S  12.5 85.0 544:57.58 glusterfs
14321 root      20   0 9240272 6.498g   1924 S   0.0 85.0 545:00.45 glusterfs
14321 root      20   0 9305808 6.501g   2072 S   6.2 85.0 545:02.26 glusterfs
14321 root      20   0 9371344 6.492g   2184 S   0.0 84.9 545:03.74 glusterfs
14321 root      20   0 9436880 6.515g   2292 S   0.0 85.2 545:05.74 glusterfs
14321 root      20   0 9567952 6.485g   1932 S   0.0 84.8 545:08.03 glusterfs
14321 root      20   0 9699024 6.475g   2280 S   0.0 84.7 545:10.66 glusterfs
14321 root      20   0 9764560 6.544g   2292 S   0.0 85.6 545:12.77 glusterfs
14321 root      20   0 9830096 6.557g   2292 S   0.0 85.8 545:14.95 glusterfs
14321 root      20   0 9895632 6.561g   2308 S   0.0 85.8 545:16.75 glusterfs
14321 root      20   0 9791.7m 6.560g   2308 S   0.0 85.8 545:18.71 glusterfs
14321 root      20   0 9791.7m 6.554g   1828 S   0.0 85.7 545:20.90 glusterfs
14321 root      20   0 9919.7m 6.547g   2088 S   0.0 85.6 545:23.59 glusterfs
14321 root      20   0  9.812g 6.536g   2060 S  11.8 85.5 545:27.10 glusterfs
14321 root      20   0  9.937g 6.545g   2284 S   0.0 85.6 545:29.84 glusterfs
14321 root      20   0 10.062g 6.613g   2252 S   0.0 86.5 545:32.98 glusterfs
14321 root      20   0 10.187g 6.667g   2284 S   0.0 87.2 545:36.29 glusterfs
14321 root      20   0 10.312g 6.649g   2284 S   0.0 87.0 545:38.25 glusterfs
14321 root      20   0 10.375g 6.688g   2284 S   0.0 87.5 545:40.83 glusterfs
14321 root      20   0 10.500g 6.693g   2284 S   0.0 87.5 545:43.92 glusterfs




Client2 top log collection:

[root@dhcp35-86 dhcp35-153.lab.eng.blr.redhat.com]# cat top_30sep_35_153.log |grep gluster
21124 root      20   0 4921808 3.606g   2336 S   0.0 47.2 337:32.45 glusterfs
21124 root      20   0 4921808 3.607g   2504 S   0.0 47.2 337:32.61 glusterfs
21124 root      20   0 4921808 3.607g   2732 S   0.0 47.2 337:32.75 glusterfs
21124 root      20   0 4921808 3.482g   2408 S   0.0 45.5 337:50.69 glusterfs
21124 root      20   0 4921808 3.482g   2412 S   0.0 45.5 337:51.51 glusterfs
21124 root      20   0 4921808 3.482g   2412 S   0.0 45.5 337:52.77 glusterfs
21124 root      20   0 4921808 3.395g   2072 S   5.6 44.4 337:54.67 glusterfs
21124 root      20   0 4921808 3.402g   2092 S   0.0 44.5 337:55.79 glusterfs
21124 root      20   0 4921808 3.437g   2108 S   0.0 45.0 337:57.04 glusterfs
21124 root      20   0 4921808 3.474g   2108 S   0.0 45.4 337:58.21 glusterfs
21124 root      20   0 4921808 3.499g   2108 S   0.0 45.8 337:59.48 glusterfs
21124 root      20   0 4921808 3.499g   2108 S   0.0 45.8 338:00.30 glusterfs
21124 root      20   0 4921808 3.502g   2140 S   0.0 45.8 338:01.47 glusterfs
21124 root      20   0 4921808 3.080g   1872 S   5.9 40.3 338:08.16 glusterfs
21124 root      20   0 4921808 3.062g   1936 S  18.8 40.0 338:11.24 glusterfs
21124 root      20   0 4921808 3.083g   2008 S   0.0 40.3 338:13.29 glusterfs
21124 root      20   0 4921808 3.173g   2008 S   0.0 41.5 338:16.01 glusterfs
21124 root      20   0 4921808 3.167g   2008 S   0.0 41.4 338:17.05 glusterfs
21124 root      20   0 4921808 3.171g   2008 S   0.0 41.5 338:17.80 glusterfs
21124 root      20   0 4987344 3.199g   2008 S   6.2 41.8 338:18.99 glusterfs
21124 root      20   0 5052880 3.256g   2008 S   0.0 42.6 338:20.19 glusterfs
21124 root      20   0 5118416 3.311g   2008 S  31.2 43.3 338:21.25 glusterfs
21124 root      20   0 5183952 3.446g   2008 S   6.2 45.1 338:22.72 glusterfs
21124 root      20   0 5380560 3.463g   2192 S   5.6 45.3 338:23.71 glusterfs
21124 root      20   0 5642704 3.694g   2000 S  23.5 48.3 338:25.25 glusterfs
21124 root      20   0 5970384 3.992g   1792 S   0.0 52.2 338:27.18 glusterfs
21124 root      20   0 6298064 4.330g   2000 S   0.0 56.6 338:29.06 glusterfs
21124 root      20   0 6625744 4.573g   1908 S   0.0 59.8 338:30.61 glusterfs
21124 root      20   0 6887888 4.778g   2052 S   0.0 62.5 338:31.97 glusterfs
21124 root      20   0 7346640 5.197g   2008 S   0.0 68.0 338:34.22 glusterfs
21124 root      20   0 7608784 5.467g   2012 S   6.2 71.5 338:35.50 glusterfs
21124 root      20   0 8133072 5.817g   1808 S   0.0 76.1 338:37.83 glusterfs



As for statedumps, I have been taking them every 30 minutes; let me know if you need them and I can point you to the machines (the clients themselves).
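
For reference, a minimal sketch of how a statedump of the fuse client can be triggered (the PID placeholder is illustrative; glusterfs processes write a statedump on SIGUSR1, typically under /var/run/gluster):

# find the fuse mount process and ask it to write a statedump
pidof glusterfs
kill -USR1 <pid-of-glusterfs-fuse-mount>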

Comment 2 Nag Pavan Chilakam 2016-10-03 09:01:38 UTC
This can easily be hit by a customer; just consider an admin wanting to delete, in one shot, a directory that contains a huge number of files.

Comment 3 Nag Pavan Chilakam 2016-10-03 10:39:04 UTC
Logs available at rhsqe-repo.lab.eng.blr.redhat.com: /home/repo/sosreports/nchilaka/bug.1381140

Comment 4 Raghavendra G 2016-10-04 06:02:58 UTC
Is it possible to rerun the test with glusterfs mounted with the following options?
1. --entry-timeout=0
2. --attribute-timeout=0

I just want to see whether a large number of inodes in the itable is the culprit.

Comment 5 Raghavendra G 2016-10-04 06:09:22 UTC
(In reply to Raghavendra G from comment #4)
> Is it possible to rerun the test when glusterfs is mounted with following
> options?
> 1. --entry-timeout=0
> 2. --attribute-timeout=0
> 
> I just want to see whether large number of inodes in itable is the culprit.

Another thing to try is to disable readdirp (along with the above options). To summarize:
1. mount glusterfs with --use-readdirp=no --entry-timeout=0 --attribute-timeout=0
2. gluster volume set <volname> performance.force-readdirp off
3. gluster volume set <volname> dht.force-readdirp off

regards,
Raghavendra
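
For illustration, the mount in step 1 above might look like the following (server name, volume name, and mount point are placeholders, not taken from this setup):

# mount the volume with readdirp disabled and zero entry/attribute timeouts
glusterfs --volfile-server=<server> --volfile-id=<volname> \
          --use-readdirp=no --entry-timeout=0 --attribute-timeout=0 /mnt/<mountpoint>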

Comment 7 Raghavendra G 2016-11-23 04:10:48 UTC
Is it possible to provide statedumps associated with this bz?

Comment 9 Nag Pavan Chilakam 2016-11-30 12:03:36 UTC
As discussed in the triage meeting, raised the following bugs:
 1400067 - OOM kill of glusterfs fuse mount process seen on client where i was doing rename
 1400071 - OOM kill of glusterfs fuse mount process seen on client where i was doing deletes 

Also, marked these two bugs with the internal whiteboard as "3.2.0-beyond".

Comment 12 Atin Mukherjee 2016-12-15 05:45:24 UTC
An upstream patch has been posted for review: http://review.gluster.org/#/c/16137/

Comment 13 Atin Mukherjee 2016-12-22 11:51:25 UTC
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93587

Comment 15 Nag Pavan Chilakam 2017-01-03 08:14:51 UTC
I am blocked on validation of this bug due to bug 1409472 (brick crashed on systemic setup).

Comment 16 Nag Pavan Chilakam 2017-02-20 14:36:26 UTC
Hi Raghavendra,
As part of validation, I am incorporating the steps I mentioned while raising the bug to validate this bz.
However, in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c9 we mentioned breaking this bz into 3 different BZs, including this one.
So how should I go about testing this?
Shall I still proceed with what I mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c0,
or do you want me to test only a part of the test scenario?

Also, if you have any other ideas you would like me to incorporate to test this, kindly let me know.


thanks,
nagpavan

Comment 17 Raghavendra G 2017-02-21 16:38:58 UTC
(In reply to nchilaka from comment #16)
> Hi Raghavendra,
> As part of validation, I am incorporating the steps I mentioned while
> raising the bug itself to validate this bz
> However, in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c9 , we
> mentioned about breaking this bz into 3 different BZs including this one.
> So how should i go about testing this.
> Shall I still proceed with what i mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c0
> or you want me to test only a part of the test scenario.

We need to test only the directory traversal part; no need for rm and mv. Please use the same data set and just do find <glusterfs-mount> or ls -lR <glusterfs-mount> with readdir-ahead enabled.

> 
> Also , if you have any other idea for me to incorporate to test this, kindly
> let me know
> 
> 
> thanks,
> nagpavan
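
For reference, a minimal sketch of how readdir-ahead can be checked and enabled for this test (the volume name is a placeholder; readdir-ahead is enabled by default):

gluster volume get <volname> performance.readdir-ahead      # confirm the current value
gluster volume set <volname> performance.readdir-ahead on   # enable it if it was turned off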

Comment 18 Nag Pavan Chilakam 2017-02-23 13:04:39 UTC
Validation:
Based on comment #17:
Setup1: created a 4x2 volume (default settings, meaning readdir-ahead enabled)
and created about 10 million files, then did the below.

I did a lookup of the whole filesystem from two different clients, using find * on c1 and ls -lRt on c2.
I did this in a loop of 5 times.
Didn't see any fuse OOM kill.
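
A minimal sketch of the lookups described above (the mount point path is assumed):

# client c1: find-based lookup of the whole filesystem, repeated 5 times
cd /mnt/distrepvol && for i in {1..5}; do find * > /dev/null; done

# client c2: recursive ls-based lookup, repeated 5 times
cd /mnt/distrepvol && for i in {1..5}; do ls -lRt > /dev/null; done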


Setup2: on my systemic setup, did a lookup from two different clients.
Didn't hit any OOM kill.

Hence moving to verified.

3.8.4-14

Comment 20 errata-xmlrpc 2017-03-23 06:07:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html