Description of problem:
=======================
In my systemic setup https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=760435885 I created about 20 million files. About 3 days later, I tried to delete the files from one client and, from another client, tried to rename the same files. On both clients there was an OOM kill of the glusterfs fuse mount process.

Version-Release number of selected component (if applicable):
==============================================================
[root@dhcp35-153 ~]# rpm -qa|grep gluster
glusterfs-3.8.4-1.el7rhgs.x86_64
glusterfs-server-3.8.4-1.el7rhgs.x86_64
glusterfs-api-3.8.4-1.el7rhgs.x86_64
python-gluster-3.8.4-1.el7rhgs.noarch
glusterfs-events-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-1.el7rhgs.x86_64
glusterfs-rdma-3.8.4-1.el7rhgs.x86_64
glusterfs-cli-3.8.4-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-1.el7rhgs.x86_64
glusterfs-devel-3.8.4-1.el7rhgs.x86_64

Steps to Reproduce:
===================
1. Created 20 million 10 KB files.
2. After about 3 days, did the following from different clients:

Client1 (rename):
=================
cd /mnt/distrepvol/rootdir1/dhcp35-153.lab.eng.blr.redhat.com/dd
for i in {1..20000000};do mv -f tinyfile.$i renamed_tinyfile.35_153.$i |& tee -a error_while_rename.log;done

Client2 (delete):
=================
cd /mnt/distrepvol/rootdir1/dhcp35-153.lab.eng.blr.redhat.com/dd
rm -rf tinyf* 2>>error_while_delete.log

Client3 (octal dump of the tiny files):
=======================================
[root@dhcp35-79 ~]# cd /mnt/distrepvol-new/rootdir1/dhcp35-153.lab.eng.blr.redhat.com/dd
[root@dhcp35-79 dd]# for i in {20000000..1};do od tinyfile.$i 2>>odproblem.log;echo "####### this is loop $i" >>odproblem.log;done

However, on client3 I didn't see any OOM kill; instead I saw the error below in the screen session, so I guess the reads through od (octal dump) themselves failed to run:

[root@dhcp35-79 dd]# for i in {20000000..1};do od tinyfile.$i 2>>odproblem.log;echo "####### this is loop $i" >>odproblem.log;done
bash: fork: Cannot allocate memory

Client1: where I was doing the rename, the fuse mount process's memory consumption spiked from 19% to about 90% in 42 minutes (I ran the log collection every minute; see the sketch at the end of this comment):

[root@dhcp35-86 profiling]# cat top_30sep.log |grep glusterfs
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14321 root 20 0 2489036 1.512g 4376 S 0.0 19.8 542:00.09 glusterfs
14321 root 20 0 2489036 1.522g 4376 S 18.8 19.9 542:01.80 glusterfs
14321 root 20 0 2751180 1.769g 4376 S 0.0 23.1 542:09.51 glusterfs
14321 root 20 0 3473104 2.426g 4376 S 6.2 31.7 542:21.95 glusterfs
14321 root 20 0 4062928 3.021g 4376 S 0.0 39.5 542:28.81 glusterfs
14321 root 20 0 4325072 3.288g 4376 S 0.0 43.0 542:32.60 glusterfs
14321 root 20 0 4783824 3.725g 4380 S 0.0 48.7 542:38.87 glusterfs
14321 root 20 0 5045968 3.908g 4380 S 0.0 51.1 542:41.49 glusterfs
14321 root 20 0 5177040 4.039g 4380 S 0.0 52.8 542:43.58 glusterfs
14321 root 20 0 5308112 4.207g 4380 S 0.0 55.0 542:46.16 glusterfs
14321 root 20 0 5439184 4.381g 4380 S 0.0 57.3 542:49.14 glusterfs
14321 root 20 0 5635792 4.514g 4380 S 0.0 59.0 542:51.46 glusterfs
14321 root 20 0 5701328 4.576g 4380 S 0.0 59.8 543:29.11 glusterfs
14321 root 20 0 5963472 4.798g 4380 S 0.0 62.8 543:32.48 glusterfs
14321 root 20 0 6160080 5.005g 4380 S 50.0 65.5 543:41.55 glusterfs
14321 root 20 0 6553296 5.380g 4100 S 0.0 70.4 543:58.35 glusterfs
14321 root 20 0 6880976 5.698g 2968 S 0.0 74.5 544:04.79 glusterfs
14321 root 20 0 7208656 5.917g 2792 S 0.0 77.4 544:13.45 glusterfs
14321 root 20 0 7470800 6.150g 2816 S 0.0 80.4 544:17.48 glusterfs
14321 root 20 0 7798480 6.149g 2076 S 55.0 80.4 544:30.85 glusterfs
14321 root 20 0 8126160 6.316g 2272 S 18.8 82.6 544:37.36 glusterfs
14321 root 20 0 8715984 6.442g 2044 S 11.8 84.3 544:47.85 glusterfs
14321 root 20 0 9109200 6.500g 1896 S 12.5 85.0 544:57.58 glusterfs
14321 root 20 0 9240272 6.498g 1924 S 0.0 85.0 545:00.45 glusterfs
14321 root 20 0 9305808 6.501g 2072 S 6.2 85.0 545:02.26 glusterfs
14321 root 20 0 9371344 6.492g 2184 S 0.0 84.9 545:03.74 glusterfs
14321 root 20 0 9436880 6.515g 2292 S 0.0 85.2 545:05.74 glusterfs
14321 root 20 0 9567952 6.485g 1932 S 0.0 84.8 545:08.03 glusterfs
14321 root 20 0 9699024 6.475g 2280 S 0.0 84.7 545:10.66 glusterfs
14321 root 20 0 9764560 6.544g 2292 S 0.0 85.6 545:12.77 glusterfs
14321 root 20 0 9830096 6.557g 2292 S 0.0 85.8 545:14.95 glusterfs
14321 root 20 0 9895632 6.561g 2308 S 0.0 85.8 545:16.75 glusterfs
14321 root 20 0 9791.7m 6.560g 2308 S 0.0 85.8 545:18.71 glusterfs
14321 root 20 0 9791.7m 6.554g 1828 S 0.0 85.7 545:20.90 glusterfs
14321 root 20 0 9919.7m 6.547g 2088 S 0.0 85.6 545:23.59 glusterfs
14321 root 20 0 9.812g 6.536g 2060 S 11.8 85.5 545:27.10 glusterfs
14321 root 20 0 9.937g 6.545g 2284 S 0.0 85.6 545:29.84 glusterfs
14321 root 20 0 10.062g 6.613g 2252 S 0.0 86.5 545:32.98 glusterfs
14321 root 20 0 10.187g 6.667g 2284 S 0.0 87.2 545:36.29 glusterfs
14321 root 20 0 10.312g 6.649g 2284 S 0.0 87.0 545:38.25 glusterfs
14321 root 20 0 10.375g 6.688g 2284 S 0.0 87.5 545:40.83 glusterfs
14321 root 20 0 10.500g 6.693g 2284 S 0.0 87.5 545:43.92 glusterfs

Client2 top log collection:
[root@dhcp35-86 dhcp35-153.lab.eng.blr.redhat.com]# cat top_30sep_35_153.log |grep gluster
21124 root 20 0 4921808 3.606g 2336 S 0.0 47.2 337:32.45 glusterfs
21124 root 20 0 4921808 3.607g 2504 S 0.0 47.2 337:32.61 glusterfs
21124 root 20 0 4921808 3.607g 2732 S 0.0 47.2 337:32.75 glusterfs
21124 root 20 0 4921808 3.482g 2408 S 0.0 45.5 337:50.69 glusterfs
21124 root 20 0 4921808 3.482g 2412 S 0.0 45.5 337:51.51 glusterfs
21124 root 20 0 4921808 3.482g 2412 S 0.0 45.5 337:52.77 glusterfs
21124 root 20 0 4921808 3.395g 2072 S 5.6 44.4 337:54.67 glusterfs
21124 root 20 0 4921808 3.402g 2092 S 0.0 44.5 337:55.79 glusterfs
21124 root 20 0 4921808 3.437g 2108 S 0.0 45.0 337:57.04 glusterfs
21124 root 20 0 4921808 3.474g 2108 S 0.0 45.4 337:58.21 glusterfs
21124 root 20 0 4921808 3.499g 2108 S 0.0 45.8 337:59.48 glusterfs
21124 root 20 0 4921808 3.499g 2108 S 0.0 45.8 338:00.30 glusterfs
21124 root 20 0 4921808 3.502g 2140 S 0.0 45.8 338:01.47 glusterfs
21124 root 20 0 4921808 3.080g 1872 S 5.9 40.3 338:08.16 glusterfs
21124 root 20 0 4921808 3.062g 1936 S 18.8 40.0 338:11.24 glusterfs
21124 root 20 0 4921808 3.083g 2008 S 0.0 40.3 338:13.29 glusterfs
21124 root 20 0 4921808 3.173g 2008 S 0.0 41.5 338:16.01 glusterfs
21124 root 20 0 4921808 3.167g 2008 S 0.0 41.4 338:17.05 glusterfs
21124 root 20 0 4921808 3.171g 2008 S 0.0 41.5 338:17.80 glusterfs
21124 root 20 0 4987344 3.199g 2008 S 6.2 41.8 338:18.99 glusterfs
21124 root 20 0 5052880 3.256g 2008 S 0.0 42.6 338:20.19 glusterfs
21124 root 20 0 5118416 3.311g 2008 S 31.2 43.3 338:21.25 glusterfs
21124 root 20 0 5183952 3.446g 2008 S 6.2 45.1 338:22.72 glusterfs
21124 root 20 0 5380560 3.463g 2192 S 5.6 45.3 338:23.71 glusterfs
21124 root 20 0 5642704 3.694g 2000 S 23.5 48.3 338:25.25 glusterfs
21124 root 20 0 5970384 3.992g 1792 S 0.0 52.2 338:27.18 glusterfs
21124 root 20 0 6298064 4.330g 2000 S 0.0 56.6 338:29.06 glusterfs
21124 root 20 0 6625744 4.573g 1908 S 0.0 59.8 338:30.61 glusterfs
21124 root 20 0 6887888 4.778g 2052 S 0.0 62.5 338:31.97 glusterfs
21124 root 20 0 7346640 5.197g 2008 S 0.0 68.0 338:34.22 glusterfs
21124 root 20 0 7608784 5.467g 2012 S 6.2 71.5 338:35.50 glusterfs
21124 root 20 0 8133072 5.817g 1808 S 0.0 76.1 338:37.83 glusterfs

For statedumps, let me know; I have been taking them every 30 minutes and can point you to the machines (the clients themselves).
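For reference, a minimal sketch of the per-minute top sampling loop mentioned above; the log file name comes from the excerpts, but the exact invocation and path are assumptions, since the actual collection script is not attached to this bug:

# Hypothetical sampling loop (sketch only): record the glusterfs fuse mount's
# memory usage once a minute, in the same shape as the top_30sep.log excerpts.
while true; do
    top -b -n 1 | grep glusterfs >> top_30sep.log
    sleep 60
done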
This can easily be hit by a customer; just think of an admin wanting to delete, in one shot, a directory that holds a huge list of files.
Logs available at rhsqe-repo.lab.eng.blr.redhat.com: /home/repo/sosreports/nchilaka/bug.1381140
Is it possible to rerun the test with glusterfs mounted with the following options?
1. --entry-timeout=0
2. --attribute-timeout=0

I just want to see whether a large number of inodes in the itable is the culprit.
(In reply to Raghavendra G from comment #4)
> Is it possible to rerun the test with glusterfs mounted with the following
> options?
> 1. --entry-timeout=0
> 2. --attribute-timeout=0
>
> I just want to see whether a large number of inodes in the itable is the
> culprit.

Another thing to try is to disable readdirp (along with the above options). To summarize (a combined command sketch follows below):
1. mount glusterfs with --use-readdirp=no --entry-timeout=0 --attribute-timeout=0
2. gluster volume set <volname> performance.force-readdirp off
3. gluster volume set <volname> dht.force-readdirp off

regards,
Raghavendra
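Putting those three steps together, a sketch of the commands; the volfile server, volume name, and mount point are placeholders, not values from this setup:

# Sketch only: substitute the real volfile server, volume name, and mount point.
glusterfs --volfile-server=<server> --volfile-id=<volname> \
          --use-readdirp=no --entry-timeout=0 --attribute-timeout=0 <mount-point>

gluster volume set <volname> performance.force-readdirp off
gluster volume set <volname> dht.force-readdirp off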
Is it possible to provide statedumps associated with this bz?
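If it helps, a hedged sketch of one way to capture a statedump of the fuse client; the PID and mount point are placeholders, and the dump directory may differ depending on configuration:

# Find the PID of the fuse mount process on the client.
ps -C glusterfs -o pid,args | grep <mount-point>

# Send SIGUSR1 to the glusterfs fuse mount process; it writes a statedump
# into its statedump directory (commonly /var/run/gluster).
kill -USR1 <pid-of-glusterfs-fuse-mount>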
As discussed in the triage meeting, raised the following bugs:
1400067 - OOM kill of glusterfs fuse mount process seen on client where I was doing rename
1400071 - OOM kill of glusterfs fuse mount process seen on client where I was doing deletes

Also marked these two bugs with the internal whiteboard "3.2.0-beyond".
An upstream patch has been posted for review: http://review.gluster.org/#/c/16137/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93587
I am blocked on validation of this bug due to bug 1409472 - brick crashed on systemic setup.
Hi Raghavendra,
As part of validation, I am incorporating the steps I mentioned while raising the bug itself to validate this BZ.
However, in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c9 we mentioned breaking this BZ into 3 different BZs, including this one.
So how should I go about testing this? Shall I still proceed with what I mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c0, or do you want me to test only a part of the test scenario?

Also, if you have any other idea for me to incorporate to test this, kindly let me know.

thanks,
nagpavan
(In reply to nchilaka from comment #16)
> Hi Raghavendra,
> As part of validation, I am incorporating the steps I mentioned while
> raising the bug itself to validate this BZ.
> However, in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c9 we
> mentioned breaking this BZ into 3 different BZs, including this one.
> So how should I go about testing this? Shall I still proceed with what I
> mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1381140#c0, or do
> you want me to test only a part of the test scenario?

We need to test only the directory traversal part; there is no need for rm and mv. Please keep the same data set and just do find <glusterfs-mount> or ls -lR <glusterfs-mount> with readdir-ahead enabled (see the sketch at the end of this comment).

> Also, if you have any other idea for me to incorporate to test this, kindly
> let me know.
>
> thanks,
> nagpavan
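A sketch of that traversal-only test, with <volname> and <glusterfs-mount> as placeholders:

# readdir-ahead is enabled by default on this volume; turning it on explicitly
# just makes the precondition obvious.
gluster volume set <volname> performance.readdir-ahead on

# Pure directory traversal over the existing data set; no rm or mv.
find <glusterfs-mount> > /dev/null
# or
ls -lR <glusterfs-mount> > /dev/null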
Validation:
===========
Based on comment#17:

Setup1: created a 4x2 volume (default settings, meaning readdir-ahead enabled) and about 10 million files, then did a lookup of the whole filesystem from two different clients, using find * on c1 and ls -lRt on c2. I ran this in a loop 5 times and didn't see any fuse OOM kill.

Setup2: on my systemic setup, did a lookup from two different clients and didn't hit any OOM kill.

Hence moving to Verified on 3.8.4-14.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html