Bug 1509071

Summary: Many files in a directory cause memory leaks after quota is enabled
Product: [Community] GlusterFS
Reporter: Hans Henrik Happe <happe>
Component: quota
Assignee: hari gowtham <hgowtham>
Status: CLOSED UPSTREAM
Severity: high
Priority: low
Version: mainline
CC: bugs, happe, hgowtham, jthottan
Hardware: x86_64   
OS: Linux   
Last Closed: 2020-03-12 12:55:17 UTC
Type: Bug
Attachments:
Reproduce script (flags: none)
Dump brick 0 after leak on 3.12.6 (flags: none)
Dump brick 1 after leak on 3.12.6 (flags: none)

Description Hans Henrik Happe 2017-11-02 20:58:18 UTC
Created attachment 1347114
Reproduce script

Description of problem:

We have a production system with a directory containing 17 million files. Quota was enabled later. Accessing these files with 'du' or a rebalance causes the OOM killer to kick in.

I've created a script that reproduces the issue; see below.

Version-Release number of selected component (if applicable):


How reproducible:

Always.

Steps to Reproduce:

The attached script shows the issue. It creates an in-memory distributed volume and fills it with files (the number of files is the first argument), then enables quota and runs 'du'.

Actual results:

OOM or high memory use depending on how many files are created.

Expected results:

No exceptional memory use.

Additional info:

3.7 does not have the issue.
3.8, 3.10 and 3.12 do.

Comment 1 Hans Henrik Happe 2017-11-03 06:39:01 UTC
Forgot to mention that 50,000-100,000 files are enough to make the brick processes use ~1 GB of memory.
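One way to watch the brick memory grow while 'du' or a rebalance runs is to sample the resident set size (VmRSS) of the brick processes. A minimal sketch, assuming a Linux host where the bricks run as glusterfsd processes and pgrep is available; sum_rss_kb and watch_brick_rss are hypothetical helper names, not part of GlusterFS:

```bash
#!/bin/bash
# Hypothetical helper (not part of GlusterFS): print the summed resident
# memory, in kB, of the given PIDs by reading VmRSS from /proc/<pid>/status.
# PIDs that have exited, or that have no VmRSS line, contribute 0.
sum_rss_kb() {
    local total=0 pid rss
    for pid in "$@"; do
        rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null) || rss=0
        total=$(( total + ${rss:-0} ))
    done
    echo "$total"
}

# Hypothetical helper: print the total RSS of all glusterfsd (brick)
# processes every $1 seconds (default 5) until interrupted.
watch_brick_rss() {
    local interval=${1:-5}
    while true; do
        # Unquoted on purpose: pgrep prints one PID per line.
        echo "$(date +%T) glusterfsd RSS: $(sum_rss_kb $(pgrep -x glusterfsd)) kB"
        sleep "$interval"
    done
}
```

For example, run watch_brick_rss 10 in a second shell while the reproduce script runs. Reading /proc directly avoids depending on a particular ps output format.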

Comment 2 Sanoj Unnikrishnan 2017-11-09 09:34:16 UTC
This is most likely due to the inode_ref leak that was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1497084
Please upgrade to 3.12.2 and check if that fixes the issue.
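A quick way to confirm the installed version before retesting, as a minimal sketch relying on the version-aware ordering of GNU sort -V; version_ge is a hypothetical helper, and the assumed first line of gluster --version (for example "glusterfs 3.12.2") should be checked on the actual system:

```bash
#!/bin/bash
# Hypothetical helper: succeed if version $1 is >= version $2.
# sort -V orders version strings numerically per component, so
# 3.8.15 sorts before 3.12.2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example (assumes 'gluster --version' starts with "glusterfs <version>"):
# installed=$(gluster --version | awk 'NR == 1 {print $2}')
# if version_ge "$installed" 3.12.2; then echo "glusterfs >= 3.12.2"; fi
```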

Comment 3 Hans Henrik Happe 2017-11-09 14:43:19 UTC
Sorry, I forgot to mention. The latest I've tested is 3.12.2 from the centos-gluster312-test repo. It still has the bug.

I suspect it is related to the creation of the trusted.pgfid.XXXXX xattr after quota is enabled. A 'du' or a rebalance triggers that process. Once the files have the pgfid xattr, there is no leak.
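To test the pgfid hypothesis, one could count the files on a brick that lack a trusted.pgfid.* xattr before and after the 'du' pass. A minimal sketch, assuming it runs as root on the brick server (trusted.* xattrs are only visible to root) and that getfattr from the attr package is installed; count_missing_pgfid is a hypothetical helper, and the example path comes from the reproduce script:

```bash
#!/bin/bash
# Hypothetical helper: count regular files under a brick directory that
# carry no trusted.pgfid.* xattr. The .glusterfs tree (gfid hard links)
# is skipped so each file is counted once.
count_missing_pgfid() {
    local dir=$1 missing=0 f
    while IFS= read -r -d '' f; do
        # getfattr prints one "name=value" line per matching xattr;
        # no matching line means the file has no pgfid xattr.
        if ! getfattr --absolute-names -d -m '^trusted\.pgfid\.' "$f" 2>/dev/null \
                | grep -q '^trusted\.pgfid\.'; then
            missing=$(( missing + 1 ))
        fi
    done < <(find "$dir" -name .glusterfs -prune -o -type f -print0)
    echo "$missing"
}

# Example: run once after 'gluster vol quota mem enable' and again after 'du'.
# count_missing_pgfid /gluster/mem0/brick/many
```

Comparing the two counts would show whether the 'du' pass is what adds the xattrs on these files.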

Comment 4 Sanoj Unnikrishnan 2017-11-10 05:36:53 UTC
Thanks for confirming that; changing priority.

Comment 5 Hans Henrik Happe 2018-01-11 06:49:19 UTC
Any news about this one?

Comment 7 hari gowtham 2018-02-02 06:01:41 UTC
Hi Hans,

I have a problem accessing the script; can you paste it here?
Or you can mention the exact steps performed in the script that will give us a reproducer.

Also, if you can give us a statedump, it will help us debug the memory leak, if any.

Thanks,
Hari.

Comment 8 Hans Henrik Happe 2018-02-02 08:22:53 UTC
#!/bin/bash

# Usage: fail <number of files to create>

n=${1:?usage: $0 <number of files to create>}

# Host to create bricks on (adjust to your own host name)
host=sciimg01

# cleanup
umount /mnt

yes|gluster vol stop mem 
yes|gluster vol delete mem

umount /gluster/mem0
umount /gluster/mem1

# init
truncate -s 4g /dev/shm/b0
truncate -s 4g /dev/shm/b1

mkfs.xfs -f /dev/shm/b0
mkfs.xfs -f /dev/shm/b1

mkdir -p /gluster/mem0
mkdir -p /gluster/mem1

mount -o loop /dev/shm/b0 /gluster/mem0
mount -o loop /dev/shm/b1 /gluster/mem1


gluster vol create mem ${host}:/gluster/mem0/brick ${host}:/gluster/mem1/brick
gluster vol start mem

mount -t glusterfs ${host}:/mem /mnt
sleep 3

# create files

mkdir /mnt/many
for ((i = 0; i < n; i++)); do touch "/mnt/many/f$i"; done

gluster vol quota mem enable

du -hs /mnt

# Or rebalance
# gluster vol rebalance mem start

Comment 10 Hans Henrik Happe 2018-02-08 14:52:18 UTC
You can run the script with about 100000 files to see the result.

Comment 11 Hans Henrik Happe 2018-03-06 14:01:58 UTC
Created attachment 1404822
Dump brick 0 after leak on 3.12.6

Comment 12 Hans Henrik Happe 2018-03-06 14:02:57 UTC
Created attachment 1404823
Dump brick 1 after leak on 3.12.6
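Statedumps like these are normally generated with 'gluster volume statedump mem' and written on each brick server (by default under /var/run/gluster). A minimal sketch for ranking the memory accounting entries in such a dump by size, assuming sections of the form "[<xlator> - usage-type <type> memusage]" each followed by a "size=<bytes>" line; top_memusage is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: print the N (default 10) largest memory accounting
# entries of a statedump as "<size-in-bytes> <xlator> <usage-type>".
# Assumes headers like "[<xlator> - usage-type <type> memusage]" followed
# by a "size=<bytes>" line.
top_memusage() {
    local dump=$1 n=${2:-10}
    awk '
        /usage-type .* memusage\]/ {
            # Strip the leading "[" from the xlator name.
            xlator = substr($1, 2)
            # The usage type is the field right after "usage-type".
            for (i = 1; i < NF; i++) if ($i == "usage-type") type = $(i + 1)
            next
        }
        /^size=/ && type != "" {
            sub(/^size=/, "")
            print $0, xlator, type
            type = ""
        }
    ' "$dump" | sort -k1,1nr | head -n "$n"
}
```

For example, top_memusage <dump file> 20 on the brick 0 and brick 1 dumps would point at the usage types that grow with the number of files.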

Comment 14 Shyamsundar 2018-10-23 14:54:13 UTC
Release 3.12 has reached EOL and this bug was still in the NEW state, so the version is being moved to mainline to triage it and take appropriate action.

Comment 15 Amar Tumballi 2019-06-19 02:56:42 UTC
We are not focusing much on quota as a feature right now, hence the reduction in priority.

Comment 16 Worker Ant 2020-03-12 12:55:17 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/965 and will be tracked there from now on. Visit the GitHub issue for further details.