Bug 1509071
| Summary: | Many files in a dir causes leaks after quota is enabled | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Hans Henrik Happe <happe> |
| Component: | quota | Assignee: | hari gowtham <hgowtham> |
| Status: | CLOSED UPSTREAM | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | mainline | CC: | bugs, happe, hgowtham, jthottan |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-03-12 12:55:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Forgot to mention that 50000-100000 files are enough to make the brick processes use ~1GB of memory.

This is mostly due to the inode_ref leak that was fixed through https://bugzilla.redhat.com/show_bug.cgi?id=1497084. Please upgrade to 3.12.2 and check whether that fixes the issue.

Sorry, I forgot to mention: the latest version I've tested is 3.12.2 from the centos-gluster312-test repo, and it still has the bug. I suspect it is related to the creation of the trusted.pgfid.XXXXX xattr after quota is enabled. A 'du' or a rebalance starts that process. When files already have the pgfid xattr, there is no leak.

Thanks for confirming that; changing priority.

Any news about this one?

Hi Hans, I have trouble accessing the script. Can you paste it here? Alternatively, describe the exact steps the script performs so we have a reproducer. A statedump would also help us debug the memory leak, if there is one. Thanks, Hari.

```bash
#!/bin/bash
# Usage: fail <number of files to create>
n=$1
# Host to create bricks on
host=sciimg01
# cleanup
umount /mnt
yes|gluster vol stop mem
yes|gluster vol delete mem
umount /gluster/mem0
umount /gluster/mem1
# init
truncate -s 4g /dev/shm/b0
truncate -s 4g /dev/shm/b1
mkfs.xfs -f /dev/shm/b0
mkfs.xfs -f /dev/shm/b1
mkdir -p /gluster/mem0
mkdir -p /gluster/mem1
mount -o loop /dev/shm/b0 /gluster/mem0
mount -o loop /dev/shm/b1 /gluster/mem1
gluster vol create mem ${host}:/gluster/mem0/brick ${host}:/gluster/mem1/brick
gluster vol start mem
mount -t glusterfs ${host}:/mem /mnt
sleep 3
# create files
mkdir /mnt/many
i=0; while [ $i -lt $n ]; do touch /mnt/many/f$i; let i++; done
gluster vol quota mem enable
du -hs /mnt
# Or rebalance
# gluster vol rebalance mem start
```
You can run the script with about 100000 files to see the result.

Created attachment 1404822: Dump brick 0 after leak on 3.12.6

Created attachment 1404823: Dump brick 1 after leak on 3.12.6
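To compare brick statedumps like the two attached above, a small shell helper can extract the inode-table counters from a dump file. Dumps can be generated with `gluster volume statedump <VOLNAME>` and are written to the directory set by `server.statedump-path`. This is a minimal sketch under stated assumptions. The function name `inode_table_counters` is hypothetical. The key names (`active_size`, `lru_size`, `purge_size`) are taken from the usual statedump layout, so verify them against your own dump. Active inodes are the ones still holding references. An `active_size` that stays close to the number of files after `du` has finished would therefore be consistent with the inode ref leak discussed above.

```bash
#!/bin/bash
# Hypothetical helper: print inode-table counters from a GlusterFS brick statedump.
# ASSUMPTION: the dump contains lines of the form
#   conn.<N>.bound_xl.<brick-path>.active_size=<count>
#   conn.<N>.bound_xl.<brick-path>.lru_size=<count>
#   conn.<N>.bound_xl.<brick-path>.purge_size=<count>
# Check these key names against your own dump before relying on the output.
inode_table_counters() {
    local dump=$1
    # Keep only the active/LRU/purge size lines, then print "<counter> <value>"
    # by taking the last dot-separated component of the key.
    grep -E '\.(active_size|lru_size|purge_size)=' "$dump" |
        awk -F'=' '{ n = split($1, parts, "."); print parts[n], $2 }'
}
```

Running it on the dumps taken before and after `du` makes it easy to see whether the active inode count drops back down once the crawl is over.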
Release 3.12 has reached EOL and this bug was still in the NEW state, so the version is being moved to mainline in order to triage it and take appropriate action.

We are not focusing much on Quota as a feature right now, hence the reduction in priority.

This bug has moved to https://github.com/gluster/glusterfs/issues/965 and will be tracked there from now on. Visit the GitHub issue for further details.
Created attachment 1347114: Reproduce script

Description of problem: We have a production system with a dir containing 17 million files. Quota was enabled later. Accessing these files with 'du' or a rebalance causes the OOM killer to kick in. I've created a script that reproduces the problem (attached).

Version-Release number of selected component (if applicable):

How reproducible: Always.

Steps to Reproduce: The attached script shows the issue. It creates an in-memory distributed volume and fills it with files (the number of files is the first argument). It then enables quota and runs 'du'.

Actual results: OOM or high memory use, depending on how many files are created.

Expected results: No exceptional memory use.

Additional info: 3.7 does not have the issue; 3.8, 3.10 and 3.12 do.
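To watch the "OOM or high memory use" result develop, you can sample the brick processes' resident memory while `du` or the rebalance runs. The sketch below targets Linux. It assumes that brick processes are named `glusterfsd`, as on stock installs, and that procps `pgrep` is available. `rss_kib` and `brick_rss` are hypothetical helper names.

```bash
#!/bin/bash
# Print the resident set size (in KiB) of a process, read from /proc/<pid>/status.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print "<pid> <rss_kib>" for every brick process on this host.
# ASSUMPTION: brick processes are named glusterfsd (exact match via pgrep -x).
brick_rss() {
    local pid
    for pid in $(pgrep -x glusterfsd); do
        echo "$pid $(rss_kib "$pid")"
    done
}
```

For example, run `while sleep 5; do date +%T; brick_rss; done` in a second terminal while the reproducer script runs. If the values keep climbing, or stay high after `du` completes, memory is not being released.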