| Summary: | Quota calculation can get hung up on deleted files |
|---|---|
| Product: | [Community] GlusterFS |
| Component: | quota |
| Version: | 3.2.1 |
| Reporter: | Adam Tygart <mozestygart+gluster> |
| Assignee: | Junaid <junaid> |
| Status: | CLOSED DUPLICATE |
| Severity: | medium |
| Priority: | medium |
| CC: | gluster-bugs, raghavendra, vagarwal, vijay |
| Hardware: | x86_64 |
| OS: | Linux |
| Mount Type: | fuse |
| Doc Type: | Bug Fix |
Description — Adam Tygart, 2011-07-07 12:35:18 UTC
Created attachment 549

I'm new to gluster and have been evaluating it for use in a Beowulf cluster environment. I have done extensive performance testing with distributed iozone and saw wonderful performance. My main issue is that once I enable quotas (it doesn't matter whether a quota limit is actually set), I trigger what looks to be an issue with locking inodes. The error itself indicates the lock was attained successfully, but the log reports it as a failure, and so glusterfs cycles trying to attain the lock again. This may be related to lines like the following, which occur just before the spew of "lock setting failed (Success)" lines, though I have no indication of why gluster would be unable to allocate memory:

[2011-07-07 10:00:48.989978] I [server3_1-fops.c:1240:server_writev_cbk] 0-testing-server: 4661391: WRITEV 0 (18350082) ==> -1 (Cannot allocate memory)

Attached is the log file from the server, along with memory graphs of both of my fileservers as the failure started.

After extensive testing, I think this is the actual issue: if you delete a file before updates to the quota ACL have finished writing, glusterfs seems to loop trying to get an inode lock on the file to update the ACL, but because the file no longer exists the lock fails. This may not be exactly what happens internally, but it may give you more insight into the issue.

Reproduction requirements:
- 1 glusterfs server with quota on (mine was a distributed volume)
- 2+ client machines
- iozone configured in distributed mode

iozone command:
RSH=`which ssh` /usr/sbin/iozone -+m host_file -t 2 -r 8192 -s 128g

contents of host_file:
client01 /mnt/point /usr/sbin/iozone
client02 /mnt/point /usr/sbin/iozone

Once the test has been going for a while, preferably during a write test, <ctrl-c> out of iozone. This causes iozone to clean up its child files on the mounts and triggers the bug.

Thanks for your time.
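The suspected failure mode above can be sketched in a few lines of Python. This is a toy simulation, not glusterfs code (the real quota/marker translator is C inside glusterd); all names here are hypothetical. It shows why retrying a failed inode lock without ever dropping entries for deleted files would spin forever, whereas dropping an entry on ENOENT breaks the loop:

```python
import errno

class FakeVolume:
    """Toy stand-in for a brick: only files that still exist can be locked."""
    def __init__(self, files):
        self.files = set(files)

    def inodelk(self, path):
        # Mimics the quota updater taking an inode lock before rewriting
        # the quota xattr; fails with ENOENT for deleted files.
        if path not in self.files:
            raise OSError(errno.ENOENT, "No such file or directory", path)
        return True

def update_quota(volume, pending, max_retries=5):
    """Drain the pending-update queue, dropping entries for deleted files
    instead of retrying them forever (the reported bug looks like the
    'retry forever' behaviour)."""
    updated, dropped = [], []
    for path in pending:
        for _attempt in range(max_retries):
            try:
                volume.inodelk(path)
                updated.append(path)
                break
            except OSError as e:
                if e.errno == errno.ENOENT:
                    dropped.append(path)  # file is gone: forget it
                    break
    return updated, dropped

vol = FakeVolume({"iozone.tmp.1"})
# "iozone.tmp.2" was deleted (iozone cleanup) before its quota update ran
print(update_quota(vol, ["iozone.tmp.1", "iozone.tmp.2"]))
# → (['iozone.tmp.1'], ['iozone.tmp.2'])
```

Without the ENOENT branch, each deleted file would burn all its retries (or, with unbounded retries, loop indefinitely), matching the repeating "lock setting failed" log lines and the rising CPU load described here.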
I'm not sure how to explain this bug short of this: when using a gluster distributed volume, the quota xattr can take some time to update. If files get deleted before the quota can be re-calculated, glusterfs doesn't forget about them. It continues to poll them for their size even after being told the files no longer exist, and gets hung up on the re-calculation, causing increased user CPU load on the server while I/O throughput gradually decreases to nothing.

Here is a link to debug logs of all clients and servers involved: http://beocat.cis.ksu.edu/glusterfs-logs.tar.bz2

iozone command:
RSH=`which ssh` /usr/sbin/iozone -+m host_file -i 0 -i 2 -t 10 -r 8192 -s 40g

host_file contents:
scout01 /tmp/gluster /usr/sbin/iozone
paladin01 /tmp/gluster /usr/sbin/iozone
rogue01 /tmp/gluster /usr/sbin/iozone
titan01 /tmp/gluster /usr/sbin/iozone
ranger01 /tmp/gluster /usr/sbin/iozone
scout02 /tmp/gluster /usr/sbin/iozone
paladin02 /tmp/gluster /usr/sbin/iozone
rogue02 /tmp/gluster /usr/sbin/iozone
titan02 /tmp/gluster /usr/sbin/iozone
ranger02 /tmp/gluster /usr/sbin/iozone

Gluster volume info:
root@[tiamat] /mnt/data/testing $ gluster volume info testing
Volume Name: testing
Type: Distribute
Status: Stopped
Number of Bricks: 2
Transport-type: tcp,rdma
Bricks:
Brick1: minotaur:/mnt/data/testing
Brick2: tiamat:/mnt/data/testing
Options Reconfigured:
diagnostics.client-log-level: DEBUG
diagnostics.brick-log-level: DEBUG
features.quota: on
features.quota-timeout: 0

xattrs of the filesystem on minotaur after the test:
root@[minotaur] /mnt/data/testing $ getfattr -m . -d -e hex .
# file: .
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.quota.dirty=0x3100
trusted.glusterfs.quota.size=0x0000000ae0a8a000
trusted.glusterfs.test=0x776f726b696e6700

root@[minotaur] /mnt/data/testing $ ls -lah
total 12K
drwxr-xr-x 2 root root 4.0K Jul 13 13:38 ./
drwxr-xr-x 6 root root 4.0K Jul  7 16:10 ../

xattrs of the filesystem on tiamat after the test:
root@[tiamat] /mnt/data/testing $ getfattr -m . -d -e hex .
# file: .
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.quota.dirty=0x3100
trusted.glusterfs.quota.size=0x000000b34c94e000
trusted.glusterfs.test=0x776f726b696e6700

root@[tiamat] /mnt/data/testing $ ls -lah
total 12K
drwxr-xr-x 2 root root 4.0K Jul 13 13:38 ./
drwxr-xr-x 4 root root 4.0K Jul  6 19:51 ../

CHANGE: http://review.gluster.com/42 (Change-Id: I483caadbfd77d3830b1ea930a31eedddd0808fca) merged in release-3.2 by Anand Avati (avati)

CHANGE: http://review.gluster.com/32 (Change-Id: I6733586ba56ed4fdc2abbef63c38b8c491f89819) merged in master by Anand Avati (avati)

There are two issues involved here:
1. A memory leak.
2. Excessive logging.

Patches have been sent to fix 2. Hence marking this bug as a duplicate of another bug which tracks the mem-leak fixes in marker/quota.

*** This bug has been marked as a duplicate of bug 3169 ***
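As a sanity check on the xattr dumps above: the trusted.glusterfs.quota.size value printed by `getfattr -e hex` is, to the best of my understanding, a big-endian unsigned byte count, so it can be decoded with a few lines of Python (this decoder is not part of gluster):

```python
def decode_quota_size(hex_xattr):
    """Decode a trusted.glusterfs.quota.size value (as printed by
    `getfattr -e hex`) into a byte count and a GiB figure.
    Assumes the xattr is a big-endian unsigned byte count."""
    n = int(hex_xattr, 16)
    return n, n / 2**30

for brick, val in [("minotaur", "0x0000000ae0a8a000"),
                   ("tiamat", "0x000000b34c94e000")]:
    size, gib = decode_quota_size(val)
    print(f"{brick}: {size} bytes (~{gib:.1f} GiB)")
# minotaur: 46718820352 bytes (~43.5 GiB)
# tiamat: 770083971072 bytes (~717.2 GiB)
```

Note that both bricks still account for tens to hundreds of GiB even though the `ls -lah` output shows the directories are empty after iozone's cleanup — consistent with the stale accounting for deleted files that the reporter describes.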