Description of problem:
If I set a small hard limit, say 1MB, on a directory and then start creating data in a for loop, the limit can be crossed within that same loop.

[root@rhsauto032 ~]# gluster volume info dist-rep3

Volume Name: dist-rep3
Type: Distributed-Replicate
Volume ID: 6aaeda5c-b6f6-42c2-8003-b4035f62085b
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d1r1-3
Brick2: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d1r2-3
Brick3: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d2r1-3
Brick4: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d2r2-3
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d3r1-3
Brick6: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d3r2-3
Brick7: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d4r1-3
Brick8: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d4r2-3
Brick9: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d5r1-3
Brick10: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d5r2-3
Brick11: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d6r1-3
Brick12: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d6r2-3
Options Reconfigured:
features.alert-time: 10s
features.quota: on

[root@rhsauto032 ~]# gluster volume status dist-rep3
Status of volume: dist-rep3
Gluster process                                              Port    Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d1r1-3   49167   Y       11530
Brick rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d1r2-3   49170   Y       31979
Brick rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d2r1-3   49170   Y       13829
Brick rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d2r2-3   49170   Y       13832
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d3r1-3   49168   Y       11541
Brick rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d3r2-3   49171   Y       31990
Brick rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d4r1-3   49171   Y       13840
Brick rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d4r2-3   49171   Y       13843
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d5r1-3   49169   Y       11552
Brick rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d5r2-3   49172   Y       32001
Brick rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d6r1-3   49172   Y       13851
Brick rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d6r2-3   49172   Y       13854
NFS Server on localhost                                      2049    Y       13561
Self-heal Daemon on localhost                                N/A     Y       11574
Quota Daemon on localhost                                    N/A     Y       15121
NFS Server on rhsauto034.lab.eng.blr.redhat.com              2049    Y       15301
Self-heal Daemon on rhsauto034.lab.eng.blr.redhat.com        N/A     Y       13871
Quota Daemon on rhsauto034.lab.eng.blr.redhat.com            N/A     Y       16194
NFS Server on rhsauto033.lab.eng.blr.redhat.com              2049    Y       1024
Self-heal Daemon on rhsauto033.lab.eng.blr.redhat.com        N/A     Y       32021
Quota Daemon on rhsauto033.lab.eng.blr.redhat.com            N/A     Y       2024
NFS Server on rhsauto035.lab.eng.blr.redhat.com              2049    Y       15293
Self-heal Daemon on rhsauto035.lab.eng.blr.redhat.com        N/A     Y       13881
Quota Daemon on rhsauto035.lab.eng.blr.redhat.com            N/A     Y       16315

There are no active volume tasks

Version-Release number of selected component (if applicable):
glusterfs-server-3.4.0.20rhsquota1-1.el6.x86_64
glusterfs-fuse-3.4.0.20rhsquota1-1.el6.x86_64
glusterfs-3.4.0.20rhsquota1-1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a volume of type 6x2 and start it.
2. Enable quota.
3. Mount the volume over NFS.
4. Create a directory.
5. Set a limit of 1MB on the directory.
6. Start creating data in the directory:
   # for i in {21..1040}; do dd if=/dev/urandom of=file$i bs=100KB count=1; done

Actual results:
[root@rhsauto032 ~]# gluster volume quota dist-rep3 list /dir1
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/dir1                                      1.0MB       80%      2.0MB   0Bytes

Expected results:
The limit should not be crossed, for any value of hard limit set.

Additional info:
The same behaviour has not been seen so far for larger values such as 1GB.
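For reference, a condensed command sketch of the Steps to Reproduce, using the volume layout shown above (the mount point /mnt/dist-rep3, the choice of NFS server and the mount options are placeholders; exact quota CLI syntax may differ slightly between builds):

    gluster volume create dist-rep3 replica 2 \
        rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d1r1-3 rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d1r2-3 \
        rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d2r1-3 rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d2r2-3 \
        rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d3r1-3 rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d3r2-3 \
        rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d4r1-3 rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d4r2-3 \
        rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d5r1-3 rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d5r2-3 \
        rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d6r1-3 rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d6r2-3
    gluster volume start dist-rep3
    gluster volume quota dist-rep3 enable
    mount -t nfs -o vers=3 rhsauto032.lab.eng.blr.redhat.com:/dist-rep3 /mnt/dist-rep3   # gluster NFS is v3
    mkdir /mnt/dist-rep3/dir1
    gluster volume quota dist-rep3 limit-usage /dir1 1MB        # 1MB hard limit on the directory
    cd /mnt/dist-rep3/dir1
    for i in {21..1040}; do dd if=/dev/urandom of=file$i bs=100KB count=1; done   # keep writing ~100KB files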
xattrs from node1:

[root@rhsauto032 ~]# getfattr -m . -d -e hex /rhs/bricks/d1r1-3/dir1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/d1r1-3/dir1/
trusted.afr.dist-rep3-client-0=0x000000000000000000000000
trusted.afr.dist-rep3-client-1=0x000000000000000000000000
trusted.gfid=0x4ec9311df3a8434db0c17c823555cdc3
trusted.glusterfs.dht=0x00000001000000007ffffffeaaaaaaa7
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000018800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x00000000001000000000000000000000
trusted.glusterfs.quota.size=0x0000000000018800

[root@rhsauto032 ~]# getfattr -m . -d -e hex /rhs/bricks/d3r1-3/dir1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/d3r1-3/dir1/
trusted.afr.dist-rep3-client-4=0x000000000000000000000000
trusted.afr.dist-rep3-client-5=0x000000000000000000000000
trusted.gfid=0x4ec9311df3a8434db0c17c823555cdc3
trusted.glusterfs.dht=0x0000000100000000d5555552ffffffff
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x00000000000e5800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x00000000001000000000000000000000
trusted.glusterfs.quota.size=0x00000000000e5800

[root@rhsauto032 ~]# getfattr -m . -d -e hex /rhs/bricks/d5r1-3/dir1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/d5r1-3/dir1/
trusted.gfid=0x4ec9311df3a8434db0c17c823555cdc3
trusted.glusterfs.dht=0x00000001000000002aaaaaaa55555553
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x00000000001000000000000000000000
trusted.glusterfs.quota.size=0x0000000000000000

xattrs from node3:

[root@rhsauto034 bricks]# getfattr -m . -d -e hex /rhs/bricks/d2r1-3/dir1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/d2r1-3/dir1/
trusted.afr.dist-rep3-client-2=0x000000000000000000000000
trusted.afr.dist-rep3-client-3=0x000000000000000000000000
trusted.gfid=0x4ec9311df3a8434db0c17c823555cdc3
trusted.glusterfs.dht=0x0000000100000000aaaaaaa8d5555551
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x000000000010d800
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x00000000001000000000000000000000
trusted.glusterfs.quota.size=0x000000000010d800

[root@rhsauto034 bricks]# getfattr -m . -d -e hex /rhs/bricks/d4r1-3/dir1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/d4r1-3/dir1/
trusted.gfid=0x4ec9311df3a8434db0c17c823555cdc3
trusted.glusterfs.dht=0x0000000100000000000000002aaaaaa9
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x00000000001000000000000000000000
trusted.glusterfs.quota.size=0x0000000000000000

[root@rhsauto034 bricks]# getfattr -m . -d -e hex /rhs/bricks/d6r1-3/dir1/
getfattr: Removing leading '/' from absolute path names
# file: rhs/bricks/d6r1-3/dir1/
trusted.gfid=0x4ec9311df3a8434db0c17c823555cdc3
trusted.glusterfs.dht=0x0000000100000000555555547ffffffd
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri=0x0000000000000000
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.limit-set=0x00000000001000000000000000000000
trusted.glusterfs.quota.size=0x0000000000000000
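To relate these xattrs to the 2.0MB "Used" value in the report, the trusted.glusterfs.quota.size values can be decoded from hex. A minimal sketch, using the three non-zero values from the dumps above (bash printf accepts 0x-prefixed hex for %d):

    for v in 0x0000000000018800 0x00000000000e5800 0x000000000010d800; do
        printf '%s -> %d bytes\n' "$v" "$v"     # decode each quota.size xattr
    done
    # 0x18800  -> 100352  bytes  (d1r1-3)
    # 0xe5800  -> 940032  bytes  (d3r1-3)
    # 0x10d800 -> 1103872 bytes  (d2r1-3)
    # total    =  2144256 bytes, i.e. roughly the 2.0MB "Used" that
    # 'gluster volume quota dist-rep3 list /dir1' reports above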
Varun,

Can you check whether this behaviour is seen:
1. in a single-brick setup, and
2. with the hard and soft timeouts of quota set to zero?

The reason for the above two tests:
1. In a distributed setup, there is a time window between the moment a brick reports its size and the moment the aggregated size of the directory from all bricks reaches the quota enforcer. Writes that happen on the other bricks (apart from the one where the enforcer requested the aggregated size) during this window are not accounted for by the enforcer. Test 1 helps us confirm whether we are hitting this issue.
2. Because the enforcer caches sizes for the timeout period, it can miss writes happening on other nodes. Tests 1 and 2 together will help us find out whether this is the cause.

regards,
Raghavendra.
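For reference, the two checks above could be run roughly as follows (hostname, brick path and the volume name "single" are placeholders; the timeout option names are the ones that appear under "Options Reconfigured" in the verification run further below):

    # Test 1: single-brick volume, then repeat the reproduction steps against it
    gluster volume create single rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/single1
    gluster volume start single
    gluster volume quota single enable

    # Test 2: on the original volume, drop the quota cache timeouts to zero
    gluster volume set dist-rep3 features.hard-timeout 0
    gluster volume set dist-rep3 features.soft-timeout 0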
Is this issue still seen in Build 3? Some fixes to accounting have gone in Build 3.
When I tried the same steps with soft-timeout and hard-timeout set to 0, the test case works as expected. Output from my run:

root@pranith-vm2 - /mnt/r2/dir1
03:36:03 :) ⚡ for i in {21..1040}; do dd if=/dev/urandom of=file$i bs=100KB count=1; done
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0642977 s, 1.6 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.061704 s, 1.6 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0615557 s, 1.6 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0660609 s, 1.5 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0660485 s, 1.5 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0617839 s, 1.6 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0645328 s, 1.5 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0661677 s, 1.5 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0650025 s, 1.5 MB/s
1+0 records in
1+0 records out
100000 bytes (100 kB) copied, 0.0658353 s, 1.5 MB/s
dd: closing output file `file31': Disk quota exceeded
....... the rest also fail with "Disk quota exceeded".

root@pranith-vm2 - /mnt/r2/dir1
03:36:25 :( ⚡ gluster volume quota r2 list
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/dir1                                      1.0MB       80%      1.0MB   0Bytes

root@pranith-vm2 - /mnt/r2/dir1
03:36:28 :) ⚡ gluster volume info

Volume Name: r2
Type: Distributed-Replicate
Volume ID: c2f28d51-ca0c-4e3a-b1bb-16d39ea74fa7
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.42.237:/brick/r2_0
Brick2: 10.70.42.237:/brick/r2_1
Brick3: 10.70.42.237:/brick/r2_2
Brick4: 10.70.42.237:/brick/r2_3
Brick5: 10.70.42.237:/brick/r2_4
Brick6: 10.70.42.237:/brick/r2_5
Brick7: 10.70.42.237:/brick/r2_6
Brick8: 10.70.42.237:/brick/r2_7
Brick9: 10.70.42.237:/brick/r2_8
Brick10: 10.70.42.237:/brick/r2_9
Brick11: 10.70.42.237:/brick/r2_10
Brick12: 10.70.42.237:/brick/r2_11
Options Reconfigured:
features.soft-timeout: 0
features.hard-timeout: 0
features.quota: on

We need to document this in the Admin Guide to set the right expectation for the user.
Based on my discussion with Pranith and KP, the required information has been added as a note in the Admin Guide, section 12.5:
http://documentation-devel.engineering.redhat.com/docs/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/ch12s05.html
After a conversation with Du and Saurabh, I am updating the doc text for the known issue.
Moving the known issues to the Doc team, to be documented in the release notes for U1.
I've documented this as a known issue in the BB U1 Release Notes. Here is the link: http://documentation-devel.engineering.redhat.com/docs/en-US/Red_Hat_Storage/2.1/html/2.1_Update_1_Release_Notes/chap-Documentation-2.1_Update_1_Release_Notes-Known_Issues.html