Bug 1504174

Summary: Corruption of quota.conf when setting large number of quotas
Product: [Community] GlusterFS
Component: quota
Version: 3.10
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Reporter: John Strunk <jstrunk>
Assignee: bugs <bugs>
CC: bugs, sunnikri
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-03-08 09:01:02 UTC
Type: Bug
Bug Depends On: 1510940

Description John Strunk 2017-10-19 15:35:03 UTC
Description of problem:
When setting a large number of directory quotas, the limit-usage command begins to fail after approximately 7790 quotas have been added:

# sudo gluster vol quota v2 limit-usage /7790 1GB
quota command failed : Failed to set hard limit on path /7790 for volume v2


Version-Release number of selected component (if applicable):
# gluster --version
glusterfs 3.10.5
# rpm -qe glusterfs
glusterfs-3.10.5-1.fc26.x86_64

How reproducible:
always


Steps to Reproduce:
1. Create a volume (tested w/ single brick as well as replica 3): v2
2. Start volume
3. sudo gluster volume quota v2 enable
4. sudo mount -t glusterfs localhost:/v2 /mnt
5. for i in `seq 10000`; do echo $i; sudo mkdir /mnt/$i; sudo gluster vol quota v2 limit-usage /$i 1GB; done
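
The script below is a self-contained version of step 5 that stops at the first failure and reports how many limits were set. It assumes steps 1-4 have been done (volume v2 with quota enabled, mounted at /mnt) and that the gluster CLI exits nonzero when the command fails:

#!/bin/sh
# Reproducer sketch: add directory quotas until limit-usage fails,
# then report the count. Assumes volume "v2" with quota enabled and
# a FUSE mount at /mnt, as in steps 1-4 above.
for i in $(seq 10000); do
    sudo mkdir "/mnt/$i"
    if ! sudo gluster volume quota v2 limit-usage "/$i" 1GB; then
        echo "limit-usage failed at quota #$i"
        break
    fi
done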

Actual results:
After approximately 7790 quotas are set, the limit-usage command begins to fail with the message:
quota command failed : Failed to set hard limit on path /7790 for volume v2

At this point, no further changes can be made to quotas without disabling and then re-enabling quota entirely.
# sudo gluster vol quota v2 remove /1
quota command failed : Commit failed on localhost. Please check the log file for more details.


Expected results:
Setting and removing quotas should either succeed or return an appropriate error message if the maximum number of supported quotas has been reached. The system should never corrupt the quota file.

Additional info:
When setting quotas fails, the following appears in glusterd.log:
[2017-10-11 16:34:15.914084] E [MSGID: 106212] [glusterd-quota.c:1141:glusterd_store_quota_config] 0-management: quota.conf corrupted
[2017-10-11 16:34:15.938086] E [MSGID: 106123] [glusterd-syncop.c:1451:gd_commit_op_phase] 0-management: Commit of operation 'Volume Quota' failed on localhost : Failed to set hard limit on path /7873 for volume v2
[2017-10-11 16:34:15.939549] I [socket.c:3608:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2017-10-11 16:34:15.939560] E [rpcsvc.c:1348:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 24) to rpc-transport (socket.management)
[2017-10-11 16:34:15.939576] E [MSGID: 106430] [glusterd-utils.c:497:glusterd_submit_reply] 0-glusterd: Reply submission failed

And trying to remove a quota after failure (glusterd.log):
[2017-10-11 16:55:14.607734] I [socket.c:348:ssl_setup_connection] 0-socket.management: peer CN = storage0
[2017-10-11 16:55:14.607834] I [socket.c:351:ssl_setup_connection] 0-socket.management: SSL verification succeeded (client: 127.0.0.1:49144)
[2017-10-11 16:55:14.640335] I [socket.c:348:ssl_setup_connection] 0-socket.management: peer CN = storage0
[2017-10-11 16:55:14.640384] I [socket.c:351:ssl_setup_connection] 0-socket.management: SSL verification succeeded (client: 127.0.0.1:49143)
[2017-10-11 16:55:14.655256] E [MSGID: 106212] [glusterd-quota.c:1141:glusterd_store_quota_config] 0-management: quota.conf corrupted
[2017-10-11 16:55:14.670981] E [MSGID: 106123] [glusterd-syncop.c:1451:gd_commit_op_phase] 0-management: Commit of operation 'Volume Quota' failed on localhost    
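
The failure threshold is suggestive: in the v1.2 quota.conf format, each limit is stored as a 17-byte record (a 16-byte GFID plus a one-byte type), so the file crosses the 128 KiB mark in the same neighborhood (131072 / 17 ≈ 7710 records), which may point at a buffer-boundary bug in glusterd_store_quota_config, the function named in the log above. The following sketch checks whether the on-disk file is still record-aligned; it assumes the v1.2 layout (one header line, then fixed 17-byte records) and the default glusterd working directory:

#!/bin/sh
# Integrity-check sketch for quota.conf, assuming the v1.2 format and
# the default glusterd workdir; adjust CONF for other setups.
CONF=/var/lib/glusterd/vols/v2/quota.conf
total=$(stat -c %s "$CONF")
hdr=$(head -n1 "$CONF" | wc -c)   # header line, incl. trailing newline
body=$((total - hdr))
if [ $((body % 17)) -ne 0 ]; then
    echo "quota.conf suspect: $body payload bytes is not a multiple of 17"
else
    echo "quota.conf aligned: $((body / 17)) limit records"
fi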

# sudo gluster vol info v2
 
Volume Name: v2
Type: Distribute
Volume ID: 165bf2be-d2ec-4284-80f5-fa796886b636
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: storage0:/bricks/v2/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on

To recover, quota must be disabled and then re-enabled, after which the existing directory quotas can be re-added; however, the failure will recur if too many limits are added again.
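
A sketch of that recovery, assuming the limits can be regenerated from the directory names used in the reproducer (a real deployment would need to replay its actual limit list; --mode=script suppresses the confirmation prompt on disable):

#!/bin/sh
# Recovery sketch: disabling quota discards quota.conf; re-enable and
# re-add the limits, staying below the observed ~7790 threshold.
sudo gluster --mode=script volume quota v2 disable
sudo gluster volume quota v2 enable
for i in $(seq 7000); do
    sudo gluster volume quota v2 limit-usage "/$i" 1GB
done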

Comment 1 Sanoj Unnikrishnan 2017-11-08 13:17:33 UTC
A patch (https://review.gluster.org/#/c/18695) has been posted to fix this.

Comment 2 Sanoj Unnikrishnan 2018-03-08 09:01:02 UTC

*** This bug has been marked as a duplicate of bug 1510940 ***