Description of problem:
Enabling quota causes many errors in the logs which I do not understand.

Version-Release number of selected component (if applicable):
3.7.1

How reproducible:
I am unable to test in a controlled environment, but on my 2 node/brick distributed volume it happens every time.

Steps to Reproduce:
1. Enable quota
2.
3.

Actual results:
Many LOOKUP errors on a path which does not exist, referencing a UUID for a node/brick that does not exist either.

Expected results:
No errors.

Additional info:
glusterfs-client-xlators-3.7.1-1.el6.x86_64
glusterfs-server-3.7.1-1.el6.x86_64
glusterfs-3.7.1-1.el6.x86_64
glusterfs-api-3.7.1-1.el6.x86_64
glusterfs-cli-3.7.1-1.el6.x86_64
glusterfs-libs-3.7.1-1.el6.x86_64
glusterfs-fuse-3.7.1-1.el6.x86_64

[root@hgluster01 ~]# cat /var/lib/glusterd/glusterd.info
UUID=875dbae1-82bd-485f-98e4-b7c5562e4da1
operating-version=30700

[root@hgluster02 ~]# cat /var/lib/glusterd/glusterd.info
UUID=d85ec083-34f2-458c-9b31-4786462ca48e
operating-version=30700

[root@hgluster01 ~]# gluster volume status
Status of volume: export_volume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick hgluster01:/gluster_data              49152     0          Y       4712
Brick hgluster02:/gluster_data              49152     0          Y       24584
Quota Daemon on localhost                   N/A       N/A        Y       7938
Quota Daemon on hgluster02.red.dsic.com     N/A       N/A        Y       27770

Task Status of Volume export_volume
------------------------------------------------------------------------------
There are no active volume tasks

[root@hgluster01 ~]# gluster volume info
Volume Name: export_volume
Type: Distribute
Volume ID: c74cc970-31e2-4924-a244-4c70d958dadb
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: hgluster01:/gluster_data
Brick2: hgluster02:/gluster_data
Options Reconfigured:
features.trash: off
performance.readdir-ahead: on
server.root-squash: on
performance.io-cache: off
network.ping-timeout: 60
performance.write-behind-window-size: 1MB
server.allow-insecure: on
auth.allow: 192.168.10.*,10.0.10.*,10.8.0.*,10.2.0.*,10.0.60.*
nfs.disable: on
cluster.eager-lock: on
features.quota: on
performance.io-thread-count: 24
performance.read-ahead: on
performance.client-io-threads: on
performance.quick-read: off
performance.flush-behind: on
performance.write-behind: on
performance.stat-prefetch: on
diagnostics.brick-log-level: ERROR
performance.cache-size: 1GB
features.inode-quota: on

I am seeing lots of errors like the following on both nodes. This path does not exist, and as far as I can tell neither does this brick UUID:

[2015-06-08 16:19:16.855070] E [server-rpc-fops.c:148:server_lookup_cbk] 0-export_volume-server: 238287: LOOKUP /\ (00000000-0000-0000-0000-000000000001/\) ==> (Permission denied)

I am also getting similar errors about paths which do actually exist, but with the same UUID as in the error message above:

[2015-06-08 16:37:46.458016] E [server-rpc-fops.c:148:server_lookup_cbk] 0-export_volume-server: 456799: LOOKUP /rclough (00000000-0000-0000-0000-000000000001/rclough) ==> (Permission denied)

I also got these errors shortly after enabling quota:

Jun  8 09:41:29 hgluster01 export_volume-quota-crawl[7948]: [2015-06-08 16:41:29.484670] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-export_volume-client-0: server 10.0.10.10:49152 has not responded in the last 60 seconds, disconnecting.
[2015-06-08 16:42:09.811310] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-06-08 16:42:12.980212] E [server-helpers.c:382:server_alloc_frame] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f781da78f80] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(get_frame_from_request+0x3a9)[0x7f7809605ad9] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server3_3_statfs+0x8e)[0x7f780960674e] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x295)[0x7f781d843db5] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103)[0x7f781d843ff3] ))))) 0-server: invalid argument: client

We really need to be able to use quotas to control our users' data sprawl. How can I fix this? Please let me know if you need any additional information. Thank you.
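For reference, a minimal sketch of the commands behind step 1 above, assuming the standard quota CLI and the volume name from this report (the limit-usage path and size are purely illustrative, not something set on this volume):

# Enable quota on the volume; this starts the quota daemon and kicks off the initial crawl.
gluster volume quota export_volume enable

# Purely illustrative: once quota is enabled, a usage limit can be set on a directory.
gluster volume quota export_volume limit-usage /rclough 100GB

# Confirm the quota features are on for the volume.
gluster volume info export_volume | grep quota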
Now I am unable to run gluster commands:

[root@hgluster01 ~]# gluster volume status
Error : Request timed out

[root@hgluster02 ~]# gluster volume status
Error : Request timed out
Also, disabling quota does not kill the initial quota crawl from enabling quota:

[root@hgluster01 ~]# ps aux | grep find
root      7961  1.1  0.0 112528  1228 ?        S    08:57   0:48 /usr/bin/find . -exec /usr/bin/stat {} \;
root     28275  1.8  0.0 112528  1232 ?        R    10:03   0:03 /usr/bin/find . -exec /usr/bin/setfattr -n glusterfs.quota-xattr-cleanup -v 1 {} \;
root     29955  0.0  0.0 103244   896 pts/2    S+   10:06   0:00 grep find
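A rough sketch of how such leftover crawl processes could be stopped by hand, assuming the find command lines shown above (use with care, since the crawl runs against the live brick):

# List the crawl helpers that survived disabling quota (the bracket trick excludes the grep itself).
ps aux | grep '[f]ind . -exec'

# Terminate them by matching their full command lines; adjust the patterns if yours differ.
pkill -f '/usr/bin/find . -exec /usr/bin/stat'
pkill -f 'glusterfs.quota-xattr-cleanup'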
Restarting glusterd on both nodes allows me to run gluster commands again.
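For completeness, a sketch of that restart on these EL6 nodes (run on both):

# EL6 uses SysV init scripts; this restarts only the management daemon,
# the brick processes keep serving data.
service glusterd restart

# Confirm the CLI responds again.
gluster volume status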
Could you attach the complete Gluster logs for both systems please?
The upload limit is much too small to attach the complete Gluster logs, so I have made them available via my Google Drive:

Logs from hgluster01: https://drive.google.com/file/d/0B_WRuYXQSKvrdUk4RVZ2enVqQ3c/view?usp=sharing
Logs from hgluster02: https://drive.google.com/file/d/0B_WRuYXQSKvrMWxFZ2YxdWVvWjA/view?usp=sharing
Were you able to download the logs?
Still seeing these errors. Here is another example from the quota crawl log:

[2015-07-08 22:40:06.802222] W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]
[2015-07-08 22:40:06.802304] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 1966580: LOOKUP() /\ => -1 (Permission denied)
[2015-07-08 23:24:53.044793] W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]
[2015-07-08 23:24:53.044868] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
[2015-07-08 23:24:53.045194] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2583463: LOOKUP() /\ => -1 (Permission denied)
[2015-07-08 23:24:53.048892] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
[2015-07-08 23:24:53.049195] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2583465: LOOKUP() /\ => -1 (Permission denied)
[2015-07-08 23:25:48.015659] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
[2015-07-08 23:25:48.016025] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2589784: LOOKUP() /\ => -1 (Permission denied)
[2015-07-08 23:25:48.019369] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
[2015-07-08 23:25:48.019703] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2589786: LOOKUP() /\ => -1 (Permission denied)
The message "W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]" repeated 7 times between [2015-07-08 23:24:53.044793] and [2015-07-08 23:25:48.019687]
[2015-07-09 00:33:53.186559] W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-0: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]
[2015-07-09 02:00:28.086666] W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]
[2015-07-09 02:00:28.095138] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
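In case it helps narrow this down, the quota-related extended attributes on the brick root can be dumped directly on each node; a sketch, assuming the brick path from this report:

# Dump all trusted.* xattrs on the brick root in hex; with quota enabled this should
# include keys such as trusted.glusterfs.quota.size written during the crawl.
getfattr -d -m . -e hex /gluster_data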
The patches mentioned below fix the problem:

http://review.gluster.org/#/c/13825/
http://review.gluster.org/#/c/13909/
http://review.gluster.org/#/c/13862/
Attached SF Case 01649316 as potentially related to this BZ. If someone can confirm or deny that it's the same issue, it would be appreciated. Please let me know if I can provide any additional information germane to making that assessment.
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.