Description of problem:
Enabling quota causes many errors in the logs which I do not understand.

Version-Release number of selected component (if applicable):
3.7.1

How reproducible:
I am unable to test in a controlled environment, but on my 2 node/brick distributed volume it happens every time.

Steps to Reproduce:
1. Enable quota
2.
3.

Actual results:
Many LOOKUP errors on a path which does not exist, referencing a UUID for a node/brick that does not exist either.

Expected results:
No errors.

Additional info:
glusterfs-client-xlators-3.7.1-1.el6.x86_64
glusterfs-server-3.7.1-1.el6.x86_64
glusterfs-3.7.1-1.el6.x86_64
glusterfs-api-3.7.1-1.el6.x86_64
glusterfs-cli-3.7.1-1.el6.x86_64
glusterfs-libs-3.7.1-1.el6.x86_64
glusterfs-fuse-3.7.1-1.el6.x86_64

[root@hgluster01 ~]# cat /var/lib/glusterd/glusterd.info
UUID=875dbae1-82bd-485f-98e4-b7c5562e4da1
operating-version=30700

[root@hgluster02 ~]# cat /var/lib/glusterd/glusterd.info
UUID=d85ec083-34f2-458c-9b31-4786462ca48e
operating-version=30700

[root@hgluster01 ~]# gluster volume status
Status of volume: export_volume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick hgluster01:/gluster_data              49152     0          Y       4712
Brick hgluster02:/gluster_data              49152     0          Y       24584
Quota Daemon on localhost                   N/A       N/A        Y       7938
Quota Daemon on hgluster02.red.dsic.com     N/A       N/A        Y       27770

Task Status of Volume export_volume
------------------------------------------------------------------------------
There are no active volume tasks

[root@hgluster01 ~]# gluster volume info
Volume Name: export_volume
Type: Distribute
Volume ID: c74cc970-31e2-4924-a244-4c70d958dadb
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: hgluster01:/gluster_data
Brick2: hgluster02:/gluster_data
Options Reconfigured:
features.trash: off
performance.readdir-ahead: on
server.root-squash: on
performance.io-cache: off
network.ping-timeout: 60
performance.write-behind-window-size: 1MB
server.allow-insecure: on
auth.allow: 192.168.10.*,10.0.10.*,10.8.0.*,10.2.0.*,10.0.60.*
nfs.disable: on
cluster.eager-lock: on
features.quota: on
performance.io-thread-count: 24
performance.read-ahead: on
performance.client-io-threads: on
performance.quick-read: off
performance.flush-behind: on
performance.write-behind: on
performance.stat-prefetch: on
diagnostics.brick-log-level: ERROR
performance.cache-size: 1GB
features.inode-quota: on

I am seeing lots of errors like the following on both nodes. This path does not exist, and as far as I can tell neither does this brick UUID:

[2015-06-08 16:19:16.855070] E [server-rpc-fops.c:148:server_lookup_cbk] 0-export_volume-server: 238287: LOOKUP /\ (00000000-0000-0000-0000-000000000001/\) ==> (Permission denied)

I am also getting similar errors about paths which do actually exist, but with the same UUID as in the error message above:

[2015-06-08 16:37:46.458016] E [server-rpc-fops.c:148:server_lookup_cbk] 0-export_volume-server: 456799: LOOKUP /rclough (00000000-0000-0000-0000-000000000001/rclough) ==> (Permission denied)

I also got these errors shortly after enabling quota:

Jun  8 09:41:29 hgluster01 export_volume-quota-crawl[7948]: [2015-06-08 16:41:29.484670] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-export_volume-client-0: server 10.0.10.10:49152 has not responded in the last 60 seconds, disconnecting.
[2015-06-08 16:42:09.811310] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-06-08 16:42:12.980212] E [server-helpers.c:382:server_alloc_frame] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f781da78f80] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(get_frame_from_request+0x3a9)[0x7f7809605ad9] (--> /usr/lib64/glusterfs/3.7.1/xlator/protocol/server.so(server3_3_statfs+0x8e)[0x7f780960674e] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x295)[0x7f781d843db5] (--> /usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103)[0x7f781d843ff3] ))))) 0-server: invalid argument: client

We really need to be able to use quotas to control our users' data sprawl. How can I fix this? Please let me know if you need any additional information. Thank you.
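For reference, a minimal sketch of the commands behind step 1 above, assuming the standard quota CLI and the volume name from this report (the limit-usage path and size are purely illustrative, not something set on this volume):

# Enable quota on the volume; this starts the quota daemon and kicks off the initial crawl.
gluster volume quota export_volume enable

# Purely illustrative: once quota is enabled, a usage limit can be set on a directory.
gluster volume quota export_volume limit-usage /rclough 100GB

# Confirm the quota features are on for the volume.
gluster volume info export_volume | grep quota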
Now I am unable to run gluster commands:

[root@hgluster01 ~]# gluster volume status
Error : Request timed out

[root@hgluster02 ~]# gluster volume status
Error : Request timed out
Also, disabling quota does not kill the initial quota crawl from enabling quota:

[root@hgluster01 ~]# ps aux | grep find
root      7961  1.1  0.0 112528  1228 ?        S    08:57   0:48 /usr/bin/find . -exec /usr/bin/stat {} \;
root     28275  1.8  0.0 112528  1232 ?        R    10:03   0:03 /usr/bin/find . -exec /usr/bin/setfattr -n glusterfs.quota-xattr-cleanup -v 1 {} \;
root     29955  0.0  0.0 103244   896 pts/2    S+   10:06   0:00 grep find
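A rough sketch of how such leftover crawl processes could be stopped by hand, assuming the find command lines shown above (use with care, since the crawl runs against the live brick):

# List the crawl helpers that survived disabling quota (the bracket trick excludes the grep itself).
ps aux | grep '[f]ind . -exec'

# Terminate them by matching their full command lines; adjust the patterns if yours differ.
pkill -f '/usr/bin/find . -exec /usr/bin/stat'
pkill -f 'glusterfs.quota-xattr-cleanup'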
Restarting glusterd on both nodes allows me to run gluster commands again.
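For completeness, a sketch of that restart on these EL6 nodes (run on both):

# EL6 uses SysV init scripts; this restarts only the management daemon,
# the brick processes keep serving data.
service glusterd restart

# Confirm the CLI responds again.
gluster volume status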
Could you attach the complete Gluster logs for both systems please?
The upload limit is much too small to attach the complete Gluster logs, so I have made them available via my Google Drive:

Logs from hgluster01: https://drive.google.com/file/d/0B_WRuYXQSKvrdUk4RVZ2enVqQ3c/view?usp=sharing
Logs from hgluster02: https://drive.google.com/file/d/0B_WRuYXQSKvrMWxFZ2YxdWVvWjA/view?usp=sharing
Were you able to download the logs?
Still seeing these errors. Here is another example from the quota crawl log:

[2015-07-08 22:40:06.802222] W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]
[2015-07-08 22:40:06.802304] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 1966580: LOOKUP() /\ => -1 (Permission denied)
[2015-07-08 23:24:53.044793] W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]
[2015-07-08 23:24:53.044868] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
[2015-07-08 23:24:53.045194] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2583463: LOOKUP() /\ => -1 (Permission denied)
[2015-07-08 23:24:53.048892] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
[2015-07-08 23:24:53.049195] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2583465: LOOKUP() /\ => -1 (Permission denied)
[2015-07-08 23:25:48.015659] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
[2015-07-08 23:25:48.016025] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2589784: LOOKUP() /\ => -1 (Permission denied)
[2015-07-08 23:25:48.019369] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
[2015-07-08 23:25:48.019703] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2589786: LOOKUP() /\ => -1 (Permission denied)
The message "W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]" repeated 7 times between [2015-07-08 23:24:53.044793] and [2015-07-08 23:25:48.019687]
[2015-07-09 00:33:53.186559] W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-0: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]
[2015-07-09 02:00:28.086666] W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (00000000-0000-0000-0000-000000000000) [Permission denied]
[2015-07-09 02:00:28.095138] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/\: failed to resolve (Permission denied)
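In case it helps narrow this down, the quota-related extended attributes on the brick root can be dumped directly on each node; a sketch, assuming the brick path from this report:

# Dump all trusted.* xattrs on the brick root in hex; with quota enabled this should
# include keys such as trusted.glusterfs.quota.size written during the crawl.
getfattr -d -m . -e hex /gluster_data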
The patches mentioned below fix the problem:

http://review.gluster.org/#/c/13825/
http://review.gluster.org/#/c/13909/
http://review.gluster.org/#/c/13862/
Attached SF Case 01649316 as potentially related to this BZ. If someone can confirm or deny that it's the same issue, it would be appreciated. Please let me know if I can provide any additional information germane to making that assessment.
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.