+++ This bug was initially created as a clone of Bug #1394548 +++ Description of problem: Whenever we see some error on the brick it is difficult to find which xlator is giving this error and why. I am raising this bug to track a recent issue where I have to deal with figuring out why posix-acl is giving EACCES. Not completely sure if posix-acl is indeed the xlator that is giving this error either. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: --- Additional comment from Pranith Kumar K on 2016-11-13 06:44:04 EST --- I will improve/add enough logs to make debugging such issues easier. --- Additional comment from Worker Ant on 2016-11-13 06:45:39 EST --- REVIEW: http://review.gluster.org/15836 (protocol/server: Print error-xlator name) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu) --- Additional comment from Worker Ant on 2016-11-13 06:45:42 EST --- REVIEW: http://review.gluster.org/15837 (system/posix: Log reason for EACCES) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu) --- Additional comment from Pranith Kumar K on 2016-11-13 06:48:14 EST --- This is how it will look-like: Server xlator prints the xlator which gave the error: [2016-11-13 11:37:04.240437] E [MSGID: 115050] [server-rpc-fops.c:155:server_lookup_cbk] 0-d1-server: 20: LOOKUP /.Trash-1000 (00000000-0000-0000-0000-000000000001/.Trash-1000), error-xlator: d1-access-control [Permission denied] Posix-acl logs the following error: [2016-11-13 11:37:04.240723] E [MSGID: 139001] [posix-acl.c:245:posix_acl_log_permit_denied] 0-d1-access-control: client: dhcp35-190.lab.eng.blr.redhat.com-20476-2016/11/13-11:37:00:218336-d1-client-0-0-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:1000,gid:1000,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:755,updated-fop:LOOKUP, acl:-) [Permission denied] --- Additional comment from Worker Ant on 2016-11-15 13:50:41 EST --- REVIEW: http://review.gluster.org/15836 (protocol/server: Print error-xlator name) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu) --- Additional comment from Pranith Kumar K on 2016-11-15 13:51:15 EST --- Tests done: [2016-11-15 16:15:55.276972] W [MSGID: 115049] [server-rpc-fops.c:61:server_statfs_cbk] 0-d1-server: 30: STATFS, client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Operation not permitted] [2016-11-15 16:19:32.702160] E [MSGID: 115050] [server-rpc-fops.c:168:server_lookup_cbk] 0-d1-server: 33: LOOKUP / (00000000-0000-0000-0000-000000000001), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable] [2016-11-15 16:22:14.103250] E [MSGID: 115056] [server-rpc-fops.c:512:server_mkdir_cbk] 0-d1-server: 82: MKDIR /d1 (00000000-0000-0000-0000-000000000001/d1) client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Permission denied] [2016-11-15 16:24:07.672258] I [MSGID: 115056] [server-rpc-fops.c:467:server_rmdir_cbk] 0-d1-server: 415: RMDIR /d1 (00000000-0000-0000-0000-000000000001/d1), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Operation not permitted] [2016-11-15 16:26:22.560947] E [MSGID: 115056] [server-rpc-fops.c:1860:server_readdirp_cbk] 0-d1-server: 459: READDIRP 0 (00000000-0000-0000-0000-000000000001), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Transport endpoint is not connected] [2016-11-15 16:27:22.081672] E [MSGID: 115056] [server-rpc-fops.c:679:server_opendir_cbk] 0-d1-server: 501: OPENDIR / (00000000-0000-0000-0000-000000000001), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable] [2016-11-15 16:29:53.057811] I [MSGID: 115060] [server-rpc-fops.c:914:server_setxattr_cbk] 0-d1-server: client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Numerical result out of range] [2016-11-15 17:28:32.951020] I [MSGID: 115061] [server-rpc-fops.c:1024:server_rename_cbk] 0-d1-server: 867: RENAME /a (00000000-0000-0000-0000-000000000000/a) -> /b (00000000-0000-0000-0000-000000000000/b), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Operation not permitted] [2016-11-15 17:31:16.111967] E [MSGID: 115062] [server-rpc-fops.c:1066:server_unlink_cbk] 0-d1-server: 954: UNLINK /b (00000000-0000-0000-0000-000000000001/b), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable] [2016-11-15 17:32:22.888055] I [MSGID: 115062] [server-rpc-fops.c:1111:server_symlink_cbk] 0-d1-server: 1122: SYMLINK /c (00000000-0000-0000-0000-000000000001/c), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator:d1-posix [Permission denied] [2016-11-15 17:33:36.232374] I [MSGID: 115062] [server-rpc-fops.c:1158:server_link_cbk] 0-d1-server: 1224: LINK /b (7d426df8-c507-441c-b886-0f2944e02434) -> 00000000-0000-0000-0000-000000000001/c, client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable] [2016-11-15 17:35:02.425027] E [MSGID: 115067] [server-rpc-fops.c:1391:server_writev_cbk] 0-d1-server: 1275: WRITEV 0 (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable] [2016-11-15 17:35:02.425027] E [MSGID: 115067] [server-rpc-fops.c:1391:server_writev_cbk] 0-d1-server: 1275: WRITEV 0 (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable] [2016-11-15 17:45:02.357029] I [MSGID: 115063] [server-rpc-fops.c:1198:server_truncate_cbk] 0-d1-server: 1403: TRUNCATE /a (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Transport endpoint is not connected] [2016-11-15 17:45:59.293328] E [MSGID: 115063] [server-rpc-fops.c:1276:server_ftruncate_cbk] 0-d1-server: 1446: FTRUNCATE 0 (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable] [2016-11-15 17:47:05.029708] E [MSGID: 115065] [server-rpc-fops.c:1315:server_flush_cbk] 0-d1-server: 1493: FLUSH 0 (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Transport endpoint is not connected] [2016-11-15 17:49:49.998398] E [MSGID: 115066] [server-rpc-fops.c:1352:server_fsync_cbk] 0-d1-server: 1649: FSYNC 0 (8a7a8ccc-c7ad-467b-a31a-bd0624ebe184), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Too many open files] [2016-11-15 17:52:50.151625] E [MSGID: 115068] [server-rpc-fops.c:1442:server_readv_cbk] 0-d1-server: 1922: READV -2 (8a7a8ccc-c7ad-467b-a31a-bd0624ebe184), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Level 3 reset] [2016-11-15 17:54:32.968047] E [MSGID: 115070] [server-rpc-fops.c:1520:server_open_cbk] 0-d1-server: 2051: OPEN /b.txt (4fae6044-d912-462e-9945-2118ffb8e5fb), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Transport endpoint is not connected] [2016-11-15 18:00:54.058103] I [MSGID: 115071] [server-rpc-fops.c:1562:server_create_cbk] 0-d1-server: 2146: CREATE /c (00000000-0000-0000-0000-000000000001/c), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Stale file handle] [2016-11-15 18:01:46.593861] I [MSGID: 115062] [server-rpc-fops.c:1614:server_readlink_cbk] 0-d1-server: 2192: READLINK /b (7d426df8-c507-441c-b886-0f2944e02434), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [No such file or directory] [2016-11-15 18:02:45.994104] I [MSGID: 115072] [server-rpc-fops.c:1692:server_setattr_cbk] 0-d1-server: 2249: SETATTR /a (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Input/output error] [2016-11-15 18:28:56.729535] E [MSGID: 115053] [server-rpc-fops.c:316:server_finodelk_cbk] 0-r2-server: 82: FINODELK 0 (03da68c1-d612-4315-bce6-33be333835f3), client: dhcp35-190.lab.eng.blr.redhat.com-28126-2016/11/15-18:27:35:326952-r2-client-0-0-0, error-xlator: r2-locks [Cannot allocate memory] [2016-11-15 18:30:22.018119] E [MSGID: 115053] [server-rpc-fops.c:278:server_inodelk_cbk] 0-r2-server: 166: INODELK /a.txt (03da68c1-d612-4315-bce6-33be333835f3), client: dhcp35-190.lab.eng.blr.redhat.com-28126-2016/11/15-18:27:35:326952-r2-client-0-0-0, error-xlator: r2-locks [Cannot allocate memory] [2016-11-15 18:31:42.018708] E [MSGID: 115054] [server-rpc-fops.c:353:server_entrylk_cbk] 0-r2-server: 228: ENTRYLK / (00000000-0000-0000-0000-000000000001), client: dhcp35-190.lab.eng.blr.redhat.com-28126-2016/11/15-18:27:35:326952-r2-client-0-0-0, error-xlator: r2-locks [Cannot allocate memory] [2016-11-15 18:33:55.268172] E [MSGID: 115073] [server-rpc-fops.c:1771:server_xattrop_cbk] 0-r2-server: 128: XATTROP /a (ce82cf09-8869-4762-b34c-59fa0822373b), client: dhcp35-190.lab.eng.blr.redhat.com-28689-2016/11/15-18:32:59:379301-r2-client-0-0-0, error-xlator: r2-posix [Transport endpoint is not connected] [2016-11-15 18:35:26.500727] E [MSGID: 115073] [server-rpc-fops.c:1814:server_fxattrop_cbk] 0-r2-server: 283: FXATTROP 1 (ce82cf09-8869-4762-b34c-59fa0822373b), client: dhcp35-190.lab.eng.blr.redhat.com-28689-2016/11/15-18:32:59:379301-r2-client-0-0-0, error-xlator: r2-posix [Too many open files in system] [2016-11-15 18:41:05.990114] E [MSGID: 115074] [server-rpc-fops.c:1911:server_fallocate_cbk] 0-r2-server: 639: FALLOCATE 0 (7e532b5e-9175-4b00-a360-5283bd3d4e81), client: dhcp35-190.lab.eng.blr.redhat.com-29224-2016/11/15-18:37:58:466209-r2-client-0-0-0, error-xlator: - [Level 2 not synchronized] [2016-11-15 18:44:47.815369] I [MSGID: 115058] [server-rpc-fops.c:728:server_removexattr_cbk] 0-r2-server: 1165: REMOVEXATTR /a.txt (7e532b5e-9175-4b00-a360-5283bd3d4e81) of key trusted.abc, client: dhcp35-190.lab.eng.blr.redhat.com-29224-2016/11/15-18:37:58:466209-r2-client-0-0-0, error-xlator: r2-posix [Invalid request code] [2016-11-15 18:49:33.433095] E [MSGID: 115059] [server-rpc-fops.c:802:server_getxattr_cbk] 0-r2-server: 1628: GETXATTR /a.txt (7e532b5e-9175-4b00-a360-5283bd3d4e81) (trusted.abc), client: dhcp35-190.lab.eng.blr.redhat.com-29224-2016/11/15-18:37:58:466209-r2-client-1-0-0, error-xlator: r2-posix [File exists] --- Additional comment from Worker Ant on 2016-11-16 04:37:03 EST --- REVIEW: http://review.gluster.org/15837 (system/posix-acl: Log reason for EACCES) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu) --- Additional comment from Worker Ant on 2016-11-16 05:56:09 EST --- COMMIT: http://review.gluster.org/15836 committed in master by Pranith Kumar Karampuri (pkarampu) ------ commit 83311c5bb735508274bb25cc610e929481577f9b Author: Pranith Kumar K <pkarampu> Date: Sat Nov 12 16:47:00 2016 +0530 protocol/server: Print error-xlator name Problem: At the moment from which xlator the errors are stemming from is a mystery. Fix: With this patch we can find on the server side which xlator first gave the errno received by server xlator. I am not yet sure how to get this done for client side which has lot of copy_frame()s. May be another patch. Change-Id: Ie13307b965facf2a496123e81ce0bd6756f98ac9 BUG: 1394548 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/15836 Smoke: Gluster Build System <jenkins.org> NetBSD-regression: NetBSD Build System <jenkins.org> Reviewed-by: Vijay Bellur <vbellur> CentOS-regression: Gluster Build System <jenkins.org> --- Additional comment from Worker Ant on 2016-11-16 11:27:55 EST --- REVIEW: http://review.gluster.org/15837 (system/posix-acl: Log reason for EACCES) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu) --- Additional comment from Worker Ant on 2016-11-17 04:56:17 EST --- COMMIT: http://review.gluster.org/15837 committed in master by Pranith Kumar Karampuri (pkarampu) ------ commit 35e8ecabd7db36431ea93f523095bc21078ecef8 Author: Pranith Kumar K <pkarampu> Date: Sun Nov 13 16:43:36 2016 +0530 system/posix-acl: Log reason for EACCES It is becoming increasingly difficult to debug the reason why posix-acl decides to fail a fop with EACCES. This patch prints a big log everytime such a condition occurs giving out the details that may help in finding why the fop is errored out. Change-Id: I2505baaafb5d77ef6c187554ff027df9b20468db BUG: 1394548 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/15837 Smoke: Gluster Build System <jenkins.org> NetBSD-regression: NetBSD Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.org> Reviewed-by: Raghavendra Talur <rtalur> --- Additional comment from Worker Ant on 2016-11-17 10:33:13 EST --- REVIEW: http://review.gluster.org/15872 (protocol/server: Print pargfid in logs for rename error) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/15886 (protocol/server: Print error-xlator name) posted (#1) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/15887 (protocol/server: Print pargfid in logs for rename error) posted (#2) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/15886 (protocol/server: Print error-xlator name) posted (#2) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/16058 (protocol/server: Remove unused variable) posted (#1) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/15887 committed in release-3.9 by Raghavendra Talur (rtalur) ------ commit 4b52890f52978231e0ccbcc2a4461683c7ca4b78 Author: Pranith Kumar K <pkarampu> Date: Thu Nov 17 21:02:29 2016 +0530 protocol/server: Print pargfid in logs for rename error Backport of http://review.gluster.org/15872 >BUG: 1394548 >Change-Id: I42ee627c8cdf54158f083f9019a096ace449e3cc >Signed-off-by: Pranith Kumar K <pkarampu> BUG: 1396780 Change-Id: I80c3d3c7994982b812857c6705b3bf4716bf0824 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/15887 Smoke: Gluster Build System <jenkins.org> NetBSD-regression: NetBSD Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.org> Reviewed-by: Raghavendra Talur <rtalur>
COMMIT: http://review.gluster.org/15886 committed in release-3.9 by Raghavendra Talur (rtalur) ------ commit 114537a2e11b3811dbf4f1ca0d4fd74d0c0b23c3 Author: Pranith Kumar K <pkarampu> Date: Sat Nov 12 16:47:00 2016 +0530 protocol/server: Print error-xlator name Problem: At the moment from which xlator the errors are stemming from is a mystery. Fix: With this patch we can find on the server side which xlator first gave the errno received by server xlator. I am not yet sure how to get this done for client side which has lot of copy_frame()s. May be another patch. >Change-Id: Ie13307b965facf2a496123e81ce0bd6756f98ac9 >BUG: 1394548 >Signed-off-by: Pranith Kumar K <pkarampu> >Reviewed-on: http://review.gluster.org/15836 >Smoke: Gluster Build System <jenkins.org> >NetBSD-regression: NetBSD Build System <jenkins.org> >Reviewed-by: Vijay Bellur <vbellur> >CentOS-regression: Gluster Build System <jenkins.org> BUG: 1396780 Change-Id: I3b293d21528da391eafd0fbaa5b451a1d3ddc237 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/15886 CentOS-regression: Gluster Build System <jenkins.org> Smoke: Gluster Build System <jenkins.org> NetBSD-regression: NetBSD Build System <jenkins.org> Reviewed-by: Raghavendra Talur <rtalur>
COMMIT: http://review.gluster.org/16058 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu) ------ commit 670fe5633d6d80db380c2916613c457b4522c454 Author: Anoop C S <anoopcs> Date: Tue Nov 22 06:34:35 2016 +0000 protocol/server: Remove unused variable >Change-Id: I0d0a786b2d02d4db37c4da6194ee4b4feac31b63 >BUG: 1198849 >Signed-off-by: Anoop C S <anoopcs> >Reviewed-on: http://review.gluster.org/15899 >Smoke: Gluster Build System <jenkins.org> >NetBSD-regression: NetBSD Build System <jenkins.org> >CentOS-regression: Gluster Build System <jenkins.org> >Reviewed-by: Vijay Bellur <vbellur> BUG: 1396780 Change-Id: I78cb34d0f91c2f375c2e4e413337253b34987baa Signed-off-by: Anoop C S <anoopcs> Reviewed-on: http://review.gluster.org/16058 Tested-by: Pranith Kumar Karampuri <pkarampu> Smoke: Gluster Build System <jenkins.org> NetBSD-regression: NetBSD Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.1, please open a new bug report. glusterfs-3.9.1 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html [2] https://www.gluster.org/pipermail/gluster-users/