Bug 1394548 - Make debugging EACCES errors easier to debug
Summary: Make debugging EACCES errors easier to debug
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: posix-acl
Version: mainline
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1396780
TreeView+ depends on / blocked
 
Reported: 2016-11-13 11:43 UTC by Pranith Kumar K
Modified: 2017-03-06 17:33 UTC (History)
2 users (show)

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1396780 (view as bug list)
Environment:
Last Closed: 2017-03-06 17:33:37 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Pranith Kumar K 2016-11-13 11:43:12 UTC
Description of problem:
Whenever we see some error on the brick it is difficult to find which xlator is giving this error and why. I am raising this bug to track a recent issue where I have to deal with figuring out why posix-acl is giving EACCES. Not completely sure if posix-acl is indeed the xlator that is giving this error either.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Pranith Kumar K 2016-11-13 11:44:04 UTC
I will improve/add enough logs to make debugging such issues easier.

Comment 2 Worker Ant 2016-11-13 11:45:39 UTC
REVIEW: http://review.gluster.org/15836 (protocol/server: Print error-xlator name) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Worker Ant 2016-11-13 11:45:42 UTC
REVIEW: http://review.gluster.org/15837 (system/posix: Log reason for EACCES) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Pranith Kumar K 2016-11-13 11:48:14 UTC
This is how it will look-like:

Server xlator prints the xlator which gave the error:
[2016-11-13 11:37:04.240437] E [MSGID: 115050] [server-rpc-fops.c:155:server_lookup_cbk] 0-d1-server: 20: LOOKUP /.Trash-1000 (00000000-0000-0000-0000-000000000001/.Trash-1000), error-xlator: d1-access-control [Permission denied]

Posix-acl logs the following error:
[2016-11-13 11:37:04.240723] E [MSGID: 139001] [posix-acl.c:245:posix_acl_log_permit_denied] 0-d1-access-control: client: dhcp35-190.lab.eng.blr.redhat.com-20476-2016/11/13-11:37:00:218336-d1-client-0-0-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:1000,gid:1000,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:755,updated-fop:LOOKUP, acl:-) [Permission denied]

Comment 5 Worker Ant 2016-11-15 18:50:41 UTC
REVIEW: http://review.gluster.org/15836 (protocol/server: Print error-xlator name) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Pranith Kumar K 2016-11-15 18:51:15 UTC
Tests done:
[2016-11-15 16:15:55.276972] W [MSGID: 115049] [server-rpc-fops.c:61:server_statfs_cbk] 0-d1-server: 30: STATFS, client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Operation not permitted]

[2016-11-15 16:19:32.702160] E [MSGID: 115050] [server-rpc-fops.c:168:server_lookup_cbk] 0-d1-server: 33: LOOKUP / (00000000-0000-0000-0000-000000000001), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable]

[2016-11-15 16:22:14.103250] E [MSGID: 115056] [server-rpc-fops.c:512:server_mkdir_cbk] 0-d1-server: 82: MKDIR /d1 (00000000-0000-0000-0000-000000000001/d1) client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Permission denied]

[2016-11-15 16:24:07.672258] I [MSGID: 115056] [server-rpc-fops.c:467:server_rmdir_cbk] 0-d1-server: 415: RMDIR /d1 (00000000-0000-0000-0000-000000000001/d1), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Operation not permitted]

[2016-11-15 16:26:22.560947] E [MSGID: 115056] [server-rpc-fops.c:1860:server_readdirp_cbk] 0-d1-server: 459: READDIRP 0 (00000000-0000-0000-0000-000000000001), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Transport endpoint is not connected]

[2016-11-15 16:27:22.081672] E [MSGID: 115056] [server-rpc-fops.c:679:server_opendir_cbk] 0-d1-server: 501: OPENDIR / (00000000-0000-0000-0000-000000000001), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable]

[2016-11-15 16:29:53.057811] I [MSGID: 115060] [server-rpc-fops.c:914:server_setxattr_cbk] 0-d1-server: client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Numerical result out of range]

[2016-11-15 17:28:32.951020] I [MSGID: 115061] [server-rpc-fops.c:1024:server_rename_cbk] 0-d1-server: 867: RENAME /a (00000000-0000-0000-0000-000000000000/a) -> /b (00000000-0000-0000-0000-000000000000/b), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Operation not permitted]

[2016-11-15 17:31:16.111967] E [MSGID: 115062] [server-rpc-fops.c:1066:server_unlink_cbk] 0-d1-server: 954: UNLINK /b (00000000-0000-0000-0000-000000000001/b), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable]

[2016-11-15 17:32:22.888055] I [MSGID: 115062] [server-rpc-fops.c:1111:server_symlink_cbk] 0-d1-server: 1122: SYMLINK /c (00000000-0000-0000-0000-000000000001/c), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator:d1-posix [Permission denied]

[2016-11-15 17:33:36.232374] I [MSGID: 115062] [server-rpc-fops.c:1158:server_link_cbk] 0-d1-server: 1224: LINK /b (7d426df8-c507-441c-b886-0f2944e02434) -> 00000000-0000-0000-0000-000000000001/c, client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable]

[2016-11-15 17:35:02.425027] E [MSGID: 115067] [server-rpc-fops.c:1391:server_writev_cbk] 0-d1-server: 1275: WRITEV 0 (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable]

[2016-11-15 17:35:02.425027] E [MSGID: 115067] [server-rpc-fops.c:1391:server_writev_cbk] 0-d1-server: 1275: WRITEV 0 (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable]

[2016-11-15 17:45:02.357029] I [MSGID: 115063] [server-rpc-fops.c:1198:server_truncate_cbk] 0-d1-server: 1403: TRUNCATE /a (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Transport endpoint is not connected]

[2016-11-15 17:45:59.293328] E [MSGID: 115063] [server-rpc-fops.c:1276:server_ftruncate_cbk] 0-d1-server: 1446: FTRUNCATE 0 (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Resource temporarily unavailable]

[2016-11-15 17:47:05.029708] E [MSGID: 115065] [server-rpc-fops.c:1315:server_flush_cbk] 0-d1-server: 1493: FLUSH 0 (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Transport endpoint is not connected]

[2016-11-15 17:49:49.998398] E [MSGID: 115066] [server-rpc-fops.c:1352:server_fsync_cbk] 0-d1-server: 1649: FSYNC 0 (8a7a8ccc-c7ad-467b-a31a-bd0624ebe184), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Too many open files]

[2016-11-15 17:52:50.151625] E [MSGID: 115068] [server-rpc-fops.c:1442:server_readv_cbk] 0-d1-server: 1922: READV -2 (8a7a8ccc-c7ad-467b-a31a-bd0624ebe184), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Level 3 reset]

[2016-11-15 17:54:32.968047] E [MSGID: 115070] [server-rpc-fops.c:1520:server_open_cbk] 0-d1-server: 2051: OPEN /b.txt (4fae6044-d912-462e-9945-2118ffb8e5fb), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Transport endpoint is not connected]

[2016-11-15 18:00:54.058103] I [MSGID: 115071] [server-rpc-fops.c:1562:server_create_cbk] 0-d1-server: 2146: CREATE /c (00000000-0000-0000-0000-000000000001/c), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Stale file handle]

[2016-11-15 18:01:46.593861] I [MSGID: 115062] [server-rpc-fops.c:1614:server_readlink_cbk] 0-d1-server: 2192: READLINK /b (7d426df8-c507-441c-b886-0f2944e02434), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [No such file or directory]

[2016-11-15 18:02:45.994104] I [MSGID: 115072] [server-rpc-fops.c:1692:server_setattr_cbk] 0-d1-server: 2249: SETATTR /a (e844b0ce-d9fc-47de-a28a-a2c49055e469), client: dhcp35-190.lab.eng.blr.redhat.com-15882-2016/11/15-16:14:37:688661-d1-client-0-0-0, error-xlator: d1-posix [Input/output error]

[2016-11-15 18:28:56.729535] E [MSGID: 115053] [server-rpc-fops.c:316:server_finodelk_cbk] 0-r2-server: 82: FINODELK 0 (03da68c1-d612-4315-bce6-33be333835f3), client: dhcp35-190.lab.eng.blr.redhat.com-28126-2016/11/15-18:27:35:326952-r2-client-0-0-0, error-xlator: r2-locks [Cannot allocate memory]

[2016-11-15 18:30:22.018119] E [MSGID: 115053] [server-rpc-fops.c:278:server_inodelk_cbk] 0-r2-server: 166: INODELK /a.txt (03da68c1-d612-4315-bce6-33be333835f3), client: dhcp35-190.lab.eng.blr.redhat.com-28126-2016/11/15-18:27:35:326952-r2-client-0-0-0, error-xlator: r2-locks [Cannot allocate memory]

[2016-11-15 18:31:42.018708] E [MSGID: 115054] [server-rpc-fops.c:353:server_entrylk_cbk] 0-r2-server: 228: ENTRYLK / (00000000-0000-0000-0000-000000000001), client: dhcp35-190.lab.eng.blr.redhat.com-28126-2016/11/15-18:27:35:326952-r2-client-0-0-0, error-xlator: r2-locks [Cannot allocate memory]

[2016-11-15 18:33:55.268172] E [MSGID: 115073] [server-rpc-fops.c:1771:server_xattrop_cbk] 0-r2-server: 128: XATTROP /a (ce82cf09-8869-4762-b34c-59fa0822373b), client: dhcp35-190.lab.eng.blr.redhat.com-28689-2016/11/15-18:32:59:379301-r2-client-0-0-0, error-xlator: r2-posix [Transport endpoint is not connected]

[2016-11-15 18:35:26.500727] E [MSGID: 115073] [server-rpc-fops.c:1814:server_fxattrop_cbk] 0-r2-server: 283: FXATTROP 1 (ce82cf09-8869-4762-b34c-59fa0822373b), client: dhcp35-190.lab.eng.blr.redhat.com-28689-2016/11/15-18:32:59:379301-r2-client-0-0-0, error-xlator: r2-posix [Too many open files in system]

[2016-11-15 18:41:05.990114] E [MSGID: 115074] [server-rpc-fops.c:1911:server_fallocate_cbk] 0-r2-server: 639: FALLOCATE 0 (7e532b5e-9175-4b00-a360-5283bd3d4e81), client: dhcp35-190.lab.eng.blr.redhat.com-29224-2016/11/15-18:37:58:466209-r2-client-0-0-0, error-xlator: - [Level 2 not synchronized]

[2016-11-15 18:44:47.815369] I [MSGID: 115058] [server-rpc-fops.c:728:server_removexattr_cbk] 0-r2-server: 1165: REMOVEXATTR /a.txt (7e532b5e-9175-4b00-a360-5283bd3d4e81) of key trusted.abc, client: dhcp35-190.lab.eng.blr.redhat.com-29224-2016/11/15-18:37:58:466209-r2-client-0-0-0, error-xlator: r2-posix [Invalid request code]

[2016-11-15 18:49:33.433095] E [MSGID: 115059] [server-rpc-fops.c:802:server_getxattr_cbk] 0-r2-server: 1628: GETXATTR /a.txt (7e532b5e-9175-4b00-a360-5283bd3d4e81) (trusted.abc), client: dhcp35-190.lab.eng.blr.redhat.com-29224-2016/11/15-18:37:58:466209-r2-client-1-0-0, error-xlator: r2-posix [File exists]

Comment 7 Worker Ant 2016-11-16 09:37:03 UTC
REVIEW: http://review.gluster.org/15837 (system/posix-acl: Log reason for EACCES) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 8 Worker Ant 2016-11-16 10:56:09 UTC
COMMIT: http://review.gluster.org/15836 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 83311c5bb735508274bb25cc610e929481577f9b
Author: Pranith Kumar K <pkarampu>
Date:   Sat Nov 12 16:47:00 2016 +0530

    protocol/server: Print error-xlator name
    
    Problem:
    At the moment from which xlator the errors are stemming from is a mystery.
    
    Fix:
    With this patch we can find on the server side which xlator first gave
    the errno received by server xlator. I am not yet sure how to get this
    done for client side which has lot of copy_frame()s. May be another
    patch.
    
    Change-Id: Ie13307b965facf2a496123e81ce0bd6756f98ac9
    BUG: 1394548
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15836
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Vijay Bellur <vbellur>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 9 Worker Ant 2016-11-16 16:27:55 UTC
REVIEW: http://review.gluster.org/15837 (system/posix-acl: Log reason for EACCES) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 10 Worker Ant 2016-11-17 09:56:17 UTC
COMMIT: http://review.gluster.org/15837 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 35e8ecabd7db36431ea93f523095bc21078ecef8
Author: Pranith Kumar K <pkarampu>
Date:   Sun Nov 13 16:43:36 2016 +0530

    system/posix-acl: Log reason for EACCES
    
    It is becoming increasingly difficult to debug the reason why posix-acl decides
    to fail a fop with EACCES. This patch prints a big log everytime such
    a condition occurs giving out the details that may help in finding why the fop
    is errored out.
    
    Change-Id: I2505baaafb5d77ef6c187554ff027df9b20468db
    BUG: 1394548
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15837
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra Talur <rtalur>

Comment 11 Worker Ant 2016-11-17 15:33:13 UTC
REVIEW: http://review.gluster.org/15872 (protocol/server: Print pargfid in logs for rename error) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 12 Worker Ant 2016-11-20 05:52:34 UTC
REVIEW: http://review.gluster.org/15887 (protocol/server: Print pargfid in logs for rename error) posted (#1) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)

Comment 13 Worker Ant 2016-11-28 05:40:21 UTC
COMMIT: http://review.gluster.org/15872 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 2214501d6045b45dde469c5a476c86b6cf464c34
Author: Pranith Kumar K <pkarampu>
Date:   Thu Nov 17 21:02:29 2016 +0530

    protocol/server: Print pargfid in logs for rename error
    
    BUG: 1394548
    Change-Id: I42ee627c8cdf54158f083f9019a096ace449e3cc
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15872
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>

Comment 14 Shyamsundar 2017-03-06 17:33:37 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/


Note You need to log in before you can comment on or make changes to this bug.