Bug 1656000 - file access problem with encrypt xlator in case one brick is down
Summary: file access problem with encrypt xlator in case one brick is down
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: encryption-xlator
Version: 5
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-12-04 12:09 UTC by evangelos
Modified: 2019-05-18 06:34 UTC (History)
2 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-05-18 06:34:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 21864 0 None Open [crypt]: file access problem with encrypt xlator in case one brick is down 2018-12-14 15:19:01 UTC

Description evangelos 2018-12-04 12:09:41 UTC
Description of problem:
setup: replica x3 no distribution using encrypt xlator.
if 3 bricks are up all clients are able to access all the files.
if one brick goes down, every client is able to access the files that created.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
setup: replica x3 no distribution using encrypt xlator.
if 3 bricks are up all clients are able to access all the files.
if one brick goes down, every client is able to access the files that created.

# ls -la /client_a_v10_1/filecreatedfromclient_a_2
ls: cannot access /client_a_v10_1/filecreatedfromclient_a_2: Invalid argument

[2018-11-28 16:48:19.095228] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup /
[2018-11-28 16:48:19.096445] D [MSGID: 0] [dht-common.c:1046:dht_revalidate_cbk] 0-oam_enc-dht: revalidate lookup of / returned with op_ret 0 [Structure needs cleaning]
[2018-11-28 16:48:19.096631] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup /filecreatedfromclient_a_2
[2018-11-28 16:48:19.096663] D [MSGID: 0] [dht-common.c:2743:dht_lookup] 0-oam_enc-dht: Calling fresh lookup for /filecreatedfromclient_a_2 on oam_enc-replicate-0
[2018-11-28 16:48:19.097397] D [MSGID: 0] [dht-common.c:2328:dht_lookup_cbk] 0-oam_enc-dht: fresh_lookup returned for /filecreatedfromclient_a_2 with op_ret 0
[2018-11-28 16:48:19.097429] D [MSGID: 0] [dht-layout.c:870:dht_layout_preset] 0-oam_enc-dht: file = 00000000-0000-0000-0000-000000000000, subvol = oam_enc-replicate-0
[2018-11-28 16:48:19.097482] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 0-oam_enc-replicate-0: 00000000-0000-0000-0000-000000000000: generation now vs cached: 2, 0
[2018-11-28 16:48:19.097992] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-oam_enc-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2018-11-28 16:48:19.098017] D [MSGID: 0] [client-rpc-fops.c:2936:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-client-0 returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.098188] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-oam_enc-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2018-11-28 16:48:19.098206] D [MSGID: 0] [client-rpc-fops.c:2936:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-client-1 returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.098231] D [MSGID: 0] [afr-inode-read.c:522:afr_getxattr_wind] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-replicate-0 returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.098246] D [MSGID: 0] [dht-common.c:3392:dht_getxattr_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-dht returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.098266] D [MSGID: 0] [crypt.c:3835:load_file_size] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-crypt returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.098279] D [MSGID: 0] [io-cache.c:267:ioc_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-io-cache returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.098287] D [MSGID: 0] [md-cache.c:1048:mdc_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-md-cache returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.098299] D [MSGID: 0] [io-stats.c:2186:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.098309] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/filecreatedfromclient_a_2: failed to resolve (Invalid argument)
[2018-11-28 16:48:19.098373] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup /filecreatedfromclient_a_2
[2018-11-28 16:48:19.098389] D [MSGID: 0] [dht-common.c:2743:dht_lookup] 0-oam_enc-dht: Calling fresh lookup for /filecreatedfromclient_a_2 on oam_enc-replicate-0
[2018-11-28 16:48:19.099080] D [MSGID: 0] [dht-common.c:2328:dht_lookup_cbk] 0-oam_enc-dht: fresh_lookup returned for /filecreatedfromclient_a_2 with op_ret 0
[2018-11-28 16:48:19.099099] D [MSGID: 0] [dht-layout.c:870:dht_layout_preset] 0-oam_enc-dht: file = 00000000-0000-0000-0000-000000000000, subvol = oam_enc-replicate-0
[2018-11-28 16:48:19.099126] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 0-oam_enc-replicate-0: 00000000-0000-0000-0000-000000000000: generation now vs cached: 2, 0
[2018-11-28 16:48:19.099690] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-oam_enc-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2018-11-28 16:48:19.099714] D [MSGID: 0] [client-rpc-fops.c:2936:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-client-0 returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.099735] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-oam_enc-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2018-11-28 16:48:19.099744] D [MSGID: 0] [client-rpc-fops.c:2936:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-client-1 returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.099780] D [MSGID: 0] [afr-inode-read.c:522:afr_getxattr_wind] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-replicate-0 returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.099788] D [MSGID: 0] [dht-common.c:3392:dht_getxattr_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-dht returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.099797] D [MSGID: 0] [crypt.c:3835:load_file_size] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-crypt returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.099805] D [MSGID: 0] [io-cache.c:267:ioc_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-io-cache returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.099812] D [MSGID: 0] [md-cache.c:1048:mdc_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-md-cache returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.099820] D [MSGID: 0] [io-stats.c:2186:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc returned -1 error: Invalid argument [Invalid argument]
[2018-11-28 16:48:19.099827] W [fuse-bridge.c:490:fuse_entry_cbk] 0-glusterfs-fuse: 104: LOOKUP() /filecreatedfromclient_a_2 => -1 (Invalid argument)



# ls -la /client_a_v10_2/filecreatedfromclient_a_2
-rw-r--r--. 1 root root 0 Nov 28 17:46 /client_a_v10_2/filecreatedfromclient_a_2


[2018-11-28 16:49:55.635933] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup /
[2018-11-28 16:49:55.637194] D [MSGID: 0] [dht-common.c:1046:dht_revalidate_cbk] 0-oam_enc-dht: revalidate lookup of / returned with op_ret 0 [Structure needs cleaning]
[2018-11-28 16:49:55.637349] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup /filecreatedfromclient_a_2
[2018-11-28 16:49:55.637394] D [MSGID: 0] [dht-common.c:2668:dht_lookup] 0-oam_enc-dht: calling revalidate lookup for /filecreatedfromclient_a_2 at oam_enc-replicate-0
[2018-11-28 16:49:55.638229] D [logging.c:1952:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk
[2018-11-28 16:49:55.638167] D [MSGID: 0] [dht-common.c:1046:dht_revalidate_cbk] 0-oam_enc-dht: revalidate lookup of /filecreatedfromclient_a_2 returned with op_ret 0
[2018-11-28 16:49:55.638228] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 0-oam_enc-replicate-0: e94dc9b7-f25b-4dbf-ba4b-ec20ec3e03b9: generation now vs cached: 2, 0
[2018-11-28 16:49:55.639554] D [crypt.c:3802:load_file_size] 0-oam_enc-crypt: FOP 27: Translate regular file to 0
[2018-11-28 16:49:57.246874] D [name.c:168:client_fill_address_family] 0-oam_enc-client-2: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: host9314567f944f432ca6f842fd107-storage-node-1.storage-server.net)
[2018-11-28 16:49:57.251435] D [MSGID: 0] [common-utils.c:334:gf_resolve_ip6] 0-resolver: returning ip-172.24.26.104 (port-24007) for hostname: host9314567f944f432ca6f842fd107-storage-node-1.storage-server.net and port: 24007
[2018-11-28 16:49:57.251463] D [socket.c:2994:socket_fix_ssl_opts] 0-oam_enc-client-2: disabling SSL for portmapper connection
[2018-11-28 16:49:57.251633] D [socket.c:686:__socket_shutdown] 0-oam_enc-client-2: shutdown() returned -1. Transport endpoint is not connected
[2018-11-28 16:49:57.251646] D [socket.c:733:__socket_disconnect] 0-oam_enc-client-2: __socket_teardown_connection () failed: Transport endpoint is not connected
[2018-11-28 16:49:57.251651] D [socket.c:2426:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-11-28 16:49:57.251660] D [MSGID: 0] [client.c:2260:client_rpc_notify] 0-oam_enc-client-2: got RPC_CLNT_DISCONNECT
[2018-11-28 16:49:57.252006] D [rpc-clnt-ping.c:93:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcce6b7cfab] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fcce6948d0b] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fcce694513f] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fcce6945c20] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fcce6941a23] ))))) 0-: 172.24.26.104:24007: ping timer event already removed
[2018-11-28 16:49:57.252028] D [MSGID: 0] [event-epoll.c:588:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=2 from gen=292 to slot->gen=293, fd=6, slot->fd=6


From what i understood the problem is that first lookup is triggered with filename. Then nameless lookup is triggered with gfid and then, finaly, getxattr for trusted.glusterfs.crypt.att.size.
In case the file is created from own client last lookup includes the gfid but if the file is created from different client, although gfid is returned from the first lookup from server, it is not included in the nameless lookup.
This results error from server which of course causes failure in the whole operation. 

From what i have seen, the problem appears to be in the crypt xlator during client3_3_lookup/crypt_lookup_cbk.
There valid gfid i included in the buf->ia_gfid but not in inode. 

   local->inode = inode_ref(inode);
   local->buf = *buf;
   local->postbuf = *postparent;
   if (xdata)
       local->xdata = dict_ref(xdata);
   gf_uuid_copy(local->loc->gfid, buf->ia_gfid);

logs are from 3.7 but the same issue happens at version 5

Comment 1 Worker Ant 2018-12-14 15:19:01 UTC
REVIEW: https://review.gluster.org/21864 ([crypt]: file access problem with encrypt xlator in case one brick is down) posted (#1) for review on master by Vangelis Polakis

Comment 2 Amar Tumballi 2019-05-18 06:34:04 UTC
The patch is accepted at https://github.com/gluster/glusterfs-xlators/pull/1


Note You need to log in before you can comment on or make changes to this bug.