Description of problem: setup: replica x3 no distribution using encrypt xlator. if 3 bricks are up all clients are able to access all the files. if one brick goes down, every client is able to access the files that created. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: setup: replica x3 no distribution using encrypt xlator. if 3 bricks are up all clients are able to access all the files. if one brick goes down, every client is able to access the files that created. # ls -la /client_a_v10_1/filecreatedfromclient_a_2 ls: cannot access /client_a_v10_1/filecreatedfromclient_a_2: Invalid argument [2018-11-28 16:48:19.095228] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup / [2018-11-28 16:48:19.096445] D [MSGID: 0] [dht-common.c:1046:dht_revalidate_cbk] 0-oam_enc-dht: revalidate lookup of / returned with op_ret 0 [Structure needs cleaning] [2018-11-28 16:48:19.096631] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup /filecreatedfromclient_a_2 [2018-11-28 16:48:19.096663] D [MSGID: 0] [dht-common.c:2743:dht_lookup] 0-oam_enc-dht: Calling fresh lookup for /filecreatedfromclient_a_2 on oam_enc-replicate-0 [2018-11-28 16:48:19.097397] D [MSGID: 0] [dht-common.c:2328:dht_lookup_cbk] 0-oam_enc-dht: fresh_lookup returned for /filecreatedfromclient_a_2 with op_ret 0 [2018-11-28 16:48:19.097429] D [MSGID: 0] [dht-layout.c:870:dht_layout_preset] 0-oam_enc-dht: file = 00000000-0000-0000-0000-000000000000, subvol = oam_enc-replicate-0 [2018-11-28 16:48:19.097482] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 0-oam_enc-replicate-0: 00000000-0000-0000-0000-000000000000: generation now vs cached: 2, 0 [2018-11-28 16:48:19.097992] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-oam_enc-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2018-11-28 16:48:19.098017] D [MSGID: 0] [client-rpc-fops.c:2936:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-client-0 returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.098188] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-oam_enc-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2018-11-28 16:48:19.098206] D [MSGID: 0] [client-rpc-fops.c:2936:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-client-1 returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.098231] D [MSGID: 0] [afr-inode-read.c:522:afr_getxattr_wind] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-replicate-0 returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.098246] D [MSGID: 0] [dht-common.c:3392:dht_getxattr_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-dht returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.098266] D [MSGID: 0] [crypt.c:3835:load_file_size] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-crypt returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.098279] D [MSGID: 0] [io-cache.c:267:ioc_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-io-cache returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.098287] D [MSGID: 0] [md-cache.c:1048:mdc_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc-md-cache returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.098299] D [MSGID: 0] [io-stats.c:2186:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7ff89800f900, oam_enc returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.098309] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/filecreatedfromclient_a_2: failed to resolve (Invalid argument) [2018-11-28 16:48:19.098373] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup /filecreatedfromclient_a_2 [2018-11-28 16:48:19.098389] D [MSGID: 0] [dht-common.c:2743:dht_lookup] 0-oam_enc-dht: Calling fresh lookup for /filecreatedfromclient_a_2 on oam_enc-replicate-0 [2018-11-28 16:48:19.099080] D [MSGID: 0] [dht-common.c:2328:dht_lookup_cbk] 0-oam_enc-dht: fresh_lookup returned for /filecreatedfromclient_a_2 with op_ret 0 [2018-11-28 16:48:19.099099] D [MSGID: 0] [dht-layout.c:870:dht_layout_preset] 0-oam_enc-dht: file = 00000000-0000-0000-0000-000000000000, subvol = oam_enc-replicate-0 [2018-11-28 16:48:19.099126] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 0-oam_enc-replicate-0: 00000000-0000-0000-0000-000000000000: generation now vs cached: 2, 0 [2018-11-28 16:48:19.099690] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-oam_enc-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2018-11-28 16:48:19.099714] D [MSGID: 0] [client-rpc-fops.c:2936:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-client-0 returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.099735] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-oam_enc-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument] [2018-11-28 16:48:19.099744] D [MSGID: 0] [client-rpc-fops.c:2936:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-client-1 returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.099780] D [MSGID: 0] [afr-inode-read.c:522:afr_getxattr_wind] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-replicate-0 returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.099788] D [MSGID: 0] [dht-common.c:3392:dht_getxattr_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-dht returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.099797] D [MSGID: 0] [crypt.c:3835:load_file_size] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-crypt returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.099805] D [MSGID: 0] [io-cache.c:267:ioc_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-io-cache returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.099812] D [MSGID: 0] [md-cache.c:1048:mdc_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc-md-cache returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.099820] D [MSGID: 0] [io-stats.c:2186:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7ff894003450, oam_enc returned -1 error: Invalid argument [Invalid argument] [2018-11-28 16:48:19.099827] W [fuse-bridge.c:490:fuse_entry_cbk] 0-glusterfs-fuse: 104: LOOKUP() /filecreatedfromclient_a_2 => -1 (Invalid argument) # ls -la /client_a_v10_2/filecreatedfromclient_a_2 -rw-r--r--. 1 root root 0 Nov 28 17:46 /client_a_v10_2/filecreatedfromclient_a_2 [2018-11-28 16:49:55.635933] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup / [2018-11-28 16:49:55.637194] D [MSGID: 0] [dht-common.c:1046:dht_revalidate_cbk] 0-oam_enc-dht: revalidate lookup of / returned with op_ret 0 [Structure needs cleaning] [2018-11-28 16:49:55.637349] D [crypt.c:4059:crypt_lookup] 0-oam_enc-crypt: Lookup /filecreatedfromclient_a_2 [2018-11-28 16:49:55.637394] D [MSGID: 0] [dht-common.c:2668:dht_lookup] 0-oam_enc-dht: calling revalidate lookup for /filecreatedfromclient_a_2 at oam_enc-replicate-0 [2018-11-28 16:49:55.638229] D [logging.c:1952:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk [2018-11-28 16:49:55.638167] D [MSGID: 0] [dht-common.c:1046:dht_revalidate_cbk] 0-oam_enc-dht: revalidate lookup of /filecreatedfromclient_a_2 returned with op_ret 0 [2018-11-28 16:49:55.638228] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 0-oam_enc-replicate-0: e94dc9b7-f25b-4dbf-ba4b-ec20ec3e03b9: generation now vs cached: 2, 0 [2018-11-28 16:49:55.639554] D [crypt.c:3802:load_file_size] 0-oam_enc-crypt: FOP 27: Translate regular file to 0 [2018-11-28 16:49:57.246874] D [name.c:168:client_fill_address_family] 0-oam_enc-client-2: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: host9314567f944f432ca6f842fd107-storage-node-1.storage-server.net) [2018-11-28 16:49:57.251435] D [MSGID: 0] [common-utils.c:334:gf_resolve_ip6] 0-resolver: returning ip-172.24.26.104 (port-24007) for hostname: host9314567f944f432ca6f842fd107-storage-node-1.storage-server.net and port: 24007 [2018-11-28 16:49:57.251463] D [socket.c:2994:socket_fix_ssl_opts] 0-oam_enc-client-2: disabling SSL for portmapper connection [2018-11-28 16:49:57.251633] D [socket.c:686:__socket_shutdown] 0-oam_enc-client-2: shutdown() returned -1. Transport endpoint is not connected [2018-11-28 16:49:57.251646] D [socket.c:733:__socket_disconnect] 0-oam_enc-client-2: __socket_teardown_connection () failed: Transport endpoint is not connected [2018-11-28 16:49:57.251651] D [socket.c:2426:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2018-11-28 16:49:57.251660] D [MSGID: 0] [client.c:2260:client_rpc_notify] 0-oam_enc-client-2: got RPC_CLNT_DISCONNECT [2018-11-28 16:49:57.252006] D [rpc-clnt-ping.c:93:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcce6b7cfab] (--> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7fcce6948d0b] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x5f)[0x7fcce694513f] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7fcce6945c20] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fcce6941a23] ))))) 0-: 172.24.26.104:24007: ping timer event already removed [2018-11-28 16:49:57.252028] D [MSGID: 0] [event-epoll.c:588:event_dispatch_epoll_handler] 0-epoll: generation bumped on idx=2 from gen=292 to slot->gen=293, fd=6, slot->fd=6 From what i understood the problem is that first lookup is triggered with filename. Then nameless lookup is triggered with gfid and then, finaly, getxattr for trusted.glusterfs.crypt.att.size. In case the file is created from own client last lookup includes the gfid but if the file is created from different client, although gfid is returned from the first lookup from server, it is not included in the nameless lookup. This results error from server which of course causes failure in the whole operation. From what i have seen, the problem appears to be in the crypt xlator during client3_3_lookup/crypt_lookup_cbk. There valid gfid i included in the buf->ia_gfid but not in inode. local->inode = inode_ref(inode); local->buf = *buf; local->postbuf = *postparent; if (xdata) local->xdata = dict_ref(xdata); gf_uuid_copy(local->loc->gfid, buf->ia_gfid); logs are from 3.7 but the same issue happens at version 5
REVIEW: https://review.gluster.org/21864 ([crypt]: file access problem with encrypt xlator in case one brick is down) posted (#1) for review on master by Vangelis Polakis
The patch is accepted at https://github.com/gluster/glusterfs-xlators/pull/1