Created attachment 1022150 [details]
trace-level log filtered by directory name, with some additional info

Description of problem:
In my testing environment, one DHT GlusterFS volume is created on one server, with only one brick. Four clients mount the same volume, and the application on each client does the same thing in parallel (a minimal reproducer sketch is given below):
1. mkdir the same directory
2. create different files in that directory

The application runs as a non-root user. The problem is that sometimes the ownership of the directory is "root" after the mkdir, which in turn makes the application fail. It is not 100% reproducible. A trace-level log is attached (filtered by grepping for the directory name). From the log, the problem appears to be caused by "setattr" calls issued in parallel.

The same problem also happened in our production environment (GlusterFS, AFR, 3 replicas) with the same application, but I am not sure whether the cause is the same, because with the log level set to INFO I can only see the same setattr failure message (operation not permitted).

Version-Release number of selected component (if applicable):
3.5.2

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
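For reference, the workload from the two steps above can be driven by a small C program like the one below. This is only a sketch: the mount point /mnt/glusterfs, the NPROCS value, and the assumption that the parent directories already exist are placeholders of mine (the directory name is taken from the attached trace), and forking local processes only approximates the four separate client mounts used in the real test, so the race may only show up when the program is started on several clients at the same time, as the same non-root user.

/* mkdir-race-repro.c: approximate the reported workload.
 * Hypothetical paths; not part of the original report. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#define DIR_PATH "/mnt/glusterfs/cps/20160629/S600009UnmatchAuthJournal.yak"
#define NPROCS   4

static void worker(int id)
{
    char file[512];
    struct stat st;
    int fd;

    /* Step 1: every worker tries to create the same directory. */
    if (mkdir(DIR_PATH, 0755) != 0 && errno != EEXIST) {
        fprintf(stderr, "worker %d: mkdir: %s\n", id, strerror(errno));
        exit(1);
    }

    /* Step 2: every worker creates a different file inside it. */
    snprintf(file, sizeof(file), "%s/file-%d", DIR_PATH, id);
    fd = open(file, O_CREAT | O_WRONLY, 0644);
    if (fd < 0)
        fprintf(stderr, "worker %d: open %s: %s\n", id, file, strerror(errno));
    else
        close(fd);

    /* Report who owns the directory; with the bug it is sometimes
     * root (uid 0) instead of the creating user. */
    if (stat(DIR_PATH, &st) == 0)
        printf("worker %d: dir owner %u:%u (my uid %u)\n",
               id, (unsigned)st.st_uid, (unsigned)st.st_gid,
               (unsigned)getuid());
    exit(0);
}

int main(void)
{
    int i;

    for (i = 0; i < NPROCS; i++)
        if (fork() == 0)
            worker(i);

    while (wait(NULL) > 0)
        ;
    return 0;
}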
This is very similar to Bug 1196033, but I think there are some differences in the reproduction steps. Also, I see that the fix for that bug only relates to DHT self-heal, but the problem also exists on my AFR GlusterFS volume.
After this problem was reproduced with the TRACE log level enabled on both the client and the server side, I got more information. On the client side, several clients tried to mkdir the same directory in parallel:
1. The first client succeeds with mkdir, and the other clients can then see the new directory via lookup.
2. All clients then call setattr in parallel: the mkdir client calls setattr from dht_mkdir_hashed_cbk, while the lookup clients call setattr from dht_lookup_dir_cbk. In dht_lookup_dir_cbk, frame->root uid/gid are promoted to 0, i.e. the root user.
3. Sometimes, however, a lookup client's setattr is processed before the mkdir client's setattr. When the mkdir client's setattr then arrives, it fails with EPERM, and the newly created directory is left with root uid/gid. (A toy model of this interleaving is sketched after the log excerpt below.)

Log messages:

[2015-05-13 11:05:26.556457] T [dht-layout.c:372:dht_layout_merge] 0-dcn_u01_cnc_vol-dht: missing disk layout on dcn_u01_cnc_vol-client-0. err = -1
[2015-05-13 11:05:26.556467] I [dht-layout.c:640:dht_layout_normalize] 0-dcn_u01_cnc_vol-dht: found anomalies in /cps/20160629/S600009UnmatchAuthJournal.yak. holes=1 overlaps=0
[2015-05-13 11:05:26.556474] D [dht-common.c:496:dht_lookup_dir_cbk] 0-dcn_u01_cnc_vol-dht: fixing assignment on /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556484] T [dht-hashfn.c:97:dht_hash_compute] 0-dcn_u01_cnc_vol-dht: trying regex for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556492] T [dht-selfheal.c:806:dht_selfheal_layout_new_directory] 0-dcn_u01_cnc_vol-dht: gave fix: 0 - 4294967294 on dcn_u01_cnc_vol-client-0 for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556502] T [dht-selfheal.c:356:dht_selfheal_dir_setattr] 0-dcn_u01_cnc_vol-dht: setattr for /cps/20160629/S600009UnmatchAuthJournal.yak on subvol dcn_u01_cnc_vol-client-0
[2015-05-13 11:05:26.556513] T [rpc-clnt.c:1356:rpc_clnt_record] 0-dcn_u01_cnc_vol-client-0: Auth Info: pid: 1, uid: 0, gid: 0, owner:
[2015-05-13 11:05:26.556522] T [rpc-clnt.c:1212:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 192, payload: 124, rpc hdr: 68
[2015-05-13 11:05:26.556546] T [rpc-clnt.c:1553:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x208a5e Program: GlusterFS 3.3, ProgVers: 330, Proc: 38) to rpc-transport (dcn_u01_cnc_vol-client-0)
[2015-05-13 11:05:26.556566] T [rpc-clnt.c:671:rpc_clnt_reply_init] 0-dcn_u01_cnc_vol-client-0: received rpc message (RPC XID: 0x208a47 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (dcn_u01_cnc_vol-client-0)
[2015-05-13 11:05:26.556578] T [dht-layout.c:372:dht_layout_merge] 0-dcn_u01_cnc_vol-dht: missing disk layout on dcn_u01_cnc_vol-client-0. err = -1
[2015-05-13 11:05:26.556596] I [dht-layout.c:640:dht_layout_normalize] 0-dcn_u01_cnc_vol-dht: found anomalies in /cps/20160629/S600009UnmatchAuthJournal.yak. holes=1 overlaps=0
[2015-05-13 11:05:26.556605] D [dht-common.c:496:dht_lookup_dir_cbk] 0-dcn_u01_cnc_vol-dht: fixing assignment on /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556612] T [dht-hashfn.c:97:dht_hash_compute] 0-dcn_u01_cnc_vol-dht: trying regex for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556621] T [dht-selfheal.c:806:dht_selfheal_layout_new_directory] 0-dcn_u01_cnc_vol-dht: gave fix: 0 - 4294967294 on dcn_u01_cnc_vol-client-0 for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556629] T [dht-selfheal.c:356:dht_selfheal_dir_setattr] 0-dcn_u01_cnc_vol-dht: setattr for /cps/20160629/S600009UnmatchAuthJournal.yak on subvol dcn_u01_cnc_vol-client-0
[2015-05-13 11:05:26.556640] T [rpc-clnt.c:1356:rpc_clnt_record] 0-dcn_u01_cnc_vol-client-0: Auth Info: pid: 1, uid: 0, gid: 0, owner:
[2015-05-13 11:05:26.556648] T [rpc-clnt.c:1212:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 192, payload: 124, rpc hdr: 68
[2015-05-13 11:05:26.556666] T [rpc-clnt.c:1553:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x208a5f Program: GlusterFS 3.3, ProgVers: 330, Proc: 38) to rpc-transport (dcn_u01_cnc_vol-client-0)
[2015-05-13 11:05:26.556697] T [rpcsvc.c:599:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 730
[2015-05-13 11:05:26.558756] T [dht-layout.c:372:dht_layout_merge] 0-dcn_u01_cnc_vol-dht: missing disk layout on dcn_u01_cnc_vol-client-0. err = -1
[2015-05-13 11:05:26.558767] T [dht-hashfn.c:97:dht_hash_compute] 0-dcn_u01_cnc_vol-dht: trying regex for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.558777] T [dht-selfheal.c:806:dht_selfheal_layout_new_directory] 0-dcn_u01_cnc_vol-dht: gave fix: 0 - 4294967294 on dcn_u01_cnc_vol-client-0 for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.558785] T [dht-selfheal.c:356:dht_selfheal_dir_setattr] 0-dcn_u01_cnc_vol-dht: setattr for /cps/20160629/S600009UnmatchAuthJournal.yak on subvol dcn_u01_cnc_vol-client-0
[2015-05-13 11:05:26.558809] T [rpc-clnt.c:1356:rpc_clnt_record] 0-dcn_u01_cnc_vol-client-0: Auth Info: pid: 1, uid: 6001, gid: 6000, owner:
[2015-05-13 11:05:26.558818] T [rpc-clnt.c:1212:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 192, payload: 124, rpc hdr: 68
[2015-05-13 11:05:26.558836] T [rpc-clnt.c:1553:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x208a6a Program: GlusterFS 3.3, ProgVers: 330, Proc: 38) to rpc-transport (dcn_u01_cnc_vol-client-0)
[2015-05-13 11:05:26.558854] T [rpcsvc.c:599:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 730

Because DHT is stacked above the AFR layer, this problem can also occur on an AFR volume.
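To make the ordering in step 3 above concrete, here is a small standalone model, not GlusterFS code, that replays the interleaving seen in the trace. The 6001:6000 identity and the promotion to uid 0 are taken from the Auth Info lines above; the initial ownership of the directory and the assumption that the winning root-identity setattr writes root ownership back are my interpretation of why the directory ends up root-owned, not something confirmed by the trace.

/* setattr-race.c: toy model of the problematic interleaving.
 * Assumptions (mine): (a) right after mkdir the brick directory is
 * owned by the application (6001:6000); (b) the lookup client's
 * root-identity self-heal setattr writes root ownership when it
 * arrives first.  The permission rule is ordinary POSIX chown
 * semantics: root may set any owner, a non-root caller gets EPERM
 * once it no longer owns the directory. */
#include <errno.h>
#include <stdio.h>

struct brick_dir {
    unsigned int uid;
    unsigned int gid;
};

/* Simplified server-side setattr. */
static int brick_setattr(struct brick_dir *d, unsigned int req_uid,
                         unsigned int new_uid, unsigned int new_gid)
{
    if (req_uid != 0 && req_uid != d->uid)
        return -EPERM;          /* caller no longer owns the directory */
    d->uid = new_uid;
    d->gid = new_gid;
    return 0;
}

int main(void)
{
    /* (a) state right after the application's mkdir */
    struct brick_dir d = { 6001, 6000 };
    int ret;

    /* 1. lookup client: frame promoted to uid/gid 0 by
     *    dht_lookup_dir_cbk; assumption (b): it writes root ownership. */
    ret = brick_setattr(&d, 0, 0, 0);
    printf("lookup-path setattr as uid 0    -> %d, owner now %u:%u\n",
           ret, d.uid, d.gid);

    /* 2. mkdir client: dht_mkdir_hashed_cbk path, application identity,
     *    tries to (re)apply 6001:6000, but the directory is already
     *    owned by root, so it gets EPERM. */
    ret = brick_setattr(&d, 6001, 6001, 6000);
    printf("mkdir-path  setattr as uid 6001 -> %d (%s), owner stays %u:%u\n",
           ret, ret == -EPERM ? "EPERM" : "ok", d.uid, d.gid);

    return 0;
}

Compiled with "gcc -o setattr-race setattr-race.c", the model prints EPERM for the uid-6001 request and a final owner of 0:0, which matches the "operation not permitted" failures and the root-owned directory described in the original report.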
This bug is being closed because the 3.5 release is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.