Bug 1218587 - directory owned by root when directories are created in parallel on several different mounts
Summary: directory owned by root when directories are created in parallel on several different mounts
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.5.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-05-05 10:19 UTC by huangwei
Modified: 2016-06-17 15:58 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-06-17 15:58:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
trace-level log filtered by directory name, with some additional info (9.68 KB, text/plain)
2015-05-05 10:19 UTC, huangwei

Description huangwei 2015-05-05 10:19:26 UTC
Created attachment 1022150 [details]
trace-level log filtered by directory name, with some additional info

Description of problem:

In my test environment, one DHT GlusterFS volume is created on one server, with only one brick. 4 clients mount the same volume, and the application on each client does the same thing in parallel:
1. mkdir the same directory
2. create different files in the above directory

The application runs as a non-root user. The problem is that sometimes the ownership of the directory ends up as "root" after the mkdir, which in turn makes the application fail.
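A minimal sketch of what each client runs in parallel, just to make the two steps above concrete (the mount path and file naming are hypothetical, not taken from the real application):

/* Per-client workload sketch; the path below is a placeholder for the
 * client's own mount point of the shared volume. Runs as a non-root user. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const char *dir = "/mnt/glustervol/shared-dir";  /* same directory name on every client */
    char path[256];
    int fd;

    /* Step 1: every client issues the same mkdir; EEXIST only means another
     * client created the directory first, which is expected. */
    if (mkdir(dir, 0755) == -1 && errno != EEXIST) {
        perror("mkdir");
        return 1;
    }

    /* Step 2: each client creates its own files inside that directory.
     * This is the step that fails when the directory ends up owned by root. */
    snprintf(path, sizeof(path), "%s/file-from-client-%ld", dir, (long)getpid());
    fd = open(path, O_CREAT | O_WRONLY | O_EXCL, 0644);
    if (fd == -1) {
        perror("create file");
        return 1;
    }
    close(fd);
    return 0;
}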

It is not 100% reproducible. A trace-level log is attached (filtered by grepping for the directory name). From the log, it seems the problem is possibly caused by "setattr" calls running in parallel.

The same problem also happened in our production environment (GlusterFS, AFR, 3 replicas) with the same application, but I am not sure whether it has the same cause, because with the log level set to info I can only see the same setattr failure message (operation not permitted).


Version-Release number of selected component (if applicable):
3.5.2


How reproducible:
Intermittent; not 100% reproducible.

Steps to Reproduce:
1. Mount the same single-brick DHT volume on 4 clients.
2. As a non-root user, mkdir the same directory from all clients in parallel.
3. Create different files in that directory from each client.

Actual results:
Sometimes the new directory is owned by root, and the non-root application then fails to create files in it.

Expected results:
The directory is owned by the non-root user that created it.


Additional info:

Comment 1 huangwei 2015-05-05 10:23:30 UTC
This is very similar to Bug 1196033, but I think there are some differences in the reproduction steps. Also, I see that the bug fix there only relates to DHT self-heal, but the problem also exists on my AFR GlusterFS volume.

Comment 2 huangwei 2015-05-16 03:51:11 UTC
After this problem was reproduced with the TRACE log level turned on on both the client and the server side, I got more information.

From the client side, several clients tried to mkdir the same directory in parallel:
 1. The first client succeeds in calling mkdir, and the other clients can see the new directory via lookup.

 2. Then all clients begin to call setattr in parallel: the mkdir client calls setattr from dht_mkdir_hashed_cbk, while the lookup clients call setattr from dht_lookup_dir_cbk. In dht_lookup_dir_cbk, frame->root->uid/gid is promoted to 0, i.e. the root user.

 3. But sometimes the lookup client's setattr is processed before the mkdir client's setattr. In that case, when the mkdir client's setattr is executed it fails with EPERM, and the newly created directory is left with root uid/gid.
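A minimal standalone illustration of why the setattr in step 3 can fail (this is not GlusterFS code, only the underlying POSIX behavior; the mount prefix in the path is a placeholder): once the racing self-heal setattr with root credentials has left the directory owned by root, a chown() issued with the application's own credentials (uid 6001 / gid 6000 in the log below) is rejected with EPERM, because an unprivileged process may not change the ownership of a file it does not own.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder path: a directory that the race has left owned by root:root. */
    const char *dir = "/mnt/glustervol/cps/20160629/S600009UnmatchAuthJournal.yak";

    /* Run as the non-root application user: trying to take the ownership back
     * fails with EPERM, which matches the "operation not permitted" errors
     * seen at info log level. */
    if (chown(dir, getuid(), getgid()) == -1)
        fprintf(stderr, "chown as uid %ld failed: %s\n",
                (long)getuid(), strerror(errno));
    return 0;
}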

log messages:

[2015-05-13 11:05:26.556457] T [dht-layout.c:372:dht_layout_merge] 0-dcn_u01_cnc_vol-dht: missing disk layout on dcn_u01_cnc_vol-client-0. err = -1
[2015-05-13 11:05:26.556467] I [dht-layout.c:640:dht_layout_normalize] 0-dcn_u01_cnc_vol-dht: found anomalies in /cps/20160629/S600009UnmatchAuthJournal.yak. holes=1 overlaps=0
[2015-05-13 11:05:26.556474] D [dht-common.c:496:dht_lookup_dir_cbk] 0-dcn_u01_cnc_vol-dht: fixing assignment on /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556484] T [dht-hashfn.c:97:dht_hash_compute] 0-dcn_u01_cnc_vol-dht: trying regex for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556492] T [dht-selfheal.c:806:dht_selfheal_layout_new_directory] 0-dcn_u01_cnc_vol-dht: gave fix: 0 - 4294967294 on dcn_u01_cnc_vol-client-0 for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556502] T [dht-selfheal.c:356:dht_selfheal_dir_setattr] 0-dcn_u01_cnc_vol-dht: setattr for /cps/20160629/S600009UnmatchAuthJournal.yak on subvol dcn_u01_cnc_vol-client-0
[2015-05-13 11:05:26.556513] T [rpc-clnt.c:1356:rpc_clnt_record] 0-dcn_u01_cnc_vol-client-0: Auth Info: pid: 1, uid: 0, gid: 0, owner:
[2015-05-13 11:05:26.556522] T [rpc-clnt.c:1212:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 192, payload: 124, rpc hdr: 68
[2015-05-13 11:05:26.556546] T [rpc-clnt.c:1553:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x208a5e Program: GlusterFS 3.3, ProgVers: 330, Proc: 38) to rpc-transport (dcn_u01_cnc_vol-client-0)
[2015-05-13 11:05:26.556566] T [rpc-clnt.c:671:rpc_clnt_reply_init] 0-dcn_u01_cnc_vol-client-0: received rpc message (RPC XID: 0x208a47 Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (dcn_u01_cnc_vol-client-0)
[2015-05-13 11:05:26.556578] T [dht-layout.c:372:dht_layout_merge] 0-dcn_u01_cnc_vol-dht: missing disk layout on dcn_u01_cnc_vol-client-0. err = -1
[2015-05-13 11:05:26.556596] I [dht-layout.c:640:dht_layout_normalize] 0-dcn_u01_cnc_vol-dht: found anomalies in /cps/20160629/S600009UnmatchAuthJournal.yak. holes=1 overlaps=0
[2015-05-13 11:05:26.556605] D [dht-common.c:496:dht_lookup_dir_cbk] 0-dcn_u01_cnc_vol-dht: fixing assignment on /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556612] T [dht-hashfn.c:97:dht_hash_compute] 0-dcn_u01_cnc_vol-dht: trying regex for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556621] T [dht-selfheal.c:806:dht_selfheal_layout_new_directory] 0-dcn_u01_cnc_vol-dht: gave fix: 0 - 4294967294 on dcn_u01_cnc_vol-client-0 for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.556629] T [dht-selfheal.c:356:dht_selfheal_dir_setattr] 0-dcn_u01_cnc_vol-dht: setattr for /cps/20160629/S600009UnmatchAuthJournal.yak on subvol dcn_u01_cnc_vol-client-0
[2015-05-13 11:05:26.556640] T [rpc-clnt.c:1356:rpc_clnt_record] 0-dcn_u01_cnc_vol-client-0: Auth Info: pid: 1, uid: 0, gid: 0, owner:
[2015-05-13 11:05:26.556648] T [rpc-clnt.c:1212:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 192, payload: 124, rpc hdr: 68
[2015-05-13 11:05:26.556666] T [rpc-clnt.c:1553:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x208a5f Program: GlusterFS 3.3, ProgVers: 330, Proc: 38) to rpc-transport (dcn_u01_cnc_vol-client-0)
[2015-05-13 11:05:26.556697] T [rpcsvc.c:599:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 730


[2015-05-13 11:05:26.558756] T [dht-layout.c:372:dht_layout_merge] 0-dcn_u01_cnc_vol-dht: missing disk layout on dcn_u01_cnc_vol-client-0. err = -1
[2015-05-13 11:05:26.558767] T [dht-hashfn.c:97:dht_hash_compute] 0-dcn_u01_cnc_vol-dht: trying regex for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.558777] T [dht-selfheal.c:806:dht_selfheal_layout_new_directory] 0-dcn_u01_cnc_vol-dht: gave fix: 0 - 4294967294 on dcn_u01_cnc_vol-client-0 for /cps/20160629/S600009UnmatchAuthJournal.yak
[2015-05-13 11:05:26.558785] T [dht-selfheal.c:356:dht_selfheal_dir_setattr] 0-dcn_u01_cnc_vol-dht: setattr for /cps/20160629/S600009UnmatchAuthJournal.yak on subvol dcn_u01_cnc_vol-client-0
[2015-05-13 11:05:26.558809] T [rpc-clnt.c:1356:rpc_clnt_record] 0-dcn_u01_cnc_vol-client-0: Auth Info: pid: 1, uid: 6001, gid: 6000, owner:
[2015-05-13 11:05:26.558818] T [rpc-clnt.c:1212:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 192, payload: 124, rpc hdr: 68
[2015-05-13 11:05:26.558836] T [rpc-clnt.c:1553:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x208a6a Program: GlusterFS 3.3, ProgVers: 330, Proc: 38) to rpc-transport (dcn_u01_cnc_vol-client-0)
[2015-05-13 11:05:26.558854] T [rpcsvc.c:599:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 730


Because DHT is stacked above the AFR layer, this problem can also happen on an AFR volume.

Comment 3 Niels de Vos 2016-06-17 15:58:30 UTC
This bug is being closed because the 3.5 release is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

