Description of problem:

Gluster client crashes while using both tiering and sharding.

Version-Release number of selected component (if applicable):

gluster client: 3.12.1
gluster server side: same version

How reproducible:

Trivial.

Steps to Reproduce:
1. Create a replicated (or distributed, it doesn't matter) volume using sharding.
2. Everything works fine.
3. Add tiering on top of it, and check that some files get promoted. No modifications are made to the tiering config.
4. Works fine **EXCEPT** for file removal operations (UNLINK).

(A CLI sketch of these steps is given after the volume info below.)

Actual results:

Gluster client crashes with:

```
[2017-12-05 15:43:04.517921] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-xosan-shard: Lookup on shard 1 failed. Base file gfid = cfb9c865-4f8e-4b01-a0bb-520a9baa4635 [Invalid argument]
pending frames:
frame : type(1) op(UNLINK)
frame : type(1) op(UNLINK)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2017-12-05 15:43:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.1
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f66a1863460]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f66a186d394]
/lib64/libc.so.6(+0x35670)[0x7f669ff48670]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f66a06c4bd0]
/lib64/libglusterfs.so.0(__gf_free+0x136)[0x7f66a188b6b6]
/lib64/libglusterfs.so.0(data_destroy+0x5d)[0x7f66a185a5fd]
/lib64/libglusterfs.so.0(dict_destroy+0x60)[0x7f66a185b040]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0x13d43)[0x7f66998e1d43]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0x16f76)[0x7f66998e4f76]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0xe35f)[0x7f66998dc35f]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0xfd31)[0x7f66998ddd31]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0x105a5)[0x7f66998de5a5]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/tier.so(+0x80ca3)[0x7f6699b71ca3]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/distribute.so(+0x3703c)[0x7f6699dce03c]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/replicate.so(+0xa1dc)[0x7f669a0371dc]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/replicate.so(+0xbadb)[0x7f669a038adb]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/replicate.so(+0xc566)[0x7f669a039566]
/usr/lib64/glusterfs/3.12.1/xlator/protocol/client.so(+0x17d4c)[0x7f669a2c5d4c]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f66a162be60]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1e7)[0x7f66a162c147]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f66a1627f73]
/usr/lib64/glusterfs/3.12.1/rpc-transport/socket.so(+0x7536)[0x7f669c7ce536]
/usr/lib64/glusterfs/3.12.1/rpc-transport/socket.so(+0x9adc)[0x7f669c7d0adc]
/lib64/libglusterfs.so.0(+0x883b4)[0x7f66a18c03b4]
/lib64/libpthread.so.0(+0x7dc5)[0x7f66a06c2dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f66a000921d]
```

Expected results:

The file should be correctly removed.

Additional info:

```
# gluster volume info

Volume Name: myvolume
Type: Tier
Volume ID: c9014830-2e51-401a-a967-2a0dd225d16a
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 172.31.100.104:/bricks/ssd
Brick2: 172.31.100.103:/bricks/ssd
Brick3: 172.31.100.102:/bricks/ssd
Brick4: 172.31.100.101:/bricks/ssd
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 1) = 5
Brick5: 172.31.100.101:/bricks/xosan1/xosandir
Brick6: 172.31.100.102:/bricks/xosan1/xosandir
Brick7: 172.31.100.103:/bricks/xosan1/xosandir
Brick8: 172.31.100.104:/bricks/xosan1/xosandir
Brick9: 172.31.100.105:/bricks/xosan1/xosandir
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
transport.address-family: inet
network.remote-dio: enable
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.strict-write-ordering: off
client.event-threads: 8
server.event-threads: 8
performance.io-thread-count: 64
performance.stat-prefetch: on
features.shard: on
features.shard-block-size: 512MB
```
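For reference, a rough CLI equivalent of the reproduction steps above (hostnames, brick paths, and the file size are illustrative, not the exact commands from the report):

```
# sharded replicated volume
gluster volume create testvol replica 2 host1:/bricks/b1 host2:/bricks/b1
gluster volume set testvol features.shard on
gluster volume set testvol features.shard-block-size 512MB
gluster volume start testvol

# attach a hot tier and let some files get promoted
gluster volume tier testvol attach replica 2 host1:/bricks/ssd host2:/bricks/ssd

# mount the volume, write a file larger than the shard block size, then remove it
mount -t glusterfs host1:/testvol /mnt/testvol
dd if=/dev/zero of=/mnt/testvol/bigfile bs=1M count=1024
rm /mnt/testvol/bigfile   # this UNLINK is where the client crashes
```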
The problem is still happening on 3.13.0, with the exact same behaviour.
Release 3.12 has been EOLed and this bug was still found in the NEW state, hence the version is being moved to mainline so it can be triaged and appropriate action taken.
Hey, thanks! If you need any more details about how to reproduce, let me know (but it seems you found a way to reproduce the problem yourself).
Patch https://review.gluster.org/#/c/glusterfs/+/21331/ removes tier functionality from GlusterFS. https://bugzilla.redhat.com/show_bug.cgi?id=1642807 is used as the tracking bug for this. The recommendation is to convert your tiered volume to a regular volume (either replicate, EC, or plain distribute) with the "tier detach" command before upgrading, and to use backend features like dm-cache to get caching from the storage layer for better performance and functionality. A rough detach sequence is sketched below.
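For reference, the detach sequence on a volume like the one in this report would look roughly like this (reusing the "myvolume" name from the volume info above):

```
# start migrating data off the hot tier
gluster volume tier myvolume detach start

# poll until the data migration completes
gluster volume tier myvolume detach status

# remove the hot tier bricks once migration is done
gluster volume tier myvolume detach commit
```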
Damn, it was neat to have it embedded in Gluster :p Anyway, is there any good documentation/example of dm-cache (ideally with a Gluster example)?
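For what it's worth, lvmcache(7) documents the LVM front end to dm-cache. A minimal per-brick sketch, assuming one slow HDD (/dev/sdb) and one SSD (/dev/nvme0n1), with all device names, sizes, and the volume group layout being illustrative rather than an official recommendation:

```
# one volume group containing both the slow HDD and the fast SSD
pvcreate /dev/sdb /dev/nvme0n1
vgcreate brick_vg /dev/sdb /dev/nvme0n1

# the brick's origin LV lives on the HDD
lvcreate -n brick1 -L 900G brick_vg /dev/sdb

# create an SSD-backed cache and attach it to the origin LV in one step
lvcreate --type cache --cachemode writethrough -L 100G -n brick1_cache \
    brick_vg/brick1 /dev/nvme0n1

# then format and mount like any other Gluster brick
mkfs.xfs /dev/brick_vg/brick1
mount /dev/brick_vg/brick1 /bricks/brick1
```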