Bug 1522808 - Gluster client crashes while using both tiering and sharding
Summary: Gluster client crashes while using both tiering and sharding
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-12-06 13:45 UTC by Olivier
Modified: 2018-11-02 08:48 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-02 08:13:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Olivier 2017-12-06 13:45:16 UTC
Description of problem:

Gluster client crashes while using both tiering and sharding

Version-Release number of selected component (if applicable):

Gluster client: 3.12.1
Gluster server: same version

How reproducible: trivial

Steps to Reproduce:
1. Create a replicated (or distributed, it doesn't matter) volume with sharding enabled.
2. Everything works fine.
3. Attach a tier on top of it and check that some files get promoted. No modifications are made to the tiering configuration.
4. Everything still works fine **EXCEPT** for file removal operations (UNLINK). A command sketch of these steps is given below.
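
A rough sketch of the steps above; the host names, brick paths, and volume name are placeholders, not the ones from this report:

```
# 1. Create a replicated volume and enable sharding (hosts/paths are illustrative)
gluster volume create testvol replica 2 host1:/bricks/b1 host2:/bricks/b1
gluster volume set testvol features.shard on
gluster volume set testvol features.shard-block-size 512MB
gluster volume start testvol

# 3. Attach a hot tier with default tiering settings and wait for promotions
gluster volume tier testvol attach replica 2 host1:/bricks/ssd host2:/bricks/ssd
gluster volume tier testvol status

# 4. From a client mount, remove a sharded file; this UNLINK crashes the client
mount -t glusterfs host1:/testvol /mnt/testvol
rm /mnt/testvol/some-large-file
```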

Actual results:

Gluster client crashes with:

```
[2017-12-05 15:43:04.517921] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-xosan-shard: Lookup on shard 1 failed. Base file gfid = cfb9c865-4f8e-4b01-a0bb-520a9baa4635 [Invalid argument]
pending frames:
frame : type(1) op(UNLINK)
frame : type(1) op(UNLINK)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2017-12-05 15:43:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.1
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f66a1863460]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f66a186d394]
/lib64/libc.so.6(+0x35670)[0x7f669ff48670]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f66a06c4bd0]
/lib64/libglusterfs.so.0(__gf_free+0x136)[0x7f66a188b6b6]
/lib64/libglusterfs.so.0(data_destroy+0x5d)[0x7f66a185a5fd]
/lib64/libglusterfs.so.0(dict_destroy+0x60)[0x7f66a185b040]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0x13d43)[0x7f66998e1d43]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0x16f76)[0x7f66998e4f76]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0xe35f)[0x7f66998dc35f]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0xfd31)[0x7f66998ddd31]
/usr/lib64/glusterfs/3.12.1/xlator/features/shard.so(+0x105a5)[0x7f66998de5a5]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/tier.so(+0x80ca3)[0x7f6699b71ca3]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/distribute.so(+0x3703c)[0x7f6699dce03c]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/replicate.so(+0xa1dc)[0x7f669a0371dc]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/replicate.so(+0xbadb)[0x7f669a038adb]
/usr/lib64/glusterfs/3.12.1/xlator/cluster/replicate.so(+0xc566)[0x7f669a039566]
/usr/lib64/glusterfs/3.12.1/xlator/protocol/client.so(+0x17d4c)[0x7f669a2c5d4c]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f66a162be60]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1e7)[0x7f66a162c147]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f66a1627f73]
/usr/lib64/glusterfs/3.12.1/rpc-transport/socket.so(+0x7536)[0x7f669c7ce536]
/usr/lib64/glusterfs/3.12.1/rpc-transport/socket.so(+0x9adc)[0x7f669c7d0adc]
/lib64/libglusterfs.so.0(+0x883b4)[0x7f66a18c03b4]
/lib64/libpthread.so.0(+0x7dc5)[0x7f66a06c2dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f66a000921d]
```
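
Not part of the original report, but the relative offsets in the shard.so frames above can be translated to source locations with addr2line, assuming the matching glusterfs debug symbols for 3.12.1 are installed:

```
# Resolve the shard.so offsets from the backtrace (requires matching debuginfo)
addr2line -f -C -e /usr/lib64/glusterfs/3.12.1/xlator/features/shard.so 0x13d43 0x16f76
```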


Expected results: the file should be removed correctly


Additional info:

```
# gluster volume info
 
Volume Name: myvolume
Type: Tier
Volume ID: c9014830-2e51-401a-a967-2a0dd225d16a
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 172.31.100.104:/bricks/ssd
Brick2: 172.31.100.103:/bricks/ssd
Brick3: 172.31.100.102:/bricks/ssd
Brick4: 172.31.100.101:/bricks/ssd
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 1) = 5
Brick5: 172.31.100.101:/bricks/xosan1/xosandir
Brick6: 172.31.100.102:/bricks/xosan1/xosandir
Brick7: 172.31.100.103:/bricks/xosan1/xosandir
Brick8: 172.31.100.104:/bricks/xosan1/xosandir
Brick9: 172.31.100.105:/bricks/xosan1/xosandir
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
transport.address-family: inet
network.remote-dio: enable
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.strict-write-ordering: off
client.event-threads: 8
server.event-threads: 8
performance.io-thread-count: 64
performance.stat-prefetch: on
features.shard: on
features.shard-block-size: 512MB
```

Comment 1 Olivier 2017-12-18 17:09:02 UTC
The problem still happens on 3.13.0, with exactly the same behaviour.

Comment 2 Shyamsundar 2018-10-23 14:54:10 UTC
Release 3.12 has reached end of life and this bug was still in the NEW state, so the version is being moved to mainline so that it can be triaged and appropriate action taken.

Comment 3 Olivier 2018-10-23 18:24:28 UTC
Hey thanks! If you need any more details about how to reproduce let me know (but it seems you found a way to reproduce the problem yourself)

Comment 4 Amar Tumballi 2018-11-02 08:13:31 UTC
Patch https://review.gluster.org/#/c/glusterfs/+/21331/ removes tier functionality from GlusterFS. 

https://bugzilla.redhat.com/show_bug.cgi?id=1642807 is the tracking bug for this. The recommendation is to convert your tiered volume to a regular volume (replicate, ec, or plain distribute) with the "tier detach" command before upgrading, and to use backend features such as dm-cache to provide caching at the storage layer for better performance and functionality.
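
A minimal sketch of the detach sequence referred to above, using the volume name from this report; exact behaviour may vary slightly across 3.x releases:

```
# Migrate data off the hot tier and detach it, turning the volume back into a
# regular (here: disperse) volume before upgrading
gluster volume tier myvolume detach start
gluster volume tier myvolume detach status   # wait until the data migration completes
gluster volume tier myvolume detach commit
```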

Comment 5 Olivier 2018-11-02 08:48:01 UTC
Damn, it was neat to have it embedded in Gluster :p Anyway, is there any good documentation or example for dm-cache (ideally with a Gluster example)?

