Bug 1305858
Summary: | dht: NULL layouts referenced while the I/O is going on tiered volume | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Bhaskarakiran <byarlaga>
Component: | distribute | Assignee: | Nithya Balachandran <nbalacha>
Status: | CLOSED WORKSFORME | QA Contact: | storage-qa-internal <storage-qa-internal>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.1 | CC: | dlambrig, mzywusko, nbalacha, nchilaka, rcyriac, rkavunga, sankarshan, skoduri, smohan, storage-qa-internal
Target Milestone: | --- | Keywords: | ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | Tiering | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1307208 (view as bug list) | Environment: |
Last Closed: | 2017-09-01 07:02:45 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1307208, 1308400 | |
Description
Bhaskarakiran
2016-02-09 12:09:10 UTC
Another crash on 10.70.35.153.

BT:

    (gdb) bt
    #0  dht_layout_ref (this=0x7f373c133c80, layout=layout@entry=0x0) at dht-layout.c:149
    #1  0x00007f3742ebc2db in dht_selfheal_restore (frame=frame@entry=0x7f374ea304cc, dir_cbk=dir_cbk@entry=0x7f3742ec4fa0 <dht_rmdir_selfheal_cbk>, loc=loc@entry=0x7f37325e0c74, layout=0x0) at dht-selfheal.c:1934
    #2  0x00007f3742eca6e2 in dht_rmdir_hashed_subvol_cbk (frame=0x7f374ea304cc, cookie=0x7f374e9bd738, this=0x7f373c133c80, op_ret=-1, op_errno=39, preparent=0x7f37318ec04c, postparent=0x7f37318ec0bc, xdata=0x0) at dht-common.c:6788
    #3  0x00007f374311bcd7 in afr_rmdir_unwind (frame=<optimized out>, this=<optimized out>) at afr-dir-write.c:1339
    #4  0x00007f374311d619 in __afr_dir_write_cbk (frame=0x7f374e9c13b0, cookie=<optimized out>, this=0x7f373c1320e0, op_ret=<optimized out>, op_errno=<optimized out>, buf=buf@entry=0x0, preparent=0x7f3743db5930, postparent=postparent@entry=0x7f3743db59a0, preparent2=preparent2@entry=0x0, postparent2=postparent2@entry=0x0, xdata=xdata@entry=0x0) at afr-dir-write.c:246
    #5  0x00007f374311d816 in afr_rmdir_wind_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, preparent=<optimized out>, postparent=0x7f3743db59a0, xdata=0x0) at afr-dir-write.c:1351
    #6  0x00007f374339a7e1 in client3_3_rmdir_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f374e9e85e0) at client-rpc-fops.c:729
    #7  0x00007f3750c4eb20 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f373c8917e0, pollin=pollin@entry=0x7f36801731a0) at rpc-clnt.c:766
    #8  0x00007f3750c4eddf in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f373c891810, event=<optimized out>, data=0x7f36801731a0) at rpc-clnt.c:907
    #9  0x00007f3750c4a913 in rpc_transport_notify (this=this@entry=0x7f373c8a14e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f36801731a0) at rpc-transport.c:545
    #10 0x00007f3745a884b6 in socket_event_poll_in (this=this@entry=0x7f373c8a14e0) at socket.c:2236
    #11 0x00007f3745a8b3a4 in socket_event_handler (fd=fd@entry=82, idx=idx@entry=99, data=0x7f373c8a14e0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2349
    #12 0x00007f3750ee18ca in event_dispatch_epoll_handler (event=0x7f3743db5e80, event_pool=0x7f3752a66d10) at event-epoll.c:575
    #13 event_dispatch_epoll_worker (data=0x7f3752ab3260) at event-epoll.c:678
    #14 0x00007f374fce8dc5 in start_thread () from /lib64/libpthread.so.0
    #15 0x00007f374f62f21d in clone () from /lib64/libc.so.6
    (gdb) q

Core file:

    [root@dhcp35-153 ~]# ll /var/log/core/core.22231.1455016922.dump
    -rw-------. 1 root root 5343039488 Feb 9 16:53 /var/log/core/core.22231.1455016922.dump

I/O: dd (1 GB files), Linux untar, and deletes (rm -rf).

I am unable to recreate this using the procedure in comment #1. Can QE give us a way to reproduce it reliably? If it is related to "when the server which was restarted when the LVM pool became full" (comment #7), why did the LVM pool become full, and is gluster resilient to such situations?

This occurs during 'rm' operations:

1. Run multiple instances (3-4) of Linux untar and delete them with "rm -rf".
2. In parallel, continue dd (varying block sizes, creates) from another client.

Removing the needinfo. Let me know if you need any help.

Do we know if this is related to full LVM pool issues, as suggested in comments #7 and #8? Has this been reproduced on a normal volume?
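For reference, frames #0-#2 of the backtrace above show the shape of the crash: the rmdir callback hits an error path (op_ret = -1) with no layout attached, passes layout = 0x0 into the selfheal/restore helper, and the reference-count helper then dereferences the NULL pointer. Below is a minimal, self-contained sketch of that pattern and of the defensive NULL guard it suggests; the names layout_t, layout_ref() and selfheal_restore() are illustrative stand-ins, not the actual GlusterFS dht code.

```c
/*
 * Illustrative sketch only (NOT the GlusterFS implementation): an error
 * path hands a NULL layout to a restore helper, which takes a reference
 * on it without a NULL check and crashes, exactly as in frames #0-#2.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int ref;        /* reference count, analogous to a layout's ref field */
    int spread_cnt; /* placeholder for real layout state */
} layout_t;

/* Unsafe variant: segfaults like frame #0 when layout == NULL. */
static layout_t *layout_ref_unsafe(layout_t *layout)
{
    layout->ref++;              /* NULL dereference if layout is NULL */
    return layout;
}

/* Defensive variant: refuse to take a reference on a missing layout. */
static layout_t *layout_ref(layout_t *layout)
{
    if (layout == NULL) {
        fprintf(stderr, "layout_ref: NULL layout, skipping ref\n");
        return NULL;
    }
    layout->ref++;
    return layout;
}

/* Stand-in for the restore helper in frame #1: it must not assume the
 * caller always has a valid layout in hand. */
static int selfheal_restore(layout_t *layout)
{
    if (layout_ref(layout) == NULL)
        return -1;              /* bail out instead of crashing */
    /* ... restore work would happen here ... */
    return 0;
}

int main(void)
{
    /* Mimic the rmdir error callback (frame #2): the operation failed,
     * so no layout was ever attached -> layout stays NULL. */
    layout_t *layout = NULL;

    if (selfheal_restore(layout) != 0)
        fprintf(stderr, "restore skipped: no layout available\n");

    (void)layout_ref_unsafe;    /* kept only to show the crashing shape */
    return EXIT_SUCCESS;
}
```

Whether a real fix would belong in the reference helper or earlier in the rmdir error path is left to the analysis above; the sketch only shows why a NULL check somewhere on this path avoids the segfault.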
As there is insufficient information to debug this issue now, the initial analysis indicates that the crashes are a side effect of the gfid mismatch seen because of a full brick, and QE was unable to reproduce the crash on a clean volume (comment #7), I am closing this as WORKSFORME. Please file a new BZ if this is seen with the latest builds.