+++ This bug was initially created as a clone of Bug #1651439 +++
+++ This bug was initially created as a clone of Bug #1633177 +++

Description of problem:
gluster-NFS crashed while expanding a volume.

Version-Release number of selected component (if applicable):
glusterfs-3.12.2-18.1.el7rhgs.x86_64

How reproducible:

Steps to Reproduce:
While running automation runs, gluster-NFS crashed while expanding the volume.
1) Create a distribute volume (1 x 4)
2) Write IO from 2 clients
3) Add bricks while IO is in progress
4) Start rebalance
5) Check the IO

After step 5), the mount point hangs because gluster-NFS has crashed.

Actual results:
gluster-NFS crashes and IO hangs.

Expected results:
IO should succeed.

Additional info:

> volume info

[root@rhsauto023 glusterfs]# gluster vol info

Volume Name: testvol_distributed
Type: Distribute
Volume ID: a809a120-f582-4358-8a70-5c53f71734ee
Status: Started
Snapshot Count: 0
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: rhsauto023.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed_brick0
Brick2: rhsauto030.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed_brick1
Brick3: rhsauto031.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed_brick2
Brick4: rhsauto027.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed_brick3
Brick5: rhsauto023.lab.eng.blr.redhat.com:/bricks/brick1/testvol_distributed_brick4
Options Reconfigured:
transport.address-family: inet
nfs.disable: off
[root@rhsauto023 glusterfs]#

> volume status

[root@rhsauto023 glusterfs]# gluster vol status
Status of volume: testvol_distributed
Gluster process                                                                     TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto023.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed_brick0   49153     0          Y       22557
Brick rhsauto030.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed_brick1   49153     0          Y       21814
Brick rhsauto031.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed_brick2   49153     0          Y       20441
Brick rhsauto027.lab.eng.blr.redhat.com:/bricks/brick0/testvol_distributed_brick3   49152     0          Y       19886
Brick rhsauto023.lab.eng.blr.redhat.com:/bricks/brick1/testvol_distributed_brick4   49152     0          Y       23019
NFS Server on localhost                                                             N/A       N/A        N       N/A
NFS Server on rhsauto027.lab.eng.blr.redhat.com                                     2049      0          Y       20008
NFS Server on rhsauto033.lab.eng.blr.redhat.com                                     2049      0          Y       19752
NFS Server on rhsauto030.lab.eng.blr.redhat.com                                     2049      0          Y       21936
NFS Server on rhsauto031.lab.eng.blr.redhat.com                                     2049      0          Y       20557
NFS Server on rhsauto040.lab.eng.blr.redhat.com                                     2049      0          Y       20047

Task Status of Volume testvol_distributed
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 8e5b404f-5740-4d87-a0d7-3ce94178329f
Status               : completed

[root@rhsauto023 glusterfs]#

> NFS crash

[2018-09-25 13:58:35.381085] I [dict.c:471:dict_get] (-->/usr/lib64/glusterfs/3.12.2/xlator/protocol/client.so(+0x22f5d) [0x7f93543fdf5d] -->/usr/lib64/glusterfs/3.12.2/xlator/cluster/distribute.so(+0x202e7) [0x7f93541572e7] -->/lib64/libglusterfs.so.0(dict_get+0x10c) [0x7f9361aefb3c] ) 0-dict: !this || key=trusted.glusterfs.dht.mds [Invalid argument]
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-09-25 13:58:36
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f9361af8cc0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f9361b02c04]
/lib64/libc.so.6(+0x36280)[0x7f9360158280]
/lib64/libglusterfs.so.0(+0x3b6fa)[0x7f9361b086fa]
/lib64/libglusterfs.so.0(inode_parent+0x52)[0x7f9361b09822]
/usr/lib64/glusterfs/3.12.2/xlator/nfs/server.so(+0xc243)[0x7f934f95c243]
/usr/lib64/glusterfs/3.12.2/xlator/nfs/server.so(+0x3e1d8)[0x7f934f98e1d8]
/usr/lib64/glusterfs/3.12.2/xlator/nfs/server.so(+0x3ea2b)[0x7f934f98ea2b]
/usr/lib64/glusterfs/3.12.2/xlator/nfs/server.so(+0x3ead5)[0x7f934f98ead5]
/usr/lib64/glusterfs/3.12.2/xlator/nfs/server.so(+0x3ecf8)[0x7f934f98ecf8]
/usr/lib64/glusterfs/3.12.2/xlator/nfs/server.so(+0x29d7c)[0x7f934f979d7c]
/usr/lib64/glusterfs/3.12.2/xlator/nfs/server.so(+0x2a184)[0x7f934f97a184]
/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325)[0x7f93618ba955]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x10b)[0x7f93618bab3b]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f93618bca73]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x7566)[0x7f93566e2566]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x9b0c)[0x7f93566e4b0c]
/lib64/libglusterfs.so.0(+0x894c4)[0x7f9361b564c4]
/lib64/libpthread.so.0(+0x7dd5)[0x7f9360957dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f9360220b3d]
---------

--- Additional comment from Red Hat Bugzilla Rules Engine on 2018-09-26 07:02:14 EDT ---

This bug is automatically being proposed for a Z-stream release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.z' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Vijay Avuthu on 2018-09-26 07:03:44 EDT ---

SOS reports: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/nfs_crash_on_expanding_volume/

Jenkins job: http://jenkins-rhs.lab.eng.blr.redhat.com:8080/view/Auto%20RHEL%207.5/job/auto-RHGS_Downstream_BVT_RHEL_7_5_RHGS_3_4_brew/28/consoleFull

Glusto logs: http://jenkins-rhs.lab.eng.blr.redhat.com:8080/view/Auto%20RHEL%207.5/job/auto-RHGS_Downstream_BVT_RHEL_7_5_RHGS_3_4_brew/ws/glusto_28.log

--- Additional comment from Jiffin on 2018-09-27 08:07:28 EDT ---

#0  0x00007f9361b086fa in __inode_get_xl_index (xlator=0x7f9350018d30, inode=0x7f933c0133b0) at inode.c:455
455             if ((inode->_ctx[xlator->xl_id].xl_key != NULL) &&
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libacl-2.2.51-14.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-12.el7.x86_64 libuuid-2.23.2-52.el7_5.1.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f9361b086fa in __inode_get_xl_index (xlator=0x7f9350018d30, inode=0x7f933c0133b0) at inode.c:455
#1  __inode_ref (inode=inode@entry=0x7f933c0133b0) at inode.c:537
#2  0x00007f9361b09822 in inode_parent (inode=inode@entry=0x7f933c01d990, pargfid=pargfid@entry=0x7f93400aa2e8 "", name=name@entry=0x0) at inode.c:1359
#3  0x00007f934f95c243 in nfs_inode_loc_fill (inode=inode@entry=0x7f933c01d990, loc=loc@entry=0x7f93400aa2b8, how=how@entry=1) at nfs-common.c:206
#4  0x00007f934f98e1d8 in nfs3_fh_resolve_inode_done (cs=cs@entry=0x7f93400a9df0, inode=inode@entry=0x7f933c01d990) at nfs3-helpers.c:3611
#5  0x00007f934f98ea2b in nfs3_fh_resolve_inode (cs=0x7f93400a9df0) at nfs3-helpers.c:3828
#6  0x00007f934f98ead5 in nfs3_fh_resolve_resume (cs=cs@entry=0x7f93400a9df0) at nfs3-helpers.c:3860
#7  0x00007f934f98ecf8 in nfs3_fh_resolve_root (cs=cs@entry=0x7f93400a9df0) at nfs3-helpers.c:3915
#8  0x00007f934f98ef41 in nfs3_fh_resolve_and_resume (cs=cs@entry=0x7f93400a9df0, fh=fh@entry=0x7f934e195ae0, entry=entry@entry=0x0, resum_fn=resum_fn@entry=0x7f934f9798b0 <nfs3_access_resume>) at nfs3-helpers.c:4011
#9  0x00007f934f979d7c in nfs3_access (req=req@entry=0x7f934022dcd0, fh=fh@entry=0x7f934e195ae0, accbits=31) at nfs3.c:1783
#10 0x00007f934f97a184 in nfs3svc_access (req=0x7f934022dcd0) at nfs3.c:1819
#11 0x00007f93618ba955 in rpcsvc_handle_rpc_call (svc=0x7f935002c430, trans=trans@entry=0x7f935007a960, msg=<optimized out>) at rpcsvc.c:695
#12 0x00007f93618bab3b in rpcsvc_notify (trans=0x7f935007a960, mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at rpcsvc.c:789
#13 0x00007f93618bca73 in rpc_transport_notify (this=this@entry=0x7f935007a960, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f9340031290) at rpc-transport.c:538
#14 0x00007f93566e2566 in socket_event_poll_in (this=this@entry=0x7f935007a960, notify_handled=<optimized out>) at socket.c:2315
#15 0x00007f93566e4b0c in socket_event_handler (fd=10, idx=7, gen=46, data=0x7f935007a960, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#16 0x00007f9361b564c4 in event_dispatch_epoll_handler (event=0x7f934e195e80, event_pool=0x55c696306210) at event-epoll.c:583
#17 event_dispatch_epoll_worker (data=0x7f9350043b00) at event-epoll.c:659
#18 0x00007f9360957dd5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f9360220b3d in clone () from /lib64/libc.so.6

Above, as part of nfs_inode_loc_fill() it was trying to find the parent inode. There is a valid inode for the parent as well, but the context for that inode is NULL. From code reading I was not able to find a place where ctx can be NULL for a valid inode.

p *inode  -- parent
$27 = {table = 0x7f935002d000, gfid = "{\033g\270K\202B\202\211\320B\"\373u", <incomplete sequence \311>, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}, nlookup = 0, fd_count = 0, active_fd_count = 0, ref = 1, ia_type = IA_IFDIR, fd_list = {next = 0x7f933c013408, prev = 0x7f933c013408}, dentry_list = {next = 0x7f933c013418, prev = 0x7f933c013418}, hash = {next = 0x7f933c013428, prev = 0x7f933c013428}, list = {next = 0x7f93503a5408, prev = 0x7f935002d060}, _ctx = 0x0}

I tried to reproduce the issue (twice), but it did not hit on my setup.
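For context on the trace above: frame #0 faults while indexing inode->_ctx, and the parent inode printed by gdb has _ctx = 0x0. The following is a minimal, self-contained C sketch; the mock structures and helper names are invented for illustration and are not the real glusterfs types. It only shows why that access segfaults and how an early NULL check would avoid it.

/*
 * Minimal sketch -- NOT the real glusterfs sources. The structs below are
 * simplified stand-ins; only the failure mode mirrors frame #0 above:
 * __inode_get_xl_index() indexes inode->_ctx, and the parent inode in the
 * core has _ctx = 0x0, so the access dereferences a NULL pointer.
 */
#include <stdio.h>

struct mock_ctx {
    void *xl_key;            /* stands in for the per-xlator context slot */
};

struct mock_inode {
    struct mock_ctx *_ctx;   /* NULL in the crashed parent inode */
};

struct mock_xlator {
    int xl_id;               /* index into the inode's _ctx array */
};

/* Mirrors the crashing access: segfaults when inode->_ctx is NULL. */
static int get_xl_index_unsafe(struct mock_inode *inode, struct mock_xlator *xl)
{
    if (inode->_ctx[xl->xl_id].xl_key != NULL)   /* NULL-base array deref */
        return xl->xl_id;
    return -1;
}

/* Defensive variant: bail out early when the context array is missing.
 * This reflects the general intent of the posted fixes (do not operate on
 * an inode whose ctx was never set), not their exact implementation. */
static int get_xl_index_safe(struct mock_inode *inode, struct mock_xlator *xl)
{
    if (inode == NULL || inode->_ctx == NULL)
        return -1;
    if (inode->_ctx[xl->xl_id].xl_key != NULL)
        return xl->xl_id;
    return -1;
}

int main(void)
{
    struct mock_xlator xl = { .xl_id = 0 };
    struct mock_inode parent = { ._ctx = NULL };   /* same state as in the core */

    printf("safe lookup returns %d\n", get_xl_index_safe(&parent, &xl));
    /* get_xl_index_unsafe(&parent, &xl) would SIGSEGV here, just like the
     * gluster-NFS process did. */
    (void)get_xl_index_unsafe;   /* silence unused-function warning */
    return 0;
}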
Requesting Vijay to recheck how frequently it can be reproduced, and please try to run with debug log level for nfs-server (diagnosis-client log level).

--- Additional comment from Worker Ant on 2018-11-20 06:00:20 UTC ---

REVIEW: https://review.gluster.org/21685 (inode : prevent dentry creation if parent does not have ctx) posted (#1) for review on master by jiffin tony Thottan

--- Additional comment from Worker Ant on 2018-11-29 14:03:58 UTC ---

REVIEW: https://review.gluster.org/21749 (nfs : set ctx for every inode looked up nfs3_fh_resolve_inode_lookup_cbk()) posted (#1) for review on master by jiffin tony Thottan

--- Additional comment from Worker Ant on 2018-12-03 05:50:44 UTC ---

REVIEW: https://review.gluster.org/21749 (nfs : set ctx for every inode looked up nfs3_fh_resolve_inode_lookup_cbk()) posted (#4) for review on master by Amar Tumballi

--- Additional comment from Worker Ant on 2019-01-08 08:49:15 UTC ---

REVIEW: https://review.gluster.org/21998 (dht: fix inode leak when heal path) posted (#1) for review on master by Kinglong Mee

--- Additional comment from Worker Ant on 2019-02-13 18:22:33 UTC ---

REVIEW: https://review.gluster.org/21998 (dht: fix double extra unref of inode at heal path) merged (#4) on master by Raghavendra G
REVIEW: https://review.gluster.org/22244 (dht: fix double extra unref of inode at heal path) posted (#1) for review on release-6 by Susant Palai
REVIEW: https://review.gluster.org/22244 (dht: fix double extra unref of inode at heal path) merged (#2) on release-6 by Shyamsundar Ranganathan
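For context on the fix merged and backported above, here is a generic C sketch of the "double extra unref" pattern named in the patch subject. The refobj type and helper functions are assumptions invented for illustration, not code from the actual dht heal-path change.

/*
 * Generic sketch of the "double extra unref" failure class. Dropping one
 * reference more than was taken destroys the object while another code
 * path still holds it; the next use of that holder's pointer touches
 * freed state, which is what surfaces as a crash.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct refobj {
    int   refcount;
    void *ctx;               /* torn down when the last reference drops */
};

static struct refobj *obj_ref(struct refobj *o)
{
    o->refcount++;
    return o;
}

static void obj_unref(struct refobj *o)
{
    assert(o->refcount > 0 && "unref without a matching ref");
    if (--o->refcount == 0) {
        free(o->ctx);        /* last holder gone: context destroyed */
        free(o);
    }
}

int main(void)
{
    struct refobj *o = calloc(1, sizeof(*o));
    o->ctx = malloc(16);     /* some live per-object context */
    o->refcount = 1;         /* creator's reference */

    struct refobj *held = obj_ref(o);   /* second holder, e.g. a heal path */

    obj_unref(o);            /* creator drops its ref; 'held' keeps it alive */

    /* Buggy pattern: an extra obj_unref(o) here would free the object and
     * its ctx while 'held' still points at it; the next access through
     * 'held' then reads freed memory. */
    printf("still alive, refcount=%d\n", held->refcount);

    obj_unref(held);         /* final, matched unref */
    return 0;
}

The sketch only illustrates the invariant the fix restores: every reference taken must be balanced by exactly one unref, no more.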
Is there anything pending on this bug? I still see the bug is in POST state even though the above patch is merged (as the commit had 'updates' tag).
(In reply to Atin Mukherjee from comment #3)
> Is there anything pending on this bug? I still see the bug is in POST state
> even though the above patch is merged (as the commit had 'updates' tag).

There was a crash seen in the dht layer, which was fixed by the above patch. But the patch was originally written for https://bugzilla.redhat.com/show_bug.cgi?id=1651439, which targeted mostly the nfs use case. Since we needed the dht fix in release-6, I guess Sunil cloned the mainline bug directly. I will change the summary to reflect the dht-crash part and move the bug status to MODIFIED.

Susant
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/