+++ This bug was initially created as a clone of Bug #1159571 +++
+++ This bug was initially created as a clone of Bug #1159280 +++

Description of problem:

Version-Release number of selected component (if applicable):
glusterfs-3.6.1

How reproducible:

Steps to Reproduce:
1. Created a 6x2 distributed-replicated volume
2. Created some data on the mount point
3. Started remove-brick

Actual results:
The rebalance process crashed.

Expected results:

Additional info:

Core was generated by `/usr/sbin/glusterfs --volfile-server=rhs-client4.lab.eng.blr.redhat.com --volfi'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fbf389e9bcf in dht_lookup_everywhere_done (frame=0x7fbf3cb2c0f8, this=0x1016db0) at dht-common.c:1189
1189                            gf_log (this->name, GF_LOG_DEBUG,
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.6.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.6.x86_64 libcom_err-1.41.12-14.el6_4.4.x86_64 libgcc-4.4.7-3.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 openssl-1.0.1e-16.el6_5.15.x86_64 zlib-1.2.3-29.el6.x86_64

(gdb) bt
#0  0x00007fbf389e9bcf in dht_lookup_everywhere_done (frame=0x7fbf3cb2c0f8, this=0x1016db0) at dht-common.c:1189
#1  0x00007fbf389ede1b in dht_lookup_everywhere_cbk (frame=0x7fbf3cb2c0f8, cookie=<value optimized out>, this=0x1016db0, op_ret=<value optimized out>, op_errno=<value optimized out>, inode=0x7fbf30afe0c8, buf=0x7fbf3354085c, xattr=0x7fbf3c5271ac, postparent=0x7fbf335408cc) at dht-common.c:1515
#2  0x00007fbf38c6a298 in afr_lookup_done (frame=0x7fbeffffffc6, cookie=0x7ffff2e0b8e8, this=0x1016320, op_ret=<value optimized out>, op_errno=8, inode=0x11ce7e0, buf=0x7ffff2e0bb40, xattr=0x7fbf3c527238, postparent=0x7ffff2e0bad0) at afr-common.c:2223
#3  afr_lookup_cbk (frame=0x7fbeffffffc6, cookie=0x7ffff2e0b8e8, this=0x1016320, op_ret=<value optimized out>, op_errno=8, inode=0x11ce7e0, buf=0x7ffff2e0bb40, xattr=0x7fbf3c527238, postparent=0x7ffff2e0bad0) at afr-common.c:2454
#4  0x00007fbf38ea6a33 in client3_3_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7fbf3cb2ba40) at client-rpc-fops.c:2610
#5  0x00000035cac0e005 in rpc_clnt_handle_reply (clnt=0x1076630, pollin=0x10065e0) at rpc-clnt.c:773
#6  0x00000035cac0f5c7 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1076660, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:906
#7  0x00000035cac0ae48 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#8  0x00007fbf3a105e36 in socket_event_poll_in (this=0x1086060) at socket.c:2136
#9  0x00007fbf3a10775d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1086060, poll_in=1, poll_out=0, poll_err=0) at socket.c:2246
#10 0x00000035ca462997 in event_dispatch_epoll_handler (event_pool=0xfe4ee0) at event-epoll.c:384
#11 event_dispatch_epoll (event_pool=0xfe4ee0) at event-epoll.c:445
#12 0x00000000004069d7 in main (argc=4, argv=0x7ffff2e0d7e8) at glusterfsd.c:2050

(gdb) bt
#0  0x00007f4d8d522bcf in dht_lookup_everywhere_done (frame=0x7f4d9145f85c, this=0x22c1470) at dht-common.c:1189
#1  0x00007f4d8d526e1b in dht_lookup_everywhere_cbk (frame=0x7f4d9145f85c, cookie=<value optimized out>, this=0x22c1470, op_ret=<value optimized out>, op_errno=<value optimized out>, inode=0x7f4d8396b53c, buf=0x7f4d8c100d38, xattr=0x7f4d90e5ab84, postparent=0x7f4d8c100da8) at dht-common.c:1515
#2  0x00007f4d8d7a3298 in afr_lookup_done (frame=0x7f4cffffffc6, cookie=0x7fffb7202cd8, this=0x22c09e0, op_ret=<value optimized out>, op_errno=8, inode=0x25ccd70, buf=0x7fffb7202f30, xattr=0x7f4d90e5ab84, postparent=0x7fffb7202ec0) at afr-common.c:2223
#3  afr_lookup_cbk (frame=0x7f4cffffffc6, cookie=0x7fffb7202cd8, this=0x22c09e0, op_ret=<value optimized out>, op_errno=8, inode=0x25ccd70, buf=0x7fffb7202f30, xattr=0x7f4d90e5ab84, postparent=0x7fffb7202ec0) at afr-common.c:2454
#4  0x00007f4d8d9dfa33 in client3_3_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f4d9145eef4) at client-rpc-fops.c:2610
#5  0x00000035cac0e005 in rpc_clnt_handle_reply (clnt=0x22fc990, pollin=0x230d380) at rpc-clnt.c:773
#6  0x00000035cac0f5c7 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x22fc9c0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:906
#7  0x00000035cac0ae48 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#8  0x00007f4d8ea38e36 in socket_event_poll_in (this=0x230c420) at socket.c:2136
#9  0x00007f4d8ea3a75d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x230c420, poll_in=1, poll_out=0, poll_err=0) at socket.c:2246
#10 0x00000035ca462997 in event_dispatch_epoll_handler (event_pool=0x2288ee0) at event-epoll.c:384
#11 event_dispatch_epoll (event_pool=0x2288ee0) at event-epoll.c:445
#12 0x00000000004069d7 in main (argc=11, argv=0x7fffb7204bd8) at glusterfsd.c:2050

(gdb) l
1184                            goto unwind_hashed_and_cached;
1185                    } else {
1186
1187                            local->skip_unlink.handle_valid_link = _gf_false;
1188
1189                            gf_log (this->name, GF_LOG_DEBUG,
1190                                    "Linkto file found on hashed subvol "
1191                                    "and data file found on cached "
1192                                    "subvolume. But linkto points to "
1193                                    "different cached subvolume (%s) "
1194                                    "path %s",
1195                                    local->skip_unlink.hash_links_to->name,
1196                                    local->loc.path);
1197
1198                            if (local->skip_unlink.opend_fd_count == 0) {
1199

(gdb) p local->skip_unlink.hash_links_to
$2 = (xlator_t *) 0x0
(gdb) p local->skip_unlink.hash_links_to->name
Cannot access memory at address 0x0
(gdb) p local->loc.path
$1 = 0x7f4d7c019000 "/test/f411"
(gdb) p *(dht_conf_t *)this->private
$4 = {subvolume_lock = 1, subvolume_cnt = 8, subvolumes = 0x22d57c0, subvolume_status = 0x22d5810 "\001\001\001\001\001\001\001\001", last_event = 0x22d5830, file_layouts = 0x22d6650, dir_layouts = 0x0, ....

The trusted.glusterfs.dht.linkto xattr for "/test/f411" is "qtest-replicate-8". This points to the brick that was removed and is not found in the conf->subvolumes list.

(gdb) p ((dht_conf_t *)this->private)->subvolumes[0]->name
$18 = 0x22ba0a0 "qtest-replicate-0"
(gdb) p ((dht_conf_t *)this->private)->subvolumes[1]->name
$19 = 0x22bb780 "qtest-replicate-1"
(gdb) p ((dht_conf_t *)this->private)->subvolumes[2]->name
$20 = 0x22bc9b0 "qtest-replicate-2"
(gdb) p ((dht_conf_t *)this->private)->subvolumes[3]->name
$21 = 0x22bd420 "qtest-replicate-3"
(gdb) p ((dht_conf_t *)this->private)->subvolumes[4]->name
$22 = 0x22bdeb0 "qtest-replicate-4"
(gdb) p ((dht_conf_t *)this->private)->subvolumes[5]->name
$23 = 0x22be940 "qtest-replicate-5"
(gdb) p ((dht_conf_t *)this->private)->subvolumes[6]->name
$24 = 0x22bf3d0 "qtest-replicate-6"
(gdb) p ((dht_conf_t *)this->private)->subvolumes[7]->name
$25 = 0x22bfe60 "qtest-replicate-7"
(gdb) p ((dht_conf_t *)this->private)->subvolumes[8]->name
Cannot access memory at address 0x0

The local->skip_unlink.hash_links_to value is set in dht_lookup_everywhere_cbk() without checking whether it is NULL:

        if (is_linkfile) {
                link_subvol = dht_linkfile_subvol (this, inode, buf, xattr);
                gf_msg_debug (this->name, 0,
                              "found on %s linkfile %s (-> %s)",
                              subvol->name, loc->path,
                              link_subvol ? link_subvol->name : "''");
                goto unlock;
        }
        ...
        ...

======================================================================================================
On the bricks:

[root@rhs-client4 ~]# getfattr -d -m . /home/qtest*/test/f411
getfattr: Removing leading '/' from absolute path names
# file: home/qtest12/test/f411
trusted.gfid=0sJg43JQHHRJST/cXjXyY0wg==
trusted.glusterfs.dht.linkto="qtest-replicate-8"   <----------THIS!!!
trusted.glusterfs.quota.f3874c91-e295-45d9-a95a-252d54b15ba0.contri=0sAAAAAAAAAAA=
trusted.pgfid.f3874c91-e295-45d9-a95a-252d54b15ba0=0sAAAAAQ==

# file: home/qtest17/test/f411
trusted.afr.qtest-client-16=0sAAAAAAAAAAAAAAAA
trusted.afr.qtest-client-17=0sAAAAAAAAAAAAAAAA
trusted.gfid=0sJg43JQHHRJST/cXjXyY0wg==
trusted.glusterfs.quota.f3874c91-e295-45d9-a95a-252d54b15ba0.contri=0sAAAAAAAQAAA=
trusted.pgfid.f3874c91-e295-45d9-a95a-252d54b15ba0=0sAAAAAQ==
======================================================================================================
REVIEW: http://review.gluster.org/9467 (Cluster/DHT : Fixed crash due to null deref) posted (#1) for review on release-3.6 by Raghavendra Bhat (raghavendra)
COMMIT: http://review.gluster.org/9467 committed in release-3.6 by Raghavendra Bhat (raghavendra)
------
commit 709d4712941adecdc0542672cd0cdea3b86ec729
Author: Nithya Balachandran <nbalacha>
Date:   Sat Nov 1 22:16:32 2014 +0530

    Cluster/DHT : Fixed crash due to null deref

    A lookup on a linkto file whose trusted.glusterfs.dht.linkto xattr
    points to a subvol that is not part of the volume can cause the
    brick process to segfault due to a null dereference. Modified to
    check for a non-null value before attempting to access the variable.

    > Change-Id: Ie8f9df058f842cfc0c2b52a8f147e557677386fa
    > BUG: 1159571
    > Signed-off-by: Nithya Balachandran <nbalacha>
    > Reviewed-on: http://review.gluster.org/9034
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: venkatesh somyajulu <vsomyaju>
    > Reviewed-by: Vijay Bellur <vbellur>
    > Signed-off-by: Raghavendra Bhat <raghavendra>

    Change-Id: I53b086289d2386d269648653629a0750baae07a4
    BUG: 1184191
    Reviewed-on: http://review.gluster.org/9467
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra Bhat <raghavendra>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.2, please reopen this bug report.

glusterfs-3.6.2 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should already be, or will soon become, available. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases > 3.6.2.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137