+++ This bug was initially created as a clone of Bug #1451083 +++

Description of problem:

Regression test failure: https://build.gluster.org/job/centos6-regression/4602/console

14:09:32 Result: PASS
14:09:32 ./tests/bugs/replicate/bug-765564.t: 1 new core files
14:09:32 End of test ./tests/bugs/replicate/bug-765564.t
14:09:32 ====================================================
14:09:33 Thread 1 (Thread 0x7f0d6a42d700 (LWP 2661)):
14:09:33 #0  0x00007f0d692d8101 in dht_rmdir_do (frame=0x7f0d54001ca0, this=0x7f0d6400e2e0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-common.c:8058
14:09:33         local = 0x0
14:09:33         conf = 0x7f0d64033820
14:09:33         ret = -1
14:09:33         hashed_subvol = 0x0
14:09:33         gfid = '\000' <repeats 49 times>
14:09:33         __FUNCTION__ = "dht_rmdir_do"
14:09:33 #1  0x00007f0d692db3af in dht_rmdir_opendir_cbk (frame=0x7f0d54001ca0, cookie=0x7f0d6400c920, this=0x7f0d6400e2e0, op_ret=0, op_errno=117, fd=0x7f0d5400c3a0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-common.c:8671
14:09:33         local = 0x7f0d54018500
14:09:33         this_call_cnt = 0
14:09:33         prev = 0x7f0d6400c920
14:09:33         ret = 0
14:09:33         conf = 0x7f0d64033820
14:09:33         dict = 0x7f0d640015b0
14:09:33         i = 1
14:09:33         gfid = '\000' <repeats 49 times>
14:09:33         readdirp_local = 0x7f0d64062eb0
14:09:33         readdirp_frame = 0x7f0d64053d10
14:09:33         __FUNCTION__ = "dht_rmdir_opendir_cbk"
14:09:33 #2  0x00007f0d6952ba79 in afr_opendir_cbk (frame=0x7f0d54008d90, cookie=0x0, this=0x7f0d6400c920, op_ret=0, op_errno=22, fd=0x7f0d5400c3a0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-dir-read.c:69
14:09:33         fn = 0x7f0d692dac18 <dht_rmdir_opendir_cbk>
14:09:33         _parent = 0x7f0d54001ca0
14:09:33         old_THIS = 0x7f0d6400c920
14:09:33         __local = 0x7f0d54003ae0
14:09:33         __this = 0x7f0d6400c920
14:09:33         __op_ret = 0
14:09:33         __op_errno = 117
14:09:33         local = 0x7f0d54003ae0
14:09:33         call_count = 0
14:09:33         child_index = 0
14:09:33         fd_ctx = 0x7f0d5403bab0
14:09:33         __FUNCTION__ = "afr_opendir_cbk"
14:09:33 #3  0x00007f0d697f813f in client3_3_opendir_cbk (req=0x7f0d540099b0, iov=0x7f0d540099f0, count=1, myframe=0x7f0d5400bf90) at /home/jenkins/root/workspace/centos6-regression/xlators/protocol/client/src/client-rpc-fops.c:2771
14:09:33         fn = 0x7f0d6952b58f <afr_opendir_cbk>
14:09:33         _parent = 0x7f0d54008d90
14:09:33         old_THIS = 0x7f0d64007a70
14:09:33         __local = 0x7f0d54008ea0
14:09:33         local = 0x7f0d54008ea0
14:09:33         frame = 0x7f0d5400bf90
14:09:33         fd = 0x7f0d5400c3a0
14:09:33         ret = 0
14:09:33         rsp = {op_ret = 0, op_errno = 22, fd = 0, xdata = {xdata_len = 0, xdata_val = 0x0}}
14:09:33         this = 0x7f0d64007a70
14:09:33         xdata = 0x0
14:09:33         __FUNCTION__ = "client3_3_opendir_cbk"

Thanks to Shyam for the analysis:

In dht_rmdir_opendir_cbk

        for (i = 0; i < conf->subvolume_cnt; i++) {

                readdirp_frame = copy_frame (frame);

                if (!readdirp_frame) {
                        local->call_cnt--;
                        continue;
                }

                readdirp_local = dht_local_init (readdirp_frame, &local->loc,
                                                 local->fd, 0);

                if (!readdirp_local) {
                        DHT_STACK_DESTROY (readdirp_frame);
                        local->call_cnt--;
                        continue;
                }
                readdirp_local->main_frame = frame;
                readdirp_local->op_ret = 0;
                readdirp_local->xattr = dict_ref (dict);
                /* overload this field to save the subvol info */
                readdirp_local->hashed_subvol = conf->subvolumes[i];

                STACK_WIND_COOKIE (readdirp_frame, dht_rmdir_readdirp_cbk,
                                   conf->subvolumes[i], conf->subvolumes[i],
                                   conf->subvolumes[i]->fops->readdirp,
                                   readdirp_local->fd, 4096, 0,
                                   readdirp_local->xattr);
        }

        if (dict)
                dict_unref (dict);

        /* Could not wind readdirp to any subvol */

        if (!local->call_cnt)
                goto err;

        return 0;

err:
        if (is_last_call (this_call_cnt)) {
                dht_rmdir_do (frame, this);
        }

        return 0;
}

If the dht_rmdir_readdirp_cbk unwinds before the check for the local->call_cnt, this could still hit dht_rmdir_do. At this point, the frame contents might no longer be valid.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Worker Ant on 2017-05-15 14:14:12 EDT ---

REVIEW: https://review.gluster.org/17302 (cluster/dht: Fix crash in dht_rmdir) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Nithya Balachandran on 2017-05-15 14:17:32 EDT ---

This bug was introduced in https://review.gluster.org/17065

--- Additional comment from Worker Ant on 2017-05-15 14:24:18 EDT ---

REVIEW: https://review.gluster.org/17302 (cluster/dht: Fix crash in dht rmdir) posted (#2) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-05-16 01:02:26 EDT ---

REVIEW: https://review.gluster.org/17305 (cluster/dht: Fix crash in dht rmdir) posted (#2) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-05-16 10:03:11 EDT ---

COMMIT: https://review.gluster.org/17305 committed in master by Shyamsundar Ranganathan (srangana)
------
commit 6f7d55c9d58797beaf8d5393c03a5a545bed8bec
Author: N Balachandran <nbalacha>
Date:   Tue May 16 10:26:25 2017 +0530

    cluster/dht: Fix crash in dht rmdir

    Using local->call_cnt to check STACK_WINDs can cause dht_rmdir_do
    to be called erroneously if dht_rmdir_readdirp_cbk unwinds before
    we check if local->call_cnt is zero in dht_rmdir_opendir_cbk.
    This can cause frame corruptions and crashes.

    Thanks to Shyam (srangana) for the analysis.

    Change-Id: I5362cf78f97f21b3fade0b9e94d492002a8d4a11
    BUG: 1451083
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17305
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
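
For readers following the commit message above, the misfire can be reproduced in isolation. The block below is a minimal single-threaded model, not the actual patch: struct rmdir_state, final_step, reply, dispatch_buggy and dispatch_fixed are hypothetical stand-ins for dht_local_t, dht_rmdir_do, dht_rmdir_readdirp_cbk and the two variants of the wind loop, and replies are assumed to arrive synchronously, before the post-loop check. It shows why a shared counter that replies also decrement cannot answer "was anything wound?", and how a counter private to the dispatcher can.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared per-call state (dht_local_t). */
struct rmdir_state {
    int call_cnt;      /* outstanding calls; every reply decrements it */
    int rmdir_do_runs; /* how often the final step was triggered */
};

/* Stand-in for dht_rmdir_do: should run exactly once per rmdir. */
static void final_step(struct rmdir_state *st)
{
    st->rmdir_do_runs++;
}

/* Stand-in for dht_rmdir_readdirp_cbk: the last reply runs the final step. */
static void reply(struct rmdir_state *st)
{
    if (--st->call_cnt == 0)
        final_step(st);
}

/* Buggy variant: reuses the shared counter to detect "nothing was wound". */
int dispatch_buggy(const bool *can_send, int n)
{
    struct rmdir_state st = { .call_cnt = n, .rmdir_do_runs = 0 };

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!can_send[i]) {
            st.call_cnt--;   /* could not wind to this subvolume */
            continue;
        }
        reply(&st);          /* assumption: reply arrives immediately */
    }
    /* Zero here may also mean that every reply has already come back,
     * in which case the final step has run once already. */
    if (st.call_cnt == 0)
        final_step(&st);
    return st.rmdir_do_runs;
}

/* Fixed variant: a private counter records how many calls were wound. */
int dispatch_fixed(const bool *can_send, int n)
{
    struct rmdir_state st = { .call_cnt = n, .rmdir_do_runs = 0 };
    int wound = 0;           /* replies never touch this counter */

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (!can_send[i]) {
            /* A failed wind counts as a finished call. If it was the
             * last outstanding one and replies were expected, finish. */
            if (--st.call_cnt == 0 && wound > 0)
                final_step(&st);
            continue;
        }
        wound++;
        reply(&st);
    }
    /* Only the dispatcher decides that nothing could be wound. */
    if (wound == 0)
        final_step(&st);
    return st.rmdir_do_runs;
}
```

With two subvolumes that both accept the call, dispatch_buggy reports two runs of the final step (one from the last reply, one from the post-loop check), while dispatch_fixed reports one. In the real translator the second run operates on a frame that may already have been torn down, which matches the local = 0x0 seen in frame #0 of the backtrace.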
REVIEW: https://review.gluster.org/17314 (cluster/dht: Fix crash in dht rmdir) posted (#1) for review on release-3.11 by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/17314 committed in release-3.11 by Shyamsundar Ranganathan (srangana)
------
commit a723151d9389498f1b3341172a899bb9d56fdf1b
Author: N Balachandran <nbalacha>
Date:   Tue May 16 10:26:25 2017 +0530

    cluster/dht: Fix crash in dht rmdir

    Using local->call_cnt to check STACK_WINDs can cause dht_rmdir_do
    to be called erroneously if dht_rmdir_readdirp_cbk unwinds before
    we check if local->call_cnt is zero in dht_rmdir_opendir_cbk.
    This can cause frame corruptions and crashes.

    Thanks to Shyam (srangana) for the analysis.

    > BUG: 1451083
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/17305
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Shyamsundar Ranganathan <srangana>

    (cherry picked from commit 6f7d55c9d58797beaf8d5393c03a5a545bed8bec)

    Change-Id: I5362cf78f97f21b3fade0b9e94d492002a8d4a11
    BUG: 1451586
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17314
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/