Bug 1451586 - crash in dht_rmdir_do
Summary: crash in dht_rmdir_do
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact:
URL:
Whiteboard:
Depends On: 1451083
Blocks: 1451086 1451200 1451371
 
Reported: 2017-05-17 06:04 UTC by Nithya Balachandran
Modified: 2017-05-30 18:52 UTC
CC List: 1 user

Fixed In Version: glusterfs-3.11.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1451083
Environment:
Last Closed: 2017-05-30 18:52:40 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nithya Balachandran 2017-05-17 06:04:09 UTC
+++ This bug was initially created as a clone of Bug #1451083 +++

Description of problem:

Regression test failure:

https://build.gluster.org/job/centos6-regression/4602/console


14:09:32 Result: PASS
14:09:32 ./tests/bugs/replicate/bug-765564.t: 1 new core files
14:09:32 End of test ./tests/bugs/replicate/bug-765564.t
14:09:32 ====================================================




14:09:33 Thread 1 (Thread 0x7f0d6a42d700 (LWP 2661)):
14:09:33 #0  0x00007f0d692d8101 in dht_rmdir_do (frame=0x7f0d54001ca0, this=0x7f0d6400e2e0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-common.c:8058
14:09:33         local = 0x0
14:09:33         conf = 0x7f0d64033820
14:09:33         ret = -1
14:09:33         hashed_subvol = 0x0
14:09:33         gfid = '\000' <repeats 49 times>
14:09:33         __FUNCTION__ = "dht_rmdir_do"
14:09:33 #1  0x00007f0d692db3af in dht_rmdir_opendir_cbk (frame=0x7f0d54001ca0, cookie=0x7f0d6400c920, this=0x7f0d6400e2e0, op_ret=0, op_errno=117, fd=0x7f0d5400c3a0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-common.c:8671
14:09:33         local = 0x7f0d54018500
14:09:33         this_call_cnt = 0
14:09:33         prev = 0x7f0d6400c920
14:09:33         ret = 0
14:09:33         conf = 0x7f0d64033820
14:09:33         dict = 0x7f0d640015b0
14:09:33         i = 1
14:09:33         gfid = '\000' <repeats 49 times>
14:09:33         readdirp_local = 0x7f0d64062eb0
14:09:33         readdirp_frame = 0x7f0d64053d10
14:09:33         __FUNCTION__ = "dht_rmdir_opendir_cbk"
14:09:33 #2  0x00007f0d6952ba79 in afr_opendir_cbk (frame=0x7f0d54008d90, cookie=0x0, this=0x7f0d6400c920, op_ret=0, op_errno=22, fd=0x7f0d5400c3a0, xdata=0x0) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-dir-read.c:69
14:09:33         fn = 0x7f0d692dac18 <dht_rmdir_opendir_cbk>
14:09:33         _parent = 0x7f0d54001ca0
14:09:33         old_THIS = 0x7f0d6400c920
14:09:33         __local = 0x7f0d54003ae0
14:09:33         __this = 0x7f0d6400c920
14:09:33         __op_ret = 0
14:09:33         __op_errno = 117
14:09:33         local = 0x7f0d54003ae0
14:09:33         call_count = 0
14:09:33         child_index = 0
14:09:33         fd_ctx = 0x7f0d5403bab0
14:09:33         __FUNCTION__ = "afr_opendir_cbk"
14:09:33 #3  0x00007f0d697f813f in client3_3_opendir_cbk (req=0x7f0d540099b0, iov=0x7f0d540099f0, count=1, myframe=0x7f0d5400bf90) at /home/jenkins/root/workspace/centos6-regression/xlators/protocol/client/src/client-rpc-fops.c:2771
14:09:33         fn = 0x7f0d6952b58f <afr_opendir_cbk>
14:09:33         _parent = 0x7f0d54008d90
14:09:33         old_THIS = 0x7f0d64007a70
14:09:33         __local = 0x7f0d54008ea0
14:09:33         local = 0x7f0d54008ea0
14:09:33         frame = 0x7f0d5400bf90
14:09:33         fd = 0x7f0d5400c3a0
14:09:33         ret = 0
14:09:33         rsp = {op_ret = 0, op_errno = 22, fd = 0, xdata = {xdata_len = 0, xdata_val = 0x0}}
14:09:33         this = 0x7f0d64007a70
14:09:33         xdata = 0x0
14:09:33         __FUNCTION__ = "client3_3_opendir_cbk"




Thanks to Shyam for the analysis (note that local is NULL in frame #0 of the backtrace above, consistent with the main frame having already been destroyed):

In dht_rmdir_opendir_cbk:

        for (i = 0; i < conf->subvolume_cnt; i++) {

                readdirp_frame = copy_frame (frame);

                if (!readdirp_frame) {
                        local->call_cnt--;
                        continue;
                }

                readdirp_local = dht_local_init (readdirp_frame, &local->loc,
                                                 local->fd, 0);

                if (!readdirp_local) {
                        DHT_STACK_DESTROY (readdirp_frame);
                        local->call_cnt--;
                        continue;
                }
                readdirp_local->main_frame = frame;
                readdirp_local->op_ret = 0;
                readdirp_local->xattr = dict_ref (dict);
                /* overload this field to save the subvol info */
                readdirp_local->hashed_subvol = conf->subvolumes[i];

                STACK_WIND_COOKIE (readdirp_frame, dht_rmdir_readdirp_cbk,
                                   conf->subvolumes[i], conf->subvolumes[i],
                                   conf->subvolumes[i]->fops->readdirp,
                                   readdirp_local->fd, 4096, 0,
                                   readdirp_local->xattr);
        }

        if (dict)
                dict_unref (dict);

        /* Could not wind readdirp to any subvol */
        if (!local->call_cnt)
                goto err;

        return 0;

err:
        if (is_last_call (this_call_cnt)) {
                dht_rmdir_do (frame, this);
        }

        return 0;
}


If dht_rmdir_readdirp_cbk unwinds before the local->call_cnt check is reached, that check can still send execution into dht_rmdir_do via the err path. At that point the frame contents may no longer be valid, because the final readdirp callback may already have destroyed the main frame.
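
The merged fix (reviews below) reworks this accounting. The sketch that follows illustrates one way to avoid the race, not the verbatim patch: decide the wind count up front, publish call_cnt once, and only then wind, so local is never read after the first callback can fire. The helper prepare_readdirp_frame() is hypothetical shorthand for the copy_frame + dht_local_init setup shown above, and the dict_ref/dict_unref bookkeeping is omitted for brevity.

        /* Pass 1: build the per-subvol readdirp frames without winding,
         * so a setup failure only lowers the count and never touches a
         * live call_cnt. (Stack array used only to keep the sketch short.) */
        int           wind_cnt = 0;
        fd_t         *fd = local->fd;   /* capture before any wind */
        call_frame_t *readdirp_frames[conf->subvolume_cnt];

        for (i = 0; i < conf->subvolume_cnt; i++) {
                /* hypothetical helper wrapping copy_frame + dht_local_init */
                readdirp_frames[i] = prepare_readdirp_frame (frame, dict, i);
                if (readdirp_frames[i])
                        wind_cnt++;
        }

        if (!wind_cnt) {
                /* Nothing will be wound, so no callback can race with us;
                 * falling back to dht_rmdir_do here is safe. */
                if (is_last_call (this_call_cnt))
                        dht_rmdir_do (frame, this);
                return 0;
        }

        /* Publish the final count exactly once. After the first
         * STACK_WIND_COOKIE below, local and frame must not be touched
         * again: the last callback may destroy the main frame at any
         * moment. */
        local->call_cnt = wind_cnt;

        /* Pass 2: wind, using only values captured above. */
        for (i = 0; i < conf->subvolume_cnt; i++) {
                if (!readdirp_frames[i])
                        continue;
                STACK_WIND_COOKIE (readdirp_frames[i], dht_rmdir_readdirp_cbk,
                                   conf->subvolumes[i], conf->subvolumes[i],
                                   conf->subvolumes[i]->fops->readdirp,
                                   fd, 4096, 0, dict);
        }

        return 0;

Publishing call_cnt only after it is final means a synchronous unwind can only ever observe the true remaining count, which closes the window in which dht_rmdir_do could run a second time against an already-destroyed frame.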




--- Additional comment from Worker Ant on 2017-05-15 14:14:12 EDT ---

REVIEW: https://review.gluster.org/17302 (cluster/dht: Fix crash in dht_rmdir) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Nithya Balachandran on 2017-05-15 14:17:32 EDT ---

This bug was introduced in https://review.gluster.org/17065

--- Additional comment from Worker Ant on 2017-05-15 14:24:18 EDT ---

REVIEW: https://review.gluster.org/17302 (cluster/dht: Fix crash in dht rmdir) posted (#2) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-05-16 01:02:26 EDT ---

REVIEW: https://review.gluster.org/17305 (cluster/dht: Fix crash in dht rmdir) posted (#2) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-05-16 10:03:11 EDT ---

COMMIT: https://review.gluster.org/17305 committed in master by Shyamsundar Ranganathan (srangana) 
------
commit 6f7d55c9d58797beaf8d5393c03a5a545bed8bec
Author: N Balachandran <nbalacha>
Date:   Tue May 16 10:26:25 2017 +0530

    cluster/dht: Fix crash in dht rmdir
    
    Using local->call_cnt to check STACK_WINDs can
    cause dht_rmdir_do to be called erroneously if
    dht_rmdir_readdirp_cbk unwinds before we check if
    local->call_cnt is zero in dht_rmdir_opendir_cbk.
    This can cause frame corruptions and crashes.
    
    Thanks to Shyam (srangana) for the
    analysis.
    
    Change-Id: I5362cf78f97f21b3fade0b9e94d492002a8d4a11
    BUG: 1451083
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17305
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 1 Worker Ant 2017-05-17 06:14:08 UTC
REVIEW: https://review.gluster.org/17314 (cluster/dht: Fix crash in dht rmdir) posted (#1) for review on release-3.11 by N Balachandran (nbalacha)

Comment 2 Worker Ant 2017-05-17 13:49:47 UTC
COMMIT: https://review.gluster.org/17314 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit a723151d9389498f1b3341172a899bb9d56fdf1b
Author: N Balachandran <nbalacha>
Date:   Tue May 16 10:26:25 2017 +0530

    cluster/dht: Fix crash in dht rmdir
    
    Using local->call_cnt to check STACK_WINDs can
    cause dht_rmdir_do to be called erroneously if
    dht_rmdir_readdirp_cbk unwinds before we check if
    local->call_cnt is zero in dht_rmdir_opendir_cbk.
    This can cause frame corruptions and crashes.
    
    Thanks to Shyam (srangana) for the
    analysis.
    
    > BUG: 1451083
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/17305
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Shyamsundar Ranganathan <srangana>
    (cherry picked from commit 6f7d55c9d58797beaf8d5393c03a5a545bed8bec)
    Change-Id: I5362cf78f97f21b3fade0b9e94d492002a8d4a11
    BUG: 1451586
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17314
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 3 Shyamsundar 2017-05-30 18:52:40 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem persists with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

