Bug 1490642

Summary: glusterfs client crash when removing directories
Product: [Community] GlusterFS
Component: distribute
Version: mainline
Status: CLOSED CURRENTRELEASE
Reporter: Zhang Huan <zhhuan>
Assignee: Zhang Huan <zhhuan>
CC: bugs, nbalacha, zhhuan
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.13.0
Type: Bug
Last Closed: 2017-12-08 17:40:26 UTC
Bug Blocks: 1505221, 1519076

Description Zhang Huan 2017-09-11 23:38:08 UTC
Description of problem:
The GlusterFS client crashes when directories are removed in parallel. The issue was found with the LTP test case inode02. The volume must be configured with more than one brick. The crash is much easier to reproduce with the commit "event/epoll: Add back socket for polling of events immediately after reading the entire rpc message from the wire" applied.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
On some test machines it crashes every time; on others it never crashes.

Steps to Reproduce:
1. Create a GlusterFS volume with more than one brick.
2. FUSE-mount the volume.
3. Run the LTP test case inode02.
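
For readers without LTP at hand, the workload in step 3 can be roughly approximated as below. This is only a sketch under assumptions: it is not the inode02 test, it uses threads rather than separate processes, and it has not been confirmed to trigger the crash. The names make_tree, remove_tree, and churn are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def make_tree(root, depth=2, fanout=3):
    """Create a small nested directory tree under root (directories only)."""
    os.makedirs(root, exist_ok=True)
    if depth == 0:
        return
    for i in range(fanout):
        make_tree(os.path.join(root, f"d{i}"), depth - 1, fanout)


def remove_tree(root):
    """Remove the tree bottom-up, issuing one rmdir per directory."""
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        os.rmdir(dirpath)


def churn(base, workers=8):
    """Build one tree per worker, then remove all trees in parallel.

    Returns the list of tree roots that still exist afterwards.
    """
    roots = [os.path.join(base, f"worker{i}") for i in range(workers)]
    for root in roots:
        make_tree(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(remove_tree, roots))
    return [root for root in roots if os.path.exists(root)]
```

To exercise a volume, churn would be pointed at a scratch directory on the FUSE mount from step 2; an empty return value means every tree was removed.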

Actual results:
The GlusterFS client crashes while removing directories, and the test fails.

Expected results:
The test finishes without errors.

Additional info:

Comment 1 Nithya Balachandran 2017-09-20 14:19:47 UTC
Can you please provide the coredump and rpm versions?

Comment 2 Zhang Huan 2017-09-22 02:14:17 UTC
I have a fix for this issue. Since I could not log in to review.gluster.org, I have put the link to it below, FYI.
https://github.com/zhanghuan/glusterfs-1/commit/cd383bc1f49975fae769bed1cbd67e3b0a309819

Comment 3 Nithya Balachandran 2017-09-22 04:29:55 UTC
Thank you Zhang for finding the BZ and the fix. 

Are you able to log in to review.gluster.org now? It would be great if you can submit the patch there.

Regards,
Nithya

Comment 4 Zhang Huan 2017-09-22 04:40:42 UTC
No, "signed in with GitHub" still gives me a forbidden result. It has been this way for a while.

I saw your comment on my patch. It is good advice; I will modify the patch accordingly and resend it after testing.

Thank you for your reply.

Comment 5 Nithya Balachandran 2017-09-22 05:58:17 UTC
(In reply to Zhang Huan from comment #4)
> No, "signed in with GitHub" still gives me a forbidden result. It has been
> this way for a while.
> 
You can ask for help on this by logging into the #gluster channel in IRC. Ask for nigelb.

Comment 6 Nithya Balachandran 2017-10-11 08:12:21 UTC
Hi Zhang,

Please file a bug for the issue where you cannot log into review.gluster.org. Please use component project-infrastructure.

Thanks,
Nithya

Comment 7 Nithya Balachandran 2017-10-11 08:16:22 UTC
(In reply to Nithya Balachandran from comment #6)
> Hi Zhang,
> 
> Please file a bug for the issue where you cannot log into
> review.gluster.org. Please use component project-infrastructure.
> 
> Thanks,
> Nithya

Please ignore this - I just realised there is a BZ already.

Comment 8 Zhang Huan 2017-10-12 06:03:15 UTC
The login issue has been fixed. Related link is 
https://bugzilla.redhat.com/show_bug.cgi?id=1494363

I will continue to post the patch to review.gluster.org for review.

Comment 9 Worker Ant 2017-10-13 05:53:31 UTC
REVIEW: https://review.gluster.org/18517 (cluster/dht: fix crash when deleting directories) posted (#1) for review on master by Zhang Huan (zhanghuan)

Comment 10 Worker Ant 2017-10-16 10:33:17 UTC
COMMIT: https://review.gluster.org/18517 committed in master by Raghavendra G (rgowdapp) 
------
commit 206120126d455417a81a48ae473d49be337e9463
Author: Zhang Huan <zhanghuan>
Date:   Tue Sep 5 11:36:25 2017 +0800

    cluster/dht: fix crash when deleting directories
    
    In DHT, after locks on all subvolumes are acquired, it would perform the
    following steps sequentially,
    1. send remove dir on all other subvolumes except the hashed one in a loop;
    2. wait for all pending rmdir to be done
    3. remove dir on the hashed subvolume
    
    The problem is that the loop in step 1 checks each subvolume against the
    hashed one in order to skip it. If the hashed subvolume is the last one
    in the loop, step 3 may complete before that final check runs. The check
    then accesses shared context data that step 3 has already destroyed,
    which causes a crash.
    
    Fix by saving shared data in a local variable to access later in the
    loop.
    
    Change-Id: I8db7cf7cb262d74efcb58eb00f02ea37df4be4e2
    BUG: 1490642
    Signed-off-by: Zhang Huan <zhanghuan>
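
The race described in the commit message can be modeled deterministically. The following is a minimal Python sketch, not the actual DHT code (which is written in C). RmdirContext, rmdir_done, rmdir_buggy, and rmdir_fixed are hypothetical names, and replies are simulated as arriving synchronously so that step 3 always finishes before the loop does.

```python
class UseAfterDestroy(RuntimeError):
    """Stands in for the crash caused by reading destroyed context data."""


class RmdirContext:
    """Hypothetical stand-in for the per-request shared context."""

    def __init__(self, subvols, hashed_subvol):
        self._hashed_subvol = hashed_subvol
        # Number of non-hashed subvolumes whose rmdir is still outstanding.
        self.pending = sum(1 for s in subvols if s != hashed_subvol)
        self.removed = []
        self.destroyed = False

    @property
    def hashed_subvol(self):
        if self.destroyed:
            raise UseAfterDestroy("context read after step 3 destroyed it")
        return self._hashed_subvol


def rmdir_done(ctx, subvol):
    """Reply handler for one non-hashed rmdir (steps 1 and 2)."""
    ctx.removed.append(subvol)
    ctx.pending -= 1
    if ctx.pending == 0:
        # Step 3: remove on the hashed subvolume, then tear down the context.
        ctx.removed.append(ctx.hashed_subvol)
        ctx.destroyed = True


def rmdir_buggy(subvols, ctx):
    for subvol in subvols:
        # Reads the shared context on every iteration, including after the
        # last reply may already have destroyed it.
        if subvol == ctx.hashed_subvol:
            continue
        rmdir_done(ctx, subvol)  # simulated immediate reply


def rmdir_fixed(subvols, ctx):
    # The fix: copy the shared value into a local variable before the loop.
    hashed = ctx.hashed_subvol
    for subvol in subvols:
        if subvol == hashed:
            continue
        rmdir_done(ctx, subvol)


if __name__ == "__main__":
    bricks = ["brick-0", "brick-1", "brick-2"]
    ctx = RmdirContext(bricks, "brick-2")
    rmdir_fixed(bricks, ctx)
    print(ctx.removed)
```

In this model rmdir_buggy fails only when the hashed subvolume is the last entry in the list, which mirrors the condition given in the commit message; with the hashed subvolume in any earlier position, the final check runs before the context is destroyed.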

Comment 11 Worker Ant 2017-10-23 04:50:32 UTC
REVIEW: https://review.gluster.org/18551 (cluster/dht: fix crash when deleting directories) posted (#1) for review on release-3.12 by N Balachandran (nbalacha)

Comment 12 Shyamsundar 2017-12-08 17:40:26 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/