Bug 1396449

Summary: [SAMBA-CIFS] : I/O hangs on cifs mount during graph switch on & off
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vivek Das <vdas>
Component: samba
Assignee: rjoseph
Status: CLOSED ERRATA
QA Contact: Vivek Das <vdas>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.2
CC: amukherj, rcyriac, rgowdapp, rhinduja, rhs-smb, rjoseph, sanandpa, vdas
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1397754
Environment:
Last Closed: 2017-03-23 06:20:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1397754
Bug Blocks: 1351528, 1399914, 1399915, 1399916

Comment 4 Raghavendra G 2016-11-23 08:51:09 UTC
Is it possible to attach a statedump of the smbd process taken while I/O is hung?

Comment 5 rjoseph 2016-11-23 10:01:44 UTC
From the initial analysis, this appears to be a hang caused by a deadlock between multiple threads.

By default there are 2 epoll threads, which receive socket notifications. The backtrace in Comment 3 shows that both epoll threads are blocked.

Thread1 is doing the graph migration, so the "migration_in_progress" variable is set to 1. It has issued the first lookup, and its syncop is waiting for a reply.

Thread7 is one of the epoll threads and is sending a parent-up notification upward. As part of this notification a new graph is created and priv_glfs_subvol_done is called on the old, unused graph. priv_glfs_subvol_done calls glfs_lock, which waits for the "migration_in_progress" variable to be cleared.
The parent-up notification is sent via the client_notify_dispatch function, which sets the "ctx->notifying" variable. So all other client notifications are blocked in client_notify_dispatch.

Thread8 is the second epoll thread, which received another notification. It calls client_notify_dispatch to pass this notification up, but it is blocked behind Thread7 (see ctx->notifying).

Thread10 is also waiting on ctx->notifying in the client_notify_dispatch function. This thread is run by timer_proc.

So Thread7 and Thread8 occupy both epoll threads and are blocked. With no epoll thread left to receive socket notifications, Thread1 will never be notified of the reply it is waiting for.

Comment 6 rjoseph 2016-11-24 13:30:08 UTC
Patch posted upstream:  http://review.gluster.org/15913

Comment 9 rjoseph 2016-11-30 10:15:48 UTC
Upstream master      : http://review.gluster.org/15913
Upstream release 3.7 : http://review.gluster.org/15976
Upstream release 3.8 : http://review.gluster.org/15977
Upstream release 3.9 : http://review.gluster.org/15978

Downstream Patch     : https://code.engineering.redhat.com/gerrit/91699

Comment 11 Vivek Das 2016-12-20 06:36:13 UTC
Versions:
--------
glusterfs-3.8.4-9.el7rhgs.x86_64
samba-client-4.4.6-3.el7rhgs.x86_64

Followed the steps to reproduce:
1. Do a cifs mount
2. Run a dd command, e.g. dd if=/dev/zero of=file2 bs=1G count=1024
3. Run ll /mnt/cifs (in a loop)
4. Switch write-behind ON & OFF (in a loop)
5. Keep an eye on the file size

Also ran the Distaf test cases that do a graph switch ON & OFF while a dd command is running.

Did not find any kind of hang in the process.
Marking it as verified.

Comment 13 errata-xmlrpc 2017-03-23 06:20:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html