Bug 1396449 - [SAMBA-CIFS] : IO hangs in cifs mount while graph is switched on & off
Summary: [SAMBA-CIFS] : IO hangs in cifs mount while graph is switched on & off
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: samba
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: rjoseph
QA Contact: Vivek Das
URL:
Whiteboard:
Depends On: 1397754
Blocks: 1351528 1399914 1399915 1399916
 
Reported: 2016-11-18 10:41 UTC by Vivek Das
Modified: 2017-03-23 06:20 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.8.4-7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1397754 (view as bug list)
Environment:
Last Closed: 2017-03-23 06:20:09 UTC
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2017:0486
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 09:18:45 UTC

Comment 4 Raghavendra G 2016-11-23 08:51:09 UTC
Is it possible to attach a statedump of the smbd process taken while the I/O is hung?

Comment 5 rjoseph 2016-11-23 10:01:44 UTC
From the initial analysis, this appears to be a hang caused by a deadlock between multiple threads.

By default there are 2 epoll threads that receive socket notifications. The backtrace (bt) in Comment 3 shows that both epoll threads are blocked.

Thread1 is doing the graph migration, so the "migration_in_progress" variable is set to 1. It has issued the first lookup, and syncop is waiting for the reply.

Thread7 is one of the epoll threads; it is sending a parent-up notification upward. As part of this notification a new graph is created and priv_glfs_subvol_done is called on the old, unused graph. priv_glfs_subvol_done calls glfs_lock, which waits for the "migration_in_progress" variable to be cleared.
The parent-up notification is sent via the client_notify_dispatch function, which sets the "ctx->notifying" variable, so all other client notifications are blocked in client_notify_dispatch.

Thread8 is the second epoll thread, which received another notification. It calls client_notify_dispatch to send this notification upward, but it is blocked because of Thread7 (see ctx->notifying).

Thread10 is also waiting on ctx->notifying in client_notify_dispatch. This thread is run by timer_proc.

So Thread7 and Thread8 occupy both epoll threads and both are blocked. Thread1 will therefore never get the notification it is waiting for.
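
To make the wait-for chain above easier to follow, here is a small single-threaded model of it in C. It is only a sketch under stated assumptions: the struct, the helper names (epoll_thread_take, notify_dispatch_enter, glfs_lock_would_block, reply_can_be_delivered, replay_reported_sequence) and the "would block" return values are hypothetical and written for illustration. Only the two flag names and the thread roles come from the analysis; nothing here is copied from the glusterfs sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the state described in this comment. */
struct model {
    bool migration_in_progress; /* set by Thread1 while it migrates the graph */
    bool notifying;             /* set while one notification is being dispatched */
    int  idle_epoll_threads;    /* epoll threads still free to read socket events */
};

/* An epoll thread picks up a socket event and is busy from now on. */
static void epoll_thread_take(struct model *m)
{
    assert(m->idle_epoll_threads > 0);
    m->idle_epoll_threads--;
}

/* Models entry into client_notify_dispatch: returns true if the caller
 * proceeds, false if it would block because another dispatch is active. */
static bool notify_dispatch_enter(struct model *m)
{
    if (m->notifying)
        return false;
    m->notifying = true;
    return true;
}

/* Models glfs_lock as described: the caller blocks while migration runs. */
static bool glfs_lock_would_block(const struct model *m)
{
    return m->migration_in_progress;
}

/* Thread1's lookup reply can only be read by an epoll thread that is idle. */
static bool reply_can_be_delivered(const struct model *m)
{
    return m->idle_epoll_threads > 0;
}

/* Replays the reported sequence; returns true if it ends in a hang. */
static bool replay_reported_sequence(int epoll_threads)
{
    assert(epoll_threads >= 2); /* the sequence involves two epoll threads */
    struct model m = { false, false, epoll_threads };

    /* Thread1: starts graph migration, then waits for the lookup reply. */
    m.migration_in_progress = true;

    /* Thread7 (epoll): dispatches parent-up, then blocks in glfs_lock. */
    epoll_thread_take(&m);
    bool t7_dispatching = notify_dispatch_enter(&m);
    bool t7_blocked = t7_dispatching && glfs_lock_would_block(&m);

    /* Thread8 (epoll): blocks at dispatch entry because Thread7 is inside. */
    epoll_thread_take(&m);
    bool t8_blocked = !notify_dispatch_enter(&m);

    /* Hang: both are stuck and nobody is left to read Thread1's reply. */
    return t7_blocked && t8_blocked && !reply_can_be_delivered(&m);
}
```

With the default of 2 epoll threads, replay_reported_sequence(2) returns true: Thread7 waits on migration_in_progress, Thread8 waits on notifying, and no epoll thread is left to deliver the reply that would let Thread1 finish. With 3 threads the function returns false, but only because the model assumes the extra thread stays idle. That says nothing about whether raising the epoll thread count is a real workaround.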

Comment 6 rjoseph 2016-11-24 13:30:08 UTC
Patch posted upstream:  http://review.gluster.org/15913

Comment 9 rjoseph 2016-11-30 10:15:48 UTC
Upstream master      : http://review.gluster.org/15913
Upstream release 3.7 : http://review.gluster.org/15976
Upstream release 3.8 : http://review.gluster.org/15977
Upstream release 3.9 : http://review.gluster.org/15978

Downstream Patch     : https://code.engineering.redhat.com/gerrit/91699

Comment 11 Vivek Das 2016-12-20 06:36:13 UTC
Versions:
--------
glusterfs-3.8.4-9.el7rhgs.x86_64
samba-client-4.4.6-3.el7rhgs.x86_64

Followed the steps to reproduce:
1. Do a cifs mount.
2. Run a dd command, e.g. dd if=/dev/zero of=file2 bs=1G count=1024
3. Run ll /mnt/cifs (in a loop).
4. Switch write-behind ON & OFF (in a loop).
5. Keep an eye on the file size.

Also ran the Distaf test cases that switch the graph ON & OFF while a dd command is running.

Did not find any hang in the process.
Marking it as verified.

Comment 13 errata-xmlrpc 2017-03-23 06:20:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

