Bug 1396449 - [SAMBA-CIFS] : IO hangs in cifs mount while graph switch on & off
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: samba
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: rjoseph
QA Contact: Vivek Das
Docs Contact:
Depends On: 1397754
Blocks: 1351528 1399914 1399915 1399916
Reported: 2016-11-18 05:41 EST by Vivek Das
Modified: 2017-03-23 02:20 EDT (History)

See Also:
Fixed In Version: glusterfs-3.8.4-7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1397754
Environment:
Last Closed: 2017-03-23 02:20:09 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers:
Tracker: Red Hat Product Errata
ID: RHSA-2017:0486
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 05:18:45 EDT

Comment 4 Raghavendra G 2016-11-23 03:51:09 EST
Is it possible to attach a statedump of the smbd process taken while the I/O is hung?
Comment 5 rjoseph 2016-11-23 05:01:44 EST
From the initial analysis, it seems to be a hang caused by a deadlock between multiple threads.

By default we have 2 epoll threads, which receive socket notifications. The backtrace in Comment 3 shows that both epoll threads are blocked.

Thread1 is doing the graph migration, so the "migration_in_progress" variable is set to 1. It has issued the first lookup, and its syncop is waiting for a reply.

Thread7 is one of the epoll threads, and it is sending a parent-up notification upward. As part of this notification a new graph is created and priv_glfs_subvol_done is called on the old, unused graph. priv_glfs_subvol_done calls glfs_lock, which waits on the "migration_in_progress" variable.
The parent-up notification is also sent through the client_notify_dispatch function, which sets the "ctx->notifying" variable, so all other client notifications are blocked in client_notify_dispatch.

Thread8 is the second epoll thread, which received another notification. It calls client_notify_dispatch to pass this notification up, but it is blocked behind Thread7 (see ctx->notifying).

Thread10 is also waiting on ctx->notifying in client_notify_dispatch function. This thread is executed by timer_proc.

So Thread7 and Thread8 occupy both epoll threads and are both blocked. Therefore Thread1 will never receive its notification.
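
The starvation described above can be modelled with a small Python sketch. This is only an illustration of the pattern, not GlusterFS code and not the approach taken by the patch: the worker pool stands in for the epoll threads, the lock stands in for ctx->notifying, and the event stands in for migration_in_progress being cleared. All function and variable names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def reply_is_delivered(pool_size, wait=0.3):
    """Return True if the lookup reply is handled within `wait` seconds."""
    migration_done = threading.Event()     # models migration_in_progress being cleared
    notifying = threading.Lock()           # models ctx->notifying
    parent_up_started = threading.Event()  # only used to order the model's steps
    reply_handled = threading.Event()

    def parent_up():
        # Like Thread7: holds the notify slot, then waits for the migration to end.
        with notifying:
            parent_up_started.set()
            migration_done.wait()

    def other_notification():
        # Like Thread8: cannot proceed while another notification is in flight.
        with notifying:
            pass

    def lookup_reply():
        # The event Thread1 is waiting for; handling it lets the migration finish.
        reply_handled.set()
        migration_done.set()

    pool = ThreadPoolExecutor(max_workers=pool_size)
    try:
        pool.submit(parent_up)
        parent_up_started.wait()
        pool.submit(other_notification)
        pool.submit(lookup_reply)
        return reply_handled.wait(wait)
    finally:
        migration_done.set()  # unblock the model so the pool can shut down
        pool.shutdown(wait=True)
```

In this model, reply_is_delivered(2) returns False because both workers are stuck and the reply stays queued, while reply_is_delivered(3) returns True because a free worker is left to handle it.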
Comment 6 rjoseph 2016-11-24 08:30:08 EST
Patch posted upstream:  http://review.gluster.org/15913
Comment 9 rjoseph 2016-11-30 05:15:48 EST
Upstream master      : http://review.gluster.org/15913
Upstream release 3.7 : http://review.gluster.org/15976
Upstream release 3.8 : http://review.gluster.org/15977
Upstream release 3.9 : http://review.gluster.org/15978

Downstream Patch     : https://code.engineering.redhat.com/gerrit/91699
Comment 11 Vivek Das 2016-12-20 01:36:13 EST
Versions:
--------
glusterfs-3.8.4-9.el7rhgs.x86_64
samba-client-4.4.6-3.el7rhgs.x86_64

Followed the steps to reproduce:
1. Do a cifs mount
2. Run a dd command, e.g. dd if=/dev/zero of=file2 bs=1G count=1024
3. Run ll /mnt/cifs (in a loop)
4. Switch write-behind ON & OFF (in a loop)
5. Keep an eye on the file size
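
The ON & OFF loop in step 4 could be scripted roughly as follows. This is a sketch under assumptions: it assumes "write-behind" refers to the performance.write-behind volume option set through the gluster CLI, the volume name is a placeholder, and the helper names are hypothetical. It defaults to a dry run that only prints the commands.

```python
import itertools
import subprocess


def toggle_commands(volume, cycles):
    """Yield gluster CLI invocations that flip write-behind on, then off, `cycles` times."""
    states = itertools.cycle(["on", "off"])
    for _, state in zip(range(2 * cycles), states):
        yield ["gluster", "volume", "set", volume, "performance.write-behind", state]


def run_toggles(volume, cycles, dry_run=True):
    """Print the commands (dry run) or execute them on a node with the gluster CLI."""
    for cmd in toggle_commands(volume, cycles):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, run_toggles("testvol", 5) prints ten commands for a placeholder volume named testvol; passing dry_run=False would execute them instead.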

Also ran the Distaf test cases that do a graph switch ON & OFF while the dd command is running.

Did not find any kind of hang in the process.
Marking it as verified.
Comment 13 errata-xmlrpc 2017-03-23 02:20:09 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
