Is it possible to attach a statedump of the smbd process taken while the I/O is hung?
From the initial analysis it seems to be a hang caused by a deadlock between multiple threads.
By default we have 2 epoll threads which receive socket notifications. The backtrace in Comment 3 shows that both epoll threads are blocked.
Thread1 is doing the graph migration, so the "migration_in_progress" variable is set to 1. It has issued the first lookup, and syncop is waiting for the reply.
Thread7 is one of the epoll threads, and it is sending a parent-up notification upward. As part of this notification a new graph is created and priv_glfs_subvol_done is called on the old, unused graph. priv_glfs_subvol_done calls glfs_lock, which waits for the "migration_in_progress" variable to be cleared.
Also, the parent-up notification is sent via the client_notify_dispatch function, which sets the "ctx->notifying" variable. So all other client notifications will be blocked in client_notify_dispatch.
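To illustrate how a flag like "ctx->notifying" serializes notifiers, here is a minimal Python sketch. It is a toy model, not the GlusterFS code: the Gate class, its method names and the timeout are assumptions made only for this example.

```python
import threading


class Gate:
    """A busy flag guarded by a condition variable (toy model, not GlusterFS code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._busy = False

    def enter(self, timeout=None):
        """Wait until the flag is clear, then set it. Returns False on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: not self._busy, timeout):
                return False
            self._busy = True
            return True

    def leave(self):
        """Clear the flag and wake up any waiters."""
        with self._cond:
            self._busy = False
            self._cond.notify_all()


def demo():
    notifying = Gate()                       # stands in for ctx->notifying
    notifying.enter()                        # first notifier gets in and stays in
    second = notifying.enter(timeout=0.1)    # a second notifier cannot get in
    notifying.leave()                        # first notifier finishes
    third = notifying.enter(timeout=0.1)     # now the next notifier gets in
    return second, third
```

In the hang described here the first notifier never reaches the equivalent of leave(), so every later notifier waits forever instead of timing out.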
Thread8 is the second epoll thread, which received another notification. We call client_notify_dispatch to send this notification upward, but it is blocked because of Thread7 (see ctx->notifying).
Thread10 is also waiting on ctx->notifying in client_notify_dispatch function. This thread is executed by timer_proc.
So Thread7 and Thread8 occupy both epoll threads and are blocked. Therefore Thread1 will never get the reply it is waiting for.
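The wait-for chain above can be checked mechanically. The following Python sketch encodes who waits for what, as read from the analysis, and computes which threads could ever make progress. The table layout and the function name are assumptions for illustration. "Thread9" is a hypothetical extra epoll thread used only as a what-if; it does not describe what the posted patch does.

```python
THREADS = ["Thread1", "Thread7", "Thread8", "Thread10"]

# What each blocked thread is waiting for (from the analysis above).
WAITS_FOR = {
    "Thread1": "reply",                  # syncop lookup reply
    "Thread7": "migration_in_progress",  # inside glfs_lock
    "Thread8": "ctx->notifying",         # inside client_notify_dispatch
    "Thread10": "ctx->notifying",        # inside client_notify_dispatch
}

# Which threads must be able to run for each condition to clear.
# A reply can only be delivered by an epoll thread that is free.
CLEARED_BY = {
    "reply": {"Thread7", "Thread8"},
    "migration_in_progress": {"Thread1"},
    "ctx->notifying": {"Thread7"},
}


def runnable_threads(threads, waits_for, cleared_by):
    """Return the set of threads that can eventually make progress."""
    # Threads that wait for nothing are runnable from the start.
    runnable = {t for t in threads if t not in waits_for}
    changed = True
    while changed:
        changed = False
        for thread, condition in waits_for.items():
            if thread in runnable:
                continue
            # A waiter unblocks once any thread able to clear its condition runs.
            if cleared_by[condition] & runnable:
                runnable.add(thread)
                changed = True
    return runnable


# With only the two epoll threads, nobody can make progress.
stuck_case = runnable_threads(THREADS, WAITS_FOR, CLEARED_BY)

# What-if: a hypothetical free epoll thread could deliver the reply to Thread1.
what_if = runnable_threads(
    THREADS + ["Thread9"],
    WAITS_FOR,
    dict(CLEARED_BY, reply={"Thread7", "Thread8", "Thread9"}),
)
```

Here stuck_case comes out empty, which matches the cycle: Thread1 waits for an epoll thread, Thread7 waits for Thread1, and Thread8 and Thread10 wait for Thread7.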
Patch posted upstream: http://review.gluster.org/15913
Upstream master : http://review.gluster.org/15913
Upstream release 3.7 : http://review.gluster.org/15976
Upstream release 3.8 : http://review.gluster.org/15977
Upstream release 3.9 : http://review.gluster.org/15978
Downstream Patch : https://code.engineering.redhat.com/gerrit/91699
Steps to Reproduce:
1. Do a CIFS mount
2. Run a dd command, e.g. dd if=/dev/zero of=file2 bs=1G count=1024
3. Run ll /mnt/cifs (in a loop)
4. Switch write-behind ON & OFF (in a loop)
5. Keep an eye on the file size
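Steps 3 to 5 could be driven by a small script along these lines. This is only a sketch under assumptions: the volume name is a placeholder, "gluster volume set <volume> performance.write-behind on|off" is assumed to be how write-behind was toggled, and the listing and the toggle are interleaved in one loop on one host for simplicity, whereas in the actual test they may have run separately on the client and the server.

```python
import subprocess
import time


def write_behind_cmd(volume, state):
    """Build the CLI call that turns write-behind on or off for a volume."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    return ["gluster", "volume", "set", volume, "performance.write-behind", state]


def toggle_loop(volume, mount="/mnt/cifs", rounds=10, pause=1.0, run=subprocess.run):
    """Toggle write-behind and list the mount after each toggle."""
    for i in range(rounds):
        state = "off" if i % 2 == 0 else "on"
        run(write_behind_cmd(volume, state), check=True)  # triggers a graph switch
        run(["ls", "-l", mount], check=True)              # watch the file size
        time.sleep(pause)
```

Start the dd from step 2 in another shell first, then call for example toggle_loop("myvol"), where "myvol" stands for the real volume name. If the listing stops returning or the file size stops growing, the hang has been reproduced.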
Also ran the Distaf test cases that do a graph switch (ON & OFF) while the dd command is running.
Did not find any kind of hang in the process.
Marking it as verified.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.