Bug 1399914 - [SAMBA-CIFS] : I/O hangs on CIFS mount while graph switch is toggled on & off
Summary: [SAMBA-CIFS] : I/O hangs on CIFS mount while graph switch is toggled on & off
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: libgfapi
Version: 3.7.17
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: rjoseph
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On: 1396449 1397754
Blocks:
 
Reported: 2016-11-30 05:09 UTC by rjoseph
Modified: 2017-01-18 13:39 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.7.19
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1397754
Environment:
Last Closed: 2017-01-18 13:39:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description rjoseph 2016-11-30 05:09:15 UTC
+++ This bug was initially created as a clone of Bug #1397754 +++

+++ This bug was initially created as a clone of Bug #1396449 +++

Description of problem:
Running a dd command on a CIFS mount leads to a hung state when graph switches are triggered by frequently toggling a volume option ON & OFF.

Version-Release number of selected component (if applicable):
samba-client-4.4.6-2.el7rhgs.x86_64
glusterfs-3.8.4-5.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Do a CIFS mount.
2. Run a dd command, e.g. dd if=/dev/zero of=file2 bs=1G count=1024
3. Run ll /mnt/cifs (in a loop).
4. Toggle write-behind ON & OFF (in a loop) to force repeated graph switches; see the sketch after this list.
5. Keep an eye on the file size.
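
A minimal sketch of the loops in steps 3 and 4 (the volume name VOLNAME is a placeholder, not taken from the report; write-behind is toggled with the standard gluster CLI):

# Lookup loop on the CIFS client (step 3; the report uses the 'll' alias for 'ls -l')
while true; do ls -l /mnt/cifs > /dev/null; done &

# Option-toggle loop on a server node (step 4); each 'volume set'
# regenerates the client volfile and triggers a graph switch
while true; do
    gluster volume set VOLNAME performance.write-behind off
    sleep 5
    gluster volume set VOLNAME performance.write-behind on
    sleep 5
done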

Actual results:
I/O hangs.

Expected results:
The mount should not hang.

Additional info:
Thread 11 (Thread 0x7f209bbdc700 (LWP 10946)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca83e88 in syncenv_task (proc=proc@entry=0x7f20b899c840) at syncop.c:603
#2  0x00007f209ca84cd0 in syncenv_processor (thdata=0x7f20b899c840) at syncop.c:695
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7f209b3db700 (LWP 10947)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca83e88 in syncenv_task (proc=proc@entry=0x7f20b899cc00) at syncop.c:603
#2  0x00007f209ca84cd0 in syncenv_processor (thdata=0x7f20b899cc00) at syncop.c:695
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f20993e2700 (LWP 10948)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca83e88 in syncenv_task (proc=proc@entry=0x7f20b89e2ad0) at syncop.c:603
#2  0x00007f209ca84cd0 in syncenv_processor (thdata=0x7f20b89e2ad0) at syncop.c:695
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f2098be1700 (LWP 10949)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca83e88 in syncenv_task (proc=proc@entry=0x7f20b89e2e90) at syncop.c:603
#2  0x00007f209ca84cd0 in syncenv_processor (thdata=0x7f20b89e2e90) at syncop.c:695
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f2097828700 (LWP 10950)):
#0  0x00007f20b8230bdd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f209ca58ca6 in gf_timer_proc (data=0x7f20b89f0720) at timer.c:176
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f2096e24700 (LWP 10951)):
#0  0x00007f20b822aef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f209caa5858 in event_dispatch_epoll (event_pool=0x7f20b89df8e0) at event-epoll.c:758
#2  0x00007f209d153c64 in glfs_poller (data=<optimized out>) at glfs.c:612
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f2096623700 (LWP 10952)):
#0  0x00007f20b424cd13 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f209caa52b0 in event_dispatch_epoll_worker (data=0x7f2090000920) at event-epoll.c:664
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f20876ff700 (LWP 10957)):
#0  0x00007f20b424cd13 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f209caa52b0 in event_dispatch_epoll_worker (data=0x7f2088031ee0) at event-epoll.c:664
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f207d9fe700 (LWP 10959)):
#0  0x00007f20b82309fd in sendmsg () from /lib64/libpthread.so.0
#1  0x00007f20b372af31 in unix_dgram_send_job (private_data=0x7f20b8a2b540) at ../source3/lib/unix_msg/unix_msg.c:512
#2  0x00007f20b372bbbd in pthreadpool_server (arg=0x7f20b8a2aa10) at ../source3/lib/pthreadpool/pthreadpool.c:547
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f202dc10700 (LWP 14207)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209466380c in iot_worker (data=0x7f2088b4bcc0) at io-threads.c:180
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f20b85e98c0 (LWP 10916)):
#0  0x00007f20b822d6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca85b1b in syncop_lookup (subvol=subvol@entry=0x7f2088b0fd30, loc=loc@entry=0x7fff8c075e80, iatt=iatt@entry=0x7fff8c075ec0, parent=parent@entry=0x0, xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1223
#2  0x00007f209d165b27 in glfs_resolve_component (fs=fs@entry=0x7f20b8961330, subvol=subvol@entry=0x7f2088b0fd30, parent=parent@entry=0x7f202b20e06c, component=component@entry=0x7f20b8abdf00 ".", iatt=iatt@entry=0x7fff8c076040, force_lookup=<optimized out>) at glfs-resolve.c:311
#3  0x00007f209d1662d3 in priv_glfs_resolve_at (fs=fs@entry=0x7f20b8961330, subvol=subvol@entry=0x7f2088b0fd30, at=at@entry=0x7f202b20e06c, origpath=origpath@entry=0x7f20b8950770 ".", loc=loc@entry=0x7fff8c0761a0, iatt=iatt@entry=0x7fff8c0761e0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:417
#4  0x00007f209d16773c in glfs_resolve_path (fs=fs@entry=0x7f20b8961330, subvol=subvol@entry=0x7f2088b0fd30, origpath=origpath@entry=0x7f20b8950770 ".", loc=loc@entry=0x7fff8c0761a0, iatt=iatt@entry=0x7fff8c0761e0, follow=follow@entry=0, reval=reval@entry=0) at glfs-resolve.c:541
#5  0x00007f209d167830 in glfs_lresolve (fs=fs@entry=0x7f20b8961330, subvol=subvol@entry=0x7f2088b0fd30, origpath=origpath@entry=0x7f20b8950770 ".", loc=loc@entry=0x7fff8c0761a0, iatt=iatt@entry=0x7fff8c0761e0, reval=reval@entry=0) at glfs-resolve.c:569
#6  0x00007f209d159a70 in pub_glfs_lstat (fs=0x7f20b8961330, path=0x7f20b8950770 ".", stat=stat@entry=0x7fff8c0762a0) at glfs-fops.c:309
#7  0x00007f209f61836d in vfs_gluster_lstat (handle=<optimized out>, smb_fname=0x7f20b8950660) at ../source3/modules/vfs_glusterfs.c:1002
#8  0x00007f20b7b0a9b9 in dptr_create (conn=conn@entry=0x7f20b894fb60, req=req@entry=0x7f20b8950070, fsp=fsp@entry=0x0, path=0x7f20b89505d0 ".", old_handle=old_handle@entry=false, expect_close=expect_close@entry=true, spid=20993, wcard=wcard@entry=0x7f20b89503a0 "*", wcard_has_wild=false, attr=attr@entry=23, dptr_ret=dptr_ret@entry=0x7fff8c0764e8) at ../source3/smbd/dir.c:489
#9  0x00007f20b7b48ee2 in call_trans2findfirst (conn=conn@entry=0x7f20b894fb60, req=req@entry=0x7f20b8950070, pparams=pparams@entry=0x7f20b8aac5f0, total_params=<optimized out>, ppdata=ppdata@entry=0x7f20b8aac600, total_data=0, max_data_bytes=16384) at ../source3/smbd/trans2.c:2738
#10 0x00007f20b7b55a1a in handle_trans2 (conn=conn@entry=0x7f20b894fb60, req=req@entry=0x7f20b8950070, state=state@entry=0x7f20b8aac590) at ../source3/smbd/trans2.c:9235
#11 0x00007f20b7b582ed in reply_trans2 (req=req@entry=0x7f20b8950070) at ../source3/smbd/trans2.c:9521
#12 0x00007f20b7b7ef50 in switch_message (type=<optimized out>, req=req@entry=0x7f20b8950070) at ../source3/smbd/process.c:1726
#13 0x00007f20b7b80cf0 in construct_reply (deferred_pcd=0x0, encrypted=false, seqnum=0, unread_bytes=0, size=88, inbuf=0x0, xconn=0x7f20b893e9b0) at ../source3/smbd/process.c:1762
#14 process_smb (xconn=xconn@entry=0x7f20b8948eb0, inbuf=<optimized out>, nread=88, unread_bytes=0, seqnum=0, encrypted=<optimized out>, deferred_pcd=deferred_pcd@entry=0x0) at ../source3/smbd/process.c:2008
#15 0x00007f20b7b823e1 in smbd_server_connection_read_handler (xconn=0x7f20b8948eb0, fd=9) at ../source3/smbd/process.c:2607
#16 0x00007f20b5af334c in run_events_poll (ev=0x7f20b891dd10, pollrtn=<optimized out>, pfds=0x7f20b8949c10, num_pfds=6) at ../source3/lib/events.c:257
#17 0x00007f20b5af35a0 in s3_event_loop_once (ev=0x7f20b891dd10, location=<optimized out>) at ../source3/lib/events.c:326
#18 0x00007f20b451a40d in _tevent_loop_once () from /lib64/libtevent.so.0
#19 0x00007f20b451a5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#20 0x00007f20b7b83731 in smbd_process (ev_ctx=ev_ctx@entry=0x7f20b891dd10, msg_ctx=msg_ctx@entry=0x7f20b891de00, sock_fd=sock_fd@entry=9, interactive=interactive@entry=false) at ../source3/smbd/process.c:4117
#21 0x00007f20b866c304 in smbd_accept_connection (ev=0x7f20b891dd10, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/server.c:762
#22 0x00007f20b5af334c in run_events_poll (ev=0x7f20b891dd10, pollrtn=<optimized out>, pfds=0x7f20b8949c10, num_pfds=7) at ../source3/lib/events.c:257
#23 0x00007f20b5af35a0 in s3_event_loop_once (ev=0x7f20b891dd10, location=<optimized out>) at ../source3/lib/events.c:326
#24 0x00007f20b451a40d in _tevent_loop_once () from /lib64/libtevent.so.0
#25 0x00007f20b451a5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#26 0x00007f20b8667ad4 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x7f20b891dd10) at ../source3/smbd/server.c:1127
#27 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1780

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-11-18 05:41:43 EST ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Vivek Das on 2016-11-18 06:25:12 EST ---

samba logs, brick logs : http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1396449

--- Additional comment from Vivek Das on 2016-11-23 02:28:09 EST ---

[root@dhcp47-12 ~]# pstack 26853
Thread 14 (Thread 0x7f209bbdc700 (LWP 26854)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca83e88 in syncenv_task (proc=proc@entry=0x7f20b899c6b0) at syncop.c:603
#2  0x00007f209ca84cd0 in syncenv_processor (thdata=0x7f20b899c6b0) at syncop.c:695
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f209b3db700 (LWP 26855)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca83e88 in syncenv_task (proc=proc@entry=0x7f20b899ca70) at syncop.c:603
#2  0x00007f209ca84cd0 in syncenv_processor (thdata=0x7f20b899ca70) at syncop.c:695
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f20993e2700 (LWP 26856)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca83e88 in syncenv_task (proc=proc@entry=0x7f20b89e29b0) at syncop.c:603
#2  0x00007f209ca84cd0 in syncenv_processor (thdata=0x7f20b89e29b0) at syncop.c:695
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f2098be1700 (LWP 26857)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca83e88 in syncenv_task (proc=proc@entry=0x7f20b89e2d70) at syncop.c:603
#2  0x00007f209ca84cd0 in syncenv_processor (thdata=0x7f20b89e2d70) at syncop.c:695
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f2097828700 (LWP 26858)):
#0  0x00007f20b822d6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f20959c296b in client_notify_dispatch (this=this@entry=0x7f2089a78930, event=event@entry=7, data=data@entry=0x0) at client.c:81
#2  0x00007f20959e6758 in client_setvolume_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f209830fce0) at client-handshake.c:1267
#3  0x00007f209cf3fb6c in call_bail () from /lib64/libgfrpc.so.0
#4  0x00007f209ca58c51 in gf_timer_proc (data=0x7f20b89f0600) at timer.c:167
#5  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f2096e24700 (LWP 26859)):
#0  0x00007f20b822aef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f209caa5858 in event_dispatch_epoll (event_pool=0x7f20b89df7c0) at event-epoll.c:758
#2  0x00007f209d153c64 in glfs_poller (data=<optimized out>) at glfs.c:612
#3  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f2096623700 (LWP 26860)):
#0  0x00007f20b822d6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f20959c296b in client_notify_dispatch (this=0x7f2089a77d90, event=event@entry=5, data=data@entry=0x0) at client.c:81
#2  0x00007f20959c29fa in client_notify_dispatch_uniq (this=<optimized out>, event=event@entry=5, data=data@entry=0x0) at client.c:68
#3  0x00007f20959e4236 in client_notify_parents_child_up (this=this@entry=0x7f2089a77d90) at client-handshake.c:137
#4  0x00007f20959e630e in client_post_handshake (frame=frame@entry=0x7f209830e408, this=0x7f2089a77d90) at client-handshake.c:1066
#5  0x00007f20959e6ad6 in client_setvolume_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f209830e408) at client-handshake.c:1235
#6  0x00007f209cf40680 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#7  0x00007f209cf4095f in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#8  0x00007f209cf3c883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#9  0x00007f2095c11eb4 in socket_event_poll_in (this=this@entry=0x7f2089b935b0) at socket.c:2267
#10 0x00007f2095c14365 in socket_event_handler (fd=<optimized out>, idx=26, data=0x7f2089b935b0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#11 0x00007f209caa53d0 in event_dispatch_epoll_handler (event=0x7f2096622d40, event_pool=0x7f20b89df7c0) at event-epoll.c:571
#12 event_dispatch_epoll_worker (data=0x7f2090000920) at event-epoll.c:674
#13 0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f20876ff700 (LWP 26869)):
#0  0x00007f20b822d6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209d1670eb in glfs_lock (fs=0x7f20b89611a0) at glfs-internal.h:299
#2  priv_glfs_subvol_done (fs=fs@entry=0x7f20b89611a0, subvol=subvol@entry=0x7f2081959a80) at glfs-resolve.c:900
#3  0x00007f2096e25c5a in graph_setup (fs=fs@entry=0x7f20b89611a0, graph=graph@entry=0x7f2081992600) at glfs-master.c:71
#4  0x00007f2096e25d9b in notify (this=<optimized out>, event=<optimized out>, data=0x7f2081992600) at glfs-master.c:99
#5  0x00007f209ca493b6 in xlator_notify (xl=0x7f20b89f1000, event=5, data=0x7f2081992600) at xlator.c:496
#6  0x00007f209cae2d5e in default_notify (this=0x7f2081a89140, event=5, data=<optimized out>) at defaults.c:3108
#7  0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a89140, event=event@entry=5, data=data@entry=0x7f2081a88180) at xlator.c:496
#8  0x00007f209cae2c53 in default_notify (this=this@entry=0x7f2081a88180, event=5, data=data@entry=0x7f2081a87220) at defaults.c:3114
#9  0x00007f209424125a in notify (this=0x7f2081a88180, event=<optimized out>, data=0x7f2081a87220) at io-stats.c:3966
#10 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a88180, event=event@entry=5, data=data@entry=0x7f2081a87220) at xlator.c:496
#11 0x00007f209cae2c53 in default_notify (this=0x7f2081a87220, event=5, data=<optimized out>) at defaults.c:3114
#12 0x00007f209445b7f8 in notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at snapview-client.c:2394
#13 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a87220, event=event@entry=5, data=data@entry=0x7f2081a85040) at xlator.c:496
#14 0x00007f209cae2c53 in default_notify (this=this@entry=0x7f2081a85040, event=event@entry=5, data=data@entry=0x7f2081a842c0) at defaults.c:3114
#15 0x00007f20946658dd in notify (this=0x7f2081a85040, event=5, data=0x7f2081a842c0) at io-threads.c:1045
#16 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a85040, event=event@entry=5, data=data@entry=0x7f2081a842c0) at xlator.c:496
#17 0x00007f209cae2c53 in default_notify (this=0x7f2081a842c0, event=5, data=<optimized out>) at defaults.c:3114
#18 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a842c0, event=event@entry=5, data=data@entry=0x7f2081a83480) at xlator.c:496
#19 0x00007f209cae2c53 in default_notify (this=0x7f2081a83480, event=5, data=<optimized out>) at defaults.c:3114
#20 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a83480, event=event@entry=5, data=data@entry=0x7f2081a82770) at xlator.c:496
#21 0x00007f209cae2c53 in default_notify (this=0x7f2081a82770, event=5, data=<optimized out>) at defaults.c:3114
#22 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a82770, event=event@entry=5, data=data@entry=0x7f2081a818c0) at xlator.c:496
#23 0x00007f209cae2c53 in default_notify (this=0x7f2081a818c0, event=5, data=<optimized out>) at defaults.c:3114
#24 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a818c0, event=event@entry=5, data=data@entry=0x7f2081a80b40) at xlator.c:496
#25 0x00007f209cae2c53 in default_notify (this=0x7f2081a80b40, event=5, data=<optimized out>) at defaults.c:3114
#26 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a80b40, event=event@entry=5, data=data@entry=0x7f2081a7fd00) at xlator.c:496
#27 0x00007f209cae2c53 in default_notify (this=0x7f2081a7fd00, event=5, data=<optimized out>) at defaults.c:3114
#28 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a7fd00, event=event@entry=5, data=data@entry=0x7f2081a7ee00) at xlator.c:496
#29 0x00007f209cae2c53 in default_notify (this=this@entry=0x7f2081a7ee00, event=event@entry=5, data=<optimized out>) at defaults.c:3114
#30 0x00007f209550508f in dht_notify (this=0x7f2081a7ee00, event=5, data=<optimized out>) at dht-common.c:8921
#31 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a7ee00, event=event@entry=5, data=data@entry=0x7f2081a7de90) at xlator.c:496
#32 0x00007f209cae2c53 in default_notify (this=this@entry=0x7f2081a7de90, event=event@entry=5, data=data@entry=0x7f2081a7baf0) at defaults.c:3114
#33 0x00007f2095788b8a in afr_notify (this=<optimized out>, event=5, data=0x7f2081a7baf0, data2=<optimized out>) at afr-common.c:4560
#34 0x00007f2095797cd9 in notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at afr.c:43
#35 0x00007f209ca493b6 in xlator_notify (xl=0x7f2081a7de90, event=event@entry=5, data=data@entry=0x7f2081a7baf0) at xlator.c:496
#36 0x00007f209cae2c53 in default_notify (this=this@entry=0x7f2081a7baf0, event=event@entry=5, data=data@entry=0x0) at defaults.c:3114
#37 0x00007f20959c2999 in client_notify_dispatch (this=0x7f2081a7baf0, event=event@entry=5, data=data@entry=0x0) at client.c:90
#38 0x00007f20959c29fa in client_notify_dispatch_uniq (this=<optimized out>, event=event@entry=5, data=data@entry=0x0) at client.c:68
#39 0x00007f20959e4236 in client_notify_parents_child_up (this=this@entry=0x7f2081a7baf0) at client-handshake.c:137
#40 0x00007f20959e630e in client_post_handshake (frame=frame@entry=0x7f209830f640, this=0x7f2081a7baf0) at client-handshake.c:1066
#41 0x00007f20959e6ad6 in client_setvolume_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f209830f640) at client-handshake.c:1235
#42 0x00007f209cf40680 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#43 0x00007f209cf4095f in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#44 0x00007f209cf3c883 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#45 0x00007f2095c11eb4 in socket_event_poll_in (this=this@entry=0x7f2081b48040) at socket.c:2267
#46 0x00007f2095c14365 in socket_event_handler (fd=<optimized out>, idx=17, data=0x7f2081b48040, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#47 0x00007f209caa53d0 in event_dispatch_epoll_handler (event=0x7f20876fed40, event_pool=0x7f20b89df7c0) at event-epoll.c:571
#48 event_dispatch_epoll_worker (data=0x7f2088031ee0) at event-epoll.c:674
#49 0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#50 0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f1fcf58d700 (LWP 32304)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209466380c in iot_worker (data=0x7f208998be70) at io-threads.c:180
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f1fcb663700 (LWP 897)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209466380c in iot_worker (data=0x7f2081863b50) at io-threads.c:180
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f1fc7739700 (LWP 946)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209466380c in iot_worker (data=0x7f20819957e0) at io-threads.c:180
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f1fc380f700 (LWP 986)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209466380c in iot_worker (data=0x7f2081ac4e30) at io-threads.c:180
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f1fbf8e5700 (LWP 1157)):
#0  0x00007f20b822da82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209466380c in iot_worker (data=0x7f2089ac1d20) at io-threads.c:180
#2  0x00007f20b8229dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f20b424c73d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f20b85e98c0 (LWP 26853)):
#0  0x00007f20b822d6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f209ca85b1b in syncop_lookup (subvol=subvol@entry=0x7f2081827bc0, loc=loc@entry=0x7fff8c076040, iatt=iatt@entry=0x7fff8c076080, parent=parent@entry=0x0, xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1223
#2  0x00007f209d1655a3 in glfs_refresh_inode_safe (subvol=subvol@entry=0x7f2081827bc0, oldinode=oldinode@entry=0x7f1fccb8b128, need_lookup=need_lookup@entry=_gf_false) at glfs-resolve.c:115
#3  0x00007f209d16685d in glfs_migrate_fd_safe (fs=fs@entry=0x7f20b89611a0, newsubvol=newsubvol@entry=0x7f2081827bc0, oldfd=oldfd@entry=0x7f1fcbfae06c) at glfs-resolve.c:658
#4  0x00007f209d166e18 in __glfs_migrate_fd (fs=fs@entry=0x7f20b89611a0, newsubvol=newsubvol@entry=0x7f2081827bc0, glfd=glfd@entry=0x7f20b8a2c5d0) at glfs-resolve.c:751
#5  0x00007f209d167051 in __glfs_migrate_openfds (fs=fs@entry=0x7f20b89611a0, subvol=subvol@entry=0x7f2081827bc0) at glfs-resolve.c:814
#6  0x00007f209d16726b in __glfs_active_subvol (fs=fs@entry=0x7f20b89611a0) at glfs-resolve.c:869
#7  0x00007f209d167525 in priv_glfs_active_subvol (fs=0x7f20b89611a0) at glfs-resolve.c:924
#8  0x00007f209d15bae0 in pub_glfs_pwritev (glfd=0x7f20b8a2c5d0, iovec=iovec@entry=0x7fff8c0764c0, iovcnt=iovcnt@entry=1, offset=4697620480, flags=0) at glfs-fops.c:1073
#9  0x00007f209d15bcda in pub_glfs_pwrite (glfd=<optimized out>, buf=<optimized out>, count=<optimized out>, offset=<optimized out>, flags=<optimized out>) at glfs-fops.c:1162
#10 0x00007f20b7b6e65c in vfs_pwrite_data (req=req@entry=0x7f20b8a2d7c0, fsp=fsp@entry=0x7f20b8a2b8a0, buffer=buffer@entry=0x7f20b8a2f814 "", N=N@entry=65536, offset=offset@entry=4697620480) at ../source3/smbd/vfs.c:502
#11 0x00007f20b7b0d6fe in real_write_file (req=req@entry=0x7f20b8a2d7c0, fsp=fsp@entry=0x7f20b8a2b8a0, data=data@entry=0x7f20b8a2f814 "", pos=pos@entry=4697620480, n=n@entry=65536) at ../source3/smbd/fileio.c:128
#12 0x00007f20b7b0e35c in write_file (req=req@entry=0x7f20b8a2d7c0, fsp=fsp@entry=0x7f20b8a2b8a0, data=data@entry=0x7f20b8a2f814 "", pos=4697620480, n=n@entry=65536) at ../source3/smbd/fileio.c:381
#13 0x00007f20b7b3694b in reply_write_and_X (req=req@entry=0x7f20b8a2d7c0) at ../source3/smbd/reply.c:5101
#14 0x00007f20b7b7ef50 in switch_message (type=<optimized out>, req=req@entry=0x7f20b8a2d7c0) at ../source3/smbd/process.c:1726
#15 0x00007f20b7b80cf0 in construct_reply (deferred_pcd=0x0, encrypted=false, seqnum=0, unread_bytes=0, size=65604, inbuf=0x0, xconn=0x7f20b893e9b0) at ../source3/smbd/process.c:1762
#16 process_smb (xconn=xconn@entry=0x7f20b8948eb0, inbuf=<optimized out>, nread=65604, unread_bytes=0, seqnum=0, encrypted=<optimized out>, deferred_pcd=deferred_pcd@entry=0x0) at ../source3/smbd/process.c:2008
#17 0x00007f20b7b823e1 in smbd_server_connection_read_handler (xconn=0x7f20b8948eb0, fd=9) at ../source3/smbd/process.c:2607
#18 0x00007f20b5af334c in run_events_poll (ev=0x7f20b891dd10, pollrtn=<optimized out>, pfds=0x7f20b8949c10, num_pfds=5) at ../source3/lib/events.c:257
#19 0x00007f20b5af35a0 in s3_event_loop_once (ev=0x7f20b891dd10, location=<optimized out>) at ../source3/lib/events.c:326
#20 0x00007f20b451a40d in _tevent_loop_once () from /lib64/libtevent.so.0
#21 0x00007f20b451a5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#22 0x00007f20b7b83731 in smbd_process (ev_ctx=ev_ctx@entry=0x7f20b891dd10, msg_ctx=msg_ctx@entry=0x7f20b891de00, sock_fd=sock_fd@entry=9, interactive=interactive@entry=false) at ../source3/smbd/process.c:4117
#23 0x00007f20b866c304 in smbd_accept_connection (ev=0x7f20b891dd10, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/server.c:762
#24 0x00007f20b5af334c in run_events_poll (ev=0x7f20b891dd10, pollrtn=<optimized out>, pfds=0x7f20b8949c10, num_pfds=7) at ../source3/lib/events.c:257
#25 0x00007f20b5af35a0 in s3_event_loop_once (ev=0x7f20b891dd10, location=<optimized out>) at ../source3/lib/events.c:326
#26 0x00007f20b451a40d in _tevent_loop_once () from /lib64/libtevent.so.0
#27 0x00007f20b451a5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#28 0x00007f20b8667ad4 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x7f20b891dd10) at ../source3/smbd/server.c:1127

--- Additional comment from Raghavendra G on 2016-11-23 03:51:09 EST ---

Is it possible to attach a statedump of the smbd process while the I/O is hung?

--- Additional comment from  on 2016-11-23 05:01:44 EST ---

From the initial analysis it appears to be a hang caused by a deadlock between multiple threads.

By default we have 2 epoll threads which receive socket notifications. The backtrace in Comment 3 shows that both epoll threads are blocked.

Thread 1 is performing the graph migration, therefore the "migration_in_progress" variable is set to 1. It issues the first lookup, and syncop is waiting for a reply.

Thread 7 is one of the epoll threads; it is sending a parent-up notification upward. As part of this notification a new graph is created and priv_glfs_subvol_done is called on the old, unused graph. priv_glfs_subvol_done calls glfs_lock, which waits on the "migration_in_progress" variable.
Also, the parent-up notification is sent via the client_notify_dispatch function, which sets the "ctx->notifying" variable. So all other client notifications will be blocked in client_notify_dispatch.

Thread 8 is the second epoll thread, which received another notification. client_notify_dispatch is called to send this notification upward, but due to Thread 7 it is blocked (see ctx->notifying).

Thread 10 is also waiting on ctx->notifying in client_notify_dispatch. This thread is run from gf_timer_proc.

So Thread 7 and Thread 8 have consumed both epoll threads and are blocked. Therefore Thread 1 will never receive the reply it is waiting for; the blocking pattern is sketched below.
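
A minimal C sketch of the glfs_lock leg of this deadlock (struct and field names mirror the backtrace; this is an illustration of the pattern described above, not the verbatim gfapi source, and the serialization via ctx->notifying is analogous):

#include <pthread.h>

/* Minimal stand-in for the 'struct glfs' fields involved in the hang. */
struct glfs_sketch {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        int             migration_in_progress;
};

/* Effect of glfs_lock on this code path: block until no graph migration
 * is running.  When this runs in an epoll thread (Thread 7) while the
 * migrating thread (Thread 1) is itself waiting for an RPC reply that
 * only an epoll thread can deliver, neither side can make progress. */
static int
glfs_lock_sketch (struct glfs_sketch *fs)
{
        pthread_mutex_lock (&fs->mutex);
        while (fs->migration_in_progress)
                pthread_cond_wait (&fs->cond, &fs->mutex);
        return 0;
}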

--- Additional comment from Worker Ant on 2016-11-23 05:31:57 EST ---

REVIEW: http://review.gluster.org/15913 (gfapi: glfs_subvol_done should not call glfs_lock) posted (#1) for review on master by Rajesh Joseph (rjoseph)

--- Additional comment from Worker Ant on 2016-11-25 08:41:33 EST ---

REVIEW: http://review.gluster.org/15913 (gfapi: glfs_subvol_done should not call glfs_lock) posted (#2) for review on master by Rajesh Joseph (rjoseph)

--- Additional comment from Worker Ant on 2016-11-25 08:43:55 EST ---

REVIEW: http://review.gluster.org/15913 (gfapi: glfs_subvol_done should not call glfs_lock) posted (#3) for review on master by Rajesh Joseph (rjoseph)

--- Additional comment from Worker Ant on 2016-11-25 08:56:35 EST ---

REVIEW: http://review.gluster.org/15913 (gfapi: glfs_subvol_done should not call glfs_lock) posted (#4) for review on master by Rajesh Joseph (rjoseph)

--- Additional comment from Worker Ant on 2016-11-29 11:40:19 EST ---

REVIEW: http://review.gluster.org/15913 (gfapi: glfs_subvol_done should NOT wait for graph migration.) posted (#5) for review on master by Rajesh Joseph (rjoseph)

Comment 1 Worker Ant 2016-11-30 09:19:24 UTC
REVIEW: http://review.gluster.org/15976 (gfapi: glfs_subvol_done should NOT wait for graph migration.) posted (#1) for review on release-3.7 by Rajesh Joseph (rjoseph)

Comment 2 Worker Ant 2016-12-20 07:30:02 UTC
COMMIT: http://review.gluster.org/15976 committed in release-3.7 by Kaushal M (kaushal) 
------
commit b2c5d8ee2fe0f8baa7641d564ad86bb815f9509c
Author: Rajesh Joseph <rjoseph>
Date:   Tue Nov 22 22:25:42 2016 +0530

    gfapi: glfs_subvol_done should NOT wait for graph migration.
    
    In graph_setup function glfs_subvol_done is called which
    is executed in an epoll thread. glfs_lock waits on other
    thread to finish graph migration. This can lead to dead lock
    if we consume all the epoll threads.
    
    In general any call-back function executed in epoll thread
    should not call any blocking call which waits on a network
    reply either directly or indirectly, e.g. syncop functions
    should not be called in these threads.
    
    As a fix we should not wait for migration in the call-back path.
    
    > Reviewed-on: http://review.gluster.org/15913
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Niels de Vos <ndevos>
    > CentOS-regression: Gluster Build System <jenkins.org>
    (cherry picked from commit 17d10b42fc4041442e6cd0bfda45944edea498c6)
    
    Change-Id: If96d0689fe1b4d74631e383048cdc30b01690dc2
    BUG: 1399914
    Signed-off-by: Rajesh Joseph <rjoseph>
    Reviewed-on: http://review.gluster.org/15976
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kaushal M <kaushal>

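To illustrate the fix direction described in the commit message ("we should not wait for migration in the call-back path"), the earlier sketch can be extended; the wait_for_migration parameter is illustrative of that idea, not the exact patch:

/* Callers on the call-back (epoll) path pass wait_for_migration = 0,
 * so they no longer block on a graph migration that they themselves
 * may be preventing from completing. */
static int
glfs_lock_sketch2 (struct glfs_sketch *fs, int wait_for_migration)
{
        pthread_mutex_lock (&fs->mutex);
        while (wait_for_migration && fs->migration_in_progress)
                pthread_cond_wait (&fs->cond, &fs->mutex);
        return 0;
}
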
Comment 3 Kaushal 2017-01-18 13:39:24 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.19, please open a new bug report.

glusterfs-3.7.19 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/gluster-users/2017-January/029623.html
[2] https://www.gluster.org/pipermail/gluster-users/

