Bug 1386177 - SMB[md-cache]:While multiple connect and disconnect of samba share hang is seen and other share becomes inaccessible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: md-cache
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Poornima G
QA Contact: Vivek Das
URL:
Whiteboard:
Depends On:
Blocks: 1351528
 
Reported: 2016-10-18 10:54 UTC by surabhi
Modified: 2017-03-23 06:11 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 06:11:58 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description surabhi 2016-10-18 10:54:52 UTC
Description of problem:
**********************************
While running a script that connects and disconnects one Samba share multiple times, with other shares also mounted on the same client, we see a hang on the server and all other shares become inaccessible. On the Samba server we also see two smbd processes from the same client, one of which keeps changing.


Samba version 4.4.5
PID     Username     Group        Machine                                   Protocol Version  Encryption           Signing              
----------------------------------------------------------------------------------------------------------------------------------------
0:26481 -1           -1           10.70.47.63 (ipv4:10.70.47.63:56445)      SMB3_00           -                    -                    
0:25279 root         root         10.70.47.63 (ipv4:10.70.47.63:56444)      SMB3_00           -                    partial(HMAC-SHA256) 


Thread 1 (Thread 0x7f31e672f880 (LWP 25279)):
#0  0x00007f31e63716d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f31ca9ecc3b in syncenv_destroy (env=0x7f31e803fff0) at syncop.c:779
#2  0x00007f31cb0bd605 in pub_glfs_fini (fs=fs@entry=0x7f31e8048ba0) at glfs.c:1215
#3  0x00007f31cb2df076 in glfs_clear_preopened (fs=0x7f31e8048ba0) at ../source3/modules/vfs_glusterfs.c:153
#4  vfs_gluster_disconnect (handle=<optimized out>) at ../source3/modules/vfs_glusterfs.c:270
#5  0x00007f31e5cca111 in close_cnum (conn=0x7f31e792a090, vuid=3997860088) at ../source3/smbd/service.c:1154
#6  0x00007f31e5cf8714 in smbXsrv_tcon_disconnect (tcon=0x7f31e8049780, vuid=3997860088) at ../source3/smbd/smbXsrv_tcon.c:983
#7  0x00007f31e5cdfb1f in smbd_smb2_tdis_wait_done (subreq=0x7f31e792e430) at ../source3/smbd/smb2_tcon.c:631
#8  0x00007f31e265ec34 in tevent_common_loop_immediate () from /lib64/libtevent.so.0
#9  0x00007f31e3c3726c in run_events_poll (ev=0x7f31e7902030, pollrtn=0, pfds=0x0, num_pfds=0) at ../source3/lib/events.c:192
#10 0x00007f31e3c37554 in s3_event_loop_once (ev=0x7f31e7902030, location=<optimized out>) at ../source3/lib/events.c:303
#11 0x00007f31e265e40d in _tevent_loop_once () from /lib64/libtevent.so.0
#12 0x00007f31e265e5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#13 0x00007f31e5cc7651 in smbd_process (ev_ctx=ev_ctx@entry=0x7f31e7902030, msg_ctx=msg_ctx@entry=0x7f31e7902120, sock_fd=sock_fd@entry=39, interactive=interactive@entry=false) at ../source3/smbd/process.c:4117
#14 0x00007f31e67b2304 in smbd_accept_connection (ev=0x7f31e7902030, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/server.c:762
#15 0x00007f31e3c3739c in run_events_poll (ev=0x7f31e7902030, pollrtn=<optimized out>, pfds=0x7f31e7910c10, num_pfds=7) at ../source3/lib/events.c:257
#16 0x00007f31e3c375f0 in s3_event_loop_once (ev=0x7f31e7902030, location=<optimized out>) at ../source3/lib/events.c:326
#17 0x00007f31e265e40d in _tevent_loop_once () from /lib64/libtevent.so.0
#18 0x00007f31e265e5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#19 0x00007f31e67adad4 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x7f31e7902030) at ../source3/smbd/server.c:1127
#20 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1780


This happens only with md-cache enabled; with md-cache disabled, the hang is not seen.


Version-Release number of selected component (if applicable):
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
private md-cache build

How reproducible:
Always


Steps to Reproduce:
1. Create two Samba shares and map them on a Windows client.
2. Run a script that connects and disconnects share 1 around 30 times.
3. After 1 or 2 runs, observe the script output.
4. Try to access the second share.
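The connect/disconnect script in step 2 could be sketched roughly as below. This is a hypothetical reconstruction, not the original test script: SERVER and SHARE are placeholders, and when SERVER is unset the loop runs dry (via `true`) so the control flow can be sanity-checked without a live Samba server.

```shell
#!/bin/sh
# Hypothetical reproduction loop for step 2: open and drop a tree
# connection to share 1 repeatedly while a second share stays mounted.
SHARE=${SHARE:-share1}
if [ -n "${SERVER:-}" ]; then
    # Each smbclient invocation opens a tree connect, lists the share,
    # and disconnects on exit -- one connect/disconnect cycle per run.
    connect() { smbclient "//$SERVER/$SHARE" -N -c 'ls' >/dev/null 2>&1; }
else
    connect() { true; }   # dry run: no server configured
fi
i=1
while [ "$i" -le 30 ]; do
    if ! connect; then
        echo "iteration $i: connect/disconnect failed" >&2
        exit 1
    fi
    i=$((i + 1))
done
echo "completed $((i - 1)) connect/disconnect cycles"
```

On a real setup, a hung iteration here (or an inaccessible second share afterwards) is the failure mode this bug describes.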

Actual results:
The second share becomes inaccessible and the script fails.
On the backend, a hang is seen.


Expected results:
There should not be a hang and other shares should be accessible.


Additional info:
Sosreports will be updated shortly.

Comment 2 Poornima G 2016-10-19 05:22:08 UTC
I see that the main thread is hung on the synctask thread. Could you please provide the bt of all the threads in the smbd process that is hung.

Comment 3 surabhi 2016-10-19 07:02:34 UTC
# pstack 19338
Thread 14 (Thread 0x7f74811e2700 (LWP 19341)):
#0  0x00007f749d5d5a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7482049d98 in syncenv_task (proc=proc@entry=0x7f749e1b43d0) at syncop.c:603
#2  0x00007f748204abe0 in syncenv_processor (thdata=0x7f749e1b43d0) at syncop.c:695
#3  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x7f74809e1700 (LWP 19342)):
#0  0x00007f749d5d5a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7482049d98 in syncenv_task (proc=proc@entry=0x7f749e1b4790) at syncop.c:603
#2  0x00007f748204abe0 in syncenv_processor (thdata=0x7f749e1b4790) at syncop.c:695
#3  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x7f747ea28700 (LWP 19343)):
#0  0x00007f749d5d5a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7482049d98 in syncenv_task (proc=proc@entry=0x7f749e1fa0a0) at syncop.c:603
#2  0x00007f748204abe0 in syncenv_processor (thdata=0x7f749e1fa0a0) at syncop.c:695
#3  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7f747e227700 (LWP 19344)):
#0  0x00007f749d5d5a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7482049d98 in syncenv_task (proc=proc@entry=0x7f749e1fa460) at syncop.c:603
#2  0x00007f748204abe0 in syncenv_processor (thdata=0x7f749e1fa460) at syncop.c:695
#3  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7f747ce6e700 (LWP 19345)):
#0  0x00007f749d5d896d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f748201ebb6 in gf_timer_proc (data=0x7f749e207cd0) at timer.c:176
#2  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f747c46a700 (LWP 19346)):
#0  0x00007f749d5d2ef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f748206b768 in event_dispatch_epoll (event_pool=0x7f749e1f6eb0) at event-epoll.c:758
#2  0x00007f7482719c64 in glfs_poller (data=<optimized out>) at glfs.c:612
#3  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f747bc69700 (LWP 19347)):
#0  0x00007f74995f32c3 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f748206b1c0 in event_dispatch_epoll_worker (data=0x7f7474000920) at event-epoll.c:664
#2  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f746bad4700 (LWP 19356)):
#0  0x00007f74995f32c3 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f748206b1c0 in event_dispatch_epoll_worker (data=0x7f746c05e240) at event-epoll.c:664
#2  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f7459a17700 (LWP 19815)):
#0  0x00007f749d5d5a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7482049d98 in syncenv_task (proc=proc@entry=0x7f749e878e00) at syncop.c:603
#2  0x00007f748204abe0 in syncenv_processor (thdata=0x7f749e878e00) at syncop.c:695
#3  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f744bad4700 (LWP 19816)):
#0  0x00007f749d5d896d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f748201ebb6 in gf_timer_proc (data=0x7f749e860150) at timer.c:176
#2  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f7458916700 (LWP 19817)):
#0  0x00007f749d5d2ef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f748206b768 in event_dispatch_epoll (event_pool=0x7f749e875850) at event-epoll.c:758
#2  0x00007f7482719c64 in glfs_poller (data=<optimized out>) at glfs.c:612
#3  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f744b2d3700 (LWP 19818)):
#0  0x00007f74995f32c3 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f748206b1c0 in event_dispatch_epoll_worker (data=0x7f7450000920) at event-epoll.c:664
#2  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f744aad2700 (LWP 19837)):
#0  0x00007f74995f32c3 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f748206b1c0 in event_dispatch_epoll_worker (data=0x7f74559a31d0) at event-epoll.c:664
#2  0x00007f749d5d1dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f74995f2ced in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f749d993880 (LWP 19338)):
#0  0x00007f749d5d56d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f748204ac3b in syncenv_destroy (env=0x7f749e878a40) at syncop.c:779
#2  0x00007f748271b605 in pub_glfs_fini (fs=fs@entry=0x7f749e248b10) at glfs.c:1215
#3  0x00007f748293d076 in glfs_clear_preopened (fs=0x7f749e248b10) at ../source3/modules/vfs_glusterfs.c:153
#4  vfs_gluster_disconnect (handle=<optimized out>) at ../source3/modules/vfs_glusterfs.c:270
#5  0x00007f749cf2e111 in close_cnum (conn=0x7f749e16d050, vuid=3120287205) at ../source3/smbd/service.c:1154
#6  0x00007f749cf5c714 in smbXsrv_tcon_disconnect (tcon=0x7f749e87cf90, vuid=3120287205) at ../source3/smbd/smbXsrv_tcon.c:983
#7  0x00007f749cf43b1f in smbd_smb2_tdis_wait_done (subreq=0x7f749e243db0) at ../source3/smbd/smb2_tcon.c:631
#8  0x00007f74998c2c34 in tevent_common_loop_immediate () from /lib64/libtevent.so.0
#9  0x00007f749ae9b26c in run_events_poll (ev=0x7f749e14a030, pollrtn=0, pfds=0x0, num_pfds=0) at ../source3/lib/events.c:192
#10 0x00007f749ae9b554 in s3_event_loop_once (ev=0x7f749e14a030, location=<optimized out>) at ../source3/lib/events.c:303
#11 0x00007f74998c240d in _tevent_loop_once () from /lib64/libtevent.so.0
#12 0x00007f74998c25ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#13 0x00007f749cf2b651 in smbd_process (ev_ctx=ev_ctx@entry=0x7f749e14a030, msg_ctx=msg_ctx@entry=0x7f749e14a120, sock_fd=sock_fd@entry=39, interactive=interactive@entry=false) at ../source3/smbd/process.c:4117
#14 0x00007f749da16304 in smbd_accept_connection (ev=0x7f749e14a030, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/server.c:762
#15 0x00007f749ae9b39c in run_events_poll (ev=0x7f749e14a030, pollrtn=<optimized out>, pfds=0x7f749e158c10, num_pfds=7) at ../source3/lib/events.c:257
#16 0x00007f749ae9b5f0 in s3_event_loop_once (ev=0x7f749e14a030, location=<optimized out>) at ../source3/lib/events.c:326
#17 0x00007f74998c240d in _tevent_loop_once () from /lib64/libtevent.so.0
#18 0x00007f74998c25ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#19 0x00007f749da11ad4 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x7f749e14a030) at ../source3/smbd/server.c:1127
#20 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1780

Comment 4 Poornima G 2016-11-03 07:48:36 UTC
Fix posted upstream http://review.gluster.org/#/c/15764/2

Comment 5 Michael Adam 2016-11-04 12:22:59 UTC
patch is available upstream --> POST

Comment 6 Atin Mukherjee 2016-11-07 04:17:03 UTC
Poornima - given we have identified the fix, can we devel_ack this BZ for 3.2.0?

Comment 9 rjoseph 2016-11-08 05:26:32 UTC
Upstream master: http://review.gluster.org/15764

Comment 12 Vivek Das 2016-12-13 06:59:21 UTC
Versions
---------
glusterfs-server-3.8.4-8.el7rhgs.x86_64
samba-client-4.4.6-2.el7rhgs.x86_64

Not reproducible with the steps below.

Steps to Reproduce:
1. Create two Samba shares and map them on a Windows client.
2. Run a script that connects and disconnects share 1 around 30 times.
3. After 1 or 2 runs, observe the script output.
4. Try to access the second share.

Marking it as verified.

Comment 14 errata-xmlrpc 2017-03-23 06:11:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

