Bug 1387205

Summary: SMB:[MD-Cache]: While connecting and disconnecting a Samba share multiple times from a Windows client, saw multiple crashes
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: surabhi <sbhaloth>
Component: io-threads
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, pgurusid, pkarampu, ravishankar, rhinduja, rhs-bugs, sbhaloth
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.8.4-3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-23 06:13:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351528

Description surabhi 2016-10-20 10:58:26 UTC
Description of problem:

When a share is repeatedly connected to and disconnected from a Windows client, with md-cache and client-io-threads enabled on the volume, multiple crashes are seen on the server and the other share becomes inaccessible.

(gdb) bt
#0  0x00007f75f3ef25f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f75f3ef3ce8 in __GI_abort () at abort.c:90
#2  0x00007f75f5853beb in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f75f5846fe7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f75f7d3957f in smb_panic (why=why@entry=0x7f75f7d8054a "internal error") at ../lib/util/fault.c:166
#5  0x00007f75f7d39796 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  list_del_init (old=0x75) at ../../../../libglusterfs/src/list.h:87
#9  __iot_dequeue (conf=conf@entry=0x7f75a401ea90, pri=pri@entry=0x7f75b0191d6c, sleep=sleep@entry=0x7f75b0191d80) at io-threads.c:126
#10 0x00007f75d4f03727 in iot_worker (data=0x7f75a401ea90) at io-threads.c:199
#11 0x00007f75f7f92dc5 in start_thread (arg=0x7f75b0192700) at pthread_create.c:308
#12 0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
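Frames #8 and #9 show an io-threads worker faulting inside list_del_init() with old=0x75, a small bogus value rather than a valid pointer. The sketch below is a simplified, hypothetical re-creation of a kernel-style intrusive list similar to the helpers in libglusterfs/src/list.h, plus a hypothetical dequeue() standing in for __iot_dequeue(). It illustrates one plausible reading of the trace: unlinking writes through old->prev and old->next, so if the queue head lives in memory that was already freed and reused, the entry pointer read from it can be garbage and the first write faults. The struct req type and dequeue() are illustrative assumptions, not GlusterFS code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked intrusive list, modeled loosely on the list_head
 * helpers in libglusterfs/src/list.h (simplified for illustration). */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Recover the containing struct from a pointer to its embedded list_head. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *head) {
    head->next = head;
    head->prev = head;
}

static int list_empty(const struct list_head *head) {
    return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head) {
    entry->next = head;
    entry->prev = head->prev;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink 'old' by writing through old->prev and old->next, then make it
 * point to itself. If 'old' is a stale or garbage pointer (for example
 * 0x75 read from freed memory), these writes are what fault. */
static void list_del_init(struct list_head *old) {
    old->prev->next = old->next;
    old->next->prev = old->prev;
    old->next = old;
    old->prev = old;
}

/* Hypothetical queued request. */
struct req {
    int id;
    struct list_head list;
};

/* Hypothetical stand-in for __iot_dequeue(): pop the first request. The
 * caller must hold the lock protecting 'queue', and 'queue' must still be
 * live memory; nothing here can detect that it has been freed. */
static struct req *dequeue(struct list_head *queue) {
    assert(queue != NULL); /* catches NULL only; a stale non-NULL pointer passes */
    if (list_empty(queue))
        return NULL;
    struct req *r = list_entry(queue->next, struct req, list);
    list_del_init(&r->list);
    return r;
}
```

With a live, correctly locked queue, dequeuing two requests returns them in FIFO order and leaves the queue empty. The crash in this report corresponds to running the same code after the memory backing the queue is gone, which no list helper can detect on its own.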


Version-Release number of selected component (if applicable):
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64


How reproducible:
Tried once

Steps to Reproduce:
1. Connect a Samba share to a Windows client and disconnect it. Repeat this multiple times.
2. Access another share from the client.

Actual results:
Another share is not accessible, and there are a few crashes on the server.


Expected results:
There should not be any crashes and the share should be accessible.

Additional info:

Comment 2 surabhi 2016-10-20 10:59:20 UTC
Thread 20 (Thread 0x7f7586ffd700 (LWP 13160)):
#0  0x00007f75f3fb42c3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f75dd4da1c0 in event_dispatch_epoll_worker (data=0x7f75a40098a0) at event-epoll.c:664
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f7586ffd700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 19 (Thread 0x7f75d82dd700 (LWP 11901)):
#0  0x00007f75f7f9996d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f75dd48dbb6 in gf_timer_proc (data=0x7f75f88cc480) at timer.c:176
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f75d82dd700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 18 (Thread 0x7f7590ff9700 (LWP 13158)):
#0  0x00007f75f7f9996d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f75dd48dbb6 in gf_timer_proc (data=0x7f75f8f2bda0) at timer.c:176
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f7590ff9700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 17 (Thread 0x7f75d78d9700 (LWP 11902)):
#0  0x00007f75f7f93ef7 in pthread_join (threadid=140144095889152, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x00007f75dd4da768 in event_dispatch_epoll (event_pool=0x7f75f88bb660) at event-epoll.c:758
#2  0x00007f75ddb88c64 in glfs_poller (data=<optimized out>) at glfs.c:612
#3  0x00007f75f7f92dc5 in start_thread (arg=0x7f75d78d9700) at pthread_create.c:308
#4  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 16 (Thread 0x7f75d9696700 (LWP 11900)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75dd4b8d98 in syncenv_task (proc=proc@entry=0x7f75f88bec10) at syncop.c:603
#2  0x00007f75dd4b9be0 in syncenv_processor (thdata=0x7f75f88bec10) at syncop.c:695
#3  0x00007f75f7f92dc5 in start_thread (arg=0x7f75d9696700) at pthread_create.c:308
#4  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 15 (Thread 0x7f75f8354880 (LWP 11512)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f75ddb89cf3 in glfs_lock (fs=fs@entry=0x7f75f8f2c620) at glfs-internal.h:296
#2  glfs_init_wait (fs=fs@entry=0x7f75f8f2c620) at glfs.c:887
#3  0x00007f75ddb8a1c0 in pub_glfs_init (fs=fs@entry=0x7f75f8f2c620) at glfs.c:997
#4  0x00007f75dddac2c8 in vfs_gluster_connect (handle=0x7f75f8838eb0, service=<optimized out>, user=<optimized out>) at ../source3/modules/vfs_glusterfs.c:237
#5  0x00007f75c40236d4 in connect_acl_xattr (handle=0x7f75f95359f0, service=0x7f75f890edf0 "gluster-vol2", user=<optimized out>)
    at ../source3/modules/vfs_acl_xattr.c:182
#6  0x00007f75f78edbc0 in make_connection_snum (xconn=0x7f75f882daf0, conn=conn@entry=0x7f75f89214c0, snum=snum@entry=3, pdev=pdev@entry=0x7f75f7a15f1f "???", 
    vuser=0x7f75f8839280, vuser=0x7f75f8839280) at ../source3/smbd/service.c:678
#7  0x00007f75f78ee961 in make_connection_smb2 (req=req@entry=0x7f75f882ee40, tcon=0x7f75f8f2ba80, snum=snum@entry=3, vuser=0x7f75f8839280,
    pdev=pdev@entry=0x7f75f7a15f1f "???", pstatus=pstatus@entry=0x7ffe0c3d2660) at ../source3/smbd/service.c:991
#8  0x00007f75f790538f in smbd_smb2_tree_connect (disconnect=0x7f75f882f4cc, out_tree_id=0x7f75f882f4c8, out_maximal_access=0x7f75f882f4c4, 
    out_capabilities=0x7f75f882f4c0, out_share_flags=0x7f75f882f4bc, out_share_type=0x7f75f882f4b8 "", in_path=<optimized out>, req=0x7f75f882ee40)
    at ../source3/smbd/smb2_tcon.c:308
#9  smbd_smb2_tree_connect_send (in_path=<optimized out>, smb2req=0x7f75f882ee40, ev=0x7f75f880f030, mem_ctx=0x7f75f882ee40) at ../source3/smbd/smb2_tcon.c:412
#10 smbd_smb2_request_process_tcon (req=req@entry=0x7f75f882ee40) at ../source3/smbd/smb2_tcon.c:93
#11 0x00007f75f78fe0d3 in smbd_smb2_request_dispatch (req=req@entry=0x7f75f882ee40) at ../source3/smbd/smb2_server.c:2564
#12 0x00007f75f78ff8f2 in smbd_smb2_io_handler (fde_flags=<optimized out>, xconn=0x7f75f882daf0) at ../source3/smbd/smb2_server.c:3861
#13 smbd_smb2_connection_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/smb2_server.c:3899
#14 0x00007f75f585c39c in run_events_poll (ev=0x7f75f880f030, pollrtn=<optimized out>, pfds=0x7f75f882c760, num_pfds=5) at ../source3/lib/events.c:257
#15 0x00007f75f585c5f0 in s3_event_loop_once (ev=0x7f75f880f030, location=<optimized out>) at ../source3/lib/events.c:326
#16 0x00007f75f428340d in _tevent_loop_once (ev=ev@entry=0x7f75f880f030, location=location@entry=0x7f75f7a36a80 "../source3/smbd/process.c:4117") at ../tevent.c:533
#17 0x00007f75f42835ab in tevent_common_loop_wait (ev=0x7f75f880f030, location=0x7f75f7a36a80 "../source3/smbd/process.c:4117") at ../tevent.c:637
#18 0x00007f75f78ec651 in smbd_process (ev_ctx=ev_ctx@entry=0x7f75f880f030, msg_ctx=msg_ctx@entry=0x7f75f880f120, sock_fd=sock_fd@entry=39, 
    interactive=interactive@entry=false) at ../source3/smbd/process.c:4117
#19 0x00007f75f83d7304 in smbd_accept_connection (ev=0x7f75f880f030, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>)
    at ../source3/smbd/server.c:762
#20 0x00007f75f585c39c in run_events_poll (ev=0x7f75f880f030, pollrtn=<optimized out>, pfds=0x7f75f882c760, num_pfds=7) at ../source3/lib/events.c:257
#21 0x00007f75f585c5f0 in s3_event_loop_once (ev=0x7f75f880f030, location=<optimized out>) at ../source3/lib/events.c:326
#22 0x00007f75f428340d in _tevent_loop_once (ev=ev@entry=0x7f75f880f030, location=location@entry=0x7f75f83da776 "../source3/smbd/server.c:1127") at ../tevent.c:533
#23 0x00007f75f42835ab in tevent_common_loop_wait (ev=0x7f75f880f030, location=0x7f75f83da776 "../source3/smbd/server.c:1127") at ../tevent.c:637
#24 0x00007f75f83d2ad4 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x7f75f880f030) at ../source3/smbd/server.c:1127
#25 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1780

Thread 14 (Thread 0x7f75d9e97700 (LWP 11899)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75dd4b8d98 in syncenv_task (proc=proc@entry=0x7f75f88be850) at syncop.c:603
#2  0x00007f75dd4b9be0 in syncenv_processor (thdata=0x7f75f88be850) at syncop.c:695
#3  0x00007f75f7f92dc5 in start_thread (arg=0x7f75d9e97700) at pthread_create.c:308
#4  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 13 (Thread 0x7f75dbe50700 (LWP 11898)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75dd4b8d98 in syncenv_task (proc=proc@entry=0x7f75f8878890) at syncop.c:603
#2  0x00007f75dd4b9be0 in syncenv_processor (thdata=0x7f75f8878890) at syncop.c:695
#3  0x00007f75f7f92dc5 in start_thread (arg=0x7f75dbe50700) at pthread_create.c:308
#4  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 12 (Thread 0x7f75d70d8700 (LWP 11903)):
#0  0x00007f75f3fb42c3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f75dd4da1c0 in event_dispatch_epoll_worker (data=0x7f75d0000920) at event-epoll.c:664
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f75d70d8700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 11 (Thread 0x7f75d41c6700 (LWP 11908)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75d4f036f3 in iot_worker (data=0x7f75c8029640) at io-threads.c:176
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f75d41c6700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 10 (Thread 0x7f75dc651700 (LWP 11897)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75dd4b8d98 in syncenv_task (proc=proc@entry=0x7f75f88784d0) at syncop.c:603
#2  0x00007f75dd4b9be0 in syncenv_processor (thdata=0x7f75f88784d0) at syncop.c:695
#3  0x00007f75f7f92dc5 in start_thread (arg=0x7f75dc651700) at pthread_create.c:308
#4  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 9 (Thread 0x7f75c70b0700 (LWP 11909)):
#0  0x00007f75f3fb42c3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f75dd4da1c0 in event_dispatch_epoll_worker (data=0x7f75c8062be0) at event-epoll.c:664
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f75c70b0700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 8 (Thread 0x7f75b1475700 (LWP 13017)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75d4f036f3 in iot_worker (data=0x7f75ada29ed0) at io-threads.c:176
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f75b1475700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 7 (Thread 0x7f75877fe700 (LWP 13159)):
#0  0x00007f75f7f93ef7 in pthread_join (threadid=140142752814848, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x00007f75dd4da768 in event_dispatch_epoll (event_pool=0x7f75f8946c70) at event-epoll.c:758
#2  0x00007f75ddb88c64 in glfs_poller (data=<optimized out>) at glfs.c:612
#3  0x00007f75f7f92dc5 in start_thread (arg=0x7f75877fe700) at pthread_create.c:308
#4  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 6 (Thread 0x7f7587fff700 (LWP 13157)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75dd4b8d98 in syncenv_task (proc=proc@entry=0x7f75f8953e00) at syncop.c:603
#2  0x00007f75dd4b9be0 in syncenv_processor (thdata=0x7f75f8953e00) at syncop.c:695
#3  0x00007f75f7f92dc5 in start_thread (arg=0x7f7587fff700) at pthread_create.c:308
#4  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f7591ffb700 (LWP 13156)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75dd4b8d98 in syncenv_task (proc=proc@entry=0x7f75f8953a40) at syncop.c:603
#2  0x00007f75dd4b9be0 in syncenv_processor (thdata=0x7f75f8953a40) at syncop.c:695
#3  0x00007f75f7f92dc5 in start_thread (arg=0x7f7591ffb700) at pthread_create.c:308

Thread 4 (Thread 0x7f75b1576700 (LWP 12763)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75d4f036f3 in iot_worker (data=0x7f75a46d5650) at io-threads.c:176
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f75b1576700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f75b1677700 (LWP 12614)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75d4f036f3 in iot_worker (data=0x7f759c91ecf0) at io-threads.c:176
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f75b1677700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f75b1778700 (LWP 12357)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f75d4f036f3 in iot_worker (data=0x7f75ac91ecf0) at io-threads.c:176
#2  0x00007f75f7f92dc5 in start_thread (arg=0x7f75b1778700) at pthread_create.c:308
#3  0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7f75b0192700 (LWP 12052)):
#0  0x00007f75f3ef25f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f75f3ef3ce8 in __GI_abort () at abort.c:90
#2  0x00007f75f5853beb in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f75f5846fe7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f75f7d3957f in smb_panic (why=why@entry=0x7f75f7d8054a "internal error") at ../lib/util/fault.c:166
#5  0x00007f75f7d39796 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  list_del_init (old=0x75) at ../../../../libglusterfs/src/list.h:87
#9  __iot_dequeue (conf=conf@entry=0x7f75a401ea90, pri=pri@entry=0x7f75b0191d6c, sleep=sleep@entry=0x7f75b0191d80) at io-threads.c:126
#10 0x00007f75d4f03727 in iot_worker (data=0x7f75a401ea90) at io-threads.c:199
#11 0x00007f75f7f92dc5 in start_thread (arg=0x7f75b0192700) at pthread_create.c:308
#12 0x00007f75f3fb3ced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Comment 3 Poornima G 2016-10-27 08:36:31 UTC
I see that io-threads are loaded on the client side. The crash is related to the client io-threads.

Comment 4 Atin Mukherjee 2016-10-27 09:01:55 UTC
Pranith/Ravi - could you check this crash?

Comment 5 Pranith Kumar K 2016-10-27 10:29:43 UTC
https://code.engineering.redhat.com/gerrit/#/c/87972/ is the fix, which is already merged. I think we are waiting for surabhi to update the status of this issue. If Samba does glfs_init() and glfs_fini() on connect/disconnect, it should be the same issue. We already have confirmation that the related bug https://bugzilla.redhat.com/show_bug.cgi?id=1382065 is fixed with the io-threads patch.
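The comment does not describe what the merged patch changes. As a hedged illustration only, the sketch below is a hypothetical model (not GlusterFS code) of the general hazard when a translator graph is torn down by glfs_fini() while worker threads may still be waiting on or dequeuing from shared state: shutdown must wake every worker, wait for all of them to exit, and only then free the state they use. The names pool, worker(), pool_init(), pool_submit() and pool_fini() are assumptions made up for this example, and a counter stands in for the real request queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an io-threads style conf. */
struct pool {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int down;          /* set by pool_fini() to ask workers to exit */
    int pending;       /* queued work items (counter stands in for a list) */
    int processed;     /* work items completed */
    int nthreads;      /* number of workers actually started */
    pthread_t *threads;
};

static void *worker(void *arg) {
    struct pool *p = arg;
    pthread_mutex_lock(&p->lock);
    for (;;) {
        while (p->pending == 0 && !p->down)
            pthread_cond_wait(&p->cond, &p->lock);
        if (p->pending == 0 && p->down)
            break;             /* queue drained and shutdown requested */
        p->pending--;
        p->processed++;        /* "handle" the item under the lock for simplicity */
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

static struct pool *pool_init(int nthreads) {
    struct pool *p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->threads = calloc((size_t)nthreads, sizeof(pthread_t));
    if (!p->threads) {
        free(p);
        return NULL;
    }
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&p->threads[i], NULL, worker, p) != 0)
            break;             /* keep only the workers that really started */
        p->nthreads++;
    }
    return p;
}

static void pool_submit(struct pool *p, int n) {
    pthread_mutex_lock(&p->lock);
    p->pending += n;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

/* Shutdown order that avoids use-after-free: request exit, wake every
 * worker, join them all, and only then free the shared state.
 * Returns the number of items the workers processed. */
static int pool_fini(struct pool *p) {
    pthread_mutex_lock(&p->lock);
    p->down = 1;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);

    for (int i = 0; i < p->nthreads; i++)
        pthread_join(p->threads[i], NULL);

    int processed = p->processed;  /* safe: no worker can touch p anymore */
    pthread_cond_destroy(&p->cond);
    pthread_mutex_destroy(&p->lock);
    free(p->threads);
    free(p);
    return processed;
}
```

Calling pool_init(), pool_submit() and pool_fini() in a loop mimics the repeated connect/disconnect cycle from the report; with the join-before-free ordering each cycle completes cleanly. Freeing p before the joins would reintroduce the same class of use-after-free seen in frame #9.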

Comment 6 Pranith Kumar K 2016-10-27 17:00:23 UTC
Surabhi,
       Please reopen the bug if you still see the io-threads crash after the fix. So far, with nfs-ganesha and Samba mount/umount in a loop, things looked good.

Pranith

Comment 7 surabhi 2016-11-07 10:32:24 UTC
Tried the test with the latest build using the following steps, and the crash is no longer seen.

Connected a share to a Windows client and disconnected it multiple times, with md-cache and client-io-threads enabled on the volume.

As no crashes are seen with client-io-threads enabled, moving the BZ to VERIFIED with build glusterfs-3.8.4-3.el7rhgs.x86_64.

Comment 11 errata-xmlrpc 2017-03-23 06:13:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html