Bug 1322681

Summary: SMB3 multi-channel + CTDB: smbd crashes if one of the NICs is brought down in a CTDB setup with I/O going on via multi-channel from node 1
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: samba
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Reporter: surabhi <sbhaloth>
Assignee: Guenther Deschner <gdeschner>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: amainkar, gdeschner, madam, nlevinki, rhinduja, rhs-smb, vdas
Keywords: Reopened, ZStream
Target Milestone: ---
Target Release: ---
Fixed In Version:
Doc Type: Bug Fix
Type: Bug
Last Closed: 2018-11-20 11:18:03 UTC

Description surabhi 2016-03-31 06:13:43 UTC
Description of problem:

When CTDB is set up on an RHGS cluster and multi-channel is being tested, the steps and results are as follows.

Bringing down one interface leads to the CTDB node going into the banned state, smbd dumping core, and, at one point, a single client being connected to both servers.

1. Configure multiple interfaces (in this case 3) and set the multi-channel option to yes in smb.conf (a configuration and verification sketch follows these steps).
2. Mount the share on a Windows client using VIP1, check netstat to observe I/O going over all the interfaces, and verify the CTDB status. Then bring down NIC1, through which the share is mounted, and check the I/O status and netstat again.
As soon as NIC1 is brought down, the CTDB node goes into the banned state and smbd dumps core. Also, when the interface is brought back up, there is a point at which the client is connected to both servers while accessing the same file.
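
For reference, a minimal sketch of the configuration and checks involved (interface names, the VIP, and the exact output interpretation are illustrative; the option shown is the standard Samba "server multi channel support" parameter, which is experimental in samba 4.4):

    # /etc/samba/smb.conf (global section) -- enable SMB3 multi-channel on the CTDB/RHGS nodes
    [global]
        clustering = yes
        server multi channel support = yes

    # From a cluster node, before and after the test:
    ctdb status                   # all nodes expected OK, none banned
    ctdb ip                       # shows which node currently hosts each public IP (VIP)
    netstat -tn | grep ':445'     # one TCP connection per channel/interface is expected
    smbstatus -b                  # brief list of client sessions on this node

    # Reproduce: bring down the NIC that carries the VIP the share was mounted with
    ip link set eth1 down         # eth1 is only an example name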

The stack trace is as follows:


Stack trace of thread 2047:
                #0  0x00007f300f3d4a98 raise (libc.so.6)
                #1  0x00007f300f3d669a abort (libc.so.6)
                #2  0x00007f3010d35c0a dump_core (libsmbconf.so.0)
                #3  0x00007f3010d28efb smb_panic_s3 (libsmbconf.so.0)
                #4  0x00007f30131eab1f smb_panic (libsamba-util.so.0)
                #5  0x00007f300f978d9e _talloc_steal_internal (libtalloc.so.2)
                #6  0x00007f3012daa2d5 smbd_smb2_request_allocate (libsmbd-base-samba4.so)
                #7  0x00007f3012daa3a3 smbd_smb2_request_next_incoming (libsmbd-base-samba4.so)
                #8  0x00007f3012db015e smbd_smb2_connection_handler (libsmbd-base-samba4.so)
                #9  0x00007f3010d3e657 run_events_poll (libsmbconf.so.0)
                #10 0x00007f3010d3e8b7 s3_event_loop_once (libsmbconf.so.0)
                #11 0x00007f300f76511d _tevent_loop_once (libtevent.so.0)
                #12 0x00007f300f7652bb tevent_common_loop_wait (libtevent.so.0)
                #13 0x00007f3012d9d6a9 smbd_process (libsmbd-base-samba4.so)
                #14 0x0000560d259d31f0 smbd_accept_connection (smbd)
                #15 0x00007f3010d3e657 run_events_poll (libsmbconf.so.0)
                #16 0x00007f3010d3e8b7 s3_event_loop_once (libsmbconf.so.0)
                #17 0x00007f300f76511d _tevent_loop_once (libtevent.so.0)
                #18 0x00007f300f7652bb tevent_common_loop_wait (libtevent.so.0)
                #19 0x0000560d259cf1e3 main (smbd)
                #20 0x00007f300f3c0580 __libc_start_main (libc.so.6)
                #21 0x0000560d259cf4f9 _start (smbd)
               
                Stack trace of thread 2048:
                #0  0x00007f3013448eb9 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f2ff8ab25e8 syncenv_task (libglusterfs.so.0)
                #2  0x00007f2ff8ab31c0 syncenv_processor (libglusterfs.so.0)
                #3  0x00007f301344360a start_thread (libpthread.so.0)
                #4  0x00007f300f4a2bbd __clone (libc.so.6)
               
                Stack trace of thread 2049:
                #0  0x00007f3013448eb9 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f2ff8ab25e8 syncenv_task (libglusterfs.so.0)
                #2  0x00007f2ff8ab31c0 syncenv_processor (libglusterfs.so.0)
                #3  0x00007f301344360a start_thread (libpthread.so.0)
                #4  0x00007f300f4a2bbd __clone (libc.so.6)

                Stack trace of thread 2051:
                #0  0x00007f30134446ad pthread_join (libpthread.so.0)
                #1  0x00007f2ff8ace78b event_dispatch_epoll (libglusterfs.so.0)
                #2  0x00007f2ff9152494 glfs_poller (libgfapi.so.0)
                #3  0x00007f301344360a start_thread (libpthread.so.0)
                #4  0x00007f300f4a2bbd __clone (libc.so.6)
               
                Stack trace of thread 2052:
                #0  0x00007f300f4a31b3 epoll_wait (libc.so.6)
                #1  0x00007f2ff8ace2f8 event_dispatch_epoll_worker (libglusterfs.so.0)
                #2  0x00007f301344360a start_thread (libpthread.so.0)
                #3  0x00007f300f4a2bbd __clone (libc.so.6)
               
                Stack trace of thread 2050:
                #0  0x00007f301344c27d __nanosleep (libpthread.so.0)
                #1  0x00007f2ff8a8ffc4 gf_timer_proc (libglusterfs.so.0)
                #2  0x00007f301344360a start_thread (libpthread.so.0)
                #3  0x00007f300f4a2bbd __clone (libc.so.6)
               
                Stack trace of thread 2053:
                #0  0x00007f300f4a31b3 epoll_wait (libc.so.6)
                #1  0x00007f2ff8ace2f8 event_dispatch_epoll_worker (libglusterfs.so.0)
                #2  0x00007f301344360a start_thread (libpthread.so.0)
                #3  0x00007f300f4a2bbd __clone (libc.so.6)




Version-Release number of selected component (if applicable):
samba4.4rhgs

How reproducible:
Always

Steps to Reproduce:
1. As mentioned in the description above.

Actual results:
smbd crashes, and even if multiple NICs are up, I/O does not continue from node 1 because of the CTDB IP failover to another node.



Expected results:
smbd should not crash, and the CTDB node should not go into the banned state.
As per the multi-channel expectation, I/O should keep going from node 1 as long as at least one NIC is up; with CTDB, however, the node failover happens. Also, at no point should the client be connected to both servers while accessing the same file.
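
A rough way to check this expectation (a sketch; the node numbering and the interpretation of the output follow the expectation stated in this bug, not confirmed behaviour):

    # Run on each cluster node after bringing one NIC down on node 1:
    ctdb status      # expectation: node 1 stays OK and is not marked BANNED
    ctdb ip          # expectation: the VIP does not fail over while node 1 is still healthy
    smbstatus -b     # expectation: the client's session shows up on only one node at a time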


Additional info:

Comment 3 Michael Adam 2016-04-29 08:44:36 UTC
These are actually 2 bugs:

- one is the crash.
- one is the fact that failover leads to a multi-channel session being connected to two different nodes.

We need to fix this if we want to have any chance of even calling MC tech-preview...

Comment 5 Amar Tumballi 2018-04-19 04:16:40 UTC
Closed the Samba bugs in bulk when PM_Score was less than 0. As the team was working on a few of them, reopening all of them.