Description of problem:

When CTDB is set up on RHGS and SMB3 multi-channel is tested, bringing one interface down leads to the CTDB node going into the banned state, smbd dumping core, and, at one point, a single client being connected to both servers. The steps and results are:

1. Configure multiple interfaces (three in this case) and set the multi-channel option to yes in smb.conf.
2. Mount the share on a Windows client using vip1 and check netstat to confirm I/O is flowing over all the interfaces; verify the ctdb status. Then bring down NIC1 (the interface through which the share was mounted) and check the I/O status and netstat again.

As soon as NIC1 is brought down, the CTDB node goes into the banned state and smbd dumps core. Also, when the interface is brought back up, there is a point at which the client is connected to both servers while accessing the same file.

The stack trace is as follows:

Stack trace of thread 2047:
#0 0x00007f300f3d4a98 raise (libc.so.6)
#1 0x00007f300f3d669a abort (libc.so.6)
#2 0x00007f3010d35c0a dump_core (libsmbconf.so.0)
#3 0x00007f3010d28efb smb_panic_s3 (libsmbconf.so.0)
#4 0x00007f30131eab1f smb_panic (libsamba-util.so.0)
#5 0x00007f300f978d9e _talloc_steal_internal (libtalloc.so.2)
#6 0x00007f3012daa2d5 smbd_smb2_request_allocate (libsmbd-base-samba4.so)
#7 0x00007f3012daa3a3 smbd_smb2_request_next_incoming (libsmbd-base-samba4.so)
#8 0x00007f3012db015e smbd_smb2_connection_handler (libsmbd-base-samba4.so)
#9 0x00007f3010d3e657 run_events_poll (libsmbconf.so.0)
#10 0x00007f3010d3e8b7 s3_event_loop_once (libsmbconf.so.0)
#11 0x00007f300f76511d _tevent_loop_once (libtevent.so.0)
#12 0x00007f300f7652bb tevent_common_loop_wait (libtevent.so.0)
#13 0x00007f3012d9d6a9 smbd_process (libsmbd-base-samba4.so)
#14 0x0000560d259d31f0 smbd_accept_connection (smbd)
#15 0x00007f3010d3e657 run_events_poll (libsmbconf.so.0)
#16 0x00007f3010d3e8b7 s3_event_loop_once (libsmbconf.so.0)
#17 0x00007f300f76511d _tevent_loop_once (libtevent.so.0)
#18 0x00007f300f7652bb tevent_common_loop_wait (libtevent.so.0)
#19 0x0000560d259cf1e3 main (smbd)
#20 0x00007f300f3c0580 __libc_start_main (libc.so.6)
#21 0x0000560d259cf4f9 _start (smbd)

Stack trace of thread 2048:
#0 0x00007f3013448eb9 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f2ff8ab25e8 syncenv_task (libglusterfs.so.0)
#2 0x00007f2ff8ab31c0 syncenv_processor (libglusterfs.so.0)
#3 0x00007f301344360a start_thread (libpthread.so.0)
#4 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2049:
#0 0x00007f3013448eb9 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f2ff8ab25e8 syncenv_task (libglusterfs.so.0)
#2 0x00007f2ff8ab31c0 syncenv_processor (libglusterfs.so.0)
#3 0x00007f301344360a start_thread (libpthread.so.0)
#4 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2051:
#0 0x00007f30134446ad pthread_join (libpthread.so.0)
#1 0x00007f2ff8ace78b event_dispatch_epoll (libglusterfs.so.0)
#2 0x00007f2ff9152494 glfs_poller (libgfapi.so.0)
#3 0x00007f301344360a start_thread (libpthread.so.0)
#4 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2052:
#0 0x00007f300f4a31b3 epoll_wait (libc.so.6)
#1 0x00007f2ff8ace2f8 event_dispatch_epoll_worker (libglusterfs.so.0)
#2 0x00007f301344360a start_thread (libpthread.so.0)
#3 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2050:
#0 0x00007f301344c27d __nanosleep (libpthread.so.0)
#1 0x00007f2ff8a8ffc4 gf_timer_proc (libglusterfs.so.0)
#2 0x00007f301344360a start_thread (libpthread.so.0)
#3 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2053:
#0 0x00007f300f4a31b3 epoll_wait (libc.so.6)
#1 0x00007f2ff8ace2f8 event_dispatch_epoll_worker (libglusterfs.so.0)
#2 0x00007f301344360a start_thread (libpthread.so.0)
#3 0x00007f300f4a2bbd __clone (libc.so.6)
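For reference, a minimal sketch of the setup from step 1 above and the commands used in step 2 to observe the behaviour; "clustering" and "server multi channel support" are the standard smb.conf parameters, but the interface name, VIP and share name here are placeholders rather than the actual values from this setup:

  [global]
      clustering = yes
      server multi channel support = yes

  # on the RHGS node: per-interface SMB traffic and cluster health
  netstat -tnp | grep ':445'
  ctdb status
  # take down the interface the share was mounted through (placeholder name)
  ip link set NIC1 down
  # on the Windows client (PowerShell): one entry per usable server NIC
  Get-SmbMultichannelConnection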
Version-Release number of selected component (if applicable): samba4.4rhgs

How reproducible: Always

Steps to Reproduce:
1. As mentioned in the description.

Actual results:
smbd crashes, and even though multiple NICs are still up, I/O does not continue from node 1 because the CTDB IP fails over to another node.

Expected results:
smbd should not crash and the CTDB node should not go into the banned state. Per the multi-channel expectation, I/O should keep going from node 1 as long as at least one NIC is up, but with CTDB the node failover happens. Also, at no point should the client be connected to both servers while accessing the same file.

Additional info:
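A hedged sketch of how the failover and the dual connection described above can be observed from the cluster side; the node numbers and vip1 are illustrative and were not captured from this run:

  ctdb ip        # before bringing NIC1 down: vip1 is hosted by node 0
  ctdb ip        # after bringing NIC1 down: node 0 is banned and vip1 has moved to node 1
  smbstatus      # run on each node: at one point the same client shows the same
                 # file open through sessions on both node 0 and node 1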
These are actually two bugs:
- One is the crash.
- One is the fact that failover leads to a multi-channel session being connected to two different nodes.

We need to fix this if we want to have any chance of even calling MC tech-preview...
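On the crash side, the backtrace shows the abort coming from inside _talloc_steal_internal and going through smb_panic, which is the path taken when talloc's abort handler fires while smbd_smb2_request_allocate is stealing a request onto another context (for example because the talloc header it finds is bad or already freed). A minimal standalone C sketch of one way that abort path can be triggered; this is purely illustrative under that assumption and is not the actual code path in smbd_smb2_request_allocate:

#include <talloc.h>

/* build with: gcc -o steal_abort steal_abort.c -ltalloc (hypothetical file name) */
int main(void)
{
	TALLOC_CTX *ctx = talloc_new(NULL);
	char *req = talloc_strdup(NULL, "request");

	/* free the object, leaving 'req' dangling */
	talloc_free(req);

	/* stealing a pointer whose talloc header is no longer valid typically
	 * makes talloc call its abort handler ("Bad talloc magic value");
	 * in smbd that handler panics and dumps core, as in the trace above */
	talloc_steal(ctx, req);

	talloc_free(ctx);
	return 0;
}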
The Samba bugs were closed in bulk when the PM_Score was less than 0. As the team was working on a few of them, reopening all of them.