Description of problem:

When CTDB is set up on RHGS and SMB3 multi-channel is tested, bringing one interface down leads to the CTDB node going into the banned state, smbd dumping core, and, at one point, a single client being connected to both servers. The steps and results are:

1. Configure multiple interfaces (three in this case) and set the multi-channel option to yes in smb.conf.
2. Mount the share on a Windows client using vip1 and check netstat to confirm I/O is flowing over all the interfaces; verify the ctdb status. Then bring down NIC1 (the interface through which the share was mounted) and check the I/O status and netstat again.

As soon as NIC1 is brought down, the CTDB node goes into the banned state and smbd dumps core. Also, when the interface is brought back up, there is a point at which the client is connected to both servers while accessing the same file.

The stack trace is as follows:

Stack trace of thread 2047:
#0 0x00007f300f3d4a98 raise (libc.so.6)
#1 0x00007f300f3d669a abort (libc.so.6)
#2 0x00007f3010d35c0a dump_core (libsmbconf.so.0)
#3 0x00007f3010d28efb smb_panic_s3 (libsmbconf.so.0)
#4 0x00007f30131eab1f smb_panic (libsamba-util.so.0)
#5 0x00007f300f978d9e _talloc_steal_internal (libtalloc.so.2)
#6 0x00007f3012daa2d5 smbd_smb2_request_allocate (libsmbd-base-samba4.so)
#7 0x00007f3012daa3a3 smbd_smb2_request_next_incoming (libsmbd-base-samba4.so)
#8 0x00007f3012db015e smbd_smb2_connection_handler (libsmbd-base-samba4.so)
#9 0x00007f3010d3e657 run_events_poll (libsmbconf.so.0)
#10 0x00007f3010d3e8b7 s3_event_loop_once (libsmbconf.so.0)
#11 0x00007f300f76511d _tevent_loop_once (libtevent.so.0)
#12 0x00007f300f7652bb tevent_common_loop_wait (libtevent.so.0)
#13 0x00007f3012d9d6a9 smbd_process (libsmbd-base-samba4.so)
#14 0x0000560d259d31f0 smbd_accept_connection (smbd)
#15 0x00007f3010d3e657 run_events_poll (libsmbconf.so.0)
#16 0x00007f3010d3e8b7 s3_event_loop_once (libsmbconf.so.0)
#17 0x00007f300f76511d _tevent_loop_once (libtevent.so.0)
#18 0x00007f300f7652bb tevent_common_loop_wait (libtevent.so.0)
#19 0x0000560d259cf1e3 main (smbd)
#20 0x00007f300f3c0580 __libc_start_main (libc.so.6)
#21 0x0000560d259cf4f9 _start (smbd)

Stack trace of thread 2048:
#0 0x00007f3013448eb9 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f2ff8ab25e8 syncenv_task (libglusterfs.so.0)
#2 0x00007f2ff8ab31c0 syncenv_processor (libglusterfs.so.0)
#3 0x00007f301344360a start_thread (libpthread.so.0)
#4 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2049:
#0 0x00007f3013448eb9 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f2ff8ab25e8 syncenv_task (libglusterfs.so.0)
#2 0x00007f2ff8ab31c0 syncenv_processor (libglusterfs.so.0)
#3 0x00007f301344360a start_thread (libpthread.so.0)
#4 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2051:
#0 0x00007f30134446ad pthread_join (libpthread.so.0)
#1 0x00007f2ff8ace78b event_dispatch_epoll (libglusterfs.so.0)
#2 0x00007f2ff9152494 glfs_poller (libgfapi.so.0)
#3 0x00007f301344360a start_thread (libpthread.so.0)
#4 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2052:
#0 0x00007f300f4a31b3 epoll_wait (libc.so.6)
#1 0x00007f2ff8ace2f8 event_dispatch_epoll_worker (libglusterfs.so.0)
#2 0x00007f301344360a start_thread (libpthread.so.0)
#3 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2050:
#0 0x00007f301344c27d __nanosleep (libpthread.so.0)
#1 0x00007f2ff8a8ffc4 gf_timer_proc (libglusterfs.so.0)
#2 0x00007f301344360a start_thread (libpthread.so.0)
#3 0x00007f300f4a2bbd __clone (libc.so.6)

Stack trace of thread 2053:
#0 0x00007f300f4a31b3 epoll_wait (libc.so.6)
#1 0x00007f2ff8ace2f8 event_dispatch_epoll_worker (libglusterfs.so.0)
#2 0x00007f301344360a start_thread (libpthread.so.0)
#3 0x00007f300f4a2bbd __clone (libc.so.6)
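For reference, a minimal sketch of the setup from step 1 above and the commands used in step 2 to observe the behaviour; "clustering" and "server multi channel support" are the standard smb.conf parameters, but the interface name, VIP and share name here are placeholders rather than the actual values from this setup:

  [global]
      clustering = yes
      server multi channel support = yes

  # on the RHGS node: per-interface SMB traffic and cluster health
  netstat -tnp | grep ':445'
  ctdb status
  # take down the interface the share was mounted through (placeholder name)
  ip link set NIC1 down
  # on the Windows client (PowerShell): one entry per usable server NIC
  Get-SmbMultichannelConnection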
Version-Release number of selected component (if applicable): samba4.4rhgs

How reproducible: Always

Steps to Reproduce:
1. As mentioned in the description.

Actual results:
smbd crashes, and even though multiple NICs are still up, I/O does not continue from node 1 because the CTDB IP fails over to another node.

Expected results:
smbd should not crash and the CTDB node should not go into the banned state. Per the multi-channel expectation, I/O should keep going from node 1 as long as at least one NIC is up, but with CTDB the node failover happens. Also, at no point should the client be connected to both servers while accessing the same file.

Additional info:
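A hedged sketch of how the failover and the dual connection described above can be observed from the cluster side; the node numbers and vip1 are illustrative and were not captured from this run:

  ctdb ip        # before bringing NIC1 down: vip1 is hosted by node 0
  ctdb ip        # after bringing NIC1 down: node 0 is banned and vip1 has moved to node 1
  smbstatus      # run on each node: at one point the same client shows the same
                 # file open through sessions on both node 0 and node 1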
These are actually two bugs:
- One is the crash.
- One is the fact that failover leads to a multi-channel session being connected to two different nodes.

We need to fix this if we want to have any chance of even calling MC tech-preview...
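On the crash side, the backtrace shows the abort coming from inside _talloc_steal_internal and going through smb_panic, which is the path taken when talloc's abort handler fires while smbd_smb2_request_allocate is stealing a request onto another context (for example because the talloc header it finds is bad or already freed). A minimal standalone C sketch of one way that abort path can be triggered; this is purely illustrative under that assumption and is not the actual code path in smbd_smb2_request_allocate:

#include <talloc.h>

/* build with: gcc -o steal_abort steal_abort.c -ltalloc (hypothetical file name) */
int main(void)
{
	TALLOC_CTX *ctx = talloc_new(NULL);
	char *req = talloc_strdup(NULL, "request");

	/* free the object, leaving 'req' dangling */
	talloc_free(req);

	/* stealing a pointer whose talloc header is no longer valid typically
	 * makes talloc call its abort handler ("Bad talloc magic value");
	 * in smbd that handler panics and dumps core, as in the trace above */
	talloc_steal(ctx, req);

	talloc_free(ctx);
	return 0;
}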
The Samba bugs were closed in bulk when the PM_Score was less than 0. As the team was working on a few of them, reopening all of them.