1400948 – [RHEL7] [SAMBA-CTDB]IP failover with ctdb leads to smbd crash

Summary: [RHEL7] [SAMBA-CTDB]IP failover with ctdb leads to smbd crash

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	samba
Sub Component:
Version:	rhgs-3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Anoop C S
QA Contact:	Vivek Das
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1351528 1406287

Reported:	2016-12-02 11:43 UTC by Vivek Das
Modified:	2017-03-23 05:20 UTC (History)
CC List:	9 users (show)
Fixed In Version:	samba-4.4.6-3.el7rhgs
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1406287 (view as bug list)
Environment:
Last Closed:	2017-03-23 05:20:17 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0495	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.2.0 samba security, bug fixes and enhancement update	2017-03-23 09:18:26 UTC
Samba Project	12372	0	None	None	None	2016-12-06 06:57:50 UTC

Description Vivek Das 2016-12-02 11:43:34 UTC

Description of problem:
On an already established 4-node gluster cluster with a samba-ctdb setup, mount the samba share on a Windows client using the public IP and start creating a large number of zero-KB files (say 10000). On the server side, stop ctdb one by one on 3 of the 4 nodes (command: ctdb stop). After the IP failover, restarting ctdb one by one on those 3 nodes produces smbd cores in /var/log/samba/core/smbd.

Version-Release number of selected component (if applicable):
samba-client-4.4.6-2.el7rhgs.x86_64
glusterfs-3.8.4-6.el7rhgs.x86_64
Windows10

How reproducible:
Always

Steps to Reproduce:
1. Start with a working four-node ctdb samba setup.
2. Mount the samba share on a Windows client using the VIP.
3. Start a script that creates around 10000 zero-KB files.
4. While the script is in progress, run "ctdb stop" on 3 of the 4 nodes, one by one: stop ctdb on one node, wait for the IP failover, then move on to the next.
5. Check that the I/O process is still running.
6. Restart ctdb (service ctdb restart) on those 3 nodes one by one, waiting for each node's state to become OK.
7. Check for cores in /var/log/samba/core/smbd.
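
The report does not include the script used in step 3. A minimal sketch of such a script is shown here, assuming Python is available on the Windows client; the function name, file prefix, and target path are hypothetical and not taken from the original test setup.

```python
from pathlib import Path


def create_empty_files(target_dir, count=10000, prefix="file_"):
    """Create `count` zero-byte files under target_dir and return how many were made.

    target_dir is assumed to be a directory on the mounted samba share.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    created = 0
    for i in range(count):
        # touch() creates an empty file (or leaves an existing one in place)
        (target / f"{prefix}{i:05d}.txt").touch()
        created += 1
    return created
```

For example, create_empty_files(r"Z:\ctdb_test") would create the files on a share mapped to drive Z: (a hypothetical drive letter). Creating the files one at a time keeps SMB2 create requests flowing while the failover in step 4 takes place.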

Actual results:
Cores generated

Expected results:
No cores should be generated.
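
The check for cores in the last reproduction step can be automated. The sketch below lists files under the core directory that were modified after the test started; the function name is hypothetical, and since the report does not state how core files are named, every regular file under the directory is treated as a candidate.

```python
from pathlib import Path


def find_new_cores(core_dir="/var/log/samba/core/smbd", since=0.0):
    """Return files under core_dir modified at or after `since` (epoch seconds).

    Assumption: any regular file in this directory is a core dump.
    """
    root = Path(core_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime >= since
    )
```

Recording time.time() before the first "ctdb stop" and passing it as `since` separates cores produced by the current run from older ones.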

Additional info:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/smbd'.
Program terminated with signal 6, Aborted.
#0  0x00007f4137dc81d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install samba-4.4.6-2.el7rhgs.x86_64
(gdb) bt
#0  0x00007f4137dc81d7 in raise () from /lib64/libc.so.6
#1  0x00007f4137dc98c8 in abort () from /lib64/libc.so.6
#2  0x00007f4139728b9b in dump_core () from /lib64/libsmbconf.so.0
#3  0x00007f413971bf97 in smb_panic_s3 () from /lib64/libsmbconf.so.0
#4  0x00007f413bc0e57f in smb_panic () from /lib64/libsamba-util.so.0
#5  0x00007f413bc0e796 in sig_fault () from /lib64/libsamba-util.so.0
#6  <signal handler called>
#7  0x00007f4138366a5a in _talloc_free () from /lib64/libtalloc.so.2
#8  0x00007f41394f4be2 in ctdbd_migrate () from /usr/lib64/samba/libsamba-cluster-support-samba4.so
#9  0x00007f41394efe8f in fetch_locked_internal () from /usr/lib64/samba/libsamba-cluster-support-samba4.so
#10 0x00007f4135e39df0 in dbwrap_fetch_locked_internal () from /usr/lib64/samba/libdbwrap-samba4.so
#11 0x00007f413b838f7b in get_share_mode_lock () from /usr/lib64/samba/libsmbd-base-samba4.so
#12 0x00007f413b7a0754 in open_file_ntcreate () from /usr/lib64/samba/libsmbd-base-samba4.so
#13 0x00007f413b7a4119 in create_file_unixpath () from /usr/lib64/samba/libsmbd-base-samba4.so
#14 0x00007f413b7a516f in create_file_default () from /usr/lib64/samba/libsmbd-base-samba4.so
#15 0x00007f413b88613e in vfswrap_create_file () from /usr/lib64/samba/libsmbd-base-samba4.so
#16 0x00007f413b7abe38 in smb_vfs_call_create_file () from /usr/lib64/samba/libsmbd-base-samba4.so
#17 0x00007f413b7dd2e1 in smbd_smb2_request_process_create () from /usr/lib64/samba/libsmbd-base-samba4.so
#18 0x00007f413b7d3434 in smbd_smb2_request_dispatch () from /usr/lib64/samba/libsmbd-base-samba4.so
#19 0x00007f413b7d4a02 in smbd_smb2_connection_handler () from /usr/lib64/samba/libsmbd-base-samba4.so
#20 0x00007f413973134c in run_events_poll () from /lib64/libsmbconf.so.0
#21 0x00007f41397315a0 in s3_event_loop_once () from /lib64/libsmbconf.so.0
#22 0x00007f413815840d in _tevent_loop_once () from /lib64/libtevent.so.0
#23 0x00007f41381585ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#24 0x00007f413b7c1731 in smbd_process () from /usr/lib64/samba/libsmbd-base-samba4.so
#25 0x00007f413c2aa304 in smbd_accept_connection ()
#26 0x00007f413973134c in run_events_poll () from /lib64/libsmbconf.so.0
#27 0x00007f41397315a0 in s3_event_loop_once () from /lib64/libsmbconf.so.0
#28 0x00007f413815840d in _tevent_loop_once () from /lib64/libtevent.so.0
#29 0x00007f41381585ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
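
In the backtrace above, frames #0 to #5 are the panic and core-dump path that runs after the fault. The frames after "<signal handler called>" show where the fault happened: in _talloc_free, called from ctdbd_migrate. A small sketch for pulling those frames out of a gdb backtrace follows; the regular expression and function name are illustrative assumptions, written against the frame format seen in this report only.

```python
import re

# Matches lines such as:
#   "#7  0x00007f4138366a5a in _talloc_free () from /lib64/libtalloc.so.2"
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(\)(?: from (\S+))?"
)


def frames_after_signal(backtrace):
    """Return (frame_no, function, library) tuples that follow the signal handler frame."""
    frames = []
    seen_handler = False
    for line in backtrace.splitlines():
        line = line.strip()
        if "<signal handler called>" in line:
            seen_handler = True
            continue
        match = FRAME_RE.match(line)
        if seen_handler and match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


# Excerpt copied from the backtrace in this report.
SAMPLE = """\
#5  0x00007f413bc0e796 in sig_fault () from /lib64/libsamba-util.so.0
#6  <signal handler called>
#7  0x00007f4138366a5a in _talloc_free () from /lib64/libtalloc.so.2
#8  0x00007f41394f4be2 in ctdbd_migrate () from /usr/lib64/samba/libsamba-cluster-support-samba4.so
"""
```

Calling frames_after_signal(SAMPLE) yields the _talloc_free frame first and its caller ctdbd_migrate second, which makes it easy to compare this core against the different backtrace reported in bug #1400957.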

Comment 3 surabhi 2016-12-02 13:00:19 UTC

Took an initial look; the issue looks like upstream Samba BZ https://bugzilla.samba.org/show_bug.cgi?id=12372 .
This may be only one part of the problem; the rest still needs to be looked at.

Comment 4 Michael Adam 2016-12-06 08:48:12 UTC

How is this different from bug #1400957 ?

Comment 5 Michael Adam 2016-12-06 08:55:48 UTC

(In reply to Michael Adam from comment #4)
> How is this different from bug #1400957 ?

More precisely:

The steps to reproduce are exactly the same!
And they both claim to be always reproducible.

But the effects are *different* BTs.
That's why I am confused.

Comment 7 Vivek Das 2016-12-07 03:29:58 UTC

(In reply to Michael Adam from comment #5)
> (In reply to Michael Adam from comment #4)
> > How is this different from bug #1400957 ?
> 
> More precisely:
> 
> The steps to reproduce are exactly the same!
> And they both claim to be always reproducible.
> 
> But the effects are *different* BTs.
> That's why I am confused.

Looks like I forgot to mention that this bug, #1400948, was found on an SSL-enabled setup (SSL enabled for management and I/O).

#1400957 was reproduced on a non-SSL setup.

This is the only difference.

Comment 9 Anoop C S 2016-12-08 07:51:14 UTC

Upstream fix: https://git.samba.org/?p=samba.git;a=commit;h=4194c0797f78293fe48105ce5af70f36a3c233a8

Comment 12 surabhi 2016-12-19 14:01:11 UTC

On the build provided in #C11, tried the test case from the bug description; the core mentioned in this BZ is not seen.

Tried following:

1. Start I/O from the Windows client (10000 byte files) with the share mounted via VIP 1.
2. Run ctdb stop on node1 and wait for it to reach the stopped state, then run ctdb stop on node2, then on node3. Keep watching the failover before stopping the next node.
3. Restart the ctdb service on each node, one by one.
4. Verify that I/O is still running and that there are no cores.
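
The ordering of the stop and restart steps above can be written down as a plan before running it by hand. The sketch below only builds the ordered list of (node, command) pairs and runs nothing; the function name and node names are hypothetical, and using "ctdb status" as the check between steps is an assumption, since the comment does not say how the node state was watched.

```python
def failover_plan(nodes, keep_running=1):
    """Build ordered (node, command) steps for the stop/restart test.

    All nodes except the last `keep_running` are stopped one by one,
    then restarted one by one. Nothing is executed here.
    """
    if keep_running < 1 or keep_running >= len(nodes):
        raise ValueError("keep_running must leave at least one node to stop and one running")
    targets = nodes[: len(nodes) - keep_running]
    steps = []
    for node in targets:
        steps.append((node, "ctdb stop"))
        # Assumed check: inspect node state before moving to the next node
        steps.append((node, "ctdb status"))
    for node in targets:
        steps.append((node, "service ctdb restart"))
        # Assumed check: wait here until the node reports OK
        steps.append((node, "ctdb status"))
    return steps
```

For a four-node cluster, failover_plan(["node1", "node2", "node3", "node4"]) stops and later restarts node1 to node3 and never touches node4, matching the "3 out of 4 nodes" sequence in the bug description.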

Tried a few more scenarios with multiple stop and restart options.

The core mentioned in the bug description is not seen; however, multiple cores were generated on ctdb restart on node1, with the backtrace mentioned in BZ https://bugzilla.redhat.com/show_bug.cgi?id=1400957.

After a reboot the issue is not seen. Will discuss further with the rhs-smb team and take action on the other BZ. This BZ seems to be fixed with the above build.

Comment 15 Vivek Das 2016-12-30 06:57:48 UTC

Version
----------
samba-client-libs-4.4.6-4.el7rhgs.x86_64
glusterfs-cli-3.8.4-10.el7rhgs.x86_64

Followed the steps to reproduce mentioned above and did not find any cores. No issues faced.

Marking it as verified.

Comment 17 errata-xmlrpc 2017-03-23 05:20:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0495.html
