Bug 1333360
Summary: | Samba: Multiple smbd crashes (notifyd) after a ctdb-internal network interface is brought down in a ctdb cluster. | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | surabhi <sbhaloth> |
Component: | samba | Assignee: | Michael Adam <madam> |
Status: | CLOSED ERRATA | QA Contact: | surabhi <sbhaloth> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.1 | CC: | gdeschner, madam, nlevinki, rcyriac, rhinduja |
Target Milestone: | --- | Keywords: | Regression, ZStream |
Target Release: | RHGS 3.1.3 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | samba-4.4.3-5.el7rhgs | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-06-23 05:37:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1311817 |
Description
surabhi
2016-05-05 11:34:03 UTC
Core file is copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1333360/

Also, in the first case the Samba and GlusterFS log level was set to 10; it has since been changed to 3 (this may not be relevant to this issue, but was done as recommended by dev). After following the steps mentioned in this bug, an smbd crash was hit again, but this time only a single crash rather than multiple:

May 6 10:17:58 dhcp47-10 smbd[23301]: BACKTRACE: 17 stack frames:
May 6 10:17:58 dhcp47-10 smbd[23301]: #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f8be5c4deaa]
May 6 10:17:58 dhcp47-10 smbd[23301]: #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f8be5c4df80]
May 6 10:17:58 dhcp47-10 smbd[23301]: #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f8be813f57f]
May 6 10:17:58 dhcp47-10 smbd[23301]: #3 /lib64/libsamba-util.so.0(+0x24796) [0x7f8be813f796]
May 6 10:17:58 dhcp47-10 smbd[23301]: #4 /lib64/libpthread.so.0(+0xf100) [0x7f8be83a0100]
May 6 10:17:58 dhcp47-10 smbd[23301]: #5 /usr/lib64/samba/libdbwrap-samba4.so(dbwrap_traverse_read+0x7) [0x7f8be236d237]
May 6 10:17:58 dhcp47-10 smbd[23301]: #6 /usr/lib64/samba/libsmbd-base-samba4.so(+0x83bf0) [0x7f8be7c3fbf0]
May 6 10:17:58 dhcp47-10 smbd[23301]: #7 /lib64/libtalloc.so.2(_talloc_free+0x440) [0x7f8be4898e80]
May 6 10:17:58 dhcp47-10 smbd[23301]: #8 /usr/lib64/samba/libsmbd-base-samba4.so(+0x84cb8) [0x7f8be7c40cb8]
May 6 10:17:58 dhcp47-10 smbd[23301]: #9 /lib64/libtevent.so.0(tevent_common_loop_timer_delay+0xcf) [0x7f8be468eb4f]
May 6 10:17:58 dhcp47-10 smbd[23301]: #10 /lib64/libsmbconf.so.0(run_events_poll+0x1c9) [0x7f8be5c63479]
May 6 10:17:58 dhcp47-10 smbd[23301]: #11 /lib64/libsmbconf.so.0(+0x35670) [0x7f8be5c63670]
May 6 10:17:58 dhcp47-10 smbd[23301]: #12 /lib64/libtevent.so.0(_tevent_loop_once+0x8d) [0x7f8be468a40d]
May 6 10:17:58 dhcp47-10 smbd[23301]: #13 /lib64/libtevent.so.0(tevent_req_poll+0x1f) [0x7f8be468b6df]
May 6 10:17:58 dhcp47-10 smbd[23301]: #14 /usr/sbin/smbd(main+0xa53) [0x7f8be87d7f03]
May 6 10:17:58 dhcp47-10 smbd[23301]: #15 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f8be42e6b15]
May 6 10:17:58 dhcp47-10 smbd[23301]: #16 /usr/sbin/smbd(+0x84a1) [0x7f8be87d94a1]
May 6 10:17:58 dhcp47-10 smbd[23301]: [2016/05/06 10:17:58.961271, 0, pid=23301, effective(0, 0), real(0, 0)] ../source3/lib/dumpcore.c:318(dump_core)
May 6 10:17:58 dhcp47-10 smbd[23301]: dumping core in /var/log/samba/cores/smbd

The configuration is as follows:

There are two CTDB nodes with the following network configuration:
Node1: eth0, eth1, eth2, eth3
Node2: eth0, eth1
eth0 on both nodes is on the public network.
The eth1, eth2 and eth3 NICs are on a private network (with static IPs configured) used for internal communication between the nodes.

The steps to reproduce:
1. Create a distributed-replicate volume and mount it on a Windows client (which also has a NIC configured on the private network) using the VIP (corresponding to eth1) of node1.
2. Start copying a large file from a Windows local share to the Samba share.
3. Bring down the interface eth1 with the command "ifdown eth1".
4. Observe the IP failover.
5. Once the failover has happened, bring up eth1 with the command "ifup eth1".
6. Observe ctdb status and check /var/log/messages and log.smbd for any cores.
Result: The ctdb node goes into a banned state and there is an smbd crash with the following backtrace:

(gdb) bt
#0  0x00007f2c471765f7 in raise () from /lib64/libc.so.6
#1  0x00007f2c47177ce8 in abort () from /lib64/libc.so.6
#2  0x00007f2c48ad6beb in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f2c48ac9fe7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f2c4afbb57f in smb_panic (why=why@entry=0x7f2c4b00254a "internal error") at ../lib/util/fault.c:166
#5  0x00007f2c4afbb796 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  dbwrap_traverse_read (db=0x0, f=f@entry=0x7f2c4aabe210 <notifyd_db_del_syswatches>, private_data=private_data@entry=0x0, count=count@entry=0x0) at ../lib/dbwrap/dbwrap.c:361
#9  0x00007f2c4aabbc40 in notifyd_peer_destructor (p=p@entry=0x7f2c4c9a8e60) at ../source3/smbd/notifyd/notifyd.c:1249
#10 0x00007f2c47714e80 in _talloc_free_internal (location=<optimized out>, ptr=<optimized out>) at ../talloc.c:1046
#11 _talloc_free (ptr=0x7f2c4c9a8e60, location=0x7f2c4ac73ac0 "../source3/smbd/notifyd/notifyd.c:1154") at ../talloc.c:1647
#12 0x00007f2c4aabcd08 in notifyd_clean_peers_next (subreq=<optimized out>) at ../source3/smbd/notifyd/notifyd.c:1154
#13 0x00007f2c4750ab4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7f2c4c998df0) at ../tevent_timed.c:341
#14 0x00007f2c48adf3f9 in run_events_poll (ev=0x7f2c4c998df0, pollrtn=0, pfds=0x7f2c4c9a7f50, num_pfds=4) at ../source3/lib/events.c:199
#15 0x00007f2c48adf5f0 in s3_event_loop_once (ev=0x7f2c4c998df0, location=<optimized out>) at ../source3/lib/events.c:326
#16 0x00007f2c4750640d in _tevent_loop_once (ev=ev@entry=0x7f2c4c998df0, location=location@entry=0x7f2c4750c5c5 "../tevent_req.c:256") at ../tevent.c:533
#17 0x00007f2c475076df in tevent_req_poll (req=req@entry=0x7f2c4c9a5440, ev=ev@entry=0x7f2c4c998df0) at ../tevent_req.c:256
#18 0x00007f2c4b653f03 in smbd_notifyd_init (interactive=false, msg=0x7f2c4c998ee0) at ../source3/smbd/server.c:411
#19 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1597

(Frame #8 shows dbwrap_traverse_read() being entered with db=0x0 from notifyd_peer_destructor(); a stand-alone sketch of this failure mode follows the verification notes below.)

Uploaded the sosreports at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1333360/

The nodes file has static IP entries for eth1 on both the nodes:

cat /etc/ctdb/nodes
192.168.XXX.X
192.168.XXX.X

The public_addresses file has VIP entries for eth1:

192.168.XXX.XX/24 eth1
192.168.XXX.XX/24 eth1

Let me know if any other information is needed.

Verified the BZ with the following steps:

There are two CTDB nodes with the following network configuration:
Node1: eth0, eth1, eth2, eth3
Node2: eth0, eth1
eth0 on both nodes is on the public network.
The eth1, eth2 and eth3 NICs are on a private network (with static IPs configured) used for internal communication between the nodes.

The steps to reproduce:
1. Create a distributed-replicate volume and mount it on a Windows client (which also has a NIC configured on the private network) using the VIP (corresponding to eth1) of node1.
2. Start copying a large file from a Windows local share to the Samba share.
3. Bring down the interface eth1 with the command "ifdown eth1".
4. Observe the IP failover.
5. Once the failover has happened, bring up eth1 with the command "ifup eth1".
6. Observe ctdb status and check /var/log/messages and log.smbd for any cores.

There are no crashes seen. The ctdb node remains in a banned state, which is being discussed in another BZ. Marking this BZ as verified.
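For context on the crash path: in the gdb backtrace above, frame #8 shows dbwrap_traverse_read() entered with db=0x0 from notifyd_peer_destructor() (notifyd.c:1249), i.e. the peer being torn down by the notifyd cleanup timer still has a NULL database pointer when its destructor runs. The stand-alone C sketch below only illustrates that failure mode and the kind of NULL guard that would avoid it; every type and helper name in it is a simplified stand-in (not Samba source), and it is not claimed to be the actual change shipped in samba-4.4.3-5.el7rhgs.

/*
 * Minimal stand-alone sketch (NOT Samba source) of the failure mode seen in
 * frames #8/#9 above: a destructor for a notifyd peer traverses the peer's
 * database, but the database pointer can still be NULL if the peer went away
 * before ever sending its database (for example during an interface flap and
 * IP failover).  All types and helpers here are simplified stand-ins.
 */
#include <stdio.h>

struct db_record;                       /* opaque, never dereferenced here */

struct db_context {
	int num_records;                /* stand-in payload */
};

struct notifyd_peer_sketch {
	struct db_context *db;          /* NULL until the peer sends its db */
};

/* Stand-in for a read traversal: dereferences db, so db == NULL would crash,
 * analogous to the panic at dbwrap.c:361 in the backtrace. */
static int traverse_read_sketch(struct db_context *db,
				int (*fn)(struct db_record *, void *),
				void *private_data)
{
	(void)fn;
	(void)private_data;
	return db->num_records;
}

static int del_syswatches_sketch(struct db_record *rec, void *private_data)
{
	(void)rec;
	(void)private_data;
	return 0;
}

/* Peer destructor with the kind of guard that avoids the NULL dereference. */
static int peer_destructor_sketch(struct notifyd_peer_sketch *p)
{
	if (p->db != NULL) {            /* guard: peer may never have sent a db */
		traverse_read_sketch(p->db, del_syswatches_sketch, NULL);
	}
	return 0;
}

int main(void)
{
	struct notifyd_peer_sketch stale_peer = { .db = NULL };

	/* Without the guard this would segfault, mirroring the smbd panic. */
	peer_destructor_sketch(&stale_peer);
	puts("stale peer cleaned up without dereferencing a NULL db");
	return 0;
}

With such a guard in place, a peer that disappears before ever attaching a database can be freed by the periodic cleanup without triggering the smb_panic/core dump seen above.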
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1245