Description of problem:

Hi! I have the same problem as reported in bug 1234877: smbd goes into a panic every 6 minutes and produces a core dump.

Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581586, 0] ../lib/util/fault.c:78(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: ===============================================================
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581611, 0] ../lib/util/fault.c:79(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: INTERNAL ERROR: Signal 6 in pid 27140 (4.2.3)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: Please read the Trouble-Shooting section of the Samba HOWTO
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581622, 0] ../lib/util/fault.c:81(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: ===============================================================
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581629, 0] ../source3/lib/util.c:788(smb_panic_s3)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: PANIC (pid 27140): internal error
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581807, 0] ../source3/lib/util.c:899(log_stack_trace)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: BACKTRACE: 14 stack frames:
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f2db2310cea]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f2db2310dc0]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f2db41608cf]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #3 /lib64/libsamba-util.so.0(+0x1aae6) [0x7f2db4160ae6]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #4 /lib64/libpthread.so.0(+0xf100) [0x7f2db4389100]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #5 /lib64/libc.so.6(gsignal+0x37) [0x7f2db09bf5f7]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #6 /lib64/libc.so.6(abort+0x148) [0x7f2db09c0ce8]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #7 /lib64/libc.so.6(+0x75317) [0x7f2db09ff317]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #8 /lib64/libc.so.6(+0x7cfe1) [0x7f2db0a06fe1]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #9 /lib64/libglusterfs.so.0(gf_timer_call_cancel+0x52) [0x7f2d9bc77652]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #10 /lib64/libglusterfs.so.0(gf_log_inject_timer_event+0x37) [0x7f2d9bc58de7]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #11 /lib64/libglusterfs.so.0(gf_timer_proc+0x10b) [0x7f2d9bc7781b]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #12 /lib64/libpthread.so.0(+0x7dc5) [0x7f2db4381dc5]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #13 /lib64/libc.so.6(clone+0x6d) [0x7f2db0a8021d]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.582688, 0] ../source3/lib/dumpcore.c:318(dump_core)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: dumping core in /var/log/samba/cores/smbd

Version-Release number of selected component (if applicable):

cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

The gluster and samba packages come from the CentOS repo.
rpm -qa | grep gluster
glusterfs-fuse-3.7.6-1.el7.x86_64
glusterfs-coreutils-0.0.1-0.1.git0c86f7f.el7.x86_64
centos-release-gluster37-1.0-4.el7.centos.noarch
glusterfs-3.7.6-1.el7.x86_64
glusterfs-server-3.7.6-1.el7.x86_64
samba-vfs-glusterfs-4.2.3-11.el7_2.x86_64
glusterfs-client-xlators-3.7.6-1.el7.x86_64
glusterfs-cli-3.7.6-1.el7.x86_64
glusterfs-libs-3.7.6-1.el7.x86_64
glusterfs-api-3.7.6-1.el7.x86_64

rpm -qa | grep samba
samba-libs-4.2.3-11.el7_2.x86_64
samba-client-libs-4.2.3-11.el7_2.x86_64
samba-vfs-glusterfs-4.2.3-11.el7_2.x86_64
samba-common-4.2.3-11.el7_2.noarch
samba-4.2.3-11.el7_2.x86_64
samba-common-tools-4.2.3-11.el7_2.x86_64
samba-common-libs-4.2.3-11.el7_2.x86_64

Volume Name: ch-online
Type: Replicate
Volume ID: 9f91a44a-edd9-401c-9ecc-a40e7e01332c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ch-mb-ph-gfs-01:/gfs/brick1/brick
Brick2: ch-mb-ph-gfs-02:/gfs/brick1/brick
Options Reconfigured:
cluster.lookup-optimize: on
performance.stat-prefetch: off
cluster.ensure-durability: on
performance.normal-prio-threads: 16
performance.high-prio-threads: 32
performance.cache-size: 1024MB
performance.io-thread-count: 32
cluster.lookup-unhashed: off
server.allow-insecure: on
performance.readdir-ahead: on
client.bind-insecure: on
client.event-threads: 8
storage.owner-uid: 10003
storage.owner-gid: 10007

cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
    option rpc-auth-allow-insecure on
#   option base-port 49152

cat /etc/samba/smb.conf
[global]
netbios name = ch-mb-ph-samba
idmap backend = tdb2
private dir = /mnt/ch-online/.smblock/
workgroup = mediabank
server string = Samba Server Version %v
log file = /var/log/samba/%m.log
max log size = 50
security = user
map to guest = Bad Password
printing = bsd
printcap name = /dev/null

[customer-data]
path = /customer-data
read only = no
browseable = yes
guest ok = no
kernel share modes = no
force user = mediabank-service
create mask = 4770
directory mask = 4770
valid users = mediabank-service
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:volume = ch-online
glusterfs:volfile_server = localhost
glusterfs:logfile = /var/log/samba/glusterfs-customer-data.%M.log

[MBFileExchangeMTBCH]
path = /customer-data/CHMEDIATEC/FileExchange
read only = no
browseable = yes
guest ok = no
kernel share modes = no
force user = mediabank-service
create mask = 4770
directory mask = 4770
valid users = mediabank-service dvb
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:volume = ch-online
glusterfs:volfile_server = localhost
glusterfs:logfile = /var/log/samba/glusterfs-fileexchange.%M.log

[postprodMTBCH]
path = /customer-data/postprod
read only = no
browseable = yes
guest ok = no
kernel share modes = no
force user = mediabank-service
create mask = 4770
directory mask = 4770
valid users = mediabank-service postprod dvb
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:volume = ch-online
glusterfs:volfile_server = localhost
glusterfs:logfile = /var/log/samba/glusterfs-postprod.%M.log

How reproducible:
Just start the smb service and have users access the different shares. No heavy load is needed to trigger the issue.

Steps to Reproduce:
1. Start the smb service.
2. Have users access the different shares.

Actual results:
smbd panics and dumps core roughly every 6 minutes.

Expected results:
smbd not crashing.

Additional info:
Created attachment 1117982: core dump from smbd
Bug 1234877 was fixed in the Samba package; we'll need to find out whether samba-4.2.3-11 contains that patch. If this requires a change to the Samba RPM, please update the product (RHEL?) and component.
Samba 4.2.3 already contains the fix for the issue tracked in the following upstream bug: https://bugzilla.samba.org/show_bug.cgi?id=11115

The backtrace provided here is different from what we saw in https://bugzilla.redhat.com/show_bug.cgi?id=1234877 and needs some investigation. Therefore https://bugzilla.samba.org/show_bug.cgi?id=11115 is not related to this bug.

See my reply in the following thread: http://www.gluster.org/pipermail/gluster-users/2016-February/025293.html

From a quick look at the dmesg backtrace, I suspect a race between some glusterfs timer-related threads, but the exact root cause still needs to be found.
Hi Anders,

The attached core dump is truncated, which makes it hard to debug. Can you please attach a new, complete core dump? I vaguely suspect a race between gf_timer_proc() and gf_timer_call_cancel(), with one of them accessing already-freed memory on the glusterfs stack. A complete core would make it much easier to root-cause the issue than the high-level backtrace we have from /var/log/messages or dmesg.
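To illustrate the suspicion: below is a minimal, hypothetical sketch of this kind of timer race. It is modeled loosely on gf_timer_proc()/gf_timer_call_cancel(), but the types and names (timer_proc, timer_cancel, timer_event) are made up for the example; this is NOT the actual glusterfs code.

/* race_sketch.c: hypothetical illustration of the suspected race.
 * Build: gcc -pthread -o race_sketch race_sketch.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct timer_event {
    struct timer_event *next;
    void (*callback)(void *);
    void *data;
};

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static struct timer_event *head;

/* Timer thread: dequeues an event under the lock, but fires and
 * frees it AFTER dropping the lock. */
static void *timer_proc(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&reg_lock);
        struct timer_event *ev = head;
        if (ev)
            head = ev->next;
        pthread_mutex_unlock(&reg_lock);
        if (!ev)
            break;
        /* UNSAFE WINDOW: 'ev' is no longer reachable from the list.
         * A concurrent canceller that kept its own pointer and frees
         * the event now turns this call into a use-after-free, the
         * kind of heap corruption glibc aborts on. */
        ev->callback(ev->data);
        free(ev);
    }
    return NULL;
}

/* Cancel: unlinks and frees the event if it is still queued.  If
 * timer_proc() already dequeued it, we return -1; a caller that
 * frees the event anyway races with the timer thread. */
static int timer_cancel(struct timer_event *target)
{
    pthread_mutex_lock(&reg_lock);
    for (struct timer_event **pp = &head; *pp; pp = &(*pp)->next) {
        if (*pp == target) {
            *pp = target->next;
            pthread_mutex_unlock(&reg_lock);
            free(target);
            return 0;
        }
    }
    pthread_mutex_unlock(&reg_lock);
    return -1;
}

static void cb(void *d)
{
    printf("fired: %s\n", (const char *)d);
}

int main(void)
{
    struct timer_event *ev = calloc(1, sizeof(*ev));
    ev->callback = cb;
    ev->data = "event";
    head = ev;

    pthread_t t;
    pthread_create(&t, NULL, timer_proc, NULL);
    if (timer_cancel(ev) != 0)
        printf("cancel lost the race; timer thread owns the event\n");
    pthread_join(t, NULL);
    return 0;
}

The point is the window between dropping the registry lock and touching the event: whichever side loses the race can end up freeing memory the other is still using, which would be consistent with glibc aborting out of gf_timer_call_cancel() in the reported backtrace.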
I see that Mukul has provided some new core dumps in bug 1315201. Do you still need any from me? It may take a while to produce them, because the system I tested on is no longer running gluster in the configuration it had when I discovered the bug.
Hi Anders,

The recently uploaded cores were also truncated. There are two ways to make sure that cores do not get truncated:

The sure way: see https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c7

What should also work (it worked for me): use the prlimit command to raise the core size limit of the running process, as soon as the pid is up and running:

# prlimit --pid=<smb-pid> --core=unlimited

Please make sure that you use the pid of the smbd process serving the mounted share connection (either CIFS or Windows clients), because prlimit is always associated with a specific process id.

Either of the above changes will allow Samba to produce complete cores.
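For completeness, the same limit change can be made programmatically. This is a small hypothetical helper of my own (set_core_unlimited is not part of any package) that does what the prlimit command above does, via the prlimit(2) syscall available in glibc 2.13 and later:

/* set_core_unlimited.c: hypothetical equivalent of
 * "prlimit --pid=<pid> --core=unlimited".
 * Build: gcc -o set_core_unlimited set_core_unlimited.c
 * Run as root: ./set_core_unlimited <smbd-pid> */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid_t pid = (pid_t)atoi(argv[1]);

    /* Unlimited soft and hard RLIMIT_CORE, so the kernel never
     * truncates the core file it writes for this process. */
    struct rlimit unlimited = { RLIM_INFINITY, RLIM_INFINITY };
    struct rlimit old;

    if (prlimit(pid, RLIMIT_CORE, &unlimited, &old) != 0) {
        perror("prlimit");
        return 1;
    }
    printf("pid %ld: RLIMIT_CORE soft limit was %llu, now unlimited\n",
           (long)pid, (unsigned long long)old.rlim_cur);
    return 0;
}

Running it once against the smbd pid has the same effect as the prlimit invocation above.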
Hi Anders, Can you please update your glusterfs packages to a version >= 3.7.10, or to 3.7.11 (which will be available soon)? Two suspected fixes for this issue were merged in 3.7.9 and later releases.
Hi Anders, Were you able to upgrade the glusterfs packages to the recent version (glusterfs-3.7.11)? If so, do you still see crashes post-upgrade?
Hi Anders, Any updates on this bug?
The following suspected fixes have been present since glusterfs v3.7.10:

http://review.gluster.org/#/c/11796/
http://review.gluster.org/#/c/13803/

Since there have been no updates from the reporter after upgrading the glusterfs packages to the mentioned version, we are closing this bug on the assumption that no more crashes were observed. Please feel free to reopen this bug, or file a new one, if new issues appear.
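For the record, here is a rough sketch of the shape such a fix typically takes. I have not verified that this matches the actual diffs above; the 'fired' flag and the function names are my own invention. The idea is to resolve the cancel-vs-fire ambiguity entirely under the timer registry lock, so cancel refuses to free an event the timer thread has already claimed:

/* fix_sketch.c: hypothetical shape of a cancel/fire fix; NOT the
 * actual content of the gluster patches linked above.
 * Compile-check with: gcc -c fix_sketch.c */
#include <pthread.h>
#include <stdlib.h>

struct timer_event {
    struct timer_event *next;
    void (*callback)(void *);
    void *data;
    int fired;                      /* written only under reg_lock */
};

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static struct timer_event *head;

void *timer_proc(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&reg_lock);
        struct timer_event *ev = head;
        if (ev)
            ev->fired = 1;          /* claim it, but keep it linked */
        pthread_mutex_unlock(&reg_lock);
        if (!ev)
            break;
        ev->callback(ev->data);
        pthread_mutex_lock(&reg_lock);
        head = ev->next;            /* unlink only after the callback */
        pthread_mutex_unlock(&reg_lock);
        free(ev);                   /* only the proc thread frees fired events */
    }
    return NULL;
}

int timer_cancel(struct timer_event *target)
{
    int ret = -1;
    pthread_mutex_lock(&reg_lock);
    for (struct timer_event **pp = &head; *pp; pp = &(*pp)->next) {
        if (*pp == target) {
            if (!target->fired) {   /* safe: still linked, not in flight */
                *pp = target->next;
                free(target);
                ret = 0;
            }
            break;                  /* fired: the proc thread owns it */
        }
    }
    pthread_mutex_unlock(&reg_lock);
    return ret;
}

Because cancel only ever compares pointers while holding the lock, and only the proc thread frees fired events, neither thread can free memory the other is still using.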