Bug 1301120

Summary: smbd crashes with 3.7.6 and VFS module 4.2.3
Product: [Community] GlusterFS
Reporter: Anders Rydmell <anders>
Component: gluster-smb
Assignee: Raghavendra Talur <rtalur>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 3.7.6
CC: anders, anoopcs, bugs, hgowtham
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Flags: anders: needinfo-
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.10
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1315201
Environment:
Last Closed: 2016-09-02 09:27:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description          Flags
core dump from smbd  none

Description Anders Rydmell 2016-01-22 16:10:22 UTC
Description of problem:
Hi!

I have the same problem as reported in bug 1234877.

smbd goes into a panic every 6 minutes and produces a core dump.

smbd[27140]: [2016/01/22 16:58:22.581586,  0] ../lib/util/fault.c:78(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:  ===============================================================
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581611,  0] ../lib/util/fault.c:79(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:  INTERNAL ERROR: Signal 6 in pid 27140 (4.2.3)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:  Please read the Trouble-Shooting section of the Samba HOWTO
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581622,  0] ../lib/util/fault.c:81(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:  ===============================================================
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581629,  0] ../source3/lib/util.c:788(smb_panic_s3)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:  PANIC (pid 27140): internal error
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581807,  0] ../source3/lib/util.c:899(log_stack_trace)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:  BACKTRACE: 14 stack frames:
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f2db2310cea]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f2db2310dc0]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f2db41608cf]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #3 /lib64/libsamba-util.so.0(+0x1aae6) [0x7f2db4160ae6]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #4 /lib64/libpthread.so.0(+0xf100) [0x7f2db4389100]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #5 /lib64/libc.so.6(gsignal+0x37) [0x7f2db09bf5f7]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #6 /lib64/libc.so.6(abort+0x148) [0x7f2db09c0ce8]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #7 /lib64/libc.so.6(+0x75317) [0x7f2db09ff317]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #8 /lib64/libc.so.6(+0x7cfe1) [0x7f2db0a06fe1]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #9 /lib64/libglusterfs.so.0(gf_timer_call_cancel+0x52) [0x7f2d9bc77652]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #10 /lib64/libglusterfs.so.0(gf_log_inject_timer_event+0x37) [0x7f2d9bc58de7]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #11 /lib64/libglusterfs.so.0(gf_timer_proc+0x10b) [0x7f2d9bc7781b]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #12 /lib64/libpthread.so.0(+0x7dc5) [0x7f2db4381dc5]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #13 /lib64/libc.so.6(clone+0x6d) [0x7f2db0a8021d]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.582688,  0] ../source3/lib/dumpcore.c:318(dump_core)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:  dumping core in /var/log/samba/cores/smbd
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: 


Version-Release number of selected component (if applicable):
cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core)

The gluster and samba packages are coming from the CentOS repo.

rpm -qa | grep gluster
glusterfs-fuse-3.7.6-1.el7.x86_64
glusterfs-coreutils-0.0.1-0.1.git0c86f7f.el7.x86_64
centos-release-gluster37-1.0-4.el7.centos.noarch
glusterfs-3.7.6-1.el7.x86_64
glusterfs-server-3.7.6-1.el7.x86_64
samba-vfs-glusterfs-4.2.3-11.el7_2.x86_64
glusterfs-client-xlators-3.7.6-1.el7.x86_64
glusterfs-cli-3.7.6-1.el7.x86_64
glusterfs-libs-3.7.6-1.el7.x86_64
glusterfs-api-3.7.6-1.el7.x86_64

rpm -qa | grep samba
samba-libs-4.2.3-11.el7_2.x86_64
samba-client-libs-4.2.3-11.el7_2.x86_64
samba-vfs-glusterfs-4.2.3-11.el7_2.x86_64
samba-common-4.2.3-11.el7_2.noarch
samba-4.2.3-11.el7_2.x86_64
samba-common-tools-4.2.3-11.el7_2.x86_64
samba-common-libs-4.2.3-11.el7_2.x86_64

Volume Name: ch-online
Type: Replicate
Volume ID: 9f91a44a-edd9-401c-9ecc-a40e7e01332c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ch-mb-ph-gfs-01:/gfs/brick1/brick
Brick2: ch-mb-ph-gfs-02:/gfs/brick1/brick
Options Reconfigured:
cluster.lookup-optimize: on
performance.stat-prefetch: off
cluster.ensure-durability: on
performance.normal-prio-threads: 16
performance.high-prio-threads: 32
performance.cache-size: 1024MB
performance.io-thread-count: 32
cluster.lookup-unhashed: off
server.allow-insecure: on
performance.readdir-ahead: on
client.bind-insecure: on
client.event-threads: 8
storage.owner-uid: 10003
storage.owner-gid: 10007

cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
    option rpc-auth-allow-insecure on
#   option base-port 49152

cat /etc/samba/smb.conf
[global]
    netbios name = ch-mb-ph-samba
    idmap backend = tdb2
    private dir = /mnt/ch-online/.smblock/
    workgroup = mediabank
    server string = Samba Server Version %v
    log file = /var/log/samba/%m.log
    max log size = 50
    security = user
    map to guest = Bad Password
    printing = bsd
    printcap name = /dev/null

[customer-data]
    path = /customer-data
    read only = no
    browseable = yes
    guest ok = no
    kernel share modes = no
    force user = mediabank-service
    create mask = 4770
    directory mask = 4770
    valid users = mediabank-service
    vfs objects = glusterfs
    glusterfs:loglevel = 7
    glusterfs:volume = ch-online
    glusterfs:volfile_server = localhost
    glusterfs:logfile = /var/log/samba/glusterfs-customer-data.%M.log

[MBFileExchangeMTBCH]
    path = /customer-data/CHMEDIATEC/FileExchange
    read only = no
    browseable = yes
    guest ok = no
    kernel share modes = no
    force user = mediabank-service
    create mask = 4770
    directory mask = 4770
    valid users = mediabank-service dvb
    vfs objects = glusterfs
    glusterfs:loglevel = 7
    glusterfs:volume = ch-online
    glusterfs:volfile_server = localhost
    glusterfs:logfile = /var/log/samba/glusterfs-fileexchange.%M.log

[postprodMTBCH]
    path = /customer-data/postprod
    read only = no
    browseable = yes
    guest ok = no
    kernel share modes = no
    force user = mediabank-service
    create mask = 4770
    directory mask = 4770
    valid users = mediabank-service postprod dvb
    vfs objects = glusterfs
    glusterfs:loglevel = 7
    glusterfs:volume = ch-online
    glusterfs:volfile_server = localhost
    glusterfs:logfile = /var/log/samba/glusterfs-postprod.%M.log

How reproducible:
Just start the smb service and have users access the different shares. There is no need for any heavy load to trigger this issue.


Actual results:
smbd panics and dumps core about every 6 minutes.

Expected results:
smbd not crashing.

Additional info:

Comment 1 Anders Rydmell 2016-01-25 13:09:28 UTC
Created attachment 1117982 [details]
core dump from smbd

Comment 2 Niels de Vos 2016-03-06 06:30:31 UTC
Bug 1234877 was fixed in the Samba package, so we need to find out whether samba-4.2.3-11 contains that patch. If this requires a change to the Samba RPM, please update the product (RHEL?) and component.

Comment 3 Anoop C S 2016-03-07 07:45:02 UTC
Samba-4.2.3 already contains the fix for the issue mentioned in the following upstream bug:

https://bugzilla.samba.org/show_bug.cgi?id=11115

The back trace provided here is different from the one seen in https://bugzilla.redhat.com/show_bug.cgi?id=1234877 and needs further investigation. Therefore https://bugzilla.samba.org/show_bug.cgi?id=11115 is not related to this bug.

See my reply to the following thread:

http://www.gluster.org/pipermail/gluster-users/2016-February/025293.html

From a quick look at the back trace in dmesg, I suspect a race between glusterfs timer-related threads, but the exact root cause still needs to be found.
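
To compare the trace in this bug with the one in bug 1234877, it can help to reduce each logged frame to its number and symbol. A minimal sketch, assuming the syslog lines arrive on stdin; the helper name bt_frames is made up for illustration and is not part of Samba or GlusterFS:

```shell
# Print "<frame> <symbol+offset>" for each backtrace line read from stdin.
# bt_frames is a hypothetical helper, not part of Samba or GlusterFS.
bt_frames() {
    sed -n 's/.*#\([0-9][0-9]*\) [^(]*(\([^)]*\)).*/\1 \2/p'
}

# Example with one frame copied from the description of this bug:
printf '%s\n' 'Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:   #9 /lib64/libglusterfs.so.0(gf_timer_call_cancel+0x52) [0x7f2d9bc77652]' | bt_frames
```

On a live system the input would typically be something like `grep 'smbd\[' /var/log/messages | bt_frames`. Lines that do not carry a frame are dropped.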

Comment 4 Anoop C S 2016-03-09 02:31:14 UTC
Hi Anders,

The attached core dump file is truncated, and it's hard to debug from it. Can you please attach a new, complete core dump? I vaguely suspect a race between gf_timer_proc() and gf_timer_call_cancel() in which already freed content from the glusterfs stack is accessed. A complete core would make it much easier to root-cause the issue than the high-level back trace that we have from /var/log/messages or dmesg.

Comment 5 Anders Rydmell 2016-03-09 12:27:21 UTC
I see that Mukul in bug id 1315201 has provided some new core dumps.
Do you still need some from me?
It can take a while to get them, because the system I tested on is not currently running gluster as it was configured when I discovered the bug.

Comment 6 Anoop C S 2016-03-10 05:53:45 UTC
Hi Anders,

The recently uploaded cores were also truncated. There are two ways to make sure that cores do not get truncated:

sure way
--------
See https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c7

should work (it worked for me)
------------------------------
Use the prlimit command to change the core size limit of a running process (as soon as the pid is up and running) as follows:
# prlimit --pid=<smb-pid> --core=unlimited
Please make sure that you provide the pid of the smbd process serving the mounted share connection (either CIFS or Windows clients), because prlimit always acts on a specific process id.

Either of the above changes will allow Samba to produce complete cores.
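
To confirm that the new limit took effect, the kernel's view of the limit can be read back from /proc. A minimal sketch, assuming a Linux system; core_limit_of is a hypothetical helper name used only for illustration:

```shell
# Print the soft "Max core file size" limit of a process, as the kernel reports it.
# core_limit_of is a hypothetical helper; it only reads /proc/<pid>/limits (Linux).
core_limit_of() {
    awk '/^Max core file size/ { print $5 }' "/proc/$1/limits"
}

# Example: show the limit of the current shell.
core_limit_of "$$"
```

After running prlimit against the smbd pid, `core_limit_of <smb-pid>` should report unlimited; any other value suggests that cores from that process may still be cut short.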

Comment 7 Anoop C S 2016-04-08 09:30:16 UTC
Hi Anders,

Can you please update your glusterfs packages to version 3.7.10 or later, for example 3.7.11 (which will be available soon)? Two suspected fixes for this issue were merged in 3.7.9 and afterwards.
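
A quick way to check whether a node already runs a new enough version is sketched below. It assumes GNU sort (for -V) and an rpm-based system like the CentOS hosts in this report; version_ge is a hypothetical helper name. A plain string comparison would not work here, since it orders 3.7.9 after 3.7.10.

```shell
# Return success if version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it relies on GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Query the installed glusterfs package only where rpm and the package exist.
if command -v rpm >/dev/null 2>&1 && rpm -q glusterfs >/dev/null 2>&1; then
    installed=$(rpm -q --qf '%{VERSION}' glusterfs)
    if version_ge "$installed" 3.7.10; then
        echo "glusterfs $installed: at or above 3.7.10"
    else
        echo "glusterfs $installed: older than 3.7.10"
    fi
fi
```

With the packages listed in the description (glusterfs-3.7.6-1.el7), this would report a version older than 3.7.10.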

Comment 8 Anoop C S 2016-05-28 06:26:36 UTC
Hi Anders,

Were you able to upgrade the glusterfs packages to a recent version (glusterfs-3.7.11)? If so, do you still see crashes after the upgrade?

Comment 9 Anoop C S 2016-07-07 06:57:35 UTC
Hi Anders,

Any updates on this bug?

Comment 10 Anoop C S 2016-09-02 09:27:08 UTC
The following suspected fixes have been present since glusterfs v3.7.10:

http://review.gluster.org/#/c/11796/
http://review.gluster.org/#/c/13803/

Since there have been no updates from the reporter about upgrading the glusterfs packages to the mentioned version, we are closing this bug under the assumption that no more crashes were observed. Please feel free to re-open this bug or file a new one in case of new issues.