Bug 1317940

Summary: smbd crashes while accessing multiple volume shares via same client
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Anoop C S <anoopcs>
Component: samba Assignee: rhs-smb <rhs-smb>
Status: CLOSED ERRATA QA Contact: Vivek Das <vdas>
Severity: medium Docs Contact:
Priority: medium    
Version: rhgs-3.1 CC: aheverle, asrivast, madam, mmalhotr, nlevinki, olim, pgurusid, rcyriac, rhinduja, rhs-smb, rjoseph, rtalur, sankarshan, sbhaloth
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 3.1.3   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: glusterfs-3.7.9-2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1319374 Environment:
Last Closed: 2016-06-23 05:11:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186580, 1279676, 1311817, 1315201, 1319374, 1319989    

Description Anoop C S 2016-03-15 14:46:55 UTC
Description of problem:
Consider a setup in which several Gluster volumes are shared through Samba. When the same client connects to and disconnects from those different shares, smbd occasionally crashes because of a race. The core dump shows a backtrace similar to the following:

(gdb) bt
#0  0x00007f4a94e28625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f4a94e29e05 in abort () at abort.c:92
#2  0x00007f4a96793f21 in dump_core () at ../source3/lib/dumpcore.c:336
#3  0x00007f4a9677e1a0 in smb_panic_s3 (why=<value optimized out>) at ../source3/lib/util.c:808
#4  0x00007f4a97efcac1 in smb_panic (why=0x7f4a97f0bcf5 "eturned status %d\n") at ../lib/util/fault.c:159
#5  0x00007f4a97efcb82 in fault_report (sig=11) at ../lib/util/fault.c:77
#6  sig_fault (sig=11) at ../lib/util/fault.c:88
#7  <signal handler called>
#8  0x00007f4a7e3c636e in _gf_msg_internal (domain=0x7f4a7e43d3fc "logrotate", file=<value optimized out>, function=0x7f4a7e43daa0 "_gf_log", line=<value optimized out>, level=GF_LOG_ERROR, 
    errnum=<value optimized out>, trace=0, msgid=101012, fmt=0x7f4a7e43d406 "failed to open logfile") at logging.c:1867
#9  _gf_msg (domain=0x7f4a7e43d3fc "logrotate", file=<value optimized out>, function=0x7f4a7e43daa0 "_gf_log", line=<value optimized out>, level=GF_LOG_ERROR, errnum=<value optimized out>, 
    trace=0, msgid=101012, fmt=0x7f4a7e43d406 "failed to open logfile") at logging.c:2064
#10 0x00007f4a7e3c5e38 in _gf_log (domain=0x7f4a7e43d452 "logging-infra", file=<value optimized out>, function=0x7f4a7e43dae0 "gf_log_flush_timeout_cbk", line=1815, level=GF_LOG_DEBUG, 
    fmt=0x7f4a7e43d9b8 "Log timer timed out. About to flush outstanding messages if present") at logging.c:2163
#11 0x00007f4a7e3c8b32 in gf_log_flush_timeout_cbk (data=0x7f4a992b5800) at logging.c:1814
#12 0x00007f4a7e3e66e3 in gf_timer_proc (ctx=0x7f4a992b5800) at timer.c:193
#13 0x00007f4a98121a51 in start_thread (arg=0x7f4a7b8a2700) at pthread_create.c:301
#14 0x00007f4a94ede93d in ?? () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
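
Triage note: the frame directly below <signal handler called> is the code that was running when the signal arrived, so in this backtrace the fault (sig=11) occurred in _gf_msg_internal at logging.c:1867, on a thread running gf_timer_proc. The following is a minimal Python sketch for checking whether the bt output from another core has the same shape. The helper names (frame_functions, faulting_path, matches_signature) are hypothetical and not part of gdb, Samba, or GlusterFS, and the match criterion is an assumption based only on this one backtrace.

```python
import re

# Matches the start of a gdb frame line, with or without an address, e.g.
#   "#8  0x00007f4a7e3c636e in _gf_msg_internal (domain=...) at logging.c:1867"
#   "#6  sig_fault (sig=11) at ../lib/util/fault.c:88"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")
SIGNAL_MARKER = "<signal handler called>"


def frame_functions(backtrace):
    """Return the function name of each frame in order.

    The signal-handler marker frame is kept as the literal marker string.
    Continuation lines (wrapped argument lists) are ignored.
    """
    names = []
    for line in backtrace.splitlines():
        if SIGNAL_MARKER in line:
            names.append(SIGNAL_MARKER)
            continue
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def faulting_path(backtrace):
    """Return the frames listed after the signal-handler marker.

    These are the functions that were executing when the signal was
    delivered, innermost first. Returns [] if there is no marker.
    """
    names = frame_functions(backtrace)
    if SIGNAL_MARKER not in names:
        return []
    return names[names.index(SIGNAL_MARKER) + 1:]


def matches_signature(backtrace):
    """Assumed signature for this bug: the fault is in _gf_msg_internal
    and the thread was started through gf_timer_proc."""
    path = faulting_path(backtrace)
    return bool(path) and path[0] == "_gf_msg_internal" and "gf_timer_proc" in path
```

To use it, save the gdb bt output to a text file and pass the file contents to matches_signature. A match only means the crash has the same call path as the one above; it does not confirm the same root cause.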

Version-Release number of selected component (if applicable):
Red Hat Gluster Storage Server 3.1

How reproducible:
Very hard.

Steps to Reproduce:
Will update soon with a good reproducer.

Actual results:
smbd crashed and restarted.

Expected results:
No crashes are seen.

Additional info:
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c18 and https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c20 for detailed RCA for this issue.

Comment 2 Anoop C S 2016-04-07 12:49:43 UTC
Downstream patch:

https://code.engineering.redhat.com/gerrit/#/c/70407/

Comment 5 Anoop C S 2016-04-11 11:17:00 UTC
Upstream patches:

http://review.gluster.org/#/c/13784/ <-- master branch
http://review.gluster.org/#/c/13803/ <-- release-3.7 branch

Comment 6 Vivek Das 2016-04-26 12:01:24 UTC
Transcoding and encoding tests over video file formats were performed, along with rigorous tests that ran heavy I/O while simultaneously connecting and disconnecting the mounted share multiple times on a Windows client.
No crashes were seen during these runs.

Comment 12 errata-xmlrpc 2016-06-23 05:11:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

Comment 13 rjoseph 2016-10-18 05:52:49 UTC
*** Bug 1214174 has been marked as a duplicate of this bug. ***