+++ This bug was initially created as a clone of Bug #1301120 +++

Description of problem:

Hi! We have the same problem as reported in bug 1234877: smbd goes into a panic every 6 minutes and produces a core dump.

smbd[27140]: [2016/01/22 16:58:22.581586, 0] ../lib/util/fault.c:78(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: ===============================================================
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581611, 0] ../lib/util/fault.c:79(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: INTERNAL ERROR: Signal 6 in pid 27140 (4.2.3)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: Please read the Trouble-Shooting section of the Samba HOWTO
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581622, 0] ../lib/util/fault.c:81(fault_report)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: ===============================================================
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581629, 0] ../source3/lib/util.c:788(smb_panic_s3)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: PANIC (pid 27140): internal error
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.581807, 0] ../source3/lib/util.c:899(log_stack_trace)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: BACKTRACE: 14 stack frames:
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7f2db2310cea]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7f2db2310dc0]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7f2db41608cf]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #3 /lib64/libsamba-util.so.0(+0x1aae6) [0x7f2db4160ae6]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #4 /lib64/libpthread.so.0(+0xf100) [0x7f2db4389100]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #5 /lib64/libc.so.6(gsignal+0x37) [0x7f2db09bf5f7]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #6 /lib64/libc.so.6(abort+0x148) [0x7f2db09c0ce8]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #7 /lib64/libc.so.6(+0x75317) [0x7f2db09ff317]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #8 /lib64/libc.so.6(+0x7cfe1) [0x7f2db0a06fe1]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #9 /lib64/libglusterfs.so.0(gf_timer_call_cancel+0x52) [0x7f2d9bc77652]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #10 /lib64/libglusterfs.so.0(gf_log_inject_timer_event+0x37) [0x7f2d9bc58de7]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #11 /lib64/libglusterfs.so.0(gf_timer_proc+0x10b) [0x7f2d9bc7781b]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #12 /lib64/libpthread.so.0(+0x7dc5) [0x7f2db4381dc5]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: #13 /lib64/libc.so.6(clone+0x6d) [0x7f2db0a8021d]
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: [2016/01/22 16:58:22.582688, 0] ../source3/lib/dumpcore.c:318(dump_core)
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]: dumping core in /var/log/samba/cores/smbd
Jan 22 16:58:22 ch-mb-ph-gfs-01 smbd[27140]:

Version-Release number of selected component (if applicable):

cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

The gluster and samba packages come from the CentOS repo.
rpm -qa | grep gluster
glusterfs-fuse-3.7.6-1.el7.x86_64
glusterfs-coreutils-0.0.1-0.1.git0c86f7f.el7.x86_64
centos-release-gluster37-1.0-4.el7.centos.noarch
glusterfs-3.7.6-1.el7.x86_64
glusterfs-server-3.7.6-1.el7.x86_64
samba-vfs-glusterfs-4.2.3-11.el7_2.x86_64
glusterfs-client-xlators-3.7.6-1.el7.x86_64
glusterfs-cli-3.7.6-1.el7.x86_64
glusterfs-libs-3.7.6-1.el7.x86_64
glusterfs-api-3.7.6-1.el7.x86_64

rpm -qa | grep samba
samba-libs-4.2.3-11.el7_2.x86_64
samba-client-libs-4.2.3-11.el7_2.x86_64
samba-vfs-glusterfs-4.2.3-11.el7_2.x86_64
samba-common-4.2.3-11.el7_2.noarch
samba-4.2.3-11.el7_2.x86_64
samba-common-tools-4.2.3-11.el7_2.x86_64
samba-common-libs-4.2.3-11.el7_2.x86_64

Volume Name: ch-online
Type: Replicate
Volume ID: 9f91a44a-edd9-401c-9ecc-a40e7e01332c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ch-mb-ph-gfs-01:/gfs/brick1/brick
Brick2: ch-mb-ph-gfs-02:/gfs/brick1/brick
Options Reconfigured:
cluster.lookup-optimize: on
performance.stat-prefetch: off
cluster.ensure-durability: on
performance.normal-prio-threads: 16
performance.high-prio-threads: 32
performance.cache-size: 1024MB
performance.io-thread-count: 32
cluster.lookup-unhashed: off
server.allow-insecure: on
performance.readdir-ahead: on
client.bind-insecure: on
client.event-threads: 8
storage.owner-uid: 10003
storage.owner-gid: 10007

cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
    option rpc-auth-allow-insecure on
#   option base-port 49152

cat /etc/samba/smb.conf
[global]
netbios name = ch-mb-ph-samba
idmap backend = tdb2
private dir = /mnt/ch-online/.smblock/
workgroup = mediabank
server string = Samba Server Version %v
log file = /var/log/samba/%m.log
max log size = 50
security = user
map to guest = Bad Password
printing = bsd
printcap name = /dev/null

[customer-data]
path = /customer-data
read only = no
browseable = yes
guest ok = no
kernel share modes = no
force user = mediabank-service
create mask = 4770
directory mask = 4770
valid users = mediabank-service
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:volume = ch-online
glusterfs:volfile_server = localhost
glusterfs:logfile = /var/log/samba/glusterfs-customer-data.%M.log

[MBFileExchangeMTBCH]
path = /customer-data/CHMEDIATEC/FileExchange
read only = no
browseable = yes
guest ok = no
kernel share modes = no
force user = mediabank-service
create mask = 4770
directory mask = 4770
valid users = mediabank-service dvb
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:volume = ch-online
glusterfs:volfile_server = localhost
glusterfs:logfile = /var/log/samba/glusterfs-fileexchange.%M.log

[postprodMTBCH]
path = /customer-data/postprod
read only = no
browseable = yes
guest ok = no
kernel share modes = no
force user = mediabank-service
create mask = 4770
directory mask = 4770
valid users = mediabank-service postprod dvb
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:volume = ch-online
glusterfs:volfile_server = localhost
glusterfs:logfile = /var/log/samba/glusterfs-postprod.%M.log

How reproducible:
Just start the smb service and have users access the different shares. There is no need for any heavy load to trigger this issue.

Steps to Reproduce:
1. Start the smb service.
2. Have users access the different shares.
3. Wait a few minutes under normal use.

Actual results:
smbd panics and dumps core roughly every 6 minutes.
Expected results:
smbd not crashing.

Additional info:

--- Additional comment from Anders Rydmell on 2016-01-25 08:09 EST ---

--- Additional comment from Niels de Vos on 2016-03-06 01:30:31 EST ---

Bug 1234877 was fixed in the Samba package; we'll need to find out whether samba-4.2.3-11 contains that patch. If this requires a change to the Samba RPM, please update the product (RHEL?) and component.

--- Additional comment from Anoop C S on 2016-03-07 02:45:02 EST ---

Samba 4.2.3 already contains the fix for the issue mentioned in the following upstream bug:
https://bugzilla.samba.org/show_bug.cgi?id=11115

The backtrace provided here is different from what we saw in https://bugzilla.redhat.com/show_bug.cgi?id=1234877 and needs some investigation. Therefore https://bugzilla.samba.org/show_bug.cgi?id=11115 is not related to this bug.

See my reply to the following thread:
http://www.gluster.org/pipermail/gluster-users/2016-February/025293.html

From a quick look at the backtrace in the logs, I suspect a race between some glusterfs timer-related threads, but the exact root cause still needs to be found.
Created attachment 1133702 [details] Core dump file
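For reference, one way to get a readable backtrace out of such a core (sketch only; it assumes matching samba/glusterfs debuginfo packages are installed and that the core landed under the path from the log, /var/log/samba/cores/smbd; the exact core file name depends on the kernel core_pattern):

# gdb /usr/sbin/smbd /var/log/samba/cores/smbd/core.<pid>
(gdb) bt
(gdb) thread apply all bt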
*** Bug 1314834 has been marked as a duplicate of this bug. ***
<anoopcs> Mukul, Regarding 1315201
<anoopcs> Mukul, Do we have any reproducer?
<Mukul> anoopcs, no, I have to verify with customer if he can reproduce in his end
<anoopcs> Mukul, Ok. I will comment in the bug with the patch that I suspect to be the fix for this crash.
<Mukul> anoopcs, OK
Hi Mukul,

Thanks for all your support. As a first step, I could finally root-cause the issue of getting truncated core files every time. Because the LimitCORE parameter is absent from the smb service file, systemd defaults the soft and hard limits for core dump files to 0. We also have a strange piece of code in Samba where we set the soft limit to the maximum of 16MB and the current soft limit (which will be 0). Thus we always end up with truncated 16MB core files for Samba crashes. This limitation in Samba has recently been fixed upstream:
https://git.samba.org/?p=samba.git;a=commit;h=58d3462bc58290d8eb5e554c6c59cf6b73ccf58a

So at this moment, I would like to request them to modify the smb service file (/usr/lib/systemd/system/smb.service) to include the following under the [Service] section, restart the smb/ctdb services and try accessing the shares:

LimitCORE=infinity

After restarting the services, as mentioned in my previous comment https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c6, verify that cat /proc/<smbd-pid>/limits shows "Max core file size" as unlimited in both the soft and hard columns. You can then attach the newly generated cores, which I would expect to be complete.
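For what it's worth, a minimal sketch of doing this via a systemd drop-in instead of editing the shipped unit file directly (the drop-in file name below is just an example):

# mkdir -p /etc/systemd/system/smb.service.d
# printf '[Service]\nLimitCORE=infinity\n' > /etc/systemd/system/smb.service.d/limitcore.conf
# systemctl daemon-reload
# systemctl restart smb
# grep "Max core file size" /proc/$(pidof -s smbd)/limits

A drop-in survives package updates that overwrite /usr/lib/systemd/system/smb.service, which is why it is often preferred over editing the unit file in place; the final grep only spot-checks one smbd process.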
Hello Anoop,

I have attached the fresh core files.

Thanks
Mukul
Hi Mukul,

Since we can't move forward with the first solution to generate complete core files, I will now put forward an alternate procedure. This procedure will enable Samba to dump complete cores thereafter for all new and existing client connections.

Prerequisite: If not already present, install the util-linux package in order to get the prlimit binary.

1. Run the following one-liner:

# for i in $(pgrep smbd); do prlimit --pid=$i --core=unlimited; done;

2. Verify the changes made in step 1 for the soft and hard limits, either scripted or manually (see the verification sketch after this comment):

(scripted way)

# > /tmp/samba-core-file-size; for i in $(pgrep smbd); do cat /proc/$i/limits | grep "Max core file size" | tr -s ' ' | cut -d ' ' -f5 >> /tmp/samba-core-file-size; done;
# cat /tmp/samba-core-file-size
must display all as "unlimited".

# > /tmp/samba-core-file-size; for i in $(pgrep smbd); do cat /proc/$i/limits | grep "Max core file size" | tr -s ' ' | cut -d ' ' -f6 >> /tmp/samba-core-file-size; done;
# cat /tmp/samba-core-file-size
must display all as "unlimited".

OR

(manual way)

The soft and hard limits for the 'Max core file size' field in the output of `cat /proc/<pid>/limits`, where pid is each pid from the output of `pgrep smbd`, must show as unlimited.
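As a slightly more compact sketch of the same verification (assuming awk is available, which it normally is), the soft and hard limits can be read in one pass:

# for i in $(pgrep smbd); do awk -v pid=$i '/Max core file size/ {print pid, $5, $6}' /proc/$i/limits; done

Every printed line should end with "unlimited unlimited"; any other value means the prlimit change did not reach that smbd process.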
Hello,

Thanks, Anoop, for the analysis done on the core dump. Waiting for your update regarding any workaround or patch that can be provided to the customer.

Mukul
Hello,

Thanks, Anoop.

Mukul
Hello Anoop,

Can the test build be prioritized? The customer is waiting for it.

Thanks
Mukul
Hello Michael,

I have corrected the $subject.

Thanks
Mukul
Hi,

I have attached the fixes for the crashes reported in this bug. Both patches need to be applied on top of 3.7.1.16 (the version the customer is running). Let me know if you need anything else.
Poornima,

Please also provide the patch links on the case.

Thanks,
Bipin Kunal
The corresponding upstream patch links:

http://review.gluster.org/#/c/13784
http://review.gluster.org/#/c/6459/
As discussed/concluded: if the customer is OK with the downtime for updating to 3.1.2 (gluster, samba, ctdb), then we can provide the hotfix for 3.1.2; otherwise for 3.1.1.

Hotfix for 3.1.1:
The patches are attached in the BZ:
- https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c32
- https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c33

Hotfix for 3.1.2:
The upstream patches apply cleanly and can therefore be cherry-picked from the locations below (see the sketch after this comment for one way to do that):
- http://review.gluster.org/#/c/6459/
- http://review.gluster.org/#/c/13784
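Just as a sketch of the cherry-pick step (the repository URL, branch name and refs/changes values below are placeholders; the real download refs are shown on each review page, not reproduced here):

# git clone <glusterfs-gerrit-repo-url> && cd glusterfs
# git checkout <3.1.2-based-branch>
# git fetch origin <refs/changes/.../6459/...>  && git cherry-pick FETCH_HEAD
# git fetch origin <refs/changes/.../13784/...> && git cherry-pick FETCH_HEAD

Fetching the change ref and cherry-picking FETCH_HEAD is the usual Gerrit workflow; any conflicts would indicate the patch does not in fact apply cleanly to the chosen branch.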
Reply to comment 40: it is OK if you exclude the /libglusterfs/src/unittest/log_mock.c changes from the patch.

Reply to comment 41: accessing the same volume simultaneously from two or more Samba nodes which are not part of the Samba CTDB cluster will lead to a lot of problems with locking etc. It is preferable to create a test volume with the same contents on the updated node. I would like to wait for Poornima's comment before proceeding with the test. (A sketch of one way to strip the log_mock.c hunks from the patch follows this comment.)
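Sketch of one way to do that exclusion (assumes the patchutils package, which provides filterdiff; the patch file names are hypothetical):

# filterdiff -x '*log_mock.c' timer-fix.patch > timer-fix-no-unittest.patch

The resulting patch no longer touches the unpackaged unit-test file, so it should apply cleanly to the packaged source tree.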
Created attachment 1140610 [details]
Patch that applies on 3.1.2 rhgs

This patch should be used instead of http://review.gluster.org/#/c/13784
Mukul,

Please use the attached patch, together with http://review.gluster.org/#/c/6459/, for the build. The problem was that a file (libglusterfs/src/unittest/log_mock.c) modified in the original patch is not packaged.

Regarding the testing: as mentioned above, the same gluster volume should not be used by standalone Samba and clustered Samba simultaneously, since this can lead to data corruption. Hence either do not use the volume simultaneously, or export a test volume from the standalone Samba node. Also, I see that the volume has some NFS options set; is the volume being accessed by Samba and NFS simultaneously?
Thanks Poornima and Raghavendra. I removed the log_mock.c changes from the patch and it works fine.

@Mukul: the test fix is now available. Check build: https://brewweb.devel.redhat.com/taskinfo?taskID=10737756

Please ask the customer to test this fix, and inform them that it is only for testing purposes and has not been fully tested; the hotfix will be provided once they are satisfied with the fix. Please recommend the necessary measures to be taken before they upgrade. I would also recommend using these rpms to test basic functionality such as upgrading, creating a new volume, using the existing volume, using Samba, etc. (a rough smoke-test sketch follows this comment).

Regards,
Bipin
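A rough smoke-test sketch along those lines (the test volume name and brick paths are made-up examples; the existing volume, share and user names are taken from the smb.conf/volume info earlier in this bug):

# gluster volume create testvol replica 2 node1:/bricks/test/brick node2:/bricks/test/brick
# gluster volume start testvol
# gluster volume info testvol
# gluster volume heal ch-online info        (confirm the existing volume is still healthy)
# smbclient //localhost/customer-data -U mediabank-service -c 'ls'        (basic share access from a client)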
Hello,

The customer wants the hotfix build https://brewweb.devel.redhat.com/taskinfo?taskID=10770475 to be tested by QE, since they will be applying the hotfix to their production environment. So, can QE test the hotfix build before it is provided to the customer? They have an ETA of tomorrow, i.e. Tuesday.

Thanks
Mukul
From the QE side, we have run a couple of regression runs using both Windows and Linux CIFS clients and tried to simulate the transcoding/encoding workload using a tool (multiple files were used from multiple clients). No crashes were seen during these runs.
I have forwarded the hotfix to the customer based on comments #46 and #56.

Rejy, please provide hot_fix_requested+.
Adding to the QE testing: apart from the sanity tests covering both Windows and Linux CIFS clients, we have also run the IOzone tool over multiple clients, and moreover ran a rigorous test of heavy I/O combined with multiple simultaneous connects and disconnects of the mounted share. No crashes were seen during this run.
Transcoding/encoding tests on video file formats, and a rigorous test of heavy I/O combined with multiple simultaneous connects and disconnects of the mounted share, were performed on a Windows client. No crashes were seen during these runs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240