Bug 1315201
Summary: | [GSS] - smbd crashes on 3.1.1 with samba-vfs 4.1 | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Mukul Malhotra <mmalhotr> | ||||||
Component: | samba | Assignee: | Anoop C S <anoopcs> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Vivek Das <vdas> | ||||||
Severity: | urgent | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | rhgs-3.1 | CC: | anders, anoopcs, asrivast, avasudev, bkunal, bugs, byarlaga, dconsoli, hgowtham, madam, mchangir, mmalhotr, nlevinki, pgurusid, rcyriac, rhinduja, rjoseph, rtalur, sankarshan, vdas | ||||||
Target Milestone: | --- | Keywords: | Triaged, ZStream | ||||||
Target Release: | RHGS 3.1.3 | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | 1301120 | Environment: | |||||||
Last Closed: | 2016-06-23 05:10:50 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 1317940, 1319374, 1319989 | ||||||||
Bug Blocks: | 1299184 | ||||||||
Attachments: |
Description
Mukul Malhotra
2016-03-07 09:01:20 UTC
Created attachment 1133702
Core dump file
*** Bug 1314834 has been marked as a duplicate of this bug. ***

<anoopcs> Mukul, Regarding 1315201
<anoopcs> Mukul, Do we have any reproducer?
<Mukul> anoopcs, no, I have to verify with the customer whether he can reproduce it on his end
<anoopcs> Mukul, Ok. I will comment in the bug with the patch that I suspect to be the fix for this crash.
<Mukul> anoopcs, OK

Hi Mukul,

Thanks for all your support. As a first step, I could finally root-cause the issue of getting truncated core files every time. Because the smb service file has no LimitCORE parameter, systemd defaults the soft and hard limits for core dump files to 0. Samba also has a strange piece of code that sets the soft limit to the larger of 16MB and the current soft limit (which will be 0). Thus we always end up with truncated 16MB core files for Samba crashes. This limitation in Samba has recently been fixed upstream (https://git.samba.org/?p=samba.git;a=commit;h=58d3462bc58290d8eb5e554c6c59cf6b73ccf58a).

So at this point, I would like to request that they modify the smb service file (/usr/lib/systemd/system/smb.service) to include the following under the [Service] section, restart the smb/ctdb services, and try accessing the shares:

LimitCORE=infinity

After restarting the services, as I mentioned in a previous comment (https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c6), verify that `cat /proc/<smbd-pid>/limits` shows Max core file size as unlimited in both the soft and hard columns. You can then attach the newly generated cores, which I would expect to be complete.

Hello Anoop,

I have attached the fresh core files.

Thanks
Mukul

Hi Mukul,

Since we can't move forward with the first solution for generating complete core files, I will now put forward an alternate procedure. This procedure will enable Samba to dump complete cores from then on for all new and existing client connections.

Prerequisite: If not present, install the util-linux package to get the prlimit binary.

1. Run the following one-liner:

# for i in $(pgrep smbd); do prlimit --pid=$i --core=unlimited; done;

2. Verify the changes made in step 1.

(scripted way)

Soft limits:

# > /tmp/samba-core-file-size; for i in $(pgrep smbd); do cat /proc/$i/limits | grep "Max core file size" | tr -s ' ' | cut -d ' ' -f5 >> /tmp/samba-core-file-size; done;
# cat /tmp/samba-core-file-size

must display "unlimited" on every line.

Hard limits:

# > /tmp/samba-core-file-size; for i in $(pgrep smbd); do cat /proc/$i/limits | grep "Max core file size" | tr -s ' ' | cut -d ' ' -f6 >> /tmp/samba-core-file-size; done;
# cat /tmp/samba-core-file-size

must display "unlimited" on every line.

OR (manual way)

The soft and hard limits of the 'Max core file size' field in the output of `cat /proc/<pid>/limits`, for each pid in the output of `pgrep smbd`, must show as unlimited.

Hello,

Thanks, Anoop, for the analysis of the core dump. Waiting for your update regarding any workaround or patch that can be provided to the customer.

Mukul

Hello,

Thanks Anoop

Mukul

Hello Anoop,

Can the test build be prioritized, as the customer is waiting for it?

Thanks
Mukul

Hello Michael,

I have corrected the $subject.

Thanks
Mukul

Hi,

I have attached the fixes for the crashes reported in this bug. Both patches need to be applied on top of 3.7.1.16 (the version the customer is running). Let me know if you need anything else.

Poornima, please also provide the patch links on the case.
Thanks,
Bipin Kunal

The corresponding upstream patch links:
http://review.gluster.org/#/c/13784
http://review.gluster.org/#/c/6459/

As discussed and concluded, if the customer is OK with the downtime for updating to 3.1.2 (gluster, samba, ctdb), then we can provide the hotfix for 3.1.2; otherwise, for 3.1.1.

Hotfix for 3.1.1: The patches are attached in the BZ:
- https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c32
- https://bugzilla.redhat.com/show_bug.cgi?id=1315201#c33

Hotfix for 3.1.2: The upstream patches apply cleanly, and hence can be cherry-picked from the locations below:
- http://review.gluster.org/#/c/6459/
- http://review.gluster.org/#/c/13784

Reply to comment 40: it is OK if you exclude the /libglusterfs/src/unittest/log_mock.c changes in the patch.

Reply to comment 41: accessing the same volume simultaneously from two or more Samba nodes that are not part of the Samba CTDB cluster will lead to a lot of problems with locking, etc. It is preferable to create a test volume with the same contents on the updated node. I would like to wait for Poornima's comment before proceeding with the test.

Created attachment 1140610
Patch that applies on 3.1.2 rhgs

This patch should be used instead of http://review.gluster.org/#/c/13784

Mukul, please use the attached patch and http://review.gluster.org/#/c/6459/ for the build. The problem was that a file (libglusterfs/src/unittest/log_mock.c) modified in this patch is not packaged. Regarding the testing: as mentioned above, the same gluster volume should not be used by standalone Samba and clustered Samba simultaneously, as this can lead to data corruption. Hence, either do not use the volume simultaneously, or export a test volume from the standalone Samba. Also, I see that the volume has some NFS options set; is the volume being accessed by Samba and NFS simultaneously?

Thanks Poornima and Raghavendra. I removed the log_mock.c changes from the patch and it works fine.

@Mukul: Here is the test fix.
Check build: https://brewweb.devel.redhat.com/taskinfo?taskID=10737756

Please ask the customer to test this fix. Please inform the customer that this is only for testing purposes and has not been fully tested. The hotfix will be provided once he is satisfied with the fix. Please recommend the necessary measures to take before they upgrade. I would also recommend that you use these RPMs yourself to test basic functionality, such as upgrading, creating a new volume, using an existing volume, using Samba, etc.

Regards,
Bipin

Hello,

The customer wants the hotfix build https://brewweb.devel.redhat.com/taskinfo?taskID=10770475 to be tested by QE, as the customer will be applying the hotfix in the production environment. So, can QE test the hotfix build before we provide it to the customer? They have an ETA of tomorrow, i.e. Tuesday.

Thanks
Mukul

From the QE side, we have run a couple of regressions using both Windows and Linux CIFS clients and tried to simulate the transcoding/encoding workload using a tool (multiple files were used from multiple clients). No crashes were seen during these runs.

I have forwarded the hotfix to the customer based on #46 and #56. Rejy, please provide hot_fix_requested+.

Adding to QE testing: apart from the sanity tests on both the Windows and Linux CIFS side, we have also run the IOzone tool over multiple clients, and we ran a rigorous test of running huge I/Os while simultaneously connecting and disconnecting the mounted share multiple times. No crashes were seen during this run.

Transcoding/encoding tests over video file formats, and rigorous tests of running huge I/Os with simultaneous multiple connects and disconnects of the mounted share, were performed on a Windows client. No crashes were seen during these runs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1240
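The `LimitCORE=infinity` change requested in this thread edits /usr/lib/systemd/system/smb.service directly. A common alternative is a systemd drop-in under /etc/systemd/system/smb.service.d/, which a samba package update does not overwrite. The following is a minimal shell sketch of that alternative, not the procedure used in this bug; the helper name `write_limitcore_dropin` and the file name `limitcore.conf` are illustrative assumptions.

```shell
# Hypothetical helper: writes a systemd drop-in that sets LimitCORE=infinity
# into the drop-in directory passed as $1 (for smb this would normally be
# /etc/systemd/system/smb.service.d). The file name limitcore.conf is an
# arbitrary choice; systemd reads every *.conf file in a drop-in directory.
write_limitcore_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    # Same setting the thread adds under [Service] in smb.service.
    cat > "$dir/limitcore.conf" <<'EOF'
[Service]
LimitCORE=infinity
EOF
}
```

For example, as root: `write_limitcore_dropin /etc/systemd/system/smb.service.d && systemctl daemon-reload`, then restart the smb/ctdb services as described in the thread. `systemctl show -p LimitCORE smb` should then report the new value, and the /proc/<smbd-pid>/limits check still applies.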
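The scripted check in step 2 of the prlimit procedure collects the soft and hard values into /tmp/samba-core-file-size for manual inspection. A hedged alternative is to let awk test both columns of the 'Max core file size' row at once: awk splits on runs of whitespace, so fields 5 and 6 are the soft and hard limits, the same fields the `cut -d ' ' -f5` and `-f6` commands select after `tr -s ' '`. The function names below are hypothetical helpers, not part of any package.

```shell
# Hypothetical helper: succeeds (exit 0) only if the "Max core file size"
# row of the given /proc/<pid>/limits-style file shows "unlimited" in both
# the soft (field 5) and hard (field 6) columns. A missing row, or a file
# that cannot be read, counts as a failure.
core_limits_unlimited() {
    awk '
        /^Max core file size/ {
            found = 1
            if ($5 != "unlimited" || $6 != "unlimited") bad = 1
        }
        END {
            if (found && !bad) exit 0
            exit 1
        }
    ' "$1"
}

# Hypothetical helper: runs the check above for every process matched by
# `pgrep smbd` and prints one line per PID. Returns non-zero if no smbd
# process is found or if any of them still has a limited core file size.
check_all_smbd() {
    pids=$(pgrep smbd)
    if [ -z "$pids" ]; then
        echo "no smbd processes found" >&2
        return 1
    fi
    rc=0
    for pid in $pids; do
        # A process can exit between pgrep and this read; that is
        # reported as a failure rather than silently skipped.
        if core_limits_unlimited "/proc/$pid/limits"; then
            echo "$pid: soft and hard core limits are unlimited"
        else
            echo "$pid: core file size is still limited (or unreadable)"
            rc=1
        fi
    done
    return "$rc"
}
```

After running the prlimit one-liner, `check_all_smbd` should print one "unlimited" line per smbd process and return 0.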
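For the 3.1.2 build, the log_mock.c problem was resolved by removing that file's changes from the patch. If the patch is applied with git, `git apply --exclude=<path>` skips the changes to a given path without editing the patch file itself. A sketch, assuming the patch has been saved locally and is applied from the top of the glusterfs source tree; the function name is a hypothetical helper.

```shell
# Hypothetical helper: applies the patch file given as $1 in the current
# directory (assumed to be the top of the glusterfs source tree) while
# skipping any changes to libglusterfs/src/unittest/log_mock.c, the
# unpackaged file that broke the 3.1.2 build. git apply is all-or-nothing
# by default: if any remaining hunk fails, the tree is left untouched.
apply_without_log_mock() {
    git apply --exclude='libglusterfs/src/unittest/log_mock.c' "$1"
}
```

For example: `apply_without_log_mock /path/to/saved.patch`, after which the remaining changes can be built as usual.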