Bug 1315544
Summary: | [GSS] Gluster NFS server crashing in __mnt3svc_umountall
---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage
Component: | gluster-nfs
Status: | CLOSED ERRATA
Severity: | urgent
Priority: | urgent
Version: | unspecified
Reporter: | Oonkwee Lim <olim>
Assignee: | Soumya Koduri <skoduri>
QA Contact: | Arthy Loganathan <aloganat>
Docs Contact: |
CC: | acavalla, akaiser, aloganat, amukherj, asrivast, bkunal, ctowsley, dnunes, fahmed, fgaspar, hamiller, ndevos, olim, omasek, pdhange, pousley, rabhat, rcyriac, rhinduja, rhs-bugs, rnalakka, sankarshan, sbhaloth, skoduri, storage-qa-internal
Flags: | akaiser: needinfo+
Target Milestone: | ---
Target Release: | RHGS 3.2.0
Hardware: | All
OS: | All
Whiteboard: |
Fixed In Version: | glusterfs-3.8.4-15
Doc Type: | Bug Fix
Doc Text: | Previously, when an NFS client unmounted all volumes, the Red Hat Gluster Storage Native NFS server freed a structure that was still in use, resulting in a use-after-free and a segmentation fault on the server. The server now does not free the structure while the mount service is available, so the segmentation fault no longer occurs.
Story Points: | ---
Clone Of: |
Clones: | 1421759 (view as bug list)
Environment: |
Last Closed: | 2017-03-23 05:27:19 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: | 1421759
Bug Blocks: | 1351515, 1351530
Attachments: |
Comment 14
Bipin Kunal
2016-05-16 14:03:04 UTC

Maybe related: it seems that Windows 7 and 2008 send UMNTALL requests: https://bugzilla.redhat.com/show_bug.cgi?id=GLUSTER-1666

These requests are normally sent after a (possibly unclean) reboot. A few more details are in https://tools.ietf.org/html/rfc1813#section-5.2.4

Linux and pynfs do not implement UMNTALL, so this might only be reproducible with Windows or another OS.

Niels, I tried playing with the Windows 2012 NFS client yesterday but could not produce anything useful. Can you suggest a test or steps to reproduce this?

Thanks,
Bipin Kunal
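[Editor's note: since standard Linux clients do not send UMNTALL, an untested sketch of a minimal trigger using the Sun RPC client API may help here. Per RFC 1813, UMNTALL is procedure 4 of the MOUNT program (100005), version 3; the server identifies the caller from the AUTH_UNIX credential. Build with -ltirpc on modern Linux. This is an editorial sketch, not a confirmed reproducer.]

```c
#include <rpc/rpc.h>   /* Sun RPC client API; link with -ltirpc on modern Linux */
#include <stdio.h>

#define MOUNT_PROGRAM        100005 /* MOUNT protocol, RFC 1813 */
#define MOUNT_V3             3
#define MOUNTPROC3_UMNTALL   4      /* void MOUNTPROC3_UMNTALL(void) = 4 */

int
main (int argc, char **argv)
{
        CLIENT *clnt;
        struct timeval tv = { 5, 0 };
        enum clnt_stat st;

        if (argc != 2) {
                fprintf (stderr, "usage: %s <nfs-server>\n", argv[0]);
                return 1;
        }

        clnt = clnt_create (argv[1], MOUNT_PROGRAM, MOUNT_V3, "udp");
        if (!clnt) {
                clnt_pcreateerror ("clnt_create");
                return 1;
        }

        /* AUTH_UNIX carries the local hostname; the server uses it to
         * decide which mount entries belong to this client. */
        clnt->cl_auth = authunix_create_default ();

        /* UMNTALL takes no arguments and returns no results. */
        st = clnt_call (clnt, MOUNTPROC3_UMNTALL,
                        (xdrproc_t) xdr_void, NULL,
                        (xdrproc_t) xdr_void, NULL, tv);
        if (st != RPC_SUCCESS)
                clnt_perror (clnt, "UMNTALL");

        auth_destroy (clnt->cl_auth);
        clnt_destroy (clnt);
        return (st == RPC_SUCCESS) ? 0 : 1;
}
```

Running this twice against the same server would exercise the repeated-UMNTALL path suspected later in this thread.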
Created attachment 1160478 [details]: 1. fix UMNTALL behaviour, 2. remove mountdict
Completely untested patch that addresses the following two points:
1. fix UMNTALL to only UMNT the exports from the client calling the procedure
2. remove the duplication of structures in mountdict, use mountlist everywhere
I am confident that these two patches prevent the crashes that have been seen. The mountdict was used to optimize lookups when many clients mount exports; removing it may cause some performance drop while (un)mounting, but I do not expect that to be critical.

These patches were only build-tested, but they show the approach I'd like to take to fix this problem.
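[Editor's note: a minimal sketch of point 1 above. The names mountentry, hostname, and exportpath are illustrative stand-ins, not the actual gluster-nfs structures; the idea is that UMNTALL walks the mount list and unlinks only the calling client's entries.]

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical mount record; the real gluster-nfs structures differ. */
struct mountentry {
        struct mountentry *next;
        char hostname[256];   /* client that performed the MNT */
        char exportpath[256]; /* export it mounted */
};

/* UMNTALL should drop only the entries belonging to the calling
 * client, identified by the hostname in its RPC credentials,
 * not the entries of every client. */
static void
umntall_for_client (struct mountentry **list, const char *client)
{
        struct mountentry **pp = list;

        while (*pp) {
                struct mountentry *e = *pp;

                if (strcmp (e->hostname, client) == 0) {
                        *pp = e->next; /* unlink and free this entry */
                        free (e);
                } else {
                        pp = &e->next; /* keep entries of other clients */
                }
        }
}
```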
(In reply to Bipin Kunal from comment #24)
> I tried playing with the Windows 2012 NFS client yesterday but could not produce anything useful.
>
> Can you suggest a test or steps to reproduce this?

Sorry, I still have no idea how this can happen. There might be a need for multiple clients mounting at the same time while one client sends the UMNTALL procedure. But without more details from the customers that hit this problem, it is very difficult, if not impossible, to understand the cause.

Maybe one of the customers can capture a tcpdump on the NFS server for all the MOUNT procedures? Depending on their configuration, this would be on a different port than the NFS traffic, so the tcpdump should not contain much data, only the MNT, UMNT and UMNTALL procedures. Check with 'rpcinfo -p $SERVER' which port is used. Provide the tcpdump and the nfs.log once the process crashes again.

Created attachment 1246422 [details]: tcpdump
Created attachment 1246423 [details]: images from client
Okay... from the code, I see a potential issue with mountdict. During gluster-nfs process initialization:

```c
mnt3svc_init (xlator_t *nfsx)
{
        ...
        mstate->mountdict = dict_new ();
        ...
}
```

The reference taken on mountdict above appears to be released in __mnt3svc_umountall():

```c
__mnt3svc_umountall (struct mount3_state *ms)
{
        dict_unref (ms->mountdict);
}
```

So, over the lifetime of a gNFS process, more than one UMNTALL request can result in a double unref and hence in accessing freed memory. Ideally, the dict_unref (ms->mountdict) should have been in mnt3svc_deinit(), IMO (a minimal sketch of this pattern and the fix follows at the end of this thread). Based on the above, I am working with Riyas to check whether we can reproduce the issue by trying multiple mount/umount or reboot scenarios with a Windows client.

As per the current HOTFIX process document, setting NeedInfo to PM: https://mojo.redhat.com/docs/DOC-1037888

Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/97861/

The hotfix build provided for the issue has been verified as mentioned in #C107. The build can be provided to the customers.

The __mnt3svc_umountall crash is not seen while doing a gNFS mount and umountall. Verified the fix on build glusterfs-server-3.8.4-15.el7rhgs.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
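[Editor's note: a minimal sketch of the double-unref pattern described in the analysis above, and of moving the release into mnt3svc_deinit(). The dict type and dict_new/dict_unref here are simplified stand-ins for glusterfs' refcounted dict_t, not the actual library code.]

```c
#include <stdlib.h>

/* Simplified stand-in for glusterfs' refcounted dict_t. */
typedef struct dict { int refcount; } dict;

static dict *dict_new (void)      { dict *d = calloc (1, sizeof (*d)); d->refcount = 1; return d; }
static void  dict_unref (dict *d) { if (--d->refcount == 0) free (d); }

struct mount3_state { dict *mountdict; };

/* Buggy pattern: every UMNTALL drops the init-time reference, so a
 * second UMNTALL unrefs (and later code uses) already-freed memory. */
static void
umountall_buggy (struct mount3_state *ms)
{
        dict_unref (ms->mountdict); /* double unref on repeated UMNTALL */
}

/* Fixed pattern: UMNTALL only empties the dict; the init-time
 * reference is released exactly once, during service teardown. */
static void
umountall_fixed (struct mount3_state *ms)
{
        /* remove the calling client's entries from ms->mountdict here,
         * but keep the dict itself alive */
        (void) ms;
}

static void
mnt3svc_deinit (struct mount3_state *ms)
{
        dict_unref (ms->mountdict); /* released exactly once */
        ms->mountdict = NULL;
}

int
main (void)
{
        struct mount3_state ms = { .mountdict = dict_new () };

        umountall_fixed (&ms); /* safe any number of times */
        umountall_fixed (&ms);
        mnt3svc_deinit (&ms);  /* single release at teardown */

        /* With umountall_buggy(), the second UMNTALL would unref an
         * already-freed dict -- the reported use-after-free. */
        (void) umountall_buggy;
        return 0;
}
```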