Bug 1477190

Summary: [GNFS] GNFS crashed while mounting a volume on a Solaris client
Product: [Community] GlusterFS
Reporter: Niels de Vos <ndevos>
Component: nfs
Assignee: Niels de Vos <ndevos>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: urgent
Version: 3.12
CC: bugs, pasik, srangana
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.12.0
Last Closed: 2017-09-05 17:38:10 UTC
Type: Bug
Bug Depends On: 1468291
Bug Blocks: 1472773, 1473826

Description Niels de Vos 2017-08-01 12:40:27 UTC
+++ This bug was initially created as a clone of Bug #1472773 +++

Description of problem:

GNFS crashed while mounting a volume on a Solaris client.

Version-Release number of selected component (if applicable):
Any version with https://review.gluster.org/17822

How reproducible:
Consistently

Steps to Reproduce:
1. Create a 2*(4+2) Distributed-Disperse volume.
2. Export the volume via GNFS by setting nfs.disable to off.
3. Mount the volume on a Solaris client:

# mount -o proto=tcp,vers=3 nfs://10.70.41.251:/disperseVol /mnt/GNFS_mani/
nfs mount: 10.70.41.251: : RPC: Program not registered
nfs mount: retrying: /mnt/GNFS_mani


Actual results:
GNFS crashed.

Expected results:
GNFS should not crash.

Additional info:

(gdb) bt
#0  0x00007f8b43d91205 in _gf_ref_put (ref=ref@entry=0x0) at refcount.c:36
#1  0x00007f8b35820455 in nfs3_call_state_wipe (cs=cs@entry=0x0) at nfs3.c:559
#2  0x00007f8b35823dd2 in nfs3_lookup (req=req@entry=0x7f8b3015f3f0, fh=fh@entry=0x7f8b37066ad0, 
    fhlen=<optimized out>, name=name@entry=0x7f8b37066b10 "disperseVol") at nfs3.c:1586
#3  0x00007f8b35824408 in nfs3svc_lookup (req=0x7f8b3015f3f0) at nfs3.c:1615
#4  0x00007f8b43ae58c5 in rpcsvc_handle_rpc_call (svc=0x7f8b3006b9f0, trans=trans@entry=0x7f8b30167270, 
    msg=<optimized out>) at rpcsvc.c:695
#5  0x00007f8b43ae5aab in rpcsvc_notify (trans=0x7f8b30167270, mydata=<optimized out>, 
    event=<optimized out>, data=<optimized out>) at rpcsvc.c:789
#6  0x00007f8b43ae79e3 in rpc_transport_notify (this=this@entry=0x7f8b30167270, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f8b30160720) at rpc-transport.c:538
#7  0x00007f8b389163d6 in socket_event_poll_in (this=this@entry=0x7f8b30167270, 
    notify_handled=<optimized out>) at socket.c:2306
#8  0x00007f8b3891897c in socket_event_handler (fd=34, idx=33, gen=10, data=0x7f8b30167270, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2458
#9  0x00007f8b43d7d0f6 in event_dispatch_epoll_handler (event=0x7f8b37067e80, event_pool=0x55d3ffe94fd0)
    at event-epoll.c:572
#10 event_dispatch_epoll_worker (data=0x55d3ffedb5f0) at event-epoll.c:648
#11 0x00007f8b42b81e25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f8b4244e34d in clone () from /lib64/libc.so.6
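
The top two frames show nfs3_call_state_wipe() being entered with cs=0x0 and a NULL reference then reaching _gf_ref_put(). The sketch below reproduces that pattern with hypothetical demo_* types (they are not the GlusterFS gf_ref_t or nfs3_call_state_t) and shows how a NULL check in the wipe function would let an error path call it safely. It only illustrates the failure mode seen in the backtrace; it is not the change committed for this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference counter, loosely modelled on a refcount helper. */
typedef struct demo_ref {
    unsigned int count;
    void (*release)(struct demo_ref *ref);
} demo_ref_t;

/* Hypothetical call state; the counter is the first member, so a pointer
 * to it is also a pointer to the enclosing structure. */
typedef struct demo_call_state {
    demo_ref_t ref;
} demo_call_state_t;

/* Counts released call states, only so the behaviour can be observed. */
int demo_released = 0;

static void demo_call_state_release(demo_ref_t *ref)
{
    free((demo_call_state_t *)ref);
    demo_released++;
}

demo_call_state_t *demo_call_state_new(void)
{
    demo_call_state_t *cs = calloc(1, sizeof(*cs));
    if (cs == NULL)
        return NULL;
    cs->ref.count = 1;
    cs->ref.release = demo_call_state_release;
    return cs;
}

/* Drops one reference. Like frame #0 in the backtrace, this helper
 * dereferences its argument, so a NULL ref must never reach it. */
unsigned int demo_ref_put(demo_ref_t *ref)
{
    assert(ref != NULL);
    unsigned int remaining = --ref->count;
    if (remaining == 0)
        ref->release(ref);
    return remaining;
}

/* Wipe that tolerates a call state that was never allocated.
 * Returns 0 when a reference was dropped, -1 when cs was NULL. */
int demo_call_state_wipe(demo_call_state_t *cs)
{
    if (cs == NULL)
        return -1;
    demo_ref_put(&cs->ref);
    return 0;
}
```

With the guard in place, an error path that bails out before the call state is allocated can still call the wipe function unconditionally.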


--- Additional comment from Niels de Vos on 2017-07-27 14:54:14 CEST ---

Two more upstream patches have been posted:

https://review.gluster.org/17897
- libglusterfs: the global_xlator should have valid cbks

https://review.gluster.org/17898
- nfs: use "/" as subdir for volume mounts


With these changes, subdir mounting (and restricting access) works for me. The additional test of deleting the subdir after mounting no longer segfaults either.

Comment 1 Worker Ant 2017-08-01 12:43:10 UTC
REVIEW: https://review.gluster.org/17946 (libglusterfs: the global_xlator should have valid cbks) posted (#1) for review on release-3.12 by Niels de Vos (ndevos)

Comment 2 Worker Ant 2017-08-01 12:43:14 UTC
REVIEW: https://review.gluster.org/17947 (nfs: use "/" as subdir for volume mounts) posted (#1) for review on release-3.12 by Niels de Vos (ndevos)

Comment 3 Worker Ant 2017-08-02 15:04:14 UTC
COMMIT: https://review.gluster.org/17946 committed in release-3.12 by Shyamsundar Ranganathan (srangana) 
------
commit 64391ea321d269671814afd13cbdfe099b893e4d
Author: Niels de Vos <ndevos>
Date:   Tue Aug 1 14:41:22 2017 +0200

    libglusterfs: the global_xlator should have valid cbks
    
    There is a case where Gluster/NFS needs to resolve a path outside of the
    nfs-xlator itself. While resolving the path to fetch the GFID for
    creating the NFS-filehandle, gfapi may set an inode-ctx through
    glfs_resolve_at(). This inode-ctx is linked with the global_xlator.
    
    Because the global_xlator does not have any cbks, loc_wipe() will cause
    a segfault when it calls inode_unref() and xl->cbks->forget(). It is
    assumed that all xlators have a cbks symbol, otherwise loading of the
    xlator will fail. The global_xlator is not loaded in the same way, so
    there is no failure noticed when the instance is created. By adding an
    empty `struct xlator_cbks`, the global_xlator behaves similar to other
    xlators that do not implement all callbacks.
    
    I would have preferred to keep the inode-ctx setting through
    glfs_resolve_at() contained within Gluster/NFS. Unfortunately
    Gluster/NFS also uses the inode-ctx, and is not prepared to see the
    values that glfs_resolve_at() stores there.
    
    This problem is not easily reproducible because it involves mounting
    over WebNFS (like Solaris 10 can do). The segfault will also not be
    immediate, unless the following is done:
    
    1. create a subdir on a volume
    2. mount the volume/subdir over WebNFS
    3. unmount the volume/subdir
    4. mount the root of the volume
    5. delete the subdir on the volume -> segfault of Gluster/NFS
    
    Cherry picked from commit cec5036f7e99ae265bb5e0e7f3df30166466eb2c:
    > Change-Id: I2bd71d033e97edc07ba93b2d4ada558f65d68999
    > BUG: 1468291
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17897
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Amar Tumballi <amarts>
    > Reviewed-by: jiffin tony Thottan <jthottan>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Jeff Darcy <jeff.us>
    
    Change-Id: I2bd71d033e97edc07ba93b2d4ada558f65d68999
    BUG: 1477190
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17946
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
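
The commit message above describes callers that dereference xl->cbks without checking it, which is only safe when every xlator carries a callback table. A minimal sketch of that idea follows. The demo_* names are hypothetical and the structures are heavily simplified; this is not the libglusterfs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced callback table with a single callback. */
typedef struct demo_cbks {
    int (*forget)(void *inode);
} demo_cbks_t;

/* Hypothetical, reduced xlator. */
typedef struct demo_xlator {
    const char *name;
    demo_cbks_t *cbks;
} demo_xlator_t;

/* Counts forget() invocations, only so the behaviour can be observed. */
int demo_forget_calls = 0;

int demo_counting_forget(void *inode)
{
    (void)inode;
    demo_forget_calls++;
    return 1;
}

/* Mimics a caller that assumes xl->cbks is always valid and checks only
 * the individual callback. A NULL cbks pointer would be dereferenced
 * here; the assert stands in for that crash. */
int demo_forget(demo_xlator_t *xl, void *inode)
{
    assert(xl != NULL && xl->cbks != NULL);
    if (xl->cbks->forget == NULL)
        return 0; /* callback not implemented: nothing to do */
    return xl->cbks->forget(inode);
}

/* An empty table: present, but with every callback left NULL. */
static demo_cbks_t demo_global_cbks = { .forget = NULL };

demo_xlator_t demo_global_xlator = {
    .name = "demo-global",
    .cbks = &demo_global_cbks,
};
```

Giving the global instance an empty table keeps the callers unchanged: they skip the missing callback instead of dereferencing a NULL table pointer.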

Comment 4 Worker Ant 2017-08-02 15:09:18 UTC
COMMIT: https://review.gluster.org/17947 committed in release-3.12 by Shyamsundar Ranganathan (srangana) 
------
commit 726797d3e62bfe1ebf0c3af63b00084e1b159886
Author: Niels de Vos <ndevos>
Date:   Tue Aug 1 14:41:54 2017 +0200

    nfs: use "/" as subdir for volume mounts
    
    For cases where subdir mounting is checked, it makes it much easier to
    return a subdir of "/" in case no subdir is passed. This reduces the
    number of corner cases where permissions are checked for subdir mounts,
    but not for volume mounts (or the other way around).
    
    The problem was identified by WebNFS mounting a volume, which got denied
    after commit e3f48fa2. Handling this would require an exception for
    non-subdir mounts, or make non-subdir mounts equal to subdir mounts.
    This change takes the 2nd approach.
    
    Cherry picked from commit 45c973576d6356dbe4da897e9f0528eac7529d48:
    > Change-Id: I0d810ae90b267a2cc3eac8d55368a0f1b0787f6a
    > Fixes: e3f48fa2 ("nfs: add permission checking for mounting over WebNFS")
    > BUG: 1468291
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: https://review.gluster.org/17898
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: soumya k <skoduri>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: jiffin tony Thottan <jthottan>
    
    Change-Id: I0d810ae90b267a2cc3eac8d55368a0f1b0787f6a
    BUG: 1477190
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: https://review.gluster.org/17947
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
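
The second approach described above, treating a volume mount as a mount of the subdir "/", can be sketched as follows. The function names and the prefix-based permission rule are assumptions made for illustration; they are not taken from the Gluster/NFS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the subdir to check for a mount request. A request without a
 * subdir (NULL or empty) is treated as a mount of "/". */
const char *demo_mount_subdir(const char *subdir)
{
    if (subdir == NULL || subdir[0] == '\0')
        return "/";
    return subdir;
}

/* Hypothetical permission check: a mount is allowed when the requested
 * path equals the allowed path or lies below it. Because volume mounts
 * are normalized to "/", they take the same code path as subdir mounts. */
int demo_subdir_allowed(const char *subdir, const char *allowed)
{
    assert(allowed != NULL);

    const char *path = demo_mount_subdir(subdir);
    size_t len = strlen(allowed);

    if (strcmp(allowed, "/") == 0)
        return 1; /* the whole volume is allowed */
    if (strncmp(path, allowed, len) != 0)
        return 0;
    /* Require a path-component boundary, so that "/database" does not
     * match an allowed path of "/data". */
    return path[len] == '\0' || path[len] == '/';
}
```

With this normalization there is no separate branch for non-subdir mounts, so a permission check cannot be applied to one kind of mount and skipped for the other.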

Comment 5 Shyamsundar 2017-09-05 17:38:10 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/