Bug 1372283
Summary: [GSS] NFS Sub-directory mount not working on solaris10 client
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: gluster-nfs
Version: rhgs-3.1
Hardware: x86_64
OS: Solaris
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Bipin Kunal <bkunal>
Assignee: Bipin Kunal <bkunal>
QA Contact: surabhi <sbhaloth>
CC: amukherj, asrivast, bkunal, jthottan, mbenjamin, ndevos, rcyriac, rhinduja, rhs-bugs, rnalakka, skoduri, storage-qa-internal
Keywords: Triaged
Target Milestone: ---
Target Release: RHGS 3.3.0
Fixed In Version: glusterfs-3.8.4-23
Doc Type: Enhancement
Doc Text: Subdirectories of a volume can now be mounted on Solaris 10 clients using WebNFS.
Clones: 1426667, 1445594 (view as bug list)
Bug Depends On: 1426667, 1451224
Bug Blocks: 1369781, 1417145
Type: Bug
Last Closed: 2017-09-21 04:28:23 UTC
Description (Bipin Kunal, 2016-09-01 10:27:19 UTC)
nfs.log gives error : [2016-08-31 10:55:11.554873] E [nfs3.c:341:__nfs3_get_volume_id] (-->/usr/lib64/glusterfs/3.7.9/xlator/nfs/server.so(nfs3_lookup_reply+0x4d) [0x7fe09db206ed] -->/usr/lib64/glusterfs/3.7.9/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x78) [0x7fe09db1f1f8] -->/usr/lib64/glusterfs/3.7.9/xlator/nfs/server.so(__nfs3_get_volume_id+0xae) [0x7fe09db1f13e] ) 0-nfs-nfsv3: invalid argument: xl [Invalid argument] ============================================================================== On the server : # rpcinfo -p localhost program vers proto port service 100000 4 tcp 111 portmapper 100000 3 tcp 111 portmapper 100000 2 tcp 111 portmapper 100000 4 udp 111 portmapper 100000 3 udp 111 portmapper 100000 2 udp 111 portmapper 100005 1 tcp 38466 mountd 100003 3 tcp 2049 nfs 100021 4 tcp 38468 nlockmgr 100227 3 tcp 2049 nfs_acl 100024 1 udp 58999 status 100024 1 tcp 50032 status 100021 1 udp 1018 nlockmgr 100021 1 tcp 600 nlockmgr # rpcinfo | grep nfs 100003 3 tcp 0.0.0.0.8.1 nfs superuser 100227 3 tcp 0.0.0.0.8.1 nfs_acl superuser On the solaris client : # rpcinfo -p localhost program vers proto port service 100000 4 tcp 111 rpcbind 100000 3 tcp 111 rpcbind 100000 2 tcp 111 rpcbind 100000 4 udp 111 rpcbind 100000 3 udp 111 rpcbind 100000 2 udp 111 rpcbind 100024 1 udp 32773 status 100024 1 tcp 32771 status 100133 1 udp 32773 100133 1 tcp 32771 1073741824 1 tcp 32772 100021 1 udp 4045 nlockmgr 100021 2 udp 4045 nlockmgr 100021 3 udp 4045 nlockmgr 100021 4 udp 4045 nlockmgr 100021 1 tcp 4045 nlockmgr 100021 2 tcp 4045 nlockmgr 100021 3 tcp 4045 nlockmgr 100021 4 tcp 4045 nlockmgr 100229 1 tcp 32775 metad 100229 2 tcp 32775 metad 100001 2 udp 32777 rstatd 100001 3 udp 32777 rstatd 100001 4 udp 32777 rstatd 100422 1 tcp 32776 100422 2 tcp 32776 100242 1 tcp 32777 metamedd 100230 1 tcp 32778 metamhd 100068 2 udp 32778 100068 3 udp 32778 100068 4 udp 32778 100068 5 udp 32778 100083 1 tcp 32779 100002 2 tcp 32780 rusersd 100002 3 tcp 32780 rusersd 100002 2 udp 32779 rusersd 100002 3 udp 32779 rusersd 100011 1 udp 32780 rquotad 300598 1 udp 32793 300598 1 tcp 32781 805306368 1 udp 32793 805306368 1 tcp 32781 100249 1 udp 32800 100249 1 tcp 32782 100003 2 udp 2049 nfs 100003 3 udp 2049 nfs 100227 2 udp 2049 nfs_acl 100227 3 udp 2049 nfs_acl 100003 2 tcp 2049 nfs 100003 3 tcp 2049 nfs 100003 4 tcp 2049 nfs 100227 2 tcp 2049 nfs_acl 100227 3 tcp 2049 nfs_acl # rpcinfo -T tcp dhcp6-132.gsslab.pnq.redhat.com nfs rpcinfo: RPC: Program/version mismatch; low version = 0, high version = 0 program 100003 version 0 is not available # rpcinfo | grep nfs 100003 2 udp 0.0.0.0.8.1 nfs 1 100003 3 udp 0.0.0.0.8.1 nfs 1 100227 2 udp 0.0.0.0.8.1 nfs_acl 1 100227 3 udp 0.0.0.0.8.1 nfs_acl 1 100003 2 tcp 0.0.0.0.8.1 nfs 1 100003 3 tcp 0.0.0.0.8.1 nfs 1 100003 4 tcp 0.0.0.0.8.1 nfs 1 100227 2 tcp 0.0.0.0.8.1 nfs_acl 1 100227 3 tcp 0.0.0.0.8.1 nfs_acl 1 On RHEL client : # rpcinfo -p localhost program vers proto port service 100000 4 tcp 111 portmapper 100000 3 tcp 111 portmapper 100000 2 tcp 111 portmapper 100000 4 udp 111 portmapper 100000 3 udp 111 portmapper 100000 2 udp 111 portmapper 100024 1 udp 52190 status 100024 1 tcp 34675 status 100005 1 udp 20048 mountd 100005 1 tcp 20048 mountd 100005 2 udp 20048 mountd 100005 2 tcp 20048 mountd 100005 3 udp 20048 mountd 100005 3 tcp 20048 mountd 100021 1 udp 50508 nlockmgr 100021 3 udp 50508 nlockmgr 100021 4 udp 50508 nlockmgr 100021 1 tcp 56749 nlockmgr 100021 3 tcp 56749 nlockmgr 100021 4 tcp 56749 nlockmgr # rpcinfo | grep nfs 100003 3 tcp6 ::.8.1 nfs 
superuser 100003 4 tcp6 ::.8.1 nfs superuser 100227 3 tcp6 ::.8.1 nfs_acl superuser 100003 3 udp6 ::.8.1 nfs superuser 100003 4 udp6 ::.8.1 nfs superuser 100227 3 udp6 ::.8.1 nfs_acl superuser Maybe it is Day-1 issue. Niels shall be able to confirm. (gdb) bt #0 __nfs3_get_volume_id (nfs3=0x7f40b405d8b0, xl=0x0, volumeid=0x7f40ab5fc720 "") at nfs3.c:336 #1 0x00007f40b887b1f8 in nfs3_request_xlator_deviceid (rq=<value optimized out>) at nfs3.c:706 #2 0x00007f40b887c6ed in nfs3_lookup_reply (req=0x7f40b818006c, stat=NFS3ERR_NOENT, newfh=0x0, stbuf=0x0, postparent=<value optimized out>) at nfs3.c:1236 #3 0x00007f40b88860a2 in nfs3_lookup (req=0x7f40b818006c, fh=<value optimized out>, fhlen=<value optimized out>, name=<value optimized out>) at nfs3.c:1528 #4 0x00007f40b8886516 in nfs3svc_lookup (req=0x7f40b818006c) at nfs3.c:1558 #5 0x00007f40c68d71a4 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x7f40a40016f0) at rpcsvc.c:705 #6 0x00007f40c68d73a3 in rpcsvc_notify (trans=0x7f40a40074d0, mydata=<value optimized out>, event=<value optimized out>, data=0x7f40a40016f0) at rpcsvc.c:799 #7 0x00007f40c68d8af8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:546 #8 0x00007f40bb6587ed in socket_event_poll_in (this=0x7f40a40074d0) at socket.c:2353 #9 0x00007f40bb659abd in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f40a40074d0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2466 #10 0x00007f40c6b82080 in event_dispatch_epoll_handler (data=0x7f40b4068be0) at event-epoll.c:575 #11 event_dispatch_epoll_worker (data=0x7f40b4068be0) at event-epoll.c:678 #12 0x00007f40c5bf8aa1 in start_thread (arg=0x7f40ab5fe700) at pthread_create.c:301 #13 0x00007f40c5561aad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 (gdb) f 3 #3 0x00007f40b88860a2 in nfs3_lookup (req=0x7f40b818006c, fh=<value optimized out>, fhlen=<value optimized out>, name=<value optimized out>) at nfs3.c:1528 1528 nfs3_lookup_reply (req, stat, NULL, NULL, NULL); (gdb) p name $14 = <value optimized out> (gdb) f 4 #4 0x00007f40b8886516 in nfs3svc_lookup (req=0x7f40b818006c) at nfs3.c:1558 1558 ret = nfs3_lookup (req, &fh, args.what.dir.data.data_len, name); (gdb) p name $15 = "replicate/bipin2", '\000' <repeats 128 times>"\217, X\272\306@\177\000\000~_\265\306@\177\000\000\217X\272\306@\177\000\000\235X\272\306@\177\000\000\b\000\000\000\000\000\000\000:\003\261\306@\177", '\000' <repeats 74 times>, "O\002", '\000' <repeats 22 times>, "\020PA\307@\177", '\000' <repeats 111 times>, "199", '\000' <repeats 40 times>, "\020\000\000\000\060\000\000\000\360\317_\253@\177\000\000\060\317_\253@\177", '\000' <repeats 61 times>, "38924", '\000' <repeats 40 times>, "\030\000\000\000\060\000\000\000\260\320_\253@\177\000\000\360\317_\253@\177", '\000' <repeats 66 times>"\360, \315_\253@\177\000\000`\320_\253@\177\000\000`\320_\253@\177\000\000\335\005]\305@\177\000\000"... 
(gdb) $16 = "replicate/bipin2", '\000' <repeats 128 times>"\217, X\272\306@\177\000\000~_\265\306@\177\000\000\217X\272\306@\177\000\000\235X\272\306@\177\000\000\b\000\000\000\000\000\000\000:\003\261\306@\177", '\000' <repeats 74 times>, "O\002", '\000' <repeats 22 times>, "\020PA\307@\177", '\000' <repeats 111 times>, "199", '\000' <repeats 40 times>, "\020\000\000\000\060\000\000\000\360\317_\253@\177\000\000\060\317_\253@\177", '\000' <repeats 61 times>, "38924", '\000' <repeats 40 times>, "\030\000\000\000\060\000\000\000\260\320_\253@\177\000\000\360\317_\253@\177", '\000' <repeats 66 times>"\360, \315_\253@\177\000\000`\320_\253@\177\000\000`\320_\253@\177\000\000\335\005]\305@\177\000\000"... (gdb) c Continuing. [Switching to Thread 0x7f40ba032700 (LWP 9165)] Breakpoint 5, __nfs3_get_volume_id (nfs3=0x7f40b405d8b0, xl=0x0, volumeid=0x7f40ba030720 "") at nfs3.c:336 336 { (gdb) b nfs3_lookup Breakpoint 6 at 0x7f40b8885dc0: file nfs3.c, line 1489. (gdb) c Continuing. Breakpoint 6, nfs3_lookup (req=0x7f40b8180b78, fh=0x7f40ba031b10, fhlen=0, name=0x7f40ba030b10 "replicate/bipin2") at nfs3.c:1489 1489 { (gdb) n 1496 GF_VALIDATE_OR_GOTO (GF_NFS3, req, out); (gdb) 1489 { (gdb) 1496 GF_VALIDATE_OR_GOTO (GF_NFS3, req, out); (gdb) 1497 GF_VALIDATE_OR_GOTO (GF_NFS3, fh, out); (gdb) 1498 GF_VALIDATE_OR_GOTO (GF_NFS3, name, out); (gdb) 1500 nfs3_log_fh_entry_call (rpcsvc_request_xid (req), "LOOKUP", fh, (gdb) [Switching to Thread 0x7f40ab5fe700 (LWP 9180)] Breakpoint 5, __nfs3_get_volume_id (nfs3=0x7f40b405d8b0, xl=0x0, volumeid=0x7f40ab5fc720 "") at nfs3.c:336 336 { (gdb) c Continuing. Breakpoint 6, nfs3_lookup (req=0x7f40b818006c, fh=0x7f40ab5fdb10, fhlen=0, name=0x7f40ab5fcb10 "replicate/bipin2") at nfs3.c:1489 1489 { (gdb) n 1496 GF_VALIDATE_OR_GOTO (GF_NFS3, req, out); (gdb) 1489 { (gdb) 1496 GF_VALIDATE_OR_GOTO (GF_NFS3, req, out); (gdb) 1497 GF_VALIDATE_OR_GOTO (GF_NFS3, fh, out); (gdb) l 1492 int ret = -EFAULT; 1493 struct nfs3_state *nfs3 = NULL; 1494 nfs3_call_state_t *cs = NULL; 1495 1496 GF_VALIDATE_OR_GOTO (GF_NFS3, req, out); 1497 GF_VALIDATE_OR_GOTO (GF_NFS3, fh, out); 1498 GF_VALIDATE_OR_GOTO (GF_NFS3, name, out); 1499 1500 nfs3_log_fh_entry_call (rpcsvc_request_xid (req), "LOOKUP", fh, 1501 name); (gdb) 1502 nfs3_validate_nfs3_state (req, nfs3, stat, nfs3err, ret); 1503 if (nfs3_solaris_zerolen_fh (fh, fhlen)) 1504 nfs3_funge_solaris_zerolen_fh (nfs3, fh, name, stat, nfs3err); 1505 else 1506 nfs3_validate_gluster_fh (fh, stat, nfs3err); 1507 nfs3_validate_strlen_or_goto (name, NFS_NAME_MAX, nfs3err, stat, ret); 1508 nfs3_map_fh_to_volume (nfs3, fh, req, vol, stat, nfs3err); 1509 nfs3_volume_started_check (nfs3, vol, ret, out); 1510 nfs3_handle_call_state_init (nfs3, cs, req, vol, stat, nfs3err); 1511 (gdb) n 1498 GF_VALIDATE_OR_GOTO (GF_NFS3, name, out); (gdb) 1500 nfs3_log_fh_entry_call (rpcsvc_request_xid (req), "LOOKUP", fh, (gdb) [Switching to Thread 0x7f40ba032700 (LWP 9165)] Breakpoint 5, __nfs3_get_volume_id (nfs3=0x7f40b405d8b0, xl=0x0, volumeid=0x7f40ba030720 "") at nfs3.c:336 336 { (gdb) b nfs3_solaris_zerolen_fh Breakpoint 7 at 0x7f40b887a940: file nfs3.c, line 431. (gdb) c Continuing. Program received signal SIGPIPE, Broken pipe. 0x00007f40c5559b97 in __libc_writev (fd=<value optimized out>, vector=<value optimized out>, count=<value optimized out>) at ../sysdeps/unix/sysv/linux/writev.c:56 56 result = INLINE_SYSCALL (writev, 3, fd, CHECK_N (vector, count), count); (gdb) c Continuing. 
Breakpoint 6, nfs3_lookup (req=0x7f40b8180b78, fh=0x7f40ba031b10, fhlen=0, name=0x7f40ba030b10 "replicate/bipin2") at nfs3.c:1489 1489 { (gdb) c Continuing. Breakpoint 7, nfs3_solaris_zerolen_fh (fh=0x7f40ba031b10, fhlen=0) at nfs3.c:431 431 if (!fh) (gdb) l 426 } 427 428 int 429 nfs3_solaris_zerolen_fh (struct nfs3_fh *fh, int fhlen) 430 { 431 if (!fh) 432 return 0; 433 434 if (nfs3_fh_validate (fh)) 435 return 0; (gdb) 436 437 if (fhlen == 0) 438 return 1; 439 440 return 0; 441 } 442 443 444 /* Function pointer that represents the generic prototypes of functions used 445 * to serialize NFS3 message structures into the XDR format. (gdb) b nfs3_fh_validate Breakpoint 8 at 0x7f40b8879050: file nfs3-fh.c, line 32. (gdb) c Continuing. Breakpoint 8, nfs3_fh_validate (fh=0x7f40ba031b10) at nfs3-fh.c:32 32 if (!fh) (gdb) n 35 if (fh->ident[0] != GF_NFSFH_IDENT0) (gdb) 41 if (fh->ident[2] != GF_NFSFH_IDENT2) (gdb) nfs3_fh_validate (fh=0x7f40ba031b10) at nfs3-fh.c:48 48 } (gdb) nfs3_solaris_zerolen_fh (fh=<value optimized out>, fhlen=0) at nfs3.c:441 441 } (gdb) p fh->ident[2] value has been optimized out (gdb) p fh->ident[1] value has been optimized out (gdb) n nfs3_lookup (req=0x7f40b8180b78, fh=0x7f40ba031b10, fhlen=0, name=0x7f40ba030b10 "replicate/bipin2") at nfs3.c:1504 1504 nfs3_funge_solaris_zerolen_fh (nfs3, fh, name, stat, nfs3err); (gdb) b nfs_mntpath_to_xlator Breakpoint 9 at 0x7f40b8864685: file nfs-common.c, line 80. (gdb) c Continuing. Breakpoint 9, nfs_mntpath_to_xlator (cl=0x7f40b4015150, path=0x7f40ba030b10 "replicate/bipin2") at nfs-common.c:80 80 { (gdb) p cl->xlator $17 = (xlator_t *) 0x7f40b40110e0 (gdb) p cl->xlator->name $18 = 0x7f40b400f500 "brickrmvtest" (gdb) p cl $19 = (xlator_list_t *) 0x7f40b4015150 (gdb) p $19->next->xlator->name $20 = 0x7f40b400b210 "replicate" (gdb) p $19->next $21 = (struct xlator_list *) 0x7f40b4015210 (gdb) p $21->next->xlator->name Cannot access memory at address 0x0 (gdb) Solaris client seems to be doing lookup of path 'replicate/bipin2' -> 'volume/subdir' instead of issuing mount cmd directly on the entire path. And we seem to be having special handling for solaris client. When fhlen is zero, "nfs3_funge_solaris_zerolen_fh()"->"nfs_mntpath_to_xlator" , we are matching up paths with exportlist which contains only volume names. That is the reason lookup of volume/subdir path fails with ENOENT error. This looks like Day1 issue. We need to add support or fix our code to handle subdir paths too for solaris client. Niels, Please have a look at C #6 and let me know your opinion. From https://github.com/gluster/glusterfs/commit/ddb31110db8e1b5995d392ced988f34d2f9145d2#diff-c76fa92cce27f67d089e7d8cac4a44bdR2114 : + /* + * 1. First check if the MOUNT is for whole volume. + * i.e. __mnt3udp_get_export_volume_inode () + * 2. If NOT, then TRY for SUBDIR MOUNT. + * i.e. __mnt3udp_get_export_subdir_inode () + * 3. If a subdir is exported using nfs.export-dir, + * then the mount type would be MNT3_EXPTYPE_DIR, + * so make sure to find the proper path to be + * resolved using __volume_subdir() + * 3. Make sure subdir export is allowed. + */ It seems to be intended to be able to mount a subdir in case it is listed in nfs.export-dir. While testing more I was able to mount sub-directory and it did not need any extra patch. Steps are really weird but it worked on my setup. 
On Solaris client:
1) Try mounting the sub-directory. Expected error: "RPC: Program not registered"
2) Abort the mount.

On Gluster server:
3) Kill the gNFS process associated with the server being used for mounting: kill -9 <pid>. You can get the gNFS pid from gluster vol status <volname>.

On Solaris client:
4) Try mounting the sub-directory. Expected error: "RPC: Program not registered"
5) Abort the mount.

On Gluster server:
6) Restart the glusterd service to start the gNFS process: service glusterd restart
7) Check gluster vol status to verify that gNFS is up. You might have to wait 1-2 seconds or check the status a few times.

On Solaris client:
8) Try mounting the sub-directory. The sub-dir mount is expected to succeed.
9) Access the mount. If you cannot access it and get a connectivity issue with the server, expect that gNFS crashed.

On Gluster server:
10) Check the volume status to see the state of the gNFS process, and check nfs.log for the backtrace.
11) Restart glusterd to restart gNFS, then access the mount from the client.

As updated by Bipin, the sub-dir mount works sometimes but not always. Below are the observations from the packet trace.

1) When the sub-dir mount doesn't work:

[skoduri@skoduri ~]$ tshark -r /tmp/client.pcap "rpc.program == 100003 || rpc.program == 100005 || rpc.program == 100000"
253 20:08:52.412564 10.10.188.199 → 10.65.6.132 NFS 194 V3 LOOKUP Call, DH: 0x00000000/replicate/bipin2
269 20:08:52.637463 10.65.6.132 → 10.10.188.199 NFS 90 V3 LOOKUP Reply (Call In 253) Error: NFS3ERR_NOENT
272 20:08:52.638154 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP
287 20:08:52.863357 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 272) PROGRAM_NOT_AVAILABLE
294 20:08:53.090922 10.10.188.199 → 10.65.6.132 NFS 194 V3 LOOKUP Call, DH: 0x00000000/replicate/bipin2
301 20:08:53.316083 10.65.6.132 → 10.10.188.199 NFS 90 V3 LOOKUP Reply (Call In 294) Error: NFS3ERR_NOENT
304 20:08:53.316732 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP
318 20:08:53.542107 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 304) PROGRAM_NOT_AVAILABLE
407 20:08:58.769031 10.10.188.199 → 10.65.6.132 NFS 194 V3 LOOKUP Call, DH: 0x00000000/replicate/bipin2
409 20:08:58.994121 10.65.6.132 → 10.10.188.199 NFS 90 V3 LOOKUP Reply (Call In 407) Error: NFS3ERR_NOENT
412 20:08:58.994879 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP
416 20:08:59.220470 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 412) PROGRAM_NOT_AVAILABLE
656 20:09:09.447638 10.10.188.199 → 10.65.6.132 NFS 194 V3 LOOKUP Call, DH: 0x00000000/replicate/bipin2
665 20:09:09.673086 10.65.6.132 → 10.10.188.199 NFS 90 V3 LOOKUP Reply (Call In 656) Error: NFS3ERR_NOENT
668 20:09:09.673780 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP
675 20:09:09.899194 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 668) PROGRAM_NOT_AVAILABLE
[skoduri@skoduri ~]$

As can be seen above, the Solaris client tries to reach the MOUNT service via UDP. Since gluster-NFS does not support MOUNT over UDP by default, portmap replies with a PROGRAM_NOT_AVAILABLE error.
2) when sub-dir mount works [skoduri@skoduri ~]$ tshark -r /tmp/client2.pcap "rpc.program == 100003 || rpc.program == 100005 || rpc.program == 100000" 45 20:10:08.196895 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP 47 20:10:08.422716 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 45) PROGRAM_NOT_AVAILABLE 89 20:10:09.106495 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP 90 20:10:09.331793 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 89) PROGRAM_NOT_AVAILABLE 179 20:10:15.008324 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP 187 20:10:15.233649 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 179) PROGRAM_NOT_AVAILABLE 326 20:10:25.915413 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP 337 20:10:26.141768 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 326) PROGRAM_NOT_AVAILABLE 902 20:10:46.368657 10.10.188.199 → 10.65.6.132 NFS 194 V3 LOOKUP Call, DH: 0x00000000/replicate/bipin2 904 20:10:46.595552 10.65.6.132 → 10.10.188.199 NFS 90 V3 LOOKUP Reply (Call In 902) Error: NFS3ERR_NOENT 907 20:10:46.596331 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP 911 20:10:46.821527 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 907) Port:951 912 20:10:46.821815 10.10.188.199 → 10.65.6.132 MOUNT 82 V3 NULL Call 923 20:10:47.046824 10.65.6.132 → 10.10.188.199 MOUNT 66 V3 NULL Reply (Call In 912) 924 20:10:47.047246 10.10.188.199 → 10.65.6.132 MOUNT 174 V3 MNT Call replicate/bipin2 936 20:10:47.284963 10.65.6.132 → 10.10.188.199 MOUNT 134 V3 MNT Reply (Call In 924) But sometimes portmap seem to be responding with "951" for GETPORT call for MOUNT over UDP even though there doesn't seem any service listening on that port. From rpcinfo - [root@dhcp6-132 ~]# rpcinfo -p program vers proto port service 100000 4 tcp 111 portmapper 100000 3 tcp 111 portmapper 100000 2 tcp 111 portmapper 100000 4 udp 111 portmapper 100000 3 udp 111 portmapper 100000 2 udp 111 portmapper 100005 1 tcp 38466 mountd 100003 3 tcp 2049 nfs 100021 4 tcp 38468 nlockmgr 100227 3 tcp 2049 nfs_acl 100024 1 udp 58835 status 100024 1 tcp 50952 status 100021 1 udp 946 nlockmgr 100021 1 tcp 948 nlockmgr [root@dhcp6-132 ~]# Since this behaviour is seen just after gluster-NFS restart, I suspect if that port is opened for a while and that resulted in MNT calls getting succeeded. 
And the reason why volume mount works always is because lookup on the volume path always succeeds as updated in comment#6 84 21:08:10.300234 10.10.188.199 → 10.65.6.132 NFS 194 V3 LOOKUP Call, DH: 0x00000000/replicate/bipin2 88 21:08:10.526815 10.65.6.132 → 10.10.188.199 NFS 90 V3 LOOKUP Reply (Call In 84) Error: NFS3ERR_NOENT 91 21:08:10.527892 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP 101 21:08:10.753877 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 91) PROGRAM_NOT_AVAILABLE 114 21:08:10.981222 10.10.188.199 → 10.65.6.132 NFS 194 V3 LOOKUP Call, DH: 0x00000000/replicate/bipin2 119 21:08:11.206717 10.65.6.132 → 10.10.188.199 NFS 90 V3 LOOKUP Reply (Call In 114) Error: NFS3ERR_NOENT 122 21:08:11.207641 10.10.188.199 → 10.65.6.132 Portmap 98 V2 GETPORT Call MOUNT(100005) V:3 UDP 127 21:08:11.432815 10.65.6.132 → 10.10.188.199 Portmap 70 V2 GETPORT Reply (Call In 122) PROGRAM_NOT_AVAILABLE 548 21:08:27.230009 10.10.188.199 → 10.65.6.132 NFS 190 V3 LOOKUP Call, DH: 0x00000000/replicate 553 21:08:27.459050 10.65.6.132 → 10.10.188.199 NFS 234 V3 LOOKUP Reply (Call In 548), FH: 0xbca72b2a If I enable "nfs.mount-udp" option, subdir mount works fine. But there is an issue even after the successful mount. First GETATTR post mount results in NFS server panic. We need to fix that first I think. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f8cd0a63700 (LWP 8064)] 0x00007f8cd9ec5a2d in nfs3_getattr_resume (carg=0x7f8cd169806c) at nfs3.c:847 847 if (ictx->generation != priv->generation) { (gdb) bt #0 0x00007f8cd9ec5a2d in nfs3_getattr_resume (carg=0x7f8cd169806c) at nfs3.c:847 #1 0x00007f8cd9eccb63 in nfs3_fh_resolve_inode_done (cs=0x7f8cd169806c, inode=<value optimized out>) at nfs3-helpers.c:3624 #2 0x00007f8cd9ecccab in nfs3_fh_resolve_inode (cs=0x7f8cd169806c) at nfs3-helpers.c:3831 #3 0x00007f8cd9eccd35 in nfs3_fh_resolve_resume (cs=0x7f8cd169806c) at nfs3-helpers.c:3863 #4 0x00007f8cd9ecce88 in nfs3_fh_resolve_root (cs=0x7f8cd169806c) at nfs3-helpers.c:3918 #5 0x00007f8cd9ec52a3 in nfs3_getattr (req=0x7f8cd97be06c, fh=0x7f8cd0a62b20) at nfs3.c:899 #6 0x00007f8cd9ec56cd in nfs3svc_getattr (req=0x7f8cd97be06c) at nfs3.c:935 #7 0x00007f8ce7f151a4 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x7f8cc4001810) at rpcsvc.c:705 #8 0x00007f8ce7f153a3 in rpcsvc_notify (trans=0x7f8cd4108060, mydata=<value optimized out>, event=<value optimized out>, data=0x7f8cc4001810) at rpcsvc.c:799 ---Type <return> to continue, or q <return> to quit--- #9 0x00007f8ce7f16af8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:546 #10 0x00007f8cdcc967ed in socket_event_poll_in (this=0x7f8cd4108060) at socket.c:2353 #11 0x00007f8cdcc97abd in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f8cd4108060, poll_in=1, poll_out=0, poll_err=0) at socket.c:2466 #12 0x00007f8ce81c0080 in event_dispatch_epoll_handler (data=0x7f8cd4068be0) at event-epoll.c:575 #13 event_dispatch_epoll_worker (data=0x7f8cd4068be0) at event-epoll.c:678 #14 0x00007f8ce7236aa1 in start_thread (arg=0x7f8cd0a63700) at pthread_create.c:301 #15 0x00007f8ce6b9faad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 (gdb) l 842 * haven't been validated since the state change are affected. 
843 */ 844 if (inode_ctx_get(cs->resolvedloc.inode,cs->nfsx,&raw_ctx) == 0) { 845 ictx = (struct nfs_inode_ctx *)raw_ctx; 846 priv = cs->nfsx->private; 847 if (ictx->generation != priv->generation) { 848 ret = nfs_lookup (cs->nfsx, cs->vol, &nfu, 849 &cs->resolvedloc, 850 nfs3svc_getattr_lookup_cbk, cs); 851 goto check_err; (gdb) p ictx $1 = (struct nfs_inode_ctx *) 0x2 (gdb) p cs->resolvedloc.inode $2 = (inode_t *) 0x7f8cd329c108 (gdb) p raw_ctx $3 = 2 (gdb) Maybe we are not setting proper inode_ctx while creating inode in udp mount. Hello Alok, Please have a look at C #25. This is reply from customer. I would request you to get this prioritized, at least we need a timeline for this. Thanks, Bipin Kunal Created attachment 1213450 [details] Script to disable MOUNT/tcp on a Gluster/NFS server After running this script on a Gluster/NFS server, the MOUNT protocol can only be used over UDP (not TCP as standard). This might make it possible to script a test-case and not rely on the Solaris 10 NFS-client. In combination with a hacked nfsshell (running on the NFS-client), it is possible to get the same tcpdump from a Linux system. This also reproduces the segfault in Gluster/NFS. 1. get the sources from https://github.com/nixpanic/nfsshell/tree/wip/stat 2. run 'make' to compile 'nfsshell' 3. run nfssshell like: $ ./nfsshell nfs> host server.example.com nfs> mount volume/subdir nfs> stat . 4. observe Gluster/NFS segfault in /var/log/gluster/nfs.log After restarting the Gluster/NFS service, it sometimes fails to register at rpcbind (because of the segfault it did not unregister). Subsequent mounting then fails because there is no MOUNT service on UDP available. A next restart of Gluster/NFS seems to 'correct' this. The problem is most likely caused by the inode context that gets set in libgfapi through glfs_resolve_component() and glfs_resolve_at() (called from xlators/nfs/server/src/mount3.c:__mnt3udp_get_export_subdir_inode). gfapi is not involved in the rest of the revalidation in this case, and the inode context is not used afterwards. glfs_resolve_at() is an internal function, and could get a parameter that prevents calling inode_ctx_set() in glfs_resolve_component(). I'll try to experiment a little with that and see how that works out. Created attachment 1213554 [details] Fix for segfault when doing subdir mounts over UDP This patch seems to fix the segfault for me when reproducing with nfsshell (see comment #27). I have not decided yet if this approach is acceptable for merging upstream, and have not sent it to review.gluster.org either. Please test this on a RHGS-3.1 build with a Solaris client, thanks! Niels, I tried using the patch given by you on top of 3.1 and 3.1.1 but I get hunk issue for both. Could you please re verify the RHGS/gluster version to be used for this patch. Created attachment 1214391 [details]
Rebased patch on top of RHGS-3.1.3
Please try this one. The other version was written on top of the upstream master branch.
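For context on what the attached fix addresses: the segfault analysed in the gdb session above happens because the value stored in the inode context (raw_ctx == 2) is not a pointer the NFS xlator ever set, so casting it to struct nfs_inode_ctx * and dereferencing it faults. The snippet below is only an illustration of that hazard, written as a guarded version of the quoted nfs3_getattr_resume() block; it is not the attached patch (which instead avoids setting the context from gfapi), and it assumes the GlusterFS-internal types and calls visible in the gdb output (inode_ctx_get, struct nfs_inode_ctx, nfs_lookup), so treat it as a sketch rather than working code.

        uint64_t               raw_ctx = 0;
        struct nfs_inode_ctx  *ictx    = NULL;
        struct nfs_state      *priv    = NULL;   /* type assumed for cs->nfsx->private */

        if (inode_ctx_get (cs->resolvedloc.inode, cs->nfsx, &raw_ctx) == 0) {
                ictx = (struct nfs_inode_ctx *)(uintptr_t)raw_ctx;
                priv = cs->nfsx->private;
                /* After a UDP subdir mount, raw_ctx was observed to be 2, i.e. a
                 * value gfapi stored rather than a pointer set by the NFS xlator;
                 * blindly dereferencing it is what crashes.  A guard would at
                 * least have to check that the value looks like a pointer: */
                if (ictx != NULL && ((uintptr_t)ictx & 0x7) == 0 &&
                    ictx->generation != priv->generation) {
                        /* inode not revalidated since the last graph change:
                         * redo the lookup before replying, as the original code does */
                        ret = nfs_lookup (cs->nfsx, cs->vol, &nfu, &cs->resolvedloc,
                                          nfs3svc_getattr_lookup_cbk, cs);
                        goto check_err;
                }
        }

The actual fix described above goes the other way: stop glfs_resolve_component() from planting its value in the first place, so the NFS xlator never sees a foreign context value.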
Thanks, Niels. Here is the scratch build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11964476

I tested the patch with the Solaris client and was able to perform the subdir mount successfully without any crash; a restart of the process was not even needed.

# mount -o proto=tcp,vers=3 nfs://dhcp6-132.gsslab.pnq.redhat.com:38465/replicate/bipin2/test/ /bipin/mount1
# cd /bipin/mount1
# ls
lzhdf
# echo "zkhgczd" >> lzhdf
# cat lzhdf
zkhgczd
#

Although the subdir mount is working fine, nfs.log does report an error at mount time:

[2016-10-27 10:50:32.160677] E [nfs3.c:341:__nfs3_get_volume_id] (-->/usr/lib64/glusterfs/3.7.9/xlator/nfs/server.so(nfs3_lookup_reply+0x4d) [0x7f1ea30416ed] -->/usr/lib64/glusterfs/3.7.9/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x78) [0x7f1ea30401f8] -->/usr/lib64/glusterfs/3.7.9/xlator/nfs/server.so(__nfs3_get_volume_id+0xae) [0x7f1ea304013e] ) 0-nfs-nfsv3: invalid argument: xl [Invalid argument]

Niels, what is your opinion on the error appearing in nfs.log at mount time? See C #34. Thanks, Bipin

10:49 < ndevos> bkunal: hmm, does that happen on each time of the mounting, or only the first time (before any I/O was done over the mount)?
10:49 < bkunal> ndevos, yes
10:49 < bkunal> ndevos, everytime
10:49 < ndevos> well, thats ugly :-/
10:50 < bkunal> ndevos, I have solaris setup with me. that can be seen easily
10:55 < ndevos> bkunal: can you enable log-level=TRACE for the gluster/nfs and mount again? I would like to see if nfs_mntpath_to_xlator() then logs the "Subvolume search"
10:56 < bkunal> ndevos, sure, Anything else ?
10:56 < ndevos> bkunal: easiest to do so is check the whole glusterfs command in ps, kill it and start it manually with the --log-level=TRACE option
10:56 < ndevos> bkunal: nope, thats all for now

Created attachment 1225603 [details]
successful mount
Created attachment 1225604 [details]
unsuccessful mount
Both logs have the "E [nfs3.c:341:__nfs3_get_volume_id]" error, so it does not seem to be critical, or related to the mount failures. The logic in nfs_mntpath_to_xlator() does not seem to take subdirectory mounts into account. 73 xlator_t * 74 nfs_mntpath_to_xlator (xlator_list_t *cl, char *path) 75 { 76 char *volname = NULL; 77 char *volptr = NULL; 78 size_t pathlen; 79 xlator_t *targetxl = NULL; 80 81 if ((!cl) || (!path)) 82 return NULL; 83 84 volname = strdupa (path); 85 pathlen = strlen (volname); 86 gf_msg_trace (GF_NFS, 0, "Subvolume search: %s", path); 87 if (volname[0] == '/') 88 volptr = &volname[1]; 89 else 90 volptr = &volname[0]; 91 92 if (pathlen && volname[pathlen - 1] == '/') 93 volname[pathlen - 1] = '\0'; 94 95 while (cl) { 96 if (strcmp (volptr, cl->xlator->name) == 0) { 97 targetxl = cl->xlator; 98 break; 99 } 100 101 cl = cl->next; 102 } 103 104 return targetxl; 105 106 } The above while-loop needs to get the volname as input, not the volname/subdir as it is currently. The logs clearly contain Subvolume search: replicate/bipin2/ Splitting the volname string at the first /, instead of dropping the last / should fix this (modify line 92/93). Bipin, do you want to send a patch for this? Sure, Niels. I will try and get back to you. I will leave needinfo on me. Thanks again. Hi Niels, I have made some changes in the function "nfs_mntpath_to_xlator" as per your instruction. I have not yet tested my patch. Please have a look below and let me know if you think this will work. xlator_t * nfs_mntpath_to_xlator (xlator_list_t *cl, char *path) { char *volname = NULL; char *volptr = NULL; size_t pathlen; xlator_t *targetxl = NULL; int i; if ((!cl) || (!path)) return NULL; volname = strdupa (path); pathlen = strlen (volname); gf_msg_trace (GF_NFS, 0, "Subvolume search: %s", path); if (volname[0] == '/') volptr = &volname[1]; else volptr = &volname[0]; -> for (i=0; (pathlen - 1); i++ ) { -> if(volname[i] == '/' && i!=0) { -> break; -> } -> } -> -> if (pathlen && volname[i] == '/') -> volname[i] = '\0'; while (cl) { if (strcmp (volptr, cl->xlator->name) == 0) { targetxl = cl->xlator; break; } cl = cl->next; } return targetxl; } Created attachment 1243538 [details]
bz-1372283-nfs_mntpath_to_xlator
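To make the suggestion above concrete, here is a minimal, self-contained sketch of the string handling: take everything before the first '/' as the volume name, so "replicate/bipin2/" matches the export "replicate". The helper and variable names are illustrative only; this is not the upstream patch.

#include <stdio.h>
#include <string.h>

/* Copy the leading volume component of `path` into `volname` (size `len`). */
static void
mntpath_volume_name (const char *path, char *volname, size_t len)
{
        const char *start = (path[0] == '/') ? path + 1 : path;  /* skip a leading '/' */
        const char *slash = strchr (start, '/');                 /* first '/' after the volume name */
        size_t n = slash ? (size_t)(slash - start) : strlen (start);

        if (n >= len)
                n = len - 1;
        memcpy (volname, start, n);
        volname[n] = '\0';
}

int
main (void)
{
        const char *paths[] = { "replicate/bipin2/", "/replicate/bipin2", "replicate" };
        char vol[256];

        for (int i = 0; i < 3; i++) {
                mntpath_volume_name (paths[i], vol, sizeof (vol));
                printf ("%-20s -> volume '%s'\n", paths[i], vol);
        }
        return 0;
}

In nfs_mntpath_to_xlator() the while-loop would then compare this leading component against cl->xlator->name, and the remaining "/bipin2" part is left to be resolved as the subdirectory.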
Niels, Did you get time to review the attached patch in C #47? Thanks, Bipin Kunal

I did reply by email with some inline comments a while back. Please let me know if you cannot find the mail and I'll dig it up again.

Bipin, Can you please post the upstream patch for the proposed fix and attach the link to this bug.

Created attachment 1250881 [details]
Partial patch to extend the functionality and correctness
Bipin and I discussed over IRC the additional failures that he got when doing the right thing in nfs_mntpath_to_xlator(). This change (tries to) resolve the subdir in the right volume, and sets the filehandle to the correct GFID.
Addressing the TODO will be needed, and Bipin can include it all in his change.
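A conceptual sketch of the "set the filehandle to the correct GFID" step mentioned above: the helper name is invented for illustration, and the field names follow the filehandle layout visible in the nfs3 log messages (exportid, gfid, mountid), so treat this as a sketch rather than the attached partial patch.

/* Conceptual sketch only (not the attached patch): once the subdir has been
 * resolved inside the right volume, the reply filehandle should carry the
 * resolved inode's GFID rather than the all-zero handle the Solaris/WebNFS
 * LOOKUP arrived with. */
static int
nfs3_fill_subdir_fh (struct nfs3_fh *fh, uuid_t exportid, inode_t *subdir_inode)
{
        if (!fh || !subdir_inode)
                return -EINVAL;

        gf_uuid_copy (fh->exportid, exportid);          /* which exported volume */
        gf_uuid_copy (fh->gfid, subdir_inode->gfid);    /* point at the subdir, not the volume root */
        /* fh->ident[] still has to be stamped so nfs3_fh_validate() accepts it. */
        return 0;
}

The important point is that the funged filehandle must identify the subdirectory's inode, so that the GETATTR/LOOKUP calls that follow the mount resolve inside the subdir instead of failing against the volume root.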
Created attachment 1256460 [details]
patch
from nfs.log : [2017-02-22 13:38:51.714383] D [MSGID: 0] [nfs.c:458:nfs_subvolume_set_started] 0-nfs: Starting up: replicate , vols started till now: 1 [2017-02-22 13:45:00.279608] D [rpc-clnt-ping.c:300:rpc_clnt_start_ping] 0-replicate-client-0: returning as transport is already disconnected OR there are no frames (0 || 0) [2017-02-22 13:45:00.279658] D [rpc-clnt-ping.c:300:rpc_clnt_start_ping] 0-replicate-client-1: returning as transport is already disconnected OR there are no frames (0 || 0) [2017-02-22 13:45:00.279668] D [logging.c:1830:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present [2017-02-22 13:45:00.279685] D [logging.c:1792:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [2017-02-22 13:46:00.112932] D [MSGID: 0] [nfs3-helpers.c:1665:nfs3_log_fh_entry_call] 0-nfs-nfsv3: XID: 58afdea4, LOOKUP: args: FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000, name: replicate/bipin1 [2017-02-22 13:47:04.341267] D [logging.c:1830:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present [2017-02-22 13:47:04.341329] D [logging.c:1792:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [2017-02-22 13:47:04.341503] D [MSGID: 0] [nfs3-helpers.c:1665:nfs3_log_fh_entry_call] 0-nfs-nfsv3: XID: 58a6b2ac, LOOKUP: args: FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000, name: replicate/bipin1 [2017-02-22 13:47:06.328219] D [MSGID: 0] [nfs-common.c:110:nfs_mntpath_to_xlator] 0-nfs: Volptr : replicate and cl->xlator->name : replicate [2017-02-22 13:53:37.123540] D [logging.c:1830:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present [2017-02-22 13:53:37.123585] D [logging.c:1792:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [2017-02-22 13:53:37.123664] D [MSGID: 0] [nfs-common.c:110:nfs_mntpath_to_xlator] 0-nfs: Volptr : replicate and cl->xlator->name : replicate [2017-02-22 13:53:37.123720] D [logging.c:1766:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. 
About to flush 1 extra log messages
[2017-02-22 13:53:37.123743] D [logging.c:1769:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 1 extra log messages

pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2017-02-22 13:53:37
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.9
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb8)[0x7fd877aa8b78]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x7fd877ad1bef]
/lib64/libc.so.6(+0x335ce32660)[0x7fd876445660]
/usr/lib64/glusterfs/3.7.9/xlator/nfs/server.so(nfs3_lookup+0x28c)[0x7fd8698200cc]
/usr/lib64/glusterfs/3.7.9/xlator/nfs/server.so(nfs3svc_lookup+0x96)[0x7fd869820716]
/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x314)[0x7fd8778711a4]
/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103)[0x7fd8778713a3]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7fd877872af8]
/usr/lib64/glusterfs/3.7.9/rpc-transport/socket.so(+0x97ed)[0x7fd86c5f27ed]
/usr/lib64/glusterfs/3.7.9/rpc-transport/socket.so(+0xaabd)[0x7fd86c5f3abd]
/usr/lib64/libglusterfs.so.0(+0x99080)[0x7fd877b1c080]
/lib64/libpthread.so.0(+0x335d607aa1)[0x7fd876b92aa1]

upstream patch: https://review.gluster.org/#/c/16770/
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/103007/

Tried with previous versions and was able to reproduce the issue. The sub-directory mount on the Solaris 10 client fails as follows:

mount -o proto=tcp,vers=3 nfs://10.70.47.59:/testvol/subdir /mnt/solmount
nfs mount: 10.70.47.59: : RPC: Program not registered
nfs mount: retrying: /mnt/solmount

Then upgraded to gluster build glusterfs-3.8.4-24.el7rhgs.x86_64. Now the sub-directory mount on the Solaris client succeeds:

mount -o proto=tcp,vers=3 nfs://10.70.35.213:/distrep3/subdir /mnt/solmount
# cd /mnt/solmount
# ls
file2 file3 hello j1 j10 j2 j3 j4 j5 j6 j7 j8 j9

Also tried running some I/Os on the sub-directory mount point; they succeed. Will do some more tests before moving this BZ to verified.

Tried running some more I/Os on the sub-directory mount point and it works fine. Moving the BZ to verified. More testing will be done with regression runs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774