Bug 1858586 - autofs share doesn't mount when using nobind over RDMA where nfs-server and nfs-client are the same systems. [rhel-7.9.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: autofs
Version: 7.7
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ian Kent
QA Contact: Kun Wang
URL:
Whiteboard:
Depends On:
Blocks: 1822123 1858742
Reported: 2020-07-19 10:38 UTC by Achilles Gaikwad
Modified: 2024-03-25 16:11 UTC (History)

Fixed In Version: autofs-5.0.7-115
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1858742
Environment:
Last Closed: 2020-12-15 11:18:02 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch - mount_nfs.c fix local rdma share not mounting (1.35 KB, patch)
2020-07-20 02:21 UTC, Ian Kent

Description Achilles Gaikwad 2020-07-19 10:38:43 UTC
Description of problem:

    When the same system acts as both nfs-server and nfs-client and the
    `nobind` option is used for autofs, we fall through to the code where
    we let `mount.nfs(8)` handle the mount. However, because the
    nfs-server and the nfs-client are the same system, we end up calling
    `rpc_ping`, which gives a negative return code. Because of this we jump
    to the label next: and never attempt to mount the nfs share. (Please check
    the debug logs in the `Actual results:` section below.)
    The patch below fixes this bug by not probing with rpc_ping when
    rdma is in use.

Environment:

nfs-server and nfs-client are the same system. We're mounting the share locally via RDMA.

Version-Release number of selected component (if applicable):

RHEL 7 : 

3.10.0-1127.13.1.el7.x86_64
autofs-5.0.7-109.el7.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Have an IB interface via which you'll attempt the mount of nfs-share.

~~~
# ip -4 addr show ib0 
7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    inet 192.168.1.69/24 brd 192.168.1.255 scope global ib0
       valid_lft forever preferred_lft forever
~~~

2. Set up the nfs-server.

o Have the configuration for rdma enabled:
~~~
# cat /etc/sysconfig/nfs | grep -v "#" 
RPCNFSDARGS="--rdma=20049"
RPCMOUNTDOPTS=""
STATDARG=""
SMNOTIFYARGS=""
RPCIDMAPDARGS=""
RPCGSSDARGS=""
GSS_USE_PROXY="yes"
BLKMAPDARGS=""
~~~

o create a directory to export:

~~~
# mkdir -p /export/home
# mkdir /mnt2
# chmod a+rwx /export
~~~

o Your /etc/exports should look like this:

~~~
# cat /etc/exports
/export *(rw,insecure,no_root_squash)
~~~

o Start the nfs-server and make sure it's running:

~~~
# systemctl restart nfs-server ; cat /proc/fs/nfsd/portlist 
rdma 20049
rdma 20049
udp 2049
tcp 2049
udp 2049
tcp 2049
~~~

o Disable firewall.

~~~
# systemctl disable --now firewalld
~~~

3. Attempt to manually mount the nfs-share. Once it is mounted, unmount it. This step is optional; it only verifies that the nfs-share mounts manually without any problems.

~~~
# mount -t nfs 192.168.1.69:/export /mnt -o proto=rdma,port=20049 -vvv
mount.nfs: timeout set for Sun Jul 19 15:42:27 2020
mount.nfs: trying text-based options 'proto=rdma,port=20049,vers=4.1,addr=192.168.1.69,clientaddr=192.168.1.69'
# df -hT /mnt 
Filesystem           Type  Size  Used Avail Use% Mounted on
192.168.1.69:/export nfs4   70G   20G   50G  29% /mnt
# cat /proc/mounts | grep rdma
192.168.1.69:/export /mnt nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=rdma,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.69,local_lock=none,addr=192.168.1.69 0 0
~~~

o Unmount the share:

~~~
# umount /mnt
~~~

4. Add the following autofs configuration:

o Master file:
~~~
# cat /etc/auto.master 
/mnt2 /etc/auto.mnt nobind 
+auto.master
~~~

o Map file:
~~~
# cat  /etc/auto.mnt
test -fstype=nfs,proto=rdma,port=20049 nfs-server:/export
~~~

o Start autofs in foreground with debugging enabled in one terminal:
~~~
# automount -fdv
~~~

o Attempt `ls` on /mnt2/test

~~~
# ls /mnt2/test
~~~

Actual results:

o The share isn't mounted. The following debug logs are seen:

~~~
bind mounts disabled
handle_packet: type = 3
handle_packet_missing_indirect: token 58, name test, request pid 40185
attempting to mount entry /mnt2/test
lookup_mount: lookup(file): looking up test
lookup_mount: lookup(file): test -> -fstype=nfs,proto=rdma,port=20049 nfs-server:/export
parse_mount: parse(sun): expanded entry: -fstype=nfs,proto=rdma,port=20049 nfs-server:/export
parse_mount: parse(sun): gathered options: fstype=nfs,proto=rdma,port=20049
parse_mount: parse(sun): dequote("nfs-server:/export") -> nfs-server:/export
parse_mount: parse(sun): core of entry: options=fstype=nfs,proto=rdma,port=20049, loc=nfs-server:/export
sun_mount: parse(sun): mounting root /mnt2, mountpoint test, what nfs-server:/export, fstype nfs, options proto=rdma,port=20049
mount(nfs): root=/mnt2 name=test what=nfs-server:/export, fstype=nfs, options=proto=rdma,port=20049
mount(nfs): nfs options="proto=rdma,port=20049", nobind=32, nosymlink=0, ro=0
mount_mount: mount(nfs): calling mkdir_path /mnt2/test
mount(nfs): nfs: mount failure nfs-server:/export on /mnt2/test
dev_ioctl_send_fail: token = 58
failed to mount /mnt2/test
~~~
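The log above shows the option string `proto=rdma,port=20049` being handed to mount(nfs). As a minimal sketch of how such a comma-separated string could be scanned for an exact `proto=rdma` option, assuming a hypothetical helper name `opts_request_rdma` (this is an illustration, not the actual autofs parsing code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: returns true if the
 * comma-separated option string contains the exact option "proto=rdma".
 */
bool opts_request_rdma(const char *opts)
{
    static const char want[] = "proto=rdma";
    const size_t want_len = sizeof(want) - 1;

    assert(opts != NULL);
    while (*opts) {
        size_t len = strcspn(opts, ",");   /* length of the current option */

        if (len == want_len && strncmp(opts, want, len) == 0)
            return true;
        opts += len;
        if (*opts == ',')
            opts++;                        /* step over the separator */
    }
    return false;
}
```

Matching on the full option length avoids treating a longer option that merely starts with `proto=rdma` as a match.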

Expected results:

The share should be mounted:
~~~
# ls -l /mnt2/test/
total 4
drwxrwxrwx. 28 root root 4096 Jul 18 14:19 home
~~~

o Autofs debug logs when the mount works (see Additional info):

~~~
# automount -fdv 
Starting automounter version 5.1.6, master map auto.master
using kernel protocol version 5.05
:::
bind mounts disabled
handle_packet: type = 3
handle_packet_missing_indirect: token 59, name test, request pid 40345
attempting to mount entry /mnt2/test
lookup_mount: lookup(file): looking up test
lookup_mount: lookup(file): test -> -fstype=nfs,proto=rdma,port=20049 nfs-server:/export
parse_mount: parse(sun): expanded entry: -fstype=nfs,proto=rdma,port=20049 nfs-server:/export
parse_mount: parse(sun): gathered options: fstype=nfs,proto=rdma,port=20049
parse_mount: parse(sun): dequote("nfs-server:/export") -> nfs-server:/export
parse_mount: parse(sun): core of entry: options=fstype=nfs,proto=rdma,port=20049, loc=nfs-server:/export
sun_mount: parse(sun): mounting root /mnt2, mountpoint test, what nfs-server:/export, fstype nfs, options proto=rdma,port=20049
mount(nfs): root=/mnt2 name=test what=nfs-server:/export, fstype=nfs, options=proto=rdma,port=20049
mount(nfs): nfs options="proto=rdma,port=20049", nobind=32, nosymlink=0, ro=0
mount_mount: mount(nfs): calling mkdir_path /mnt2/test
mount(nfs): calling mount -t nfs -s -o proto=rdma,port=20049 nfs-server:/export /mnt2/test
spawn_mount: mtab link detected, passing -n to mount
mount_mount: mount(nfs): mounted nfs-server:/export on /mnt2/test
dev_ioctl_send_ready: token = 59
mounted /mnt2/test
~~~




Additional info:

The debug output in `Expected results:` was captured after compiling upstream autofs on RHEL7 with a patched `modules/mount_nfs.c`.
The following patch was applied:

~~~
diff --git a/modules/mount_nfs.c b/modules/mount_nfs.c
index 4e3e703..5a8c3bf 100644
--- a/modules/mount_nfs.c
+++ b/modules/mount_nfs.c
@@ -375,9 +375,13 @@ dont_probe:
                 */
                if (this->proximity == PROXIMITY_LOCAL) {
                        char *host = this->name ? this->name : "localhost";
-                       int ret;
-
-                       ret = rpc_ping(host, port, vers, 2, 0, RPC_CLOSE_DEFAULT);
+                       /* If we're using RDMA, rpc_ping will fail
+                        * when nfs-server is local.
+                        * Therefore, don't probe when we're using RDMA
+                        */
+                       int ret = 1;
+                       if(!rdma)
+                               ret = rpc_ping(host, port, vers, 2, 0, RPC_CLOSE_DEFAULT);
                        if (ret <= 0)
                                goto next;
                }
~~~

This patch has also been sent upstream for review. Please backport it, as it fixes the issue.
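The effect of the patch can be sketched in isolation. The following is a simplified model, not the autofs code itself: `skip_local_host`, `ping_fn`, `ping_fails` and `ping_ok` are hypothetical names, and the ping callback stands in for `rpc_ping()` with its extra arguments dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for rpc_ping(): a value > 0 means the server answered. */
typedef int (*ping_fn)(const char *host);

/* Example stubs, used only to exercise the sketch. */
int ping_fails(const char *host) { (void)host; return -1; }
int ping_ok(const char *host)    { (void)host; return 1; }

/*
 * Returns true when the local-server availability check says
 * "skip this host". Mirrors the patched logic: with RDMA the
 * ping is not attempted at all, so the host is never skipped.
 */
bool skip_local_host(bool is_local, bool rdma, const char *host, ping_fn ping)
{
    int ret = 1;   /* assume reachable unless a ping says otherwise */

    assert(ping != NULL);
    if (!is_local)
        return false;   /* the check only applies to a local server */

    if (!rdma)
        ret = ping(host ? host : "localhost");

    return ret <= 0;
}
```

With a failing ping, `skip_local_host(true, false, "nfs-server", ping_fails)` is true (the pre-patch behaviour for any protocol), while `skip_local_host(true, true, "nfs-server", ping_fails)` is false, so the mount is left to mount.nfs(8).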

Root cause analysis:

When using rdma and a local nfs-server, we reach this code section:
~~~
218         }
219         /*
220          * We can't probe protocol rdma so leave it to mount.nfs(8)
221          * and and suffer the delay if a server isn't available.
222          */
223         if (rdma)
224                 goto dont_probe;
225 
~~~

From here we go to the `if` condition on line 376, where rpc_ping() returns a negative value for some reason; I did not investigate rpc_ping further. Because the return value is negative, the condition on line 381 is TRUE, and we goto next:
~~~
263 dont_probe:
:::
372                 /* If this is a fallback from a bind mount failure
373                  * check if the local NFS server is available to try
374                  * and prevent lengthy mount failure waits.
375                  */
376                 if (this->proximity == PROXIMITY_LOCAL) {
377                         char *host = this->name ? this->name : "localhost";
378                         int ret;
379 
380                         ret = rpc_ping(host, port, vers, 2, 0, RPC_CLOSE_DEFAULT);
381                         if (ret <= 0)
382                                 goto next;
383                 }
384 
:::
~~~

When we goto `next`, the following piece of code is executed. Notice that we don't return at next:. With no further host to try, the loop ends and we fall through to forced_fail:, where the failure is logged on line 418.
~~~
408 next:
409                 free(loc);
410                 this = this->next;
411         }
412 
413 forced_fail:
414         free_host_list(&hosts);
415 
416         /* If we get here we've failed to complete the mount */
417 
418         info(ap->logopt, MODPREFIX "nfs: mount failure %s on %s", what, fullpath);
419 
420         if (ap->type != LKP_INDIRECT)
421                 return 1;
422 
423         if ((!(ap->flags & MOUNT_FLAG_GHOST) && name_len) || !existed)
424                 rmdir_path(ap, fullpath, ap->dev);
425 
426         return 1;
427 }
~~~

Therefore, we never attempt a mount of the nfs-share which we delegated to mount.nfs(8) earlier.
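The fall-through can be reproduced with a stripped-down model of the host loop. This is a sketch under assumptions: `struct host`, its `reachable` field and `try_hosts` are hypothetical simplifications, not the real autofs structures, and the mount attempt is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified host entry; not the real autofs struct. */
struct host {
    const char *name;
    int reachable;          /* result the availability probe would report */
    struct host *next;
};

/*
 * Walk the host list the way the excerpt does: a host that fails the
 * probe is skipped via "next", and once the list is exhausted control
 * reaches the failure path without a mount ever being attempted.
 * Returns 0 on success and 1 on failure; *attempts counts mount attempts.
 */
int try_hosts(struct host *cur, int *attempts)
{
    assert(attempts != NULL);
    *attempts = 0;

    while (cur) {
        if (cur->reachable <= 0)
            goto next;      /* probe failed: skip this host */

        (*attempts)++;      /* stand-in for spawning mount.nfs(8) */
        return 0;           /* pretend the mount succeeded */
next:
        cur = cur->next;
    }

    /* forced_fail equivalent: no host was mounted */
    return 1;
}
```

With a single local host whose probe reports failure, `try_hosts` returns 1 and leaves the attempt counter at 0, which matches the "mount failure" log line appearing without any preceding "calling mount" line.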

- The issue is a userspace issue
- The issue is reproducible on upstream autofs
- The patch provided above applies to upstream autofs
- The issue is not reproducible if the nfs-server is a remote (i.e. non-local) system.

I had to add a lot of prints to the code to make sure that we're falling through the labels and therefore not mounting the nfs-share. :)

Comment 4 Ian Kent 2020-07-20 02:21:20 UTC
Created attachment 1701697 [details]
Patch - mount_nfs.c fix local rdma share not mounting

Comment 28 errata-xmlrpc 2020-12-15 11:18:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (autofs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5438

