| Summary: | NFS fails to mount on boot if both client and server were rebooted at the same time [rhel7] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yongcheng Yang <yoyang> |
| Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED ERRATA | QA Contact: | Yongcheng Yang <yoyang> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.3 | CC: | eguan, steved, yoyang |
| Target Milestone: | rc | Keywords: | Patch, Reproducer |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | nfs-utils-1.3.0-0.47.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1352856 | Environment: | |
| Last Closed: | 2017-08-01 19:48:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 1352856 | ||
| Bug Blocks: | |||
|
Description
Yongcheng Yang
2016-12-13 06:37:28 UTC
According to Bug 1352856 Comment 10, we need the following upstream commit.
commit df0b99980d74505299e9289c2ccddd03a48b664f
Author: NeilBrown <neilb>
Date: Sat Aug 20 10:39:52 2016 -0400
mount: RPC_PROGNOTREGISTERED should not be a permanent error
We also need
commit 0935cebc1e130c6adfd870c88a6493277c84d47f
Author: Chuck Lever <chuck.lever>
Date: Fri Mar 19 16:14:26 2010 -0400
mount: Mount should retry unreachable hosts
and
commit 6a060231b029aa6b7a0af4fa69c84603f9f663dd
Author: NeilBrown <neilb>
Date: Tue Dec 6 10:50:06 2016 -0500
mount: take history into account when assessing if an error is permanent.
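The three commits listed above share one idea: while the retry window is still open, failures such as "program not registered" or an unreachable host should be treated as temporary and retried rather than reported at once. The sketch below shows that behavior in shell. It is only an illustration: `retry_until_deadline` is a hypothetical helper, not part of nfs-utils, and the pause between attempts is an assumed parameter.

```shell
#!/bin/bash
# Hypothetical helper (not part of nfs-utils): rerun a command until it
# succeeds or the retry window closes, treating every failure as temporary.
# Usage: retry_until_deadline WINDOW_SECONDS PAUSE_SECONDS COMMAND [ARGS...]
retry_until_deadline() {
    local window=$1 pause=$2
    shift 2
    local deadline status
    deadline=$(( $(date +%s) + window ))
    while true; do
        if "$@"; then
            return 0
        else
            status=$?
        fi
        # Window closed: give up and report the last error seen.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return "$status"
        fi
        sleep "$pause"
    done
}
```

For example, `retry_until_deadline 60 5 some_command` keeps trying `some_command` for about 60 seconds. The point of the sketch is that the command always gets at least one attempt, and an error only becomes final once the window has expired.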
Seems this issue still exists in nfs-utils-1.3.0-0.40.el7

[root ~]# service nfs stop
Redirecting to /bin/systemctl stop nfs.service
[root@ ~]# mount $HOSTNAME:/export_test/ /mnt/mnt_test/ -o retry=10 &
[1] 30175
[root@ ~]# mount.nfs: requested NFS version or transport protocol is not supported
[1]+ Exit 32 mount $HOSTNAME:/export_test/ /mnt/mnt_test/ -o retry=10
[root@ ~]#
[root@ ~]# mount $HOSTNAME:/export_test/ /mnt/mnt_test/ -o retry=100
mount.nfs: requested NFS version or transport protocol is not supported
[root@ ~]# rpm -q nfs-utils
nfs-utils-1.3.0-0.40.el7.x86_64
[root@ ~]#

Guessing maybe it was introduced again by the following commit, as rhel6 acts OK while not containing it.

(In reply to Steve Dickson from comment #2)
>
> commit 6a060231b029aa6b7a0af4fa69c84603f9f663dd
> Author: NeilBrown <neilb>
> Date: Tue Dec 6 10:50:06 2016 -0500
>
> mount: take history into account when assessing if an error is permanent.

Testing with the upstream nfs-utils version (i.e. /tmp/mount.nfs):

[root@ ~]# /tmp/mount.nfs $HOSTNAME:/export_test/ /mnt/mnt_test/ -o retry=100
/tmp/mount.nfs: /lib64/libtirpc.so.1: no version information available (required by /tmp/mount.nfs)
mount.nfs: requested NFS version or transport protocol is not supported

(In reply to Yongcheng Yang from comment #4)
> Seems this issue still exists in nfs-utils-1.3.0-0.40.el7
>
> [root ~]# service nfs stop
> Redirecting to /bin/systemctl stop nfs.service
> [root@ ~]# mount $HOSTNAME:/export_test/ /mnt/mnt_test/ -o retry=10 &
> [1] 30175
> [root@ ~]# mount.nfs: requested NFS version or transport protocol is not supported
>
> [1]+ Exit 32 mount $HOSTNAME:/export_test/ /mnt/mnt_test/ -o retry=10
> [root@ ~]#
> [root@ ~]# mount $HOSTNAME:/export_test/ /mnt/mnt_test/ -o retry=100
> mount.nfs: requested NFS version or transport protocol is not supported
> [root@ ~]# rpm -q nfs-utils
> nfs-utils-1.3.0-0.40.el7.x86_64
> [root@ ~]#
>

I'm not seeing this... could you please attach a tshark network trace and do a mount -vvv

(In reply to Steve Dickson from comment #5)
> I'm not seeing this... could you please attach a tshark
> network trace and do a mount -vvv

This can be easily reproduced on a single host, so I'm not collecting the tshark trace.

Just stop the nfs.service and try to mount it. If no retry is specified, the default retry period for foreground mounts should be 2 minutes.

[root@ ~]# systemctl stop nfs
[root@ ~]# date ; mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1 ; date
Wed 3 May 05:30:26 EDT 2017
mount.nfs: requested NFS version or transport protocol is not supported
Wed 3 May 05:30:29 EDT 2017
[root@ ~]#
[root@ ~]# mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1 -vvv
mount.nfs: timeout set for Wed May 3 05:31:44 2017
mount.nfs: trying text-based options 'retry=1,vers=4.1,addr=127.0.0.1,clientaddr=127.0.0.1'
mount.nfs: mount(2): Connection refused
mount.nfs: trying text-based options 'retry=1,addr=127.0.0.1'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: trying text-based options 'retry=1,vers=4.1,addr=127.0.0.1,clientaddr=127.0.0.1'
mount.nfs: mount(2): Connection refused
mount.nfs: trying text-based options 'retry=1,addr=127.0.0.1'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: trying text-based options 'retry=1,vers=4.1,addr=127.0.0.1,clientaddr=127.0.0.1'
mount.nfs: mount(2): Connection refused
mount.nfs: trying text-based options 'retry=1,addr=127.0.0.1'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: requested NFS version or transport protocol is not supported
[root@ ~]#

(In reply to Yongcheng Yang from comment #6)
> (In reply to Steve Dickson from comment #5)
> > I'm not seeing this... could you please attach a tshark
> > network trace and do a mount -vvv
>
> This can be easily reproduced on a single host, so I'm not collecting the
> tshark trace.
>
> Just stop the nfs.service and try to mount it. If no retry is specified, the
> default retry period for foreground mounts should be 2 minutes.
>

Also, when the "retry" option is specified explicitly, the mount still fails immediately.

Just to state some more clarification about this problem.

This bug was cloned from rhel6 to fix the issue mentioned in Bug 1352856 Comment #10. I.e., the mount should not give up before the retry period has expired.

Expected results (logs of rhel6):
----------------------------------
[root@ibm-x3550m3-06 ~]# service nfs stop
Shutting down NFS daemon: [FAILED]
Shutting down NFS mountd: [FAILED]
Shutting down NFS quotas: [FAILED]
[root@ibm-x3550m3-06 ~]# time mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
mount.nfs: Connection timed out

real 1m5.129s <<<<<<<<<<<<< wait for 60 seconds
user 0m0.011s
sys 0m0.025s
[root@ibm-x3550m3-06 ~]#

Actual results:
---------------
[root@hp-dl380pg8-09 ~]# systemctl restart nfs
[root@hp-dl380pg8-09 ~]# exportfs -i *:/export_test/
[root@hp-dl380pg8-09 ~]# mount 127.0.0.1:/export_test/ /mnt/mnt_test/
[root@hp-dl380pg8-09 ~]# umount /mnt/mnt_test/
[root@hp-dl380pg8-09 ~]# systemctl stop nfs
[root@hp-dl380pg8-09 ~]# time mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
mount.nfs: requested NFS version or transport protocol is not supported

real 0m3.010s <<<<<<<<<<<<< give up and failed immediately
user 0m0.000s
sys 0m0.008s
[root@hp-dl380pg8-09 ~]# rpm -q nfs-utils
nfs-utils-1.3.0-0.41.el7.x86_64
[root@hp-dl380pg8-09 ~]#

(In reply to Yongcheng Yang from comment #7)
> (In reply to Yongcheng Yang from comment #6)
> > (In reply to Steve Dickson from comment #5)
> > > I'm not seeing this... could you please attach a tshark
> > > network trace and do a mount -vvv
> >
> > This can be easily reproduced on a single host, so I'm not collecting the
> > tshark trace.
> >
> > Just stop the nfs.service and try to mount it. If no retry is specified, the
> > default retry period for foreground mounts should be 2 minutes.
> >
>
> Also, when the "retry" option is specified explicitly, the mount still fails immediately.
>
>
> Just to state some more clarification about this problem.
>
> This bug was cloned from rhel6 to fix the issue mentioned in Bug 1352856
> Comment #10. I.e., the mount should not give up before the retry period
> has expired.
>
> Expected results (logs of rhel6):
> ----------------------------------
> [root@ibm-x3550m3-06 ~]# service nfs stop
> Shutting down NFS daemon: [FAILED]
> Shutting down NFS mountd: [FAILED]
> Shutting down NFS quotas: [FAILED]
> [root@ibm-x3550m3-06 ~]# time mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
> mount.nfs: Connection timed out
>
> real 1m5.129s <<<<<<<<<<<<< wait for 60 seconds
> user 0m0.011s
> sys 0m0.025s
> [root@ibm-x3550m3-06 ~]#
>
>
> Actual results:
> ---------------
> [root@hp-dl380pg8-09 ~]# systemctl restart nfs
> [root@hp-dl380pg8-09 ~]# exportfs -i *:/export_test/
> [root@hp-dl380pg8-09 ~]# mount 127.0.0.1:/export_test/ /mnt/mnt_test/
> [root@hp-dl380pg8-09 ~]# umount /mnt/mnt_test/
> [root@hp-dl380pg8-09 ~]# systemctl stop nfs
> [root@hp-dl380pg8-09 ~]# time mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
> mount.nfs: requested NFS version or transport protocol is not supported
>
> real 0m3.010s <<<<<<<<<<<<< give up and failed immediately

No... We can't do that because the NFS server may come up in that timeout value... Yes, this is different than RHEL6, but it is what it is.

I guess I'm a bit confused on what the problem is.

Is the problem that RHEL6 times out sooner than RHEL7?

Is the problem that the error message is different between RHEL6 and RHEL7?

Is the problem that the mount is not being retried?

Another question, does the upstream mount work as expected?
If so, here are the commits that are not in RHEL7:
commit 0277815d9509ffc197c27973313f364616245704
Author: Steve Dickson <steved>
Date: Thu May 4 09:50:49 2017 -0400
mount.nfs: Restore errno after v3 mounts on ECONNREFUSED errors
commit 48cdcf68a9209ae239dfc3d1a0b482089ef2cd2a
Author: NeilBrown <neilb>
Date: Wed Feb 15 10:31:28 2017 -0500
mount: call setgroups() before setuid()
commit cf73923358c47238088cbdd0bffdf1b7a4b7d0e7
Author: NeilBrown <neilb>
Date: Tue Dec 6 10:42:22 2016 -0500
mount: don't hide temporary error code on timeout.
^^^^^^ I'm wondering if this is the problem??
commit 2d0683f3843446a479cd9c451ea01e005937eebb
Author: NeilBrown <neilb>
Date: Wed Aug 3 13:13:49 2016 -0400
mount: use a public address for IPv6 callback.
commit 8cd75bc7b179294347f88baa25e12df0461d8f29
Author: NeilBrown <neilb>
Date: Wed Aug 3 13:07:37 2016 -0400
mount: don't treat temporary name resolution failure as permanent
(In reply to Steve Dickson from comment #10)
> I guess I'm a bit confused on what the problem is.
>
> Is the problem that RHEL6 times out sooner than RHEL7?

Nope

>
> Is the problem that the error message is different
> between RHEL6 and RHEL7?

Nope

>
> Is the problem that the mount is not being retried?

Yes, the retry mount doesn't work as expected. Thankfully you've got what I mean at last.

(In reply to Steve Dickson from comment #11)
> Another question, does the upstream mount work as expected?

Yes, I have checked the upstream, and the mount is OK (i.e. as expected). So if time allows, please help to solve this issue.

[root@ ~]# systemctl stop nfs
[root@ ~]# time mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
mount.nfs: requested NFS version or transport protocol is not supported

real 0m3.011s
^^^^^^^^^^^^^^^ No wait even with "retry" specified
user 0m0.003s
sys 0m0.007s

##### Testing with the upstream mount version ####
[root@ ~]# time ./mount.nfs 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
./mount.nfs: /lib64/libtirpc.so.1: no version information available (required by ./mount.nfs)
^^^^^^^^^^^^ please ignore it, still don't know how to remove the warning
mount.nfs: Connection refused

real 1m5.017s
^^^^^^^^^^^^^^^ Wait 60 seconds as expected
user 0m0.003s
sys 0m0.009s
[root@ ~]#

Here is what I'm seeing

f25# rpm -q nfs-utils
nfs-utils-2.1.1-5.rc3.fc25.x86_64

f25# time mount -o retry=1 rhel7srv:/home/tmp /mnt/tmp
mount.nfs: Connection refused

real 1m5.313s
user 0m0.029s
sys 0m0.057s

rhel7# rpm -q nfs-utils
nfs-utils-1.3.0-0.45.el7.x86_64

rhel7# time mount -o retry=1 rhel7srv:/home/tmp /mnt/tmp
mount.nfs: Connection timed out

real 1m5.083s
user 0m0.012s
sys 0m0.032s

(In reply to Steve Dickson from comment #13)

Hi Steve, please try another reproducer below.

[root@hp-dl380eg8-03 ~]# cat background_mount.sh
#!/bin/bash
systemctl stop nfs
for i in `seq 10`; do
    mkdir -p /tmp/mnt_test_${i}
    mount 127.0.0.1:/export_test /tmp/mnt_test_${i} &
done
sleep 2
[root@hp-dl380eg8-03 ~]#
[root@hp-dl380eg8-03 ~]# rpm -q nfs-utils
nfs-utils-1.3.0-0.45.el7.x86_64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ testing within RHEL
[root@hp-dl380eg8-03 ~]# ./background_mount.sh
[root@hp-dl380eg8-03 ~]# mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
mount.nfs: requested NFS version or transport protocol is not supported
[root@hp-dl380eg8-03 ~]# ps aux | grep mount
root 21413 0.0 0.0 112664 964 pts/0 S+ 03:28 0:00 grep --color=auto mount
[root@hp-dl380eg8-03 ~]#

[root@bootp-73-5-211 tmp]# rpm -q nfs-utils
nfs-utils-2.1.1-5.rc3.fc25.x86_64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ testing within Fedora 25
[root@bootp-73-5-211 tmp]# ./background_mount.sh
[root@bootp-73-5-211 tmp]# ps aux | grep mount
root 29540 0.0 0.0 23736 1204 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_1
root 29542 0.0 0.0 23736 1284 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_2
root 29544 0.1 0.0 42996 3876 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_1 -o rw
root 29545 0.1 0.0 42996 3880 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_2 -o rw
root 29546 0.0 0.0 23736 1288 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_3
root 29548 0.0 0.0 42996 3932 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_3 -o rw
root 29549 0.0 0.0 23736 1200 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_4
root 29552 0.0 0.0 23736 1264 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_5
root 29554 0.0 0.0 42996 4028 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_4 -o rw
root 29555 0.0 0.0 23736 1244 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_6
root 29557 0.0 0.0 23736 1260 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_7
root 29558 0.0 0.0 42996 4016 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_6 -o rw
root 29560 0.0 0.0 23736 1288 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_8
root 29562 0.0 0.0 42996 3888 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_7 -o rw
root 29563 0.0 0.0 42996 3996 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_5 -o rw
root 29564 0.0 0.0 42996 4064 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_8 -o rw
root 29565 0.0 0.0 23736 1260 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_9
root 29567 0.0 0.0 23736 1192 pts/1 S 15:26 0:00 mount 127.0.0.1:/export_test /tmp/mnt_test_10
root 29569 0.0 0.0 42996 4000 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_9 -o rw
root 29570 0.0 0.0 42996 4024 pts/1 S 15:26 0:00 /sbin/mount.nfs 127.0.0.1:/export_test /tmp/mnt_test_10 -o rw
root 29574 0.0 0.0 10772 1004 pts/1 R+ 15:26 0:00 grep --color=auto mount
[root@bootp-73-5-211 tmp]# pkill mount
[root@bootp-73-5-211 tmp]# systemctl start nfs

> rhel7# rpm -q nfs-utils
> nfs-utils-1.3.0-0.45.el7.x86_64
>
> rhel7# time mount -o retry=1 rhel7srv:/home/tmp /mnt/tmp
> mount.nfs: Connection timed out
>
> real 1m5.083s
> user 0m0.012s
> sys 0m0.032s

During my testing, I do encounter this situation sometimes. Maybe the failure can be reproduced by just trying a few more times. It is annoying to have to wait 60 seconds.

[root@ ~]# rpm -q nfs-utils
nfs-utils-1.3.0-0.45.el7.x86_64
[root@ ~]# time mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
mount.nfs: Connection timed out

real 1m5.018s
user 0m0.002s
sys 0m0.012s
[root@ ~]# systemctl restart nfs
[root@ ~]# systemctl stop nfs
[root@ ~]# time mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
mount.nfs: requested NFS version or transport protocol is not supported

real 0m3.010s
user 0m0.003s
sys 0m0.005s
[root@ ~]# time mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1
mount.nfs: requested NFS version or transport protocol is not supported

real 0m3.009s
user 0m0.001s
sys 0m0.006s
[root@ ~]#

This commit makes the rhel7 mount act like the upstream mount
commit 0277815d9509ffc197c27973313f364616245704
Author: Steve Dickson <steved>
Date: Thu May 4 09:50:49 2017 -0400
mount.nfs: Restore errno after v3 mounts on ECONNREFUSED errors
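The check used throughout this report is a timing comparison: with the NFS service stopped, a mount with `retry=1` should fail only after roughly 60 seconds, not after 3. A rough way to script that comparison is sketched below. Both function names are hypothetical helpers written for this illustration, and the sketch assumes `retry=N` is counted in minutes, which matches the 60-second waits in the logs.

```shell
#!/bin/bash
# Hypothetical helper: run a command quietly and print how many whole
# seconds it took, regardless of whether it succeeded.
elapsed_seconds() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $(( end - start ))
}

# Hypothetical helper: compare the runtime of a failed mount with the
# retry window (assumed to be retry_minutes * 60 seconds).
classify_retry() {
    local elapsed=$1 retry_minutes=$2
    if [ "$elapsed" -ge $(( retry_minutes * 60 )) ]; then
        echo "retried"
    else
        echo "gave-up-early"
    fi
}
```

With the server stopped, something like `classify_retry "$(elapsed_seconds mount 127.0.0.1:/export_test/ /mnt/mnt_test/ -o retry=1)" 1` would be expected to print `gave-up-early` for the 3-second failures shown in this report and `retried` for the 65-second ones. The result is only meaningful when the mount fails, since a successful mount also returns quickly.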
Moving to VERIFIED based on the test logs in comment #17 above.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2233 |