Bug 1428941

Summary: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO
Product: Red Hat Enterprise Linux 7
Component: nfs-utils
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Target Milestone: rc
Assignee: Steve Dickson <steved>
QA Contact: Yongcheng Yang <yoyang>
Reporter: jinjian.1
CC: chunwang, cloud, eguan, f.dreesen, gyde, jas, mike, steved, tk, yoyang
Type: Bug
Last Closed: 2017-04-08 13:02:38 UTC

Description jinjian.1 2017-03-03 16:25:34 UTC
Description of problem:

I configured 3 NFS mount entries in /etc/fstab like:

10.1.50.6:/    /mnt/efs nfs4    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 1 0
10.1.31.83:/    /mnt/efs-1 nfs4    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 1 0
10.1.38.216:/    /mnt/efs-2 nfs4    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 1 0

After reboot, I found that only the last one mounted successfully and the first two failed. Checking dmesg showed two errors:

[  100.703285] NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO
[  100.710115] NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO.


The confusing part is that one mount succeeds while the other two fail, even though I set up three identical NFS servers.

Note: mount -a can mount all 3 NFS targets successfully.
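
Not a fix for the underlying problem, but since mount -a works once the system is fully up, one possible mitigation sketch is to let systemd mount these entries on first access instead of during boot, using the standard _netdev/noauto/x-systemd.automount fstab options, e.g.:

10.1.50.6:/    /mnt/efs nfs4    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev,noauto,x-systemd.automount 1 0
(and likewise for /mnt/efs-1 and /mnt/efs-2)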


Version-Release number of selected component (if applicable):

Kernel: 3.10.0-514.el7.x86_64
nfs-utils-1.3.0-0.33.el7.x86_64


How reproducible:


Steps to Reproduce:
1. Put 3 nfs mount entries in /etc/fstab
2. reboot system
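
3. Check the result after the reboot, e.g. with (standard commands, listed here only for convenience):

findmnt -t nfs4                               # shows which of the three entries actually mounted
dmesg | grep nfs4_discover_server_trunking    # shows the -512 errors for the entries that failed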

Actual results:

Two of the three mounts fail at boot with the error above.

Expected results:

All three mounts succeed at boot.

Additional info:

Error messages in /var/log/messages

Feb 27 13:32:13 ip-10-1-34-23 systemd: Started LSB: Initializes the correct repo depending on the region..
Feb 27 13:32:13 ip-10-1-34-23 systemd: Mounted /mnt/efs-2.

.....

Feb 27 13:33:43 ip-10-1-34-23 systemd: mnt-efs\x2d1.mount mounting timed out. Stopping.
Feb 27 13:33:43 ip-10-1-34-23 systemd: mnt-efs.mount mounting timed out. Stopping.
Feb 27 13:33:43 ip-10-1-34-23 kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO
Feb 27 13:33:43 ip-10-1-34-23 kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO
Feb 27 13:33:43 ip-10-1-34-23 systemd: Mounted /mnt/efs-1.
Feb 27 13:33:43 ip-10-1-34-23 systemd: Mounted /mnt/efs.
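
(For reference, error -512 is ERESTARTSYS in the kernel, i.e. the mount call was interrupted by a signal; that lines up with systemd timing out and stopping the mount jobs in the log above. The generated mount units can be inspected with standard systemd commands:)

systemctl status mnt-efs.mount 'mnt-efs\x2d1.mount'   # state of the mount units systemd generated from /etc/fstab
journalctl -b -u mnt-efs.mount                        # boot-time log for one of the failed mounts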

Comment 2 Steve Dickson 2017-04-08 13:02:38 UTC

*** This bug has been marked as a duplicate of bug 1428939 ***

Comment 3 M1ke 2018-01-26 11:15:05 UTC
This bug has been marked as a duplicate, but the bug it refers to has restricted access; for those of us without access, what was the solution?

Comment 4 Steve Dickson 2018-01-29 20:26:37 UTC
(In reply to M1ke from comment #3)
> This bug has been marked as a duplicate, but the bug it refers to has restricted
> access; for those of us without access, what was the solution?

I just added you to the cc list, so you should have access to everything.

Comment 5 cloud 2018-02-08 13:06:33 UTC
I have the same issue but the related bug is still restricted.

Comment 6 Gyde 2018-02-08 14:26:06 UTC
I have the same issue on:
Kernel: 3.10.0-693.17.1.el7.x86_64
nfs-utils-1.3.0-0.48.el7_4.1.x86_64
Can I also be added to the cc list?

Comment 7 f.dreesen 2018-02-12 11:22:25 UTC
We are running into the same issue.
Is there already a solution to this, and could I maybe also be added to the cc list?

kernel: 3.10.0-693.17.1.el7.x86_64
nfs-utils: 1.3.0-0.48.el7_4.1.x86_64

Comment 8 Yongcheng Yang 2018-02-12 13:20:01 UTC
Just sharing the latest progress on this issue.

This is actually a problem on the kernel side, and it has now been resolved via Bug 1506382 in kernel-3.10.0-825.el7.

In addition, a backport to RHEL-7.4.z is available now in kernel-3.10.0-693.19.1.el7, and 7.3.z is still on the way.
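
For reference, a quick way to check whether a host is already running a kernel that contains the fix (versions as listed above; standard commands):

uname -r          # running kernel; compare against 3.10.0-825.el7 / 3.10.0-693.19.1.el7 or later
rpm -q kernel     # all installed kernel packages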

Hope this helps.

Comment 9 Tom K 2018-04-09 04:20:24 UTC
I wanted to report that, despite the solution here (https://access.redhat.com/solutions/3146191), I still get this on one host. (Reporting here in case running more tests could be beneficial.)

The issue happens only on ovirt01 when running systemctl restart autofs. It has been reproducible every time autofs has been restarted so far:

[ ovirt01 (client) - ERROR APPEARS HERE ]
[root@ovirt01 log]# uname -a
Linux ovirt01.nix.mds.xyz 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@ovirt01 log]#
[root@ovirt01 log]# rpm -aq|grep -Ei nfs
libnfsidmap-0.25-17.el7.x86_64
glusterfs-gnfs-3.12.6-1.el7.x86_64
nfs-utils-1.3.0-0.48.el7_4.2.x86_64
[root@ovirt01 log]#



[ ipaclient01 (client) - NO -512 ERROR HERE ]
[root@ipaclient01 yum.repos.d]# uname -a
Linux ipaclient01.nix.mds.xyz 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@ipaclient01 yum.repos.d]#
[root@ipaclient01 ~]# rpm -aq|grep -Ei nfs
libnfsidmap-0.25-17.el7.x86_64
nfs-utils-1.3.0-0.48.el7_4.2.x86_64
[root@ipaclient01 ~]#



[ nfs01 (server cluster w/ nfs02, NFS Ganesha v2.6) ]
[root@nfs01 ganesha]# uname -a
Linux nfs01.nix.mds.xyz 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@nfs01 ganesha]#

[ nfs02 (server cluster w/ nfs01, NFS Ganesha v2.6) ]
[root@nfs02 ~]# uname -a
Linux nfs02.nix.mds.xyz 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@nfs02 ~]#



Host ovirt01, where this reoccurs, has the least amount of memory:

[root@ovirt01 log]# cat /proc/meminfo
MemTotal:        3881708 kB
MemFree:          909704 kB
MemAvailable:    1256024 kB
Buffers:            2116 kB
Cached:           603680 kB
SwapCached:            0 kB
Active:          2302224 kB
Inactive:         407032 kB
Active(anon):    2138108 kB
Inactive(anon):    43776 kB
Active(file):     164116 kB
Inactive(file):   363256 kB
Unevictable:       48608 kB
Mlocked:           48640 kB
SwapTotal:       4063228 kB
SwapFree:        4063228 kB
Dirty:                72 kB
Writeback:             0 kB
AnonPages:       2152068 kB
Mapped:           165560 kB
Shmem:             52264 kB
Slab:              93056 kB
SReclaimable:      47420 kB
SUnreclaim:        45636 kB
KernelStack:       10080 kB
PageTables:        27800 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6004080 kB
Committed_AS:    4986628 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      157608 kB
VmallocChunk:   34359341052 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1695744 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       90048 kB
DirectMap2M:     4104192 kB
[root@ovirt01 log]#

Additional logs (/var/log/messages):

Apr  8 23:46:07 ovirt01 automount[1590]: st_expire: state 1 path /nix.my.dom
Apr  8 23:46:07 ovirt01 automount[1590]: expire_proc: exp_proc = 139973160154880 path /nix.my.dom
Apr  8 23:46:07 ovirt01 automount[1590]: expire_cleanup: got thid 139973160154880 path /nix.my.dom stat 0
Apr  8 23:46:07 ovirt01 automount[1590]: expire_cleanup: sigchld: exp 139973160154880 finished, switching from 2 to 1
Apr  8 23:46:07 ovirt01 automount[1590]: st_ready: st_ready(): state = 2 path /nix.my.dom
Apr  8 23:46:10 ovirt01 ovsdb-server: ovs|10232|stream_ssl|ERR|Private key must be configured to use SSL
Apr  8 23:46:10 ovirt01 ovsdb-server: ovs|10233|stream_ssl|ERR|Certificate must be configured to use SSL
Apr  8 23:46:11 ovirt01 automount[1590]: st_expire: state 1 path /n
Apr  8 23:46:11 ovirt01 automount[1590]: expire_proc: exp_proc = 139973160154880 path /n
Apr  8 23:46:11 ovirt01 automount[1590]: expire_cleanup: got thid 139973160154880 path /n stat 1
Apr  8 23:46:11 ovirt01 automount[1590]: expire_cleanup: sigchld: exp 139973160154880 finished, switching from 2 to 1
Apr  8 23:46:11 ovirt01 automount[1590]: st_ready: st_ready(): state = 2 path /n
Apr  8 23:46:12 ovirt01 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
Apr  8 23:46:12 ovirt01 systemd: Stopping Automounts filesystems on demand...
Apr  8 23:46:12 ovirt01 kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO
Apr  8 23:46:12 ovirt01 automount[1590]: do_notify_state: signal 15
Apr  8 23:46:12 ovirt01 automount[1590]: master_notify_state_change: sig 15 switching /misc from 1 to 5
Apr  8 23:46:12 ovirt01 automount[1590]: master_notify_state_change: sig 15 switching /net from 1 to 5
Apr  8 23:46:12 ovirt01 automount[1590]: master_notify_state_change: sig 15 switching /n from 1 to 5
Apr  8 23:46:12 ovirt01 automount[1590]: st_prepare_shutdown: state 1 path /n
Apr  8 23:46:12 ovirt01 automount[1590]: expire_proc: exp_proc = 139973157791488 path /n
Apr  8 23:46:12 ovirt01 automount[1590]: st_prepare_shutdown: state 1 path /net
Apr  8 23:46:12 ovirt01 automount[1590]: expire_proc: exp_proc = 139973155428096 path /net
Apr  8 23:46:12 ovirt01 automount[1590]: st_prepare_shutdown: state 1 path /misc
Apr  8 23:46:12 ovirt01 automount[1590]: mount(nfs): nfs: mount failure nfs-c01.nix.my.dom:/n/my.dom on /n/my.dom
Apr  8 23:46:12 ovirt01 automount[1590]: expire_proc: exp_proc = 139973153064704 path /misc
Apr  8 23:46:12 ovirt01 automount[1590]: dev_ioctl_send_fail: token = 96
Apr  8 23:46:12 ovirt01 automount[1590]: failed to mount /n/my.dom
Apr  8 23:46:12 ovirt01 automount[1590]: master_notify_state_change: sig 15 switching /nix.my.dom from 1 to 5
Apr  8 23:46:12 ovirt01 automount[1590]: master_notify_state_change: sig 15 switching /- from 1 to 5
Apr  8 23:46:12 ovirt01 automount[1590]: master_notify_state_change: sig 15 switching /my.dom from 1 to 5
Apr  8 23:46:12 ovirt01 automount[1590]: handle_packet: type = 3
Apr  8 23:46:12 ovirt01 automount[1590]: handle_packet_missing_indirect: token 97, name my.dom, request pid 2917
Apr  8 23:46:12 ovirt01 automount[1590]: dev_ioctl_send_fail: token = 97
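
In case it helps, the exact sequence that triggers it on this host is just (same commands as described above):

systemctl restart autofs                     # restart the automounter
dmesg | grep -i trunking                     # the -512 error shows up right after the restart
journalctl -b -u autofs --no-pager | tail    # autofs side of the failure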


Cheers,
TK

Comment 11 jas 2018-04-17 17:45:50 UTC
Hi.
I recently upgraded 2 of our NFS servers from RHEL 6 to RHEL 7.4. Our clients were already on RHEL 7.4. Everything is running the 3.10.0-693.21.1.el7 kernel and the server has the latest nfs-utils. I'm still receiving "NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO" on random machines when booting a whole lab of machines. When I reboot the affected host, everything is fine. This definitely did not happen with the RHEL 6 server.

Yongcheng Yang noted above that a backport to RHEL-7.4.z is available in kernel-3.10.0-693.19.1.el7 as the fix for this issue, with 7.3.z still on the way. However, I'm still receiving the error.

Please help as I do not have access to #1428939.