Created attachment 1045126 [details]
network communication

Description of problem:
NFS mounts of the volume return Remote I/O errors on file creation. Disabling and then re-enabling nfs.acl on the volume makes the errors go away.

Version-Release number of selected component (if applicable):
CentOS Linux release 7.1.1503 (Core)
Kernel: Linux 4.0.5-1.el7.centos.x86_64
GlusterFS: glusterfs 3.7.2 built on Jun 24 2015 11:51:59

How reproducible:
Always

Steps to Reproduce:
1. Create and start a distributed volume.
2. Mount the gluster volume with NFSv3.
3. Try to touch a file.

Actual results:
[root@ip-172-31-26-220 ~]# touch /nfs/zzzzz
touch: cannot touch ‘/nfs/zzzzz’: Remote I/O error
[root@ip-172-31-26-220 ~]# gluster volume set gv0 nfs.acl off
volume set: success
[root@ip-172-31-26-220 ~]# touch /nfs/zzzzz
[root@ip-172-31-26-220 ~]# gluster volume set gv0 nfs.acl on
volume set: success
[root@ip-172-31-26-220 ~]# touch /nfs/zzzzz2
[root@ip-172-31-26-220 ~]#

Expected results:
[root@ip-172-31-26-220 ~]# touch /nfs/zzzzz
[root@ip-172-31-26-220 ~]#

Additional info:

# mount -vvv -t nfs 127.0.0.1:/gv0 /nfs
mount.nfs: timeout set for Wed Jul  1 14:58:09 2015
mount.nfs: trying text-based options 'vers=4,addr=127.0.0.1,clientaddr=127.0.0.1'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'addr=127.0.0.1'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 127.0.0.1 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying 127.0.0.1 prog 100005 vers 3 prot TCP port 38465

mount options:
127.0.0.1:/gv0 on /nfs type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=127.0.0.1)

nfs.log:
[2015-07-01 13:44:22.441442] W [socket.c:642:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2015-07-01 13:44:23.797116] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (15), shutting down
[2015-07-01 13:45:41.267495] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.7.2 (args: /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/e09d4d8e1a2e7e95fc758343486c0d55.socket)
[2015-07-01 13:45:41.310041] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-01 13:45:41.435880] I [rpcsvc.c:2213:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 16
[2015-07-01 13:45:41.442879] W [MSGID: 112153] [mount3.c:3910:mnt3svc_init] 0-nfs-mount: Exports auth has been disabled!
[2015-07-01 13:45:41.448427] W [MSGID: 112178] [nlm4.c:2534:nlm4svc_init] 0-nfs-NLM: unable to get pid of rpc.statd from /var/run/rpc.statd.pid
[2015-07-01 13:45:41.498169] I [rpc-drc.c:694:rpcsvc_drc_init] 0-rpc-service: DRC is turned OFF
[2015-07-01 13:45:41.498190] I [MSGID: 112110] [nfs.c:1494:init] 0-nfs: NFS service started
[2015-07-01 13:45:41.501042] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2015-07-01 13:45:41.501103] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'rpc-auth-allow-insecure' is not recognized
[2015-07-01 13:45:41.501157] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'transport-type' is not recognized
[2015-07-01 13:45:41.501222] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-0: parent translators are ready, attempting connect on transport
[2015-07-01 13:45:41.504008] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume gv0-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host 172.31.25.38
  5:     option remote-subvolume /pool0/brick0
  6:     option transport-type socket
  7:     option username 5441a511-4d11-4a5e-852b-210895b5ec75
  8:     option password 82e3aa21-63a4-47d1-8540-9ed797076f19
  9:     option send-gids true
 10: end-volume
 11:
 12: volume gv0-client-1
 13:     type protocol/client
 14:     option ping-timeout 42
 15:     option remote-host 172.31.26.220
 16:     option remote-subvolume /pool0/brick0
 17:     option transport-type socket
 18:     option username 5441a511-4d11-4a5e-852b-210895b5ec75
 19:     option password 82e3aa21-63a4-47d1-8540-9ed797076f19
 20:     option send-gids true
 21: end-volume
 22:
 23: volume gv0-dht
 24:     type cluster/distribute
 25:     subvolumes gv0-client-0 gv0-client-1
 26: end-volume
 27:
 28: volume gv0-write-behind
 29:     type performance/write-behind
 30:     subvolumes gv0-dht
 31: end-volume
 32:
 33: volume gv0
 34:     type debug/io-stats
 35:     option latency-measurement off
 36:     option count-fop-hits off
 37:     subvolumes gv0-write-behind
 38: end-volume
 39:
 40: volume nfs-server
 41:     type nfs/server
 42:     option rpc-auth.auth-glusterfs on
 43:     option rpc-auth.auth-unix on
 44:     option rpc-auth.auth-null on
 45:     option rpc-auth.ports.insecure on
 46:     option rpc-auth-allow-insecure on
 47:     option transport-type socket
 48:     option transport.socket.listen-port 2049
 49:     option nfs.dynamic-volumes on
 50:     option nfs.nlm on
 51:     option nfs.drc off
 52:     option rpc-auth.addr.gv0.allow *
 53:     option nfs3.gv0.volume-id 45f15b5d-5b89-49d8-88bf-8049ad1609da
 54:     option nfs.acl on
 55:     subvolumes gv0
 56: end-volume
 57:
+------------------------------------------------------------------------------+
[2015-07-01 13:45:41.506752] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-07-01 13:45:41.512862] E [MSGID: 114058] [client-handshake.c:1525:client_query_portmap_cbk] 0-gv0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-07-01 13:45:41.512907] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-01 13:45:41.513226] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-gv0-client-0: changing port to 49152 (from 0)
[2015-07-01 13:45:41.516482] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-gv0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-01 13:45:41.517000] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-gv0-client-0: Connected to gv0-client-0, attached to remote volume '/pool0/brick0'.
[2015-07-01 13:45:41.517013] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-01 13:45:41.517847] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
[2015-07-01 13:45:41.518444] I [dht-layout.c:697:dht_layout_normalize] 0-gv0-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
[2015-07-01 13:45:41.518471] W [MSGID: 109005] [dht-selfheal.c:1778:dht_selfheal_directory] 0-gv0-dht: Directory selfheal failed: 1 subvolumes down.Not fixing. path = /, gfid = 00000000-0000-0000-0000-000000000001
[2015-07-01 13:45:45.300071] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
[2015-07-01 13:45:45.302695] I [MSGID: 114057] [client-handshake.c:1438:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-01 13:45:45.302910] I [MSGID: 114046] [client-handshake.c:1214:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/pool0/brick0'.
[2015-07-01 13:45:45.302923] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-07-01 13:45:45.303030] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
[2015-07-01 14:56:09.740053] W [rpcsvc.c:278:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 4) for 127.0.0.1:687
[2015-07-01 14:56:09.740087] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-07-01 14:57:16.723283] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-07-01 14:57:16.737239] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-07-01 14:57:16.737427] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-07-01 14:57:16.737572] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-07-01 14:57:16.738848] I [MSGID: 112108] [nfs.c:1329:nfs_reconfigure_state] 0-nfs: ACL is manually disabled
[2015-07-01 14:57:16.739231] I [rpc-drc.c:694:rpcsvc_drc_init] 0-rpc-service: DRC is turned OFF
[2015-07-01 14:57:16.739306] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2015-07-01 14:57:16.739330] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2015-07-01 14:57:16.739349] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2015-07-01 14:57:19.523509] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-07-01 14:57:22.145386] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-07-01 14:57:22.159005] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-07-01 14:57:22.159263] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-07-01 14:57:22.159446] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-07-01 14:57:22.160306] I [MSGID: 112108] [nfs.c:1329:nfs_reconfigure_state] 0-nfs: ACL is manually enabled
[2015-07-01 14:57:22.160623] I [rpc-drc.c:694:rpcsvc_drc_init] 0-rpc-service: DRC is turned OFF
[2015-07-01 14:57:22.160688] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2015-07-01 14:57:22.160712] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2015-07-01 14:57:22.160730] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
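
For completeness, the volume setup boils down to something like the following sketch (hostnames and brick paths are taken from the volfile above; the exact creation flags I used may have differed):

# on one of the servers; the two bricks live on 172.31.25.38 and 172.31.26.220
gluster volume create gv0 172.31.25.38:/pool0/brick0 172.31.26.220:/pool0/brick0
gluster volume start gv0

# on the client; Gluster NFS only speaks v3, hence the
# "Protocol not supported" v4 fallback in the mount trace above
mount -t nfs -o vers=3 127.0.0.1:/gv0 /nfs
touch /nfs/zzzzz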
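
A couple of diagnostic commands that may help triage this (not part of the original report; the NFSv3 ACL side-protocol has its own RPC program number, 100227, so its registration can be checked against the portmapper):

# 100003 = NFS, 100005 = mountd, 100227 = NFS ACL
rpcinfo -p 127.0.0.1 | egrep '100003|100005|100227'

# confirm both brick processes are up, as the client-handshake error suggests
gluster volume status gv0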
I am experiencing a similar issue. For me, any open() or getxattr() operation on a gluster NFS mount fails with EREMOTEIO. The interesting thing is that operations succeed if and only if the file already exists. For example:

[root@gluster-2 ~]$ mount -t nfs gluster-1.local:/engine /mnt
[root@gluster-2 ~]$ cat > /mnt/test
-bash: /mnt/test: Remote I/O error
[root@gluster-2 ~]$ cat > /mnt/test
asdf
^D
[root@gluster-2 ~]$

Setting nfs.acl to off on the volume has fixed this issue for me. I had tried myriad other things, all to no avail. This is a blocking bug; had I not stumbled upon this bug report, I would not be able to use Gluster's NFS exports at all.
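
For anyone else hitting this, the workaround is a one-liner ("engine" is the volume name from my mount command above):

# disable the ACL program on the Gluster NFS export
gluster volume set engine nfs.acl off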
Please note that mounting NFS over localhost on a Gluster server is not expected to work flawlessly. Several services (mainly locking and ACLs) conflict between the kernel components (needed for the NFS client) and the Gluster components (needed for the NFS server). If you need to use Gluster volumes on a Gluster server itself, you really should mount them with FUSE and not with NFS.

Could you confirm that this problem only occurs when mounting over NFS on Gluster servers that have the NFS server running?
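
For reference, the FUSE equivalent of the NFS mount from the earlier comment would be something like this (it requires the glusterfs-fuse client package):

# mounts the volume via the native Gluster protocol, bypassing NFS entirely
mount -t glusterfs gluster-1.local:/engine /mnt

The FUSE client talks directly to the brick processes, so it avoids the kernel-NFS-client versus Gluster-NFS-server conflicts on the same host.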
I can no longer reproduce this bug with Gluster 3.7.11.
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check whether it still exists in newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen it against the newer release.