+++ This bug was initially created as a clone of Bug #902953 +++

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

--- Additional comment from Amar Tumballi on 2013-02-14 04:37:39 EST ---

Thanks for the report. One thing, though: if a node (or a lot of nodes) is going down and coming back up, isn't it natural for operations to fail, since the filesystem is network based?

--- Additional comment from John Morrissey on 2013-02-15 11:04:24 EST ---

Sure, I would expect the operations to fail *while* the Gluster servers are being restarted, but after the servers are running, I would also expect Gluster clients to gracefully reconnect. As the logs above show, they clearly do not do so after several minutes, or (in our experience) even after several hours.

--- Additional comment from John Morrissey on 2013-04-01 12:28:12 EDT ---

Looks like this isn't limited to native Gluster clients. Some of our nodes mount a Gluster instance via NFS. We noticed that these clients can successfully mount the volume, but any I/O to them returns EIO:

[jwm@elided:pts/13 ~> ls -l /path/to/gluster
ls: /path/to/gluster: Input/output error

The gluster<->nfs process on the gluster server:

root 27902 12.1 0.7 406064 179052 ? Ssl Jan22 11601:30 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /tmp/bf018af881a58acb0efa7cefadd6fb1d.socket

is spinning on a file descriptor that probably used to be connected to a gluster brick, but is now open to /etc/services:

-bash-4.1$ sudo strace -p 27902
Process 27902 attached - interrupt to quit
epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=19, u64=107374182419}}}, 258, 4294967295) = 1
getsockopt(19, SOL_SOCKET, SO_ERROR, [182050606976860271], [4]) = 0
shutdown(19, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is not connected)
readv(19, [{"\0\0\0\0", 4}], 1) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 19, NULL) = 0
close(19) = 0
epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=19, u64=107374182419}}}, 258, 4294967295) = 1
getsockopt(19, SOL_SOCKET, SO_ERROR, [190986337975795823], [4]) = 0
shutdown(19, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is not connected)
readv(19, [{"\0\0\0\0", 4}], 1) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 19, NULL) = 0
close(19) = 0
epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=19, u64=107374182419}}}, 258, 4294967295) = 1

-bash-4.1$ sudo lsof -p 27902
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[...]
glusterfs 27902 root 19u REG 253,0 640999 3801126 /etc/services

--- Additional comment from Kaleb KEITHLEY on 2015-10-22 11:46:38 EDT ---

Because of the large number of bugs filed against it, the mainline version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
We also hit this problem in GlusterFS 3.7.6, so this bug was cloned from Bug 902953 to reopen it.
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life. Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.