Description of problem:
When I/O is running on multiple CIFS mounts (Linux clients) and the CTDB node goes down, one of the CIFS mounts hangs forever. It does not recover even after the failback.

Version-Release number of selected component (if applicable):
RHS 2.0

How reproducible:
Tried 3-4 times, reproduced every time.

Steps to Reproduce:
1. Set up CTDB on the RHS 2.0 cluster.
2. Create and start a replicate or distributed-replicate volume.
3. On two Linux clients, mount the volume over CIFS using the virtual IP (the one used in the CTDB setup).
4. Initiate I/O on both mounts.
5. Bring down the node corresponding to the virtual IP used in the mount.

Snip of /var/log/messages on the hung client:
Oct  5 02:43:20 dhcp159-194 kernel: CIFS VFS: sends on sock ffff880068f8f940 stuck for 15 seconds
Oct  5 02:43:20 dhcp159-194 kernel: CIFS VFS: Error -11 sending data on socket to server

Actual results:
One of the CIFS mounts hangs and its I/O stops. It does not recover at all.

Expected results:
The mounts hang for a few seconds during the failover, and once the failover is done, I/O continues.

Additional info:
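The I/O step of the reproduction (step 4) can be sketched as a small client-side writer that flags stalled writes, mirroring the "stuck for 15 seconds" kernel message above. This is only an illustrative sketch: the target path and the 15-second threshold are assumptions, and when reproducing you would point it at a file on each CIFS mount instead of the temp file used here.

```python
import os
import tempfile
import time

STALL_SECS = 15.0  # matches the kernel's "stuck for 15 seconds" message

def run_io(path, rounds=50, chunk=64 * 1024):
    """Write chunks to `path`, fsync each one, and count writes that
    take longer than STALL_SECS (a stalled mount shows up here)."""
    stalls = 0
    with open(path, "wb") as f:
        for i in range(rounds):
            start = time.time()
            f.write(b"x" * chunk)
            f.flush()
            os.fsync(f.fileno())
            elapsed = time.time() - start
            if elapsed > STALL_SECS:
                stalls += 1
                print("write %d stalled for %.1fs" % (i, elapsed))
    return stalls

# A temp file keeps the sketch runnable stand-alone; substitute a path
# on the CIFS mount (e.g. under the virtual-IP mount point) to reproduce.
target = os.path.join(tempfile.mkdtemp(), "io_test.dat")
print("stalled writes:", run_io(target))
```

Running one copy per client mount during the node shutdown (step 5) makes it easy to see which mount recovers after failover and which one hangs.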
Chris, need your help here... any guesses on what would cause this issue?
The SMB mount hangs on a RHEL 6.3 client. Looks like it is due to bug 848331 - https://bugzilla.redhat.com/show_bug.cgi?id=848331
RHS 2.0, as shipped, has known issues in GlusterFS's handling of POSIX byte-range locks that will cause CTDB to fail. CTDB should not crash or hang in these situations, but the underlying problem appears to be in Gluster's handling of POSIX byte-range lock semantics. See bug 869724. Until the CTDB recovery-lock bug is fixed in Gluster, we will not be able to isolate the cause of CTDB-related locking issues.
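The byte-range lock behavior CTDB's recovery lock depends on can be checked with a minimal stand-in: one process takes an exclusive POSIX byte-range lock (fcntl) on a file, and a second process must be refused a conflicting lock. The lock-file path here is a hypothetical temp file so the sketch runs stand-alone; on a real setup the file would live on the shared Gluster volume, where broken semantics would show up as the second locker wrongly succeeding. Linux-only (uses os.fork).

```python
import fcntl
import os
import tempfile

def try_byte_range_lock(path):
    """Try a non-blocking exclusive POSIX lock on byte 0 of `path`.
    Returns the open fd on success (keep it open to hold the lock),
    or None if a conflicting lock is already held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
        return fd
    except (IOError, OSError):
        os.close(fd)
        return None

lockfile = os.path.join(tempfile.mkdtemp(), "reclock")
fd = try_byte_range_lock(lockfile)
assert fd is not None  # first locker wins

pid = os.fork()
if pid == 0:
    # Child process: with correct POSIX byte-range lock semantics,
    # the conflicting lock must be refused while the parent holds it.
    refused = try_byte_range_lock(lockfile) is None
    os._exit(0 if refused else 1)
_, status = os.waitpid(pid, 0)
print("conflicting lock refused:", os.WEXITSTATUS(status) == 0)
```

On an actual CTDB cluster, the usual coherence tester is the ping_pong utility run from multiple nodes against a file on the shared volume; this sketch only demonstrates the single-host fcntl semantics that check relies on.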
Assigning to QE for regression testing. Possibly fixed by a GlusterFS patch. See bug 869724.
Bug verified with glusterfs 3.4.0.12rhs.beta1 (built on Jun 28 2013 06:41:37).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html