Description of problem:
When I/O is running on multiple CIFS mounts (Linux clients) and the CTDB node goes down, one of the CIFS mounts hangs forever. It does not recover even after the failback.

Version-Release number of selected component (if applicable):
RHS 2.0

How reproducible:
Tried 3-4 times, reproduced every time.

Steps to Reproduce:
1. Set up CTDB on the RHS 2.0 cluster.
2. Create and start a replicate or distributed-replicate volume.
3. On two Linux clients, mount the volume over CIFS using the virtual IP (the one used in the CTDB setup).
4. Initiate I/O on both mounts.
5. Bring down the node corresponding to the virtual IP used in the mount.

Snip of /var/log/messages on the hung client:
Oct  5 02:43:20 dhcp159-194 kernel: CIFS VFS: sends on sock ffff880068f8f940 stuck for 15 seconds
Oct  5 02:43:20 dhcp159-194 kernel: CIFS VFS: Error -11 sending data on socket to server

Actual results:
One of the CIFS mounts hangs and its I/O stops. It does not recover at all.

Expected results:
The mounts hang for a few seconds during the failover, and once the failover is done, I/O continues.

Additional info:
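The I/O step of the reproduction (step 4) can be sketched as a small client-side writer that flags stalled writes, mirroring the "stuck for 15 seconds" kernel message above. This is only an illustrative sketch: the target path and the 15-second threshold are assumptions, and when reproducing you would point it at a file on each CIFS mount instead of the temp file used here.

```python
import os
import tempfile
import time

STALL_SECS = 15.0  # matches the kernel's "stuck for 15 seconds" message

def run_io(path, rounds=50, chunk=64 * 1024):
    """Write chunks to `path`, fsync each one, and count writes that
    take longer than STALL_SECS (a stalled mount shows up here)."""
    stalls = 0
    with open(path, "wb") as f:
        for i in range(rounds):
            start = time.time()
            f.write(b"x" * chunk)
            f.flush()
            os.fsync(f.fileno())
            elapsed = time.time() - start
            if elapsed > STALL_SECS:
                stalls += 1
                print("write %d stalled for %.1fs" % (i, elapsed))
    return stalls

# A temp file keeps the sketch runnable stand-alone; substitute a path
# on the CIFS mount (e.g. under the virtual-IP mount point) to reproduce.
target = os.path.join(tempfile.mkdtemp(), "io_test.dat")
print("stalled writes:", run_io(target))
```

Running one copy per client mount during the node shutdown (step 5) makes it easy to see which mount recovers after failover and which one hangs.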
Chris, need your help here... any guesses on what would cause this issue?
The SMB mount hangs on a RHEL 6.3 client. Looks like it is due to bug 848331 - https://bugzilla.redhat.com/show_bug.cgi?id=848331
RHS 2.0, as shipped, has known issues in GlusterFS's handling of POSIX byte-range locks that will cause CTDB to fail. CTDB should not crash or hang in these situations, but the underlying problem appears to be in Gluster's handling of POSIX byte-range lock semantics. See bug 869724. Until the CTDB recovery-lock bug is fixed in Gluster, we will not be able to isolate the cause of CTDB-related locking issues.
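The byte-range lock behavior CTDB's recovery lock depends on can be checked with a minimal stand-in: one process takes an exclusive POSIX byte-range lock (fcntl) on a file, and a second process must be refused a conflicting lock. The lock-file path here is a hypothetical temp file so the sketch runs stand-alone; on a real setup the file would live on the shared Gluster volume, where broken semantics would show up as the second locker wrongly succeeding. Linux-only (uses os.fork).

```python
import fcntl
import os
import tempfile

def try_byte_range_lock(path):
    """Try a non-blocking exclusive POSIX lock on byte 0 of `path`.
    Returns the open fd on success (keep it open to hold the lock),
    or None if a conflicting lock is already held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
        return fd
    except (IOError, OSError):
        os.close(fd)
        return None

lockfile = os.path.join(tempfile.mkdtemp(), "reclock")
fd = try_byte_range_lock(lockfile)
assert fd is not None  # first locker wins

pid = os.fork()
if pid == 0:
    # Child process: with correct POSIX byte-range lock semantics,
    # the conflicting lock must be refused while the parent holds it.
    refused = try_byte_range_lock(lockfile) is None
    os._exit(0 if refused else 1)
_, status = os.waitpid(pid, 0)
print("conflicting lock refused:", os.WEXITSTATUS(status) == 0)
```

On an actual CTDB cluster, the usual coherence tester is the ping_pong utility run from multiple nodes against a file on the shared volume; this sketch only demonstrates the single-host fcntl semantics that check relies on.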
Assigning to QE for regression testing. Possibly fixed by a GlusterFS patch. See bug 869724.
Bug verified with glusterfs 3.4.0.12rhs.beta1 (built on Jun 28 2013 06:41:37).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html