Description of problem:

On a disconnect, the server cleans up the transport, which in turn closes the fds and releases the locks that client had acquired on them. On a reconnect, the client reopens the fds but does not reacquire the locks. The application that previously acquired the locks still assumes it owns them, while the server may have granted them to other clients (if they request them), leading to data corruption.

Version-Release number of selected component (if applicable): 3.3, 3.2

How reproducible:

Steps to Reproduce:
1. Start an application that holds a lock on a file.
2. Disconnect the client from the server.
3. Reconnect the client to the server.
4. Run the same application from another client.

Actual results:
The second client is also granted the lock.

Expected results:
The second client should not be granted the lock, because another application is already holding it.

Additional info:
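For reference, the lock-holding application in step 1 can be sketched as a small Python script. The actual "simple.py" used during verification is not included in this report, so this stand-in is an assumption: the function name, the whole-file lock range, and the hold duration are illustrative, but the fcntl write lock it takes is the kind of lock the statedump output later shows as ACTIVE.

```python
#!/usr/bin/env python
# Hypothetical stand-in for the lock-holding application in step 1.
# The real "simple.py" is not shown in this report; the function name,
# lock range (whole file), and hold duration are assumptions.
import fcntl
import sys
import time

def hold_write_lock(path, seconds=60):
    """Acquire an exclusive (write) fcntl lock on path and hold it."""
    f = open(path, "w")
    # LOCK_EX blocks until the whole-file write lock is granted;
    # on a gluster mount this is the posix lock the brick tracks.
    fcntl.lockf(f, fcntl.LOCK_EX)
    print("lock granted on %s" % path)
    time.sleep(seconds)  # keep holding while the network is cut
    fcntl.lockf(f, fcntl.LOCK_UN)
    f.close()

if __name__ == "__main__":
    hold_write_lock(sys.argv[1])
```

Run as "simple.py ./file1" from the mount point; while it sleeps, the lock stays held and the disconnect/reconnect can be induced.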
CHANGE: http://review.gluster.com/2766 (protocol/client,server: fcntl lock self healing.) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2884 (nfs: fcntl lock self healing.) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2906 (protocol/client: Register a timer(grace-timer) conditionally.) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2937 (protocol/client: Handle failures in lock self healing gracefully (part 1).) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2819 (protocol/client: Handle failures in lock self healing gracefully (part2).) merged in master by Vijay Bellur (vijay)
Verified the fix on 3.3.0qa45. Bug is fixed.

Steps executed:-
----------------
1. gluster volume create dstore replica 2 192.168.2.35:/export_sdb/dir1 192.168.2.36:/export_sdb/dir2

2. gluster v set dstore lock-heal on
   Set volume successful

3. gluster v set dstore grace-timeout 60
   Set volume successful

4. gluster v start dstore
   Starting volume dstore has been successful

5. gluster v info

   Volume Name: dstore
   Type: Replicate
   Volume ID: a444594f-acaf-4d12-8bca-15b572791ff0
   Status: Started
   Number of Bricks: 1 x 2 = 2
   Transport-type: tcp
   Bricks:
   Brick1: 192.168.2.35:/export_sdb/dir1
   Brick2: 192.168.2.36:/export_sdb/dir1
   Options Reconfigured:
   features.grace-timeout: 60
   features.lock-heal: on

6. gluster v status

   Status of volume: dstore
   Gluster process                         Port    Online  Pid
   ------------------------------------------------------------------------------
   Brick 192.168.2.35:/export_sdb/dir1     24009   Y       13858
   Brick 192.168.2.36:/export_sdb/dir1     24009   Y       24768
   NFS Server on localhost                 38467   Y       13913
   Self-heal Daemon on localhost           N/A     Y       13919
   NFS Server on 192.168.2.36              38467   Y       24774
   Self-heal Daemon on 192.168.2.36        N/A     Y       24780

Mount1:-
-------------
1. mount -t glusterfs 192.168.2.35:/dstore /mnt/gfsc1
2. cd /mnt/gfsc1
3. Run the script "simple.py ./file1"

Verification:-
--------------
The lock on the file "file1" is granted. Verify this by executing the following steps on the storage node:

a. gluster v statedump dstore
   Volume statedump successful

b. grep -i "ACTIVE" /tmp/export_sdb-dir1.13858.dump
   conn.1.bound_xl./export_sdb/dir1.active_size=2
   [conn.1.bound_xl./export_sdb/dir1.active.1]
   posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 6113, owner=34d7de843ea5e8c3, transport=0x1f4ff30, , granted at Tue Jun 5 17:20:14 2012

4. iptables -L
   Chain INPUT (policy ACCEPT)
   target     prot opt source               destination

   Chain FORWARD (policy ACCEPT)
   target     prot opt source               destination

   Chain OUTPUT (policy ACCEPT)
   target     prot opt source               destination

5.
iptables -A INPUT -p tcp --sport 24009 -j DROP ; iptables -L
   Chain INPUT (policy ACCEPT)
   target     prot opt source               destination
   DROP       tcp  --  anywhere             anywhere            tcp spt:24009

   Chain FORWARD (policy ACCEPT)
   target     prot opt source               destination

   Chain OUTPUT (policy ACCEPT)
   target     prot opt source               destination

Note:- Adding the above rule makes the machine on which Mount1 is present drop packets coming from port 24009.

Mount2:-
-------------
1. mount -t glusterfs 192.168.2.35:/dstore /mnt/gfsc1
2. cd /mnt/gfsc1
3. Run the script "simple.py ./file1"
   The lock on the file "file1" should not be granted.

Verification:-
---------------
a. gluster v statedump dstore
   Volume statedump successful

# The active lock is from the Mount1 application
b. grep -i "ACTIVE" /tmp/export_sdb-dir1.13858.dump
   [conn.1.bound_xl./export_sdb/dir1.active.1]
   posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 6113, owner=34d7de843ea5e8c3, transport=0x1f4ff30, , granted at Tue Jun 5 17:20:14 2012

# The blocked lock is from the Mount2 application
c. grep -i "BLOCK" /tmp/export_sdb-dir1.13858.dump
   posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 23430, owner=bb22e9b6327bc900, transport=0x1f4c8f0, , blocked at Tue Jun 5 17:22:32 2012

Actual Result:-
----------------
Within the grace-timeout period, the Mount2 application is blocked. After the grace-timeout period, the Mount2 application is granted the lock.

posixlk.posixlk[1](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 23430, owner=bb22e9b6327bc900, transport=0x1f4c8f0, , blocked at Tue Jun 5 17:22:32 2012
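The Mount2 behavior above (blocked while Mount1's lock is ACTIVE, granted only once the grace timeout releases it) can also be probed without hanging the test, by attempting the lock non-blockingly. This is a sketch assuming the application uses ordinary POSIX fcntl locks; the helper name and errno handling follow POSIX convention and are not taken from the report.

```python
# Hedged sketch: probe whether the write lock on a file is currently
# grantable, without blocking. On the second mount this should return
# False while Mount1's lock is ACTIVE (the fixed behavior), and True
# only after the grace timeout releases the stale lock.
import errno
import fcntl

def try_write_lock(path):
    """Return True if an exclusive lock is granted immediately."""
    f = open(path, "w")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EACCES):
            f.close()
            return False  # lock held by another owner; not granted
        raise
    fcntl.lockf(f, fcntl.LOCK_UN)
    f.close()
    return True
```

Polling this in a loop from the second client would show the transition from blocked to granted at the moment the grace timeout expires.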