Description of problem:

On a disconnect, the server cleans up the transport, which in turn closes the fds and releases the locks that client had acquired on them. On a reconnect, the client reopens the fds but does not reacquire the locks. The application that previously acquired the locks still assumes it owns them, while the server may have granted them to other clients (if they request them), leading to data corruption.

Version-Release number of selected component (if applicable): 3.3, 3.2

How reproducible:

Steps to Reproduce:
1. Start an application that holds a lock on a file.
2. Disconnect the client from the server.
3. Reconnect the client to the server.
4. Run the same application from another client.

Actual results:
The second client is also granted the lock.

Expected results:
The second client should not be granted the lock, because another application is already holding it.

Additional info:
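For reference, the lock-holding application in step 1 can be sketched as a small Python script. The actual "simple.py" used during verification is not included in this report, so this stand-in is an assumption: the function name, the whole-file lock range, and the hold duration are illustrative, but the fcntl write lock it takes is the kind of lock the statedump output later shows as ACTIVE.

```python
#!/usr/bin/env python
# Hypothetical stand-in for the lock-holding application in step 1.
# The real "simple.py" is not shown in this report; the function name,
# lock range (whole file), and hold duration are assumptions.
import fcntl
import sys
import time

def hold_write_lock(path, seconds=60):
    """Acquire an exclusive (write) fcntl lock on path and hold it."""
    f = open(path, "w")
    # LOCK_EX blocks until the whole-file write lock is granted;
    # on a gluster mount this is the posix lock the brick tracks.
    fcntl.lockf(f, fcntl.LOCK_EX)
    print("lock granted on %s" % path)
    time.sleep(seconds)  # keep holding while the network is cut
    fcntl.lockf(f, fcntl.LOCK_UN)
    f.close()

if __name__ == "__main__":
    hold_write_lock(sys.argv[1])
```

Run as "simple.py ./file1" from the mount point; while it sleeps, the lock stays held and the disconnect/reconnect can be induced.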
CHANGE: http://review.gluster.com/2766 (protocol/client,server: fcntl lock self healing.) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2884 (nfs: fcntl lock self healing.) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2906 (protocol/client: Register a timer(grace-timer) conditionally.) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2937 (protocol/client: Handle failures in lock self healing gracefully (part 1).) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2819 (protocol/client: Handle failures in lock self healing gracefully (part2).) merged in master by Vijay Bellur (vijay)
Verified the fix on 3.3.0qa45. Bug is fixed.

Steps executed:-
----------------
1. gluster volume create dstore replica 2 192.168.2.35:/export_sdb/dir1 192.168.2.36:/export_sdb/dir2

2. gluster v set dstore lock-heal on
   Set volume successful

3. gluster v set dstore grace-timeout 60
   Set volume successful

4. gluster v start dstore
   Starting volume dstore has been successful

5. gluster v info

   Volume Name: dstore
   Type: Replicate
   Volume ID: a444594f-acaf-4d12-8bca-15b572791ff0
   Status: Started
   Number of Bricks: 1 x 2 = 2
   Transport-type: tcp
   Bricks:
   Brick1: 192.168.2.35:/export_sdb/dir1
   Brick2: 192.168.2.36:/export_sdb/dir1
   Options Reconfigured:
   features.grace-timeout: 60
   features.lock-heal: on

6. gluster v status

   Status of volume: dstore
   Gluster process                         Port    Online  Pid
   ------------------------------------------------------------------------------
   Brick 192.168.2.35:/export_sdb/dir1     24009   Y       13858
   Brick 192.168.2.36:/export_sdb/dir1     24009   Y       24768
   NFS Server on localhost                 38467   Y       13913
   Self-heal Daemon on localhost           N/A     Y       13919
   NFS Server on 192.168.2.36              38467   Y       24774
   Self-heal Daemon on 192.168.2.36        N/A     Y       24780

Mount1:-
-------------
1. mount -t glusterfs 192.168.2.35:/dstore /mnt/gfsc1
2. cd /mnt/gfsc1
3. Run the script "simple.py ./file1"

Verification:-
--------------
The lock on the file "file1" is granted. Verify this by executing the following steps on the storage node:

a. gluster v statedump dstore
   Volume statedump successful

b. grep -i "ACTIVE" /tmp/export_sdb-dir1.13858.dump
   conn.1.bound_xl./export_sdb/dir1.active_size=2
   [conn.1.bound_xl./export_sdb/dir1.active.1]
   posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 6113, owner=34d7de843ea5e8c3, transport=0x1f4ff30, , granted at Tue Jun 5 17:20:14 2012

4. iptables -L
   Chain INPUT (policy ACCEPT)
   target     prot opt source               destination

   Chain FORWARD (policy ACCEPT)
   target     prot opt source               destination

   Chain OUTPUT (policy ACCEPT)
   target     prot opt source               destination

5.
iptables -A INPUT -p tcp --sport 24009 -j DROP ; iptables -L
   Chain INPUT (policy ACCEPT)
   target     prot opt source               destination
   DROP       tcp  --  anywhere             anywhere            tcp spt:24009

   Chain FORWARD (policy ACCEPT)
   target     prot opt source               destination

   Chain OUTPUT (policy ACCEPT)
   target     prot opt source               destination

Note:- Adding the above rule makes the machine on which Mount1 is present drop packets coming from port 24009.

Mount2:-
-------------
1. mount -t glusterfs 192.168.2.35:/dstore /mnt/gfsc1
2. cd /mnt/gfsc1
3. Run the script "simple.py ./file1"
   The lock on the file "file1" should not be granted.

Verification:-
---------------
a. gluster v statedump dstore
   Volume statedump successful

# The active lock is from the Mount1 application
b. grep -i "ACTIVE" /tmp/export_sdb-dir1.13858.dump
   [conn.1.bound_xl./export_sdb/dir1.active.1]
   posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 6113, owner=34d7de843ea5e8c3, transport=0x1f4ff30, , granted at Tue Jun 5 17:20:14 2012

# The blocked lock is from the Mount2 application
c. grep -i "BLOCK" /tmp/export_sdb-dir1.13858.dump
   posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 23430, owner=bb22e9b6327bc900, transport=0x1f4c8f0, , blocked at Tue Jun 5 17:22:32 2012

Actual Result:-
----------------
Within the grace-timeout period, the Mount2 application is blocked. After the grace-timeout period, the Mount2 application is granted the lock.

posixlk.posixlk[1](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 23430, owner=bb22e9b6327bc900, transport=0x1f4c8f0, , blocked at Tue Jun 5 17:22:32 2012
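The Mount2 behavior above (blocked while Mount1's lock is ACTIVE, granted only once the grace timeout releases it) can also be probed without hanging the test, by attempting the lock non-blockingly. This is a sketch assuming the application uses ordinary POSIX fcntl locks; the helper name and errno handling follow POSIX convention and are not taken from the report.

```python
# Hedged sketch: probe whether the write lock on a file is currently
# grantable, without blocking. On the second mount this should return
# False while Mount1's lock is ACTIVE (the fixed behavior), and True
# only after the grace timeout releases the stale lock.
import errno
import fcntl

def try_write_lock(path):
    """Return True if an exclusive lock is granted immediately."""
    f = open(path, "w")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EACCES):
            f.close()
            return False  # lock held by another owner; not granted
        raise
    fcntl.lockf(f, fcntl.LOCK_UN)
    f.close()
    return True
```

Polling this in a loop from the second client would show the transition from blocked to granted at the moment the grace timeout expires.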