Bug 1063615

Summary: DHT: REBALANCE- Lock semantics are not preserved after rebalance migration
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: shylesh <shmohan>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED DEFERRED
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Docs Contact:
Priority: high
Version: 2.1
CC: nbalacha, nlevinki, rgowdapp, spalai, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: dht-data-loss
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1286061
Environment:
Last Closed: 2015-11-27 10:28:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1286061    

Description shylesh 2014-02-11 06:57:46 UTC
Description of problem:
If a file is exclusively locked and is migrated as part of a rebalance, lock semantics are not preserved on the file after migration.

Version-Release number of selected component (if applicable):
3.4.0.59rhs-1.el6rhs.x86_64

How reproducible:
always 

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume.
2. Create a file on the mount point.
3. Write and run a sample C program that takes a write lock on the file:
        fl.l_type = F_WRLCK;
        fl.l_start = 0;
        fl.l_whence = SEEK_SET;
        fl.l_len = 0;
        fl.l_pid = getpid();

        int ret = fcntl (fd, F_SETLKW, &fl);
        if (ret == -1)  {
                printf ("%s", strerror(errno));
                exit (1);
        }
        while (1) {
                pause ();   /* stay here; the lock is not released (needs <unistd.h>) */
        }
  
4. Open another window, go to the same mount point, and run the same program again to request the write lock. This process blocks in fcntl because of F_SETLKW.

5. Terminate the process started in step 4.

6. Run add-brick and rebalance, and make sure that the file was migrated.

7. Repeat step 4 from a different mount point.



Actual results:
In step 7 the lock is acquired successfully, even though the lock held since step 3 has not been released.

Expected results:
In step 7 the process should wait for the lock until the first lock is released.

Additional info:

I verified with write-behind off, and with add-brick alone, to check whether the graph change is what invalidates the locks. Graph changes preserve the lock semantics; they are lost only after migration.

Comment 2 Raghavendra G 2014-02-11 11:48:55 UTC
This is existing behaviour. The rebalance process doesn't migrate locks. Locks are migrated only by gfapi or fuse-bridge. Since no operations are performed between steps 6 and 7 from the client holding the lock, the locks are not migrated to the new brick. So when a new process tries to acquire a lock on the file, which has moved to the new brick, the lock is granted.

Comment 3 Raghavendra G 2014-02-12 05:24:19 UTC
The question here is when we should migrate open fds and locks to the new graph. Viewed from a single client's perspective, it's good enough to do it when we get a new operation on the mount point (until then there is no need). However, when locking from multiple clients is considered (as in this case), it's better to do the migration as soon as possible, which is on receiving the CHILD_UP event in gfapi/fuse-bridge. We need to be aware of races between the /dev/fuse reader thread and the poll thread, since notifications (and hence the proposed migration) are received on the poll thread.
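
As an illustration of the race concern only, here is a minimal sketch of a run-once guard. It is not GlusterFS code: struct graph_switch and migrate_once are hypothetical names, and the migration body is a placeholder. The idea is that whichever thread arrives first (the poll thread on CHILD_UP, or the reader thread on the next operation) performs the migration, and the other one finds it already done.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-mount state; not a GlusterFS structure. */
struct graph_switch {
    pthread_mutex_t mutex;
    bool migrated;      /* true once fds and locks were moved to the new graph */
    int migrations;     /* how many times the migration body actually ran */
};

/* May be called from either thread: the poll thread on CHILD_UP (eager
 * path) or the /dev/fuse reader thread on the next operation (lazy path).
 * Returns true only for the caller that performed the migration. */
static bool migrate_once(struct graph_switch *gs)
{
    bool did_migrate = false;

    pthread_mutex_lock(&gs->mutex);
    if (!gs->migrated) {
        /* Placeholder: re-open fds and re-acquire locks on the new graph. */
        gs->migrations++;
        gs->migrated = true;
        did_migrate = true;
    }
    pthread_mutex_unlock(&gs->mutex);

    return did_migrate;
}
```

Holding a mutex across the whole migration keeps the sketch short; a real implementation would probably need to avoid blocking the poll thread on network operations.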

Comment 6 Susant Kumar Palai 2015-11-27 10:28:24 UTC
Cloning this bug in 3.1. To be fixed in a future release.