Created attachment 928070
Patch to maintain a list of rpc_clnt structures for each caller using multiple IPs to access server

Description of problem:
If an NFS client uses two different virtual IP addresses to access mounts from a glusterfs server, client lock attempts fail. This can happen in load-balancing setups, especially when one filesystem is mounted from one server and a second from another server, and the IP of the second server is then failed over to the first. The failure occurs because the "lock granted" reply uses, as its source IP address, the address the client FIRST used to talk to the server. The client then rejects the "lock granted" message because of the source address mismatch. The cause is in nlm4.c: lock granting looks up the caller's name in nlm_client_list and, if it finds an entry there with a non-null rpc_clnt, uses that rpc_clnt to send the lock reply without checking whether its source address matches the address to which the client directed the lock request.

Version-Release number of selected component (if applicable):
3.5.2

How reproducible:
100%

Steps to Reproduce:
1. Have 2 IP addresses on server.
2. On client, run mount -t nfs Address1:/volume /mountpoint
3. Do something on client that locks/unlocks a file in /mountpoint; lock succeeds.
4. On client, umount /mountpoint
5. On client, mount -t nfs Address2:/volume /mountpoint
6. Repeat step 3: this time, lock times out.
7. Repeating umount/mount/lock-attempt with Address1, lock succeeds again, but with Address2, it fails.

Actual results:
See above.

Expected results:
Locks should succeed regardless of the address used by the client to mount the filesystem.

Additional info:
The attached patch resolves the problem in the testing that I've done. It replaces the "rpc_clnt" entry in nlm_client_list with a LIST of rpc_clnt entries.
Calls that look up rpc_clnt entries take an extra argument (the socket structure corresponding to the client call being answered) and search the list for an rpc_clnt entry with the correct source IP; if none is found, a new rpc_clnt entry is created and added to the list. It should be double-checked by someone familiar with GlusterFS internals to make sure it doesn't break anything, especially where locking or multithreading is concerned.
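The lookup change described in the patch notes can be sketched in C as follows. This is a minimal, hypothetical model, not the attached patch or the actual nlm4.c code: `struct cb_conn` stands in for `rpc_clnt`, addresses are compared as plain strings rather than extracted from the socket structure, and the names `cb_conn`, `nlm_caller`, `caller_lookup` and `get_cb_conn` are invented for illustration. It also omits the locking that a real multithreaded implementation would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an rpc_clnt: a callback connection from the
 * NFS server to the client's NLM service, bound to one server-side
 * (source) address. */
struct cb_conn {
    char src_addr[64];      /* server address used as the reply's source */
    struct cb_conn *next;
};

/* Hypothetical per-caller entry modelled on nlm_client_list: one caller
 * name owns a list of callback connections instead of a single one. */
struct nlm_caller {
    char caller_name[256];
    struct cb_conn *conns;
    struct nlm_caller *next;
};

static struct nlm_caller *caller_list = NULL;

/* Find the entry for caller_name, creating it if it does not exist. */
static struct nlm_caller *caller_lookup(const char *caller_name)
{
    struct nlm_caller *c;

    for (c = caller_list; c != NULL; c = c->next)
        if (strcmp(c->caller_name, caller_name) == 0)
            return c;

    c = calloc(1, sizeof(*c));   /* zeroed, so strncpy result stays terminated */
    if (c == NULL)
        return NULL;
    strncpy(c->caller_name, caller_name, sizeof(c->caller_name) - 1);
    c->next = caller_list;
    caller_list = c;
    return c;
}

/* Return the callback connection whose source address equals the address
 * the client sent its lock request to (dst_addr); if none exists yet,
 * create one and add it to the caller's list. */
struct cb_conn *get_cb_conn(const char *caller_name, const char *dst_addr)
{
    struct nlm_caller *c;
    struct cb_conn *conn;

    assert(caller_name != NULL && dst_addr != NULL);

    c = caller_lookup(caller_name);
    if (c == NULL)
        return NULL;

    /* The reported behaviour reused the caller's existing connection
     * regardless of its address; here the address must match too. */
    for (conn = c->conns; conn != NULL; conn = conn->next)
        if (strcmp(conn->src_addr, dst_addr) == 0)
            return conn;

    conn = calloc(1, sizeof(*conn));
    if (conn == NULL)
        return NULL;
    strncpy(conn->src_addr, dst_addr, sizeof(conn->src_addr) - 1);
    conn->next = c->conns;
    c->conns = conn;
    return conn;
}
```

With this model, a client that mounts via Address1 and later via Address2 ends up with two connections under the same caller name, and each "lock granted" reply goes out through the one whose source matches the address the request arrived on. In a real server, the shared lists would need to be protected by a mutex, which is the multithreading concern raised above.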
Hi Philip, thanks for the patch! Would you like to post it to our Gerrit instance for review? You can follow the simplified developer workflow that is linked from http://www.gluster.org/community/documentation/index.php/Developers to do so. Let us know in here, on gluster-devel or in #gluster-dev on Freenode/IRC if you have any issues. Thanks, Niels
Will try if I get the time (no promises though ... debugging other problems now!)
This bug is getting closed because 3.5 is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.