Bug 1131271 - Lock replies use wrong source IP if client accesses server via 2 different virtual IPs [patch attached]
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.5.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-08-18 20:36 UTC by Philip Spencer
Modified: 2016-06-17 15:58 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-17 15:58:07 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
Patch to maintain a list of rpc_clnt structures for each caller using multiple IPs to access server (7.17 KB, patch)
2014-08-18 20:36 UTC, Philip Spencer

Description Philip Spencer 2014-08-18 20:36:17 UTC
Created attachment 928070 [details]
Patch to maintain a list of rpc_clnt structures for each caller using multiple IPs to access server

Description of problem:

If an NFS client uses two different virtual IP addresses to access mounts from a GlusterFS server (as can happen in load-balancing setups, especially when one filesystem is mounted from one server and a second from another, and the second server's IP is then failed over to the first), client lock attempts fail: the "lock granted" reply uses, as its source IP address, the address the client FIRST used to talk to the server. The client then rejects the "lock granted" message because of the source-address mismatch.

This is because lock granting in nlm4.c looks up the caller's name in nlm_client_list and, if it finds an entry there with a non-NULL rpc_clnt, uses that rpc_clnt to send the lock reply without checking whether its source address matches the address to which the client directed the lock request.

Version-Release number of selected component (if applicable): 3.5.2


How reproducible: 100%


Steps to Reproduce:
1. Have 2 IP addresses on server.
2. On client, run mount -t nfs Address1:/volume /mountpoint
3. Do something on client that locks/unlocks a file in /mountpoint; lock succeeds.
4. On client, umount /mountpoint
5. On client, mount -t nfs Address2:/volume /mountpoint
6. Repeat step 3: this time, lock times out.
7. Repeating the umount/mount/lock attempt with Address 1, the lock succeeds
   again, but with Address 2 it fails.

Actual results: see above.


Expected results: Locks should succeed regardless of the address used by the
client to mount the filesystem.

Additional info:

The attached patch resolves the problem in the testing that I've done. It replaces the single "rpc_clnt" entry in nlm_client_list with a LIST of rpc_clnt entries. Calls that look up rpc_clnt entries take an extra argument (the socket structure corresponding to the client call being responded to) and search the list for an rpc_clnt entry with the correct source IP; if none is found, a new rpc_clnt entry is created and added to the list.

It should be double-checked by someone familiar with GlusterFS internals to make sure it doesn't break anything, especially where locking or multithreading is concerned.

Comment 1 Niels de Vos 2014-08-28 17:22:10 UTC
Hi Philip,

thanks for the patch! Would you like to post it to our Gerrit instance for review? You can follow the simplified developer workflow that is linked from http://www.gluster.org/community/documentation/index.php/Developers to do so.

Let us know in here, on gluster-devel or in #gluster-dev on Freenode/IRC if you have any issues.

Thanks,
Niels

Comment 2 Philip Spencer 2014-09-18 21:08:50 UTC
Will try if I get the time (no promises though ... debugging other problems now!)

Comment 3 Niels de Vos 2016-06-17 15:58:07 UTC
This bug is being closed because the 3.5 release is marked End-of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

