Bug 1554053 - 4.0 clients may fail to convert iatt in dict when receiving the same from older (< 4.0) servers
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1554018 1554077
Reported: 2018-03-11 04:13 UTC by Shyamsundar
Modified: 2018-06-20 18:01 UTC

Fixed In Version: glusterfs-v4.1.0
Doc Text:
Clone Of: 1554018
: 1554077
Environment:
Last Closed: 2018-06-20 18:01:56 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Shyamsundar 2018-03-11 04:13:18 UTC
+++ This bug was initially created as a clone of Bug #1554018 +++

Description of problem:

Based on review comments in https://review.gluster.org/#/c/19690 as follows,

Du: Do we support the case of older bricks/servers and newer clients? If yes, we would face a similar problem. The only difference would be clients expecting a newer iatt when an old iatt is sent.

Shyam: The problem is similar to the disadvantage/improvement that I had mentioned in the review comments. If a client has the newer iatt implementation and connects to an older server, the mapping will fail again. The reasons this is not a major concern yet are as follows.

Our upgrade procedure and guide state that servers are upgraded first and then clients, so in any environment the servers will be upgraded before the clients.

Servers are upgraded in a rolling fashion. Once one server is upgraded, its service clients (quotad, selfheald, others...) run the latest version but connect to the older servers using the older protocol version.

As long as DHT is not in the stack we are fine, since this problem is localized to the iatt in the dict that DHT requests. If any of these service daemons use DHT in their graph, they may start facing the same or a similar problem (rebalance comes to mind, but we may need to add that to the upgrade guide).

The fix going forward (possibly we need this in 4.1) is to apply a similar fix to the client stack as well.

Resolution:

The client protocol xlator also needs to translate the iatt in the dict to the actual runtime definition of iatt, based on the protocol dialect in use. This change should be similar to the change in the review mentioned above.
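
As a rough illustration of the resolution above, the following sketch decodes an iatt value taken from a dict according to the dialect spoken with the server. Every name here (`sketch_old_iatt`, `sketch_new_iatt`, `sketch_dialect`, `sketch_iatt_from_dict_value`) and both structure layouts are hypothetical and heavily simplified; they are not the real GlusterFS definitions, and the sketch ignores byte-order handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pre-4.0 layout (NOT the real GlusterFS definition). */
struct sketch_old_iatt {
    uint64_t ino;
    uint32_t mode;
    uint32_t nlink;
};

/* Hypothetical 4.0 runtime layout with a leading field the old one lacks. */
struct sketch_new_iatt {
    uint64_t flags;
    uint64_t ino;
    uint32_t mode;
    uint32_t nlink;
};

/* Hypothetical protocol dialects in use on a connection. */
enum sketch_dialect { SKETCH_DIALECT_3X, SKETCH_DIALECT_4X };

/*
 * Decode an iatt value found in a dict into the runtime structure.
 * Returns 0 on success, -1 if the buffer length does not match the
 * layout implied by the dialect.
 */
int
sketch_iatt_from_dict_value(enum sketch_dialect dialect, const void *buf,
                            size_t len, struct sketch_new_iatt *out)
{
    struct sketch_old_iatt old;

    if (dialect == SKETCH_DIALECT_4X) {
        /* Same layout on both ends: a plain copy is enough. */
        if (len != sizeof(*out))
            return -1;
        memcpy(out, buf, sizeof(*out));
        return 0;
    }

    /* Older server: copy field by field instead of reinterpreting bytes. */
    if (len != sizeof(old))
        return -1;
    memcpy(&old, buf, sizeof(old));

    memset(out, 0, sizeof(*out)); /* fields unknown to the old layout stay 0 */
    out->ino = old.ino;
    out->mode = old.mode;
    out->nlink = old.nlink;
    return 0;
}
```

The point of the sketch is only that the decoder is chosen by the connection's dialect rather than by the client's own build version.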

--- Additional comment from Shyamsundar on 2018-03-10 21:14:15 EST ---

Protocol endpoints (NFS, SAMBA/GFAPI) may be started on the upgraded servers. These involve DHT in the graph and can hit this issue if it is not addressed for the clients as well.

Comment 1 Worker Ant 2018-03-11 04:14:33 UTC
REVIEW: https://review.gluster.org/19695 (protocol: Fix 4.0 client, parsing older iatt in dict) posted (#1) for review on master by Shyamsundar Ranganathan

Comment 2 Worker Ant 2018-03-11 11:24:14 UTC
COMMIT: https://review.gluster.org/19695 committed in master by "Shyamsundar Ranganathan" <srangana> with a commit message- protocol: Fix 4.0 client, parsing older iatt in dict

In a mixed mode cluster involving 4.0 and older 3.x bricks, if
clients are newer, then the iatt encoded in the dictionary can be
of the older iatt format, which a newer client will map incorrectly
to the newer structure.

This causes failures in FOPs that depend on this iatt for some
functionality (seen in mkdir operations failing as EIO, when DHT
hits its internal setxattr call).

The fix provided is to convert the iatt in the dict, based on which
RPC version is used to communicate with the server.

IOW, this is the reverse of the change in commit "b966c7790e"

Tested using a mixed mode cluster (i.e., bricks in 3.12 and 4.0 versions)
and a mixed set of clients, 3.12 and 4.0 clients.

There is no regression test provided, as this needs a mixed mode cluster
to test and validate.

Change-Id: I454e54651ca836b9f7c28f45f51d5956106aefa9
BUG: 1554053
Signed-off-by: ShyamsundarR <srangana>
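
The incorrect mapping described in the commit message above can be illustrated with a small sketch. The two layouts and the helper `demo_naive_map` are hypothetical and much simpler than the real iatt structures; the sketch only shows how bytes written in one layout land in the wrong fields when read as another.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical older layout (NOT the real GlusterFS definition). */
struct demo_old_iatt {
    uint64_t ino;
    uint32_t mode;
    uint32_t nlink;
};

/* Hypothetical newer layout: one extra field at the front. */
struct demo_new_iatt {
    uint64_t flags;
    uint64_t ino;
    uint32_t mode;
    uint32_t nlink;
};

/*
 * Naive mapping: treat old-layout bytes as if they were already in the
 * new layout. This is the mistake, shown here on purpose.
 */
void
demo_naive_map(const struct demo_old_iatt *old, struct demo_new_iatt *out)
{
    memset(out, 0, sizeof(*out));
    memcpy(out, old, sizeof(*old)); /* old bytes land at the wrong offsets */
}
```

With these assumed layouts, the old inode number ends up in `flags`, and `ino` is filled from the bytes of `mode` and `nlink`, so any code relying on the decoded values sees wrong attributes.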

Comment 3 Shyamsundar 2018-06-20 18:01:56 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-v4.1.0, please open a new bug report.

glusterfs-v4.1.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-June/000102.html
[2] https://www.gluster.org/pipermail/gluster-users/

