Bug 1544366

Summary: Rolling upgrade to 4.0 is broken
Product: [Community] GlusterFS
Reporter: Ravishankar N <ravishankar>
Component: protocol
Assignee: Anoop C S <anoopcs>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.0
CC: anoopcs, bugs, pkarampu, srangana
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-4.0.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1544699 (view as bug list)
Environment:
Last Closed: 2018-03-15 11:27:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1544699
Bug Blocks:

Description Ravishankar N 2018-02-12 10:00:20 UTC
Description of problem:
I was trying to test https://review.gluster.org/#/c/19538/ when I found that a rolling upgrade from glusterfs-3.13 to the yet-to-be-released glusterfs 4.0 is broken. In a 2-node setup, when one node is upgraded, clients mounted on each node can see only the local bricks and not the ones on the other node. I see the following error in the client logs:

E [MSGID: 114044] [client-handshake.c:1093:client_setvolume_cbk] 0-testvol-client-1: SETVOLUME on remote-host failed: lock state version not supplied [Invalid argument]
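
To check whether other clients hit the same handshake failure, the log line above can be matched mechanically. This is a minimal sketch that assumes the exact line layout quoted above; the function name and the regular expression are illustrative and not part of GlusterFS.

```python
import re

# Pattern derived from the single log line quoted in this report.
# Other GlusterFS versions may format the message differently (assumption).
SETVOLUME_FAILURE = re.compile(
    r"\[MSGID: 114044\].*?\] (?P<xlator>\S+): SETVOLUME on remote-host failed: "
    r"(?P<reason>.*?) \[(?P<errno>[^\]]+)\]\s*$"
)


def find_lk_version_failures(lines):
    """Return the client translators rejected for a missing lk-version."""
    failed = []
    for line in lines:
        match = SETVOLUME_FAILURE.search(line)
        if match and match.group("reason") == "lock state version not supplied":
            failed.append(match.group("xlator"))
    return failed
```

Feeding the lines of a client log to `find_lk_version_failures` yields the affected translator names, such as the `0-testvol-client-1` seen above.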


Steps to Reproduce:
1. Create a 2-node 1x2 volume and mount it locally on each node, all on glusterfs-3.13.
2. Upgrade one of the nodes to the 4.0 branch.
3. Clients can see only local bricks.

Comment 1 Anoop C S 2018-02-12 11:42:17 UTC
I think I failed to handle the scenario where clients are upgraded before servers.

RCA (new clients [>=4.0] and old servers [<=3.13]):
The SETVOLUME request from a post-upgrade client does not contain its lk-version in the dictionary that is passed on to the server (https://review.gluster.org/#/c/12363/). As a result, the server-side check for "clnt-lk-version" in the received dictionary fails, and an error is returned to the new client.
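
The mismatch can be modelled in a few lines. This is a sketch only: the key name "clnt-lk-version" and the error text come from this report, while the function names and the plain-dict request are hypothetical stand-ins for the real SETVOLUME dictionary and handlers.

```python
# Key name and error text are taken from this report; everything else
# (function names, request layout) is a hypothetical model, not GlusterFS code.
LK_VERSION_KEY = "clnt-lk-version"


def old_server_setvolume(request):
    """Model of the <=3.13 server check: reject requests lacking an lk-version."""
    if LK_VERSION_KEY not in request:
        return (False, "lock state version not supplied")
    return (True, "")


def old_client_request():
    """A <=3.13 client still sends its lk-version."""
    return {"volume": "testvol", LK_VERSION_KEY: 1}


def new_client_request():
    """A 4.0 client no longer sends the key."""
    return {"volume": "testvol"}
```

In this model, `old_server_setvolume(old_client_request())` succeeds, while `old_server_setvolume(new_client_request())` fails with the error string seen in the client logs.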

Comment 2 Ravishankar N 2018-02-12 14:07:17 UTC
(In reply to Anoop C S from comment #1)
> I think I failed to handle the scenario where clients are upgraded before
> servers.
> 
FWIW, not just the mount but also things like the self-heal daemon and glfsheal (the gfapi-based program used to display 'heal info') were affected.

Comment 3 Worker Ant 2018-02-13 10:16:48 UTC
REVIEW: https://review.gluster.org/19560 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#1) for review on master by Anoop C S

Comment 4 Worker Ant 2018-02-13 10:35:15 UTC
REVISION POSTED: https://review.gluster.org/19560 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#2) for review on master by Anoop C S

Comment 5 Worker Ant 2018-02-19 05:46:31 UTC
REVIEW: https://review.gluster.org/19582 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#1) for review on release-4.0 by Anoop C S

Comment 6 Worker Ant 2018-02-19 13:53:10 UTC
COMMIT: https://review.gluster.org/19582 committed in release-4.0 by "Anoop C S" <anoopcs> with a commit message- protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure

With https://review.gluster.org/#/c/12363/ being merged, we no longer
send client's lk-version to server side and the corresponding check on
server is also removed. But when clients are upgraded prior to servers,
the check for lk-version at server side fails and is reported back to
clients resulting in disconnection.

Since we don't have lock-recovery (lk-version and grace-timeout) logic
anymore in the code base, our best bet would be to add the client's default
lk-version, i.e., 1, into the dictionary just to make the server-side check
pass and continue with the remaining SETVOLUME operations.

Change-Id: I441b67bd271d1e9ba9a7c08703e651c7a6bd945b
BUG: 1544366
Signed-off-by: Anoop C S <anoopcs>
(cherry picked from commit c096bec4ec3f3ac33cc0787c60978944792e074e)
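
The effect of the commit can be sketched as follows. Only the key name "clnt-lk-version" and the default value 1 come from the commit message; the helper names and the boolean flag that distinguishes old from new servers are assumptions made for illustration.

```python
LK_VERSION_KEY = "clnt-lk-version"
DEFAULT_LK_VERSION = 1  # the client's default lk-version, per the commit message


def build_setvolume_request(volume):
    """Model of the patched 4.0 client: always include a dummy lk-version."""
    return {"volume": volume, LK_VERSION_KEY: DEFAULT_LK_VERSION}


def server_accepts(request, checks_lk_version):
    """Hypothetical server model.

    checks_lk_version=True models a <=3.13 server that requires the key;
    checks_lk_version=False models a 4.0 server that ignores it.
    """
    if checks_lk_version and LK_VERSION_KEY not in request:
        return False
    return True
```

Because the dummy value is simply ignored by servers that no longer check it, the same request passes against both old and new servers in this model, which is what a rolling upgrade needs.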

Comment 7 Shyamsundar 2018-03-15 11:27:04 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/