Bug 1544366 - Rolling upgrade to 4.0 is broken
Summary: Rolling upgrade to 4.0 is broken
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Anoop C S
QA Contact:
URL:
Whiteboard:
Depends On: 1544699
Blocks:
Reported: 2018-02-12 10:00 UTC by Ravishankar N
Modified: 2018-03-15 11:27 UTC (History)

Fixed In Version: glusterfs-4.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1544699 (view as bug list)
Environment:
Last Closed: 2018-03-15 11:27:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2018-02-12 10:00:20 UTC
Description of problem:
I was testing https://review.gluster.org/#/c/19538/ when I found that a rolling upgrade from glusterfs-3.13 to the yet-to-be-released glusterfs 4.0 is broken. In a 2-node setup, when one node is upgraded, clients mounted on each node can see only the local bricks and not the ones on the other node. I see the following error in the client logs:

E [MSGID: 114044] [client-handshake.c:1093:client_setvolume_cbk] 0-testvol-client-1: SETVOLUME on remote-host failed: lock state version not supplied [Invalid argument]


Steps to Reproduce:
1. Create a 2-node 1x2 volume and mount it locally on each node, all on glusterfs-3.13.
2. Upgrade one of the nodes to the 4.0 branch.
3. Clients can see only local bricks.

Comment 1 Anoop C S 2018-02-12 11:42:17 UTC
I think I failed to handle the scenario where clients are upgraded before servers.

RCA: (new clients [>=4.0] and old servers [<=3.13])
After the upgrade, the SETVOLUME request from the client no longer contains its lk-version in the dictionary that is passed to the server (https://review.gluster.org/#/c/12363/). The server-side check for "clnt-lk-version" in the received dictionary therefore fails, and an error is returned to the new client.
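
The failure mode described in this RCA can be illustrated with a small Python simulation. This is only a sketch, not GlusterFS code: the function name `old_server_setvolume` and the "volume-name" key are hypothetical, and the real check is in the C server code. Only the "clnt-lk-version" key and the error text come from this report.

```python
# Illustrative simulation only; not GlusterFS source code.
# "clnt-lk-version" is the key named in the RCA; other names are hypothetical.

def old_server_setvolume(request):
    """Mimic a <=3.13 server that insists on the client's lk-version."""
    if "clnt-lk-version" not in request:
        # Same error text as seen in the client log of this report.
        return (-1, "lock state version not supplied")
    return (0, "ok")

# A 3.13-style client still sends its lk-version.
old_client_request = {"volume-name": "testvol", "clnt-lk-version": 1}

# A 4.0-style client (after review 12363) no longer sends it.
new_client_request = {"volume-name": "testvol"}

print(old_server_setvolume(old_client_request))
print(old_server_setvolume(new_client_request))
```

In this model the second call is the one that corresponds to an upgraded client talking to a server that has not been upgraded yet.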

Comment 2 Ravishankar N 2018-02-12 14:07:17 UTC
(In reply to Anoop C S from comment #1)
> I think I failed to handle the scenario where clients are upgraded before
> servers.
> 
FWIW, it is not just the mount: things like the self-heal daemon and glfsheal (the gfapi-based program used to display 'heal info') were also affected.

Comment 3 Worker Ant 2018-02-13 10:16:48 UTC
REVIEW: https://review.gluster.org/19560 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#1) for review on master by Anoop C S

Comment 4 Worker Ant 2018-02-13 10:35:15 UTC
REVISION POSTED: https://review.gluster.org/19560 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#2) for review on master by Anoop C S

Comment 5 Worker Ant 2018-02-19 05:46:31 UTC
REVIEW: https://review.gluster.org/19582 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#1) for review on release-4.0 by Anoop C S

Comment 6 Worker Ant 2018-02-19 13:53:10 UTC
COMMIT: https://review.gluster.org/19582 committed in release-4.0 by "Anoop C S" <anoopcs> with a commit message- protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure

With https://review.gluster.org/#/c/12363/ being merged, we no longer
send client's lk-version to server side and the corresponding check on
server is also removed. But when clients are upgraded prior to servers,
the check for lk-version at server side fails and is reported back to
clients resulting in disconnection.

Since we don't have lock-recovery (lk-version and grace-timeout) logic
anymore in code base our best bet would be to add client's default
lk-version i.e, 1, into the dictionary just to make server side check
pass and continue with remaining SETVOLUME operations.

Change-Id: I441b67bd271d1e9ba9a7c08703e651c7a6bd945b
BUG: 1544366
Signed-off-by: Anoop C S <anoopcs>
(cherry picked from commit c096bec4ec3f3ac33cc0787c60978944792e074e)
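
As a rough illustration of the approach in this commit message, the sketch below has a new client insert the default lk-version of 1 so that an old server's check passes. It is a Python simulation under assumptions, not the actual patch: `build_setvolume_request`, `old_server_setvolume`, `new_server_setvolume`, and the "volume-name" key are hypothetical names, and the real change is in the C client translator.

```python
# Illustrative simulation only; not the actual GlusterFS patch.

DEFAULT_LK_VERSION = 1  # the client's default lk-version, per the commit message

def build_setvolume_request(volume_name):
    """Hypothetical 4.0-style client that still sends a dummy lk-version."""
    return {
        "volume-name": volume_name,             # hypothetical key
        "clnt-lk-version": DEFAULT_LK_VERSION,  # dummy value for old servers
    }

def old_server_setvolume(request):
    """Hypothetical <=3.13 server: rejects requests without the key."""
    if "clnt-lk-version" not in request:
        return (-1, "lock state version not supplied")
    return (0, "ok")

def new_server_setvolume(request):
    """Hypothetical >=4.0 server: the check is gone, the extra key is ignored."""
    return (0, "ok")

request = build_setvolume_request("testvol")
print(old_server_setvolume(request))
print(new_server_setvolume(request))
```

The dummy value carries no lock-recovery meaning in this model; it exists only so that the old server's presence check succeeds during a rolling upgrade.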

Comment 7 Shyamsundar 2018-03-15 11:27:04 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/

