Description of problem:
While testing https://review.gluster.org/#/c/19538/, I found that rolling upgrade from glusterfs-3.13 to the yet-to-be-released glusterfs-4.0 is broken. In a 2-node setup, when one node is upgraded, clients mounted on each node can see only the local bricks and not the ones on the other node. I see the following error in the client logs:

E [MSGID: 114044] [client-handshake.c:1093:client_setvolume_cbk] 0-testvol-client-1: SETVOLUME on remote-host failed: lock state version not supplied [Invalid argument]

Steps to Reproduce:
1. Create a 2-node 1x2 volume and mount it locally on each node, all on glusterfs-3.13.
2. Upgrade one of the nodes to the 4.0 branch.
3. Clients can see only local bricks.
I think I failed to handle the scenario where clients are upgraded before servers.

RCA (new clients [>= 4.0] and old servers [<= 3.13]):
After the upgrade, the SETVOLUME request from the client no longer contains its lk-version in the dictionary passed to the server (https://review.gluster.org/#/c/12363/). The server-side check for "clnt-lk-version" in the received dictionary therefore fails, and an error is returned to the new client.
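To illustrate the RCA, here is a minimal Python sketch of the failing handshake. It is a simplified model, not GlusterFS code: the function names, the "remote-subvolume" value, and the return convention are assumptions made for illustration. Only the "clnt-lk-version" key and the error string come from this report.

```python
# Simplified model of the SETVOLUME handshake; not actual GlusterFS code.
EINVAL = 22  # errno value for "Invalid argument" on Linux


def old_server_setvolume(request):
    """Model of a <= 3.13 server: the lk-version key must be present."""
    if "clnt-lk-version" not in request:
        return (-EINVAL, "lock state version not supplied")
    return (0, "ok")


def new_client_request():
    """Model of a >= 4.0 client: the lk-version key is no longer sent."""
    # Hypothetical brick path, used only to make the dictionary non-empty.
    return {"remote-subvolume": "/bricks/brick1"}


def old_client_request():
    """Model of a <= 3.13 client: the lk-version key is still sent."""
    request = new_client_request()
    request["clnt-lk-version"] = 1
    return request


if __name__ == "__main__":
    print(old_server_setvolume(new_client_request()))
    print(old_server_setvolume(old_client_request()))
```

In this model the old server rejects only the request that lacks the key, which matches the "lock state version not supplied [Invalid argument]" error seen in the client logs.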
(In reply to Anoop C S from comment #1)
> I think I failed to handle the scenario where clients are upgraded before
> servers.

FWIW, it is not just the mount: things like self-heal-daemon and glfsheal (the gfapi-based program used to display 'heal info') were also affected.
REVIEW: https://review.gluster.org/19560 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#1) for review on master by Anoop C S
REVISION POSTED: https://review.gluster.org/19560 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#2) for review on master by Anoop C S
REVIEW: https://review.gluster.org/19582 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#1) for review on release-4.0 by Anoop C S
COMMIT: https://review.gluster.org/19582 committed in release-4.0 by "Anoop C S" <anoopcs> with a commit message-

protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure

With https://review.gluster.org/#/c/12363/ being merged, we no longer send client's lk-version to server side and the corresponding check on server is also removed. But when clients are upgraded prior to servers, the check for lk-version at server side fails and is reported back to clients resulting in disconnection.

Since we don't have lock-recovery (lk-version and grace-timeout) logic anymore in code base our best bet would be to add client's default lk-version i.e, 1, into the dictionary just to make server side check pass and continue with remaining SETVOLUME operations.

Change-Id: I441b67bd271d1e9ba9a7c08703e651c7a6bd945b
BUG: 1544366
Signed-off-by: Anoop C S <anoopcs>
(cherry picked from commit c096bec4ec3f3ac33cc0787c60978944792e074e)
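The idea in the commit message can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual patch: the function names and the "remote-subvolume" option are hypothetical, and the real change is made in the C client translator. Only the "clnt-lk-version" key and its default value of 1 come from the commit message.

```python
# Simplified model of the fix; not the actual GlusterFS patch.
DEFAULT_LK_VERSION = 1  # default lk-version named in the commit message


def build_setvolume_request(options):
    """Model of a patched >= 4.0 client building its SETVOLUME dictionary."""
    request = dict(options)
    # Dummy value: present only so that an old server's check passes.
    request.setdefault("clnt-lk-version", DEFAULT_LK_VERSION)
    return request


def old_server_accepts(request):
    """Model of a <= 3.13 server, which still requires the key."""
    return "clnt-lk-version" in request


def new_server_accepts(request):
    """Model of a >= 4.0 server, where the check has been removed."""
    return True


if __name__ == "__main__":
    # Hypothetical option, used only to make the dictionary non-empty.
    request = build_setvolume_request({"remote-subvolume": "/bricks/brick1"})
    print(old_server_accepts(request), new_server_accepts(request))
```

Because the new server no longer looks at the key, the dummy entry is harmless there, while the old server sees the key it expects and continues with the rest of SETVOLUME.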
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/