Description of problem:
I was trying to test https://review.gluster.org/#/c/19538/ when I found that a rolling upgrade from glusterfs-3.13 to the yet-to-be-released glusterfs-4.0 is broken. In a 2-node setup, when one node is upgraded, clients mounted on each node can see only the local bricks and not the ones on the other node. I see the following error in the client logs:
E [MSGID: 114044] [client-handshake.c:1093:client_setvolume_cbk] 0-testvol-client-1: SETVOLUME on remote-host failed: lock state version not supplied [Invalid argument]
Steps to Reproduce:
1. Create a 2-node 1x2 volume and mount it locally on each node, all on glusterfs-3.13.
2. Upgrade one of the nodes to the 4.0 branch.
3. Clients can see only local bricks.
I think I failed to handle the scenario where clients are upgraded before servers.
RCA (new clients [>=4.0] and old servers [<=3.13]):
After the upgrade, the SETVOLUME request from the client no longer contains its lk-version in the dictionary that is passed to the server (https://review.gluster.org/#/c/12363/). As a result, the old server's check for "clnt-lk-version" in the received dictionary fails, and an error is returned to the new client.
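The failure described above can be modelled with a small sketch. This is not GlusterFS code: the handshake dictionary is represented by a plain Python dict, the function name `old_server_setvolume` and the `remote-subvolume` entry are illustrative assumptions, and only the key name "clnt-lk-version" and the error string are taken from this report.

```python
import errno

def old_server_setvolume(params):
    """Hypothetical model of the check a <=3.13 server runs on SETVOLUME."""
    # Old servers insist on the client's lock-state version being present.
    if "clnt-lk-version" not in params:
        return {
            "op_ret": -1,
            "op_errno": errno.EINVAL,  # shown as [Invalid argument] in the log
            "ERROR": "lock state version not supplied",
        }
    return {"op_ret": 0, "op_errno": 0}

# A >=4.0 client no longer puts its lk-version into the request.
# The key and value below are illustrative only.
new_client_request = {"remote-subvolume": "/bricks/brick1"}

reply = old_server_setvolume(new_client_request)
print(reply["ERROR"])
```

In this model the handshake is rejected before any other SETVOLUME processing happens, which matches the symptom that clients cannot reach bricks on the node that has not been upgraded.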
(In reply to Anoop C S from comment #1)
> I think I failed to handle the scenario where clients are upgraded before
FWIW, it is not just the mount that is affected: things like self-heal-daemon and glfsheal (the gfapi-based program used to display 'heal info') were affected as well.
REVIEW: https://review.gluster.org/19560 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#1) for review on master by Anoop C S
REVISION POSTED: https://review.gluster.org/19560 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#2) for review on master by Anoop C S
REVIEW: https://review.gluster.org/19582 (protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure) posted (#1) for review on release-4.0 by Anoop C S
COMMIT: https://review.gluster.org/19582 committed in release-4.0 by "Anoop C S" <email@example.com> with a commit message- protcol/client: Insert dummy clnt-lk-version to avoid upgrade failure
With https://review.gluster.org/#/c/12363/ being merged, we no longer
send client's lk-version to server side and the corresponding check on
server is also removed. But when clients are upgraded prior to servers,
the check for lk-version at server side fails and is reported back to
clients resulting in disconnection.
Since we don't have lock-recovery (lk-version and grace-timeout) logic
anymore in code base our best bet would be to add client's default
lk-version i.e, 1, into the dictionary just to make server side check
pass and continue with remaining SETVOLUME operations.
Signed-off-by: Anoop C S <firstname.lastname@example.org>
(cherry picked from commit c096bec4ec3f3ac33cc0787c60978944792e074e)
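The approach in the commit message can be illustrated with a minimal sketch. It assumes a plain Python dict in place of the real handshake dictionary; `build_setvolume_request`, `old_server_accepts`, and `new_server_accepts` are hypothetical names, not functions from the patch. Only the key "clnt-lk-version" and the default value 1 come from this thread.

```python
DUMMY_LK_VERSION = 1  # client's default lk-version, per the commit message

def build_setvolume_request(base_params):
    """Hypothetical model of a patched >=4.0 client preparing SETVOLUME."""
    request = dict(base_params)  # copy, so the caller's dict is untouched
    # The value is never used for lock recovery; it exists only so that
    # an old server's presence check passes.
    request["clnt-lk-version"] = DUMMY_LK_VERSION
    return request

def old_server_accepts(params):
    """Model of a <=3.13 server: requires the lk-version key."""
    return "clnt-lk-version" in params

def new_server_accepts(params):
    """Model of a >=4.0 server: the check was removed, extra keys are ignored."""
    return True

# The key and value below are illustrative only.
request = build_setvolume_request({"remote-subvolume": "/bricks/brick1"})
print(old_server_accepts(request), new_server_accepts(request))
```

Because the dummy entry is simply ignored by servers that no longer check it, the same request works in this model against both old and new servers, whichever side is upgraded first.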
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.
glusterfs-4.0.0 has been announced on the Gluster mailing lists, and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.