Originally reported on bug #1672318 for oVirt. In oVirt 4.2 we relied on Gluster 3.12. Since it is now unmaintained, we switched to Gluster 5. When upgrading a big datacenter it's impossible to upgrade all the hosts at the same time, so for a period there will be clients running 3.12 and 5 side by side. In this situation the 3.12 servers are still in place. Gluster 5 clients should still be able to work with Gluster 3.12 servers in order to allow clean upgrades.
Sahina, can you help here?
Amar, do you know of any issues with 3.12 clients connecting to gluster 5 servers?
(In reply to Sahina Bose from comment #2)
> Amar, do you know of any issues with 3.12 clients connecting to gluster 5
> servers.

This is gluster 5 clients connecting to gluster 3 servers.
Assigning to Sanju who's looking into it.
Sandro,

Can you please provide mount logs and brick logs?

Thanks,
Sanju
(In reply to Sanju from comment #5)
> Sandro,
>
> Can you please provide mount logs and bricks logs?
>
> Thanks,
> Sanju

Added you to the community report of this issue so you can interact directly with the original reporter.
There are no brick logs at the client and nothing in the brick logs on the glusterfs servers regarding these hosts.

[2019-02-04 12:47:02.979349] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-04 12:58:10.232517] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=*.*.*.16 --volfile-server=*.*.*.15 --volfile-server=*.*.*.14 --volfile-id=ssd5 /rhev/data-center/mnt/glusterSD/*.*.*.16:ssd5)
[2019-02-04 12:58:10.242924] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-04 13:02:44.511106] I [glusterfsd-mgmt.c:2424:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: *.*.*.16
[2019-02-04 13:02:44.511157] I [glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server *.*.*.15
[2019-02-04 13:02:44.512757] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f58b4ccffbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f58b4a98e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f58b4a98f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f58b4a9a531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f58b4a9b0d8] ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at 2019-02-04 12:47:02.979593 (xid=0x2)
[2019-02-04 13:02:44.512779] E [glusterfsd-mgmt.c:2136:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:ssd5)
Sahina, Sandro - could you confirm the exact gluster version (rpm -qa | grep glusterfs) running on the server side?
Let me provide you with this info as I'm the original reporter.

glusterfs-fuse-3.12.15-1.el7.x86_64
glusterfs-cli-3.12.15-1.el7.x86_64
glusterfs-events-3.12.15-1.el7.x86_64
glusterfs-libs-3.12.15-1.el7.x86_64
glusterfs-gnfs-3.12.15-1.el7.x86_64
glusterfs-server-3.12.15-1.el7.x86_64
glusterfs-client-xlators-3.12.15-1.el7.x86_64
glusterfs-api-3.12.15-1.el7.x86_64
glusterfs-geo-replication-3.12.15-1.el7.x86_64
glusterfs-3.12.15-1.el7.x86_64
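For anyone else collecting this inventory, a minimal sketch of the version check on a single host (the fallback message is an assumption for non-RPM systems, where rpm is absent):

```shell
# List the installed gluster packages on this host.
# On non-RPM hosts, or hosts without gluster, print a notice instead
# so the output is never silently empty.
if command -v rpm >/dev/null 2>&1; then
    PKGS=$(rpm -qa | grep glusterfs || echo "no glusterfs packages installed")
else
    PKGS="rpm not available on this host"
fi
echo "$PKGS"
```

Running this on every server and client makes it easy to spot the mixed 3.12/5 deployments this bug is about.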
We tried to reproduce this issue but couldn't hit it. If you happen to hit this issue, please provide us with all the log files from /var/log/glusterfs (for both glusterfs-server and client from all the machines).

Thanks,
Sanju
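A minimal sketch for gathering those logs from one machine (the archive name is my own convention; /var/log/glusterfs is the default log location, adjust if yours differs):

```shell
# Bundle this host's gluster logs into a timestamped archive for attachment.
LOG_DIR="${LOG_DIR:-/var/log/glusterfs}"
OUT="glusterfs-logs-$(hostname)-$(date +%Y%m%d).tar.gz"
if [ -d "$LOG_DIR" ]; then
    # -C keeps the archive paths relative to the log directory's parent.
    tar -czf "$OUT" -C "$(dirname "$LOG_DIR")" "$(basename "$LOG_DIR")"
    echo "wrote $OUT"
else
    echo "no $LOG_DIR on this host"
fi
```

Repeat on each server and client so the report covers all machines, as requested above.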
(In reply to Sanju from comment #10)
> We tried to reproduce this issue but couldn't hit it. If you happen to hit
> this issue, please provide us all the log files from /var/log/glusterfs (for
> both glusterfs-server and client from all the machines).
>
> Thanks,
> Sanju

Sanju, please look at Bug 1672318 as well.
I encountered this with 5.3 and 5.5 clients connecting to gluster 3.12.15 servers. There might be multiple problems. At first, I encountered https://bugzilla.redhat.com/show_bug.cgi?id=1651246 with 5.3 clients, and 5.5 resolved that problem. I've hit a new one though, so adding my details.

Initially, a new 5.5 mount to a 3.12.15 cluster of 3 servers succeeds and everything works well. If you reboot one of the servers, however, all clients no longer connect to it and the other servers are forced to heal everything to the 3rd server. Restarting the clients (new mounts) will cause them to reconnect until you restart a server again. Affects both fuse and gfapi clients.

Server brick example from the rebooted server (lots of these repeating):

[2019-03-25 17:45:37.588519] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:45:37.588571] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x542ab, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:48:25.944496] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:48:25.944547] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x38036, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:50:34.306141] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:50:34.306206] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1e050e, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:51:58.082944] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:51:58.082999] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1ec5, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)

Client mount log example (also lots repeating):

[2019-03-26 14:55:50.582757] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:55:54.582490] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:55:54.585627] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:55:54.585283 (xid=0x3ef42)
[2019-03-26 14:55:54.585644] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:55:58.585636] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:55:58.588760] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:55:58.588478 (xid=0x3ef47)
[2019-03-26 14:55:58.588779] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:02.589009] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:02.592150] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:02.591818 (xid=0x3ef4c)
[2019-03-26 14:56:02.592166] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:06.592208] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:06.595306] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:06.594965 (xid=0x3ef51)
[2019-03-26 14:56:06.595343] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:10.594781] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:10.597780] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:10.597488 (xid=0x3ef56)
[2019-03-26 14:56:10.597796] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:14.597866] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)

Bricks didn't crash, just the clients wouldn't talk to them.
Upgrading the currently affected server to 5.5 and rebooting it caused the clients to reconnect normally.
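To make the scenario above easier to retry, a hedged reproduction sketch (the volume name "gv1", server name, mount point, and log file name are assumptions taken from the logs above; the commands only run where a glusterfs client is actually installed):

```shell
# Reproduction sketch for the mixed-version reconnect failure:
# mount from a 5.5 client, reboot a 3.12.15 server, then check the
# mount log for the repeating disconnect/unwind pairs.
if command -v glusterfs >/dev/null 2>&1; then
    # 1. From a 5.5 client, mount a volume served by 3.12.15 servers.
    mount -t glusterfs server1:/gv1 /mnt/gv1
    # 2. Reboot one of the 3.12.15 servers, then look for the symptoms.
    grep -E 'saved_frames_unwind|socket disconnected' \
        /var/log/glusterfs/mnt-gv1.log | tail
    STEP="ran"
else
    STEP="reference only: glusterfs client not installed here"
fi
echo "$STEP"
```

With the bug present, step 2 shows the client stuck cycling through "changing port" / "forced unwinding" entries instead of reconnecting.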
Darrel,

Did you collect any logs? If so, please provide us with all the log files from /var/log/glusterfs (for both glusterfs-server and client from all the machines). That helps us in debugging this issue further.

Thanks,
Sanju
According to our testing, this is not happening with the latest glusterfs-6.x releases. Closing it as WORKSFORME; please reopen if found again.