Bug 1677160

Summary: Gluster 5 client can't access Gluster 3.12 servers
Product: [Community] GlusterFS
Component: core
Version: 5
Status: CLOSED WORKSFORME
Severity: urgent
Priority: urgent
Reporter: Sandro Bonazzola <sbonazzo>
Assignee: Sanju <srakonde>
CC: alexander, amukherj, atumball, budic, bugs, gianluca.cecchi, info, pasik, sabose, sankarshan, sbonazzo, srakonde, ykaul
Hardware: Unspecified
OS: Unspecified
Type: Bug
Fixed In Version: glusterfs-6.x
Last Closed: 2019-07-10 06:15:16 UTC
Bug Blocks: 1672318, 1677319

Description Sandro Bonazzola 2019-02-14 07:46:06 UTC
Originally reported on bug #1672318 for oVirt.
In oVirt 4.2 we relied on Gluster 3.12. Since that release is now unmaintained, we switched to Gluster 5.
During the upgrade of a big datacenter it's impossible to upgrade all the hosts at the same time, so for a while there will be clients running 3.12 and clients running 5 side by side.
In this situation the 3.12 servers are still in place.
Gluster 5 clients should still be able to work with Gluster 3.12 servers in order to allow clean upgrades.
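
A quick way to sanity-check such a mixed cluster is to compare the op-version the cluster is actually operating at against what the upgraded hosts ship. This is a minimal sketch assuming a standard install; the gluster command must be pointed at a server node:

# On any server: the op-version the cluster currently operates at,
# and the highest op-version every node could support
gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version

# On an upgraded host: the version of the installed client binaries
glusterfs --version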

Comment 1 Sandro Bonazzola 2019-02-14 07:46:58 UTC
Sahina, can you help here?

Comment 2 Sahina Bose 2019-02-14 13:06:06 UTC
Amar, do you know of any issues with 3.12 clients connecting to gluster 5 servers?

Comment 3 Sandro Bonazzola 2019-02-14 13:08:29 UTC
(In reply to Sahina Bose from comment #2)
> Amar, do you know of any issues with 3.12 clients connecting to gluster 5
> servers?

This is about Gluster 5 clients connecting to Gluster 3 servers, not the other way around.

Comment 4 Sahina Bose 2019-02-14 13:37:31 UTC
Assigning to Sanju, who's looking into it.

Comment 5 Sanju 2019-02-14 13:50:48 UTC
Sandro,

Can you please provide the mount logs and brick logs?

Thanks,
Sanju

Comment 6 Sandro Bonazzola 2019-02-14 14:00:47 UTC
(In reply to Sanju from comment #5)
> Sandro,
> 
> Can you please provide the mount logs and brick logs?

I've added you to the community report of this issue so you can interact directly with the original reporter.

> 
> Thanks,
> Sanju

Comment 7 Netbulae 2019-02-14 14:22:45 UTC
There are no brick logs on the client, and nothing in the brick logs on the glusterfs servers mentions these hosts.

[2019-02-04 12:47:02.979349] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-04 12:58:10.232517] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=*.*.*.16 --volfile-server=*.*.*.15 --volfile-server=*.*.*.14 --volfile-id=ssd5 /rhev/data-center/mnt/glusterSD/*.*.*.16:ssd5)
[2019-02-04 12:58:10.242924] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-04 13:02:44.511106] I [glusterfsd-mgmt.c:2424:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: *.*.*.16
[2019-02-04 13:02:44.511157] I [glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server *.*.*.15
[2019-02-04 13:02:44.512757] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f58b4ccffbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f58b4a98e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f58b4a98f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f58b4a9a531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f58b4a9b0d8] ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at 2019-02-04 12:47:02.979593 (xid=0x2)
[2019-02-04 13:02:44.512779] E [glusterfsd-mgmt.c:2136:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:ssd5)
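
For reference, the failing mount in the log above corresponds to a fuse mount along these lines (the server IPs are masked in the logs, so SERVER14/SERVER15/SERVER16 below are placeholders):

mount -t glusterfs \
    -o backup-volfile-servers=SERVER15:SERVER14 \
    SERVER16:/ssd5 /rhev/data-center/mnt/glusterSD/SERVER16:ssd5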

Comment 8 Atin Mukherjee 2019-02-20 11:00:28 UTC
Sahina, Sandro - could you confirm the exact gluster version (rpm -qa | grep glusterfs) running on the server side?
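
One caveat worth noting mid-upgrade: the packages on disk and the binaries a long-running daemon was started from can differ until the daemon is restarted, so it may be worth checking both. These are standard commands, nothing bug-specific:

rpm -qa | grep glusterfs        # packages installed on disk
glusterfs --version | head -1   # version the installed binaries report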

Comment 9 Netbulae 2019-02-20 12:34:10 UTC
Let me provide you with this info as I'm the original reporter.

glusterfs-fuse-3.12.15-1.el7.x86_64
glusterfs-cli-3.12.15-1.el7.x86_64
glusterfs-events-3.12.15-1.el7.x86_64
glusterfs-libs-3.12.15-1.el7.x86_64
glusterfs-gnfs-3.12.15-1.el7.x86_64
glusterfs-server-3.12.15-1.el7.x86_64
glusterfs-client-xlators-3.12.15-1.el7.x86_64
glusterfs-api-3.12.15-1.el7.x86_64
glusterfs-geo-replication-3.12.15-1.el7.x86_64
glusterfs-3.12.15-1.el7.x86_64

Comment 10 Sanju 2019-02-21 08:17:25 UTC
We tried to reproduce this issue but couldn't hit it. If you happen to hit it, please provide us with all the log files from /var/log/glusterfs (for both glusterfs-server and client, from all the machines).

Thanks,
Sanju
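
For anyone who does hit it, a minimal way to bundle everything requested, assuming the default log directory, is to run the following on each server and client:

tar -czf gluster-logs-$(hostname).tar.gz /var/log/glusterfs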

Comment 11 Sahina Bose 2019-02-25 09:33:11 UTC
(In reply to Sanju from comment #10)
> We tried to reproduce this issue but couldn't hit it. If you happen to hit
> it, please provide us with all the log files from /var/log/glusterfs (for
> both glusterfs-server and client, from all the machines).
> 
> Thanks,
> Sanju

Sanju, please look at Bug 1672318 as well.

Comment 12 Darrell 2019-03-26 16:42:58 UTC
I encountered this with 5.3 and 5.5 clients connecting to Gluster 3.12.15 servers; there may be multiple problems at play.

At first, I encountered https://bugzilla.redhat.com/show_bug.cgi?id=1651246 with 5.3 clients, and 5.5 resolved that problem. I've since hit a new one, so I'm adding my details.

Initially, a new 5.5 mount to a 3.12.15 cluster of 3 servers succeeds and everything works well. If you reboot one of the servers, however, the clients never reconnect to it, and the other servers are forced to heal everything to the third server. Restarting the clients (new mounts) makes them reconnect, until you restart a server again. This affects both fuse and gfapi clients; a condensed sketch of the reproduction follows.
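
Condensed reproduction, with hypothetical host and volume names (srv1/srv2/srv3 serving a replica-3 volume gv1):

# 5.5 client mounting a volume served by three 3.12.15 hosts
mount -t glusterfs -o backup-volfile-servers=srv2:srv3 srv1:/gv1 /mnt/gv1   # works fine
ssh srv2 reboot
# srv2 comes back up, but existing clients never reconnect to its bricks
# and heals pile up on the other two; only a fresh mount recovers:
umount /mnt/gv1
mount -t glusterfs -o backup-volfile-servers=srv2:srv3 srv1:/gv1 /mnt/gv1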

Server brick example from rebooted server (lots of these repeating):
[2019-03-25 17:45:37.588519] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:45:37.588571] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x542ab, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:48:25.944496] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:48:25.944547] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x38036, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:50:34.306141] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:50:34.306206] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1e050e, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:51:58.082944] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:51:58.082999] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1ec5, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)

Client log example (also lots of these repeating):
[2019-03-26 14:55:50.582757] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:55:54.582490] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:55:54.585627] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:55:54.585283 (xid=0x3ef42)
[2019-03-26 14:55:54.585644] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:55:58.585636] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:55:58.588760] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:55:58.588478 (xid=0x3ef47)
[2019-03-26 14:55:58.588779] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:02.589009] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:02.592150] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:02.591818 (xid=0x3ef4c)
[2019-03-26 14:56:02.592166] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:06.592208] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:06.595306] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:06.594965 (xid=0x3ef51)
[2019-03-26 14:56:06.595343] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:10.594781] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:10.597780] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:10.597488 (xid=0x3ef56)
[2019-03-26 14:56:10.597796] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:14.597866] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)

The bricks didn't crash; the clients just wouldn't talk to them.

Upgrading the currently affected server to 5.5 and rebooting it caused the clients to reconnect normally.
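
On a CentOS 7 host, that upgrade is roughly the following; the repo package name follows the CentOS Storage SIG convention and is an assumption, as are the exact steps:

yum install -y centos-release-gluster5   # Storage SIG repo for Gluster 5 (assumed package name)
yum update -y glusterfs\*                # pull the 5.5 packages
reboot                                   # or restart glusterd and the brick processes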

Comment 13 Sanju 2019-03-27 08:59:59 UTC
Darrell,

Did you collect any logs? If so, please provide us with all the log files from /var/log/glusterfs (for both glusterfs-server and client, from all the machines). That would help us debug this issue further.

Thanks,
Sanju

Comment 14 Amar Tumballi 2019-07-10 06:15:16 UTC
According to our testing, this does not happen with the latest glusterfs-6.x releases. Closing as WORKSFORME; please reopen if you hit it again.