Originally reported on bug #1672318 for oVirt. In oVirt 4.2 we relied on Gluster 3.12. Since it is now unmaintained, we switched to Gluster 5. When upgrading a big datacenter it's impossible to upgrade all the hosts at the same time, so for a period there will be clients running 3.12 and 5 side by side. In this situation the 3.12 servers are still in place. Gluster 5 clients should still be able to work with Gluster 3.12 servers in order to allow clean upgrades.
Sahina, can you help here?
Amar, do you know of any issues with 3.12 clients connecting to gluster 5 servers?
(In reply to Sahina Bose from comment #2)
> Amar, do you know of any issues with 3.12 clients connecting to gluster 5
> servers.

This is gluster 5 clients connecting to gluster 3 servers.
Assigning to Sanju who's looking into it.
Sandro,

Can you please provide mount logs and brick logs?

Thanks,
Sanju
(In reply to Sanju from comment #5)
> Sandro,
>
> Can you please provide mount logs and bricks logs?
>
> Thanks,
> Sanju

Added you to the community report of this issue so you can interact directly with the original reporter.
There are no brick logs at the client and nothing in the brick logs on the glusterfs servers regarding these hosts.

[2019-02-04 12:47:02.979349] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-04 12:58:10.232517] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=*.*.*.16 --volfile-server=*.*.*.15 --volfile-server=*.*.*.14 --volfile-id=ssd5 /rhev/data-center/mnt/glusterSD/*.*.*.16:ssd5)
[2019-02-04 12:58:10.242924] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-04 13:02:44.511106] I [glusterfsd-mgmt.c:2424:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: *.*.*.16
[2019-02-04 13:02:44.511157] I [glusterfsd-mgmt.c:2464:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server *.*.*.15
[2019-02-04 13:02:44.512757] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f58b4ccffbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f58b4a98e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f58b4a98f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f58b4a9a531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f58b4a9b0d8] ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2)) called at 2019-02-04 12:47:02.979593 (xid=0x2)
[2019-02-04 13:02:44.512779] E [glusterfsd-mgmt.c:2136:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:ssd5)
Sahina, Sandro - could you confirm the exact gluster version (rpm -qa | grep glusterfs) running on the server side?
Let me provide you with this info as I'm the original reporter.

glusterfs-fuse-3.12.15-1.el7.x86_64
glusterfs-cli-3.12.15-1.el7.x86_64
glusterfs-events-3.12.15-1.el7.x86_64
glusterfs-libs-3.12.15-1.el7.x86_64
glusterfs-gnfs-3.12.15-1.el7.x86_64
glusterfs-server-3.12.15-1.el7.x86_64
glusterfs-client-xlators-3.12.15-1.el7.x86_64
glusterfs-api-3.12.15-1.el7.x86_64
glusterfs-geo-replication-3.12.15-1.el7.x86_64
glusterfs-3.12.15-1.el7.x86_64
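For anyone else collecting this inventory, a minimal sketch of the version check on a single host (the fallback message is an assumption for non-RPM systems, where rpm is absent):

```shell
# List the installed gluster packages on this host.
# On non-RPM hosts, or hosts without gluster, print a notice instead
# so the output is never silently empty.
if command -v rpm >/dev/null 2>&1; then
    PKGS=$(rpm -qa | grep glusterfs || echo "no glusterfs packages installed")
else
    PKGS="rpm not available on this host"
fi
echo "$PKGS"
```

Running this on every server and client makes it easy to spot the mixed 3.12/5 deployments this bug is about.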
We tried to reproduce this issue but couldn't hit it. If you happen to hit this issue, please provide us with all the log files from /var/log/glusterfs (for both glusterfs-server and client from all the machines).

Thanks,
Sanju
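A minimal sketch for gathering those logs from one machine (the archive name is my own convention; /var/log/glusterfs is the default log location, adjust if yours differs):

```shell
# Bundle this host's gluster logs into a timestamped archive for attachment.
LOG_DIR="${LOG_DIR:-/var/log/glusterfs}"
OUT="glusterfs-logs-$(hostname)-$(date +%Y%m%d).tar.gz"
if [ -d "$LOG_DIR" ]; then
    # -C keeps the archive paths relative to the log directory's parent.
    tar -czf "$OUT" -C "$(dirname "$LOG_DIR")" "$(basename "$LOG_DIR")"
    echo "wrote $OUT"
else
    echo "no $LOG_DIR on this host"
fi
```

Repeat on each server and client so the report covers all machines, as requested above.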
(In reply to Sanju from comment #10)
> We tried to reproduce this issue but couldn't hit it. If you happen to hit
> this issue, please provide us all the log files from /var/log/glusterfs (for
> both glusterfs-server and client from all the machines).
>
> Thanks,
> Sanju

Sanju, please look at Bug 1672318 as well.
I encountered this with 5.3 and 5.5 clients connecting to gluster 3.12.15 servers. There might be multiple problems. At first, I encountered https://bugzilla.redhat.com/show_bug.cgi?id=1651246 with 5.3 clients, and 5.5 resolved that problem. I've hit a new one though, so adding my details.

Initially, a new 5.5 mount to a 3.12.15 cluster of 3 servers succeeds and everything works well. If you reboot one of the servers, however, all clients no longer connect to it and the other servers are forced to heal everything to the 3rd server. Restarting the clients (new mounts) will cause them to reconnect until you restart a server again. Affects both fuse and gfapi clients.

Server brick example from the rebooted server (lots of these repeating):

[2019-03-25 17:45:37.588519] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:45:37.588571] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x542ab, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:48:25.944496] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:48:25.944547] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x38036, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:50:34.306141] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:50:34.306206] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1e050e, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)
[2019-03-25 17:51:58.082944] I [socket.c:3679:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2019-03-25 17:51:58.082999] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1ec5, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (socket.management)

Client mount log example (also lots repeating):

[2019-03-26 14:55:50.582757] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:55:54.582490] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:55:54.585627] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:55:54.585283 (xid=0x3ef42)
[2019-03-26 14:55:54.585644] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:55:58.585636] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:55:58.588760] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:55:58.588478 (xid=0x3ef47)
[2019-03-26 14:55:58.588779] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:02.589009] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:02.592150] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:02.591818 (xid=0x3ef4c)
[2019-03-26 14:56:02.592166] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:06.592208] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:06.595306] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:06.594965 (xid=0x3ef51)
[2019-03-26 14:56:06.595343] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:10.594781] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)
[2019-03-26 14:56:10.597780] E [rpc-clnt.c:346:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a5164efbb] (--> /lib64/libgfrpc.so.0(+0xce11)[0x7f4a51417e11] (--> /lib64/libgfrpc.so.0(+0xcf2e)[0x7f4a51417f2e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4a51419531] (--> /lib64/libgfrpc.so.0(+0xf0d8)[0x7f4a5141a0d8] ))))) 0-gv1-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-03-26 14:56:10.597488 (xid=0x3ef56)
[2019-03-26 14:56:10.597796] W [rpc-clnt-ping.c:215:rpc_clnt_ping_cbk] 0-gv1-client-1: socket disconnected
[2019-03-26 14:56:14.597866] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gv1-client-1: changing port to 50155 (from 0)

Bricks didn't crash, just the clients wouldn't talk to them.
Upgrading the currently affected server to 5.5 and rebooting it caused the clients to reconnect normally.
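To make the scenario above easier to retry, a hedged reproduction sketch (the volume name "gv1", server name, mount point, and log file name are assumptions taken from the logs above; the commands only run where a glusterfs client is actually installed):

```shell
# Reproduction sketch for the mixed-version reconnect failure:
# mount from a 5.5 client, reboot a 3.12.15 server, then check the
# mount log for the repeating disconnect/unwind pairs.
if command -v glusterfs >/dev/null 2>&1; then
    # 1. From a 5.5 client, mount a volume served by 3.12.15 servers.
    mount -t glusterfs server1:/gv1 /mnt/gv1
    # 2. Reboot one of the 3.12.15 servers, then look for the symptoms.
    grep -E 'saved_frames_unwind|socket disconnected' \
        /var/log/glusterfs/mnt-gv1.log | tail
    STEP="ran"
else
    STEP="reference only: glusterfs client not installed here"
fi
echo "$STEP"
```

With the bug present, step 2 shows the client stuck cycling through "changing port" / "forced unwinding" entries instead of reconnecting.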
Darrel,

Did you collect any logs? If so, please provide us with all the log files from /var/log/glusterfs (for both glusterfs-server and client from all the machines). That helps us in debugging this issue further.

Thanks,
Sanju
According to our testing, this is not happening with the latest glusterfs-6.x releases. Closing it as WORKSFORME; please reopen if found again.