Bug 1609799

Summary:	IPv6 setup broken after updating to 4.1
Product:	[Community] GlusterFS	Reporter:	Pavel Znamensky <kompastver>
Component:	transport	Assignee:	bugs <bugs>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.1	CC:	atumball, bugs, y.zhao
Target Milestone:	---	Keywords:	Reopened
Target Release:	---
Hardware:	Unspecified
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-6.x	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-05-11 00:25:14 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Pavel Znamensky 2018-07-30 13:33:02 UTC

Description of problem:

After updating an existing cluster from 3.10 to 4.1, our setup stopped working.
Our cluster runs in an IPv6-only environment.
As far as I can see in tcpdump, glusterfs 4.1 requests only the A record (never the AAAA record) for the other cluster member.

Version-Release number of selected component (if applicable):
glusterfs 4.1.1

How reproducible:
Set up glusterfs 4.1 in an IPv6-only environment.


Actual results:
Cluster doesn't work because it doesn't see other nodes

Expected results:
Cluster works

Additional info:
In /var/log/glusterfs/glusterd.log:
[2018-07-30 13:20:05.088216] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host srv1.prod

~ # gluster pool list
UUID                                    Hostname                State
997fa0f6-c8d0-4207-8cef-95f25d1b9634    srv1.prod               Disconnected
c0a17e44-ea23-491f-805e-495cbd09bdf8    localhost               Connected


~ # gluster volume info test-volume

Volume Name: test-volume
Type: Distribute
Volume ID: 0a0be90a-5dd0-4d8d-98bc-0a2d9cfaf9f1
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: srv1.prod:/gl
Brick2: srv2.prod:/gl2
Options Reconfigured:
transport.address-family: inet6
nfs.disable: on

Comment 1 Pavel Znamensky 2018-07-30 13:40:42 UTC

Possibly related issues:
https://bugzilla.redhat.com/show_bug.cgi?id=1191072
https://bugzilla.redhat.com/show_bug.cgi?id=1277054

Comment 2 Pavel Znamensky 2018-08-01 08:44:09 UTC

With TRACE log level:

[2018-08-01 08:33:55.990804] T [rpc-clnt.c:404:rpc_clnt_reconnect] 0-management: attempting reconnect
[2018-08-01 08:33:55.991100] T [socket.c:3283:socket_connect] 0-management: connecting 0x563eb6dd3b70, state=0 gen=0 sock=-1
[2018-08-01 08:33:55.991413] D [dict.c:1126:data_to_uint16] (-->/usr/lib64/glusterfs/4.1.2/rpc-transport/socket.so(+0xafd0) [0x7f108f10efd0] -->/usr/lib64/glusterfs/4.1.2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x111) [0x7f108f112531] -->/lib64/libglusterfs.so.0(data_to_uint16+0x161) [0x7f109d436051] ) 0-dict: key null, unsigned integer type asked, has integer type [Invalid argument]
[2018-08-01 08:33:55.991724] D [logging.c:1983:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk
[2018-08-01 08:33:52.990678] T [MSGID: 0] [syncop.c:1031:__synclock_unlock] 0-: Unlock success 2373232384, remaining locks=0
[2018-08-01 08:33:55.991723] T [MSGID: 0] [common-utils.c:299:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: srv1.prod
[2018-08-01 08:33:55.992678] E [MSGID: 101075] [common-utils.c:317:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2018-08-01 08:33:55.992817] E [name.c:274:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host srv1.prod

I also tried a build with the --with-ipv6default flag, with transport.socket.source-addr set to an IPv6 address and transport.address-family set to inet6.
Nothing helped.
At the very least, gf_resolve_ip6 still uses the AF_INET family.
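
To illustrate why a resolver pinned to AF_INET fails in an IPv6-only setup, here is a minimal sketch using Python's socket.getaddrinfo. It is not glusterfs code; the helper name resolve is hypothetical, and port 24007 is simply the port that appears in the glusterd logs in this report. The idea is that the address family passed as a hint restricts which addresses getaddrinfo may return, so a host with only IPv6 addresses cannot be resolved under AF_INET.

```python
import socket


def resolve(host, family, port=24007):
    """Return the sorted unique addresses getaddrinfo yields for host
    when restricted to the given address family, or [] on failure.

    Hypothetical helper for illustration only; not part of glusterfs.
    """
    try:
        infos = socket.getaddrinfo(
            host, port, family=family, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # Same class of failure as "getaddrinfo failed" in the logs.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # An IPv6 literal stands in for a host that has only AAAA records.
    print("AF_INET6:", resolve("::1", socket.AF_INET6))
    print("AF_INET: ", resolve("::1", socket.AF_INET))
```

Running the same helper against a real peer name (for example srv1.prod from this report) with each family should show whether the name resolves only under AF_INET6, which would match the behaviour seen in tcpdump.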

Comment 3 Pavel Znamensky 2018-08-06 09:39:48 UTC

Actually, IPv6 has been broken since 3.11.

Comment 4 Pavel Znamensky 2018-08-10 08:47:02 UTC

I'm sorry, I was wrong about --with-ipv6default. There is a typo in the rpm spec: https://src.fedoraproject.org/rpms/glusterfs/blob/master/f/glusterfs.spec#_63
The correct flag is --with-ipv6-default.
But it doesn't help on EL7 because of the old libtirpc:
https://src.fedoraproject.org/rpms/glusterfs/blob/f28/f/glusterfs.spec#_71
So I built a newer libtirpc and rebuilt glusterfs with --with-ipv6-default, and glusterfs works now!

Comment 5 Yan 2018-10-03 19:17:18 UTC

I have tested the 4.1.4 release and IPv6 is still not working.

# gluster --version
glusterfs 4.1.4
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>


# gluster peer probe gluster-1
peer probe: failed: Probe returned with Transport endpoint is not connected
# ping6 gluster-1
PING gluster-1(gluster-1 (3010::13:199:0:0:42)) 56 data bytes
64 bytes from gluster-1 (3010::13:199:0:0:42): icmp_seq=1 ttl=64 time=1.54 ms
64 bytes from gluster-1 (3010::13:199:0:0:42): icmp_seq=2 ttl=64 time=0.439 ms


[2018-10-03 19:06:25.009874] I [MSGID: 106487] [glusterd-handler.c:1244:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req gluster-1 24007
[2018-10-03 19:06:25.010729] I [MSGID: 106128] [glusterd-handler.c:3635:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: gluster-1 (24007)
[2018-10-03 19:06:25.028897] W [MSGID: 106061] [glusterd-handler.c:3408:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2018-10-03 19:06:25.029031] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2018-10-03 19:06:25.033267] E [MSGID: 101075] [common-utils.c:312:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2018-10-03 19:06:25.033366] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host gluster-1
[2018-10-03 19:06:25.033538] I [MSGID: 106498] [glusterd-handler.c:3561:glusterd_friend_add] 0-management: connect returned 0
[2018-10-03 19:06:25.033657] I [MSGID: 106004] [glusterd-handler.c:6382:__glusterd_peer_rpc_notify] 0-management: Peer <gluster-1> (<00000000-0000-0000-0000-000000000000>), in state <Establishing Connection>, has disconnected from glusterd.

Comment 6 Pavel Znamensky 2018-10-03 19:30:27 UTC

Yan, did you try building glusterfs with the `--with-ipv6-default` flag?
For me, it works fine with this flag.

Comment 7 Yan 2018-10-03 19:33:19 UTC

I didn't rebuild the product, but I saw that the change below is included in 4.1 and assumed it does the same. Otherwise, I would expect a new fix.

#1562052: build: revert configure --without-ipv6-default behaviour

Comment 8 Amar Tumballi 2019-05-11 00:25:14 UTC

We fixed a few IPv6 issues in glusterfs-6.0 (6.1 is now out), so please upgrade. (https://bugzilla.redhat.com/1635863)