Description of problem:
-----------------------
This issue is seen with the RHHI-V use case. VM images are stored in gluster volumes and geo-replicated to the secondary site for the DR use case.

When IPv6 is used, the additional mount option --xlator-option=transport.address-family=inet6 is required. But when geo-rep checks the available space on the slave with gverify.sh, this mount option is not passed, and gverify.sh fails to mount either the master or the slave volume.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHGS 3.4.4 ( glusterfs-3.12.2-47 )

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Create a geo-rep session from the master to the slave

Actual results:
--------------
Creation of the geo-rep session fails at gverify.sh

Expected results:
-----------------
Creation of the geo-rep session should be successful

Additional info:
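For context, a manual fuse mount of these volumes over IPv6 only works when the inet6 xlator option is passed explicitly. A minimal sketch, assuming /mnt/master as an illustrative mount point (not taken from this report):

# Manual mount over IPv6 with the xlator option passed explicitly (sketch)
[root@master ~]# glusterfs --xlator-option=transport.address-family=inet6 \
      --volfile-server master.lab.eng.blr.redhat.com --volfile-id master /mnt/master

gverify.sh builds equivalent glusterfs invocations for the master and slave volumes but does not append this option, which is why the slave mount in the log below fails.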
[root@ ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
2620:52:0:4624:5054:ff:fee9:57f8 master.lab.eng.blr.redhat.com
2620:52:0:4624:5054:ff:fe6d:d816 slave.lab.eng.blr.redhat.com

[root@ ~]# gluster volume info

Volume Name: master
Type: Distribute
Volume ID: 9cf0224f-d827-4028-8a45-37f7bfaf1c78
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: master.lab.eng.blr.redhat.com:/gluster/brick1/master
Options Reconfigured:
performance.client-io-threads: on
server.event-threads: 4
client.event-threads: 4
user.cifs: off
features.shard: on
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet6
nfs.disable: on

[root@localhost ~]# gluster volume geo-replication master slave.lab.eng.blr.redhat.com::slave create push-pem
Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
geo-replication command failed

Snip from gverify-slavemnt.log

<snip>
[2019-03-13 11:46:28.746494] I [MSGID: 100030] [glusterfsd.c:2646:main] 0-glusterfs: Started running glusterfs version 3.12.2 (args: glusterfs --xlator-option=*dht.lookup-unhashed=off --volfile-server slave.lab.eng.blr.redhat.com --volfile-id slave -l /var/log/glusterfs/geo-replication/gverify-slavemnt.log /tmp/gverify.sh.y1TCoY)
[2019-03-13 11:46:28.750595] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-03-13 11:46:28.753702] E [MSGID: 101075] [common-utils.c:482:gf_resolve_ip6] 0-resolver: getaddrinfo failed (family:2) (Name or service not known)
[2019-03-13 11:46:28.753725] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host slave.lab.eng.blr.redhat.com
[2019-03-13 11:46:28.753953] I [glusterfsd-mgmt.c:2337:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: slave.lab.eng.blr.redhat.com
[2019-03-13 11:46:28.753980] I [glusterfsd-mgmt.c:2358:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2019-03-13 11:46:28.753998] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-03-13 11:46:28.754073] I [MSGID: 101190] [event-epoll.c:676:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-03-13 11:46:28.754154] W [glusterfsd.c:1462:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_notify+0xab) [0x7fc39d379bab] -->glusterfs(+0x11fcd) [0x56427db95fcd] -->glusterfs(cleanup_and_exit+0x6b) [0x56427db8eb2b] ) 0-: received signum (1), shutting down
[2019-03-13 11:46:28.754197] I [fuse-bridge.c:6611:fini] 0-fuse: Unmounting '/tmp/gverify.sh.y1TCoY'.
[2019-03-13 11:46:28.760213] I [fuse-bridge.c:6616:fini] 0-fuse: Closing fuse connection to '/tmp/gverify.sh.y1TCoY'.
</snip>
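The `getaddrinfo failed (family:2)` message shows the gverify mount resolving the slave hostname with AF_INET (IPv4), which has no mapping in this IPv6-only setup. A quick way to confirm, shown as a sketch (output depends on the local /etc/hosts):

# No IPv4 mapping exists for the slave hostname, only the IPv6 one from /etc/hosts
[root@master ~]# getent ahostsv4 slave.lab.eng.blr.redhat.com    # expected: no output
[root@master ~]# getent ahostsv6 slave.lab.eng.blr.redhat.com    # expected: 2620:52:0:4624:5054:ff:fe6d:d816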
The following changes were made to Geo-replication to support IPv6:

- Added the IPv6 mount option whenever a Gluster volume is mounted by Geo-rep.
- The local gluster CLI connects to Glusterd over a Unix socket, which is why all local Gluster CLI commands work fine. Geo-rep uses the gluster CLI with the `--remote-host=` option to get details from the remote Glusterd; IPv6 handling is now fixed when --remote-host is used.

Known limitations:

- Only FQDNs are supported; Geo-rep fails if an IPv6 address is specified instead of an FQDN.
- The IPv6-enabled state is taken from the Master Glusterd; Geo-rep will fail if IPv6 is enabled on the Master and disabled on the Slave (and vice versa).

Upstream Patch: https://review.gluster.org/#/c/glusterfs/+/22363/
Downstream Patch: https://code.engineering.redhat.com/gerrit/#/c/165434/
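To illustrate the `--remote-host=` path and the corrected mount, a sketch of the kinds of commands Geo-rep issues internally after the fix (the temporary mount point suffix is a placeholder, not taken from this report):

# Remote query to the slave's Glusterd, as used by Geo-rep (sketch)
[root@master ~]# gluster --remote-host=slave.lab.eng.blr.redhat.com volume info slave

# gverify.sh mount after the fix: the inet6 xlator option is appended (sketch)
[root@master ~]# glusterfs --xlator-option=*dht.lookup-unhashed=off \
      --xlator-option=transport.address-family=inet6 \
      --volfile-server slave.lab.eng.blr.redhat.com --volfile-id slave \
      -l /var/log/glusterfs/geo-replication/gverify-slavemnt.log /tmp/gverify.sh.XXXXXX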
Tested the fix with the scratch build, and it works well:
1. Able to create the geo-rep session
2. Files from the master volume synced to the slave volume over IPv6

Checked data integrity as well; no problems were observed.

This issue will be marked as a known_issue with RHHI-V until the fix is included in the build. Removing the release_blocker for RHGS 3.4.4 set on this bug.
Aravinda, can you ack this bug?
Tested on RHGS nodes with IPv6 turned on and no IPv4, with the RHGS 3.5.0 interim build ( glusterfs-6.0-13.el7rhgs ).

Setup details:
1. 3-node master gluster cluster ( Trusted Storage Pool ) and 3-node slave cluster
2. Replica 3 volume as master and as slave
3. The glusterd volfile was edited to set transport.address-family to inet6, and the glusterd service was restarted
4. Static IPv6 addresses were used, with hostnames assigned in /etc/hosts

Steps (see the command sketch after this list):
1. All gluster commands used the IPv6 hostnames from /etc/hosts
2. The geo-rep session was established, and gluster shared storage was enabled
3. A fuse mount was done on the master node with the additional mount option "xlator-option=transport.address-family=inet6"
4. A few files were written
5. A geo-rep checkpoint was set and the session was started
6. Once the checkpoint was reached, computed the sha256sum of all files on the slave side
7. The checksums on the master side match those on the slave side
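A sketch of the commands behind these steps. Hostnames and volume names are taken from this report; the glusterd volfile path, mount point, and checksum file path are assumptions of a typical setup:

# Setup step 3: enable IPv6 in the glusterd volfile on every node, then restart glusterd
[root@master ~]# grep address-family /etc/glusterfs/glusterd.vol
    option transport.address-family inet6
[root@master ~]# systemctl restart glusterd

# Steps 1-2: create the geo-rep session and enable gluster shared storage
[root@master ~]# gluster volume geo-replication master slave.lab.eng.blr.redhat.com::slave create push-pem
[root@master ~]# gluster volume set all cluster.enable-shared-storage enable

# Steps 3-4: fuse mount on the master node with the IPv6 xlator option, then write a few files
[root@master ~]# mount -t glusterfs -o xlator-option=transport.address-family=inet6 \
      master.lab.eng.blr.redhat.com:/master /mnt/master

# Step 5: set a checkpoint and start the session
[root@master ~]# gluster volume geo-replication master slave.lab.eng.blr.redhat.com::slave config checkpoint now
[root@master ~]# gluster volume geo-replication master slave.lab.eng.blr.redhat.com::slave start

# Steps 6-7: once the checkpoint is reached, compute per-file checksums and compare
[root@master ~]# find /mnt/master -type f -exec sha256sum {} + | sort -k 2 > /tmp/master.sha256
(repeat on a slave-side mount and diff the two lists)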
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days