Bug 1807007 - The result (hostname) of getnameinfo for all bricks (IPv6 addresses) is the same, while the bricks are not on the same host.
Summary: The result (hostname) of getnameinfo for all bricks (IPv6 addresses) is the...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 5
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1739320 1747746 1749664 1750241
Blocks:
 
Reported: 2020-02-25 13:08 UTC by Mohit Agrawal
Modified: 2020-03-02 08:13 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1747746
Environment:
Last Closed: 2020-03-02 08:13:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
Gluster.org Gerrit 24175 [Merged]: rpc: Update address family if it is not provide in cmd-line arguments (last updated 2020-03-02 08:13:47 UTC)

Description Mohit Agrawal 2020-02-25 13:08:27 UTC
+++ This bug was initially created as a clone of Bug #1747746 +++

+++ This bug was initially created as a clone of Bug #1739320 +++

Description of problem:

When creating a volume using IPv6 addresses, the command failed with an error that the bricks are on the same hostname, while they are not.

The result (hostname) of getnameinfo for all bricks (IPv6 addresses) is the same, even though the bricks are on different hosts.

Version-Release number of selected component (if applicable):
6.3-1 and 6.4-1

How reproducible:


Steps to Reproduce:
1. Create a volume with replica 3 using the command:
gluster --mode=script volume create vol_b6b4f444031cb86c969f3fc744f2e999 replica 3  2001:db8:1234::10:/root/test/a 2001:db8:1234::5:/root/test/a  2001:db8:1234::14:/root/test/a
2. The command fails with an error that all bricks are on the same hostname.
3. Check those addresses using nslookup, which shows the opposite: the IPs belong to different hostnames.

Actual results:
===============
# gluster --mode=script volume create vol_b6b4f444031cb86c969f3fc744f2e999 replica 3  2001:db8:1234::10:/root/test/a 2001:db8:1234::5:/root/test/a  2001:db8:1234::14:/root/test/a
volume create: vol_b6b4f444031cb86c969f3fc744f2e999: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Bricks should be on different nodes to have best fault tolerant configuration. Use 'force' at the end of the command if you want to override this behavior.

# nslookup 2001:db8:1234::10
Server:         2001:db8:1234::5
Address:        2001:db8:1234::5#53

0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.4.3.2.1.8.b.d.0.1.0.0.2.ip6.arpa        name = roger-1812-we-01.

# nslookup 2001:db8:1234::5
Server:         2001:db8:1234::5
Address:        2001:db8:1234::5#53

5.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.4.3.2.1.8.b.d.0.1.0.0.2.ip6.arpa        name = roger-1903-we-01.

# nslookup 2001:db8:1234::14
Server:         2001:db8:1234::5
Address:        2001:db8:1234::5#53

4.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.4.3.2.1.8.b.d.0.1.0.0.2.ip6.arpa        name = roger-1812-cwes-01.


Expected results:
Volume creation should succeed.

Additional info:

Here are the code snippets from glusterfs 6.4 that show the problem.
For some reason, the result (hostname) of getnameinfo for all bricks (IPv6 addresses) is the same, while the bricks actually are not on the same host.
================
xlators/mgmt/glusterd/src/glusterd-volume-ops.c

gf_ai_compare_t
glusterd_compare_addrinfo(struct addrinfo *first, struct addrinfo *next)
{
    int ret = -1;
    struct addrinfo *tmp1 = NULL;
    struct addrinfo *tmp2 = NULL;
    char firstip[NI_MAXHOST] = {
        0,
    };
    char nextip[NI_MAXHOST] = {
        0,
    };

    for (tmp1 = first; tmp1 != NULL; tmp1 = tmp1->ai_next) {
        ret = getnameinfo(tmp1->ai_addr, tmp1->ai_addrlen, firstip, NI_MAXHOST,
                          NULL, 0, NI_NUMERICHOST);
        if (ret)
            return GF_AI_COMPARE_ERROR;
        for (tmp2 = next; tmp2 != NULL; tmp2 = tmp2->ai_next) {
            ret = getnameinfo(tmp2->ai_addr, tmp2->ai_addrlen, nextip,
                              NI_MAXHOST, NULL, 0, NI_NUMERICHOST);
            if (ret)
                return GF_AI_COMPARE_ERROR;
            if (!strcmp(firstip, nextip)) {
                return GF_AI_COMPARE_MATCH;
            }
        }
    }
    return GF_AI_COMPARE_NO_MATCH;
}

...
            if (GF_AI_COMPARE_MATCH == ret)
                goto found_bad_brick_order;

...

found_bad_brick_order:
    gf_msg(this->name, GF_LOG_INFO, 0, GD_MSG_BAD_BRKORDER,
           "Bad brick order found");
    if (type == GF_CLUSTER_TYPE_DISPERSE) {
        snprintf(err_str, sizeof(found_string), found_string, "disperse");
    } else {
        snprintf(err_str, sizeof(found_string), found_string, "replicate");
    }
....
   const char found_string[2048] =
        "Multiple bricks of a %s "
        "volume are present on the same server. This "
        "setup is not optimal. Bricks should be on "
        "different nodes to have best fault tolerant "
        "configuration. Use 'force' at the end of the "
        "command if you want to override this "
        "behavior. ";

--- Additional comment from Amgad on 2019-08-11 04:34:04 UTC ---

Any response?

--- Additional comment from Ravishankar N on 2019-08-12 04:21:10 UTC ---

CC'ing glusterd maintainer to take a look.

--- Additional comment from Amgad on 2019-08-14 05:12:19 UTC ---

Can someone provide a pointer to the "getnameinfo" source code while looking at the issue?

--- Additional comment from Atin Mukherjee on 2019-08-14 05:19:45 UTC ---

Aravinda - can you please help here?

--- Additional comment from Aravinda VK on 2019-08-14 06:15:06 UTC ---

I think it is failing while doing the strcmp comparison.

```
            if (!strcmp(firstip, nextip)) {
                return GF_AI_COMPARE_MATCH;
            }
```

I wrote a small program to compare the hostnames:

```
#include <stdio.h>
#include <string.h>

int main()
{
    char* first = "roger-1812-we-01";
    char* second = "roger-1903-we-01";
    char* third = "roger-1812-cwes-01";
    printf("First(%s)  vs Second(%s): %d\n", first, second, strcmp(first, second));
    printf("First(%s)  vs Third(%s): %d\n", first, third, strcmp(first, third));
    printf("Second(%s) vs Third(%s): %d\n", second, third, strcmp(second, third));
}

```

And the output is


First(roger-1812-we-01)  vs Second(roger-1903-we-01): -1
First(roger-1812-we-01)  vs Third(roger-1812-cwes-01): 20
Second(roger-1903-we-01) vs Third(roger-1812-cwes-01): 1


We should change the comparison to 

```
if (strcmp(firstip, nextip) == 0) {
    return GF_AI_COMPARE_MATCH;
}
```

--- Additional comment from Aravinda VK on 2019-08-14 06:25:06 UTC ---

Ignore my previous comment. I was wrong. Thanks, Amar, for pointing that out: `!-1` is `0`.
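
For reference, a minimal standalone snippet (illustration only, not glusterfs code) confirming that the existing `!strcmp(firstip, nextip)` spelling already behaves the same as `strcmp(firstip, nextip) == 0`:

```
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *a = "roger-1812-we-01";
    const char *b = "roger-1903-we-01";

    /* !x is 1 only when x is 0, so !strcmp() is true only on an exact match. */
    printf("!strcmp(a, a) = %d\n", !strcmp(a, a));              /* 1 -> match    */
    printf("!strcmp(a, b) = %d\n", !strcmp(a, b));              /* 0 -> no match */
    printf("(strcmp(a, b) == 0) = %d\n", strcmp(a, b) == 0);    /* 0 -> no match */
    return 0;
}
```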

--- Additional comment from Amgad on 2019-08-16 14:50:41 UTC ---

I did some testing on "getnameinfo" and it works fine. When you pass the IPv6 address, it returns the correct hostname.

I used the following test program:
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE 1024

int main(int argc, char *argv[])
{
    char host[SIZE];
    char service[SIZE];

    /* Build a sockaddr_in6 from the IPv6 literal given on the command line. */
    struct sockaddr_in6 sa = {0};   /* zero-init so sin6_port etc. are defined */
    sa.sin6_family = AF_INET6;
    inet_pton(AF_INET6, argv[1], &sa.sin6_addr);

    /* No NI_NUMERICHOST flag, so this performs a reverse (PTR) lookup. */
    int res = getnameinfo((struct sockaddr*)&sa, sizeof(sa), host, sizeof(host), service, sizeof(service), 0);

    if(res)
    {
        exit(1);
    }
    else
    {
        printf("Hostname: %s\n", host);
        printf("Service: %s\n", service);
    }

    return 0;
}
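
(To repeat this check: compile the program with any C compiler and pass an IPv6 literal as the only argument, e.g. ./a.out 2001:db8:1234::10. Since no NI_NUMERICHOST flag is passed, getnameinfo performs a reverse lookup, so the hostname printed should correspond to the nslookup output earlier in this bug.)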

So I think the problem is in what is passed to:
glusterd_compare_addrinfo(struct addrinfo *first, struct addrinfo *next)

--- Additional comment from Amgad on 2019-08-16 20:52:30 UTC ---

Digging more into the glusterd-volume-ops.c file, where "glusterd_compare_addrinfo" is called by "glusterd_check_brick_order":
in the following code, which prepares what is passed to "glusterd_compare_addrinfo", "getaddrinfo" does not seem to return the right address.


    brick_list_dup = brick_list_ptr = gf_strdup(brick_list);
    /* Resolve hostnames and get addrinfo */
    while (i < brick_count) {
        ++i;
        brick = strtok_r(brick_list_dup, " \n", &tmpptr);
        brick_list_dup = tmpptr;
        if (brick == NULL)
            goto check_failed;
        brick = strtok_r(brick, ":", &tmpptr);
        if (brick == NULL)
            goto check_failed;
        ret = getaddrinfo(brick, NULL, NULL, &ai_info);
        if (ret != 0) {
            ret = 0;
            gf_msg(this->name, GF_LOG_ERROR, 0, GD_MSG_HOSTNAME_RESOLVE_FAIL,
                   "unable to resolve "
                   "host name");
            goto out;
        }
        ai_list_tmp1 = MALLOC(sizeof(addrinfo_list_t));
        if (ai_list_tmp1 == NULL) {
            ret = 0;
            gf_msg(this->name, GF_LOG_ERROR, ENOMEM, GD_MSG_NO_MEMORY,
                   "failed to allocate "
                   "memory");
            freeaddrinfo(ai_info);
            goto out;
        }
        ai_list_tmp1->info = ai_info;
        cds_list_add_tail(&ai_list_tmp1->list, &ai_list->list);
        ai_list_tmp1 = NULL;
    }

I wrote a small program to call it and it always returns "0.0.0.0", so maybe that is why the code later assumes it is the same host.
It works for IPv4, though. Also, you have to loop through the list to get the right address.


I'll dig more, but I hope that gives some direction for the other developers to check.
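
For anyone following along, here is a small standalone sketch (illustration only, not glusterfs code) that resolves a host string the same way glusterd_check_brick_order does (getaddrinfo with no hints) and walks the whole addrinfo list, printing each entry in numeric form via getnameinfo with NI_NUMERICHOST. One possible explanation for the "0.0.0.0" above, stated as an assumption: if an AF_INET6 entry is read through a struct sockaddr_in, the bytes that land in sin_addr are the (usually zero) flow-info field, which would print as 0.0.0.0.

```
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <host-or-ip>\n", argv[0]);
        return 1;
    }

    /* Same call as in glusterd_check_brick_order: no hints at all. */
    struct addrinfo *ai_info = NULL;
    int ret = getaddrinfo(argv[1], NULL, NULL, &ai_info);
    if (ret != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(ret));
        return 1;
    }

    /* Walk the full list; an IPv6 literal typically yields AF_INET6 entries. */
    for (struct addrinfo *ai = ai_info; ai != NULL; ai = ai->ai_next) {
        char ip[NI_MAXHOST] = {0};
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, ip, sizeof(ip),
                        NULL, 0, NI_NUMERICHOST) == 0)
            printf("family=%d addr=%s\n", ai->ai_family, ip);
    }

    freeaddrinfo(ai_info);
    return 0;
}
```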

--- Additional comment from Amgad on 2019-08-28 19:55:02 UTC ---

Hi Amar /GlusterFS team

I was busy addressing other development issues - back to this IPv6 one now.
In this problem, the volume is created through heketi and fails in the "glusterd-volume-ops.c" file when "glusterd_compare_addrinfo" is called.

In a different test (the system is configured with pure IPv6), where the volumes were created using the gluster CLI, the volumes are created on different servers, but "glustershd" fails to come up with the following errors:

[2019-08-28 19:11:36.645541] I [MSGID: 100030] [glusterfsd.c:2847:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 6.5 (args: /usr/sbin/glusterfs -s 2001:db8:1234::8 --volfile-id gluster/glustershd -p 
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3a1e3977fd7318f2.socket --xlator-option *replicate*.node-uuid=8e2b40a7-098c-4f0a-b323-2e764bd315f3 --process-name glustershd --client-pid=-6)
[2019-08-28 19:11:36.646207] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 26375
[2019-08-28 19:11:36.655872] I [socket.c:902:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2019-08-28 19:11:36.656708] E [MSGID: 101075] [common-utils.c:508:gf_resolve_ip6] 0-resolver: getaddrinfo failed (family:2) (Address family for hostname not supported)
[2019-08-28 19:11:36.656730] E [name.c:258:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host 2001:db8:1234::8
[2019-08-28 19:11:36.658459] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-08-28 19:11:36.658744] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: 2001:db8:1234::8
[2019-08-28 19:11:36.658766] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2019-08-28 19:11:36.658832] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-08-28 19:11:36.659376] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(+0xf1d3) [0x7f61e883a1d3] -->/usr/sbin/glusterfs(+0x12fef) [0x5653bb1c9fef] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x5653bb1c201b] ) 0-: received signum (1), shutting down

It indicates that the function "gf_resolve_ip6" in "common-utils.c" is failing because of (family:2) - since the IP is IPv6, the family should be 10 (AF_INET6), not 2 (AF_INET), and so it fails with "address family not supported".
The same applies to "af_inet_client_get_remote_sockaddr".

Any suggestion as to what could be passing the family as "2" (IPv4) rather than "10" (IPv6)?

Regards,
Amgad

--- Additional comment from Amgad on 2019-08-28 21:46:03 UTC ---

GlusterFS team:

Can someone urgently check whether "hints.ai_family" in these function calls is set to "AF_INET6" and not "AF_UNSPEC", to force the IP version?

Regards,
Amgad

--- Additional comment from Amgad on 2019-08-30 03:15:12 UTC ---


*** I verified that the "ai_family" passed by "af_inet_client_get_remote_sockaddr" to "gf_resolve_ip6" [rpc/rpc-transport/socket/src/name.c] is IPv4 ("2") and not IPv6 (it should be "10"):

af_inet_client_get_remote_sockaddr(rpc_transport_t *this,
                                   struct sockaddr *sockaddr,
                                   socklen_t *sockaddr_len)
.......

    /* TODO: gf_resolve is a blocking call. kick in some
       non blocking dns techniques */
    ret = gf_resolve_ip6(remote_host, remote_port, sockaddr->sa_family,
                         &this->dnscache, &addr_info);
    gf_log(this->name, GF_LOG_ERROR, "CSTO-DEBUG: Family Address is %d", sockaddr->sa_family);              ==> my added debug msg

AND

*** In [libglusterfs/src/common-utils.c], where "gf_resolve_ip6" is defined, the IPv6 address is passed to "getaddrinfo" as the hostname, and the call fails because the ai_family is not right:

int32_t
gf_resolve_ip6(const char *hostname, uint16_t port, int family, void **dnscache,
               struct addrinfo **addr_info)
{
...
        if ((ret = getaddrinfo(hostname, port_str, &hints, &cache->first)) !=
            0) {
            gf_msg("resolver", GF_LOG_ERROR, 0, LG_MSG_GETADDRINFO_FAILED,
                   "getaddrinfo failed (family:%d) (%s)", family,
                   gai_strerror(ret));

            gf_msg("resolver", GF_LOG_ERROR, 0, LG_MSG_GETADDRINFO_FAILED,                                ==> my added debug msg
                   "CSTO-DEBUG: getaddrinfo failed (hostname:%s) (%s)", hostname,
                   gai_strerror(ret));

.........
/var/log/glusterfs/glustershd.log output:
.....
[2019-08-30 01:03:51.871225] E [MSGID: 101075] [common-utils.c:512:gf_resolve_ip6] 0-resolver: CSTO-DEBUG: getaddrinfo failed (hostname:2001:db8:1234::8) (Address family for hostname not supported)

[2019-08-30 01:03:51.871239] E [name.c:256:af_inet_client_get_remote_sockaddr] 0-glusterfs: CSTO-DEBUG: Family Address is 2 ==>
[2019-08-30 01:03:51.871249] E [name.c:260:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host 2001:db8:1234::8
........

That is why DNS resolution failed and glustershd did not come up.
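
As a side note on the numeric values in these messages: on Linux, AF_INET is 2 and AF_INET6 is 10, so "family:2" in the log means the resolver was asked for an IPv4 address. A trivial check (illustration only):

```
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    /* On Linux these print 2 and 10 respectively. */
    printf("AF_INET=%d AF_INET6=%d\n", AF_INET, AF_INET6);
    return 0;
}
```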

--- Additional comment from Mohit Agrawal on 2019-08-30 08:59:40 UTC ---

Hi,

To enable IPv6 for gluster processes you need to change "transport.address-family" in /etc/glusterfs/glusterd.vol and restart glusterd.

The issue has been fixed upstream by the patch below:
https://review.gluster.org/#/c/glusterfs/+/21948/

By default the transport address family is inet, and the value is commented out in /etc/glusterfs/glusterd.vol.

To enable IPv6, change the value to inet6 and uncomment the line, as below:

cat /etc/glusterfs/glusterd.vol 
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option transport.socket.listen-port 24007
    option ping-timeout 0
    option event-threads 1
#   option lock-timer 180
    option transport.address-family inet6
#   option base-port 49152
    option max-port  60999
end-volume


Thanks,
Mohit Agrawal

--- Additional comment from Amgad on 2019-08-30 13:43:24 UTC ---

Hi Mohit:

Our "/etc/glusterfs/glusterd.vol" is set with IPv6 - so this is not the case see below:

Regards,
Amgad

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option transport.socket.listen-port 24007
    option transport.rdma.listen-port 24008
    option transport.address-family inet6
    option transport.socket.bind-address 2001:db8:1234::8
    option transport.tcp.bind-address 2001:db8:1234::8
    option transport.rdma.bind-address 2001:db8:1234::8
    option ping-timeout 0
    option event-threads 1
    option transport.listen-backlog 1024
#   option base-port 49152
end-volume

--- Additional comment from Mohit Agrawal on 2019-08-30 14:03:31 UTC ---

Hi,

Can you please share a complete dump of the /var/log/gluster directory?

Thanks,
Mohit Agrawal

--- Additional comment from Amgad on 2019-08-30 20:58:32 UTC ---

I'm attaching the tar file. I just reverted the private version with my debugging statements back to 6.5-1. Keep in mind this system was upgraded back and forth several times, so the logs contain different versions, but the latest is 6.5-1.

--- Additional comment from Amgad on 2019-08-30 21:00:59 UTC ---



--- Additional comment from Mohit Agrawal on 2019-08-31 03:06:04 UTC ---

Hi,

 As per the currently shared logs it seems you are now facing a different issue; the issue related to "DNS resolution failed" is already resolved.
 It seems that earlier the correct transport-type was not mentioned in the volfile, so the brick was not coming up (throwing an "Address family not supported" error), but now the brick is failing because it is not able to connect to glusterd, since glusterd is not up.

>>>>>>>>>>>>>>>>>>>>>>
.....
.....
[2019-08-30 01:03:41.480435] W [socket.c:721:__socket_rwv] 0-glusterfs: readv on 2001:db8:1234::8:24007 failed (No data available)
[2019-08-30 01:03:41.480554] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: ceph-cs-01.storage.bcmt.cluster.local
[2019-08-30 01:03:41.480573] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
.....
....
>>>>>>>>>>>>>>>>>>>>>>>>>>

I am seeing similar messages in the other brick log files as well.
glusterd is not coming up because it is throwing the error "Address already in use".

>>>>>>>>>>>>>>>>>

[2019-08-30 01:03:43.493787] I [socket.c:904:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 10
[2019-08-30 01:03:43.499501] I [MSGID: 106513] [glusterd-store.c:2394:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 60000
[2019-08-30 01:03:43.503539] I [MSGID: 106544] [glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID: 8e2b40a7-098c-4f0a-b323-2e764bd315f3
[2019-08-30 01:03:43.855699] I [MSGID: 106498] [glusterd-handler.c:3687:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-08-30 01:03:43.860181] I [MSGID: 106498] [glusterd-handler.c:3687:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2019-08-30 01:03:43.860245] W [MSGID: 106061] [glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-08-30 01:03:43.860284] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-08-30 01:03:43.966588] E [name.c:256:af_inet_client_get_remote_sockaddr] 0-management: CSTO-DEBUG: Family Address is 10
[2019-08-30 01:03:43.967196] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-08-30 01:03:43.969757] E [name.c:256:af_inet_client_get_remote_sockaddr] 0-management: CSTO-DEBUG: Family Address is 10
[2019-08-30 01:03:44.681604] E [socket.c:923:__socket_server_bind] 0-socket.management: binding to  failed: Address already in use
[2019-08-30 01:03:44.681645] E [socket.c:925:__socket_server_bind] 0-socket.management: Port is already in use
[2019-08-30 01:03:45.681776] E [socket.c:923:__socket_server_bind] 0-socket.management: binding to  failed: Address already in use
[2019-08-30 01:03:45.681883] E [socket.c:925:__socket_server_bind] 0-socket.management: Port is already in use
[2019-08-30 01:03:46.681992] E [socket.c:923:__socket_server_bind] 0-socket.management: binding to  failed: Address already in use
[2019-08-30 01:03:46.682027] E [socket.c:925:__socket_server_bind] 0-socket.management: Port is already in use
[2019-08-30 01:03:47.682249] E [socket.c:925:__socket_server_bind] 0-socket.management: Port is already in use
[2019-08-30 01:03:47.682191] E [socket.c:923:__socket_server_bind] 0-socket.management: binding to  failed: Address already in use
[2019-08-30 01:03:43.967187] W [MSGID: 106061] [glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2019-08-30 01:03:48.598585] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f4af4b0fdd5] -->glusterd(glusterfs_sigwaiter+0xe5) [0x5584ef3131b5] -->glusterd(cleanup_and_exit+0x6b) [0x5584ef31301b] ) 0-: received signum (15), shutting down
>>>>>>>>>>>>>>>>>>>>>>

We have fixed the same in release-6 recently:
https://review.gluster.org/#/c/glusterfs/+/23268/

Kindly apply this patch or install a build that includes it.

Regards,
Mohit Agrawal

--- Additional comment from Amgad on 2019-09-01 04:09:00 UTC ---

Hi Mohit:

The patch is already applied and glusterd is running. glustershd is the one not running. I re-applied the latest rpm from the 6.x branch and verified the code change - I called it 6.5-1.7.el7.x86_64 (please check the newly loaded tar file, starting at 2019-09-01 03:44:34).

Here's the glusterd status:
# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/etc/systemd/system/glusterd.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2019-09-01 03:57:43 UTC; 5min ago
  Process: 20220 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 20221 (glusterd)
    Tasks: 113
   Memory: 140.7M (limit: 3.8G)
   CGroup: /system.slice/glusterd.service
           ├─20221 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level ER...
           ├─29021 /usr/sbin/glusterfsd -s ceph-cs-01.storage.bcmt.cluster.lo...
           ├─29033 /usr/sbin/glusterfsd -s ceph-cs-01.storage.bcmt.cluster.lo...
           ├─29044 /usr/sbin/glusterfsd -s ceph-cs-01.storage.bcmt.cluster.lo...
           ├─29055 /usr/sbin/glusterfsd -s ceph-cs-01.storage.bcmt.cluster.lo...
           └─29067 /usr/sbin/glusterfsd -s ceph-cs-01.storage.bcmt.cluster.lo...

Sep 01 03:57:42 ceph-cs-01 systemd[1]: Starting GlusterFS, a clustered file.....
Sep 01 03:57:43 ceph-cs-01 systemd[1]: Started GlusterFS, a clustered file-...r.
Hint: Some lines were ellipsized, use -l to show in full.

# gluster volume status
Status of volume: bcmt-glusterfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ceph-cs-02.storage.bcmt.cluster.local
:/data0/glusterfs                           49152     0          Y       2381 
Brick ceph-cs-03.storage.bcmt.cluster.local
:/data0/glusterfs                           49152     0          Y       3607 
Brick ceph-cs-01.storage.bcmt.cluster.local
:/data0/glusterfs                           49152     0          Y       29021
Self-heal Daemon on localhost               N/A       N/A        N       N/A  
Self-heal Daemon on ceph-cs-03.storage.bcmt
.cluster.local                              N/A       N/A        N       N/A  
Self-heal Daemon on ceph-cs-02              N/A       N/A        N       N/A  
 
.......

Regards,
Amgad

--- Additional comment from Amgad on 2019-09-01 04:10:42 UTC ---



--- Additional comment from Amgad on 2019-09-01 04:12:42 UTC ---

I did restart at 2019-09-01 03:57:43

--- Additional comment from Mohit Agrawal on 2019-09-01 05:49:40 UTC ---

Hi Amgad,

  Thanks for the reply. I have checked the code.
  We need to fix the code so that the transport address family is updated in the case of a client process.

  For now, I can provide a workaround to start shd with IPv6.
  You can copy the existing arguments from the shd log file and add an --xlator-option argument.
  
  As per the last shared logs, shd was last spawned with the arguments below:
  
  /usr/sbin/glusterfs version 6.5 (args: /usr/sbin/glusterfs -s 2001:db8:1234::8 --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3a1e3977fd7318f2.socket --xlator-option *replicate*.node-uuid=8e2b40a7-098c-4f0a-b323-2e764bd315f3 --process-name glustershd --client-pid=-6

  So, to start shd with IPv6 enabled, you can run it as below after adding the --xlator-option to the command-line arguments:

  >>>>>>>>>>>>>>>

  /usr/sbin/glusterfs -s 2001:db8:1234::8 --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3a1e3977fd7318f2.socket --xlator-option *replicate*.node-uuid=8e2b40a7-098c-4f0a-b323-2e764bd315f3 --xlator-option transport.address-family=inet6 --process-name glustershd --client-pid=-6 

  >>>>>>>>>>>>.

  I will upload a patch to resolve this.
  Let me know if you can test the patch in your environment.
 
Thanks,
Mohit Agrawal

--- Additional comment from Worker Ant on 2019-09-01 06:39:29 UTC ---

REVIEW: https://review.gluster.org/23340 (rpc: Update transport-address family if it is not provide in command-line arguments) posted (#1) for review on master by MOHIT AGRAWAL

--- Additional comment from Worker Ant on 2019-09-04 14:22:34 UTC ---

REVIEW: https://review.gluster.org/23340 (rpc: Update address family if it is not provide in cmd-line arguments) merged (#2) on master by MOHIT AGRAWAL

--- Additional comment from Mohit Agrawal on 2019-09-05 06:09:08 UTC ---

We have fixed two issues through this bug:
1) The client (shd) was not started due to a wrong address family being configured while IPv6 is enabled.
Solution: Update the transport address family to INET6 for IPv6 host addresses, via the patch below:
  https://review.gluster.org/23340
2) glusterd was not able to create a volume for an IPv6 hostname.
Solution: Update the code to parse IPv6 host addresses correctly, via the patch below:
   https://review.gluster.org/#/c/glusterfs/+/23341/
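
For readers who cannot open Gerrit, the gist of fix (1) can be sketched roughly as follows. This is an illustration of the idea only (a hypothetical helper, not the merged patch): if no transport.address-family was supplied on the command line, derive one from the volfile-server string itself.

```
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: pick an address family for a volfile-server
 * string when none was given on the command line. */
static int guess_address_family(const char *host)
{
    struct in6_addr a6;

    /* An IPv6 literal such as 2001:db8:1234::8 parses with inet_pton. */
    if (host && inet_pton(AF_INET6, host, &a6) == 1)
        return AF_INET6;
    return AF_INET;  /* fall back to the old default */
}

int main(void)
{
    printf("2001:db8:1234::8 -> family %d\n",
           guess_address_family("2001:db8:1234::8"));  /* 10 (AF_INET6) on Linux */
    printf("192.0.2.10 -> family %d\n",
           guess_address_family("192.0.2.10"));        /* 2 (AF_INET) */
    return 0;
}
```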

Thanks,
Mohit Agrawal

--- Additional comment from Worker Ant on 2019-09-05 06:14:36 UTC ---

REVIEW: https://review.gluster.org/23341 (glusterd: IPV6 hostname address is not parsed correctly) posted (#2) for review on master by MOHIT AGRAWAL

--- Additional comment from Worker Ant on 2019-09-06 04:37:29 UTC ---

REVIEW: https://review.gluster.org/23341 (glusterd: IPV6 hostname address is not parsed correctly) merged (#4) on master by Atin Mukherjee

Comment 1 Worker Ant 2020-02-25 13:12:33 UTC
REVIEW: https://review.gluster.org/24175 (rpc: Update address family if it is not provide in cmd-line arguments) posted (#2) for review on release-5 by MOHIT AGRAWAL

Comment 2 Worker Ant 2020-03-02 08:13:48 UTC
REVIEW: https://review.gluster.org/24175 (rpc: Update address family if it is not provide in cmd-line arguments) merged (#3) on release-5 by hari gowtham

