Bug 1807785

Summary: seeing error message in glustershd.log on volume start(or may be as part of shd graph regeneration) inet_pton failed with return code 0 [Invalid argument]
Product: [Community] GlusterFS Reporter: Xavi Hernandez <jahernan>
Component: eventsapiAssignee: Xavi Hernandez <jahernan>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: low Docs Contact:
Priority: unspecified    
Version: 7CC: amukherj, avishwan, nchilaka, rhs-bugs, sanandpa, sheggodu, storage-qa-internal, vdas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1752330 Environment:
Last Closed: 2020-03-12 14:22:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1752330    
Bug Blocks:    

Description Xavi Hernandez 2020-02-27 09:19:04 UTC
+++ This bug was initially created as a clone of Bug #1752330 +++

+++ This bug was initially created as a clone of Bug #1732443 +++

Description of problem:
=======================
I am seeing below error message in shd.log when I start a new volume(or may be as part of shd graph regeneration event).
The message is not so helpful for an admin to understand what it means and hence can be a bit annoying



[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]

###########

[2019-07-23 12:09:53.154339] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-distrep-client-7: changing port to 49153 (from 0)
[2019-07-23 12:09:53.154388] I [socket.c:871:__socket_shutdown] 0-distrep-client-7: intentional socket shutdown(14)
[2019-07-23 12:09:53.154870] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd1-client-2: changing port to 49154 (from 0)
[2019-07-23 12:09:53.155279] I [socket.c:871:__socket_shutdown] 0-shd1-client-2: intentional socket shutdown(24)
[2019-07-23 12:09:53.160810] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-0: changing port to 49155 (from 0)
[2019-07-23 12:09:53.160885] I [socket.c:871:__socket_shutdown] 0-shd2-client-0: intentional socket shutdown(16)
[2019-07-23 12:09:53.161088] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-1: changing port to 49155 (from 0)
[2019-07-23 12:09:53.161208] I [socket.c:871:__socket_shutdown] 0-shd2-client-1: intentional socket shutdown(26)
[2019-07-23 12:09:53.165383] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-2: changing port to 49155 (from 0)
[2019-07-23 12:09:53.165715] I [socket.c:871:__socket_shutdown] 0-shd2-client-2: intentional socket shutdown(22)
[2019-07-23 12:09:53.168402] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-distrep-client-6: Connected to distrep-client-6, attached to remote volume '/gluster/brick2/distrep-sv3'.
[2019-07-23 12:09:53.168455] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-distrep-replicate-2: Client-quorum is met
[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.172515] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd1-client-1: Connected to shd1-client-1, attached to remote volume '/gluster/brick1/shd1'.
[2019-07-23 12:09:53.172559] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-shd1-replicate-0: Client-quorum is met
[2019-07-23 12:09:53.173251] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd1-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.173591] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-distrep-client-7: Connected to distrep-client-7, attached to remote volume '/gluster/brick2/distrep-sv3'.
[2019-07-23 12:09:53.175599] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd1-client-2: Connected to shd1-client-2, attached to remote volume '/gluster/brick1/shd1'.
[2019-07-23 12:09:53.179349] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-1: Connected to shd2-client-1, attached to remote volume '/gluster/brick1/shd2'.
[2019-07-23 12:09:53.179391] I [MSGID: 108005] [afr-common.c:5235:__afr_handle_child_up_event] 0-shd2-replicate-0: Subvolume 'shd2-client-1' came back up; going online.
[2019-07-23 12:09:53.179656] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd2-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.180373] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-0: Connected to shd2-client-0, attached to remote volume '/gluster/brick1/shd2'.
[2019-07-23 12:09:53.180422] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-shd2-replicate-0: Client-quorum is met
[2019-07-23 12:09:53.180698] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd2-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.181257] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-2: Connected to shd2-client-2, attached to remote volume '/gluster/brick1/shd2'.

Version-Release number of selected component (if applicable):
==============
6.0.8 (shd-mux is reverted)

How reproducible:
================
always

Steps to Reproduce:
1.create a 2x3 volume and start it
2. create a new 1x3 volume and start it 


Actual results:
=============
glustershd.log has above error message logged when we start a new volume

Expected results:
===================
either the message can be removed if it is not of much value, or else it can be rephrased to something more meaningful


Additional info:
==============
not attaching logs as it is straightforward to reproduce

--- Additional comment from Vivek Das on 2019-09-10 09:47:33 UTC ---

This is marked blocker based on 'Regression' keyword.

I am seeing the error messages in samba client logs as well.
We have not seen functional impact as of now. But would like to know what introduced this in rhgs-3.5.0. An error message is what we need to avoid to be seen by customers.

Sunil your take on this please.

--- Additional comment from Atin Mukherjee on 2019-09-12 02:33:11 UTC ---

From _gf_event () :

   /* Socket Configurations */                                                 
    server.sin_family = AF_INET;                                                
    server.sin_port = htons(EVENT_PORT);                                        
    ret = inet_pton(server.sin_family, host, &server.sin_addr);                                                                                                  
    if (ret <= 0) {                                                             
        gf_msg(this->name, GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,            
               "inet_pton failed with return code %d", ret);                    
        goto out;                                                               
    }             


RETURN VALUE 
       inet_pton() returns 1 on success (network address was successfully
       converted).  0 is returned if src does not contain a character string
       representing a valid network address in the specified address family.
       If af does not contain a valid address family, -1 is returned and
       errno is set to EAFNOSUPPORT.

[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]

So based on the above log it looks like in certain cases host is NULL and that's why we're bailing out. This seems like a wrong functionality?

Aravinda - could you take a look at this?

--- Additional comment from Aravinda VK on 2019-09-12 05:47:06 UTC ---

(In reply to Atin Mukherjee from comment #5)
> From _gf_event () :
> 
>    /* Socket Configurations */                                              
> 
>     server.sin_family = AF_INET;                                            
> 
>     server.sin_port = htons(EVENT_PORT);                                    
> 
>     ret = inet_pton(server.sin_family, host, &server.sin_addr);             
> 
>     if (ret <= 0) {                                                         
> 
>         gf_msg(this->name, GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,        
> 
>                "inet_pton failed with return code %d", ret);                
> 
>         goto out;                                                           
> 
>     }             
> 
> 
> RETURN VALUE 
>        inet_pton() returns 1 on success (network address was successfully
>        converted).  0 is returned if src does not contain a character string
>        representing a valid network address in the specified address family.
>        If af does not contain a valid address family, -1 is returned and
>        errno is set to EAFNOSUPPORT.
> 
> [2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event]
> 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]
> 
> So based on the above log it looks like in certain cases host is NULL and
> that's why we're bailing out. This seems like a wrong functionality?
> 
> Aravinda - could you take a look at this?

I will look into this issue today.

--- Additional comment from Aravinda VK on 2019-09-12 07:26:20 UTC ---

`-s localhost` passed in the argument is converted to `::1`. But `server.sin_family = AF_INET;` is used instead of `AF_INET6`. Due to this `inet_pton` is failing.

Example `ps ax | grep shd`

13449 ?        Ssl    0:00 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv1 -p /var/run/gluster/shd/gv1/gv1-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/5d6ae69815eaaafd.socket --xlator-option *replicate*.node-uuid=855ed18b-5c6f-4889-b523-5b7864d2a842 --process-name glustershd --client-pid=-6

I will explore and update the proper fix to handle this issue.

--- Additional comment from Atin Mukherjee on 2019-09-16 02:29:07 UTC ---

(In reply to Aravinda VK from comment #7)
> `-s localhost` passed in the argument is converted to `::1`. But
> `server.sin_family = AF_INET;` is used instead of `AF_INET6`. Due to this
> `inet_pton` is failing.
> 
> Example `ps ax | grep shd`
> 
> 13449 ?        Ssl    0:00 /usr/sbin/glusterfs -s localhost --volfile-id
> shd/gv1 -p /var/run/gluster/shd/gv1/gv1-shd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/5d6ae69815eaaafd.socket --xlator-option
> *replicate*.node-uuid=855ed18b-5c6f-4889-b523-5b7864d2a842 --process-name
> glustershd --client-pid=-6
> 
> I will explore and update the proper fix to handle this issue.

Aravinda - IIUC, the impact would be that we'd miss to generate event in such cases given gf_event API is bailing out, correct? And if so then this should be a blocker?

Comment 1 Worker Ant 2020-02-27 09:27:28 UTC
REVIEW: https://review.gluster.org/24180 (eventsapi: Set IPv4/IPv6 family based on input IP) posted (#1) for review on release-7 by Xavi Hernandez

Comment 2 Worker Ant 2020-03-12 14:22:39 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/1031, and will be tracked there from now on. Visit GitHub issues URL for further details

Comment 3 Worker Ant 2020-03-16 08:11:42 UTC
REVIEW: https://review.gluster.org/24180 (eventsapi: Set IPv4/IPv6 family based on input IP) merged (#2) on release-7 by Rinku Kothiya