Bug 1807785 - seeing error message in glustershd.log on volume start(or may be as part of shd graph regeneration) inet_pton failed with return code 0 [Invalid argument]
Summary: seeing error message in glustershd.log on volume start(or may be as part of s...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: eventsapi
Version: 7
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Depends On: 1752330
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-02-27 09:19 UTC by Xavi Hernandez
Modified: 2020-03-16 08:11 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1752330
Environment:
Last Closed: 2020-03-12 14:22:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 24180 0 None Merged eventsapi: Set IPv4/IPv6 family based on input IP 2020-03-16 08:11:40 UTC

Description Xavi Hernandez 2020-02-27 09:19:04 UTC
+++ This bug was initially created as a clone of Bug #1752330 +++

+++ This bug was initially created as a clone of Bug #1732443 +++

Description of problem:
=======================
I am seeing below error message in shd.log when I start a new volume(or may be as part of shd graph regeneration event).
The message is not so helpful for an admin to understand what it means and hence can be a bit annoying



[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]

###########

[2019-07-23 12:09:53.154339] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-distrep-client-7: changing port to 49153 (from 0)
[2019-07-23 12:09:53.154388] I [socket.c:871:__socket_shutdown] 0-distrep-client-7: intentional socket shutdown(14)
[2019-07-23 12:09:53.154870] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd1-client-2: changing port to 49154 (from 0)
[2019-07-23 12:09:53.155279] I [socket.c:871:__socket_shutdown] 0-shd1-client-2: intentional socket shutdown(24)
[2019-07-23 12:09:53.160810] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-0: changing port to 49155 (from 0)
[2019-07-23 12:09:53.160885] I [socket.c:871:__socket_shutdown] 0-shd2-client-0: intentional socket shutdown(16)
[2019-07-23 12:09:53.161088] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-1: changing port to 49155 (from 0)
[2019-07-23 12:09:53.161208] I [socket.c:871:__socket_shutdown] 0-shd2-client-1: intentional socket shutdown(26)
[2019-07-23 12:09:53.165383] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-2: changing port to 49155 (from 0)
[2019-07-23 12:09:53.165715] I [socket.c:871:__socket_shutdown] 0-shd2-client-2: intentional socket shutdown(22)
[2019-07-23 12:09:53.168402] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-distrep-client-6: Connected to distrep-client-6, attached to remote volume '/gluster/brick2/distrep-sv3'.
[2019-07-23 12:09:53.168455] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-distrep-replicate-2: Client-quorum is met
[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.172515] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd1-client-1: Connected to shd1-client-1, attached to remote volume '/gluster/brick1/shd1'.
[2019-07-23 12:09:53.172559] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-shd1-replicate-0: Client-quorum is met
[2019-07-23 12:09:53.173251] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd1-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.173591] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-distrep-client-7: Connected to distrep-client-7, attached to remote volume '/gluster/brick2/distrep-sv3'.
[2019-07-23 12:09:53.175599] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd1-client-2: Connected to shd1-client-2, attached to remote volume '/gluster/brick1/shd1'.
[2019-07-23 12:09:53.179349] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-1: Connected to shd2-client-1, attached to remote volume '/gluster/brick1/shd2'.
[2019-07-23 12:09:53.179391] I [MSGID: 108005] [afr-common.c:5235:__afr_handle_child_up_event] 0-shd2-replicate-0: Subvolume 'shd2-client-1' came back up; going online.
[2019-07-23 12:09:53.179656] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd2-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.180373] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-0: Connected to shd2-client-0, attached to remote volume '/gluster/brick1/shd2'.
[2019-07-23 12:09:53.180422] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-shd2-replicate-0: Client-quorum is met
[2019-07-23 12:09:53.180698] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd2-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.181257] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-2: Connected to shd2-client-2, attached to remote volume '/gluster/brick1/shd2'.

Version-Release number of selected component (if applicable):
==============
6.0.8 (shd-mux is reverted)

How reproducible:
================
always

Steps to Reproduce:
1.create a 2x3 volume and start it
2. create a new 1x3 volume and start it 


Actual results:
=============
glustershd.log has above error message logged when we start a new volume

Expected results:
===================
either the message can be removed if it is not of much value, or else it can be rephrased to something more meaningful


Additional info:
==============
not attaching logs as it is straightforward to reproduce

--- Additional comment from Vivek Das on 2019-09-10 09:47:33 UTC ---

This is marked blocker based on 'Regression' keyword.

I am seeing the error messages in samba client logs as well.
We have not seen functional impact as of now. But would like to know what introduced this in rhgs-3.5.0. An error message is what we need to avoid to be seen by customers.

Sunil your take on this please.

--- Additional comment from Atin Mukherjee on 2019-09-12 02:33:11 UTC ---

From _gf_event () :

   /* Socket Configurations */                                                 
    server.sin_family = AF_INET;                                                
    server.sin_port = htons(EVENT_PORT);                                        
    ret = inet_pton(server.sin_family, host, &server.sin_addr);                                                                                                  
    if (ret <= 0) {                                                             
        gf_msg(this->name, GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,            
               "inet_pton failed with return code %d", ret);                    
        goto out;                                                               
    }             


RETURN VALUE 
       inet_pton() returns 1 on success (network address was successfully
       converted).  0 is returned if src does not contain a character string
       representing a valid network address in the specified address family.
       If af does not contain a valid address family, -1 is returned and
       errno is set to EAFNOSUPPORT.

[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]

So based on the above log it looks like in certain cases host is NULL and that's why we're bailing out. This seems like a wrong functionality?

Aravinda - could you take a look at this?

--- Additional comment from Aravinda VK on 2019-09-12 05:47:06 UTC ---

(In reply to Atin Mukherjee from comment #5)
> From _gf_event () :
> 
>    /* Socket Configurations */                                              
> 
>     server.sin_family = AF_INET;                                            
> 
>     server.sin_port = htons(EVENT_PORT);                                    
> 
>     ret = inet_pton(server.sin_family, host, &server.sin_addr);             
> 
>     if (ret <= 0) {                                                         
> 
>         gf_msg(this->name, GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,        
> 
>                "inet_pton failed with return code %d", ret);                
> 
>         goto out;                                                           
> 
>     }             
> 
> 
> RETURN VALUE 
>        inet_pton() returns 1 on success (network address was successfully
>        converted).  0 is returned if src does not contain a character string
>        representing a valid network address in the specified address family.
>        If af does not contain a valid address family, -1 is returned and
>        errno is set to EAFNOSUPPORT.
> 
> [2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event]
> 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]
> 
> So based on the above log it looks like in certain cases host is NULL and
> that's why we're bailing out. This seems like a wrong functionality?
> 
> Aravinda - could you take a look at this?

I will look into this issue today.

--- Additional comment from Aravinda VK on 2019-09-12 07:26:20 UTC ---

`-s localhost` passed in the argument is converted to `::1`. But `server.sin_family = AF_INET;` is used instead of `AF_INET6`. Due to this `inet_pton` is failing.

Example `ps ax | grep shd`

13449 ?        Ssl    0:00 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv1 -p /var/run/gluster/shd/gv1/gv1-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/5d6ae69815eaaafd.socket --xlator-option *replicate*.node-uuid=855ed18b-5c6f-4889-b523-5b7864d2a842 --process-name glustershd --client-pid=-6

I will explore and update the proper fix to handle this issue.

--- Additional comment from Atin Mukherjee on 2019-09-16 02:29:07 UTC ---

(In reply to Aravinda VK from comment #7)
> `-s localhost` passed in the argument is converted to `::1`. But
> `server.sin_family = AF_INET;` is used instead of `AF_INET6`. Due to this
> `inet_pton` is failing.
> 
> Example `ps ax | grep shd`
> 
> 13449 ?        Ssl    0:00 /usr/sbin/glusterfs -s localhost --volfile-id
> shd/gv1 -p /var/run/gluster/shd/gv1/gv1-shd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/5d6ae69815eaaafd.socket --xlator-option
> *replicate*.node-uuid=855ed18b-5c6f-4889-b523-5b7864d2a842 --process-name
> glustershd --client-pid=-6
> 
> I will explore and update the proper fix to handle this issue.

Aravinda - IIUC, the impact would be that we'd miss to generate event in such cases given gf_event API is bailing out, correct? And if so then this should be a blocker?

Comment 1 Worker Ant 2020-02-27 09:27:28 UTC
REVIEW: https://review.gluster.org/24180 (eventsapi: Set IPv4/IPv6 family based on input IP) posted (#1) for review on release-7 by Xavi Hernandez

Comment 2 Worker Ant 2020-03-12 14:22:39 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/1031, and will be tracked there from now on. Visit GitHub issues URL for further details

Comment 3 Worker Ant 2020-03-16 08:11:42 UTC
REVIEW: https://review.gluster.org/24180 (eventsapi: Set IPv4/IPv6 family based on input IP) merged (#2) on release-7 by Rinku Kothiya


Note You need to log in before you can comment on or make changes to this bug.