Bug 1732443 - seeing error message in glustershd.log on volume start(or may be as part of shd graph regeneration) inet_pton failed with return code 0 [Invalid argument]
Summary: seeing error message in glustershd.log on volume start(or may be as part of s...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: eventsapi
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ---
: RHGS 3.5.0
Assignee: Aravinda VK
QA Contact: Vivek Das
URL:
Whiteboard:
Depends On:
Blocks: 1696809 1752330
TreeView+ depends on / blocked
 
Reported: 2019-07-23 12:15 UTC by Nag Pavan Chilakam
Modified: 2020-02-27 09:21 UTC (History)
6 users (show)

Fixed In Version: glusterfs-6.0-14
Doc Type: Bug Fix
Doc Text:
Previously, the network family was not set correctly during events socket initialization. This resulted in an invalid argument error and meant that events were not sent to consumers. Network family is now set correctly and events work as expected.
Clone Of:
: 1752330 (view as bug list)
Environment:
Last Closed: 2019-10-30 12:22:31 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2019:3249 0 None None None 2019-10-30 12:22:54 UTC

Description Nag Pavan Chilakam 2019-07-23 12:15:57 UTC
Description of problem:
=======================
I am seeing below error message in shd.log when I start a new volume(or may be as part of shd graph regeneration event).
The message is not so helpful for an admin to understand what it means and hence can be a bit annoying



[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]

###########

[2019-07-23 12:09:53.154339] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-distrep-client-7: changing port to 49153 (from 0)
[2019-07-23 12:09:53.154388] I [socket.c:871:__socket_shutdown] 0-distrep-client-7: intentional socket shutdown(14)
[2019-07-23 12:09:53.154870] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd1-client-2: changing port to 49154 (from 0)
[2019-07-23 12:09:53.155279] I [socket.c:871:__socket_shutdown] 0-shd1-client-2: intentional socket shutdown(24)
[2019-07-23 12:09:53.160810] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-0: changing port to 49155 (from 0)
[2019-07-23 12:09:53.160885] I [socket.c:871:__socket_shutdown] 0-shd2-client-0: intentional socket shutdown(16)
[2019-07-23 12:09:53.161088] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-1: changing port to 49155 (from 0)
[2019-07-23 12:09:53.161208] I [socket.c:871:__socket_shutdown] 0-shd2-client-1: intentional socket shutdown(26)
[2019-07-23 12:09:53.165383] I [rpc-clnt.c:2035:rpc_clnt_reconfig] 0-shd2-client-2: changing port to 49155 (from 0)
[2019-07-23 12:09:53.165715] I [socket.c:871:__socket_shutdown] 0-shd2-client-2: intentional socket shutdown(22)
[2019-07-23 12:09:53.168402] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-distrep-client-6: Connected to distrep-client-6, attached to remote volume '/gluster/brick2/distrep-sv3'.
[2019-07-23 12:09:53.168455] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-distrep-replicate-2: Client-quorum is met
[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.172515] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd1-client-1: Connected to shd1-client-1, attached to remote volume '/gluster/brick1/shd1'.
[2019-07-23 12:09:53.172559] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-shd1-replicate-0: Client-quorum is met
[2019-07-23 12:09:53.173251] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd1-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.173591] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-distrep-client-7: Connected to distrep-client-7, attached to remote volume '/gluster/brick2/distrep-sv3'.
[2019-07-23 12:09:53.175599] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd1-client-2: Connected to shd1-client-2, attached to remote volume '/gluster/brick1/shd1'.
[2019-07-23 12:09:53.179349] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-1: Connected to shd2-client-1, attached to remote volume '/gluster/brick1/shd2'.
[2019-07-23 12:09:53.179391] I [MSGID: 108005] [afr-common.c:5235:__afr_handle_child_up_event] 0-shd2-replicate-0: Subvolume 'shd2-client-1' came back up; going online.
[2019-07-23 12:09:53.179656] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd2-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.180373] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-0: Connected to shd2-client-0, attached to remote volume '/gluster/brick1/shd2'.
[2019-07-23 12:09:53.180422] I [MSGID: 108002] [afr-common.c:5589:afr_notify] 0-shd2-replicate-0: Client-quorum is met
[2019-07-23 12:09:53.180698] E [MSGID: 101172] [events.c:89:_gf_event] 0-shd2-replicate-0: inet_pton failed with return code 0 [Invalid argument]
[2019-07-23 12:09:53.181257] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-shd2-client-2: Connected to shd2-client-2, attached to remote volume '/gluster/brick1/shd2'.

Version-Release number of selected component (if applicable):
==============
6.0.8 (shd-mux is reverted)

How reproducible:
================
always

Steps to Reproduce:
1.create a 2x3 volume and start it
2. create a new 1x3 volume and start it 


Actual results:
=============
glustershd.log has above error message logged when we start a new volume

Expected results:
===================
either the message can be removed if it is not of much value, or else it can be rephrased to something more meaningful


Additional info:
==============
not attaching logs as it is straightforward to reproduce

Comment 5 Atin Mukherjee 2019-09-12 02:33:11 UTC
From _gf_event () :

   /* Socket Configurations */                                                 
    server.sin_family = AF_INET;                                                
    server.sin_port = htons(EVENT_PORT);                                        
    ret = inet_pton(server.sin_family, host, &server.sin_addr);                                                                                                  
    if (ret <= 0) {                                                             
        gf_msg(this->name, GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,            
               "inet_pton failed with return code %d", ret);                    
        goto out;                                                               
    }             


RETURN VALUE 
       inet_pton() returns 1 on success (network address was successfully
       converted).  0 is returned if src does not contain a character string
       representing a valid network address in the specified address family.
       If af does not contain a valid address family, -1 is returned and
       errno is set to EAFNOSUPPORT.

[2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event] 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]

So based on the above log it looks like in certain cases host is NULL and that's why we're bailing out. This seems like a wrong functionality?

Aravinda - could you take a look at this?

Comment 6 Aravinda VK 2019-09-12 05:47:06 UTC
(In reply to Atin Mukherjee from comment #5)
> From _gf_event () :
> 
>    /* Socket Configurations */                                              
> 
>     server.sin_family = AF_INET;                                            
> 
>     server.sin_port = htons(EVENT_PORT);                                    
> 
>     ret = inet_pton(server.sin_family, host, &server.sin_addr);             
> 
>     if (ret <= 0) {                                                         
> 
>         gf_msg(this->name, GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,        
> 
>                "inet_pton failed with return code %d", ret);                
> 
>         goto out;                                                           
> 
>     }             
> 
> 
> RETURN VALUE 
>        inet_pton() returns 1 on success (network address was successfully
>        converted).  0 is returned if src does not contain a character string
>        representing a valid network address in the specified address family.
>        If af does not contain a valid address family, -1 is returned and
>        errno is set to EAFNOSUPPORT.
> 
> [2019-07-23 12:09:53.168761] E [MSGID: 101172] [events.c:89:_gf_event]
> 0-distrep-replicate-2: inet_pton failed with return code 0 [Invalid argument]
> 
> So based on the above log it looks like in certain cases host is NULL and
> that's why we're bailing out. This seems like a wrong functionality?
> 
> Aravinda - could you take a look at this?

I will look into this issue today.

Comment 7 Aravinda VK 2019-09-12 07:26:20 UTC
`-s localhost` passed in the argument is converted to `::1`. But `server.sin_family = AF_INET;` is used instead of `AF_INET6`. Due to this `inet_pton` is failing.

Example `ps ax | grep shd`

13449 ?        Ssl    0:00 /usr/sbin/glusterfs -s localhost --volfile-id shd/gv1 -p /var/run/gluster/shd/gv1/gv1-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/5d6ae69815eaaafd.socket --xlator-option *replicate*.node-uuid=855ed18b-5c6f-4889-b523-5b7864d2a842 --process-name glustershd --client-pid=-6

I will explore and update the proper fix to handle this issue.

Comment 8 Atin Mukherjee 2019-09-16 02:29:07 UTC
(In reply to Aravinda VK from comment #7)
> `-s localhost` passed in the argument is converted to `::1`. But
> `server.sin_family = AF_INET;` is used instead of `AF_INET6`. Due to this
> `inet_pton` is failing.
> 
> Example `ps ax | grep shd`
> 
> 13449 ?        Ssl    0:00 /usr/sbin/glusterfs -s localhost --volfile-id
> shd/gv1 -p /var/run/gluster/shd/gv1/gv1-shd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/5d6ae69815eaaafd.socket --xlator-option
> *replicate*.node-uuid=855ed18b-5c6f-4889-b523-5b7864d2a842 --process-name
> glustershd --client-pid=-6
> 
> I will explore and update the proper fix to handle this issue.

Aravinda - IIUC, the impact would be that we'd miss to generate event in such cases given gf_event API is bailing out, correct? And if so then this should be a blocker?

Comment 9 Atin Mukherjee 2019-09-16 04:43:39 UTC
upstream patch : https://review.gluster.org/23423

Comment 14 Vivek Das 2019-10-13 08:33:40 UTC
I am not seeing this error messages any more after upgrading to the fixed in version.

Comment 17 errata-xmlrpc 2019-10-30 12:22:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249


Note You need to log in before you can comment on or make changes to this bug.