Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1524058

Summary: gluster peer command stops working with unhelpful error messages when DNS doesn't work
Product: [Community] GlusterFS
Component: glusterd
Version: mainline
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: nh2 <nh2-redhatbugzilla>
Assignee: Vishal Pandey <vpandey>
CC: amukherj, bugs, moagrawa, srakonde
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2019-09-18 09:06:22 UTC

Description nh2 2017-12-09 17:44:38 UTC
Description of problem:

Gluster 3.12.3 on Linux.

Consider the following outputs. None of it makes any sense to me:


[root@node-1:~]# time gluster peer probe status
peer probe: success. Host status port 24007 already in peer list

real  0m10.060s


[root@node-1:~]# time gluster peer status
peer status: failed

real  0m0.051s


[root@node-1:~]# time gluster pool list
pool list: failed

real  0m0.050s


[root@node-1:~]# gluster peer probe 10.0.0.1
peer probe: success. Probe on localhost not needed

[root@node-1:~]# gluster peer probe 10.0.0.2
peer probe: success. Host 10.0.0.2 port 24007 already in peer list


[root@node-1:~]# gluster peer detach status
peer detach: failed: One of the peers is probably down. Check with 'peer status'


[root@node-1:~]# gluster peer status
peer status: failed



First, when I run `gluster peer probe status` (which is not a sensible command, since gluster now treats `status` as a hostname), why does it say "peer probe: success. Host status port 24007 already in peer list"? That makes no sense; there is no host called "status" in my network.

Next, `gluster peer status` fails, and the error message "peer status: failed" is extremely unhelpful because it contains no information about the failure.

Later probes (e.g. of `10.0.0.2`) suggest that there is already a working peer list with some contents, but apparently I have no way at all to list those peers.

When I try to detach the apparently attached garbage peer called "status", I am told to check with `peer status`, but that command doesn't work.

What's going on here?

The glusterd log (/var/log/glusterfs/glusterd.log) gives some insight:

[2017-12-09 17:34:21.858454] I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2017-12-09 17:34:21.858517] W [dict.c:912:str_to_data] (-->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x104db4) [0x7f6f54fdadb4] -->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/libglusterfs.so.0(dict_set_str+0x16) [0x7f6f60919be6] -->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/libglusterfs.so.0(str_to_data+0x82) [0x7f6f60918122] ) 0-dict: value is NULL [Invalid argument]
[2017-12-09 17:34:23.103687] E [MSGID: 101075] [common-utils.c:320:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Temporary failure in name resolution)
[2017-12-09 17:34:23.103714] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host status

It looks like the real error is `getaddrinfo failed`, probably caused by a DNS problem on my system.
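The message "Temporary failure in name resolution" is the text glibc's `gai_strerror()` returns for the `EAI_AGAIN` error code from `getaddrinfo()`, which signals a temporary resolver failure rather than a name that definitely does not exist. The Python sketch below is not glusterd's code; it resolves a peer name through the same C library call and separates that temporary case from other resolver errors. `resolve_peer` is a hypothetical helper, and the default port 24007 is simply taken from the CLI output above.

```python
import socket

def resolve_peer(host, port=24007, numeric_only=False):
    """Resolve `host` roughly the way a client must before connecting to it.

    Returns (True, sorted list of IP strings) on success, or
    (False, message) on failure, where message echoes the
    "getaddrinfo failed (...)" wording seen in glusterd.log.
    """
    # AI_NUMERICHOST skips DNS entirely and accepts only literal IP addresses.
    flags = socket.AI_NUMERICHOST if numeric_only else 0
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM, flags=flags)
    except socket.gaierror as exc:
        # exc.errno is an EAI_* code; exc.strerror is the gai_strerror() text.
        message = f"getaddrinfo failed ({exc.strerror})"
        if exc.errno == socket.EAI_AGAIN:
            # EAI_AGAIN: the resolver reported a temporary failure (for example,
            # no configured name server answered); the name may still exist.
            message += " [temporary: check resolver configuration]"
        return False, message
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return True, sorted({info[4][0] for info in infos})
```

Running `resolve_peer("status")` on the affected machine should produce the same kind of message as the log, without involving glusterd at all, which helps separate a resolver problem from a gluster problem.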

So:

* Could `gluster peer status` tell me directly about this problem, instead of saying "failed"?
* Why do `gluster peer status` and `gluster pool list` fail if DNS doesn't work? I'd assume if there is a list of hosts, I should be able to view it, any time.
* What's going on with the weird success message when adding a non-existent host?
* What's up with `0-dict: value is NULL [Invalid argument]`?
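The first two questions amount to a design request: list the stored peers without resolving them, and include the underlying reason whenever something fails. A minimal sketch of that idea follows. `Peer` and `format_peer_status` are hypothetical names, this is not how the gluster CLI is implemented, and the field layout is only loosely modeled on gluster's usual `peer status` output.

```python
from dataclasses import dataclass

@dataclass
class Peer:
    hostname: str  # the name exactly as it was given to `peer probe`
    uuid: str
    state: str

def format_peer_status(peers, error=None):
    """Render a peer list from locally stored data; performs no DNS lookups.

    If `error` is given, it is appended to the failure line instead of
    printing a bare "failed".
    """
    if error is not None:
        return f"peer status: failed: {error}"
    lines = [f"Number of Peers: {len(peers)}"]
    for peer in peers:
        # Hostnames are printed as stored, so an unresolvable name such as
        # "status" still shows up and can then be detached.
        lines += ["", f"Hostname: {peer.hostname}", f"Uuid: {peer.uuid}", f"State: {peer.state}"]
    return "\n".join(lines)
```

For example, `format_peer_status([], error="DNS resolution failed on host status")` returns `peer status: failed: DNS resolution failed on host status`, which would have pointed straight at the real problem.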

Comment 1 nh2 2017-12-09 18:14:32 UTC
One way to reproduce this is to configure only unreachable name servers on your system.
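To confirm that a machine is in this state, one can list the configured name servers and check whether any of them answers. The sketch below is a rough diagnostic, not part of gluster: `parse_nameservers` and `nameserver_reachable` are hypothetical helpers, and the reachability test only tries a TCP connection to port 53, so a server (or firewall) that only allows DNS over UDP would be reported as unreachable.

```python
import socket

def parse_nameservers(resolv_conf_text):
    """Return the addresses from `nameserver` lines in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # resolv.conf treats lines starting with '#' or ';' as comments.
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers

def nameserver_reachable(address, timeout=1.0):
    """Best-effort check: can a TCP connection to port 53 be opened?"""
    try:
        with socket.create_connection((address, 53), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Feeding the contents of /etc/resolv.conf to `parse_nameservers` and passing each address to `nameserver_reachable` should show every server as unreachable on an affected host. Pointing resolv.conf only at addresses from a documentation range such as 192.0.2.0/24 (RFC 5737), which is not routed on the public Internet, is one way to set up that state for testing.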

Comment 2 Atin Mukherjee 2019-07-08 04:26:42 UTC
I can confirm that this isn't reproducible with the latest releases. The only possibility I can think of is that 'status' was somehow configured as a hostname in DNS or in /etc/hosts, which is why glusterd interpreted it as a hostname instead of a bad CLI command. I'd like to close the bug. If you can confirm that this hypothesis isn't true, please write back.
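One way to test this hypothesis is to look for `status` in /etc/hosts. The sketch below uses a hypothetical helper name and scans hosts(5)-format text for a given name. It does not query DNS and ignores resolver search domains (a `search` line in resolv.conf can make the resolver also try `status.<domain>`); `getent hosts status` asks the system resolver directly and covers those cases.

```python
def hosts_file_matches(name, hosts_text):
    """Return the addresses that hosts(5)-format text maps `name` to."""
    wanted = name.lower()
    matches = []
    for line in hosts_text.splitlines():
        # Everything after '#' is a comment.
        fields = line.split("#", 1)[0].split()
        # Format: address canonical_name [aliases...]; names are case-insensitive.
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            matches.append(fields[0])
    return matches
```

An empty result for the contents of /etc/hosts, together with `getent hosts status` printing nothing, would argue against the hypothesis.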

Comment 3 Vishal Pandey 2019-08-09 08:47:48 UTC
@nh2 Can you please try to reproduce this issue again on the latest releases?

Comment 4 Vishal Pandey 2019-08-21 09:34:23 UTC
Can we reach a decision on this issue? The reporter has not yet addressed the needinfo, and this issue has been dormant for a long time now.

Comment 5 Vishal Pandey 2019-08-27 07:48:19 UTC
@nh2 Can you please try to reproduce this issue again on the latest releases?

Comment 6 Vishal Pandey 2019-09-10 13:20:00 UTC
@nh2 Can you address the needinfo? Otherwise I will have to close the bug, considering that it's no longer reproducible.

Comment 7 Vishal Pandey 2019-09-18 08:05:53 UTC
@nh2 Can you address the needinfo? Otherwise I will have to close the bug, considering that it's no longer reproducible.

Comment 8 Sanju 2019-09-18 09:06:22 UTC
As it's no longer reproducible, I'm closing the bug. Please feel free to reopen it if the issue persists.

Thanks,
Sanju

Comment 9 Red Hat Bugzilla 2023-09-14 04:13:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days