Bug 1524058 - gluster peer command stops working with unhelpful error messages when DNS doesn't work
Summary: gluster peer command stops working with unhelpful error messages when DNS doesn't work
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Vishal Pandey
Reported: 2017-12-09 17:44 UTC by nh2
Modified: 2023-09-14 04:13 UTC
CC List: 4 users

Last Closed: 2019-09-18 09:06:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---


Description nh2 2017-12-09 17:44:38 UTC
Description of problem:

Gluster 3.12.3 on Linux.

Consider the following outputs. None of them make any sense to me:


[root@node-1:~]# time gluster peer probe status
peer probe: success. Host status port 24007 already in peer list

real  0m10.060s


[root@node-1:~]# time gluster peer status
peer status: failed

real  0m0.051s


[root@node-1:~]# time gluster pool list
pool list: failed

real  0m0.050s


[root@node-1:~]# gluster peer probe 10.0.0.1
peer probe: success. Probe on localhost not needed

[root@node-1:~]# gluster peer probe 10.0.0.2
peer probe: success. Host 10.0.0.2 port 24007 already in peer list


[root@node-1:~]# gluster peer detach status
peer detach: failed: One of the peers is probably down. Check with 'peer status'


[root@node-1:~]# gluster peer status
peer status: failed



First, when I run `gluster peer probe status` (which is not a reasonable command, since gluster now treats `status` as a hostname), why does it say "peer probe: success. Host status port 24007 already in peer list"? That makes no sense; there is no host called "status" on my network.

Next, `gluster peer status` fails, and the error message "peer status: failed" is extremely unhelpful because it contains no information about the cause of the failure.

Later probes of `10.0.0.1` and `10.0.0.2` suggest that there is already a working peer list with some entries, but apparently I have no way at all to list those peers.

When I try to detach the apparently attached garbage peer called "status", I am told to check with `peer status`, which doesn't work.

What's going on here?

The glusterd log (/var/log/glusterfs/glusterd.log) gives some insight:

[2017-12-09 17:34:21.858454] I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2017-12-09 17:34:21.858517] W [dict.c:912:str_to_data] (-->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x104db4) [0x7f6f54fdadb4] -->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/libglusterfs.so.0(dict_set_str+0x16) [0x7f6f60919be6] -->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/libglusterfs.so.0(str_to_data+0x82) [0x7f6f60918122] ) 0-dict: value is NULL [Invalid argument]
[2017-12-09 17:34:23.103687] E [MSGID: 101075] [common-utils.c:320:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Temporary failure in name resolution)
[2017-12-09 17:34:23.103714] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host status

It looks like the real error is `getaddrinfo failed`, probably caused by a DNS problem on my system.
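The log lines above show the lookup going through getaddrinfo(3) and failing with "Temporary failure in name resolution", which is the gai_strerror() text for EAI_AGAIN. The following is a minimal Python sketch of the kind of message the CLI could return instead of a bare "failed". It is not glusterd code. `resolve_peer` is a hypothetical helper, and the default port 24007 is taken from the probe output above. Python's `socket.getaddrinfo` wraps the same libc call and raises `socket.gaierror`, which carries the EAI_* code and its message:

```python
import socket

def resolve_peer(host, port=24007):
    """Resolve a peer name; return (addresses, None) or ([], error_text)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # exc.errno holds the EAI_* code, exc.strerror the gai_strerror() text,
        # e.g. "Temporary failure in name resolution" for EAI_AGAIN.
        return [], f"DNS resolution failed on host {host}: {exc.strerror}"
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos}), None

if __name__ == "__main__":
    # A numeric address needs no DNS query, so this works even with broken DNS.
    print(resolve_peer("10.0.0.2"))  # → (['10.0.0.2'], None)
```

Returning the resolver's own text keeps the real cause (here, a temporary name-resolution failure) visible to the user instead of collapsing it into "failed".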

So:

* Could `gluster peer status` tell me directly about this problem, instead of saying "failed"?
* Why do `gluster peer status` and `gluster pool list` fail if DNS doesn't work? I'd assume that if there is a list of hosts, I should be able to view it at any time.
* What's going on with the weird success message when adding a non-existent host?
* What's up with `0-dict: value is NULL [Invalid argument]`?
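The second question asks whether stored peers could still be listed while name resolution is broken. As a hypothetical illustration, the sketch below keeps each peer's stored hostname and reports a per-peer resolution state instead of failing the whole command. The `Peer` record and `list_peers` function are invented for this example and do not reflect how glusterd actually stores or lists peers:

```python
import socket
from dataclasses import dataclass

@dataclass
class Peer:
    uuid: str      # hypothetical: identifier recorded when the peer was probed
    hostname: str  # the name or address exactly as it was probed

def list_peers(peers, port=24007):
    """Return one tab-separated row per stored peer, even if DNS is broken."""
    rows = []
    for peer in peers:
        try:
            socket.getaddrinfo(peer.hostname, port, type=socket.SOCK_STREAM)
            state = "resolvable"
        except socket.gaierror as exc:
            # Report the failure for this peer only; keep listing the others.
            state = f"unresolvable ({exc.strerror})"
        rows.append(f"{peer.uuid}\t{peer.hostname}\t{state}")
    return rows

if __name__ == "__main__":
    for row in list_peers([Peer("uuid-1", "10.0.0.2"), Peer("uuid-2", "status")]):
        print(row)
```

The design choice here is that the list itself comes from stored state, and resolution is only an annotation. A broken resolver therefore degrades the output rather than suppressing it.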

Comment 1 nh2 2017-12-09 18:14:32 UTC
One way to reproduce this is to configure only unreachable name servers on your system.
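Here is a sketch of that reproduction under stated assumptions. Point `/etc/resolv.conf` at a single address that nothing answers on, then time a lookup. For example, `nameserver 192.0.2.1` uses the RFC 5737 documentation range; this address is an assumption, not from the report. The helper `time_lookup` is hypothetical. With glibc's default resolver options (`timeout:5`, `attempts:2`) and one nameserver, a failing lookup would be expected to take roughly ten seconds. That is consistent with the `real 0m10.060s` of the `gluster peer probe status` run in the description, though the link is an inference rather than something confirmed in this bug:

```python
import socket
import time

def time_lookup(host, port=24007):
    """Time one getaddrinfo() call; return (seconds, error_text or None)."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        error = None
    except socket.gaierror as exc:
        error = exc.strerror  # e.g. "Temporary failure in name resolution"
    return time.monotonic() - start, error

if __name__ == "__main__":
    seconds, error = time_lookup("status")
    print(f"lookup took {seconds:.1f}s, error: {error}")
```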

Comment 2 Atin Mukherjee 2019-07-08 04:26:42 UTC
I can confirm that this isn't reproducible with the latest releases. The only explanation I can think of is that 'status' was somehow configured as a hostname in DNS or in /etc/hosts, which is why glusterd interpreted it as a hostname instead of rejecting it as a bad CLI command. I'd like to close the bug. If you can confirm that this hypothesis isn't true, please write back.

Comment 3 Vishal Pandey 2019-08-09 08:47:48 UTC
@nh2 Can you please try to reproduce this issue again on the latest releases?

Comment 4 Vishal Pandey 2019-08-21 09:34:23 UTC
Can we reach a decision on this issue? The reporter has not yet addressed the needinfo, and this issue has been dormant for a long time now.

Comment 5 Vishal Pandey 2019-08-27 07:48:19 UTC
@nh2 Can you please try to reproduce this issue again on the latest releases?

Comment 6 Vishal Pandey 2019-09-10 13:20:00 UTC
@nh2 Can you address the needinfo? Otherwise I will have to close the bug, considering that it's no longer reproducible.

Comment 7 Vishal Pandey 2019-09-18 08:05:53 UTC
@nh2 Can you address the needinfo? Otherwise I will have to close the bug, considering that it's no longer reproducible.

Comment 8 Sanju 2019-09-18 09:06:22 UTC
As it's no longer reproducible, I'm closing the bug. Please feel free to reopen it if the issue persists.

Thanks,
Sanju

Comment 9 Red Hat Bugzilla 2023-09-14 04:13:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

