Description of problem:

Gluster 3.12.3 on Linux. Consider the following outputs. None of it makes any sense to me:

[root@node-1:~]# time gluster peer probe status
peer probe: success. Host status port 24007 already in peer list

real 0m10.060s
[root@node-1:~]# time gluster peer status
peer status: failed

real 0m0.051s
[root@node-1:~]# time gluster pool list
pool list: failed

real 0m0.050s
[root@node-1:~]# gluster peer probe 10.0.0.1
peer probe: success. Probe on localhost not needed
[root@node-1:~]# gluster peer probe 10.0.0.2
peer probe: success. Host 10.0.0.2 port 24007 already in peer list
[root@node-1:~]# gluster peer detach status
peer detach: failed: One of the peers is probably down. Check with 'peer status'
[root@node-1:~]# gluster peer status
peer status: failed

First, when I run `gluster peer probe status` (which is not a reasonable command, as it now thinks that `status` is a hostname), why does it say "peer probe: success. Host status port 24007 already in peer list"? That makes no sense; there is no host called "status" in my network.

Next, `gluster peer status` fails, and the error message "peer status: failed" is extremely unhelpful, as it contains no information about the failure.

Later probes (e.g. of `10.0.0.2`) suggest that there is already a working peer list with some contents, but apparently I have no way at all to list those peers.

When I try to detach the apparently attached garbage peer called "status", I am told to run `peer status`, but that doesn't work.

What's going on here?
The glusterd log (/var/log/glusterfs/glusterd.log) gives some insight:

[2017-12-09 17:34:21.858454] I [MSGID: 106487] [glusterd-handler.c:1485:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2017-12-09 17:34:21.858517] W [dict.c:912:str_to_data] (-->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/glusterfs/3.12.3/xlator/mgmt/glusterd.so(+0x104db4) [0x7f6f54fdadb4] -->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/libglusterfs.so.0(dict_set_str+0x16) [0x7f6f60919be6] -->/nix/store/y9qg9jan88wnsszmb1badhyfak2znpz7-glusterfs-3.12.3/lib/libglusterfs.so.0(str_to_data+0x82) [0x7f6f60918122] ) 0-dict: value is NULL [Invalid argument]
[2017-12-09 17:34:23.103687] E [MSGID: 101075] [common-utils.c:320:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Temporary failure in name resolution)
[2017-12-09 17:34:23.103714] E [name.c:267:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host status

It looks like the real error is `getaddrinfo failed`, probably caused by some DNS problem on my system. So:

* Could `gluster peer status` tell me about this problem directly, instead of just saying "failed"?
* Why do `gluster peer status` and `gluster pool list` fail if DNS doesn't work? I'd assume that if there is a list of hosts, I should be able to view it at any time.
* What's going on with the weird success message when adding a non-existent host?
* What does `0-dict: value is NULL [Invalid argument]` mean?
One way to reproduce this is to configure only unreachable name servers on your system.
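A minimal sketch of such a setup, assuming a glibc-based Linux host where /etc/resolv.conf is a regular file that is not managed by systemd-resolved or NetworkManager (back up the original first). The address 192.0.2.1 is an illustrative assumption, picked from TEST-NET-1 (RFC 5737), a range reserved for documentation, so no name server should ever answer there:

```
# /etc/resolv.conf (reproduction only; restore the original afterwards)
# 192.0.2.1 is in TEST-NET-1 (RFC 5737) and is used here only as an address that should never answer.
nameserver 192.0.2.1
```

With this in place, `getent ahosts status` should fail after the resolver times out, and `gluster peer probe status` should show a delay like the one in the first transcript. The roughly 10 seconds seen there would be consistent with glibc's default resolver settings (5-second timeout, 2 attempts), but that is an inference, not something the logs confirm.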
I can confirm that this isn't reproducible with the latest releases. The only explanation I can think of is that 'status' was somehow configured as a hostname in DNS or in /etc/hosts, which caused glusterd to interpret it as a hostname instead of rejecting it as a bad CLI command. I'd like to close the bug. If you can show that this hypothesis is wrong, please write back.
@nh2 Can you please try to reproduce this issue again on the latest releases?
Can we reach a decision on this issue? The reporter has not yet responded to the needinfo request, and this issue has been dormant for a long time now.
@nh2 Can you address the needinfo? Otherwise I will have to close the bug, since it's no longer reproducible.
As it's no longer reproducible, I'm closing the bug. Please feel free to reopen it if the issue persists.

Thanks,
Sanju
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.