Bug 976948

Summary: Complete lack of informative error output when volume creation fails due to using the wrong IP of a known gluster host
Product: [Community] GlusterFS
Component: cli
Version: pre-release
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: low
Priority: unspecified
Reporter: Justin Clift <jclift>
Assignee: bugs <bugs>
CC: bugs, gluster-bugs, kwade
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-10-22 15:40:20 UTC

Description Justin Clift 2013-06-22 02:17:06 UTC
Description of problem:

  Attempted to create a simple volume with the 3.4.0 beta 3 rpms, but it kept failing without giving any useful reason why:

    # gluster volume create test1 transport rdma gluster1-1:/export/brick2/glusterfs/
    volume create: test1: failed
    #

  The /var/log/glusterfs/cli.log didn't contain anything useful to indicate why it failed:

    [2013-06-22 00:42:49.890399] D [rpc-clnt.c:964:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
    [2013-06-22 00:42:49.890453] W [rpc-transport.c:175:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
    [2013-06-22 00:42:49.890475] D [rpc-transport.c:249:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.4.0beta3/rpc-transport/socket.so
    [2013-06-22 00:42:49.891996] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
    [2013-06-22 00:42:49.892009] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
    [2013-06-22 00:42:49.894327] D [common-utils.c:237:gf_resolve_ip6] 0-resolver: returning ip-127.0.0.1 (port-24007) for hostname: localhost and port: 24007
    [2013-06-22 00:42:49.896346] D [common-utils.c:257:gf_resolve_ip6] 0-resolver: next DNS query will return: ip-127.0.0.1 port-24007
    [2013-06-22 00:42:49.897054] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897074] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897080] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897086] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897092] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897099] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897108] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897116] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897124] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897130] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897135] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897141] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897148] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897154] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939793] D [cli-cmd-volume.c:1296:cli_check_gsync_present] 0-cli: Returning 0
    [2013-06-22 00:42:49.939828] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939835] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939843] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939857] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939866] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939873] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939881] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939887] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939895] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939903] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939909] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939914] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939919] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939927] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939933] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939939] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939944] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939950] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939955] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939961] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939966] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939971] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939976] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939986] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939993] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.940000] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.940005] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.940011] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.940017] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.943601] I [cli-rpc-ops.c:798:gf_cli_create_volume_cbk] 0-cli: Received resp to create volume
    [2013-06-22 00:42:49.943658] D [cli-cmd.c:383:cli_cmd_submit] 0-cli: Returning -1
    [2013-06-22 00:42:49.943670] D [cli-rpc-ops.c:2906:gf_cli_create_volume] 0-cli: Returning -1
    [2013-06-22 00:42:49.943688] D [cli-cmd-volume.c:408:cli_cmd_volume_create_cbk] 0-cli: frame->local is not NULL (0x7f092c000a10)
    [2013-06-22 00:42:49.943709] I [input.c:36:cli_batch] 0-: Exiting with: -1

  Nothing useful in the main /var/log/glusterfs/etc-glusterfs-glusterd.vol.log file either.  Absolutely nothing in any log file even indicated a problem was occurring. :(

  What the problem turned out to be was that I'd peer probed the gluster1
  box using a different interface address ("gluster1-2") than the one I was
  using in my volume creation command ("gluster1-1").  Both gluster1-1 and
  gluster1-2 are the same "gluster1" host, just different network ports on it.
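
  A quick way to spot this mismatch is to compare the hostname glusterd
  knows the peer by (from "gluster peer status") against the one used in
  the volume create command.  A condensed console sketch using this
  cluster's hostnames (output format approximate); creating with the name
  the peer was actually probed as should succeed:

    # gluster peer status
    Number of Peers: 1

    Hostname: gluster1-2
    Uuid: <peer-uuid>
    State: Peer in Cluster (Connected)

    # gluster volume create test1 transport rdma gluster1-2:/export/brick2/glusterfs/
    volume create: test1: success: please start the volume to access data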

  After many hair-pulling hours and diagnosis by others on IRC, we realised
  the problem when looking through the logging from glusterd itself, run in
  the foreground with --debug:

    [2013-06-22 01:25:16.423897] D [glusterd-utils.c:751:glusterd_volinfo_delete] 0-management: Returning 0
    [2013-06-22 01:25:16.423906] D [glusterd-utils.c:777:glusterd_brickinfo_new] 0-management: Returning 0
    [2013-06-22 01:25:16.423915] D [glusterd-utils.c:839:glusterd_brickinfo_new_from_brick] 0-management: Returning 0
    [2013-06-22 01:25:16.424107] D [glusterd-utils.c:4807:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: gluster1-1
    [2013-06-22 01:25:16.424193] D [glusterd-utils.c:290:glusterd_is_local_addr] 0-management: 172.16.101.1 
    [2013-06-22 01:25:16.424260] D [glusterd-utils.c:257:glusterd_interface_search] 0-management: 172.16.101.1 is local address at interface ib0
    [2013-06-22 01:25:16.424274] D [glusterd-utils.c:4842:glusterd_hostname_to_uuid] 0-management: returning 0
    [2013-06-22 01:25:16.424283] D [glusterd-utils.c:793:glusterd_resolve_brick] 0-management: Returning 0
    [2013-06-22 01:25:16.425567] D [glusterd-utils.c:4993:glusterd_new_brick_validate] 0-management: returning 0 
    [2013-06-22 01:25:16.425692] D [glusterd-utils.c:4807:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: gluster1-1
    [2013-06-22 01:25:16.425757] D [glusterd-utils.c:290:glusterd_is_local_addr] 0-management: 172.16.101.1 
    [2013-06-22 01:25:16.425808] D [glusterd-utils.c:257:glusterd_interface_search] 0-management: 172.16.101.1 is local address at interface ib0
    [2013-06-22 01:25:16.425820] D [glusterd-utils.c:4842:glusterd_hostname_to_uuid] 0-management: returning 0
    [2013-06-22 01:25:16.425829] D [glusterd-utils.c:793:glusterd_resolve_brick] 0-management: Returning 0
    [2013-06-22 01:25:16.425917] D [glusterd-volume-ops.c:783:glusterd_op_stage_create_volume] 0-management: Returning 0
    [2013-06-22 01:25:16.425929] D [glusterd-op-sm.c:3791:glusterd_op_stage_validate] 0-management: Returning 0
    [2013-06-22 01:25:16.427229] D [glusterd-rpc-ops.c:183:glusterd_op_send_cli_response] 0-management: Returning 0
    [2013-06-22 01:25:16.427705] D [socket.c:486:__socket_rwv] 0-socket.management: EOF on socket
    [2013-06-22 01:25:16.427714] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
    [2013-06-22 01:37:18.642531] I [glusterd-handler.c:952:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
    [2013-06-22 01:37:18.643134] D [socket.c:486:__socket_rwv] 0-socket.management: EOF on socket
    [2013-06-22 01:37:18.643154] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now

  What we should do is recognise this failure scenario, and directly report back to the user that they're using an incorrect IP address for one of the peers.
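
  For illustration only, a minimal standalone C sketch of the kind of check
  that could produce such a message at volume-create staging time.  This is
  not the actual glusterd code path: peer_known_by_name() is a hypothetical
  stand-in for the lookup that glusterd_friend_find_by_hostname (seen
  failing in the log above) performs, and the message text is made up.

    #define _GNU_SOURCE   /* for asprintf() */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for glusterd's peer-table lookup.  Here the
     * only probed peer is "gluster1-2", matching this report's scenario. */
    static bool
    peer_known_by_name (const char *host)
    {
        static const char *probed[] = { "gluster1-2" };
        for (size_t i = 0; i < sizeof (probed) / sizeof (*probed); i++)
            if (strcmp (probed[i], host) == 0)
                return true;
        return false;
    }

    /* Proposed behaviour: when a brick host is not a probed peer, hand an
     * explanatory string back to the caller (which could then be shown by
     * the CLI) instead of returning a bare -1 up the stack. */
    static int
    validate_brick_host (const char *host, char **op_errstr)
    {
        if (!peer_known_by_name (host)) {
            if (asprintf (op_errstr,
                          "Host %s was not probed as a peer. Use the "
                          "name/IP the peer was probed with, or probe "
                          "%s first.", host, host) < 0)
                *op_errstr = NULL;
            return -1;
        }
        return 0;
    }

    int
    main (void)
    {
        char *errstr = NULL;

        /* Reproduces the report: brick given as gluster1-1, peer probed
         * as gluster1-2. */
        if (validate_brick_host ("gluster1-1", &errstr) != 0) {
            printf ("volume create: test1: failed: %s\n",
                    errstr ? errstr : "(unknown)");
            free (errstr);
        }
        return 0;
    }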


Version-Release number of selected component (if applicable):

  # rpm -qa|grep -i glusterfs|sort
  glusterfs-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-api-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-debuginfo-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-devel-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-fuse-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-rdma-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-server-3.4.0-0.6.beta3.el6.x86_64


How reproducible:

  Every time.

Steps to Reproduce:
1. Install the official upstream GlusterFS 3.4.0 beta 3 rpms on RHEL 6.4 x86_64
2. Set up peer probing, so the hosts recognise each other
3. Attempt to create a new volume using an alternative IP for one of the gluster nodes (e.g. the IP of a different eth interface); see the condensed sketch below
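
  A condensed console sketch of the reproduction, using this report's
  hostnames (gluster1 was probed via its gluster1-2 address, then the
  create command referenced its gluster1-1 address; output abbreviated):

    # gluster peer probe gluster1-2
    peer probe: success
    # gluster volume create test1 transport rdma gluster1-1:/export/brick2/glusterfs/
    volume create: test1: failed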


Actual results:

  Volume creation fails, with no reason given for the failure and no explanatory message logged anywhere.

Expected results:

  Volume creation still fails, but the CLI tells the user why.
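
  For example, something along these lines (hypothetical wording, shown
  only to illustrate the kind of message that would have saved the hours
  of debugging above):

    # gluster volume create test1 transport rdma gluster1-1:/export/brick2/glusterfs/
    volume create: test1: failed: host gluster1-1 is not a probed peer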


Additional info:

Comment 2 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The "pre-release" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.