Bug 976948 - Complete lack of informative error output when volume creation fails due to using the wrong IP of a known gluster host
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: cli
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-06-22 02:17 UTC by Justin Clift
Modified: 2015-10-22 15:40 UTC
CC: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-10-22 15:40:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Justin Clift 2013-06-22 02:17:06 UTC
Description of problem:

  Attempted to create a simple volume with the 3.4.0 beta 3 RPMs, but it kept failing without giving any useful reason why:

    # gluster volume create test1 transport rdma gluster1-1:/export/brick2/glusterfs/
    volume create: test1: failed
    #

  The /var/log/glusterfs/cli.log didn't indicate anything useful about why it failed:

    [2013-06-22 00:42:49.890399] D [rpc-clnt.c:964:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
    [2013-06-22 00:42:49.890453] W [rpc-transport.c:175:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
    [2013-06-22 00:42:49.890475] D [rpc-transport.c:249:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.4.0beta3/rpc-transport/socket.so
    [2013-06-22 00:42:49.891996] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
    [2013-06-22 00:42:49.892009] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
    [2013-06-22 00:42:49.894327] D [common-utils.c:237:gf_resolve_ip6] 0-resolver: returning ip-127.0.0.1 (port-24007) for hostname: localhost and port: 24007
    [2013-06-22 00:42:49.896346] D [common-utils.c:257:gf_resolve_ip6] 0-resolver: next DNS query will return: ip-127.0.0.1 port-24007
    [2013-06-22 00:42:49.897054] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897074] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897080] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897086] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897092] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897099] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897108] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897116] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897124] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897130] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897135] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897141] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897148] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.897154] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939793] D [cli-cmd-volume.c:1296:cli_check_gsync_present] 0-cli: Returning 0
    [2013-06-22 00:42:49.939828] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939835] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939843] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939857] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939866] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939873] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939881] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939887] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939895] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939903] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939909] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939914] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939919] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939927] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939933] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939939] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939944] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939950] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939955] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939961] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939966] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939971] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939976] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939986] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.939993] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.940000] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.940005] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.940011] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.940017] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
    [2013-06-22 00:42:49.943601] I [cli-rpc-ops.c:798:gf_cli_create_volume_cbk] 0-cli: Received resp to create volume
    [2013-06-22 00:42:49.943658] D [cli-cmd.c:383:cli_cmd_submit] 0-cli: Returning -1
    [2013-06-22 00:42:49.943670] D [cli-rpc-ops.c:2906:gf_cli_create_volume] 0-cli: Returning -1
    [2013-06-22 00:42:49.943688] D [cli-cmd-volume.c:408:cli_cmd_volume_create_cbk] 0-cli: frame->local is not NULL (0x7f092c000a10)
    [2013-06-22 00:42:49.943709] I [input.c:36:cli_batch] 0-: Exiting with: -1

  Nothing useful in the main /var/log/glusterfs/etc-glusterfs-glusterd.vol.log log file either.  Absolutely nothing in any log file even indicates that a problem is occurring. :(

  The problem turned out to be that I'd peer probed the gluster1
  box using a different interface address ("gluster1-2") than the one
  I used in my volume creation command ("gluster1-1").  Both gluster1-1
  and gluster1-2 are the same "gluster1" host, just different network
  interfaces on it.
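
  In other words, glusterd only knows a peer by the name it was probed
  with.  Purely as an illustration (the output below is mocked up, not
  captured), peer status at that point would have looked something like:

    # gluster peer status
    Number of Peers: 1

    Hostname: gluster1-2
    Uuid: <uuid>
    State: Peer in Cluster (Connected)

  so a brick spec of "gluster1-1:/..." doesn't match any known friend,
  hence the "Unable to find friend: gluster1-1" lines in the debug log
  below.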

  After many hair-pulling hours, and with diagnosis help from others on
  IRC, we realised the problem when looking through the foreground
  --debug logging from glusterd itself:

    [2013-06-22 01:25:16.423897] D [glusterd-utils.c:751:glusterd_volinfo_delete] 0-management: Returning 0
    [2013-06-22 01:25:16.423906] D [glusterd-utils.c:777:glusterd_brickinfo_new] 0-management: Returning 0
    [2013-06-22 01:25:16.423915] D [glusterd-utils.c:839:glusterd_brickinfo_new_from_brick] 0-management: Returning 0
    [2013-06-22 01:25:16.424107] D [glusterd-utils.c:4807:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: gluster1-1
    [2013-06-22 01:25:16.424193] D [glusterd-utils.c:290:glusterd_is_local_addr] 0-management: 172.16.101.1 
    [2013-06-22 01:25:16.424260] D [glusterd-utils.c:257:glusterd_interface_search] 0-management: 172.16.101.1 is local address at interface ib0
    [2013-06-22 01:25:16.424274] D [glusterd-utils.c:4842:glusterd_hostname_to_uuid] 0-management: returning 0
    [2013-06-22 01:25:16.424283] D [glusterd-utils.c:793:glusterd_resolve_brick] 0-management: Returning 0
    [2013-06-22 01:25:16.425567] D [glusterd-utils.c:4993:glusterd_new_brick_validate] 0-management: returning 0 
    [2013-06-22 01:25:16.425692] D [glusterd-utils.c:4807:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: gluster1-1
    [2013-06-22 01:25:16.425757] D [glusterd-utils.c:290:glusterd_is_local_addr] 0-management: 172.16.101.1 
    [2013-06-22 01:25:16.425808] D [glusterd-utils.c:257:glusterd_interface_search] 0-management: 172.16.101.1 is local address at interface ib0
    [2013-06-22 01:25:16.425820] D [glusterd-utils.c:4842:glusterd_hostname_to_uuid] 0-management: returning 0
    [2013-06-22 01:25:16.425829] D [glusterd-utils.c:793:glusterd_resolve_brick] 0-management: Returning 0
    [2013-06-22 01:25:16.425917] D [glusterd-volume-ops.c:783:glusterd_op_stage_create_volume] 0-management: Returning 0
    [2013-06-22 01:25:16.425929] D [glusterd-op-sm.c:3791:glusterd_op_stage_validate] 0-management: Returning 0
    [2013-06-22 01:25:16.427229] D [glusterd-rpc-ops.c:183:glusterd_op_send_cli_response] 0-management: Returning 0
    [2013-06-22 01:25:16.427705] D [socket.c:486:__socket_rwv] 0-socket.management: EOF on socket
    [2013-06-22 01:25:16.427714] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
    [2013-06-22 01:37:18.642531] I [glusterd-handler.c:952:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
    [2013-06-22 01:37:18.643134] D [socket.c:486:__socket_rwv] 0-socket.management: EOF on socket
    [2013-06-22 01:37:18.643154] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now

  What we should do is recognise this failure scenario and directly report back to the user that they're using an incorrect IP address for one of the peers.
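
  For reference, the locality check visible in the debug log above
  (glusterd_is_local_addr / glusterd_interface_search) boils down to
  comparing the brick host's resolved addresses against the local
  interface addresses.  A self-contained sketch of that logic in plain
  POSIX C (an approximation for illustration, not the actual glusterd
  code):

    /* islocal.c: does a hostname resolve to an address owned by a local
     * interface?  Mirrors the glusterd_is_local_addr() /
     * glusterd_interface_search() checks seen in the debug log above.
     * (Cleanup calls omitted for brevity.)
     * Build: cc -std=c99 -o islocal islocal.c
     * Run:   ./islocal gluster1-1 */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <ifaddrs.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(int argc, char **argv)
    {
            struct addrinfo hints = { .ai_family   = AF_INET,
                                      .ai_socktype = SOCK_STREAM };
            struct addrinfo *res, *ai;
            struct ifaddrs  *ifs, *ifa;

            if (argc != 2 || getaddrinfo(argv[1], NULL, &hints, &res) != 0 ||
                getifaddrs(&ifs) != 0)
                    return 1;

            /* Compare every address the hostname resolves to against
             * every IPv4 address configured on a local interface. */
            for (ai = res; ai != NULL; ai = ai->ai_next) {
                    struct in_addr a =
                            ((struct sockaddr_in *)ai->ai_addr)->sin_addr;

                    for (ifa = ifs; ifa != NULL; ifa = ifa->ifa_next) {
                            if (ifa->ifa_addr == NULL ||
                                ifa->ifa_addr->sa_family != AF_INET)
                                    continue;
                            if (a.s_addr == ((struct sockaddr_in *)
                                             ifa->ifa_addr)->sin_addr.s_addr) {
                                    /* The "is local address at interface
                                     * ib0" case from the log. */
                                    printf("%s is local at interface %s\n",
                                           inet_ntoa(a), ifa->ifa_name);
                                    return 0;
                            }
                    }
            }
            printf("%s does not resolve to a local address\n", argv[1]);
            return 2;
    }

  Since glusterd already performs these lookups, it has the information
  needed to explain the failure rather than returning a bare -1.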


Version-Release number of selected component (if applicable):

  # rpm -qa|grep -i glusterfs|sort
  glusterfs-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-api-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-debuginfo-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-devel-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-fuse-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-rdma-3.4.0-0.6.beta3.el6.x86_64
  glusterfs-server-3.4.0-0.6.beta3.el6.x86_64


How reproducible:

  Every time.

Steps to Reproduce:
1. Install the official upstream GlusterFS 3.4.0 beta 3 RPMs on RHEL 6.4 x86_64
2. Set up peer probing, so the hosts recognise each other
3. Attempt to create a new volume, using an alternative IP for one of the gluster nodes (e.g. the IP of a different eth interface); see the sketch below
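
  A minimal sketch of steps 2 and 3, using the hostnames from the
  description (exact topology assumed):

    # Probe gluster1 via its second interface:
    gluster peer probe gluster1-2

    # Then name the brick via the first interface; glusterd doesn't
    # recognise gluster1-1 as a known peer, and the create fails:
    gluster volume create test1 transport rdma gluster1-1:/export/brick2/glusterfs/

    # Using the probed name avoids the mismatch:
    gluster volume create test1 transport rdma gluster1-2:/export/brick2/glusterfs/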


Actual results:

  Volume creation fails, with no reason given for the failure and no message logged anywhere.

Expected results:

  Volume creation still fails, but the user is told why.
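
  For example (illustrative wording only; this message is an assumption,
  not output from any actual release):

    # gluster volume create test1 transport rdma gluster1-1:/export/brick2/glusterfs/
    volume create: test1: failed: host gluster1-1 is not a probed peer
    (this host was probed as gluster1-2); use the probed address, or
    probe gluster1-1 first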


Additional info:

Comment 2 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The "pre-release" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

