Description of problem:

Attempted to create a simple volume with the 3.4.0 beta 3 rpms, but it kept failing without giving any useful reason why:

# gluster volume create test1 transport rdma gluster1-1:/export/brick2/glusterfs/
volume create: test1: failed
#

The /var/log/glusterfs/cli.log didn't indicate anything useful about why it failed:

[2013-06-22 00:42:49.890399] D [rpc-clnt.c:964:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
[2013-06-22 00:42:49.890453] W [rpc-transport.c:175:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2013-06-22 00:42:49.890475] D [rpc-transport.c:249:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.4.0beta3/rpc-transport/socket.so
[2013-06-22 00:42:49.891996] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-06-22 00:42:49.892009] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-06-22 00:42:49.894327] D [common-utils.c:237:gf_resolve_ip6] 0-resolver: returning ip-127.0.0.1 (port-24007) for hostname: localhost and port: 24007
[2013-06-22 00:42:49.896346] D [common-utils.c:257:gf_resolve_ip6] 0-resolver: next DNS query will return: ip-127.0.0.1 port-24007
[2013-06-22 00:42:49.897054] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897074] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897080] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897086] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897092] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897099] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897108] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897116] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897124] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897130] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897135] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897141] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897148] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.897154] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939793] D [cli-cmd-volume.c:1296:cli_check_gsync_present] 0-cli: Returning 0
[2013-06-22 00:42:49.939828] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939835] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939843] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939857] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939866] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939873] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939881] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939887] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939895] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939903] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939909] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939914] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939919] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939927] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939933] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939939] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939944] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939950] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939955] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939961] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939966] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939971] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939976] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939986] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.939993] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.940000] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.940005] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.940011] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.940017] D [registry.c:408:cli_cmd_register] 0-cli: Returning 0
[2013-06-22 00:42:49.943601] I [cli-rpc-ops.c:798:gf_cli_create_volume_cbk] 0-cli: Received resp to create volume
[2013-06-22 00:42:49.943658] D [cli-cmd.c:383:cli_cmd_submit] 0-cli: Returning -1
[2013-06-22 00:42:49.943670] D [cli-rpc-ops.c:2906:gf_cli_create_volume] 0-cli: Returning -1
[2013-06-22 00:42:49.943688] D [cli-cmd-volume.c:408:cli_cmd_volume_create_cbk] 0-cli: frame->local is not NULL (0x7f092c000a10)
[2013-06-22 00:42:49.943709] I [input.c:36:cli_batch] 0-: Exiting with: -1

Nothing useful in the main /var/log/glusterfs/etc-glusterfs-glusterd.vol.log log file either. Absolutely nothing in any of the log files even indicates a problem is occurring. :(

What the problem turned out to be was that I'd peer probed the gluster1 box with a different interface address ("gluster1-2") than the one I was using in my volume creation command ("gluster1-1"). Both gluster1-1 and gluster1-2 are the same "gluster1" host, just different network interfaces on it.
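In hindsight, the mismatch could presumably have been spotted (and worked around) by checking which hostname the box was actually probed as, and then using that same name in the brick path. Something like the following, using the hostnames from above (these exact commands weren't re-tested against this setup):

# gluster peer status
# gluster volume create test1 transport rdma gluster1-2:/export/brick2/glusterfs/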
After many hair-pulling hours, and diagnosis by others on IRC, we realised the problem when looking through the foreground --debug logging from glusterd itself:

[2013-06-22 01:25:16.423897] D [glusterd-utils.c:751:glusterd_volinfo_delete] 0-management: Returning 0
[2013-06-22 01:25:16.423906] D [glusterd-utils.c:777:glusterd_brickinfo_new] 0-management: Returning 0
[2013-06-22 01:25:16.423915] D [glusterd-utils.c:839:glusterd_brickinfo_new_from_brick] 0-management: Returning 0
[2013-06-22 01:25:16.424107] D [glusterd-utils.c:4807:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: gluster1-1
[2013-06-22 01:25:16.424193] D [glusterd-utils.c:290:glusterd_is_local_addr] 0-management: 172.16.101.1
[2013-06-22 01:25:16.424260] D [glusterd-utils.c:257:glusterd_interface_search] 0-management: 172.16.101.1 is local address at interface ib0
[2013-06-22 01:25:16.424274] D [glusterd-utils.c:4842:glusterd_hostname_to_uuid] 0-management: returning 0
[2013-06-22 01:25:16.424283] D [glusterd-utils.c:793:glusterd_resolve_brick] 0-management: Returning 0
[2013-06-22 01:25:16.425567] D [glusterd-utils.c:4993:glusterd_new_brick_validate] 0-management: returning 0
[2013-06-22 01:25:16.425692] D [glusterd-utils.c:4807:glusterd_friend_find_by_hostname] 0-management: Unable to find friend: gluster1-1
[2013-06-22 01:25:16.425757] D [glusterd-utils.c:290:glusterd_is_local_addr] 0-management: 172.16.101.1
[2013-06-22 01:25:16.425808] D [glusterd-utils.c:257:glusterd_interface_search] 0-management: 172.16.101.1 is local address at interface ib0
[2013-06-22 01:25:16.425820] D [glusterd-utils.c:4842:glusterd_hostname_to_uuid] 0-management: returning 0
[2013-06-22 01:25:16.425829] D [glusterd-utils.c:793:glusterd_resolve_brick] 0-management: Returning 0
[2013-06-22 01:25:16.425917] D [glusterd-volume-ops.c:783:glusterd_op_stage_create_volume] 0-management: Returning 0
[2013-06-22 01:25:16.425929] D [glusterd-op-sm.c:3791:glusterd_op_stage_validate] 0-management: Returning 0
[2013-06-22 01:25:16.427229] D [glusterd-rpc-ops.c:183:glusterd_op_send_cli_response] 0-management: Returning 0
[2013-06-22 01:25:16.427705] D [socket.c:486:__socket_rwv] 0-socket.management: EOF on socket
[2013-06-22 01:25:16.427714] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
[2013-06-22 01:37:18.642531] I [glusterd-handler.c:952:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2013-06-22 01:37:18.643134] D [socket.c:486:__socket_rwv] 0-socket.management: EOF on socket
[2013-06-22 01:37:18.643154] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now

What we should do is recognise this failure scenario and directly report back to the user that they're using an incorrect IP address for one of the peers.

Version-Release number of selected component (if applicable):

# rpm -qa|grep -i glusterfs|sort
glusterfs-3.4.0-0.6.beta3.el6.x86_64
glusterfs-api-3.4.0-0.6.beta3.el6.x86_64
glusterfs-debuginfo-3.4.0-0.6.beta3.el6.x86_64
glusterfs-devel-3.4.0-0.6.beta3.el6.x86_64
glusterfs-fuse-3.4.0-0.6.beta3.el6.x86_64
glusterfs-rdma-3.4.0-0.6.beta3.el6.x86_64
glusterfs-server-3.4.0-0.6.beta3.el6.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Install the official upstream GlusterFS 3.4.0 beta 3 rpms on RHEL 6.4 x86_64
2. Set up the peer probing, so the hosts recognise each other
3. Attempt to create a new volume, using an alternative IP address for one of the gluster nodes (eg the IP of a different eth interface)

Actual results:
Volume creation fails, with no reason given for the failure and no message logged anywhere.

Expected results:
Volume creation still fails, but tells the user why.

Additional info:
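For reference, the glusterd --debug output quoted above was gathered by stopping the glusterd service and running the daemon in the foreground, roughly like this (init script name as shipped in these RHEL 6 rpms, adjust as needed):

# service glusterd stop
# glusterd --debug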
The pre-release version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.