Bug 763733 (GLUSTER-2001)

Summary: Creating a new volume fails without showing any error
Product: [Community] GlusterFS Reporter: Cristian <cristian.merz>
Component: glusterd Assignee: Vijay Bellur <vbellur>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: low Docs Contact:
Priority: low    
Version: 3.1.1 CC: gluster-bugs, naoki, pkarampu, rabhat, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Windows   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix

Description Cristian 2010-10-21 11:32:28 UTC
uname -a
Linux cluster01 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 1 Cristian 2010-10-21 14:32:11 UTC
When I try to create a new volume, it fails and does not show any error:

gluster volume create test cluster03.internal:/data/export cluster02.internal:/data/export cluster01.internal:/data/export

Creation of volume test has been unsuccessful

Check peers:

gluster peer status
Number of Peers: 3

Hostname: cluster02.internal
Uuid: 3792e612-dbdd-4d58-a635-9131e880bc94
State: Peer in Cluster (Connected)

Hostname: cluster03.internal
Uuid: 1fc0edbb-b902-4524-b238-420b2eadfb77
State: Peer in Cluster (Connected)

Hostname: cluster01.internal
Uuid: eb41d659-501e-462b-8be3-93ed6c3ac502


Logs:

[2010-10-21 10:59:42.716987] E [socket.c:1657:socket_connect_finish] management: connection to  failed (No route to host)
[2010-10-21 10:59:42.717098] E [socket.c:1657:socket_connect_finish] management: connection to  failed (No route to host)
[2010-10-21 11:00:38.998202] I [glusterd.c:274:init] management: Using /etc/glusterd as working directory
[2010-10-21 11:00:38.999427] E [socket.c:322:__socket_server_bind] socket.management: binding to  failed: Address already in use
[2010-10-21 11:00:38.999466] E [socket.c:325:__socket_server_bind] socket.management: Port is already in use
[2010-10-21 11:00:38.999502] E [glusterd.c:355:init] management: creation of listener failed
[2010-10-21 11:00:38.999524] E [xlator.c:904:xlator_init] management: Initialization of volume 'management' failed, review your volfile again
[2010-10-21 11:00:38.999547] E [graph.c:331:glusterfs_graph_init] management: initializing translator failed
[2010-10-21 11:00:38.999612] I [glusterfsd.c:666:cleanup_and_exit] glusterfsd: shutting down
[2010-10-21 11:00:52.170482] I [glusterd-handler.c:775:glusterd_handle_create_volume] glusterd: Received create volume req
[2010-10-21 11:00:52.183725] I [glusterd-utils.c:2127:glusterd_friend_find_by_hostname] glusterd: Friend cluster01.internal found.. state: 3
[2010-10-21 11:00:52.183777] I [glusterd-utils.c:223:glusterd_lock] glusterd: Cluster lock held by eb41d659-501e-462b-8be3-93ed6c3ac502
[2010-10-21 11:00:52.183809] I [glusterd-handler.c:2653:glusterd_op_txn_begin] glusterd: Acquired local lock
[2010-10-21 11:00:52.183876] I [glusterd-op-sm.c:5061:glusterd_op_sm_inject_event] glusterd: Enqueuing event: 'GD_OP_EVENT_START_LOCK'
[2010-10-21 11:00:52.183934] I [glusterd-op-sm.c:5109:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_START_LOCK'
[2010-10-21 11:00:52.184046] I [glusterd3_1-mops.c:1105:glusterd3_1_cluster_lock] glusterd: Sent lock req to 1 peers
[2010-10-21 11:00:52.184082] I [glusterd-op-sm.c:4740:glusterd_op_sm_transition_state] : Transitioning from 'Default' to 'Lock sent' due to event 'GD_OP_EVENT_START_LOCK'
[2010-10-21 11:00:52.184159] I [glusterd-handler.c:425:glusterd_handle_cluster_lock] glusterd: Received LOCK from uuid: eb41d659-501e-462b-8be3-93ed6c3ac502
[2010-10-21 11:00:52.184189] I [glusterd-op-sm.c:5061:glusterd_op_sm_inject_event] glusterd: Enqueuing event: 'GD_OP_EVENT_LOCK'
[2010-10-21 11:00:52.184213] I [glusterd-op-sm.c:5109:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_LOCK'
[2010-10-21 11:00:52.184236] I [glusterd-op-sm.c:4740:glusterd_op_sm_transition_state] : Transitioning from 'Lock sent' to 'Lock sent' due to event 'GD_OP_EVENT_LOCK'
[2010-10-21 11:03:40.393583] I [glusterd-handler.c:664:glusterd_handle_cli_list_friends] glusterd: Received cli list req
[2010-10-21 11:18:48.727516] E [glusterd3_1-mops.c:1370:glusterd_handle_rpc_msg] : Unable to set cli op: 16
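
The log lines above appear to follow the layout `[timestamp] LEVEL [file:line:function] component: message`, where the component is sometimes empty. A minimal sketch for pulling out only the error-level (`E`) entries, assuming that layout holds for the whole file; the names `parse_log_line` and `error_entries` are hypothetical helpers, not part of GlusterFS:

```python
import re

# Assumed layout, inferred from the log excerpt in this report:
# [timestamp] LEVEL [file:line:function] component: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] "
    r"(?P<component>[^:]*): (?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def error_entries(log_text):
    """Return the parsed entries whose level is 'E' (error)."""
    parsed = (parse_log_line(line) for line in log_text.splitlines())
    return [entry for entry in parsed if entry and entry["level"] == "E"]
```

Running `error_entries` over the excerpt above would surface the `Address already in use` and `Unable to set cli op: 16` lines without the surrounding informational entries.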

Comment 2 Naoki 2010-10-25 03:10:34 UTC
Same issue:

# gluster volume create test-volume replica 2 transport tcp srv10:/data_testvol srv11:/data_testvol srv12:/data_testvol srv13:/data_testvol srv14:/data_testvol srv15:/data_testvol
Creation of volume test-volume has been unsuccessful


# tail -15 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2010-10-25 14:36:58.8785] D [glusterd-handler.c:441:glusterd_handle_cluster_lock] : Returning 0
[2010-10-25 14:36:58.8806] I [glusterd-op-sm.c:5109:glusterd_op_sm] : Dequeued event of type: 'GD_OP_EVENT_LOCK'
[2010-10-25 14:36:58.8827] D [glusterd-op-sm.c:3871:glusterd_op_ac_none] : Returning with 0
[2010-10-25 14:36:58.8848] I [glusterd-op-sm.c:4740:glusterd_op_sm_transition_state] : Transitioning from 'Lock sent' to 'Lock sent' due to event 'GD_OP_EVENT_LOCK'
[2010-10-25 14:48:53.82249] D [glusterd-op-sm.c:5204:glusterd_op_set_cli_op] : Returning 16
[2010-10-25 14:48:53.83068] E [glusterd3_1-mops.c:1370:glusterd_handle_rpc_msg] : Unable to set cli op: 16
[2010-10-25 14:48:53.83249] D [glusterd-op-sm.c:4525:glusterd_op_send_cli_response] : Returning 0
[2010-10-25 15:00:11.173555] I [glusterd-handler.c:664:glusterd_handle_cli_list_friends] glusterd: Received cli list req
[2010-10-25 15:00:16.64792] D [glusterd-op-sm.c:5204:glusterd_op_set_cli_op] : Returning 16
[2010-10-25 15:00:16.64839] E [glusterd3_1-mops.c:1370:glusterd_handle_rpc_msg] : Unable to set cli op: 16
[2010-10-25 15:00:16.64914] D [glusterd-op-sm.c:4525:glusterd_op_send_cli_response] : Returning 0
[2010-10-25 15:02:15.139658] I [glusterd-handler.c:664:glusterd_handle_cli_list_friends] glusterd: Received cli list req
[2010-10-25 15:04:11.72444] D [glusterd-op-sm.c:5204:glusterd_op_set_cli_op] : Returning 16
[2010-10-25 15:04:11.73065] E [glusterd3_1-mops.c:1370:glusterd_handle_rpc_msg] : Unable to set cli op: 16
[2010-10-25 15:04:11.73200] D [glusterd-op-sm.c:4525:glusterd_op_send_cli_response] : Returning 0

Peer status shows all connected:

# gluster peer status
Number of Peers: 6

Hostname: srv11
Uuid: 3330ff92-fd33-4414-bb0e-42a20dc433d4
State: Peer in Cluster (Connected)

Hostname: srv12
Uuid: af4007ac-804c-433f-a6e9-0a7d6cafe432
State: Peer in Cluster (Connected)

Hostname: srv13
Uuid: 5e4ef932-cca7-4440-a8e3-4415b84c2307
State: Peer in Cluster (Connected)

Hostname: srv14
Uuid: df6c1fb1-cf5d-4a6b-ba15-fb26319e37db
State: Peer in Cluster (Connected)

Hostname: srv15
Uuid: 2cc51957-2c92-4d24-a53c-c7ded19e80b7
State: Peer in Cluster (Connected)

Hostname: srv10
Uuid: 1b6130e7-f83e-4ec6-90f7-46706dbffc4f
State: Peer in Cluster (Connected)

However, I have noticed that while there should be six nodes (peers), the other servers list only five: they do not list themselves. I am not sure what, if anything, that means.

The docs (http://www.gluster.com/community/documentation/index.php/Gluster_3.1:_Creating_Trusted_Storage_Pools) state that "When you start the first server, the storage pool consists of that server alone", but I don't have any peers at that point; I have to probe the local server for it to be added. The same is true on the other peers, but if I attempt to probe the local machine there, I'm told it's part of another cluster.

Comment 3 Naoki 2010-10-25 03:11:27 UTC
Forgot to add that all machines are:

# uname -a
Linux pdbsearch10.pm.prod 2.6.35.6-45.fc14.x86_64 #1 SMP Mon Oct 18 23:57:44 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 4 Pranith Kumar K 2010-10-27 08:54:07 UTC
When creating the storage pool, the local host should not be probed (no self-probe). Ideally the CLI reports an error in this case, but because of bug 763587 glusterd ended up in this state.

The documentation has also been fixed to state that the local host should not be self-probed.

The fix is available in glusterfs-3.1.1qa1:
http://ftp.gluster.com/pub/gluster/glusterfs/qa-releases/glusterfs-3.1.1qa1.tar.gz

Thanks
Pranith
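
For anyone checking whether a node has been self-probed, here is a minimal sketch that parses the text printed by `gluster peer status` and reports whether the local host shows up in its own peer list. The function names are hypothetical, and it assumes the caller passes the same hostname that was used when probing (for example `srv10`, not a fully qualified variant):

```python
def parse_peer_status(output):
    """Parse `gluster peer status` text into a list of dicts, one per peer.

    Keys are the lower-cased field names seen in the output
    (hostname, uuid, state). Missing fields are simply absent.
    """
    peers = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            current = {"hostname": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip().lower()] = value.strip()
    return peers


def self_probed(output, local_hostname):
    """Return True if the local host appears in its own peer list."""
    return any(p["hostname"] == local_hostname for p in parse_peer_status(output))
```

On the `srv10` output from comment 2, `self_probed(output, "srv10")` would return `True`, which matches the broken state described above.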

Comment 5 Naoki 2010-10-27 23:41:03 UTC
Champion!

[root@srv10 ~]# gluster peer status
No peers present
[root@srv10 ~]# gluster peer probe srv11
Probe successful
[root@srv10 ~]# gluster peer probe srv12
Probe successful
[root@srv10 ~]# gluster peer probe srv13
Probe successful
[root@srv10 ~]# gluster peer probe srv14
Probe successful
[root@srv10 ~]# gluster peer probe srv15
Probe successful
[root@srv10 ~]# gluster peer status
Number of Peers: 5

Hostname: srv11
Uuid: 3330ff92-fd33-4414-bb0e-42a20dc433d4
State: Peer in Cluster (Connected)

Hostname: srv12
Uuid: af4007ac-804c-433f-a6e9-0a7d6cafe432
State: Peer in Cluster (Connected)

Hostname: srv13
Uuid: 5e4ef932-cca7-4440-a8e3-4415b84c2307
State: Peer in Cluster (Connected)

Hostname: srv14
Uuid: df6c1fb1-cf5d-4a6b-ba15-fb26319e37db
State: Peer in Cluster (Connected)

Hostname: srv15
Uuid: 2cc51957-2c92-4d24-a53c-c7ded19e80b7
State: Peer in Cluster (Connected)
[root.prod ~]# gluster volume create test-volume  transport tcp srv10:/data_testvol srv11:/data_testvol srv12:/data_testvol srv13:/data_testvol srv14:/data_testvol srv15:/data_testvol
Creation of volume test-volume has been successful
[root.prod ~]# gluster volume info

Volume Name: test-volume
Type: Distribute
Status: Created
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: srv10:/data_testvol
Brick2: srv11:/data_testvol
Brick3: srv12:/data_testvol
Brick4: srv13:/data_testvol
Brick5: srv14:/data_testvol
Brick6: srv15:/data_testvol
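
The working sequence above probes every other server from `srv10` and never probes `srv10` itself. A minimal sketch of building that probe list while skipping the local host; `probe_commands` is a hypothetical helper, and it only constructs the argument lists (it does not run `gluster`):

```python
def probe_commands(local_hostname, pool_hosts):
    """Build one `gluster peer probe <host>` command per remote host.

    The local host is skipped, since it must not be self-probed.
    Each command is returned as an argument list suitable for subprocess.run.
    """
    return [
        ["gluster", "peer", "probe", host]
        for host in pool_hosts
        if host != local_hostname
    ]
```

With `local_hostname="srv10"` and the six servers from this report, this yields five probe commands, one for each of `srv11` through `srv15`.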