Bug 858732 - glusterd does not start anymore on one node [NEEDINFO]
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64 Linux
Priority: medium, Severity: high
Assigned To: bugs@gluster.org
: Triaged
Depends On:
Blocks: 1190099 1269929
Reported: 2012-09-19 10:15 EDT by daniel de baerdemaeker
Modified: 2015-10-22 11:46 EDT (History)
14 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1190099 1269929
Environment:
Last Closed: 2015-10-22 11:46:38 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
ndevos: needinfo? (kparthas)


Attachments: None
Description daniel de baerdemaeker 2012-09-19 10:15:40 EDT
Description of problem:
After a few months of running Gluster on 2 Scientific Linux nodes, I had to reboot them because the / partition became full without any apparent reason.
After rebooting the first node, everything went fine.
After rebooting the second, glusterd does not start anymore. I get the following output
from glusterd --debug on the second node:
[2012-09-19 16:11:38.770585] I [glusterfsd.c:1666:main] 0-glusterd: Started running glusterd version 3.3.0
[2012-09-19 16:11:38.770826] D [glusterfsd.c:454:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
[2012-09-19 16:11:38.772458] I [glusterd.c:807:init] 0-management: Using /var/lib/glusterd as working directory
[2012-09-19 16:11:38.772523] D [glusterd.c:243:glusterd_rpcsvc_options_build] 0-: listen-backlog value: 128
[2012-09-19 16:11:38.772754] D [rpcsvc.c:1872:rpcsvc_init] 0-rpc-service: RPC service inited.
[2012-09-19 16:11:38.772780] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2012-09-19 16:11:38.772805] D [rpc-transport.c:248:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so
[2012-09-19 16:11:38.772978] D [name.c:555:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet/inet6
[2012-09-19 16:11:38.773161] D [rpc-transport.c:248:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.3.0/rpc-transport/rdma.so
[2012-09-19 16:11:38.773472] C [rdma.c:3960:gf_rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
[2012-09-19 16:11:38.773519] E [rdma.c:4842:init] 0-rdma.management: Failed to initialize IB Device
[2012-09-19 16:11:38.773535] E [rpc-transport.c:316:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2012-09-19 16:11:38.773545] W [rpcsvc.c:1356:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2012-09-19 16:11:38.773556] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0
[2012-09-19 16:11:38.773565] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli, Num: 1238463, Ver: 2, Port: 0
[2012-09-19 16:11:38.773575] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0
[2012-09-19 16:11:38.773584] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0
[2012-09-19 16:11:38.773593] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterFS Handshake, Num: 14398633, Ver: 2, Port: 0
[2012-09-19 16:11:38.773611] D [glusterd-utils.c:4658:glusterd_sm_tr_log_init] 0-: returning 0
[2012-09-19 16:11:38.773644] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:38.773655] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:38.773701] D [glusterd-store.c:1202:glusterd_store_retrieve_value] 0-: key UUID read
[2012-09-19 16:11:38.773713] D [glusterd-store.c:1205:glusterd_store_retrieve_value] 0-: key UUID found
[2012-09-19 16:11:38.773727] D [glusterd-store.c:1452:glusterd_retrieve_uuid] 0-: Returning 0
[2012-09-19 16:11:38.773744] I [glusterd.c:95:glusterd_uuid_init] 0-glusterd: retrieved UUID: 82ae7c4b-d32c-4781-90a3-d3f5004aea74
[2012-09-19 16:11:38.840378] D [glusterd.c:298:glusterd_check_gsync_present] 0-glusterd: Returning 0
[2012-09-19 16:11:38.840463] D [glusterd.c:404:glusterd_crt_georep_folders] 0-: Returning 0
[2012-09-19 16:11:39.702574] D [glusterd-utils.c:585:glusterd_volinfo_new] 0-: Returning 0
[2012-09-19 16:11:39.702626] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.702637] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.702658] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.702719] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702735] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702745] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702756] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702774] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702786] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702796] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702806] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702817] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702830] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702843] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702854] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702876] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.rpc-auth-unix,value:off
[2012-09-19 16:11:39.702888] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702900] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.disable,value:on
[2012-09-19 16:11:39.702910] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702921] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.enable-ino32,value:on
[2012-09-19 16:11:39.702932] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702942] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.export-dir,value:/mnt/data2/share/
[2012-09-19 16:11:39.702956] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702967] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.export-volumes,value:on
[2012-09-19 16:11:39.702978] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702989] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.trusted-sync,value:on
[2012-09-19 16:11:39.702999] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703008] E [glusterd-store.c:2080:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2012-09-19 16:11:39.703019] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703027] E [glusterd-store.c:2080:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2012-09-19 16:11:39.703040] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.703077] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.703093] D [glusterd-utils.c:727:glusterd_brickinfo_new] 0-: Returning 0
[2012-09-19 16:11:39.703110] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703123] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703133] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703143] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703153] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703163] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703173] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703183] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703193] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703203] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703213] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703229] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703240] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703249] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703259] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703269] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703279] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703289] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703307] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.703316] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.703334] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.703350] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703361] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703372] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902377] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902429] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902442] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.902470] D [glusterd-utils.c:727:glusterd_brickinfo_new] 0-: Returning 0
[2012-09-19 16:11:39.902485] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902509] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.902518] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.902542] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.902558] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902569] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902579] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902590] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902600] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902610] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.902629] D [glusterd-store.c:1807:glusterd_store_retrieve_bricks] 0-: Returning with 0
[2012-09-19 16:11:39.903010] D [glusterd-utils.c:1567:glusterd_volume_compute_cksum] 0-management: Returning with 0
[2012-09-19 16:11:39.903026] D [glusterd-store.c:2146:glusterd_store_retrieve_volume] 0-: Returning with 0
[2012-09-19 16:11:39.903037] D [glusterd-utils.c:941:glusterd_volinfo_find] 0-: Volume share found
[2012-09-19 16:11:39.903046] D [glusterd-utils.c:949:glusterd_volinfo_find] 0-: Returning 0
[2012-09-19 16:11:39.903062] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.903071] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.903088] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.903106] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.903118] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.903134] D [glusterd-store.c:1893:glusterd_store_retrieve_rbstate] 0-: Returning with 0
[2012-09-19 16:11:39.903146] D [glusterd-utils.c:941:glusterd_volinfo_find] 0-: Volume share found
[2012-09-19 16:11:39.903158] D [glusterd-utils.c:949:glusterd_volinfo_find] 0-: Returning 0
[2012-09-19 16:11:39.903175] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.903184] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.903199] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.903215] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.903225] D [glusterd-store.c:1955:glusterd_store_retrieve_node_state] 0-: Returning with 0
[2012-09-19 16:11:39.903239] D [glusterd-store.c:2215:glusterd_store_retrieve_volumes] 0-: Returning with 0
[2012-09-19 16:11:39.903267] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.903277] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.903292] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.903304] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.903314] D [glusterd-store.c:2563:glusterd_store_retrieve_peers] 0-: Returning with -1
[2012-09-19 16:11:39.903323] D [glusterd-store.c:2620:glusterd_restore] 0-: Returning -1
[2012-09-19 16:11:39.903336] E [xlator.c:385:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2012-09-19 16:11:39.903347] E [graph.c:294:glusterfs_graph_init] 0-management: initializing translator failed
[2012-09-19 16:11:39.903356] E [graph.c:483:glusterfs_graph_activate] 0-graph: init failed
[2012-09-19 16:11:39.903547] W [glusterfsd.c:831:cleanup_and_exit] (-->glusterd(main+0x574) [0x4073b4] (-->glusterd(glusterfs_volumes_init+0x145) [0x405b65] (-->glusterd(glusterfs_process_volfp+0x198) [0x405a18]))) 0-: received signum (0), shutting down
[2012-09-19 16:11:39.903573] D [glusterfsd-mgmt.c:2154:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given


On the first node I have:

[2012-09-19 15:14:24.194781] W [socket.c:195:__socket_rwv] 0-management: readv failed (Connection reset by peer)
[2012-09-19 15:14:24.194840] W [socket.c:1512:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Connection reset by peer), peer (192.168.60.62:24007)
[2012-09-19 15:14:24.195108] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x3f4cc0f7e8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x3f4cc0f4a0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x3f4cc0ef0e]))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2012-09-19 15:14:23.814379 (xid=0x1x)
[2012-09-19 15:14:24.195145] E [glusterd-handshake.c:430:glusterd_peer_dump_version_cbk] 0-: error through RPC layer, retry again later
[2012-09-19 15:14:24.814197] I [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-09-19 15:14:26.819528] E [socket.c:1715:socket_connect_finish] 0-management: connection to 192.168.60.62:24007 failed (Connection refused)

Version-Release number of selected component (if applicable):
glusterfs 3.3.0

How reproducible:
I do not know how to reproduce it on another machine.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 1 daniel de baerdemaeker 2012-09-19 10:52:34 EDT
-bash-4.1# cat /etc/glusterfs/glusterd.vol 
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
end-volume
-bash-4.1#
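[Editorial note] The "Failed to initialize IB Device" / "'rdma' initialization failed" lines near the top of the debug log come from the rdma entry in transport-type above; on hosts without InfiniBand hardware that listener simply fails to come up and glusterd continues, so those lines are noise rather than the fatal error here (the fatal path is glusterd_store_retrieve_peers returning -1). A hedged variant of the volfile that avoids the noise on socket-only hosts (a sketch, not an official recommendation) would be:

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
end-volume
```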
Comment 2 Amar Tumballi 2012-12-21 02:56:34 EST
Can you please check whether this is fixed with the 3.3.1 release or a 3.4.0qa* build (qa6 as of now)? We are not able to reproduce this in-house.
Comment 3 Gareth 2013-02-02 16:22:38 EST
I have managed to reproduce this on 3.3.1
Two bricks in replication. An external program filled up / (including /var).
Upon reboot that system could no longer start glusterd, even after space was made.

To rectify it (and I cannot be sure why this works), I modified
/var/lib/glusterd/vols/mythfe3brick/bricks/192.168.1.31\:-glusterfs-brick1
on 192.168.1.31, which was the system that filled its disk.
I changed the line from
listen-port=0
to
listen-port=24009
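[Editorial note] Comment 3's workaround amounts to a one-line edit of a brick info file. The sketch below only simulates it in a temporary directory so it is safe to run anywhere; the file name, keys, and port are taken from the comment, and on a real node the file lives under /var/lib/glusterd/vols/<volume>/bricks/.

```shell
# Simulate a brick info file containing listen-port=0, then patch it the
# way comment 3 describes. The temp dir stands in for the real
# /var/lib/glusterd/vols/mythfe3brick/bricks/ directory.
workdir=$(mktemp -d)
brickfile="$workdir/192.168.1.31:-glusterfs-brick1"
printf 'hostname=192.168.1.31\npath=/glusterfs/brick1\nlisten-port=0\n' > "$brickfile"

# Rewrite the zero port to the brick's real listen port (24009 in comment 3).
sed -i 's/^listen-port=0$/listen-port=24009/' "$brickfile"

grep '^listen-port=' "$brickfile"   # prints listen-port=24009
rm -r "$workdir"
```

On a real node, stop glusterd before editing and start it again afterwards.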
Comment 4 Hakan Ardo 2013-03-11 05:21:51 EDT
I had a similar problem. In my case /var/lib/glusterd/peers/xxx had become empty after the reboot. I resolved it by "rm /var/lib/glusterd/peers/xxx". Then I could start glusterfsd and re-add the peer using "gluster peer probe".
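[Editorial note] Comment 4's symptom, a peer file under /var/lib/glusterd/peers/ left empty by the full disk, matches the glusterd_store_retrieve_peers "Returning with -1" failure in the original debug log. A sketch of detecting and clearing such zero-byte files, simulated in a temporary directory so it is safe to run anywhere (the UUID file name is invented for the demo):

```shell
# Stand-in for /var/lib/glusterd/peers/ with one truncated (empty) peer file.
peers=$(mktemp -d)
touch "$peers/11111111-2222-3333-4444-555555555555"

# Zero-byte peer files make glusterd's store restore fail at startup;
# list and delete them, as comment 4 did by hand.
find "$peers" -maxdepth 1 -type f -empty -print -delete

ls -A "$peers" | wc -l   # prints 0
rm -r "$peers"
```

After clearing the file on a real node, start glusterd again and re-add the peer with "gluster peer probe", as comment 4 describes.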
Comment 5 Gianluca Cecchi 2013-11-22 05:38:44 EST
Hello,
I have the same problem of / becoming full on one node, which runs Fedora 19 with Gluster 3.4.1.
Almost all paths live under / (apart from /tmp).
I removed the /var/lib/glusterd/peers/xxx file and rebooted the server.
Now I no longer get the error

0-management: Initialization of volume 'management' failed, review your volfile again

but glusterd doesn't start and I get:
 [2013-11-22 09:35:23.843703] W [rpc-transport.c:175:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2013-11-22 09:35:23.847153] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-11-22 09:35:23.847213] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-11-22 09:35:23.860648] I [cli-cmd-volume.c:1275:cli_check_gsync_present] 0-: geo-replication not installed
[2013-11-22 09:35:23.861496] E [socket.c:2157:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)

# systemctl status glusterd
glusterd.service - GlusterFS an clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: failed (Result: exit-code) since Fri 2013-11-22 10:40:23 CET; 51min ago
  Process: 1042 ExecStart=/usr/sbin/glusterd -p /run/glusterd.pid (code=exited, status=1/FAILURE)

Nov 22 10:40:22 f18ovn01 systemd[1]: Starting GlusterFS an clustered file-system server...
Nov 22 10:40:23 f18ovn01 systemd[1]: glusterd.service: control process exited, code=exited status=1
Nov 22 10:40:23 f18ovn01 systemd[1]: Failed to start GlusterFS an clustered file-system server.
Nov 22 10:40:23 f18ovn01 systemd[1]: Unit glusterd.service entered failed state.

Any other files to check to solve this further error?

fw seems ok:
# iptables -L -n | grep 24007
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:24007

and port not used:
[root@f18ovn01 glusterfs]# netstat -an|grep 24007
[root@f18ovn01 glusterfs]# 

Comparing with the other host: shouldn't it be glusterd itself that listens on that port, the same port for which it is now getting the connection refused error???

[root@f18ovn03 glusterfs]# ps -ef|grep glusterd.pid
root      1043     1  0 Nov21 ?        00:04:03 /usr/sbin/glusterd -p /run/glusterd.pid

[root@f18ovn03 glusterfs]# lsof -Pp 1043 | grep 24007
glusterd 1043 root    7u     IPv4              21822      0t0    TCP f18ovn03:24007->f18ovn03:1022 (ESTABLISHED)
glusterd 1043 root    9u     IPv4              13859      0t0    TCP *:24007 (LISTEN)
glusterd 1043 root   10u     IPv4              13888      0t0    TCP f18ovn03:24007->f18ovn03:1021 (ESTABLISHED)
glusterd 1043 root   12u     IPv4              13891      0t0    TCP localhost:24007->localhost:1020 (ESTABLISHED)
glusterd 1043 root   13u     IPv4              13893      0t0    TCP localhost:24007->localhost:1019 (ESTABLISHED)
Comment 8 Anatoly Belikov 2015-05-18 07:42:10 EDT
Any update on this bug? We have run into the same problem:

[2015-05-18 10:38:43.843495] W [rdma.c:4197:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2015-05-18 10:38:43.843513] E [rdma.c:4485:init] 0-rdma.management: Failed to initialize IB Device
[2015-05-18 10:38:43.843522] E [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2015-05-18 10:38:43.843562] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-05-18 10:38:45.142777] E [run.c:190:runner_log] 0-glusterd: command failed: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd -c /etc/glusterd/geo-replication/gsyncd.conf --config-set-rx gluster-params xlator-option=*-dht.assert-no-child-down=true .
[2015-05-18 10:38:45.142845] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2015-05-18 10:38:45.142857] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2015-05-18 10:38:45.142866] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
Comment 9 Niels de Vos 2015-05-19 08:21:04 EDT
(In reply to Anatoly Belikov from comment #8)
> Any update on this bug? We have run into the same problem:
...
> [2015-05-18 10:38:45.142777] E [run.c:190:runner_log] 0-glusterd: command
> failed: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd -c
> /etc/glusterd/geo-replication/gsyncd.conf --config-set-rx gluster-params
> xlator-option=*-dht.assert-no-child-down=true .
...

This is not the same problem, but it has the same effect. Please file a new bug report against the geo-replication component. This could be a packaging issue, or something else that caused the "gsync" command to fail.


---

The original problem reported in this bug is caused by glusterd being unable to read some of its configuration files. This can (or could?) happen when /var/lib is full or out of inodes. Cleanup and manually restoring the configuration under /var/lib/glusterd is needed in that case. KP or some of the other GlusterD developers can chime in with more details, and maybe a link to the documentation or email that describes how to restore the configuration.
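[Editorial note] A quick way to check the condition comment 9 describes, /var/lib full on either bytes or inodes, before digging through /var/lib/glusterd; the path and output format are only a sketch:

```shell
# Report both byte and inode usage for the filesystem holding /var/lib;
# either one at 100% can truncate glusterd's store files on write.
df -P  /var/lib | awk 'NR==2 {print "space used:", $5}'
df -Pi /var/lib | awk 'NR==2 {print "inodes used:", $5}'
```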
Comment 10 Kaleb KEITHLEY 2015-10-22 11:46:38 EDT
Because of the large number of bugs filed against it, the "mainline" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
