Bug 858732

Summary: glusterd does not start anymore on one node
Product: [Community] GlusterFS
Reporter: daniel de baerdemaeker <debaerd>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED EOL
Severity: high
Priority: medium
Version: mainline
CC: awbelikov, bugs, cww, djuran, gareth.glaccum, gianluca.cecchi, gluster-bugs, hakan, hamiller, kparthas, mbukatov, ndevos, rwheeler, shyu
Keywords: Triaged
Flags: ndevos: needinfo? (kparthas)
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-10-22 11:46:38 EDT
Type: Bug
Bug Blocks: 1190099, 1269929

Description daniel de baerdemaeker 2012-09-19 10:15:40 EDT
Description of problem:
After a few months of running Gluster on two Scientific Linux nodes, I had to reboot them because the / partition had become full for no apparent reason.
After rebooting the first node, everything went fine.
After rebooting the second node, glusterd no longer starts. I get the following output
from glusterd --debug on the second node:
[2012-09-19 16:11:38.770585] I [glusterfsd.c:1666:main] 0-glusterd: Started running glusterd version 3.3.0
[2012-09-19 16:11:38.770826] D [glusterfsd.c:454:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
[2012-09-19 16:11:38.772458] I [glusterd.c:807:init] 0-management: Using /var/lib/glusterd as working directory
[2012-09-19 16:11:38.772523] D [glusterd.c:243:glusterd_rpcsvc_options_build] 0-: listen-backlog value: 128
[2012-09-19 16:11:38.772754] D [rpcsvc.c:1872:rpcsvc_init] 0-rpc-service: RPC service inited.
[2012-09-19 16:11:38.772780] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2012-09-19 16:11:38.772805] D [rpc-transport.c:248:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so
[2012-09-19 16:11:38.772978] D [name.c:555:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet/inet6
[2012-09-19 16:11:38.773161] D [rpc-transport.c:248:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.3.0/rpc-transport/rdma.so
[2012-09-19 16:11:38.773472] C [rdma.c:3960:gf_rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
[2012-09-19 16:11:38.773519] E [rdma.c:4842:init] 0-rdma.management: Failed to initialize IB Device
[2012-09-19 16:11:38.773535] E [rpc-transport.c:316:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2012-09-19 16:11:38.773545] W [rpcsvc.c:1356:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2012-09-19 16:11:38.773556] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc peer, Num: 1238437, Ver: 2, Port: 0
[2012-09-19 16:11:38.773565] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc cli, Num: 1238463, Ver: 2, Port: 0
[2012-09-19 16:11:38.773575] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterD svc mgmt, Num: 1238433, Ver: 2, Port: 0
[2012-09-19 16:11:38.773584] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Portmap, Num: 34123456, Ver: 1, Port: 0
[2012-09-19 16:11:38.773593] D [rpcsvc.c:1636:rpcsvc_program_register] 0-rpc-service: New program registered: GlusterFS Handshake, Num: 14398633, Ver: 2, Port: 0
[2012-09-19 16:11:38.773611] D [glusterd-utils.c:4658:glusterd_sm_tr_log_init] 0-: returning 0
[2012-09-19 16:11:38.773644] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:38.773655] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:38.773701] D [glusterd-store.c:1202:glusterd_store_retrieve_value] 0-: key UUID read
[2012-09-19 16:11:38.773713] D [glusterd-store.c:1205:glusterd_store_retrieve_value] 0-: key UUID found
[2012-09-19 16:11:38.773727] D [glusterd-store.c:1452:glusterd_retrieve_uuid] 0-: Returning 0
[2012-09-19 16:11:38.773744] I [glusterd.c:95:glusterd_uuid_init] 0-glusterd: retrieved UUID: 82ae7c4b-d32c-4781-90a3-d3f5004aea74
[2012-09-19 16:11:38.840378] D [glusterd.c:298:glusterd_check_gsync_present] 0-glusterd: Returning 0
[2012-09-19 16:11:38.840463] D [glusterd.c:404:glusterd_crt_georep_folders] 0-: Returning 0
[2012-09-19 16:11:39.702574] D [glusterd-utils.c:585:glusterd_volinfo_new] 0-: Returning 0
[2012-09-19 16:11:39.702626] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.702637] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.702658] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.702719] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702735] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702745] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702756] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702774] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702786] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702796] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702806] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702817] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702830] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702843] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702854] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702876] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.rpc-auth-unix,value:off
[2012-09-19 16:11:39.702888] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702900] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.disable,value:on
[2012-09-19 16:11:39.702910] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702921] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.enable-ino32,value:on
[2012-09-19 16:11:39.702932] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702942] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.export-dir,value:/mnt/data2/share/
[2012-09-19 16:11:39.702956] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702967] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.export-volumes,value:on
[2012-09-19 16:11:39.702978] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.702989] D [glusterd-store.c:2076:glusterd_store_retrieve_volume] 0-: Parsed as Volume-set:key=nfs.trusted-sync,value:on
[2012-09-19 16:11:39.702999] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703008] E [glusterd-store.c:2080:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2012-09-19 16:11:39.703019] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703027] E [glusterd-store.c:2080:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2012-09-19 16:11:39.703040] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.703077] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.703093] D [glusterd-utils.c:727:glusterd_brickinfo_new] 0-: Returning 0
[2012-09-19 16:11:39.703110] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703123] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703133] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703143] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703153] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703163] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703173] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703183] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703193] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703203] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703213] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703229] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703240] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703249] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703259] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703269] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703279] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703289] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703307] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.703316] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.703334] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.703350] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703361] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.703372] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902377] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902429] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902442] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.902470] D [glusterd-utils.c:727:glusterd_brickinfo_new] 0-: Returning 0
[2012-09-19 16:11:39.902485] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902509] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.902518] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.902542] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.902558] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902569] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902579] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902590] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902600] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.902610] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.902629] D [glusterd-store.c:1807:glusterd_store_retrieve_bricks] 0-: Returning with 0
[2012-09-19 16:11:39.903010] D [glusterd-utils.c:1567:glusterd_volume_compute_cksum] 0-management: Returning with 0
[2012-09-19 16:11:39.903026] D [glusterd-store.c:2146:glusterd_store_retrieve_volume] 0-: Returning with 0
[2012-09-19 16:11:39.903037] D [glusterd-utils.c:941:glusterd_volinfo_find] 0-: Volume share found
[2012-09-19 16:11:39.903046] D [glusterd-utils.c:949:glusterd_volinfo_find] 0-: Returning 0
[2012-09-19 16:11:39.903062] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.903071] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.903088] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.903106] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.903118] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.903134] D [glusterd-store.c:1893:glusterd_store_retrieve_rbstate] 0-: Returning with 0
[2012-09-19 16:11:39.903146] D [glusterd-utils.c:941:glusterd_volinfo_find] 0-: Volume share found
[2012-09-19 16:11:39.903158] D [glusterd-utils.c:949:glusterd_volinfo_find] 0-: Returning 0
[2012-09-19 16:11:39.903175] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.903184] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.903199] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.903215] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with 0
[2012-09-19 16:11:39.903225] D [glusterd-store.c:1955:glusterd_store_retrieve_node_state] 0-: Returning with 0
[2012-09-19 16:11:39.903239] D [glusterd-store.c:2215:glusterd_store_retrieve_volumes] 0-: Returning with 0
[2012-09-19 16:11:39.903267] D [glusterd-store.c:1307:glusterd_store_handle_new] 0-: Returning 0
[2012-09-19 16:11:39.903277] D [glusterd-store.c:1325:glusterd_store_handle_retrieve] 0-: Returning 0
[2012-09-19 16:11:39.903292] D [glusterd-store.c:1498:glusterd_store_iter_new] 0-: Returning with 0
[2012-09-19 16:11:39.903304] D [glusterd-store.c:1616:glusterd_store_iter_get_next] 0-: Returning with -1
[2012-09-19 16:11:39.903314] D [glusterd-store.c:2563:glusterd_store_retrieve_peers] 0-: Returning with -1
[2012-09-19 16:11:39.903323] D [glusterd-store.c:2620:glusterd_restore] 0-: Returning -1
[2012-09-19 16:11:39.903336] E [xlator.c:385:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2012-09-19 16:11:39.903347] E [graph.c:294:glusterfs_graph_init] 0-management: initializing translator failed
[2012-09-19 16:11:39.903356] E [graph.c:483:glusterfs_graph_activate] 0-graph: init failed
[2012-09-19 16:11:39.903547] W [glusterfsd.c:831:cleanup_and_exit] (-->glusterd(main+0x574) [0x4073b4] (-->glusterd(glusterfs_volumes_init+0x145) [0x405b65] (-->glusterd(glusterfs_process_volfp+0x198) [0x405a18]))) 0-: received signum (0), shutting down
[2012-09-19 16:11:39.903573] D [glusterfsd-mgmt.c:2154:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given


On the first node I have:

[2012-09-19 15:14:24.194781] W [socket.c:195:__socket_rwv] 0-management: readv failed (Connection reset by peer)
[2012-09-19 15:14:24.194840] W [socket.c:1512:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Connection reset by peer), peer (192.168.60.62:24007)
[2012-09-19 15:14:24.195108] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x3f4cc0f7e8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x3f4cc0f4a0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x3f4cc0ef0e]))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2012-09-19 15:14:23.814379 (xid=0x1x)
[2012-09-19 15:14:24.195145] E [glusterd-handshake.c:430:glusterd_peer_dump_version_cbk] 0-: error through RPC layer, retry again later
[2012-09-19 15:14:24.814197] I [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-09-19 15:14:26.819528] E [socket.c:1715:socket_connect_finish] 0-management: connection to 192.168.60.62:24007 failed (Connection refused)
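
The debug output from the second node is long, but only a handful of lines are not successful returns. The following is a minimal sketch, not a GlusterFS tool: the function name and the regular expression are assumptions based only on the log format visible in this report. It keeps error (E) and critical (C) lines plus any line that reports a negative return value, which in the log above would surface glusterd_store_retrieve_peers and glusterd_restore just before the 'management' volume fails to initialize.

```python
import re

# Assumed layout, inferred from the log lines in this report:
#   [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\]\s+(?P<rest>.*)$"
)

# Matches "Returning -1" and "Returning with -1" at the end of a message.
FAILED_RETURN = re.compile(r"Returning(?: with)? -\d+$")


def suspicious_lines(lines):
    """Yield (level, function, message) for lines that indicate a failure."""
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue  # blank lines or fragments of wrapped lines
        level = match.group("level")
        rest = match.group("rest")
        if level in ("E", "C") or FAILED_RETURN.search(rest):
            yield level, match.group("func"), rest
```

It could be fed the saved output of glusterd --debug, for example by iterating over an open log file and printing each tuple. Note that some negative returns are normal end-of-file markers from the store iterator, so the output is a starting point for reading the log, not a diagnosis.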






Version-Release number of selected component (if applicable):
gluster 3.3

How reproducible:
I do not know how to reproduce it on another machine.

Comment 1 daniel de baerdemaeker 2012-09-19 10:52:34 EDT
-bash-4.1# cat /etc/glusterfs/glusterd.vol 
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
end-volume
-bash-4.1#
Comment 2 Amar Tumballi 2012-12-21 02:56:34 EST
Can you please check whether it's fixed in version 3.3.1 or 3.4.0qa* (qa6 as of now)? We are not able to reproduce this in house.
Comment 3 Gareth 2013-02-02 16:22:38 EST
I have managed to reproduce this on 3.3.1
Two bricks in replication. An external program filled up / (including /var).
Upon reboot that system could no longer start glusterd, even after space was made.

To rectify this (I cannot be sure why it worked), I modified
/var/lib/glusterd/vols/mythfe3brick/bricks/192.168.1.31\:-glusterfs-brick1
on 192.168.1.31, which was the system whose disk had filled up.
I changed the line from
listen-port=0
to
listen-port=24009
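
The brick file edited above appears to be plain key=value text. As a minimal sketch, assuming that format and a /var/lib/glusterd/vols/<volume>/bricks/ layout as in the path above (the helper names are hypothetical and not part of GlusterFS), the following lists brick files whose listen-port is recorded as 0 so they can be inspected by hand. This report does not establish that a value of 0 is invalid, so the script only reports and changes nothing.

```python
import os


def parse_store_file(path):
    """Parse a key=value file into a dict; lines without '=' are ignored."""
    values = {}
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            key, sep, value = line.strip().partition("=")
            if sep:
                values[key] = value
    return values


def bricks_with_port_zero(vols_dir="/var/lib/glusterd/vols"):
    """Return paths of brick files whose listen-port is recorded as 0."""
    hits = []
    if not os.path.isdir(vols_dir):
        return hits
    for volume in sorted(os.listdir(vols_dir)):
        bricks_dir = os.path.join(vols_dir, volume, "bricks")
        if not os.path.isdir(bricks_dir):
            continue
        for name in sorted(os.listdir(bricks_dir)):
            path = os.path.join(bricks_dir, name)
            if not os.path.isfile(path):
                continue
            if parse_store_file(path).get("listen-port") == "0":
                hits.append(path)
    return hits
```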
Comment 4 Hakan Ardo 2013-03-11 05:21:51 EDT
I had a similar problem. In my case /var/lib/glusterd/peers/xxx had become empty after the reboot. I resolved it by running "rm /var/lib/glusterd/peers/xxx". Then I could start glusterfsd and re-add the peer using "gluster peer probe".
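
An empty peer file like the one described here can be looked for before starting glusterd. The following is a minimal sketch, assuming /var/lib/glusterd as the working directory (as in the original log); the function name is hypothetical. It only lists zero-length files and deletes nothing, because some empty files under that directory may be legitimate.

```python
import os


def find_empty_files(root="/var/lib/glusterd"):
    """Return sorted paths of zero-length regular files below root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Calling find_empty_files("/var/lib/glusterd/peers") restricts the check to the peers directory mentioned in this comment.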
Comment 5 Gianluca Cecchi 2013-11-22 05:38:44 EST
Hello,
I have the same problem: / became full on one node, which runs Fedora 19 with gluster 3.4.1.
Almost all paths are under / (apart from /tmp).
I removed the /var/lib/glusterd/peers/xxx file and rebooted the server.
Now I no longer get the error

0-management: Initialization of volume 'management' failed, review your volfile again

but glusterd still doesn't start and I get:
[2013-11-22 09:35:23.843703] W [rpc-transport.c:175:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2013-11-22 09:35:23.847153] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-11-22 09:35:23.847213] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-11-22 09:35:23.860648] I [cli-cmd-volume.c:1275:cli_check_gsync_present] 0-: geo-replication not installed
[2013-11-22 09:35:23.861496] E [socket.c:2157:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)

# systemctl status glusterd
glusterd.service - GlusterFS an clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: failed (Result: exit-code) since Fri 2013-11-22 10:40:23 CET; 51min ago
  Process: 1042 ExecStart=/usr/sbin/glusterd -p /run/glusterd.pid (code=exited, status=1/FAILURE)

Nov 22 10:40:22 f18ovn01 systemd[1]: Starting GlusterFS an clustered file-system server...
Nov 22 10:40:23 f18ovn01 systemd[1]: glusterd.service: control process exited, code=exited status=1
Nov 22 10:40:23 f18ovn01 systemd[1]: Failed to start GlusterFS an clustered file-system server.
Nov 22 10:40:23 f18ovn01 systemd[1]: Unit glusterd.service entered failed state.

Are there any other files I should check to resolve this further error?

The firewall seems OK:
# iptables -L -n | grep 24007
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:24007

and the port is not in use:
[root@f18ovn01 glusterfs]# netstat -an|grep 24007
[root@f18ovn01 glusterfs]# 

Comparing with the other host, it seems that glusterd itself should be listening on the port for which I get the "Connection refused" error?

[root@f18ovn03 glusterfs]# ps -ef|grep glusterd.pid
root      1043     1  0 Nov21 ?        00:04:03 /usr/sbin/glusterd -p /run/glusterd.pid

[root@f18ovn03 glusterfs]# lsof -Pp 1043 | grep 24007
glusterd 1043 root    7u     IPv4              21822      0t0    TCP f18ovn03:24007->f18ovn03:1022 (ESTABLISHED)
glusterd 1043 root    9u     IPv4              13859      0t0    TCP *:24007 (LISTEN)
glusterd 1043 root   10u     IPv4              13888      0t0    TCP f18ovn03:24007->f18ovn03:1021 (ESTABLISHED)
glusterd 1043 root   12u     IPv4              13891      0t0    TCP localhost:24007->localhost:1020 (ESTABLISHED)
glusterd 1043 root   13u     IPv4              13893      0t0    TCP localhost:24007->localhost:1019 (ESTABLISHED)
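
The lsof output from the working host shows glusterd itself listening on 24007, which is consistent with the "Connection refused" on the failing host being a symptom of glusterd not running. A minimal sketch of the same check without netstat or lsof, using only the Python standard library (the function name is hypothetical; the port 24007 is the one shown in this report):

```python
import socket


def port_is_listening(host="127.0.0.1", port=24007, timeout=2.0):
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # for example ECONNREFUSED when nothing listens on the port.
        return sock.connect_ex((host, port)) == 0
```

A False result on the local host only confirms that nothing accepts connections on that port; the reason glusterd exited still has to come from its own log.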
Comment 8 Anatoly Belikov 2015-05-18 07:42:10 EDT
Any update on this bug? We have run into the same problem:

[2015-05-18 10:38:43.843495] W [rdma.c:4197:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2015-05-18 10:38:43.843513] E [rdma.c:4485:init] 0-rdma.management: Failed to initialize IB Device
[2015-05-18 10:38:43.843522] E [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2015-05-18 10:38:43.843562] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-05-18 10:38:45.142777] E [run.c:190:runner_log] 0-glusterd: command failed: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd -c /etc/glusterd/geo-replication/gsyncd.conf --config-set-rx gluster-params xlator-option=*-dht.assert-no-child-down=true .
[2015-05-18 10:38:45.142845] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2015-05-18 10:38:45.142857] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2015-05-18 10:38:45.142866] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
Comment 9 Niels de Vos 2015-05-19 08:21:04 EDT
(In reply to Anatoly Belikov from comment #8)
> Any update on this bug? We have run into the same problem:
...
> [2015-05-18 10:38:45.142777] E [run.c:190:runner_log] 0-glusterd: command
> failed: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd -c
> /etc/glusterd/geo-replication/gsyncd.conf --config-set-rx gluster-params
> xlator-option=*-dht.assert-no-child-down=true .
...

This is not the same problem, but it has the same effect. Please file a new bug report against the geo-replication component. This could be a packaging issue, or something else that caused the "gsync" command to fail.


---

The original problem reported in this bug is caused by glusterd being unable to read some of its configuration files. This can (or could?) happen when /var/lib is full or out of inodes. In that case, cleaning up and manually restoring the configuration under /var/lib/glusterd is needed. KP or some of the other GlusterD developers can chime in with more details, and maybe a link to the documentation or email that describes how to restore the configuration.
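
Both conditions named above, a full filesystem and exhausted inodes, can be checked with a few lines of Python. This is a minimal sketch using os.statvfs from the standard library; the function names and the thresholds are illustrative assumptions, not values taken from GlusterFS.

```python
import os


def space_report(path="/var/lib"):
    """Return free/total bytes and inodes for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "free_bytes": st.f_bavail * st.f_frsize,
        "total_bytes": st.f_blocks * st.f_frsize,
        "free_inodes": st.f_favail,
        "total_inodes": st.f_files,
    }


def is_exhausted(report, min_free_bytes=1 << 20, min_free_inodes=100):
    """Return True if the filesystem is nearly out of space or inodes."""
    out_of_space = report["free_bytes"] < min_free_bytes
    # Some filesystems report 0 total inodes; skip the inode check there.
    out_of_inodes = (
        report["total_inodes"] > 0
        and report["free_inodes"] < min_free_inodes
    )
    return out_of_space or out_of_inodes
```

Running is_exhausted(space_report("/var/lib")) before restarting glusterd gives a quick indication of whether the disk is still the problem. It does not detect configuration files that were already truncated while the disk was full.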
Comment 10 Kaleb KEITHLEY 2015-10-22 11:46:38 EDT
Because of the large number of bugs filed against it, the mainline version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.