1478319 – glusterd not starting in systemd after reboot


Summary: glusterd not starting in systemd after reboot

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	glusterd
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	bugs@gluster.org
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-08-04 09:53 UTC by Namreg Zepol
Modified:	2018-08-29 03:17 UTC
CC List:	2 users
Fixed In Version:	glusterfs-4.1.3 (or higher)
Clone Of:
Environment:
Last Closed:	2018-08-29 03:17:46 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:


Description Namreg Zepol 2017-08-04 09:53:04 UTC

Description of problem: glusterd does not start after a server reboot.


Version-Release number of selected component (if applicable): latest version of GlusterFS (3.10.4, according to the glusterd log below)


How reproducible: Install a fresh Fedora 26 Server on two servers. Install GlusterFS on both. Create a replicated GlusterFS volume. Enable glusterd.service. Reboot the servers.


Actual results: glusterd.service is stopped after the reboot.


Expected results: glusterd.service should start after reboot.


Additional info: After the reboot, starting glusterd.service manually works.

glusterd logs:

[2017-08-04 08:33:58.466345] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.10.4 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2017-08-04 08:33:58.550516] I [MSGID: 106478] [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors set to 65536
[2017-08-04 08:33:58.550588] I [MSGID: 106479] [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working directory
[2017-08-04 08:33:58.560977] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.10.4/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2017-08-04 08:33:58.561014] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2017-08-04 08:33:58.561019] W [rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2017-08-04 08:33:58.561025] E [MSGID: 106243] [glusterd.c:1720:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2017-08-04 08:33:58.583458] I [MSGID: 106228] [glusterd.c:500:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2017-08-04 08:33:58.585118] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31004
[2017-08-04 08:33:59.180859] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-08-04 08:33:59.180913] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-08-04 08:33:59.180925] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-08-04 08:33:59.183900] E [socket.c:3230:socket_connect] 0-management: connection attempt on  failed, (Network is unreachable)
[2017-08-04 08:33:59.184858] E [MSGID: 106187] [glusterd-store.c:4559:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2017-08-04 08:33:59.184908] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2017-08-04 08:33:59.184920] E [MSGID: 101066] [graph.c:325:glusterfs_graph_init] 0-management: initializing translator failed
[2017-08-04 08:33:59.184925] E [MSGID: 101176] [graph.c:681:glusterfs_graph_activate] 0-graph: init failed
[2017-08-04 08:33:59.192360] W [glusterfsd.c:1332:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xf7) [0x55e55b721287] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x1c7) [0x55e55b721147] -->/usr/sbin/glusterd(cleanup_and_exit+0x5a) [0x55e55b71dafa] ) 0-: received signum (1), shutting down
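In the log above, the connection attempt that fails with "Network is unreachable" is followed by "resolve brick failed in restore", the failed initialization of the 'management' translator, and the shutdown. A small helper like the following can pull the error-level entries out of a longer log so that chain is easier to spot. It is an illustrative sketch, not part of GlusterFS: the function names are hypothetical and the line layout is inferred only from the excerpt in this report.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line layout, inferred from the excerpt in this report:
#   [timestamp] LEVEL [MSGID: n] [file.c:line:function] translator: message
# The MSGID field is optional; some lines (e.g. from socket.c) omit it.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str  # single letter: I (info), W (warning), E (error), ...
    msgid: Optional[int]
    source: str  # "file.c:line:function"
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the layout."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    msgid = int(m["msgid"]) if m["msgid"] else None
    return LogEntry(m["ts"], m["level"], msgid, m["src"], m["msg"])


def errors(lines: List[str]) -> List[LogEntry]:
    """Return the error-level (E) entries in the order they were logged."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e.level == "E"]


# Lines copied from the log in this report.
SAMPLE = """\
[2017-08-04 08:33:59.180925] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-08-04 08:33:59.183900] E [socket.c:3230:socket_connect] 0-management: connection attempt on  failed, (Network is unreachable)
[2017-08-04 08:33:59.184858] E [MSGID: 106187] [glusterd-store.c:4559:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2017-08-04 08:33:59.184908] E [MSGID: 101019] [xlator.c:503:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
"""

if __name__ == "__main__":
    for e in errors(SAMPLE.splitlines()):
        function = e.source.rsplit(":", 1)[-1]
        print(e.timestamp, e.msgid, function, "-", e.message)
```

Lines that do not match the assumed layout (for example the wrapped backtrace of cleanup_and_exit in some logs) are simply skipped rather than guessed at.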


Notes:
If I configure a systemd timer to start glusterd with a 15-second delay, it works. I believe this is a dependency problem. This is the glusterd.service unit:
[Unit]
Description=GlusterFS, a clustered file-system server
Requires=rpcbind.service
After=network.target rpcbind.service
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
Environment="LOG_LEVEL=INFO"
EnvironmentFile=-/etc/sysconfig/glusterd
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid  --log-level $LOG_LEVEL $GLUSTERD_OPTIONS
KillMode=process

[Install]
WantedBy=multi-user.target
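To check what the [Unit] ordering above actually promises, a short Python sketch can read its dependency keys with the standard configparser module. The helper names are hypothetical, and configparser is only an approximation of systemd's unit syntax. Applied to this unit, it shows ordering after network.target and rpcbind.service and before network-online.target, with nothing that waits for the network to be configured.

```python
import configparser
from typing import Dict, List

# [Unit] section of glusterd.service as quoted in this report.
UNIT_TEXT = """\
[Unit]
Description=GlusterFS, a clustered file-system server
Requires=rpcbind.service
After=network.target rpcbind.service
Before=network-online.target
"""

DEPENDENCY_KEYS = ("Requires", "Wants", "After", "Before")


def unit_dependencies(unit_text: str) -> Dict[str, List[str]]:
    """Return the space-separated unit lists of the [Unit] dependency keys.

    configparser only approximates systemd's syntax: systemd lets a key such
    as After= appear several times, while configparser (strict by default)
    raises DuplicateOptionError in that case.
    """
    # interpolation=None: systemd values may contain '%' specifiers.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    section = parser["Unit"]
    return {key: section.get(key, "").split() for key in DEPENDENCY_KEYS}


def waits_for_network_online(unit_text: str) -> bool:
    """True only if the unit is ordered After=network-online.target."""
    return "network-online.target" in unit_dependencies(unit_text)["After"]


if __name__ == "__main__":
    for key, units in unit_dependencies(UNIT_TEXT).items():
        print(f"{key}: {units}")
    print("waits for network-online.target:", waits_for_network_online(UNIT_TEXT))
```

On a live system, systemctl show -p After -p Before glusterd.service reports the ordering systemd actually computed, including anything added by drop-ins, so it remains the authoritative check.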

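I think glusterd.service is started before the network is actually usable. The unit is ordered After=network.target and rpcbind.service, but reaching network.target does not guarantee that network interfaces are configured, and the unit is also ordered Before=network-online.target, so it does not wait for the network to come online.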
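The 15-second timer works, but the delay is a guess. One alternative is a bounded wait: a hypothetical pre-start helper (not part of GlusterFS or systemd) that polls until the kernel has a route to a peer and then exits, which targets the "Network is unreachable" failure seen in the log. This sketch assumes IPv4 peer addresses given as literals (DNS may not work yet) and only checks that a route exists, not that the peer's glusterd is running. Port 24007 is used only because it is glusterd's management port; a UDP connect() sends no traffic.

```python
import errno
import socket
import time
from typing import Callable

GLUSTERD_PORT = 24007  # glusterd management port; no traffic is sent to it


def has_route(host: str, port: int = GLUSTERD_PORT) -> bool:
    """Return True if the kernel currently has an IPv4 route to host.

    connect() on a UDP socket sends no packets; it only selects a route,
    so it fails with ENETUNREACH ("Network is unreachable") until an
    interface with a suitable route is configured.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.connect((host, port))
        except OSError as exc:
            if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
                return False
            raise  # anything else (e.g. a bad address) is a real error
    return True


def wait_for_network(host: str, timeout: float = 60.0, interval: float = 1.0,
                     probe: Callable[[str], bool] = has_route) -> bool:
    """Poll probe(host) until it succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): wait_for_peer.py 192.0.2.10
    sys.exit(0 if wait_for_network(sys.argv[1]) else 1)
```

Such a script could be run from an ExecStartPre= line in a drop-in for glusterd.service; the script name and peer address are placeholders. The timeout should stay below the unit's start timeout (TimeoutStartSec=), which also covers ExecStartPre= commands.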

Comment 1 Atin Mukherjee 2017-08-08 15:43:14 UTC

https://review.gluster.org/17813 fixes this issue.
