Bug 1415458 - Brick process does not start after node reboot
Summary: Brick process does not start after node reboot
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-22 08:00 UTC by likunbyl
Modified: 2017-11-07 10:36 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-11-07 10:36:18 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
gluster volume info (5.24 KB, text/plain)
2017-01-22 08:00 UTC, likunbyl

Description likunbyl 2017-01-22 08:00:20 UTC
Created attachment 1243291 [details]
gluster volume info

Description of problem:

Brick processes do not start automatically after a node reboot.

A GlusterFS server was rebooted, and the GlusterFS service did not come back up successfully.

Out of 6 bricks in total, 3 came up and 3 did not.

Here are the logs of a brick that failed:

[2016-12-29 03:46:50.240032] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.8.5 (args: /usr/sbin/glusterfsd -s 10.32.3.23 --volfile-id gvol0.10.32.3.23.mnt-brick1-vol -p /var/lib/glusterd/vols/gvol0/run/10.32.3.23-mnt-brick1-vol.pid -S /var/run/gluster/08da045f3e66eefc50c0ff9a035c6794.socket --brick-name /mnt/brick1/vol -l /var/log/glusterfs/bricks/mnt-brick1-vol.log --xlator-option *-posix.glusterd-uuid=58c3b462-a4b6-4655-b2ac-d0502e278e03 --brick-port 49152 --xlator-option gvol0-server.listen-port=49152)
[2016-12-29 03:46:50.258772] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-12-29 03:47:08.153575] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-gvol0-server: adding option 'listen-port' for volume 'gvol0-server' with value '49152'
[2016-12-29 03:47:08.153613] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-gvol0-posix: adding option 'glusterd-uuid' for volume 'gvol0-posix' with value '58c3b462-a4b6-4655-b2ac-d0502e278e03'
[2016-12-29 03:47:08.153777] I [MSGID: 115034] [server.c:398:_check_for_auth_option] 0-gvol0-decompounder: skip format check for non-addr auth option auth.login./mnt/brick1/vol.allow
[2016-12-29 03:47:08.153785] I [MSGID: 115034] [server.c:398:_check_for_auth_option] 0-gvol0-decompounder: skip format check for non-addr auth option auth.login.94bedfd1-619d-402a-9826-67dab7600f43.password
[2016-12-29 03:47:08.153895] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-12-29 03:47:08.159776] I [rpcsvc.c:2214:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2016-12-29 03:47:08.159829] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-gvol0-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2016-12-29 03:47:08.159893] E [socket.c:793:__socket_server_bind] 0-tcp.gvol0-server: binding to  failed: Address already in use
[2016-12-29 03:47:08.159899] E [socket.c:796:__socket_server_bind] 0-tcp.gvol0-server: Port is already in use
[2016-12-29 03:47:08.159907] W [rpcsvc.c:1645:rpcsvc_create_listener] 0-rpc-service: listening on transport failed
[2016-12-29 03:47:08.159913] W [MSGID: 115045] [server.c:1061:init] 0-gvol0-server: creation of listener failed
[2016-12-29 03:47:08.159919] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-gvol0-server: Initialization of volume 'gvol0-server' failed, review your volfile again
[2016-12-29 03:47:08.159924] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-gvol0-server: initializing translator failed
[2016-12-29 03:47:08.159929] E [MSGID: 101176] [graph.c:673:glusterfs_graph_activate] 0-graph: init failed
[2016-12-29 03:47:08.160764] W [glusterfsd.c:1327:cleanup_and_exit] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x3c1) [0x55ead22bee51] -->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x172) [0x55ead22b95d2] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55ead22b8b4b] ) 0-: received signum (98), shutting down

I have encountered this situation several times; sometimes multiple restarts are needed to bring all bricks back. Before restarting, I killed all the daemons and used netstat to verify that the ports beginning with 4915 were closed, as well as 24007, but it did not help.
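The failure in the brick log above is a plain EADDRINUSE at bind time: something is still holding port 49152 when the new glusterfsd tries to listen on it (on Linux, errno 98 is EADDRINUSE, which matches the "received signum (98)" in the shutdown line, since the errno is passed through to cleanup_and_exit). A minimal sketch of the same failure mode, independent of Gluster:

```python
import errno
import socket

# Simulate the stale brick process: a listener that still holds a port.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))          # let the kernel pick a free port
first.listen()
port = first.getsockname()[1]

# Simulate the restarted brick trying to bind the same port.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))  # port is still held -> fails
except OSError as e:
    # Same error the brick log shows: "Address already in use"
    print("bind failed:", errno.errorcode[e.errno])
finally:
    second.close()
    first.close()
```

This is why checking with netstat/ss for lingering listeners on 24007 and the 4915x range before restarting is the right instinct: until the old socket is fully released, the replacement brick process cannot come up on the same port.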

Setup:

OS: CoreOS 1185.5.0
Kubernetes: v1.5.1
Image: official gluster-centos:gluster3u8_centos7
Gluster: 3.8.5

A 20-node cluster running a 132-brick distributed-replicated (44x3) volume.

Version-Release number of selected component (if applicable):
3.8.5

How reproducible:
Not always

Steps to Reproduce:
1. Reboot a Gluster node
2. Check the status of the gluster volume

Actual results:
Some brick processes are not running.

Expected results:
All brick processes should come up after a node reboot.

Additional info:
Will attach the volume info.

Comment 1 likunbyl 2017-05-12 09:23:28 UTC
Hello everyone, has this bug been resolved?

Comment 2 Niels de Vos 2017-11-07 10:36:18 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

