Description of problem:
After node reboot, glusterd didn't come up. Error:
"realpath () failed for brick /run/gluster/snaps/130949baac8843cda443cf8a6441157f/brick3/b3. The underlying file system may be in bad state [No such file or directory]"

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1.el7rhgs.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a 2 x 2 distributed-replicate volume
2. Enable uss
3. Create a snapshot and activate it
4. Reboot one of the nodes

Actual results:
glusterd is down after the node reboot

Expected results:
glusterd should come up after the node reboot

Additional info:

[root@dhcp46-4 ~]# gluster v info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 60769503-f742-458d-97c0-8e090147f82a
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.4:/rhs/brick1/b1
Brick2: 10.70.47.46:/rhs/brick2/b2
Brick3: 10.70.46.213:/rhs/brick3/b3
Brick4: 10.70.46.148:/rhs/brick4/b4
Options Reconfigured:
performance.readdir-ahead: on
features.uss: enable
features.barrier: disable
snap-activate-on-create: enable

================================
glusterd logs from the node which was rebooted:

[2016-03-31 12:03:00.551394] C [MSGID: 106425] [glusterd-store.c:2425:glusterd_store_retrieve_bricks] 0-management: realpath () failed for brick /run/gluster/snaps/130949baac8843cda443cf8a6441157f/brick3/b3. The underlying file system may be in bad state [No such file or directory]
[2016-03-31 12:19:44.102994] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-03-31 12:19:44.106631] W [socket.c:870:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 15, Invalid argument
[2016-03-31 12:19:44.106676] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
The message "I [MSGID: 106498] [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0" repeated 2 times between [2016-03-31 12:19:44.085669] and [2016-03-31 12:19:44.086167]
[2016-03-31 12:19:44.114223] C [MSGID: 106425] [glusterd-store.c:2425:glusterd_store_retrieve_bricks] 0-management: realpath () failed for brick /run/gluster/snaps/130949baac8843cda443cf8a6441157f/brick3/b3. The underlying file system may be in bad state [No such file or directory]
[2016-03-31 12:19:44.114364] E [MSGID: 106201] [glusterd-store.c:3082:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: 130949baac8843cda443cf8a6441157f
[2016-03-31 12:19:44.114387] E [MSGID: 106195] [glusterd-store.c:3475:glusterd_store_retrieve_snap] 0-management: Failed to retrieve snap volumes for snap snap1
[2016-03-31 12:19:44.114399] E [MSGID: 106043] [glusterd-store.c:3629:glusterd_store_retrieve_snaps] 0-management: Unable to restore snapshot: snap1
[2016-03-31 12:19:44.114509] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2016-03-31 12:19:44.114542] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2016-03-31 12:19:44.114554] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-03-31 12:19:44.115626] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7fc632a1b2ad] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x120) [0x7fc632a1b150] -->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7fc632a1a739] ) 0-: received signum (0), shutting down
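For reference, the reproduction steps roughly correspond to the CLI sequence below (host names, brick paths and the snapshot name are taken from the vol info and log excerpt above; the exact create/config commands are an assumption about how the setup was done):

# create and start a 2 x 2 distributed-replicate volume (assumed layout from vol info)
gluster volume create testvol replica 2 \
    10.70.46.4:/rhs/brick1/b1 10.70.47.46:/rhs/brick2/b2 \
    10.70.46.213:/rhs/brick3/b3 10.70.46.148:/rhs/brick4/b4
gluster volume start testvol

# enable USS and activate-on-create (matches the reconfigured options above)
gluster volume set testvol features.uss enable
gluster snapshot config activate-on-create enable

# take a snapshot (activated on create) and reboot one of the nodes
gluster snapshot create snap1 testvol
reboot

What the log excerpt suggests: the snapshot bricks live under /run/gluster/snaps/<snap-volume-id>/..., and /run is a tmpfs, so those mount points do not survive a reboot. During glusterd init on the rebooted node, glusterd_store_retrieve_bricks() then calls realpath() on the stored snap brick path before the snapshot bricks have been remounted, gets "No such file or directory", and initialization of the management volume aborts. A quick illustrative check on the rebooted node:

# /run is a tmpfs, so snapshot brick mounts are gone after reboot
findmnt -n -o FSTYPE /run            # prints: tmpfs

# the stored snap brick path no longer resolves, matching the MSGID 106425 error
realpath /run/gluster/snaps/130949baac8843cda443cf8a6441157f/brick3/b3
# realpath: ...: No such file or directory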
An upstream patch, http://review.gluster.org/#/c/13869/1, which explains the RCA, has been posted for review.
Downstream patch https://code.engineering.redhat.com/gerrit/#/c/71478/ posted for review.
Downstream patch is merged now. Moving the status to Modified.
glusterd is running after node reboot when snapshots of the volume are present. Bug verified on build glusterfs-3.7.9-2.el7rhgs.x86_64.
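For anyone re-checking this, the verification on the rebooted node amounts to something like the following (illustrative commands; snapshot and volume names taken from the description above):

# glusterd must come up on its own after the reboot
systemctl is-active glusterd         # expected: active

# the snapshot volume should be restored from the glusterd store
gluster snapshot list                # expected to list snap1
gluster volume status testvol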
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240