Description of problem:
=======================
Adding an identical brick from a peer node fails if a brick with the same path, already part of the volume on another peer node, is down because its underlying filesystem has crashed.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-4

How reproducible:
=================
Always.

Steps to Reproduce:
===================
1. Create a simple distributed volume using a one-node (node-1) cluster.
2. Crash the underlying filesystem of brick0 (e.g. node1_ip:/bricks/brick0).
3. Probe a new node (node-2) from node-1.
4. Try to add the identical brick on node-2 (node2_ip:/bricks/brick0); it fails.
(A command-line sketch of these steps is given under Additional info below.)

Actual results:
===============
Adding an identical brick (with a different IP/hostname) from a peer node fails.

Expected results:
=================
Adding an identical brick from a peer node should work.

Additional info:
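For reference, a minimal command-line sketch of the reproduction steps, assuming illustrative hostnames (node-1, node-2), an XFS-backed brick mounted at /bricks/brick0 with brick directory j0, and xfs_io as one possible way to simulate the underlying filesystem crash; the actual commands, paths, and crash method used in the original run may differ:

On node-1:
# gluster volume create Dis node-1:/bricks/brick0/j0
# gluster volume start Dis
# xfs_io -x -c shutdown /bricks/brick0            <-- simulates the brick filesystem crash
# gluster peer probe node-2
# gluster volume add-brick Dis node-2:/bricks/brick0/j0

On the affected build, the last add-brick fails during pre-validation (see the log below).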
I will provide the logs
glusterd log from node where add-brick failed:
==============================================
[2016-05-12 06:01:49.703293] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Dis
[2016-05-12 06:01:50.800424] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/c1eec530a1c811faf8e3d20e6c09c320.socket failed (Invalid argument)
[2016-05-12 06:02:46.167785] I [MSGID: 106482] [glusterd-brick-ops.c:443:__glusterd_handle_add_brick] 0-management: Received add brick req
[2016-05-12 06:02:46.170433] C [MSGID: 106425] [glusterd-utils.c:1125:glusterd_brickinfo_new_from_brick] 0-management: realpath () failed for brick /bricks/brick0/j0. The underlying filesystem may be in bad state [Input/output error]
[2016-05-12 06:02:46.170912] W [MSGID: 106050] [glusterd-store.c:176:glusterd_store_is_valid_brickpath] 0-management: Failed to create brick info for brick 10.70.43.151:/bricks/brick0/j0
[2016-05-12 06:02:46.170927] E [MSGID: 106257] [glusterd-brick-ops.c:1703:glusterd_op_stage_add_brick] 0-management: brick path 10.70.43.151:/bricks/brick0/j0 is too long
[2016-05-12 06:02:46.170940] W [MSGID: 106122] [glusterd-mgmt.c:188:gd_mgmt_v3_pre_validate_fn] 0-management: ADD-brick prevalidation failed.
[2016-05-12 06:02:46.170950] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Add brick on local node
[2016-05-12 06:02:46.170958] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed
The message "I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/h0 has disconnected from glusterd." repeated 39 times between [2016-05-12 06:01:29.797544] and [2016-05-12 06:03:26.815524]
[2016-05-12 06:03:29.815965] I [MSGID: 106005] [glusterd-handler.c:5034:__glusterd_brick_rpc_notify] 0-management: Brick 10.70.42.77:/bricks/brick0/h0 has disconnected from glusterd.
[2016-05-12 06:03:56.819667] W [socket.c:701:__socket_rwv] 0-management: readv on /var/run/gluster/c1eec530a1c811faf8e3d20e6c09c320.socket failed (Invalid argument)
RCA: While creating a new brickinfo object we issue a realpath() call irrespective of whether the brick belongs to the local node. We are still safe here because ENOENT is masked. But in this case, since the path of the new brick matches that of the old one (only the hostname differs) and the underlying filesystem is bad, realpath() fails with an errno other than ENOENT and hence causes add-brick to fail.
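This matches what is seen from the shell on the node whose brick filesystem crashed (node-1 in the sketch above): resolving the brick path fails with EIO rather than ENOENT. The path is illustrative and the output approximate:

# realpath /bricks/brick0/j0
realpath: /bricks/brick0/j0: Input/output error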
The fix for BZ 1335357 will take care of this issue as well, hence moving the state to POST.
Downstream patch:
https://code.engineering.redhat.com/gerrit/#/c/74663/

Upstream patches:
mainline    : http://review.gluster.org/#/c/14306
release-3.7 : http://review.gluster.org/#/c/14410
release-3.8 : http://review.gluster.org/#/c/14411
Verified this bug using the build glusterfs-3.7.9-6 and found that the fix works as expected. Steps done: repeated the reproduction steps mentioned in the Description section; the add-brick now succeeds (see the sketch below). Moving to VERIFIED.
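For completeness, a sketch of the verification flow on the fixed build, reusing the illustrative hostnames and paths from the reproduction sketch in the description (exact CLI output not reproduced here):

# gluster volume add-brick Dis node-2:/bricks/brick0/j0     <-- now succeeds even with node-1's /bricks/brick0 down
# gluster volume info Dis                                   <-- the new brick from node-2 is listed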
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240