Bug 1124744

Summary: glusterd : "heal full" command failed with "Commit failed on <hostname>. Please check log file for details."
Product: [Community] GlusterFS
Reporter: Pranith Kumar K <pkarampu>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED EOL
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 3.5.1
CC: bugs, hgowtham, joe, kaushal, nlevinki, ravishankar, spandura, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1121015
Environment:
Last Closed: 2016-06-17 15:58:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1121015
Bug Blocks:

Description Pranith Kumar K 2014-07-30 08:29:23 UTC
+++ This bug was initially created as a clone of Bug #1121015 +++

Description of problem:
==========================
After a successful "replace-brick commit force" on a brick in a 2 x 2 distribute-replicate volume, executed "gluster volume heal <volume_name> full".

The command failed with "Commit failed on rhs-client12. Please check log file for details." 

replace-brick commit force with the source and destination brick paths being the same had failed (refer to bug https://bugzilla.redhat.com/show_bug.cgi?id=1120991). Hence the destination brick path was changed, after which replace-brick commit force passed. Later executed "gluster volume heal <volume_name> full".

NOTE: ping-timeout is set to 0 on all the nodes. Time on all the nodes is in sync.

Version-Release number of selected component (if applicable):
===============================================================
glusterfs 3.6.0.24 built on Jul  3 2014 11:03:38

How reproducible:
================
Executed the same command after the failure; the command execution failed once again.

Steps to Reproduce:
=======================
1. Create a 2 x 2 distribute-replicate volume. Start the volume. 

2. Create a fuse mount. Create files/directories from fuse mount. 

3. Simulate disk replacement on brick1 (node1), i.e. kill the brick process and run rm -rf <brick_path>.

4. Tried to execute the replace-brick start, status and commit commands with the same src and dst path. The commands were not successful.

5. Tried to replace the brick using commit force with the same src and dst path multiple times. This failed (refer to bug: https://bugzilla.redhat.com/show_bug.cgi?id=1120991).

6. Executed replace-brick commit force with different src and dst brick paths. The command execution was successful.

7. To trigger heal to the replaced brick, executed "gluster v heal <volume_name> full".

Actual results:
===================
root@mia [Jul-18-2014-12:07:43] >gluster v heal vol1 full
Commit failed on rhs-client12. Please check log file for details.

Expected results:
====================
triggering heal shouldn't fail. 

Additional info:
===================
root@mia [Jul-18-2014-11:33:05] >gluster v info
 
Volume Name: vol1
Type: Distributed-Replicate
Volume ID: 5cc2e193-af63-45b5-834f-9bd757cf4e84
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device0/b1_replaced
Brick2: rhs-client12:/rhs/device0/b2
Brick3: rhs-client13:/rhs/device0/b3
Brick4: rhs-client14:/rhs/device0/b4
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
root@mia [Jul-18-2014-11:34:18] >
root@mia [Jul-18-2014-11:34:20] >
root@mia [Jul-18-2014-11:34:20] >
root@mia [Jul-18-2014-11:34:20] >gluster v replace-brick vol1 rhs-client11:/rhs/device0/b1_replaced rhs-client11:/rhs/device0/b1 commit force
volume replace-brick: success: replace-brick commit successful
root@mia [Jul-18-2014-11:34:48] >gluster v status vol1
Status of volume: vol1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/b1			49154	Y	11178
Brick rhs-client12:/rhs/device0/b2			49152	Y	14924
Brick rhs-client13:/rhs/device0/b3			49152	Y	22192
Brick rhs-client14:/rhs/device0/b4			49152	Y	28035
NFS Server on localhost					2049	Y	6761
Self-heal Daemon on localhost				N/A	Y	6766
NFS Server on rhs-client14				2049	Y	10569
Self-heal Daemon on rhs-client14			N/A	Y	10574
NFS Server on rhs-client13				2049	Y	4738
Self-heal Daemon on rhs-client13			N/A	Y	4743
NFS Server on rhs-client11				2049	Y	11187
Self-heal Daemon on rhs-client11			N/A	Y	11192
NFS Server on rhs-client12				2049	Y	29698
Self-heal Daemon on rhs-client12			N/A	Y	29703
 
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
 
root@mia [Jul-18-2014-11:34:56] >gluster v info vol1
 
Volume Name: vol1
Type: Distributed-Replicate
Volume ID: 5cc2e193-af63-45b5-834f-9bd757cf4e84
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device0/b1
Brick2: rhs-client12:/rhs/device0/b2
Brick3: rhs-client13:/rhs/device0/b3
Brick4: rhs-client14:/rhs/device0/b4
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

root@mia [Jul-18-2014-11:38:44] >
root@mia [Jul-18-2014-11:38:44] >gluster v heal vol1 full
Commit failed on rhs-client12. Please check log file for details.

root@mia [Jul-18-2014-11:38:56] >
root@mia [Jul-18-2014-11:39:27] >gluster v status vol1 nfs
Status of volume: vol1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
NFS Server on localhost					2049	Y	6761
NFS Server on rhs-client14				2049	Y	10569
NFS Server on rhs-client13				2049	Y	4738
NFS Server on rhs-client12				2049	Y	29698
NFS Server on rhs-client11				2049	Y	11187
 
root@mia [Jul-18-2014-11:39:31] >less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log 
root@mia [Jul-18-2014-11:43:42] >
root@mia [Jul-18-2014-11:43:42] >gluster peer status
Number of Peers: 4

Hostname: rhs-client12
Uuid: b43a950a-d945-47b4-8a6e-8f1c5063c9f5
State: Peer in Cluster (Connected)

Hostname: rhs-client14
Uuid: b71c223c-ab9c-4ecf-b935-128ec1f8b2f5
State: Peer in Cluster (Connected)

Hostname: rhs-client11
Uuid: 809e3daf-d43b-49e9-91e4-6eadff2875db
State: Peer in Cluster (Connected)

Hostname: rhs-client13
Uuid: ad6f2447-ed9d-4f78-ad50-503d6aec2a88
State: Peer in Cluster (Connected)


root@mia [Jul-18-2014-12:07:43] >gluster v heal vol1 full
Commit failed on rhs-client12. Please check log file for details.
root@mia [Jul-18-2014-12:27:09] >

--- Additional comment from  on 2014-07-18 03:33:49 EDT ---

SOS Reports : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1121015/

--- Additional comment from RHEL Product and Program Management on 2014-07-18 03:43:14 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Kaushal on 2014-07-21 06:20:33 EDT ---

The problem is that the self-heal daemons are trying to connect to the old brick's port.

The brick client11:b1_replaced was replaced with client11:b1. b1_replaced had the port 49152, and b1 got a new port 49154 (which can be observed in the volume status output).

Looking at the self-heal daemon logs, we know that the self-heal daemons were notified of the graph change and did change their graphs. But the client xlator, client-0, was attempting to connect to client11:49152 (the old brick) instead of client11:49154 (the new brick). This meant that only 1 afr child was up. As heal requires 2 or more children to be up, the heal was failing.
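
For illustration, a minimal C sketch (hypothetical names, not the actual afr code) of the kind of check that makes a full heal bail out when fewer than two children of a replica set are reachable:
----
#include <stdbool.h>

/* Hypothetical sketch: a replica set can only be healed when at least
 * two of its children are connected, since a heal needs both a source
 * brick to read from and a sink brick to write to.  With the shd still
 * dialing the stale port 49152, only one child was up here, so a check
 * like this fails and the "heal full" commit fails on that node. */
static bool
can_start_full_heal (const bool *child_up, int child_count)
{
        int up = 0;

        for (int i = 0; i < child_count; i++)
                if (child_up[i])
                        up++;

        return (up >= 2);
}
----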

--- Additional comment from Ravishankar N on 2014-07-21 07:26:14 EDT ---

On attaching gdb to the self-heal daemon of rhs-client12, it was observed that the shd was querying glusterd for the port number of "/rhs/device0/b1", which is the currently alive brick process, but the answer it got was 49152 instead of 49154.
--------------------------------
(gdb) b client_query_portmap
Breakpoint 2 at 0x7ff9da3314d0: file client-handshake.c, line 1528.
(gdb) c
Continuing.

Breakpoint 2, client_query_portmap (this=0xe6c4c0, rpc=0xf477f0) at client-handshake.c:1528
(gdb) n
(gdb) p remote_subvol
$2 = 0xe66c10 "/rhs/device0/b1"
(gdb) c
Continuing.

Breakpoint 1, client_query_portmap_cbk (req=0x7ff9d8f94298, iov=0x7ff9d8f942d8, count=1, myframe=0x7ff9e361ea90) at client-handshake.c:1463
(gdb) n
(gdb) p rsp.port
$3 = 49152
----------------
What needs to be figured out is why glusterd sends a stale port number for the portmap query.

--- Additional comment from Kaushal on 2014-07-22 02:11:17 EDT ---

Checked glusterd on rhs-client11. Turns out the pmap signout hadn't happened for the old brick processes, which had led to the pmap registry containing stale entries. Because of the way the pmap registry is searched, glusterd is returning the stale, old port entry for the brick b1.

---- gdb output
(gdb) p $1.pmap
$2 = (struct pmap_registry *) 0x7f4168358010
(gdb) p *$2
$3 = {base_port = 49152, last_alloc = 49154, ports = {{type = GF_PMAP_PORT_FREE, brickname = 0x0, xprt = 0x0} <repeats 22 times>, {type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FREE, brickname = 0x0,
      xprt = 0x0}, {type = GF_PMAP_PORT_FREE, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FREE, brickname = 0x0, xprt = 0x0} <repeats 85 times>, {
      type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FREE, brickname = 0x0, xprt = 0x0} <repeats 908 times>, {type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FOREIGN,
      brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FREE, brickname = 0x0, xprt = 0x0} <repeats 4642 times>, {
      type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FREE, brickname = 0x0, xprt = 0x0} <repeats 18340 times>, {type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FREE,
      brickname = 0x0, xprt = 0x0} <repeats 20559 times>, {type = GF_PMAP_PORT_FOREIGN, brickname = 0x0, xprt = 0x0}, {type = GF_PMAP_PORT_FREE, brickname = 0x0, xprt = 0x0} <repeats 4584 times>, {type = GF_PMAP_PORT_BRICKSERVER,
      brickname = 0x1fe4880 "/rhs/device0/b1", xprt = 0x1fd2e80}, {type = GF_PMAP_PORT_BRICKSERVER, brickname = 0x1fdc3c0 "/rhs/device0/b1_replaced", xprt = 0x1fd4400}, {type = GF_PMAP_PORT_BRICKSERVER,
      brickname = 0x1fceae0 "/rhs/device0/b1", xprt = 0x1fd4400}, {type = GF_PMAP_PORT_FREE, brickname = 0x0, xprt = 0x0} <repeats 16381 times>}}
(gdb)
----

As can be seen in the gdb output above, in the pmap.ports table there are 2 entries for '/rhs/device0/b1'. The sequence of replace-brick commands, b1 -> b1_replaced -> b1, and the apparent failure of pmap_signout to happen have led to this situation.
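
To make the "way the registry is searched" concrete, here is a simplified sketch (not the actual glusterd source; names and the linear scan are illustrative) of a portmap lookup that walks the port table from the lowest allocated port upwards and returns the first entry whose brickname matches. With two brickserver entries for "/rhs/device0/b1" present, such a scan answers the query with the stale port 49152 before ever reaching 49154:
----
#include <string.h>

#define PMAP_PORT_MAX 65535

enum pmap_port_type { PORT_FREE, PORT_BRICKSERVER, PORT_FOREIGN };

struct pmap_port {
        enum pmap_port_type type;
        const char         *brickname;   /* e.g. "/rhs/device0/b1" */
};

struct pmap_registry {
        int              base_port;      /* 49152 in this setup */
        int              last_alloc;     /* 49154 in this setup */
        struct pmap_port ports[PMAP_PORT_MAX + 1];
};

/* Illustrative lookup: first matching brickname wins, so a stale entry
 * left behind by a missing pmap signout shadows the current one. */
static int
pmap_lookup_first_match (struct pmap_registry *reg, const char *brickname)
{
        for (int p = reg->base_port; p <= reg->last_alloc; p++) {
                if (reg->ports[p].type == PORT_BRICKSERVER &&
                    reg->ports[p].brickname &&
                    strcmp (reg->ports[p].brickname, brickname) == 0)
                        return p;   /* stale 49152 is hit before 49154 */
        }
        return -1;                  /* brick not registered */
}
----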

Comment 1 Joe Julian 2015-11-15 16:35:14 UTC
I suspect that this bug could also be avoided if we caught a replace-brick where the source is the same as the destination in the CLI and suggested simply "start ... force" instead.
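
A minimal sketch of the kind of CLI-side guard being suggested (hypothetical helper, not existing gluster CLI code):
----
#include <stdio.h>
#include <string.h>

/* Hypothetical guard for the CLI parser: if the source and destination
 * brick of "replace-brick ... commit force" are identical, reject the
 * command and point the user at "start ... force" instead of letting
 * the operation reach glusterd. */
static int
validate_replace_brick_args (const char *src_brick, const char *dst_brick)
{
        if (strcmp (src_brick, dst_brick) == 0) {
                fprintf (stderr,
                         "replace-brick: source and destination brick are "
                         "the same; use \"start ... force\" instead\n");
                return -1;
        }
        return 0;
}
----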

Comment 2 Niels de Vos 2016-06-17 15:58:30 UTC
This bug is getting closed because the 3.5 release is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bug fixes if you are still facing this issue in a more current release.