Bug 1188522 - entry, metadata self-heal in 3.0 and 3.1 are not compatible
Summary: entry, metadata self-heal in 3.0 and 3.1 are not compatible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.4
Assignee: Pranith Kumar K
QA Contact: Anil Shah
URL:
Whiteboard:
Depends On: 1168189 1177339 1177418
Blocks: 1182947
 
Reported: 2015-02-03 06:34 UTC by Pranith Kumar K
Modified: 2016-09-17 12:15 UTC
CC: 6 users

Fixed In Version: glusterfs-3.6.0.44-1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1177339
Environment:
Last Closed: 2015-03-26 06:35:53 UTC
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:0682 0 normal SHIPPED_LIVE Red Hat Storage 3.0 enhancement and bug fix update #4 2015-03-26 10:32:55 UTC

Description Pranith Kumar K 2015-02-03 06:34:25 UTC
+++ This bug was initially created as a clone of Bug #1177339 +++

+++ This bug was initially created as a clone of Bug #1168189 +++

Description of problem:
Entry self-heal in 3.6 and above takes a full lock on the directory only for the duration of figuring out the xattrs of the directories, whereas 3.5 holds the lock throughout the entry self-heal. If the cluster is heterogeneous, there is a chance that the 3.6 self-heal is triggered and then the 3.5 self-heal is also triggered, so the self-heal daemons of both 3.5 and 3.6 heal the same directory.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a replicate volume consisting of 2 bricks on machines m1 and m2, both on version 3.5.
2. Create a directory 'd' inside the mount on m2 and cd into it.
3. While the brick on m2 is down, create a lot of files in this directory 'd' (I created 10000 files).
4. Upgrade m1 to 3.6.
5. Bring the bricks up and initiate self-heal of the directories.
6. The 3.6 self-heal daemon will start healing.
7. Accessing 'd' on the mount on m2 will sometimes also trigger a heal.
(A rough command-line sketch of these steps is shown below.)
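
The following is a minimal sketch of the steps above, assuming a volume named r2, hypothetical brick paths /bricks/b1 and /bricks/b2, and mount point /mnt/r2; the exact commands and paths used are not recorded in this report:

# On m1, with both nodes still on 3.5: create and start a 2-brick replicate volume
gluster volume create r2 replica 2 m1:/bricks/b1 m2:/bricks/b2
gluster volume start r2

# On m2: mount the volume, create directory 'd' and cd into it
mount -t glusterfs m2:/r2 /mnt/r2
mkdir /mnt/r2/d && cd /mnt/r2/d

# With the brick on m2 down (e.g. its glusterfsd killed), create many files
for i in $(seq 1 10000); do touch f$i; done

# After upgrading m1 to 3.6: bring the bricks back up and trigger self-heal
gluster volume start r2 force
gluster volume heal r2

# Accessing 'd' from the mount on m2 can also trigger a heal from the 3.5 side
ls /mnt/r2/d >/dev/null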

Actual results:
Self-heal on directory 'd' is performed by both the 3.5 and the 3.6 self-heal daemons.

Expected results:


Additional info:

--- Additional comment from Pranith Kumar K on 2014-11-26 06:26:03 EST ---

With the patch:
While the 3.5 heal is in progress, the 3.6 heal was prevented:
[root@localhost ~]# grep d0555511ee7f0000 /var/log/glusterfs/bricks/brick.log | grep ENTRYLK
[2014-11-26 11:01:51.165591] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks: [REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000, Client=0x7f51d6d21ac0, Frame=19} Lockee = {gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock = {lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain: r2-replicate-0:self-heal}
[2014-11-26 11:01:51.165633] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks: [GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000, Client=0x7f51d6d21ac0, Frame=19} Lockee = {gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock = {lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain: r2-replicate-0:self-heal}
[2014-11-26 11:01:51.173176] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks: [REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000, Client=0x7f51d6d21ac0, Frame=21} Lockee = {gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock = {lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.173242] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks: [TRYAGAIN] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000, Client=0x7f51d6d21ac0, Frame=21} Lockee = {gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock = {lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-11-26 11:01:51.184939] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks: [REQUEST] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000, Client=0x7f51d6d21ac0, Frame=23} Lockee = {gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock = {lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain: r2-replicate-0:self-heal}
[2014-11-26 11:01:51.184989] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks: [GRANTED] Locker = {Pid=18446744073709551610, lk-owner=d0555511ee7f0000, Client=0x7f51d6d21ac0, Frame=23} Lockee = {gfid=26625058-b5f2-4561-97da-ec9e7268119e, fd=(nil), path=/d} Lock = {lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=(null), domain: r2-replicate-0:self-heal}
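
A helper command, offered here only as a suggestion (not part of the original test), that summarizes trace lines like the ones above by counting GRANTED vs TRYAGAIN decisions per lk-owner; the brick log path is the same one used in the grep above:

grep ENTRYLK /var/log/glusterfs/bricks/brick.log \
  | sed -n 's/.*\[\(GRANTED\|TRYAGAIN\)\].*lk-owner=\([0-9a-f]*\).*/\2 \1/p' \
  | sort | uniq -c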

--- Additional comment from Pranith Kumar K on 2014-12-02 01:37:54 EST ---

In this test case, 3.6 does the healing, whereas the 3.5 heal will not get locks:

[root@localhost ~]# egrep "(aaaaaaaaaaa|TRYAGAIN)" /var/log/glusterfs/bricks/brick.log | grep ENTRY
[2014-12-02 06:14:54.502242] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks: [REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000, Client=0x7f831e1a4ac0, Frame=1064} Lockee = {gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil), path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:14:54.502261] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks: [GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000, Client=0x7f831e1a4ac0, Frame=1064} Lockee = {gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil), path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:15:01.491651] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks: [TRYAGAIN] Locker = {Pid=18446744073709551615, lk-owner=08b4e7739a7f0000, Client=0x7f831e1e7880, Frame=1621} Lockee = {gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil), path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK, cmd=LOCK_NB, type=WRITE, basename=(null), domain: r2-replicate-0}
[2014-12-02 06:18:36.741434] I [entrylk.c:244:entrylk_trace_in] 0-r2-locks: [REQUEST] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000, Client=0x7f831e1a4ac0, Frame=91112} Lockee = {gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil), path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
[2014-12-02 06:18:36.741604] I [entrylk.c:271:entrylk_trace_out] 0-r2-locks: [GRANTED] Locker = {Pid=18446744073709551610, lk-owner=a41e327d2e7f0000, Client=0x7f831e1a4ac0, Frame=91112} Lockee = {gfid=fab813d6-2ef2-4885-a293-91476cc5d167, fd=(nil), path=<gfid:fab813d6-2ef2-4885-a293-91476cc5d167>} Lock = {lock=ENTRYLK, cmd=UNLOCK, type=WRITE, basename=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
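
Along the same lines (again only a suggested helper, not part of the original test), restricting the output to the rejected attempts and extracting the lk-owner and basename makes it easy to see that the refused request in this run is the full-directory lock (basename=(null)):

grep ENTRYLK /var/log/glusterfs/bricks/brick.log | grep TRYAGAIN \
  | grep -oE 'lk-owner=[0-9a-f]+|basename=[^,}]+'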

--- Additional comment from Anand Avati on 2014-12-02 01:44:27 EST ---

REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in entrylk) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2014-12-02 01:44:32 EST ---

REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in afr-v2 compatible with afr-v1) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2014-12-26 04:45:53 EST ---

REVIEW: http://review.gluster.org/9227 (cluster/afr: Make entry-self-heal in afr-v2 compatible with afr-v1) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2014-12-26 04:46:01 EST ---

REVIEW: http://review.gluster.org/9125 (features/locks: Add lk-owner checks in entrylk) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2014-12-26 04:46:04 EST ---

REVIEW: http://review.gluster.org/9351 (mgmt/glusterd: Add option to enable lock trace) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2014-12-26 04:49:15 EST ---

REVIEW: http://review.gluster.org/9352 (features/locks: Add lk-owner checks in entrylk) posted (#1) for review on release-3.5 by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2014-12-28 23:58:15 EST ---

REVIEW: http://review.gluster.org/9352 (features/locks: Add lk-owner checks in entrylk) posted (#2) for review on release-3.5 by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anil Shah 2015-02-27 06:39:41 UTC
Since the AFR v2 code is not available downstream for 3.0.4, healing was verified to work fine on AFR v1.

Bug verified on build glusterfs 3.6.0.45.
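
A hedged sketch of how such a verification can be spot-checked (the exact steps QA ran are not recorded in this comment; the volume name r2 is an assumption carried over from the logs above):

rpm -q glusterfs              # expect glusterfs-3.6.0.45 or later on both nodes
gluster volume heal r2 info   # should show zero entries per brick once healing completes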

Comment 5 errata-xmlrpc 2015-03-26 06:35:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html

