Bug 1139513 - [AFR-V2] - Locking issue leading to races and inconsistencies in entry selfheal codepath
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Karthik U S
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-09-09 06:36 UTC by Krutika Dhananjay
Modified: 2019-09-24 09:50 UTC
CC: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-09-24 09:50:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Krutika Dhananjay 2014-09-09 06:36:48 UTC
Description of problem:

While reading the code, I noticed that the entry self-heal algorithm does not take a NULL entrylk in the xlator domain on the same set of nodes on which it successfully grabs entry locks in the self-heal domain. This can lead to entry FOPs being executed in a racy manner in N-way replication where N > 3 (I still need to do some more analysis to come up with possible problems with N = 3; I will update the bug if I find any).

Version-Release number of selected component (if applicable):


How reproducible:
N/A

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

The problems can be hit at the moment in N-way replication (N > 3).
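For context, the locking sequence at issue (as described in the problem description above) is roughly the following. This is a minimal sketch with hypothetical names and stubbed lock calls, not the actual AFR code; the gap is marked in the comment at step 2.

/* Minimal sketch (hypothetical names, stubbed locks) of the entry selfheal
 * locking sequence described in this report -- not the actual AFR code. */

#include <stdio.h>
#include <stddef.h>

#define NBRICKS 4

/* Non-blocking entrylk on 'name' in 'domain' on one brick; NULL name means
 * "lock the whole directory".  Stubbed to always succeed here. */
static int try_entrylk(int brick, const char *domain, const char *name)
{
    printf("entrylk brick-%d domain=%-20s name=%s\n",
           brick, domain, name ? name : "(null)");
    return 1;
}

static void entry_selfheal(void)
{
    /* 1. NULL-range entrylk in the self-heal domain, non-blocking, on all
     *    bricks; proceed with whichever subset was granted. */
    for (int i = 0; i < NBRICKS; i++)
        try_entrylk(i, "afr-self-heal-domain", NULL);

    /* 2. NULL-range entrylk in the xlator domain, non-blocking, on all
     *    bricks.  THE GAP: nothing requires this lock to be granted on the
     *    same subset of bricks as step 1; with another healer racing, the
     *    two subsets can be disjoint (see Problem #1 below). */
    for (int i = 0; i < NBRICKS; i++)
        try_entrylk(i, "afr-xlator-domain", NULL);

    /* 3. Inspect changelogs, pick source/sinks, drop the xlator-domain lock,
     *    then heal each entry under a per-name xlator-domain entrylk
     *    (elided in this sketch). */
}

int main(void)
{
    entry_selfheal();
    return 0;
}
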

Problem #1:
==========

Imagine a case of 4-way replication and two self-heal daemons simultaneously trying to perform entry selfheal on an inode 'I'.

                                                Brick-0   Brick-1   Brick-2   Brick-3
SHD-1: nb-entrylk in shd-domain, NULL range     YES       YES       FAILS     FAILS
SHD-2: nb-entrylk in shd-domain, NULL range     FAILS     FAILS     YES       YES
SHD-1: nb-entrylk in xlator domain, NULL range  FAILS     FAILS     YES       YES
SHD-2: nb-entrylk in xlator domain, NULL range  YES       YES       FAILS     FAILS


Both SHDs perform lookup on 'I' on all subvolumes and compute source and sink.
For SHD-1, the source and sink would be in the set S1 = {subvol-2, subvol-3}.
Similarly, for SHD-2, the source and sink would be in the set S2 = {subvol-0, subvol-1}.

Now both SHDs unlock the locks they had held in the xlator domain.

After this, SHD-1 attempts readdir() on the subvols in set S1, in order, one at a time, and SHD-2 does the same on S2. For each entry name, they attempt blocking entrylks in the xlator domain on all up subvolumes.

Now let's say a network split causes SHD-1 to see only the bricks in S1 and SHD-2 to see only the bricks in S2. This would mean that now SHD-1 holds locks in set S1 and SHD-2 in set S2. SHD-1 then performs healing between subvol-2 and subvol-3 while holding locks on subvol-0 and subvol-1; likewise, SHD-2 performs healing between subvol-0 and subvol-1 while holding locks on subvol-2 and subvol-3.

In the end, SHD-1 updates changelogs for set S2 (where it originally held the locks) even though it did selfheal on set S1. Similarly, SHD-2 updates changelogs for set S1 while it did selfheal on set S2.
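
To make the interleaving above concrete, here is a small, purely illustrative C program (hypothetical and simplified; not AFR code, it only tracks ownership of the non-blocking NULL-range entrylks) that replays the table and shows each healer ending up with shd-domain locks on one half of the bricks and xlator-domain locks on the other half:

/* Deterministic replay of the table above: two healers (SHD-1, SHD-2), four
 * bricks, two lock domains.  Purely illustrative, not AFR code. */

#include <stdio.h>

#define NBRICKS 4
enum { SHD_DOMAIN, XL_DOMAIN, NDOMAINS };

static int owner[NDOMAINS][NBRICKS];   /* 0 = free, 1 = SHD-1, 2 = SHD-2 */
static const char *dname[NDOMAINS] = { "shd-domain", "xlator-domain" };

/* Non-blocking lock attempt: succeeds only if nobody holds the lock yet. */
static void trylock(int shd, int domain, int brick)
{
    int ok = (owner[domain][brick] == 0);
    if (ok)
        owner[domain][brick] = shd;
    printf("SHD-%d %-13s brick-%d : %s\n",
           shd, dname[domain], brick, ok ? "YES" : "FAILS");
}

int main(void)
{
    /* Each healer wins the shd-domain NULL-range locks on one half ... */
    trylock(1, SHD_DOMAIN, 0); trylock(1, SHD_DOMAIN, 1);
    trylock(2, SHD_DOMAIN, 2); trylock(2, SHD_DOMAIN, 3);
    trylock(2, SHD_DOMAIN, 0); trylock(2, SHD_DOMAIN, 1);   /* FAILS */
    trylock(1, SHD_DOMAIN, 2); trylock(1, SHD_DOMAIN, 3);   /* FAILS */

    /* ... and the xlator-domain NULL-range locks on the OTHER half. */
    trylock(2, XL_DOMAIN, 0); trylock(2, XL_DOMAIN, 1);
    trylock(1, XL_DOMAIN, 2); trylock(1, XL_DOMAIN, 3);
    trylock(1, XL_DOMAIN, 0); trylock(1, XL_DOMAIN, 1);     /* FAILS */
    trylock(2, XL_DOMAIN, 2); trylock(2, XL_DOMAIN, 3);     /* FAILS */

    /* Final ownership: nothing ties the two lock sets together, which is
     * what lets each healer heal one set while its changelog updates land
     * on the other set, as described in the narrative above. */
    for (int d = 0; d < NDOMAINS; d++)
        for (int b = 0; b < NBRICKS; b++)
            printf("%-13s brick-%d held by SHD-%d\n", dname[d], b, owner[d][b]);
    return 0;
}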

Problem #2:
==========

Let's say that in a 4-way replication setup, SHD-1 holds self-heal-domain locks on subvols 0 and 1, and SHD-2 holds self-heal-domain locks on subvols 2 and 3.

Now SHD-1 races ahead, takes the full lock on all subvolumes, inspects the changelog, computes source and sink, and unlocks. SHD-2 then also takes the full lock on all subvolumes, inspects the changelog, computes source and sink, and unlocks. Both of them then perform selfheal serially, based on the xattrs they read inside the lock, and both end up updating the changelog on the parent. As a result, the final changelog at the end of selfheal could be incorrect.
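
To make "the final changelog could be incorrect" concrete, here is a tiny, purely illustrative numeric model (invented numbers and names, not AFR code): both healers snapshot the parent's pending counter under their short-lived full lock, heal, and then both apply the decrement computed from their own snapshot.

/* Illustrative model of Problem #2: both healers apply an "undo" delta
 * derived from a snapshot taken under a lock that was already dropped. */

#include <stdio.h>

int main(void)
{
    int pending = 2;           /* parent dir's pending entry-changelog count
                                  for the sink brick before either heal     */

    int snap1 = pending;       /* SHD-1: read under full lock, then unlock  */
    int snap2 = pending;       /* SHD-2: read under full lock, then unlock  */

    /* Both heal serially, then each applies the decrement it computed from
     * its own (now stale) snapshot, xattrop-style (adding a delta).        */
    pending += -snap1;         /* SHD-1 clears what it saw                  */
    pending += -snap2;         /* SHD-2 clears what it saw, again           */

    printf("final pending changelog = %d (expected 0)\n", pending);
    return 0;
}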

Comment 1 Krutika Dhananjay 2014-09-09 10:50:37 UTC
It turns out the same issue exists with name selfheal, and with data and metadata selfheal as well.

Comment 2 Amar Tumballi 2018-08-29 03:45:25 UTC
Did we happen to fix the issue?

Comment 3 Pranith Kumar K 2018-09-03 14:16:26 UTC
No, the issue still exists for replica-count > 3.

Comment 4 Karthik U S 2019-09-24 09:50:08 UTC
Since there is no plan to work on this in the near future and it has not been reported by any of our users recently, we are closing this bug.
If it is seen frequently on any of the latest branches and needs to be worked on with priority, feel free to reopen it.

