Bug 861308 - lookup blocked while waiting for self-heal that fails due to pre-existing locks
Status: CLOSED DEFERRED
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Assigned To: Pranith Kumar K
Depends On:
Blocks: 867342
Reported: 2012-09-28 02:08 EDT by Joe Julian
Modified: 2014-12-14 14:40 EST (History)
5 users

Doc Type: Bug Fix
Clones: 867342
Last Closed: 2014-12-14 14:40:29 EST
Type: Bug


Attachments: None
Description Joe Julian 2012-09-28 02:08:03 EDT
Description of problem:
We replaced a server and there were, apparently, stale inode locks. Directory listings, or stat calls on the affected filenames, caused the client to hang. The only way to release those calls was to force-unmount the client.

Version-Release number of selected component (if applicable):
3.3.0

How reproducible:
Always

Steps to Reproduce:
1. Have a file with a stale inode lock and pending attributes
2. lookup() the file
  
Actual results:
Client is locked up

Expected results:
At least an error should have been returned

Additional info:
Comment 1 Pranith Kumar K 2012-11-01 07:35:50 EDT
Joe,
   Could you give us the steps to re-create the issue, as in how to end up with stale locks? Once there are stale locks, I understand that we will observe hangs. Getting into the stale-locks state is the important thing to re-create.

Pranith
Comment 2 Pranith Kumar K 2012-12-14 00:40:00 EST
Joe,
   Any updates on this issue? We would like to close the bug if reproduction steps are not available :-|

Pranith.
Comment 3 Joe Julian 2012-12-14 00:45:10 EST
No clue how I ended up with stale locks, but the point is that even with those locks, applications shouldn't sit in a zombie state waiting for that lookup. If a response cannot be returned to the lookup, at least return an error.

We came across this yesterday, too, where a user had libvirtd in a zombie state until a VM image self-heal completed (more than 4 hours). Not knowing why your application has apparently hung for 4 hours is unacceptable.
Comment 4 Pranith Kumar K 2012-12-17 06:55:10 EST
Joe,
   The client performs at most 16 background self-heals per replica pair by default.
This is to prevent performance problems. You can either increase this number or set cluster.data-self-heal to "off" to prevent this from happening. (Note: this does not disable data self-heal in the self-heal daemon.)
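For reference, the tuning described above maps onto the standard `gluster volume set` CLI. This is a sketch, not commands taken from this bug; the volume name `myvol` is a placeholder, and the option names are as documented for glusterfs 3.3:

```shell
# Raise the cap on concurrent background self-heals per replica pair
# (default is 16; "myvol" is a placeholder volume name).
gluster volume set myvol cluster.background-self-heal-count 32

# Or stop the client from doing data self-heal entirely;
# the self-heal daemon still heals data in the background.
gluster volume set myvol cluster.data-self-heal off

# Confirm the options took effect.
gluster volume info myvol
```

These commands require a running gluster management daemon and an existing volume, so they are shown here only as an administration sketch.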

Pranith.
Comment 5 Pranith Kumar K 2013-01-07 12:49:11 EST
Joe,
    Could you let us know if this information is sufficient?

Pranith
Comment 6 Joe Julian 2013-01-07 12:59:34 EST
Why can't we just give the application its data? We've queued the background self-heal and we know which copy is good; can't we give that data to the application at that point?

If this were a write operation, I could understand the difficulty, but strace shows it stuck in a read operation.
Comment 7 Pranith Kumar K 2013-01-07 21:46:38 EST
Joe,
    Getting stuck in readv is new! readv does not take any locks; it goes directly to the brick and responds with whatever data the brick returns. I need more information to figure this out. Do you know how to re-create the issue where it gets stuck in readv? Do you happen to have statedumps of the mount and bricks from when this happened?

Readv can trigger a self-heal but that does not block the readv fop.

Pranith
Comment 8 Joe Julian 2013-01-11 03:54:56 EST
I don't know how the stale locks happened. The lock was there long before I was able to recognize there was an issue. I don't know how the locks are produced or cleared under normal circumstances so I can't even speculate.

Comment 6 was, I think, a bit of a red herring. I responded to comment 4 without carefully going back through the bug and putting it in the correct context. Comment 4 doesn't actually make any sense with respect to the original bug report which was about lookup() and inode locks.

Should lookup() be blocked if there is an inode lock on the target file? Since that block even affects a common ls of the directory, I wouldn't have expected it to. (By common I mean the default ls of most distros which includes color or decoration which requires a stat call on the file triggering that lookup.)

I wouldn't have expected that behavior. What's the harm in responding to a lookup with one good replica? lookup isn't a write operation if I'm understanding the purpose of that function call correctly.

Again, I don't know how to create stale inode locks (or even active ones for that matter) so I have no idea how to reproduce.

By "stale", I mean that there were no applications using that volume and it was only mounted on one client. After I cleared the inode locks, the system worked normally.
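Clearing stale locks as described above can be done from the CLI in 3.3. This is a hedged sketch of that procedure, not the exact commands used in this report; `myvol` and `/path/in/volume/file` are placeholders:

```shell
# Dump brick state, including held inode locks, for inspection.
# Statedumps land in the brick hosts' statedump directory
# (commonly /var/run/gluster).
gluster volume statedump myvol

# Clear granted inode locks on the affected path.
# Syntax: gluster volume clear-locks <vol> <path> kind \
#           {blocked|granted|all} {inode|entry|posix}
gluster volume clear-locks myvol /path/in/volume/file kind granted inode
```

Both commands need a live gluster cluster, so they are shown only as an administration sketch.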
Comment 9 Pranith Kumar K 2013-06-05 02:10:35 EDT
Joe,
   We are observing a new bug that matches at least the readdir hang in this description:
https://bugzilla.redhat.com/show_bug.cgi?id=959212

https://bugzilla.redhat.com/show_bug.cgi?id=959212#c8 has precise steps to see the issue (re-creatable 100% of the time). Could you check and let us know whether what you observed and bug 959212 are possible duplicates?

Pranith
Comment 10 Niels de Vos 2014-11-27 09:53:55 EST
The version that this bug was reported against no longer gets any updates from the Gluster Community. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will get automatically closed.
