Bug 762571 (GLUSTER-839)

Summary: Errors in 3.0.4 replicate
Product: [Community] GlusterFS
Reporter: Sachidananda Urs <sac>
Component: replicate
Assignee: Anand Avati <aavati>
Status: CLOSED NOTABUG
Severity: medium
Priority: high
Version: mainline
CC: amarts, chida, chrisw, dimitri, gluster-bugs, kris.buytaert, vijay
Hardware: All   
OS: Linux   
Doc Type: Bug Fix

Description Sachidananda Urs 2010-04-21 04:58:58 UTC
The following errors were seen on a replicate setup:

[2010-04-18 12:04:20] E [posix.c:509:posix_lookup] posix5: post-operation lstat on parent of .landfill/gardener-files-gardens-tangle002-active_domains_by_site.json.txt failed: No such file or directory
[2010-04-18 15:44:37] E [posix.c:2366:posix_open] posix5: open on /mnt/brick5/gardener/files/gardens/tangle002: Is a directory
[2010-04-18 15:44:38] E [posix.c:2366:posix_open] posix5: open on /mnt/brick5/gardener/files/gardens/tangle002: Is a directory
[2010-04-18 15:44:38] E [posix.c:2146:posix_truncate] posix5: truncate on /gardener/files/gardens/tangle002 failed: Is a directory

Some observations by Vikas@:

> [2010-04-18 12:04:20] E [posix.c:509:posix_lookup] posix5: post-operation lstat on parent of .landfill/gardener-files-gardens-tangle002-active_domains_by_site.json.txt failed: No such file or directory

.landfill is the directory replicate uses during self-heal. When self-heal decides that a file should be deleted on a subvolume, it moves (renames) the file into this directory rather than deleting it outright, as a safety feature.

So this file was originally:
/gardener/files/gardens/tangle002/active_domains_by_site.json.txt
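The mapping between the original path and its .landfill entry can be sketched in C. This is a minimal sketch that assumes, purely from the log line above and not from the replicate source, that self-heal drops the leading '/' of the original path and turns every remaining '/' into '-'; `landfill_path()` is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the .landfill entry name for a path.
 * Naming scheme is an assumption inferred from the log above.
 * Returns a malloc'd string the caller must free, or NULL on OOM. */
static char *landfill_path(const char *path)
{
        static const char prefix[] = ".landfill/";
        const size_t plen = sizeof(prefix) - 1;
        size_t n;
        char *out;

        if (path[0] == '/')
                path++;                 /* assumed: leading '/' is dropped */
        n = strlen(path);

        out = malloc(plen + n + 1);
        if (out == NULL)
                return NULL;

        memcpy(out, prefix, plen);
        for (size_t i = 0; i < n; i++)  /* assumed: each '/' becomes '-' */
                out[plen + i] = (path[i] == '/') ? '-' : path[i];
        out[plen + n] = '\0';
        return out;
}
```

Reversing this mapping, as done above, is only unambiguous when the original path components contain no '-', which happens to hold for this file.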

> [2010-04-18 15:44:37] E [posix.c:2366:posix_open] posix5: open on /mnt/brick5/gardener/files/gardens/tangle002: Is a directory
> [2010-04-18 15:44:38] E [posix.c:2366:posix_open] posix5: open on /mnt/brick5/gardener/files/gardens/tangle002: Is a directory
> [2010-04-18 15:44:38] E [posix.c:2146:posix_truncate] posix5: truncate on /gardener/files/gardens/tangle002 failed: Is a directory

Because the lstat above failed, the client wrongly concluded that /gardener/files/gardens/tangle002 does not exist (ENOENT). It then presumably sent open(O_CREAT), which led to EISDIR.
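The brick-side half of this explanation can be reproduced with plain POSIX calls on Linux: opening a directory with O_WRONLY | O_CREAT fails with EISDIR, and so does truncate() on a directory. The sketch below is illustrative only; the scratch directory under /tmp stands in for tangle002, and `reproduce_eisdir()` and `struct eisdir_result` are hypothetical names, not GlusterFS code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* errno values seen when "regular file" operations hit a directory. */
struct eisdir_result {
        int open_errno;      /* from open(dir, O_WRONLY | O_CREAT) */
        int truncate_errno;  /* from truncate(dir, 0) */
};

/* Hypothetical demo: create a scratch directory (standing in for
 * tangle002), then issue the calls a client would make if it wrongly
 * believed the path was a missing regular file.
 * Returns 0 on success, -1 if the scratch directory can't be created. */
static int reproduce_eisdir(struct eisdir_result *out)
{
        char dir[] = "/tmp/tangle002-XXXXXX";
        int fd;

        if (mkdtemp(dir) == NULL)
                return -1;

        fd = open(dir, O_WRONLY | O_CREAT, 0644);
        out->open_errno = (fd < 0) ? errno : 0;
        if (fd >= 0)
                close(fd);

        out->truncate_errno = (truncate(dir, 0) < 0) ? errno : 0;

        rmdir(dir);          /* clean up the scratch directory */
        return 0;
}
```

Write access alone (O_WRONLY or O_RDWR) is enough for open() to return EISDIR on a directory, so the brick would reject the request even without O_CREAT.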

Comment 1 Amar Tumballi 2010-05-04 07:57:49 UTC
This can be avoided by removing the file's dentry from the inode table once the file is moved (i.e., renamed) into the .landfill directory. That way the file can no longer be resolved in server-protocol, which prevents these errors.
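The idea in Comment 1 can be illustrated with a deliberately tiny model of a name-to-inode table. This is not GlusterFS's inode table or its API: `struct inode_table`, `table_link()`, `table_resolve()` and `table_unlink()` are hypothetical names, and the fixed-size array stands in for whatever lookup structure the real code uses. It only shows that once the dentry for the renamed file is dropped, a lookup by that name fails cleanly instead of resolving to the moved file.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_DENTRIES 16

/* Hypothetical toy dentry: (parent inode, name) -> inode number. */
struct dentry {
        bool          in_use;
        unsigned long parent;
        char          name[256];
        unsigned long ino;
};

struct inode_table {
        struct dentry dentries[MAX_DENTRIES];
};

/* Record that `name` under `parent` refers to `ino`.
 * Returns 0, or -1 if the table is full or the name is too long. */
static int table_link(struct inode_table *t, unsigned long parent,
                      const char *name, unsigned long ino)
{
        if (strlen(name) >= sizeof(t->dentries[0].name))
                return -1;
        for (size_t i = 0; i < MAX_DENTRIES; i++) {
                struct dentry *d = &t->dentries[i];
                if (!d->in_use) {
                        d->in_use = true;
                        d->parent = parent;
                        snprintf(d->name, sizeof(d->name), "%s", name);
                        d->ino = ino;
                        return 0;
                }
        }
        return -1;
}

/* Resolve (parent, name) to an inode number; 0 means "not found". */
static unsigned long table_resolve(const struct inode_table *t,
                                   unsigned long parent, const char *name)
{
        for (size_t i = 0; i < MAX_DENTRIES; i++) {
                const struct dentry *d = &t->dentries[i];
                if (d->in_use && d->parent == parent &&
                    strcmp(d->name, name) == 0)
                        return d->ino;
        }
        return 0;
}

/* Drop every (parent, name) dentry: the step proposed in Comment 1. */
static void table_unlink(struct inode_table *t, unsigned long parent,
                         const char *name)
{
        for (size_t i = 0; i < MAX_DENTRIES; i++) {
                struct dentry *d = &t->dentries[i];
                if (d->in_use && d->parent == parent &&
                    strcmp(d->name, name) == 0)
                        d->in_use = false;
        }
}
```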

Comment 2 Dimitri Vanoverbeke 2010-07-14 09:25:38 UTC
Seeing what looks like the same issue with 3.0.2 when testing self-heal (killing the second storage node while writing from the client):


[2010-07-14 12:57:25] E [posix.c:477:posix_lookup] posix: post-operation lstat on parent of .landfill/vhosts-blade-user.samh.include failed: No such file or directory
[2010-07-14 12:57:25] E [posix.c:477:posix_lookup] posix: post-operation lstat on parent of .landfill/vhosts-blade-user.blade.wildcards.include failed: No such file or directory
[2010-07-14 12:57:25] E [posix.c:477:posix_lookup] posix: post-operation lstat on parent of .landfill/vhosts-blade-user.bauwensp2.wildcards.include failed: No such file or directory
[2010-07-14 12:57:25] E [posix.c:477:posix_lookup] posix: post-operation lstat on parent of .landfill/vhosts-blade-user.beutenb.wildcards.include failed: No such file or directory
[2010-07-14 13:38:22] E [posix.c:2331:posix_open] posix: open on /data/export/.landfill/vhosts-apoc-user.rommess2.include: No such file or directory
[2010-07-14 13:38:45] E [posix.c:2331:posix_open] posix: open on /data/export/.landfill/vhosts-apoc-user.rommess2.include: No such file or directory
[2010-07-14 13:46:53] E [posix.c:2331:posix_open] posix: open on /data/export/.landfill/vhosts-apoc-user.rommess2.include: No such file or directory

Comment 3 Amar Tumballi 2010-10-05 06:01:14 UTC
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. As we are just a week away from the GA release, we would like you to test this bug against the 3.1.0RC releases and let us know if it's fixed.

Comment 4 Amar Tumballi 2010-10-05 08:35:54 UTC
Avati, can you check this and update the status accordingly?

Comment 5 Amar Tumballi 2010-10-05 09:24:19 UTC
With the 3.1 releases (check the latest codebase), this bug no longer applies. Please upgrade to a 3.1.x release.