Bug 762571 (GLUSTER-839) - Errors in 3.0.4 replicate
Summary: Errors in 3.0.4 replicate
Keywords:
Status: CLOSED NOTABUG
Alias: GLUSTER-839
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: All
OS: Linux
high
medium
Target Milestone: ---
Assignee: Anand Avati
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-04-21 04:58 UTC by Sachidananda Urs
Modified: 2015-12-01 16:45 UTC
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Sachidananda Urs 2010-04-21 04:58:58 UTC
The following errors are seen on the replicate setup:

[2010-04-18 12:04:20] E [posix.c:509:posix_lookup] posix5: post-operation lstat on parent of .landfill/gardener-files-gardens-tangle002-active_domains_by_site.json.txt failed: No such file or directory
[2010-04-18 15:44:37] E [posix.c:2366:posix_open] posix5: open on /mnt/brick5/gardener/files/gardens/tangle002: Is a directory
[2010-04-18 15:44:38] E [posix.c:2366:posix_open] posix5: open on /mnt/brick5/gardener/files/gardens/tangle002: Is a directory
[2010-04-18 15:44:38] E [posix.c:2146:posix_truncate] posix5: truncate on /gardener/files/gardens/tangle002 failed: Is a directory

Some observations by Vikas@:

> [2010-04-18 12:04:20] E [posix.c:509:posix_lookup] posix5: post-operation lstat on parent of .landfill/gardener-files-gardens-tangle002-active_domains_by_site.json.txt failed: No such file or directory

.landfill is the directory replicate uses for self-heal. When self-heal decides that a file
should be deleted on a subvolume, it moves (renames) the file into this directory instead of deleting it, as a safety feature.

So this file was originally:
/gardener/files/gardens/tangle002/active_domains_by_site.json.txt

> [2010-04-18 15:44:37] E [posix.c:2366:posix_open] posix5: open on /mnt/brick5/gardener/files/gardens/tangle002: Is a directory
> [2010-04-18 15:44:38] E [posix.c:2366:posix_open] posix5: open on /mnt/brick5/gardener/files/gardens/tangle002: Is a directory
> [2010-04-18 15:44:38] E [posix.c:2146:posix_truncate] posix5: truncate on /gardener/files/gardens/tangle002 failed: Is a directory

Because the lstat above failed, the client wrongly concluded that /gardener/files/gardens/tangle002
does not exist (ENOENT). It then presumably sent an open(O_CREAT), which led to
EISDIR.

Comment 1 Amar Tumballi 2010-05-04 07:57:49 UTC
This can be avoided by removing the dentry from the inode table once the file is moved (i.e., renamed) into the .landfill directory. That prevents the file from being resolved in server-protocol, and so prevents these errors.
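The proposed fix can be illustrated with a toy dentry table. This is a hypothetical model, not libglusterfs's inode table: `dentry_link`, `dentry_unlink`, `dentry_lookup`, `landfill_rename`, and the fixed-size array are invented for illustration. It shows that if the old (parent, name) dentry stays linked after the file is renamed into `.landfill`, a lookup by the old name still resolves to the moved inode, whereas unlinking it at rename time makes that lookup fail cleanly.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DENTRIES 64
#define NAME_MAX_LEN 256

/* Hypothetical toy model: each entry maps (parent inode, name) -> inode. */
struct dentry {
    unsigned long parent;
    char name[NAME_MAX_LEN];
    unsigned long ino;
    bool in_use;
};

static struct dentry table[MAX_DENTRIES];

/* Add a (parent, name) -> ino mapping; false if full or name too long. */
bool dentry_link(unsigned long parent, const char *name, unsigned long ino)
{
    if (strlen(name) >= NAME_MAX_LEN)
        return false;
    for (int i = 0; i < MAX_DENTRIES; i++) {
        if (!table[i].in_use) {
            table[i].parent = parent;
            strcpy(table[i].name, name);
            table[i].ino = ino;
            table[i].in_use = true;
            return true;
        }
    }
    return false;
}

/* Remove every (parent, name) mapping that matches. */
void dentry_unlink(unsigned long parent, const char *name)
{
    for (int i = 0; i < MAX_DENTRIES; i++)
        if (table[i].in_use && table[i].parent == parent &&
            strcmp(table[i].name, name) == 0)
            table[i].in_use = false;
}

/* Resolve (parent, name) to an inode number; 0 means "not found". */
unsigned long dentry_lookup(unsigned long parent, const char *name)
{
    for (int i = 0; i < MAX_DENTRIES; i++)
        if (table[i].in_use && table[i].parent == parent &&
            strcmp(table[i].name, name) == 0)
            return table[i].ino;
    return 0;
}

/* Record a move into the landfill directory. With drop_old true the old
 * dentry is removed (the proposed fix); with drop_old false it is left
 * behind, so the old name keeps resolving to the moved inode. */
bool landfill_rename(unsigned long old_parent, const char *old_name,
                     unsigned long landfill_dir, const char *new_name,
                     bool drop_old)
{
    unsigned long ino = dentry_lookup(old_parent, old_name);
    if (ino == 0 || !dentry_link(landfill_dir, new_name, ino))
        return false;
    if (drop_old)
        dentry_unlink(old_parent, old_name);
    return true;
}
```

In the `drop_old == false` case the table still answers a lookup of the old name with an inode that now lives elsewhere on disk, which is the kind of stale resolution the comment proposes to eliminate.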

Comment 2 Dimitri Vanoverbeke 2010-07-14 09:25:38 UTC
I seem to have the same issue with 3.0.2 when testing the self-heal function (killing the second storage node while writing from the client):


[2010-07-14 12:57:25] E [posix.c:477:posix_lookup] posix: post-operation lstat on parent of .landfill/vhosts-blade-user.samh.include failed: No such file or directory
[2010-07-14 12:57:25] E [posix.c:477:posix_lookup] posix: post-operation lstat on parent of .landfill/vhosts-blade-user.blade.wildcards.include failed: No such file or directory
[2010-07-14 12:57:25] E [posix.c:477:posix_lookup] posix: post-operation lstat on parent of .landfill/vhosts-blade-user.bauwensp2.wildcards.include failed: No such file or directory
[2010-07-14 12:57:25] E [posix.c:477:posix_lookup] posix: post-operation lstat on parent of .landfill/vhosts-blade-user.beutenb.wildcards.include failed: No such file or directory
[2010-07-14 13:38:22] E [posix.c:2331:posix_open] posix: open on /data/export/.landfill/vhosts-apoc-user.rommess2.include: No such file or directory
[2010-07-14 13:38:45] E [posix.c:2331:posix_open] posix: open on /data/export/.landfill/vhosts-apoc-user.rommess2.include: No such file or directory
[2010-07-14 13:46:53] E [posix.c:2331:posix_open] posix: open on /data/export/.landfill/vhosts-apoc-user.rommess2.include: No such file or directory

Comment 3 Amar Tumballi 2010-10-05 06:01:14 UTC
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. As we are just a week away from the GA release, we would like you to test this bug against the 3.1.0 RC releases and let us know if it is fixed.

Comment 4 Amar Tumballi 2010-10-05 08:35:54 UTC
Avati, can you check this and update the status accordingly?

Comment 5 Amar Tumballi 2010-10-05 09:24:19 UTC
With the 3.1 releases (check the latest codebase), this bug no longer applies. Please update to a 3.1.x release.


Note You need to log in before you can comment on or make changes to this bug.