Bug 761796 (GLUSTER-64)

Summary: Using cluster/replicate as a cluster/unify namespace brick crashes
Product: [Community] GlusterFS
Reporter: Basavanagowda Kanur <gowda>
Component: unify
Assignee: Amar Tumballi <amarts>
Status: CLOSED WONTFIX
QA Contact:
Severity: low
Docs Contact:
Priority: low
Version: mainline
CC: anush, gluster-bugs, gowda, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Regression: RTNR Mount Type: ---
Bug Depends On: 762141    
Bug Blocks:    

Description Basavanagowda Kanur 2009-06-25 06:41:23 UTC
[Migrated from savannah BTS] - bug 26776 [https://savannah.nongnu.org/bugs/?26776]

Wed 10 Jun 2009 05:14:11 PM GMT, original submission by Jonathan Steffan <damaestro>:

Using a cluster/replicate brick for the namespace of cluster/unify causes a crash shortly after data population starts.

How to crash:

1.) Start populating data.
2.) for i in {1..1000}; do ls -R /path/to/glusterfsmount; done
3.) Wait for a short period.
4.) Client disconnects from servers and the filesystem is left in an unusable state.
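
Steps 2 through 4 can be wrapped in a small helper that counts how many listing passes fail, which makes the point of disconnect easier to spot. This is only a sketch: `stress_ls` is a hypothetical name, the mount path is the placeholder from the report, and a failing `ls -R` is assumed to be the visible symptom of the client disconnecting.

```shell
# Hypothetical helper: run `ls -R` on a mount point N times and print
# how many passes failed. The body runs in a subshell so its variables
# do not leak into the caller.
stress_ls() (
    mount="$1"
    passes="${2:-1000}"   # the report uses 1000 passes
    failures=0
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Assumption: once the client disconnects, the listing exits non-zero.
        if ! ls -R "$mount" > /dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
)
```

Calling `stress_ls /path/to/glusterfsmount 1000` while data is being populated prints 0 if every pass succeeded and a positive count once listings start to fail.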

--------------------------------------------------------------------------------
Wed 10 Jun 2009 11:38:32 PM GMT, comment #1 by Jonathan Steffan <damaestro>:

This also happens when using a cluster/afr volume for the namespace brick of a cluster/unify.

--------------------------------------------------------------------------------
Tue 16 Jun 2009 04:21:57 PM GMT, comment #2 by Jonathan Steffan <damaestro>:

Okay, this looks like it's less an issue about using these translators together and more about something crashing/leaking in the client. Using a replicated namespace brick just makes everything happen faster. The crash happens after a collection of the following is seen on the server side:

[2009-06-15 19:54:41] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /mpath/to/some/sort/of/content/20812.jpg/30 failed: Not a directory
[2009-06-15 19:55:05] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/real_estate.jpg/0 failed: Not a directory
[2009-06-15 19:55:08] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /mpath/to/some/sort/of/content/real_estate.jpg/600 failed: Not a directory
[2009-06-15 19:58:55] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/720music.jpg/oliveCOVER.jpg failed: Not a directory
[2009-06-15 19:59:09] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /mpath/to/some/sort/of/content/0/20812.jpg/30 failed: Not a directory
[2009-06-15 19:59:53] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/34971.jpg/400 failed: Not a directory
[2009-06-15 20:11:14] E [posix.c:270:posix_lookup] iops_lun_disk0: lstat on /path/to/some/sort/of/content/headlines failed: Not a directory
[..... many lstat .....]
[2009-06-15 20:11:15] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/headlines failed: Not a directory
[2009-06-15 20:28:38] E [posix.c:382:posix_opendir] unify_lun_disk0: opendir failed on /path/to/some/sort/of/other/content/bang_4.jpg: Not a directory
[2009-06-15 20:31:01] E [posix.c:1298:posix_utimens] iops_lun_disk0: utimes on /path/to/disk-ld1/path/to/some/sort/of/content/90/.112294.jpg.pBk95v failed: No such file or directory
[2009-06-15 21:19:12] E [posix.c:1147:posix_chmod] iops_lun_disk0: chmod on /path/to/some/sort/of/content/70/.279.jpg.XARvhk failed: No such file or directory
[2009-06-15 21:21:47] E [posix.c:1147:posix_chmod] iops_lun_disk0: chmod on /mpath/to/some/sort/of/content/80/.1180.jpg.49pYSO failed: No such file or directory
[2009-06-15 19:40:11] E [posix.c:382:posix_opendir] unify_lun_disk0: opendir failed on /mpath/to/some/sort/of/content/0/60/31068.jpg: Not a directory
[2009-06-15 19:54:41] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/20812.jpg/30 failed: Not a directory
[2009-06-15 19:55:05] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/real_estate.jpg/0 failed: Not a directory
[2009-06-15 19:55:08] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/real_estate.jpg/600 failed: Not a directory
[2009-06-15 19:58:55] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/720music.jpg/oliveCOVER.jpg failed: Not a directory
[2009-06-15 19:59:09] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/20812.jpg/30 failed: Not a directory
[2009-06-15 19:59:53] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/34971.jpg/400 failed: Not a directory
[2009-06-15 20:07:58] E [posix.c:1147:posix_chmod] iops_lun_disk0: chmod on /path/to/some/sort/of/content/.81445.jpg.gQiXuD failed: No such file or directory
[2009-06-15 20:11:15] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/headlines failed: Not a directory
[2009-06-15 20:26:12] E [posix.c:1147:posix_chmod] iops_lun_disk0: chmod on /mpath/to/some/sort/of/content/0/.105545.jpg.ZeY2ga failed: No such file or directory
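
To see which operations dominate an excerpt like the one above, the entries can be tallied by function, volume, and error text. The helper below is a rough sketch; `summarize_errors` is a hypothetical name, and the field layout (`[timestamp] E [file:line:function] volume: ... failed: reason`) is assumed from the lines quoted here rather than from any GlusterFS log specification.

```shell
# Hypothetical helper: read server log lines on stdin and print a count
# per (function, volume, error reason), most frequent first.
summarize_errors() {
    # Keep only lines that match the assumed error layout and reduce each
    # one to "function volume reason"; other lines are dropped by -n.
    sed -n 's/^\[[^]]*\] E \[[^]:]*:[0-9]*:\([^]]*\)\] \([^:]*\): .* failed: \(.*\)$/\1 \2 \3/p' |
        sort | uniq -c | sort -rn
}
```

It would be run as `summarize_errors < server.log`, where `server.log` stands in for the brick's log file (the report does not give its path).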

It now looks possible that the issue is actually with the switch scheduler screwing up.

volume main_storage
  type cluster/unify
  # unify_namespace is a cluster/replicate to two servers
  option namespace unify_namespace
  option scheduler switch
  # Anything not defined here ends up in 'bulk'
  option scheduler.switch.case jpg:iops;gif:iops;png:iops;flv:iops;swf:iops;css:iops;xml:iops;htm:iops;wav:bulkaudio
  subvolumes iops bulkaudio bulk
end-volume

--------------------------------------------------------------------------------
Tue 16 Jun 2009 04:27:36 PM GMT, comment #3 by Jonathan Steffan <damaestro>:

volume main_storage
  type cluster/unify
  # unify_namespace is a cluster/replicate to two servers
  option namespace unify_namespace
  option scheduler switch
  # Anything not defined here ends up in 'bulk'
  option scheduler.switch.case *jpg*:iops;*gif*:iops;*png*:iops;*flv*:iops;*swf*:iops;*css*:iops;*xml*:iops;*htm*:iops;*LOFI.mp3*:iops;*wav*:bulkaudio
  subvolumes iops bulkaudio bulk
end-volume
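
The routing that this `scheduler.switch.case` option describes can be illustrated with a short shell sketch. It assumes shell-style glob matching of each pattern against the file name, checked in order, with the first match winning and unmatched names going to a default subvolume (`bulk` here, per the comment in the volfile). Those semantics are an assumption for illustration, and `switch_target` is a hypothetical helper, not GlusterFS code.

```shell
# Hypothetical helper: pick a subvolume for a file name from a
# "pattern:subvol;pattern:subvol" string, falling back to a default.
# The body runs in a subshell so IFS and `set -f` stay local.
switch_target() (
    name="$1"
    cases="$2"
    default="$3"
    set -f        # do not expand * in the patterns against the current directory
    IFS=';'
    for rule in $cases; do
        pattern=${rule%%:*}   # text before the first colon
        subvol=${rule#*:}     # text after the first colon
        # An unquoted variable in a case pattern is matched as a glob.
        case "$name" in
            $pattern) echo "$subvol"; exit 0 ;;
        esac
    done
    echo "$default"
)
```

Under these assumed semantics, `switch_target real_estate.jpg '*jpg*:iops;*wav*:bulkaudio' bulk` selects `iops`, and so does a temporary name such as `.112294.jpg.pBk95v` from the log, because `*jpg*` matches anywhere in the name. A bare pattern like `jpg` would only match a file named exactly `jpg`.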

--------------------------------------------------------------------------------
Tue 16 Jun 2009 06:04:27 PM GMT, comment #4 by Jonathan Steffan <damaestro>:

http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=22

--------------------------------------------------------------------------------
Thu 18 Jun 2009 03:32:15 PM GMT, comment #5 by Jonathan Steffan <damaestro>:

We have removed the cluster/unify translator and are just going with multiple mounts to segment content. Everything is working now, so I suspect this is either an issue with running cluster/replicate or cluster/distribute under the unify translator, or an issue with the switch scheduler, for which I have opened another bug.

Comment 1 Amar Tumballi 2009-11-26 00:45:53 UTC
Adding a dependency on bug-409. Once that is committed, we can close all unify-related bugs.