[Migrated from savannah BTS] - bug 26776 [https://savannah.nongnu.org/bugs/?26776]

Wed 10 Jun 2009 05:14:11 PM GMT, original submission by Jonathan Steffan <damaestro>:

Using a cluster/replicate brick for the namespace of cluster/unify leads to a crash shortly after data population.

How to crash:
1.) Start populating data.
2.) for i in {1..1000}; do ls -R /path/to/glusterfsmount; done
3.) Wait for a short period.
4.) Client disconnects from servers and the filesystem is left in an unusable state.

--------------------------------------------------------------------------------
Wed 10 Jun 2009 11:38:32 PM GMT, comment #1 by Jonathan Steffan <damaestro>:

This also happens when using a cluster/afr volume for the namespace brick of a cluster/unify.

--------------------------------------------------------------------------------
Tue 16 Jun 2009 04:21:57 PM GMT, comment #2 by Jonathan Steffan <damaestro>:

Okay, this looks like it's less an issue about using these translators together and more about something crashing/leaking in the client. Using a replicated namespace brick just makes everything happen faster.
The crash happens after a collection of the following is seen on the server side:

[2009-06-15 19:54:41] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /mpath/to/some/sort/of/content/20812.jpg/30 failed: Not a directory
[2009-06-15 19:55:05] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/real_estate.jpg/0 failed: Not a directory
[2009-06-15 19:55:08] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /mpath/to/some/sort/of/content/real_estate.jpg/600 failed: Not a directory
[2009-06-15 19:58:55] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/720music.jpg/oliveCOVER.jpg failed: Not a directory
[2009-06-15 19:59:09] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /mpath/to/some/sort/of/content/0/20812.jpg/30 failed: Not a directory
[2009-06-15 19:59:53] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/34971.jpg/400 failed: Not a directory
[2009-06-15 20:11:14] E [posix.c:270:posix_lookup] iops_lun_disk0: lstat on /path/to/some/sort/of/content/headlines failed: Not a directory
[..... many lstat .....]
[2009-06-15 20:11:15] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/headlines failed: Not a directory
[2009-06-15 20:28:38] E [posix.c:382:posix_opendir] unify_lun_disk0: opendir failed on /path/to/some/sort/of/other/content/bang_4.jpg: Not a directory
[2009-06-15 20:31:01] E [posix.c:1298:posix_utimens] iops_lun_disk0: utimes on /path/to/disk-ld1/path/to/some/sort/of/content/90/.112294.jpg.pBk95v failed: No such file or directory
[2009-06-15 21:19:12] E [posix.c:1147:posix_chmod] iops_lun_disk0: chmod on /path/to/some/sort/of/content/70/.279.jpg.XARvhk failed: No such file or directory
[2009-06-15 21:21:47] E [posix.c:1147:posix_chmod] iops_lun_disk0: chmod on /mpath/to/some/sort/of/content/80/.1180.jpg.49pYSO failed: No such file or directory
[2009-06-15 19:40:11] E [posix.c:382:posix_opendir] unify_lun_disk0: opendir failed on /mpath/to/some/sort/of/content/0/60/31068.jpg: Not a directory
[2009-06-15 19:54:41] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/20812.jpg/30 failed: Not a directory
[2009-06-15 19:55:05] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/real_estate.jpg/0 failed: Not a directory
[2009-06-15 19:55:08] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/real_estate.jpg/600 failed: Not a directory
[2009-06-15 19:58:55] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/720music.jpg/oliveCOVER.jpg failed: Not a directory
[2009-06-15 19:59:09] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/20812.jpg/30 failed: Not a directory
[2009-06-15 19:59:53] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/34971.jpg/400 failed: Not a directory
[2009-06-15 20:07:58] E [posix.c:1147:posix_chmod] iops_lun_disk0: chmod on /path/to/some/sort/of/content/.81445.jpg.gQiXuD failed: No such file or directory
[2009-06-15 20:11:15] E [posix.c:270:posix_lookup] unify_lun_disk0: lstat on /path/to/some/sort/of/content/headlines failed: Not a directory
[2009-06-15 20:26:12] E [posix.c:1147:posix_chmod] iops_lun_disk0: chmod on /mpath/to/some/sort/of/content/0/.105545.jpg.ZeY2ga failed: No such file or directory

It looks possible that the issue is actually the switch scheduler screwing up.

volume main_storage
  type cluster/unify
  # unify_namespace is a cluster/replicate to two servers
  option namespace unify_namespace
  option scheduler switch
  # Anything not defined here ends up in 'bulk'
  option scheduler.switch.case jpg:iops;gif:iops;png:iops;flv:iops;swf:iops;css:iops;xml:iops;htm:iops;wav:bulkaudio
  subvolumes iops bulkaudio bulk
end-volume

--------------------------------------------------------------------------------
Tue 16 Jun 2009 04:27:36 PM GMT, comment #3 by Jonathan Steffan <damaestro>:

volume main_storage
  type cluster/unify
  # unify_namespace is a cluster/replicate to two servers
  option namespace unify_namespace
  option scheduler switch
  # Anything not defined here ends up in 'bulk'
  option scheduler.switch.case *jpg*:iops;*gif*:iops;*png*:iops;*flv*:iops;*swf*:iops;*css*:iops;*xml*:iops;*htm*:iops;*LOFI.mp3*:iops;*wav*:bulkaudio
  subvolumes iops bulkaudio bulk
end-volume

--------------------------------------------------------------------------------
Tue 16 Jun 2009 06:04:27 PM GMT, comment #4 by Jonathan Steffan <damaestro>:

http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=22

--------------------------------------------------------------------------------
Thu 18 Jun 2009 03:32:15 PM GMT, comment #5 by Jonathan Steffan <damaestro>:

We have removed the cluster/unify translator and are just going with multiple mounts to segment content. Everything is working now, so I suspect this is an issue with running cluster/replicate or cluster/distribute under the unify translator, or an issue with the switch scheduler, for which I have opened another bug.
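
For anyone comparing the two scheduler.switch.case values quoted in this report (bare `jpg` versus `*jpg*`), here is a minimal sketch of how such a rule string could route file names. It assumes glob-style matching against the file name and first-match-wins ordering; that is an assumption for illustration, not a statement of how the switch scheduler is implemented. The helper names `parse_switch_case` and `pick_subvolume` are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_switch_case(spec):
    """Split 'pattern:subvol;pattern:subvol' into ordered (pattern, subvol) pairs."""
    rules = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        pattern, _, subvolume = entry.partition(":")
        rules.append((pattern, subvolume))
    return rules


def pick_subvolume(name, rules, default="bulk"):
    """Return the subvolume of the first matching pattern (assumed semantics)."""
    for pattern, subvolume in rules:
        if fnmatchcase(name, pattern):
            return subvolume
    # Mirrors the volfile comment: anything not defined ends up in 'bulk'.
    return default


bare = parse_switch_case("jpg:iops;gif:iops;wav:bulkaudio")
globbed = parse_switch_case("*jpg*:iops;*gif*:iops;*wav*:bulkaudio")

for name in ("20812.jpg", ".279.jpg.XARvhk", "notes.txt"):
    print(name, pick_subvolume(name, bare), pick_subvolume(name, globbed))
```

Under these assumptions a bare `jpg` pattern matches only a name that is exactly `jpg`, while `*jpg*` also matches names such as `.279.jpg.XARvhk` that appear in the server logs. Whether that has any bearing on the crash is not established here.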
Adding a dependency on bug-409; once it is committed, we can close all unify-related bugs.