Bug 764110 (GLUSTER-2378) - race provokes ELOOP in "entries" rpc call
Summary: race provokes ELOOP in "entries" rpc call
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-2378
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Csaba Henk
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-02-07 06:04 UTC by Csaba Henk
Modified: 2015-12-01 16:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTA
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Csaba Henk 2011-02-07 06:04:22 UTC
Starting with a top-level directory "d" on the master side, then repeatedly running

# rmdir d && ln -s d d
# rm d && mkdir d

will eventually provoke the following error:

[2011-02-07 06:38:48.525156] I [master:92:crawl] GMaster: crawling...
[2011-02-07 06:38:48.531614] E [repce(slave):73:worker] <top>: call failed:
Traceback (most recent call last):
  File "repce.py", line 69, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "resource.py", line 95, in entries
    return os.listdir(path)
OSError: [Errno 40] Too many levels of symbolic links: './d'
[2011-02-07 06:38:48.537357] E [repce:136:__call__] RepceClient: call 4007:139868164290304:1297057128.53 (entries) failed on peer with OSError
[2011-02-07 06:38:48.537626] E [gsyncd:140:main] <top>: FAIL:
Traceback (most recent call last):
  File "gsyncd.py", line 136, in main
    main_i()
  File "gsyncd.py", line 230, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/csaba/work/devel/gsyncd.py/resource.py", line 336, in service_loop
    GMaster(self, args[0]).crawl()
  File "/csaba/work/devel/gsyncd.py/master.py", line 63, in __init__
    self.crawl()
  File "/csaba/work/devel/gsyncd.py/master.py", line 170, in crawl
    True)[-1], blame=e) == False:
  File "/csaba/work/devel/gsyncd.py/master.py", line 136, in indulgently
    return fnc(e)
  File "/csaba/work/devel/gsyncd.py/master.py", line 169, in <lambda>
    self.crawl(e, xte),
  File "/csaba/work/devel/gsyncd.py/master.py", line 120, in crawl
    dem, des = ( x.server.entries(path) for x in (self.master, self.slave) )
  File "/csaba/work/devel/gsyncd.py/master.py", line 120, in <genexpr>
    dem, des = ( x.server.entries(path) for x in (self.master, self.slave) )
  File "/csaba/work/devel/gsyncd.py/repce.py", line 148, in __call__
    return self.ins(self.meth, *a)
  File "/csaba/work/devel/gsyncd.py/repce.py", line 137, in __call__
    raise res
OSError: [Errno 40] Too many levels of symbolic links: './d'
failed with OSError.
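A minimal sketch of how a slave-side `entries` call could tolerate this race, assuming the crawler treats ENOTDIR/ENOENT as "the entry changed under us, skip it". The later traceback in comment 2 shows `entries` raising ENOTDIR, but the code below is an illustrative re-implementation, not the actual gsyncd fix; `safe_entries` is a hypothetical name.

```python
import errno
import os
import stat
import tempfile


def _enotdir(path):
    return OSError(errno.ENOTDIR, os.strerror(errno.ENOTDIR), path)


def entries(path):
    """List `path` without descending through a symlink (sketch, not gsyncd's code)."""
    # lstat does not follow symlinks, so a self-referencing link d -> d is seen as a link
    if not stat.S_ISDIR(os.lstat(path).st_mode):
        raise _enotdir(path)
    try:
        return os.listdir(path)
    except OSError as e:
        # The directory may be swapped for a symlink between lstat() and listdir():
        # report it like the static case instead of leaking ELOOP to the caller.
        if e.errno == errno.ELOOP:
            raise _enotdir(path) from e
        raise


def safe_entries(path):
    """Crawler-side wrapper (assumed policy): skip entries that changed concurrently."""
    try:
        return entries(path)
    except OSError as e:
        if e.errno in (errno.ENOTDIR, errno.ENOENT):
            return None  # left for a later crawl to pick up
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        d = os.path.join(root, "d")
        os.symlink("d", d)          # the slave-side state from the report: d -> d
        print(safe_entries(d))      # None instead of an unhandled ELOOP
```

Converting ELOOP to ENOTDIR keeps a single "not a directory, skip it" path in the caller, whichever side of the race the crawler happens to observe.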

Comment 1 Csaba Henk 2011-02-23 11:27:48 UTC
The error is reproducible with the following static layout:

master looks like:
# ls -l /mnt/gluster0
/mnt/gluster0:
total 8
drwxr-xr-x 2 root root 4096 Feb  7 06:38 d

slave looks like:
# ls -l /scratch/fex0
/scratch/fex0:
total 0
lrwxrwxrwx 1 root root 1 Feb  7 06:38 d -> d
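The slave half of this layout can be reproduced locally without GlusterFS. A symlink named `d` whose target is `d` resolves to itself, so listing it fails with ELOOP (errno 40 on Linux), matching the slave-side traceback. The helper `listdir_errno` is made up for this sketch, and a temporary directory stands in for `/scratch/fex0`.

```python
import errno
import os
import tempfile


def listdir_errno(path):
    """Return the errno that os.listdir(path) fails with, or None on success."""
    try:
        os.listdir(path)
    except OSError as e:
        return e.errno
    return None


# A temporary directory stands in for the slave root (/scratch/fex0 in the report).
with tempfile.TemporaryDirectory() as slave_root:
    d = os.path.join(slave_root, "d")
    os.symlink("d", d)            # relative target: d -> d, a link to itself
    code = listdir_errno(d)
    print(errno.errorcode[code])  # ELOOP
```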

Comment 2 Lakshmipathi G 2011-03-21 08:22:18 UTC
(In reply to comment #0)
> Starting with a top-level directory "d" on master side, then iterating
> 
> # rmdir d && ln -s d d
> # rm d && mkdir d
> 

Tried the same against commit ID "6d8077eb9c7129f4bdb352132e565906568dfb66". The following error message appeared, but it didn't stop or kill the master/slave session.
It continued to gsync new files to the slave even after this message.


[root@ip-10-170-205-102 mntpt]# [2011-03-21 07:14:17.460310] E [repce(slave):76:exception] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 72, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/local/libexec/glusterfs/python/syncdaemon/resource.py", line 98, in entries
    raise OSError(ENOTDIR, os.strerror(ENOTDIR))
OSError: [Errno 20] Not a directory



Comment 3 Csaba Henk 2011-03-22 03:41:06 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > Starting with a top-level directory "d" on master side, then iterating
> > 
> > # rmdir d && ln -s d d
> > # rm d && mkdir d
> > 
> 
> tried the same against commit-id "6d8077eb9c7129f4bdb352132e565906568dfb66" - Following error message appeared.but this didnt stop/kill master-slave.
> It continued to gsync any new files to slave even after this message.
> 
> 
> [root@ip-10-170-205-102 mntpt]# [2011-03-21 07:14:17.460310] E [repce(slave):76:exception] <top>: call failed: 
> Traceback (most recent call last):
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 72, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/local/libexec/glusterfs/python/syncdaemon/resource.py", line 98, in entries
>     raise OSError(ENOTDIR, os.strerror(ENOTDIR))
> OSError: [Errno 20] Not a directory

That's fine. These errors are handled. I admit it's somewhat misleading that they are reported as error-level messages, but that's how the Python logging API can be used to display exceptions (I should probably look into it a bit and implement exception printing for other log levels).
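For reference, the standard `logging` module can attach a traceback at any level: `Logger.exception()` always logs at ERROR, whereas `Logger.log(level, msg, exc_info=True)` accepts any level. A minimal sketch that reports a handled exception as a warning; the logger name and the `log_exception_at` helper are hypothetical, not gsyncd's code.

```python
import errno
import io
import logging


def log_exception_at(logger, level, msg):
    """Log `msg` at `level` with the traceback of the exception being handled."""
    # exc_info=True makes logging append the current exception's traceback
    logger.log(level, msg, exc_info=True)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log = logging.getLogger("gsyncd-sketch")  # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False                     # keep the demo output out of the root logger

try:
    raise OSError(errno.ENOTDIR, "Not a directory")
except OSError:
    log_exception_at(log, logging.WARNING, "call failed (handled):")

print(stream.getvalue(), end="")
```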

So please just check whether the synchronization as such is successful, i.e., whether the slave content corresponds to the master content. If so, all is fine.
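One way to script that check, as a rough sketch: snapshot both trees without following symlinks, record each entry's type (plus the symlink target or file size), and compare. The function names are hypothetical, and comparing sizes rather than contents is a simplifying assumption; a checksum would be stricter. In the setup from comment 1 it would be run against `/mnt/gluster0` and `/scratch/fex0`.

```python
import os
import stat
import tempfile


def snapshot(root):
    """Map each path under `root` (relative) to its type, never following symlinks."""
    kinds = {}
    # os.walk does not descend into symlinked directories unless followlinks=True
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISLNK(st.st_mode):
                kind = ("symlink", os.readlink(full))
            elif stat.S_ISDIR(st.st_mode):
                kind = ("dir",)
            else:
                kind = ("file", st.st_size)
            kinds[os.path.relpath(full, root)] = kind
    return kinds


def trees_match(master_root, slave_root):
    """True if both trees have the same entries with the same types."""
    return snapshot(master_root) == snapshot(slave_root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as master, tempfile.TemporaryDirectory() as slave:
        os.mkdir(os.path.join(master, "d"))        # master: directory d
        os.symlink("d", os.path.join(slave, "d"))  # slave: stale d -> d
        print(trees_match(master, slave))          # False until the slave catches up
```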

Comment 4 Lakshmipathi G 2011-04-15 05:27:49 UTC
Verified with 3.2.0qa12. After running the following on the master:

# mkdir d; for i in {1..50}; do rmdir d && ln -s d d; rm d && mkdir d; done

gsyncd works properly.
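The same stress loop expressed in Python, as a sketch for scripting the check. The helper name is hypothetical; it mirrors the shell commands one-to-one and assumes `path` starts out as an empty directory. To provoke the race it would have to run inside the master mount while gsyncd is crawling.

```python
import os
import tempfile


def toggle_dir_and_symlink(path, iterations):
    """Mirror the shell loop: swap directory `path` for a symlink to itself and back.

    Hypothetical helper; assumes `path` starts out as an empty directory.
    """
    target = os.path.basename(path)  # relative target, so the link points at itself
    for _ in range(iterations):
        os.rmdir(path)               # rmdir d
        os.symlink(target, path)     # ln -s d d
        os.remove(path)              # rm d (removes the link itself)
        os.mkdir(path)               # mkdir d


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        d = os.path.join(top, "d")
        os.mkdir(d)                  # mkdir d
        toggle_dir_and_symlink(d, 50)
        print(os.path.isdir(d) and not os.path.islink(d))
```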

