Bug 1529916 - glusterfind doesn't terminate when it fails
Summary: glusterfind doesn't terminate when it fails
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterfind
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Shwetha K Acharya
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-12-31 01:41 UTC by nh2
Modified: 2019-11-11 13:02 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-11 12:40:07 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description nh2 2017-12-31 01:41:26 UTC
Description of problem:

Gluster 3.12.3.

When a `glusterfind pre` invocation fails due to unrecoverable errors, `glusterfind` doesn't terminate; it just prints some more cryptic errors and then hangs:

    [root@node-1:~]# glusterfind pre --no-encode --regenerate-outfile cdn-rsync-myvol myvol /tmp/cdn-rsync-myvol.outfile
    10.0.0.2 - pre failed; stdout (including remote stderr):
    Connection to 10.0.0.2 closed.

    stderr:
    Fail to create dir /var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-013544-927166-Nl7zE1/eef7fa430d9ab60e74ec72b66629f783f9fb37fa: [Errno 28] No space left on device: '/var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-013544-927166-Nl7zE1'


    10.0.0.3 - pre failed; stdout (including remote stderr):
    Connection to 10.0.0.3 closed.

    stderr:
    /data/glusterfs/myvol/brick1/brick Error during Changelog Crawl: [Errno 22] Invalid argument


In this case, there is "No space left on device" on the remote side, plus another very unhelpful error ("Invalid argument" with no further detail); nevertheless, `glusterfind` does not terminate right away, but continues until the next error:

    10.0.0.1 - pre failed; stdout (including remote stderr):
    /data/glusterfs/myvol-production/brick1/brick Error during Changelog Crawl: [Errno 22] Invalid argument

    stderr:

Comment 1 nh2 2017-12-31 01:49:17 UTC
`glusterfind`, when it hangs, also seems to ignore Ctrl+C; only an explicit `kill $(pidof glusterfind)` shuts it down.
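The Ctrl+C symptom is consistent with a main thread blocked in an untimed join on worker threads. A minimal sketch (again hypothetical, not glusterfind's code) of a join pattern that stays responsive to KeyboardInterrupt: daemon workers polling a stop flag, joined in a loop with a short timeout.

```python
import threading

def worker(stop):
    # Stand-in for a long-running per-node task; polls a stop flag so it
    # can be told to quit.
    while not stop.wait(0.05):
        pass

def wait_for_workers(threads, stop):
    """Join worker threads without blocking Ctrl+C.

    Joining with a short timeout in a loop keeps the main thread able to
    receive KeyboardInterrupt promptly; on interrupt, set the stop flag
    and re-raise so the tool actually exits.
    """
    try:
        while any(t.is_alive() for t in threads):
            for t in threads:
                t.join(timeout=0.1)
    except KeyboardInterrupt:
        stop.set()
        raise
```

Marking the workers `daemon=True` additionally guarantees they cannot keep the interpreter alive if the main thread dies for any other reason.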

Comment 2 Sunny Kumar 2019-09-26 09:56:35 UTC
(In reply to nh2 from comment #0)
> stderr:
> Fail to create dir /var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-013544-927166-Nl7zE1/eef7fa430d9ab60e74ec72b66629f783f9fb37fa: [Errno 28] No space left on device: '/var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-013544-927166-Nl7zE1'
> 
Do you see the error for the same dir every time, or is the dir different each time?
 
Are you still seeing this in a recent release?

Comment 3 Sunny Kumar 2019-11-11 12:40:07 UTC
Closing this bug as I am seeing this on latest release.

Comment 4 nh2 2019-11-11 13:02:29 UTC
I cannot easily answer this question, as my deployment has switched to Ceph.

Did you mean "I am NOT seeing this on latest release"?

