Bug 1529916

Summary: glusterfind doesn't terminate when it fails
Product: [Community] GlusterFS
Reporter: nh2 <nh2-redhatbugzilla>
Component: glusterfind
Assignee: Shwetha K Acharya <sacharya>
Status: CLOSED WORKSFORME
QA Contact:
Severity: high
Docs Contact:
Priority: medium
Version: mainline
CC: avishwan, bugs, khiremat, sunkumar
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-11 12:40:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description nh2 2017-12-31 01:41:26 UTC
Description of problem:

Gluster 3.12.3.

When a `glusterfind pre` invocation fails due to unrecoverable errors, `glusterfind` doesn't terminate; it just prints some more confusing errors and then hangs:

    [root@node-1:~]# glusterfind pre --no-encode --regenerate-outfile cdn-rsync-myvol myvol /tmp/cdn-rsync-myvol.outfile
    10.0.0.2 - pre failed; stdout (including remote stderr):
    Connection to 10.0.0.2 closed.

    stderr:
    Fail to create dir /var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-013544-927166-Nl7zE1/eef7fa430d9ab60e74ec72b66629f783f9fb37fa: [Errno 28] No space left on device: '/var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-013544-927166-Nl7zE1'


    10.0.0.3 - pre failed; stdout (including remote stderr):
    Connection to 10.0.0.3 closed.

    stderr:
    /data/glusterfs/myvol/brick1/brick Error during Changelog Crawl: [Errno 22] Invalid argument


In this case there is a "No space left on device" error on the remote side, plus another very unhelpful error ("Invalid argument", with no further detail); nevertheless `glusterfind` does not terminate right away, but continues a bit longer until the next error:

    10.0.0.1 - pre failed; stdout (including remote stderr):
    /data/glusterfs/myvol-production/brick1/brick Error during Changelog Crawl: [Errno 22] Invalid argument

    stderr:
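
For illustration only, here is a minimal Python sketch of the fail-fast behaviour one would expect in this situation: run the per-node `pre` step and abort the whole run on the first node failure instead of moving on to the next node and eventually hanging. The node list is taken from the output above; the remote command and all helper names are hypothetical and are not glusterfind's actual internals.

    # fail_fast_pre.py -- illustrative sketch only, not glusterfind code.
    import subprocess
    import sys

    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # brick hosts from the output above

    def run_pre_on_node(node):
        # Hypothetical per-node step run over ssh; stdout/stderr are captured
        # so they can be reported if the step fails.
        return subprocess.run(
            ["ssh", node, "true"],  # placeholder for the real node-side command
            capture_output=True, text=True, timeout=300,
        )

    def main():
        for node in NODES:
            result = run_pre_on_node(node)
            if result.returncode != 0:
                # Fail fast: report this node's error and terminate with a
                # non-zero exit code instead of continuing to the next node.
                sys.stderr.write(f"{node} - pre failed:\n{result.stderr}\n")
                sys.exit(1)
        print("pre completed on all nodes")

    if __name__ == "__main__":
        main()

(A hung node-side step would raise subprocess.TimeoutExpired here and likewise abort the run rather than blocking forever.)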

Comment 1 nh2 2017-12-31 01:49:17 UTC
When it hangs, `glusterfind` also seems to ignore Ctrl+C; only an explicit `kill $(pidof glusterfind)` shuts it down.
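
Until this is fixed, a possible workaround sketch in the same spirit as the `kill` above: drive `glusterfind pre` from a small Python wrapper with a hard timeout, so a hung run gets killed (SIGKILL cannot be ignored) instead of blocking forever. The 30-minute limit is an arbitrary assumption; note also that this only kills the glusterfind process started by the wrapper, so node-side processes it spawned over ssh may still need manual cleanup.

    # run_glusterfind_pre.py -- workaround sketch, not part of glusterfind.
    import subprocess

    CMD = [
        "glusterfind", "pre", "--no-encode", "--regenerate-outfile",
        "cdn-rsync-myvol", "myvol", "/tmp/cdn-rsync-myvol.outfile",
    ]

    try:
        # subprocess.run() kills the child and waits for it when the timeout
        # expires, then raises TimeoutExpired; check=True turns a non-zero
        # exit code into CalledProcessError.
        subprocess.run(CMD, check=True, timeout=30 * 60)
    except subprocess.TimeoutExpired:
        print("glusterfind pre hung and was killed after 30 minutes")
        raise
    except subprocess.CalledProcessError as exc:
        print(f"glusterfind pre failed with exit code {exc.returncode}")
        raise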

Comment 2 Sunny Kumar 2019-09-26 09:56:35 UTC
(In reply to nh2 from comment #0)
> Description of problem:
> 
> Gluster 3.12.3.
> 
> When a `glusterfind pre` invocation fails due to unrecoverable errors,
> `glusterfind` doesn't terminate, it just prints some more weird errors and
> then hangs:
> 
>     [root@node-1:~]# glusterfind pre --no-encode --regenerate-outfile
> cdn-rsync-myvol myvol /tmp/cdn-rsync-myvol.outfile
>     10.0.0.2 - pre failed; stdout (including remote stderr):
>     Connection to 10.0.0.2 closed.
> 
>     stderr:
>     Fail to create dir
> /var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-
> 013544-927166-Nl7zE1/eef7fa430d9ab60e74ec72b66629f783f9fb37fa: [Errno 28] No
> space left on device:
> '/var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-
> 013544-927166-Nl7zE1'
> 
> 
>     10.0.0.3 - pre failed; stdout (including remote stderr):
>     Connection to 10.0.0.3 closed.
> 
>     stderr:
>     /data/glusterfs/myvol/brick1/brick Error during Changelog Crawl: [Errno
> 22] Invalid argument
> 
Do you see the error for the same dir each time, or is the dir different every time?
 
> 
> In this case, there is "No space left on device" on the remote side, and
> another very unhelpful error ("Invalid argument" without further info);
> nevertheless `glusterfind` does not terminate right then; it continues a bit
> more until the next error:
> 
>     10.0.0.1 - pre failed; stdout (including remote stderr):
>     /data/glusterfs/myvol-production/brick1/brick Error during Changelog
> Crawl: [Errno 22] Invalid argument
> 
>     stderr:

Are you still seeing this in a recent release?

Comment 3 Sunny Kumar 2019-11-11 12:40:07 UTC
Closing this bug as I am seeing this on latest release.

Comment 4 nh2 2019-11-11 13:02:29 UTC
I cannot easily answer this question as my deployment switched to Ceph.

Did you mean "I am NOT seeing this on latest release"?