Bug 1529916

Summary: glusterfind doesn't terminate when it fails
Product: [Community] GlusterFS
Reporter: nh2 <nh2-redhatbugzilla>
Component: glusterfind
Assignee: Shwetha K Acharya <sacharya>
Status: CLOSED WORKSFORME
QA Contact:
Severity: high
Docs Contact:
Priority: medium
Version: mainline
CC: avishwan, bugs, khiremat, sunkumar
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-11 12:40:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description nh2 2017-12-31 01:41:26 UTC
Description of problem:

Gluster 3.12.3.

When a `glusterfind pre` invocation fails due to unrecoverable errors, `glusterfind` doesn't terminate; it just prints some more confusing errors and then hangs:

    [root@node-1:~]# glusterfind pre --no-encode --regenerate-outfile cdn-rsync-myvol myvol /tmp/cdn-rsync-myvol.outfile
    10.0.0.2 - pre failed; stdout (including remote stderr):
    Connection to 10.0.0.2 closed.

    stderr:
    Fail to create dir /var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-013544-927166-Nl7zE1/eef7fa430d9ab60e74ec72b66629f783f9fb37fa: [Errno 28] No space left on device: '/var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-013544-927166-Nl7zE1'


    10.0.0.3 - pre failed; stdout (including remote stderr):
    Connection to 10.0.0.3 closed.

    stderr:
    /data/glusterfs/myvol/brick1/brick Error during Changelog Crawl: [Errno 22] Invalid argument


In this case there is a "No space left on device" error on the remote side, plus another very unhelpful error ("Invalid argument", with no further detail); nevertheless `glusterfind` does not terminate right away, but continues a bit longer until the next error:

    10.0.0.1 - pre failed; stdout (including remote stderr):
    /data/glusterfs/myvol-production/brick1/brick Error during Changelog Crawl: [Errno 22] Invalid argument

    stderr:
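
For illustration only, here is a minimal Python sketch of the fail-fast behaviour one would expect in this situation: run the per-node `pre` step and abort the whole run on the first node failure instead of moving on to the next node and eventually hanging. The node list is taken from the output above; the remote command and all helper names are hypothetical and are not glusterfind's actual internals.

    # fail_fast_pre.py -- illustrative sketch only, not glusterfind code.
    import subprocess
    import sys

    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # brick hosts from the output above

    def run_pre_on_node(node):
        # Hypothetical per-node step run over ssh; stdout/stderr are captured
        # so they can be reported if the step fails.
        return subprocess.run(
            ["ssh", node, "true"],  # placeholder for the real node-side command
            capture_output=True, text=True, timeout=300,
        )

    def main():
        for node in NODES:
            result = run_pre_on_node(node)
            if result.returncode != 0:
                # Fail fast: report this node's error and terminate with a
                # non-zero exit code instead of continuing to the next node.
                sys.stderr.write(f"{node} - pre failed:\n{result.stderr}\n")
                sys.exit(1)
        print("pre completed on all nodes")

    if __name__ == "__main__":
        main()

(A hung node-side step would raise subprocess.TimeoutExpired here and likewise abort the run rather than blocking forever.)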

Comment 1 nh2 2017-12-31 01:49:17 UTC
When it hangs, `glusterfind` also seems to ignore Ctrl+C; only an explicit `kill $(pidof glusterfind)` shuts it down.
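
Until this is fixed, a possible workaround sketch in the same spirit as the `kill` above: drive `glusterfind pre` from a small Python wrapper with a hard timeout, so a hung run gets killed (SIGKILL cannot be ignored) instead of blocking forever. The 30-minute limit is an arbitrary assumption; note also that this only kills the glusterfind process started by the wrapper, so node-side processes it spawned over ssh may still need manual cleanup.

    # run_glusterfind_pre.py -- workaround sketch, not part of glusterfind.
    import subprocess

    CMD = [
        "glusterfind", "pre", "--no-encode", "--regenerate-outfile",
        "cdn-rsync-myvol", "myvol", "/tmp/cdn-rsync-myvol.outfile",
    ]

    try:
        # subprocess.run() kills the child and waits for it when the timeout
        # expires, then raises TimeoutExpired; check=True turns a non-zero
        # exit code into CalledProcessError.
        subprocess.run(CMD, check=True, timeout=30 * 60)
    except subprocess.TimeoutExpired:
        print("glusterfind pre hung and was killed after 30 minutes")
        raise
    except subprocess.CalledProcessError as exc:
        print(f"glusterfind pre failed with exit code {exc.returncode}")
        raise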

Comment 2 Sunny Kumar 2019-09-26 09:56:35 UTC
(In reply to nh2 from comment #0)
> Description of problem:
> 
> Gluster 3.12.3.
> 
> When a `glusterfind pre` invocation fails due to unrecoverable errors,
> `glusterfind` doesn't terminate, it just prints some more weird errors and
> then hangs:
> 
>     [root@node-1:~]# glusterfind pre --no-encode --regenerate-outfile
> cdn-rsync-myvol myvol /tmp/cdn-rsync-myvol.outfile
>     10.0.0.2 - pre failed; stdout (including remote stderr):
>     Connection to 10.0.0.2 closed.
> 
>     stderr:
>     Fail to create dir
> /var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-
> 013544-927166-Nl7zE1/eef7fa430d9ab60e74ec72b66629f783f9fb37fa: [Errno 28] No
> space left on device:
> '/var/var/lib/misc/glusterfsd/glusterfind/cdn-rsync-myvol/myvol/20171231-
> 013544-927166-Nl7zE1'
> 
> 
>     10.0.0.3 - pre failed; stdout (including remote stderr):
>     Connection to 10.0.0.3 closed.
> 
>     stderr:
>     /data/glusterfs/myvol/brick1/brick Error during Changelog Crawl: [Errno
> 22] Invalid argument
> 
Do you see the error for the same dir each time, or is the dir different every time?
 
> 
> In this case, there is "No space left on device" on the remote side, and
> another very unhelpful error ("Invalid argument" without further info);
> nevertheless `glusterfind` does not terminate right then; it continues a bit
> more until the next error:
> 
>     10.0.0.1 - pre failed; stdout (including remote stderr):
>     /data/glusterfs/myvol-production/brick1/brick Error during Changelog
> Crawl: [Errno 22] Invalid argument
> 
>     stderr:

Are you still seeing this in a recent release?

Comment 3 Sunny Kumar 2019-11-11 12:40:07 UTC
Closing this bug as I am seeing this on latest release.

Comment 4 nh2 2019-11-11 13:02:29 UTC
I cannot easily answer this question as my deployment switched to Ceph.

Did you mean "I am NOT seeing this on latest release"?