Bug 1030797 - recollindex stall when indexing encrypted zip files
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: recoll
Version: 19
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Terje Røsten
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-11-15 02:19 EST by MrY
Modified: 2013-11-18 03:50 EST (History)
Doc Type: Bug Fix
Last Closed: 2013-11-18 03:50:19 EST
Type: Bug


Attachments: None
Description MrY 2013-11-15 02:19:34 EST
Description of problem:
When recollindex indexes files and one of them is an encrypted zip file, the process stalls for several minutes.

Version-Release number of selected component (if applicable):
recoll-1.19.4-2.fc18.x86_64

How reproducible:
Every time

Steps to Reproduce:
Run "recollindex -m -w 3 -D -x" on a path that contains an encrypted zip file. After a while the output stops.

Expected results:
No stalling during the indexing

Additional info:
During the stall, switch to another console and find the process id with "ps -ef |grep recoll". The process in use is always "python /usr/share/recoll/filters/rclzip", and "lsof -p xxx" shows that it is accessing an encrypted zip file.
Comment 1 Terje Røsten 2013-11-15 03:58:58 EST
Thanks for the report.

Can you please retry with 1.19.9-1 available in testing?

 https://admin.fedoraproject.org/updates/FEDORA-2013-20825/recoll-1.19.9-1.fc18
Comment 2 MrY 2013-11-15 06:09:38 EST
(In reply to Terje Røsten from comment #1)
> 
> Can you please retry with 1.19.9-1 available in testing?
> 
>  https://admin.fedoraproject.org/updates/FEDORA-2013-20825/recoll-1.19.9-1.fc18

I installed recoll.x86_64 0:1.19.9-1.fc18 but I have the same problem.
Comment 3 Jean-Francois Dockes 2013-11-15 12:30:04 EST
Hi and thanks for reporting this, recoll developer here.

I tried to reproduce the issue, but this is not what I see on my sample encrypted zip. Instead, the filter fails instantly:

    /usr/local/share/recoll/filters/rclzip encrypted.zip  
    == Entry 1 ipath  (mimetype [text/plain]):
    
    RCLMFILT: rclzip : extractone: failed: [File localdefs.in is encrypted, password required for extraction]
    Not ok, eof 0

You could try the same command to see if rclzip stalls, I guess it will.
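
A minimal sketch of that check, for anyone who wants to test the filter with a time limit instead of waiting on a hung process. The helper name `stalls` and the sample file name are hypothetical; the filter path is the one reported in the description and may differ on other installations.

```python
import subprocess


def stalls(cmd, seconds):
    """Return True if `cmd` is still running after `seconds`, else False."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

Usage would look like `stalls(["python", "/usr/share/recoll/filters/rclzip", "encrypted.zip"], 60)`, where `encrypted.zip` stands for the file that triggers the problem.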

I can see 2 possibilities:

 - If you can reproduce the issue on a file which you can share (no confidentiality issue), then I'd be glad to work on the issue.

 - Else, I can see no other workaround than to configure recoll to skip the file, using skippedPaths or skippedNames, or by disabling zip indexing entirely.
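
To decide which archives to list under skippedPaths, it may help to find the encrypted ones first. The sketch below is not part of recoll; it uses Python's standard zipfile module and assumes the archive's central directory is readable without a password, which is the usual case. The function name `encrypted_members` is made up for this example.

```python
import zipfile


def encrypted_members(path):
    """Return the names of entries in the zip at `path` flagged as encrypted."""
    with zipfile.ZipFile(path) as zf:
        # Bit 0 of the general-purpose flag marks an encrypted entry.
        return [info.filename for info in zf.infolist() if info.flag_bits & 0x1]
```

An archive for which this returns a non-empty list is a candidate for the skip configuration.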

Cheers,

jf
Comment 4 MrY 2013-11-17 09:20:41 EST
I did not mention that the indexing stopped before it was finished. I thought this was related to the stall when it tried to index the encrypted zip file. When the bug was created, it always wrote a lot of lines with "...selectloop.." for several minutes when it tried to process one of the encrypted files, but it no longer does. (Changes since I created the bug: I have updated the kernel and installed python-mutagen.)

If I run from the command line with "-m", it stops after some time. The output sometimes reports errno=28. I don't know if 28 is ENOSPC "No space left on device", but the disk is not full.

Snip from output
...
:3:../utils/workqueue.h:212:DbUpd: tasks 1 nowakes 0 wsleeps 2 csleeps 0
:2:rclmonrcv.cpp:619:RclIntf::addWatch: inotify_add_watch failed. errno 28
:2:rclmonrcv.cpp:199:rclMonRcvRun: tree walk failed
...
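
For reference, the errno number can be looked up with Python's standard library. On Linux, 28 is indeed ENOSPC, and the inotify_add_watch(2) man page documents ENOSPC as meaning that the per-user limit on inotify watches was reached (or a kernel resource could not be allocated), so it does not necessarily indicate a full disk. A small sketch:

```python
import errno
import os

code = 28  # the value reported by inotify_add_watch in the log above

# errno.errorcode maps numbers to symbolic names on the current platform.
name = errno.errorcode.get(code, "unknown")
message = os.strerror(code)
print(name, "-", message)
```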
Comment 5 Jean-Francois Dockes 2013-11-17 15:16:58 EST
(In reply to MrY from comment #4)
> I did not mention that the indexing stopped before it was finished. I
> thought this was related to the stall when it tried to index the
> encrypted zip file. When the bug was created, it always wrote a lot of
> lines with "...selectloop.." for several minutes when it tried to process
> one of the encrypted files, but it no longer does. (Changes since I created
> the bug: I have updated the kernel and installed python-mutagen.)

As you probably guesses, the "selectloop" thing gets printed while the indexer is waiting for an external command to complete. So I understand that the original bug cannot be reproduced?

> 
> If I run from the command line with "-m", it stops after some time. The
> output sometimes reports errno=28. I don't know if 28 is ENOSPC "No space
> left on device", but the disk is not full.
> 
> Snip from output
> ...
> :3:../utils/workqueue.h:212:DbUpd: tasks 1 nowakes 0 wsleeps 2 csleeps 0
> :2:rclmonrcv.cpp:619:RclIntf::addWatch: inotify_add_watch failed. errno 28
> :2:rclmonrcv.cpp:199:rclMonRcvRun: tree walk failed
> ...

I would guess that inotify is running out of resources. Maybe you have a very big tree?

There are a few indications about changing the inotify kernel parameters in the recoll manual: http://www.lesbonscomptes.com/recoll/usermanual/RCL.INDEXING.MONITOR.html

I guess that you would want to change the max_user_watches parameter.
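
A rough way to check whether the limit is the problem is to compare the number of directories in the indexed tree with the current value of max_user_watches. The sketch below assumes one watch per directory (inotify watches are not recursive) and reads the limit from the standard Linux /proc location. The function names are made up for this example, and other programs run by the same user also consume watches.

```python
import os


def count_directories(top):
    """Count `top` and every directory below it (symlinks are not followed)."""
    return sum(1 for _root, _dirs, _files in os.walk(top))


def read_max_user_watches(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

If `count_directories` on the indexed tree comes close to or exceeds `read_max_user_watches()`, raising the limit as described in the manual, or splitting the tree, is likely needed.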

But really the first thing might be to be sure that you really need real-time indexing on such a big tree. On many trees it would be possible to identify a small active part which needs real-time indexing, and a big slow part for which daily or slower updates are enough. You can set this up easily with multiple indexes (which can all be queried together). Using multiple indexes is described in the manual.

Unless I have this all wrong and your tree is not big, in which case this would be the actual bug?
Comment 6 Jean-Francois Dockes 2013-11-17 15:26:45 EST
s/you probably guesses/you probably guessed/ !
Comment 7 MrY 2013-11-18 02:04:16 EST
Close the bug, it was not a bug.

When I followed http://www.lesbonscomptes.com/recoll/usermanual/RCL.INDEXING.MONITOR.html and increased the resources the problem disappeared.

One suggestion: perhaps the output "errno 28" should say more clearly that this is a serious error and that system resources must be increased. Also, where the documentation mentions "monitoring a big tree", it should clarify how big a "big tree" is.

But again it was not an error and thanks for the quick response.
Comment 8 Jean-Francois Dockes 2013-11-18 02:56:57 EST
Agreed, and I'll improve doc and messages.
Comment 9 Terje Røsten 2013-11-18 03:50:19 EST
Thanks guys!
