Red Hat Bugzilla – Bug 1030797
recollindex stall when indexing encrypted zip files
Last modified: 2013-11-18 03:50:19 EST
Description of problem:
When files are being indexed and one of them is an encrypted zip file, the process stalls for several minutes.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Run "recollindex -m -w 3 -D -x" with an encrypted zip file somewhere in the indexed path. After a while the output stops.
Expected: no stalling during indexing.
During the stall, switch to another console and find the process ID with "ps -ef |grep recoll". The process in use is always "python /usr/share/recoll/filters/rclzip", and "lsof -p xxx" shows that it is accessing an encrypted zip file.
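The check described above (which files a stalled process has open) can also be scripted. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names open_files and open_zip_files are hypothetical and not part of recoll.

```python
import os


def open_files(pid):
    """Return the targets of the file descriptors a process has open.

    Reads /proc/<pid>/fd, so this only works on Linux and only for
    processes the caller is allowed to inspect.
    """
    fd_dir = f"/proc/{pid}/fd"
    paths = []
    try:
        entries = os.listdir(fd_dir)
    except OSError:
        # No such process, no /proc, or insufficient permissions.
        return paths
    for name in entries:
        try:
            paths.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed while we were looking.
            continue
    return paths


def open_zip_files(pid):
    """Return only the open files whose name ends in .zip."""
    return [p for p in open_files(pid) if p.lower().endswith(".zip")]
```

Calling open_zip_files with the process ID of the rclzip filter should list the archive it is currently reading, much like the "lsof -p" step.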
Thanks for the report.
Can you please retry with 1.19.9-1 available in testing?
(In reply to Terje Røsten from comment #1)
> Can you please retry with 1.19.9-1 available in testing?
I installed recoll.x86_64 0:1.19.9-1.fc18 but I have the same problem.
Hi and thanks for reporting this, recoll developer here.
I tried to reproduce the issue, but this is not what I see on my sample encrypted zip; instead I instantly get:
== Entry 1 ipath (mimetype [text/plain]):
RCLMFILT: rclzip : extractone: failed: [File localdefs.in is encrypted, password required for extraction]
Not ok, eof 0
You could try the same command to see if rclzip stalls, I guess it will.
I can see 2 possibilities:
- If you can reproduce the issue on a file which you can share (no confidentiality issue), then I'd be glad to work on the issue.
- Else, I can see no other workaround than to configure recoll to skip the file, using skippedPaths, skippedNames or by disabling zip indexing entirely.
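For the second option, the encrypted archives have to be found first. A minimal sketch, assuming that checking the encryption bit (bit 0 of the general purpose flags) in the zip central directory is a good enough test; the function names are hypothetical and the script is not part of recoll.

```python
import os
import zipfile


def is_encrypted_zip(path):
    """Return True if any member of the archive has the encryption flag set."""
    try:
        with zipfile.ZipFile(path) as zf:
            # Bit 0 of the general purpose flags marks an encrypted member.
            return any(info.flag_bits & 0x1 for info in zf.infolist())
    except (zipfile.BadZipFile, OSError):
        # Unreadable or not a zip file: treat as "not encrypted".
        return False


def find_encrypted_zips(top):
    """Walk a directory tree and return the encrypted zip files in it, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.lower().endswith(".zip"):
                full = os.path.join(dirpath, name)
                if is_encrypted_zip(full):
                    found.append(full)
    return sorted(found)
```

The paths returned by find_encrypted_zips are candidates for the skippedPaths setting mentioned above.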
I did not mention that the indexing stopped before it was finished. I thought this was related to the stall when it tried to index the encrypted zip file. When the bug was created it always wrote a lot of lines with "...selectloop..", for several minutes, when it tried to process one of the encrypted files, but not now. (Changes since I created the bug: I have updated the kernel and installed python-mutagen.)
If I run from the command line with "-m" it stops after some time. The output sometimes reports errno=28. I don't know if 28 is ENOSPC "No space left on device". But the disk is not full.
Snip from output
:3:../utils/workqueue.h:212:DbUpd: tasks 1 nowakes 0 wsleeps 2 csleeps 0
:2:rclmonrcv.cpp:619:RclIntf::addWatch: inotify_add_watch failed. errno 28
:2:rclmonrcv.cpp:199:rclMonRcvRun: tree walk failed
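The errno value in the log can be decoded with the Python standard library. A minimal sketch; the helper name describe_errno is hypothetical. On Linux, 28 is ENOSPC, and for inotify_add_watch the man page documents ENOSPC as the user limit on inotify watches being reached (or the kernel failing to allocate a needed resource), so it does not necessarily mean a full disk.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, system message) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, text = describe_errno(28)
print(f"errno 28 = {name}: {text}")
```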
(In reply to MrY from comment #4)
> I did not mentioned that the indexing stopped before it was finished. I
> thought this was related to the stalling when it tried to indexing the
> encrypted zip file. When the bug was created it always wrote a lot of lines
> with "...selectloop..", for several minutes, when it tried to process one of
> the encrypted files, but not now. (Changes since I created the bug. I have
> updated the kernel and installed python-mutagen.)
As you probably guesses, the "selectloop" thing gets printed while the indexer is waiting for an external command to complete. So I understand that the original bug cannot be reproduced?
> If I run from command line with "-m" it stop after some time. The output
> sometimes report errno=28. I don't know if 28 is ENOSPC "No space left on
> device". But the disk is not full.
> Snip from output
> :3:../utils/workqueue.h:212:DbUpd: tasks 1 nowakes 0 wsleeps 2 csleeps 0
> :2:rclmonrcv.cpp:619:RclIntf::addWatch: inotify_add_watch failed. errno 28
> :2:rclmonrcv.cpp:199:rclMonRcvRun: tree walk failed
I would guess that inotify is running out of resources. Maybe you have a very big tree?
There are a few indications about changing the inotify kernel parameters in the recoll manual: http://www.lesbonscomptes.com/recoll/usermanual/RCL.INDEXING.MONITOR.html
I guess that you would want to change the max_user_watches parameter.
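Whether a tree is "big" relative to this limit can be estimated before changing anything. A minimal sketch, assuming a Linux system and assuming the monitor needs roughly one inotify watch per directory; the function names are hypothetical.

```python
import os

# Standard location of the per-user inotify watch limit on Linux.
LIMIT_FILE = "/proc/sys/fs/inotify/max_user_watches"


def read_watch_limit(path=LIMIT_FILE):
    """Return the current limit as an int, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def count_directories(top):
    """Count top and every directory below it (symlinks are not followed)."""
    return sum(1 for _ in os.walk(top))


def report(top):
    """Return a one-line comparison of directory count and watch limit."""
    dirs = count_directories(top)
    limit = read_watch_limit()
    if limit is None:
        return f"{top}: {dirs} directories, watch limit unknown"
    verdict = "exceeds" if dirs > limit else "fits within"
    return f"{top}: {dirs} directories, {verdict} max_user_watches={limit}"
```

For example, print(report("/home/user")) gives a rough idea whether the indexed tree alone could exhaust the limit; other programs using inotify under the same user also count against it.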
But really the first thing might be to make sure that you really need real-time indexing on such a big tree. On many trees it would be possible to identify a small active part which needs real-time indexing, and a big slow part for which daily or slower updates are enough. You can set this up easily with multiple indexes (which can all be queried together). Using multiple indexes is described in the manual.
Unless I'm all wrong, your tree is not big, and this is the actual bug?
s/you probably guesses/you probably guessed/ !
Close the bug, it was not a bug.
When I followed http://www.lesbonscomptes.com/recoll/usermanual/RCL.INDEXING.MONITOR.html and increased the resources the problem disappeared.
One suggestion: perhaps the "errno 28" output should say more clearly that this is a serious error and that system resources must be increased. The documentation on "monitoring a big tree" should also clarify how big a "big tree" is.
But again, it was not an error, and thanks for the quick response.
Agreed, and I'll improve doc and messages.