Bug 203769 - Would like Beagle to be able to index/search OpenOffice/Open Document and PDF files
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: beagle
Version: 5
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Alexander Larsson
Reported: 2006-08-23 13:12 EDT by Andrig Miller
Modified: 2007-11-30 17:11 EST
Doc Type: Bug Fix
Last Closed: 2007-11-25 17:41:03 EST
Description Andrig Miller 2006-08-23 13:12:18 EDT
Description of problem:

I would like Beagle to be able to index and search OpenOffice 1.0.x, Open
Document (OpenOffice 2.0.x) and PDF files.  I have done some testing, and I
never get these file formats in any search results through Beagle.


Version-Release number of selected component (if applicable):

0.2.6-1.fc5.1


How reproducible:

Every time you search.

Steps to Reproduce:
1. Search for anything, including the title of an OpenOffice/Open Document or
PDF file.
  
Actual results:

You never get any search results that match OpenOffice/Open Document or PDF files.


Expected results:

I would like OpenOffice/Open Document and PDF files to be displayed as search
results from appropriate search queries.


Additional info:
Comment 1 Alexander Larsson 2006-08-24 08:22:40 EDT
That is strange. It's supposed to handle those. Does it pick up other types of files?
Comment 2 Alexander Larsson 2006-08-24 08:27:24 EDT
I assume you get hits when you search for a filename, but not when you search
for some text that is in the document?
Comment 3 Andrig Miller 2006-08-24 18:29:07 EDT
Actually, I get hits on all other types of text files, but I never get anything
for OpenOffice/OpenDocument and PDF files.  Even if I search for a word in the
file name.

I did some more experimentation, and what I found is that it isn't finding
anything in my Documents directory under my home.  This is weirder than I thought.
Comment 4 Alexander Larsson 2006-08-25 04:10:10 EDT
Are you saying it's finding OOo/PDF files that are not under the Documents directory?
Comment 5 Andrig Miller 2006-08-25 16:59:44 EDT
Sorry, I should have been more specific.  It doesn't seem to find much of
anything in my home directory and its sub-directories at all.  For example, I
have a file in my home directory called install.log, which is a basic text file.
If I search with the term install, it doesn't find the file at all.  It finds 96
other files, in other places, but not the one from my home directory.  What is
funny is that I also have a directory called "install" in there, and it does
find that directory.  If I do another search with the name of another directory
in my home, such as "Documents", it doesn't find it at all.  Once again, if I
search for "Desktop", it does find the Desktop directory in my home directory.
It is hit or miss as to what it finds, even if you specify the exact name of
what you are looking for.
Comment 6 Alexander Larsson 2006-08-28 04:34:35 EDT
Hmmm, it seems more like beagle hasn't indexed all the subdirectories in your
homedir. 

What does beagle-info --status and beagle-info --index-info say?
Comment 7 Andrig Miller 2006-08-28 14:49:49 EDT
Here is the output from my laptop for the two commands listed above:

[andrig@localhost ~]$ beagle-info --status
Scheduler:
Count: 5
Status: Executing task
Immediate 0 (8/24/2006 4:31:25 PM)
file:///etc/printcap



Pending Tasks:
1 Immediate 0 (8/25/2006 4:03:51 AM)
file:///tmp/whatis.aD8530

2 Immediate 0 (8/25/2006 4:03:51 AM)
uid:cyIxdH4aM0GhE55bUdSb7g

3 Immediate 0 (8/25/2006 2:01:15 PM)
file:///etc/DIR_COLORS

4 Immediate 0 (8/25/2006 2:01:15 PM)
file:///etc/DIR_COLORS.xterm

5 Immediate 0 (8/25/2006 2:01:41 PM)
file:///etc/scsi_id.config

6 Immediate 0 (8/25/2006 2:01:45 PM)
file:///etc/ld.so.cache

7 Immediate 0 (8/25/2006 2:49:06 PM)
file:///tmp/10161.cs

8 Immediate 0 (8/25/2006 2:49:10 PM)
uid:ZjPcgAOL8kSH7fxNetD4Yw

9 Immediate 0 (8/25/2006 2:52:24 PM)
file:///tmp/86631.cs

10 Immediate 0 (8/25/2006 2:52:26 PM)
uid:LUEMVBtqQkKxao_zTLjraw

11 Immediate 0 (8/25/2006 9:32:02 PM)
uid:3C9cy0J9t0Omye7U1pnOdg

12 Immediate 0 (8/26/2006 4:03:32 AM)
file:///tmp/whatis.d19683

13 Immediate 0 (8/26/2006 4:03:32 AM)
uid:NQvONLeXe0y3c_j0pTacXA

14 Immediate 0 (8/26/2006 4:04:45 AM)
file:///etc/prelink.cache

15 Immediate 0 (8/27/2006 4:03:22 AM)
file:///tmp/whatis.L30093

16 Immediate 0 (8/27/2006 4:03:22 AM)
uid:7xXCvmu9skmgTLVmd8oh+Q

17 Immediate 0 (8/27/2006 4:25:24 AM)
file:///tmp/whatis.h30337

18 Immediate 0 (8/27/2006 4:25:24 AM)
uid:CjnITA7ITkC+vo0Yr_T2cw

19 Immediate 0 (8/28/2006 4:03:20 AM)
file:///tmp/whatis.T23949

20 Immediate 0 (8/28/2006 4:03:20 AM)
uid:nCJ9rj2LREyn5IAD2cXHPw

21 Immediate 0 (8/28/2006 12:44:53 PM)
file:///etc/printcap

22 Maintenance 100 (8/24/2006 4:31:26 PM)
Final Flush for FileSystemIndex

23 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#0

24 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#1

25 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#2

26 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#3

27 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#4

28 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#5

29 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#6

30 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#7

31 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=4#0

32 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=6#0

33 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=8#0

34 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=10#0

35 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=12#0

36 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=14#0

37 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=16#0

38 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=18#0

39 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=20#0

40 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=22#0

41 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=24#0

42 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=26#0

43 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=28#0

44 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=30#0

45 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Sent;uid=32#0

46 Maintenance 0 (8/24/2006 4:30:04 PM)
Optimize EvolutionDataServerIndex

47 Maintenance 0 (8/24/2006 4:30:04 PM)
Optimize KMailIndex

48 Maintenance 0 (8/24/2006 4:30:04 PM)
Optimize FileSystemIndex

49 Maintenance 0 (8/24/2006 4:30:05 PM)
Optimize GaimLogIndex

50 Maintenance 0 (8/24/2006 4:30:05 PM)
Optimize IndexingServiceIndex

51 Maintenance 0 (8/24/2006 4:30:05 PM)
Optimize BlamIndex

52 Maintenance 0 (8/24/2006 4:30:05 PM)
Optimize LifereaIndex

53 Maintenance 0 (8/24/2006 4:30:05 PM)
Optimize AkregatorIndex

54 Maintenance 0 (8/24/2006 4:30:05 PM)
Optimize KonqHistoryIndex

55 Maintenance 0 (8/24/2006 4:30:05 PM)
Optimize KopeteIndex

56 Delayed 0 (8/24/2006 4:31:07 PM)
/home/andrig/.evolution/mail/imap/anmiller@pobox-2.corp.redhat.com/folders/INBOX/subfolders/Drafts/summary

57 Delayed 0 (8/24/2006 4:31:07 PM)
/home/andrig/.evolution/mail/imap/anmiller@pobox-2.corp.redhat.com/folders/INBOX/subfolders/TCAB/summary

58 Maintenance 100 (8/24/2006 4:31:22 PM)
Final Flush for EvolutionMailIndex

59 Delayed 0 (8/24/2006 4:31:07 PM)
Tree Crawler
Pending directories: 6

60 Delayed 0 (8/24/2006 4:31:08 PM)
File Crawler

61 Delayed 0 (8/24/2006 4:31:08 PM)
/home/andrig/.evolution/mail/imap/anmiller@pobox-2.corp.redhat.com/folders/INBOX/subfolders/sent-mail/summary

62 Delayed 0 (8/24/2006 4:31:08 PM)
/home/andrig/.evolution/mail/imap/anmiller@pobox-2.corp.redhat.com/folders/INBOX/subfolders/Expenses/summary

63 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/05deac0a-7cef-43e0-ab73-f61a090cb3b9

64 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/2165758d-0613-4075-9bb1-c8058c890d50

65 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/41946dff-bf4e-41c3-bcca-47e57407973f

66 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/4e830bf5-6e62-4196-82ee-a02f5294fee3

67 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/5ad0317c-bd61-4c0a-a4b2-30534177ead1

68 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/6b3e3e32-9d05-4fdd-88b0-1a7c3d148b81

69 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/7778e548-72f3-4fb6-9f92-7fad0ee178a7

70 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/a1e2e6b1-fa09-4080-8ca4-8b13d2c20c11

71 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/b0d8452e-d5e7-48e8-bb3b-aa161bcfc344

72 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/b1387f79-8a93-465a-b24f-eea826f056f9

73 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/ba09f556-6196-4045-897d-f1136a170a28

74 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/c26284f7-1f46-4879-99bc-3867667d6cbb

75 Delayed 0 (8/24/2006 4:31:08 PM)
note://tomboy/c2f3ea33-d4fe-4a49-a1fc-fff75ea09a2e

76 Maintenance 0 (8/24/2006 4:30:05 PM)
Optimize TomboyIndex

77 Delayed 0 (8/24/2006 4:31:26 PM)
/home/andrig/.evolution/mail/imap/anmiller@pobox-2.corp.redhat.com/folders/INBOX/subfolders/Integration/summary

78 Delayed 0 (8/25/2006 9:43:03 AM)
/home/andrig/.evolution/mail/imap/anmiller@pobox-2.corp.redhat.com/folders/INBOX/summary

79 Maintenance 0 (8/24/2006 4:31:26 PM)
Optimize EvolutionMailIndex

80 Delayed 0 (8/24/2006 4:35:51 PM)
uid:vZevSfbo4keZViLXBWxuxA

81 Delayed 0 (8/25/2006 11:00:25 AM)
uid:4QgeHoI_zEaeDFt09V+1tA


[andrig@localhost ~]$ beagle-info --index-info
Index information:
Name: EvolutionDataServer
Count: 7
Indexing: False

Name: EvolutionMail
Count: 38
Indexing: False

Name: KMail
Count: 0
Indexing: False

Name: Files
Count: 21873
Indexing: True

Name: GaimLog
Count: 0
Indexing: False

Name: IndexingService
Count: 77
Indexing: False

Name: Tomboy
Count: 1
Indexing: False

Name: Blam
Count: 0
Indexing: False

Name: Liferea
Count: 0
Indexing: False

Name: Akregator
Count: 0
Indexing: False

Name: KonquerorHistory
Count: 0
Indexing: False

Name: Kopete
Count: 0
Indexing: False

Name: applications
Count: 272
Indexing: False

Name: documentation
Count: 7086
Indexing: False
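
The index listing above can also be summarized programmatically. The following is a minimal sketch, assuming only the "Name / Count / Indexing" layout shown in this output; the function names and the SAMPLE text are illustrative and not part of Beagle.

```python
def parse_index_info(text):
    """Parse `beagle-info --index-info` style output into a list of dicts.

    Assumes the "Name: / Count: / Indexing:" layout quoted above.
    """
    backends = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Name":
            # A "Name" line starts a new backend record.
            current = {"name": value}
            backends.append(current)
        elif key == "Count" and current:
            current["count"] = int(value)
        elif key == "Indexing" and current:
            current["indexing"] = (value == "True")
    return backends


def still_indexing(backends):
    """Return the names of backends that report Indexing: True."""
    return [b["name"] for b in backends if b.get("indexing")]


# Hypothetical excerpt in the same shape as the output above.
SAMPLE = """Index information:
Name: EvolutionMail
Count: 38
Indexing: False

Name: Files
Count: 21873
Indexing: True
"""

if __name__ == "__main__":
    print(still_indexing(parse_index_info(SAMPLE)))
```

For the listing above, only the Files backend reports "Indexing: True", which is consistent with the file system crawl not having finished.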
Comment 8 Alexander Larsson 2006-08-29 03:28:06 EDT
I'm not 100% sure how to read this, but I think it means it's still indexing your
homedir. There was another bug report about indexing taking a long time, and
this was recently fixed upstream (and in rawhide). Maybe it's just not done yet?
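
One way to read the `beagle-info --status` output in Comment 7 is to tally the queued tasks by kind. This is a rough sketch that assumes each pending task starts with a line of the form "N Kind Priority (timestamp)", as in that output; the regular expression and function name are illustrative, not a Beagle API.

```python
import re
from collections import Counter

# Matches task header lines such as "23 Delayed 1 (8/24/2006 4:31:26 PM)".
# The format is inferred from the quoted output, not from Beagle documentation.
TASK_HEADER = re.compile(r"^\d+\s+([A-Za-z]+)\s+-?\d+\s+\(")


def tally_pending(text):
    """Count pending scheduler tasks by kind (e.g. Immediate, Delayed)."""
    counts = Counter()
    for line in text.splitlines():
        match = TASK_HEADER.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical excerpt in the same shape as the status output.
SAMPLE = """Pending Tasks:
1 Immediate 0 (8/25/2006 4:03:51 AM)
file:///tmp/whatis.aD8530

22 Maintenance 100 (8/24/2006 4:31:26 PM)
Final Flush for FileSystemIndex

23 Delayed 1 (8/24/2006 4:31:26 PM)
email://local@local/Inbox;uid=3549#0

59 Delayed 0 (8/24/2006 4:31:07 PM)
Tree Crawler
Pending directories: 6
"""

if __name__ == "__main__":
    for kind, count in sorted(tally_pending(SAMPLE).items()):
        print(kind, count)
```

A queue that still contains "Tree Crawler" and "File Crawler" entries, as the one in Comment 7 does, suggests the crawl of the home directory has not completed.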
Comment 9 Andrig Miller 2006-09-02 17:40:20 EDT
Well, if it takes weeks, that is a long time.  Do you know what this other bug
is, so I could take a look and see if it matches up with what's happening on my
system?
Comment 10 Alexander Larsson 2006-09-04 09:00:26 EDT
That was bug 187475
Comment 11 Andrig Miller 2007-06-15 16:38:41 EDT
This can be closed.  Since this, I have upgraded to FC6, and now both
OpenOffice.org and PDF files are getting indexed (speedily, I might add), and I
can find them through the search interface.  This has been fixed for a while.
Comment 12 redhat-bugs2eran 2007-06-15 17:01:35 EDT
I had a similar problem with Fedora 7, but it turns out to have a different
cause: something overrode the MIME setup in my account and associated *.pdf with
application/x-extension-pdf instead of the default application/pdf. This broke
Beagle's PDF indexing.

Manually eliminating all mentions of application/x-extension-pdf in
~/.{local,gnome}* fixed the problem.
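
To locate such mentions before editing anything by hand, a small read-only scan can help. This is a minimal sketch, assuming the override lives in ordinary files under ~/.local* or ~/.gnome*; the helper names are hypothetical and the script only reports matching files, it does not modify them.

```python
import glob
import os

BAD_TYPE = "application/x-extension-pdf"


def default_roots(home=None):
    """Expand the equivalent of ~/.{local,gnome}* into a list of paths."""
    home = home or os.path.expanduser("~")
    return sorted(
        glob.glob(os.path.join(home, ".local*"))
        + glob.glob(os.path.join(home, ".gnome*"))
    )


def iter_files(root):
    """Yield regular files at or below `root`."""
    if os.path.isfile(root):
        yield root
        return
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip sockets, FIFOs, broken symlinks
                yield path


def find_mentions(roots, needle=BAD_TYPE):
    """Return sorted paths of files whose contents mention `needle`."""
    needle_bytes = needle.encode("utf-8")
    hits = []
    for root in roots:
        for path in iter_files(root):
            try:
                with open(path, "rb") as fh:
                    if needle_bytes in fh.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file; ignore it in this sketch
    return sorted(hits)
```

Calling `find_mentions(default_roots())` lists the files worth inspecting; which entries to delete is still a manual decision.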
