Bug 484610 - Doc Search results are not good enough yet..
Summary: Doc Search results are not good enough yet..
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Other
Version: 530
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: John Matthews
QA Contact: wes hayutin
URL: http://rlx-2-04.rhndev.redhat.com/rhn...
Whiteboard:
Depends On:
Blocks: 457073
 
Reported: 2009-02-09 00:21 UTC by wes hayutin
Modified: 2009-10-28 19:49 UTC

Fixed In Version: sat530-unconfirmed
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-10-28 19:49:15 UTC
Target Upstream Version:
Embargoed:



Description wes hayutin 2009-02-09 00:21:52 UTC
Description of problem:

Ok.. 
here is my test... 

Use doc search and search w/ content and title for "kickstart"
You get a list of hits on kickstart.. ok..

Now..  go to the 5.2.0 reference guide (which is what is indexed in the above test)

1. open url
http://rlx-2-04.rhndev.redhat.com/rhn/help/Search.do?Search!.x=13&Search!.y=19&search_string=kickstart&view_mode=search_content_title

2. select all, copy and paste it into a file.  You are pasting all the "titles" for the reference guide.

3. now run the following
[whayutin@localhost ~]$ cat doc.txt | grep -i kickstart
        6.4.9. Kickstart — 
        8.1.1. Create a Kickstart Profile for the Guest Systems
        8.1.2. Kickstart Your Host System
        8.2.1. Create a Kickstart Profile for the Guest Systems


6.4.9, 8.1.1, 8.1.2, and 8.2.1 are not on the first page of the results.

Now try a doc search of kickstart just using  "title"

You get:
Page Title  	Summary
6.4.9. Kickstart — 	... server containing the kickstart configuration file. This kickstart configuration file in turn ... a List of ...
Chapter 7. Implementing Kickstart 	... Chapter 7. Implementing Kickstart Chapter 7. Implementing Kickstart Prev Next Chapter 7. Implementing ... of rhnreg_ks, the kickstart ... 

You should get all of the following.
        6.4.9. Kickstart — 
        8.1.1. Create a Kickstart Profile for the Guest Systems
        8.1.2. Kickstart Your Host System
        8.2.1. Create a Kickstart Profile for the Guest Systems
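The check described above (grep the table-of-contents titles for the term, then compare against what the search returned) can be sketched as follows. This is only an illustration: the function names are made up, the titles are the ones quoted in this report, and "Chapter 8. Virtualization" is included only as a non-matching entry.

```python
def titles_matching(titles, term):
    """Return the titles containing term, ignoring case (like grep -i)."""
    needle = term.lower()
    return [t for t in titles if needle in t.lower()]


def missing_from_results(expected, results):
    """Return the expected titles that do not appear among the results."""
    shown = {r.strip() for r in results}
    return [t for t in expected if t.strip() not in shown]


# Titles as pasted from the Reference Guide (taken from this report).
toc_titles = [
    "6.4.9. Kickstart \u2014",
    "8.1.1. Create a Kickstart Profile for the Guest Systems",
    "8.1.2. Kickstart Your Host System",
    "8.2.1. Create a Kickstart Profile for the Guest Systems",
    "Chapter 8. Virtualization",
]

# Page titles returned by the title-only search quoted above.
search_results = [
    "6.4.9. Kickstart \u2014",
    "Chapter 7. Implementing Kickstart",
]

expected = titles_matching(toc_titles, "kickstart")
missing = missing_from_results(expected, search_results)
for title in missing:
    print("not in results:", title)
```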

Comment 1 John Matthews 2009-02-09 20:53:30 UTC
These are the issues I see:

1) The WebUI was reversing the order of doc search results, so it displayed the worst-scoring result first.
Fixed in spacewalk master: http://git.fedoraproject.org/git/?p=spacewalk.git;a=commitdiff;h=ec3baf98993626cee1f2ba7096b7bc9871e1ac7d
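As a rough illustration of the intended ordering (not the actual spacewalk change, which is in the commit linked above), results should be shown best score first. The Hit type and the score values below are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    title: str
    score: float  # hypothetical relevance score; higher means more relevant


def order_for_display(hits):
    """Return hits sorted so the best-scoring one comes first."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Made-up scores, only to show the ordering.
hits = [
    Hit("Chapter 8. Virtualization", 0.21),
    Hit("6.4.9. Kickstart", 0.87),
    Hit("Chapter 7. Implementing Kickstart", 0.64),
]

for hit in order_for_display(hits):
    print(hit.score, hit.title)
```

Displaying the list in ascending score order instead would put the worst match at the top of the first page, which is the symptom described in point 1.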


2) The html "title" info displayed by the help docs is not as consistent as hoped. In most cases the title is useful and corresponds to the subsection header/title as called out in the Table of Contents. For other sections the data is not broken out as finely and it's all on one page. When all the data is on one page, everything indexed from that page shares the common title.

This is part of the issue identified in the original description.
When doing a search ONLY on title for "kickstart", we expected at least the results from the Reference Guide:
        6.4.9. Kickstart — 
        8.1.1. Create a Kickstart Profile for the Guest Systems
        8.1.2. Kickstart Your Host System
        8.2.1. Create a Kickstart Profile for the Guest Systems

Yet we get:
 Chapter 7. Implementing Kickstart (Client Configuration)
 6.4.9. Kickstart —


The reason 8.1.1, 8.1.2, and 8.2.1 are missing is that they are not technically "titles", in the sense of html <title>.  Doc Search only indexes the html value for <title> under a title field.  All entries for 8.1.1, 8.1.2, and 8.2.1 share the common title "Chapter 8. Virtualization".
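A small sketch of why a title-only search misses these sections, using Python's standard html.parser. The sample page is hypothetical markup (the real docs may use different heading tags); only the title and section names come from this bug.

```python
from html.parser import HTMLParser


class TitleAndHeadings(HTMLParser):
    """Collect the <title> text and the text of h2/h3 headings."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self._current = None  # tag whose text is being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            else:
                self.headings.append(text)
            self._current = None


# Hypothetical single-page chapter: one <title>, several section headings.
PAGE = """<html><head><title>Chapter 8. Virtualization</title></head><body>
<h3>8.1.1. Create a Kickstart Profile for the Guest Systems</h3>
<h3>8.1.2. Kickstart Your Host System</h3>
</body></html>"""


def parse(page_html):
    parser = TitleAndHeadings()
    parser.feed(page_html)
    parser.close()
    return parser


def title_matches(page_html, term):
    """True if the term appears in the page's <title> (what is indexed as title)."""
    return term.lower() in parse(page_html).title.lower()


def heading_matches(page_html, term):
    """Headings containing the term; these are not part of the title field."""
    return [h for h in parse(page_html).headings if term.lower() in h.lower()]


print(title_matches(PAGE, "kickstart"))
print(heading_matches(PAGE, "kickstart"))
```

With only the <title> text in the title field, the page is indexed as "Chapter 8. Virtualization", so a title search for "kickstart" cannot match it even though two of its section headings do.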

It looks like most of the help docs split out subsections into their own page, with a corresponding html <title>. Chapter 8 of the Reference Guide seems to be one of the problem spots where this is done differently. (I also see 2 sections from the Channel Management Guide which will be problematic.)
http://www.redhat.com/docs/manuals/satellite/Red_Hat_Network_Satellite-5.2.0/html/Reference_Guide/ch-virtualization.html#virtualization-host
http://www.redhat.com/docs/manuals/satellite/Red_Hat_Network_Satellite-5.2.0/html/Channel_Management_Guide/Channel_Management_Guide-Custom_Errata_Management.html#Channel_Management_Guide-Manage_Errata-Unpublished_Errata

From my perspective the ideal fix would be for the docs to be modified so that each link under the table of contents is its own page, with the correct html <title>.

If this isn't possible, we could look into writing a Nutch plugin.
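Leaving the Nutch plugin API aside, the underlying idea of treating each anchored section as its own searchable entry could look roughly like the sketch below. Everything here is an assumption for illustration: the heading tags, the anchor ids "guest-profile" and "host-kickstart", and the record layout are hypothetical and not taken from the real docs or from Nutch.

```python
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Collect (anchor id, heading text) pairs for headings that carry an id."""

    HEADING_TAGS = ("h2", "h3", "h4")

    def __init__(self):
        super().__init__()
        self.sections = []
        self._anchor = None  # id of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            anchor = dict(attrs).get("id")
            if anchor:
                self._anchor = anchor
                self._text = []

    def handle_data(self, data):
        if self._anchor is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._anchor is not None:
            self.sections.append((self._anchor, "".join(self._text).strip()))
            self._anchor = None


def section_records(page_url, page_html):
    """Build one record per anchored section, titled by its own heading."""
    splitter = SectionSplitter()
    splitter.feed(page_html)
    splitter.close()
    return [
        {"url": f"{page_url}#{anchor}", "title": title}
        for anchor, title in splitter.sections
    ]


# Hypothetical markup; the anchor ids are invented for this example.
PAGE = """<html><head><title>Chapter 8. Virtualization</title></head><body>
<h3 id="guest-profile">8.1.1. Create a Kickstart Profile for the Guest Systems</h3>
<p>Section text.</p>
<h3 id="host-kickstart">8.1.2. Kickstart Your Host System</h3>
</body></html>"""

for record in section_records("ch-virtualization.html", PAGE):
    print(record["url"], "|", record["title"])
```

Indexing such records under the title field would let a title search for "kickstart" reach 8.1.1 and 8.1.2 without changing how the docs are paginated, at the cost of maintaining custom indexing code.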

Comment 2 John Matthews 2009-02-09 20:54:01 UTC
Making bug public

Comment 4 John Matthews 2009-04-04 22:19:09 UTC
Placing on MODIFIED.

The result sorting has been fixed so the best results are at the top.

As comment #1 mentions, the issue with title search is related to how the docs are written.  After speaking with John Ha, it looks like breaking the docs out into finer-grained pages would be easy, but we did not request this because of the possible impact on the help links embedded throughout our code.

If title search results are still less than desired, let's open a new bug for that.

Comment 5 wes hayutin 2009-04-15 16:08:37 UTC
looks good
verified

Comment 6 Jan Pazdziora 2009-08-19 15:43:09 UTC
"Documentation" is in the pulldown menu next to the search text field, but when I try to search, I get the page

https://xen5.englab.brq.redhat.com/rhn/help/Search.do?search_string=kickstart&view_mode=search_content_title

which has a red message

    * Index files missing from search-server. Assuming data exists in database,
indexes can be regenerated by running: /etc/init.d/rhn-search cleanindex, then
restart rhn-search

When I do

[root@xen5 ~]# /etc/init.d/rhn-search cleanindex
Stopping rhn-search...
Waiting for rhn-search to exit...
Stopped rhn-search.
2009-08-19 17:11:06,713 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Running query: deleteLastErrata
2009-08-19 17:11:15,204 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Attempting to delete /usr/share/rhn/search/indexes/errata
2009-08-19 17:11:16,758 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Running query: deleteLastPackage
2009-08-19 17:11:17,082 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Attempting to delete /usr/share/rhn/search/indexes/package
2009-08-19 17:11:18,935 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Running query: deleteLastServer
2009-08-19 17:11:19,222 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Attempting to delete /usr/share/rhn/search/indexes/server
2009-08-19 17:11:19,589 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Running query: deleteLastHardwareDevice
2009-08-19 17:11:19,862 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Attempting to delete /usr/share/rhn/search/indexes/hwdevice
2009-08-19 17:11:19,913 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Running query: deleteLastSnapshotTag
2009-08-19 17:11:20,005 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Running query: deleteLastServerCustomInfo
2009-08-19 17:11:20,072 [main] INFO  com.redhat.satellite.search.DeleteIndexes - Index files have been deleted and database has been cleaned up, ready to reindex
Starting rhn-search...
[root@xen5 ~]# service rhn-search restart
Stopping rhn-search...
Waiting for rhn-search to exit...
Waiting for rhn-search to exit...
Waiting for rhn-search to exit...
Waiting for rhn-search to exit...
Stopped rhn-search.
Starting rhn-search...
[root@xen5 ~]# 

and do the search again, the result is the same.

So in stage, we are stuck with the fact that the "You get a list of hits on kickstart.. ok.." precondition does not seem to be met.

Is that correct?

Comment 7 John Matthews 2009-08-20 13:22:06 UTC
No, this is not correct.  The results should be displayed.  I have not been able to replicate the problem Jan has seen in comment #6.  I tried several workstations in the RDU office and they did not see the problem either. 

We verified the satellite used has doc indexes and this worked fine for me when I accessed the same satellite mentioned in the bz.

It looks like this might be a problem related to different browsers.

Jan was seeing this on firefox 3.5 and I *think* konqueror

I tested this on a few machines in RDU and on my home setup and did not see the problem. I used:
Fedora-9 firefox 3.0.7
Fedora-10 firefox 3.0.8
Fedora-11 firefox 3.5.2
Fedora-11 konqueror 4.2.4

Comment 8 Jan Pazdziora 2009-08-21 15:04:53 UTC
The problem in comment 6 was caused by bug 518664.

