Bug 507687 - DocSearch: WebUI displays "Â" in most of its title links.
Summary: DocSearch: WebUI displays "Â" in most of its title links.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite 5
Classification: Red Hat
Component: WebUI
Version: 530
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: John Matthews
QA Contact: Sayli Karmarkar
URL:
Whiteboard:
Depends On:
Blocks: 457073
 
Reported: 2009-06-23 19:01 UTC by John Matthews
Modified: 2009-09-10 19:32 UTC

Fixed In Version: sat530
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-10 19:32:48 UTC
Target Upstream Version:
Embargoed:



Description John Matthews 2009-06-23 19:01:55 UTC
Description of problem:

Most of the title links displayed from DocSearch contain a "Â".
Here's an example of the titles displayed by a regular doc search:

ChapterÂ 3.Â Building Custom Packages
3.2.2. RHN SSL Maintenance Tool Options 
Chapter 6. Manually Scripting the Configuration 
Channel Management Guide 
Index 
3.2.2. Signing packages 



It looks like the help docs contain a non-printing character, and Lucene is translating it to "Â".


Here is a snippet of the HTML as shown by Firefox's "View Source":
<p id="title"><a href="/rhn/help/release-notes/satellite/index.jsp"><strong>Chapter 3. Building Custom Packages</strong></a></p>


Here is a snippet of the HTML from wget with a "cat -A":
<p id="title"><a href="/rhn/help/release-notes/satellite/index.jsp"><strong>ChapterM-BM- 3.M-BM- Building Custom Packages</strong></a></p>

Notice that each "M-BM-" corresponds to a place where we see "Â" in what Lucene stores.
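
To make the "cat -A" output above easier to read, here is a small sketch of the meta notation it uses for bytes with the high bit set. This is a simplified re-implementation for illustration only (the helper name cat_v is hypothetical, and it does not reproduce every detail of the real tool, such as the end-of-line "$" marker). It shows that the two bytes c2 a0 render as "M-B" followed by "M- ", which matches the snippet.

```python
def cat_v(data: bytes) -> str:
    """Render bytes roughly the way `cat -v` does (simplified sketch)."""
    out = []
    for b in data:
        if b >= 128:
            # High-bit bytes are shown as "M-" plus the byte with bit 7 cleared.
            out.append("M-")
            b -= 128
        if b == 127:
            out.append("^?")
        elif b < 32 and b not in (9, 10):
            # Control characters are shown in caret notation, e.g. ^A.
            out.append("^" + chr(b + 64))
        else:
            out.append(chr(b))
    return "".join(out)


# U+00A0 (no-break space) between "Chapter" and "3." as in the help docs.
sample = "Chapter\u00a03.".encode("utf-8")
print(cat_v(sample))
```

Because 0xC2 minus 0x80 is 0x42 ("B") and 0xA0 minus 0x80 is 0x20 (a space), the non-breaking space appears as "M-BM- " in the output.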


Example of the data from Lucene:
"ChapterÂ&#160;3.Â&#160;Building Custom Packages"




Version-Release number of selected component (if applicable):
Satellite-5.3.0-RHEL5-re20090619.0-i386-embedded-oracle.iso

How reproducible:
Always

Steps to Reproduce:
1. In English Locale, do a doc search for "channel"

  
Actual results:
ChapterÂ 3.Â Building Custom Packages

Expected results:
Chapter 3. Building Custom Packages

Additional info:

Comment 1 John Matthews 2009-06-23 20:18:17 UTC
We figured out that the character in question is part of the UTF-8 encoding of &nbsp; (U+00A0).
In hex it is c2 a0; the first byte, c2, is what gets displayed as "Â".
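
A minimal sketch of how this kind of garbling can arise, assuming the UTF-8 bytes were decoded with a single-byte charset. ISO-8859-1 is used here purely as an illustrative assumption, since the report does not say which encoding was actually applied. The helper name misdecode is hypothetical.

```python
NBSP = "\u00a0"  # the character behind &nbsp;


def misdecode(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


title = f"Chapter{NBSP}3.{NBSP}Building Custom Packages"
raw = title.encode("utf-8")

# The two bytes that follow the 7-byte word "Chapter" are the encoded NBSP.
print(raw[7:9].hex())

# Decoding with the wrong charset turns each NBSP into "Â" plus an NBSP.
garbled = misdecode(title)
print(garbled)

# Decoding as UTF-8 round-trips cleanly.
print(raw.decode("utf-8") == title)
```

Each non-breaking space becomes two characters when misdecoded: 0xC2 maps to "Â" and 0xA0 maps back to a non-breaking space, which is consistent with the "Â&#160;" pairs seen in the stored data.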

Comment 2 John Matthews 2009-07-06 18:54:27 UTC
Setting Nutch to use utf8 as the default encoding:

<property>
  <name>parser.character.encoding.default</name>
  <value>utf8</value>
  <description>The character encoding to fall back to when no other information
  is available</description>
</property>




commit 3ab14eea49d0e1c64e84b1cfe2b5e98f98c8bddd
Refs: rhn-i18n-guides-5.3.0.8-1-1-g3ab14ee
Author:     John Matthews <jmatthew>
AuthorDate: Mon Jul 6 14:32:06 2009 -0400
Commit:     John Matthews <jmatthew>
CommitDate: Mon Jul 6 14:32:06 2009 -0400

    507687 - DocSearch force nutch to use "UTF-8" encoding.
---
 doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml b/doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml
index 70975c2..dcd22a6 100644
--- a/doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml
+++ b/doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml
@@ -55,4 +55,11 @@
     <name>file.content.limit</name> <value>-1</value>
 </property>

+<property>
+  <name>parser.character.encoding.default</name>
+  <value>utf8</value>
+  <description>The character encoding to fall back to when no other information
+  is available</description>
+</property>
+

Comment 3 John Matthews 2009-07-06 19:47:22 UTC
These are the package versions that have the fix

Package: satellite-doc-indexes-5.3.17-1.el5sat
Package: satellite-doc-indexes-5.3.17-1.el4sat

Comment 4 Sayli Karmarkar 2009-07-09 16:04:37 UTC
Verified. Not seeing "Â" in doc search links now.

Comment 5 Milan Zázrivec 2009-08-21 11:20:57 UTC
Verified in stage -> RELEASE_PENDING

Comment 6 Brandon Perkins 2009-09-10 19:32:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1434.html

