Bug 507687 - DocSearch: WebUI displays "Â" in most of its title links.
Product: Red Hat Satellite 5
Classification: Red Hat
Component: WebUI
Hardware: All Linux
Priority: low, Severity: medium
Assigned To: John Matthews
QA Contact: Sayli Karmarkar
Depends On:
Blocks: 457073
Reported: 2009-06-23 15:01 EDT by John Matthews
Modified: 2009-09-10 15:32 EDT

See Also:
Fixed In Version: sat530
Doc Type: Bug Fix
Last Closed: 2009-09-10 15:32:48 EDT

Description John Matthews 2009-06-23 15:01:55 EDT
Description of problem:

Most of the title links displayed from DocSearch contain a "Â".
Here is an example of the titles displayed by a regular doc search:

ChapterÂ 3.Â Building Custom Packages
3.2.2. RHN SSL Maintenance Tool Options 
Chapter 6. Manually Scripting the Configuration 
Channel Management Guide 
3.2.2. Signing packages 

It looks like the help docs contain a non-printing character, and Lucene is translating it to "Â".

Here is a snippet of the HTML as per Firefox view source
<p id="title"><a href="/rhn/help/release-notes/satellite/index.jsp"><strong>Chapter 3. Building Custom Packages</strong></a></p>

Here is a snippet of the HTML from wget with a "cat -A":
<p id="title"><a href="/rhn/help/release-notes/satellite/index.jsp"><strong>ChapterM-BM- 3.M-BM- Building Custom Packages</strong></a></p>

Notice that each "M-BM- " sequence corresponds to a place where we see "Â" in what Lucene stores.
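The "M-" notation in the `cat -A` output can be reproduced in a few lines. The following Python sketch is a simplified approximation of how `cat -A` renders bytes, not GNU cat's actual implementation, and the function name `cat_a` is made up for illustration. It assumes the hidden character is a non-breaking space encoded in UTF-8 (bytes c2 a0).

```python
def cat_a(data: bytes) -> str:
    """Approximate the notation `cat -A` uses for non-printing bytes."""
    out = []
    for b in data:
        if b == 0x0A:
            # cat -A marks each line end with "$"
            out.append("$\n")
            continue
        if b >= 0x80:
            # High-bit bytes are shown as "M-" plus the low 7 bits
            out.append("M-")
            b -= 0x80
        if b < 0x20:
            # Control characters are shown in caret notation, e.g. ^I for tab
            out.append("^" + chr(b + 0x40))
        elif b == 0x7F:
            out.append("^?")
        else:
            out.append(chr(b))
    return "".join(out)


# Assumed title text: non-breaking spaces (U+00A0) after "Chapter" and "3."
title = "Chapter\u00a03.\u00a0Building Custom Packages"
print(cat_a(title.encode("utf-8")))
```

Byte c2 is rendered as "M-B" and byte a0 as "M-" followed by a space, which is why the sequence in the wget output reads "M-BM- ".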

Example of the data from Lucene:
"ChapterÂ&#160;3.Â&#160;Building Custom Packages"

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. In English Locale, do a doc search for "channel"

Actual results:
ChapterÂ 3.Â Building Custom Packages

Expected results:
Chapter 3. Building Custom Packages

Additional info:
Comment 1 John Matthews 2009-06-23 16:18:17 EDT
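We figured out that the character in question is the UTF-8 encoding of a non-breaking space (&nbsp;, U+00A0).
In hex it is c2 a0. The leading byte c2, read as a single-byte Latin-1 character, is "Â".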
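The effect can be reproduced with a short Python sketch. It assumes the indexer fell back to a single-byte encoding such as ISO-8859-1 (the report does not name the previous default, so that choice is an assumption), and the helper name `misdecode` is hypothetical.

```python
def misdecode(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


nbsp = "\u00a0"
print(nbsp.encode("utf-8").hex())  # hex of the UTF-8 bytes for a non-breaking space

# Assumed title text: non-breaking spaces after "Chapter" and "3."
title = "Chapter\u00a03.\u00a0Building Custom Packages"
garbled = misdecode(title)

# Escape the remaining non-breaking spaces the way an HTML serializer might
escaped = garbled.replace("\u00a0", "&#160;")
print(escaped)

# The damage is reversible as long as the bytes were not altered further
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == title)
```

Each non-breaking space turns into "Â" followed by a non-breaking space, which lines up with the "Â&#160;" pairs seen in the stored Lucene data.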
Comment 2 John Matthews 2009-07-06 14:54:27 EDT
Setting Nutch to use UTF-8 as the default encoding:

  <property>
    <name>parser.character.encoding.default</name>
    <value>utf8</value>
    <description>The character encoding to fall back to when no other information
    is available</description>
  </property>

commit 3ab14eea49d0e1c64e84b1cfe2b5e98f98c8bddd
Refs: rhn-i18n-guides-
Author:     John Matthews <jmatthew@redhat.com>
AuthorDate: Mon Jul 6 14:32:06 2009 -0400
Commit:     John Matthews <jmatthew@redhat.com>
CommitDate: Mon Jul 6 14:32:06 2009 -0400

    507687 - DocSearch force nutch to use "UTF-8" encoding.
 doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml b/doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml
index 70975c2..dcd22a6 100644
--- a/doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml
+++ b/doc-indexes/NUTCH_CONF_TEMPLATE/nutch-site.xml
@@ -55,4 +55,11 @@
     <name>file.content.limit</name> <value>-1</value>

+  <name>parser.character.encoding.default</name>
+  <value>utf8</value>
+  <description>The character encoding to fall back to when no other information
+  is available</description>
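To confirm that a deployed nutch-site.xml carries the new setting, a small check along these lines could be used. This is a sketch only: the `SAMPLE` document is a hypothetical minimal file in the usual `<configuration>`/`<property>` layout, not the real template, and `get_property` is an illustrative helper, not part of Nutch.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal config; the real nutch-site.xml contains more properties.
SAMPLE = """<configuration>
  <property>
    <name>file.content.limit</name> <value>-1</value>
  </property>
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf8</value>
    <description>The character encoding to fall back to when no other information
    is available</description>
  </property>
</configuration>
"""


def get_property(xml_text: str, name: str):
    """Return the <value> of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


print(get_property(SAMPLE, "parser.character.encoding.default"))
```

Note that changing the fallback encoding only affects pages parsed afterwards, so the doc indexes would need to be regenerated for the titles to change.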
Comment 3 John Matthews 2009-07-06 15:47:22 EDT
These are the package versions that have the fix

Package: satellite-doc-indexes-5.3.17-1.el5sat
Package: satellite-doc-indexes-5.3.17-1.el4sat
Comment 4 Sayli Karmarkar 2009-07-09 12:04:37 EDT
Verified. Not seeing "Â" in doc search links now.
Comment 5 Milan Zázrivec 2009-08-21 07:20:57 EDT
Verified in stage -> RELEASE_PENDING
Comment 6 Brandon Perkins 2009-09-10 15:32:48 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

