Bug 507679

Summary: DocSearch: Problem with encoding of i18n strings
Product: Red Hat Satellite 5 Reporter: John Matthews <jmatthew>
Component: WebUI Assignee: John Matthews <jmatthew>
Status: CLOSED CURRENTRELEASE QA Contact: Milan Zázrivec <mzazrivec>
Severity: medium Docs Contact:
Priority: low    
Version: 530 CC: cperry, jesusr, msuchy
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: sat530 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-09-10 19:32:47 UTC Type: ---
Bug Depends On:    
Bug Blocks: 457073    

Description John Matthews 2009-06-23 18:23:50 UTC
Description of problem:

Documentation search is displaying corrupted results for languages like Hindi.

When submitting a query for, say, "पंजीयन",
the search string changes to:   "पà¤à¤à¥à¤¯à¤¨"


Version-Release number of selected component (if applicable):
ISO: Satellite-5.3.0-RHEL5-re20090619.0-i386-embedded-oracle.iso


How reproducible:
Always

Steps to Reproduce:
1. Change the locale to Hindi in Firefox, or set Hindi as your locale choice in Satellite.
2. Go to: /rhn/help/Search.do
3. Enter: "पंजीयन"  as your search term
  
Actual results:
The search string is changed to: "पà¤à¤à¥à¤¯à¤¨"
The results also look incorrect, with garbled characters that are not Hindi.

Expected results:
The search string should remain: "पंजीयन"
The results should be in Hindi characters




Additional info:

Comment 1 John Matthews 2009-06-23 18:35:26 UTC
Here is background on the problem.

1) We submit a form with a Unicode character string.
2) The Java action "DocSearchSetupAction" receives the string in the correct encoding; the locale is set to Hindi and the encoding is set to "UTF-8".
3) The action processes the form variables, then puts them into request parameters and does a forward.
4) We enter DocSearchSetupAction again, this time getting our parameters from a "GET". The searchString has already been changed at this point.

Character encoding is applied to the request content and to the GET parameters separately.  I believe the issue here is that the GET parameters are not being decoded as UTF-8.
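
The suspected mechanism can be reproduced outside Tomcat. The following is a minimal sketch, not the Satellite code: it percent-encodes the Hindi query as UTF-8 (as a browser would) and then decodes it once as ISO-8859-1 and once as UTF-8. It assumes the container falls back to ISO-8859-1 for query strings when no URIEncoding is configured, which is what the Tomcat FAQ describes for Tomcat releases of this era. The class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class GetParameterDecodingDemo {
    // The Hindi query from this report, written with Unicode escapes so the
    // encoding of the source file itself does not matter.
    static final String QUERY = "\u092A\u0902\u091C\u0940\u092F\u0928";

    // Percent-encode a value the way a browser sends it in a UTF-8 page.
    static String encodeAsUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decode a percent-encoded value using the given charset. This stands in
    // for what the container does with GET parameters (an assumption).
    static String decodeWith(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String onTheWire = encodeAsUtf8(QUERY);
        String asLatin1 = decodeWith(onTheWire, "ISO-8859-1");
        String asUtf8 = decodeWith(onTheWire, "UTF-8");

        System.out.println("percent-encoded: " + onTheWire);
        System.out.println("ISO-8859-1 decode equals original: " + asLatin1.equals(QUERY));
        System.out.println("UTF-8 decode equals original: " + asUtf8.equals(QUERY));
    }
}
```

Decoding as ISO-8859-1 turns every UTF-8 byte into its own character, so bytes such as 0xE0 0xA4 become "à¤", the same pairs visible in the corrupted search string above. Decoding as UTF-8 returns the original query.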

A fix that appears to work is to add the attribute URIEncoding="UTF-8" to the Connector element.

Example:  /etc/tomcat5/server.xml

      <!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
    <Connector port="8080" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true" 
               URIEncoding="UTF-8" />


    <!-- Define an AJP 1.3 Connector on port 8009 -->
    <Connector port="8009" 
        enableLookups="false" redirectPort="8443" protocol="AJP/1.3" 
        URIEncoding="UTF-8"/>




For more info see:  
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q2

Comment 2 John Matthews 2009-06-24 21:58:17 UTC
This is the fix in the VADER branch:

 spacewalk/config/etc/sysconfig/rhn-satellite-prep/etc/tomcat5/server.xml |  394 ++++++++++
 1 file changed, 394 insertions(+)

New commits:
commit b0d90ccfcb4298fbc5e8aced5231a7fd390945dc
Author: John Matthews <jmatthew>
Date:   Wed Jun 24 11:42:14 2009 -0400

    507679 - Set URIEncoding to UTF-8 for tomcats server.xml config file
    Intent of this change is to force encoding of GET parameters to be in UTF-8
    This is the default server.xml from tomcat RPM with a change of URIEncoding="UTF-8"
    added to the Connector elements.
    For background info reference: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q2




TESTPLAN Note:

While testing, please verify that /etc/tomcat5/server.xml
contains URIEncoding="UTF-8" for the connector element on port 8009 as per comment 1.
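
For repeated test runs, the check above could be scripted. The following is a rough sketch only, assuming server.xml parses as plain XML with the JDK's built-in DOM parser; the class name and helper method are hypothetical and are not part of Satellite or Tomcat.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ConnectorEncodingCheck {
    // Returns true if some <Connector> element with the given port
    // declares URIEncoding="UTF-8" (charset name compared case-insensitively).
    static boolean hasUtf8UriEncoding(InputStream serverXml, String port) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(serverXml);
            NodeList connectors = doc.getElementsByTagName("Connector");
            for (int i = 0; i < connectors.getLength(); i++) {
                Element connector = (Element) connectors.item(i);
                // getAttribute returns "" when the attribute is absent.
                if (port.equals(connector.getAttribute("port"))
                        && "UTF-8".equalsIgnoreCase(connector.getAttribute("URIEncoding"))) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse server.xml", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Path taken from this report; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "/etc/tomcat5/server.xml";
        try (InputStream in = new FileInputStream(path)) {
            System.out.println("port 8009 has URIEncoding=UTF-8: "
                    + hasUtf8UriEncoding(in, "8009"));
        }
    }
}
```

This only inspects the configuration file; it does not confirm that the running Tomcat instance was restarted with that configuration.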

Comment 3 Milan Zázrivec 2009-07-09 12:14:11 UTC
Verified, spacewalk-search-0.5.10-14, /etc/tomcat5/server.xml:
...
<Connector port="8009" 
        enableLookups="false" redirectPort="8443" protocol="AJP/1.3" 
        URIEncoding="UTF-8"/>
...

Comment 4 Miroslav Suchý 2009-08-17 13:07:37 UTC
Verified in stage. The search string remains पंजीयन

Comment 5 Brandon Perkins 2009-09-10 19:32:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1434.html