Bug 1021767

Summary: JSP Character encoding is not set properly.
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: jooho lee <jlee>
Component: Web
Assignee: Rémy Maucherat <rmaucher>
Status: CLOSED NOTABUG
QA Contact: Radim Hatlapatka <rhatlapa>
Severity: high
Docs Contact: Russell Dickenson <rdickens>
Priority: unspecified
Version: 6.1.1
CC: mvelas
Target Milestone: ER10
Target Release: EAP 6.3.0
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2014-08-04 06:54:51 UTC
Type: Bug
Attachments:
  Description: This test application show the bug
  Flags: none

Description jooho lee 2013-10-22 03:45:20 UTC
Created attachment 814848
This test application show the bug

Description of problem:
To handle international characters such as Korean, Japanese, and Chinese, a JSP file should contain the following directive so that text is encoded and decoded properly:
<%@ page language="java" contentType="text/html; charset=UTF-8"
    pageEncoding="utf8"%>

But it does not work properly. For example, take two files, A.jsp and B.jsp.

A.jsp passes the string "가나다" to B.jsp as a GET parameter.

In the URL, I can see localhost:8080/KoreanChar/A.jsp?msg=%EA%B0%80%EB%82%98%EB%8B%A4, so the parameter "가나다" is percent-encoded as UTF-8.
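
For reference, the percent-encoded value can be reproduced outside the container. This is a minimal sketch using only the JDK's java.net.URLEncoder; the class and method names (QueryEncodingDemo, encode) are made up for illustration and are not part of the attached test application.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncodingDemo {
    // Percent-encode a value with the named charset (hypothetical helper).
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "가나다" written with Unicode escapes so the result does not
        // depend on the encoding of this source file.
        String msg = "\uAC00\uB098\uB2E4";
        System.out.println(encode(msg, "UTF-8"));      // %EA%B0%80%EB%82%98%EB%8B%A4
        System.out.println(encode(msg, "ISO-8859-1")); // %3F%3F%3F
    }
}
```

ISO-8859-1 has no mapping for Hangul, so each character is replaced with '?' (0x3F) before being percent-encoded. That matches the "Encode OriginalMsg by ISO" line in the console output under Steps to Reproduce.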

However, in B.jsp the parameter is automatically decoded as ISO-8859-1, so it shows the broken characters ê°ëë¤. If I convert the broken string back to bytes as ISO-8859-1 and decode those bytes as UTF-8, I get the original characters "가나다" again.
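
The round trip described above can be sketched in plain Java. This is only an illustration of the symptom and the manual workaround, not a recommended fix; the class and method names (MojibakeDemo, misdecode, repair) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulate what happens when UTF-8 bytes are decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: recover the raw bytes, then decode them as UTF-8.
    // This works because ISO-8859-1 maps every byte value to exactly one char.
    static String repair(String broken) {
        byte[] rawBytes = broken.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String msg = "\uAC00\uB098\uB2E4"; // "가나다"
        String broken = misdecode(msg);
        System.out.println(broken.length());           // 9: one char per UTF-8 byte
        System.out.println(repair(broken).equals(msg)); // true
    }
}
```

In a JSP, the equivalent workaround would be applied to the value returned by request.getParameter, but that only hides the misconfiguration rather than correcting it.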

Even when I set the following system properties, it does not work properly.

<system-properties>
     <property name="org.apache.catalina.connector.URI_ENCODING" value="UTF-8"/>
     <property name="org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING" value="true"/>
</system-properties>


Version-Release number of selected component (if applicable):


How reproducible:
I attached KoreanChar.zip.

Steps to Reproduce:
1. Import & deploy it using JBDS or Eclipse
2. http://localhost:8080/KoreanChar/ 
   ==>console :
         Original Msg : 가나다
         Encode OriginalMsg by ISO: %3F%3F%3F
         Encode OriginalMsg by UTF: %EA%B0%80%EB%82%98%EB%8B%A4
         Decode unicodeString by ISO : ì´ì£¼í¸
         Decode unicodeString by UTF : 가나다
         getCharacterEncoding :null
         getContentType :null

3. Enter 가나다 and submit.
4. Observe the broken characters.
  ===> Page shows :
         getCharacterEncoding :null
         getContentType :null
         ------------------------------------------------
         #### Original Msg : ê°ëë¤######
         #### Encode OriginalMsg by ISO : %EA%B0%80%EB%82%98%EB%8B%A4######
         #### Decode EncodedMsgbyISO by UTF가나다###### 
 
Add system-property in standalone.xml

<system-properties>
     <property name="org.apache.catalina.connector.URI_ENCODING" value="UTF-8"/>
     <property name="org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING" value="true"/>
</system-properties>

Then test again. => Same result.


Actual results:
         getCharacterEncoding :null
         getContentType :null
         ------------------------------------------------
         #### Original Parameter Msg : ê°ëë¤######
        

Expected results:
         getCharacterEncoding :null
         getContentType :null
         ------------------------------------------------
         #### Original Parameter Msg : 가나다
        

Additional info:

Comment 1 Rémy Maucherat 2013-10-23 08:06:50 UTC
This is only about a GET request that would have its URI badly decoded. This works.

It seems you tried to set all possible configuration options and UTF-8 is mentioned just about everywhere, but actually USE_BODY_ENCODING_FOR_QUERY_STRING likely overrides everything (your GET has no charset to specify for its non existent body, and the HTTP default is not UTF-8).
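
The interaction described in this comment can be modelled roughly as follows. This is a simplified sketch of the explanation only, not JBoss Web source code; the class QueryCharsetModel, the method queryCharset, and its parameters are hypothetical names chosen for illustration.

```java
public class QueryCharsetModel {
    // The default charset assumed when the request declares none.
    static final String HTTP_DEFAULT = "ISO-8859-1";

    /**
     * Pick the charset used to decode the query string (illustrative model).
     *
     * @param useBodyEncoding models USE_BODY_ENCODING_FOR_QUERY_STRING
     * @param requestCharset  charset declared by the request, or null if none
     * @param uriEncoding     models URI_ENCODING, or null if unset
     */
    static String queryCharset(boolean useBodyEncoding,
                               String requestCharset,
                               String uriEncoding) {
        if (useBodyEncoding) {
            // The body encoding wins; a GET with no declared charset
            // falls back to the default, and URI_ENCODING is ignored.
            return requestCharset != null ? requestCharset : HTTP_DEFAULT;
        }
        return uriEncoding != null ? uriEncoding : HTTP_DEFAULT;
    }

    public static void main(String[] args) {
        // Both properties set, GET request without a charset (the reported setup).
        System.out.println(queryCharset(true, null, "UTF-8"));  // ISO-8859-1
        // Only URI_ENCODING set.
        System.out.println(queryCharset(false, null, "UTF-8")); // UTF-8
    }
}
```

Under this model, getCharacterEncoding returning null (as in the reported output) is exactly the case where enabling the body-encoding option discards the UTF-8 setting.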

Encodings in URIs are not a very good idea unless you like problems ...

Comment 2 jooho lee 2013-10-23 08:39:41 UTC
Hi Remy,

Basically, when using international languages, I consider encoding a must in order to avoid character problems. As you mentioned, the HTTP default is not UTF-8, so it sometimes causes problems with languages such as Chinese, Korean and so on. Hence, options like these, which forcibly override the charset, are normally used.

However, even when I tested with a JSP that contains charset=UTF-8 and pageEncoding=utf8, the problem occurred. Although I set the system properties URI_ENCODING and USE_BODY_ENCODING_FOR_QUERY_STRING to override the charset once again, the result was the same.

Actually, I didn't test POST, but I am not sure why you think encoding in the URI is not a very good idea. As I mentioned above, it is usual to encode URIs for international words.

Moreover, I think it is definitely a bug that a parameter coming from the previous page is forcibly decoded as ISO-8859-1 even though the charset is defined as UTF-8 at the top of the file.

Comment 3 Rémy Maucherat 2013-10-23 13:01:21 UTC
Yes, you set everything you can, but that's counterproductive. So drop USE_BODY_ENCODING_FOR_QUERY_STRING.

Comment 4 Martin Velas 2014-08-04 06:54:51 UTC
Using the configuration proposed by Rémy (setting only the org.apache.catalina.connector.URI_ENCODING property to UTF-8), I obtained the expected, correctly encoded output:

Results: #### Original Parameter Msg : 가나다######

Issue was verified against EAP 6.3.0.ER10.