Bug 1021767 - JSP Character encoding is not set properly.
Status: CLOSED NOTABUG
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Web
Version: 6.1.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ER10
Target Release: EAP 6.3.0
Assigned To: Rémy Maucherat
QA Contact: Radim Hatlapatka
Docs Contact: Russell Dickenson
Depends On:
Blocks:
Reported: 2013-10-21 23:45 EDT by jooho lee
Modified: 2014-08-05 07:49 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-08-04 02:54:51 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
This test application show the bug (5.51 KB, application/zip), 2013-10-21 23:45 EDT, jooho lee

Description jooho lee 2013-10-21 23:45:20 EDT
Created attachment 814848
This test application show the bug

Description of problem:
To use international characters such as Korean, Japanese, and Chinese, a JSP file should contain this directive so that text is encoded and decoded properly:
<%@ page language="java" contentType="text/html; charset=UTF-8"
    pageEncoding="utf8"%>

But it does not work properly. For example, take two files, A.jsp and B.jsp.

A.jsp passes the string "가나다" to B.jsp as a parameter using the GET method.

In the URL, I can see localhost:8080/KoreanChar/A.jsp?msg=%EA%B0%80%EB%82%98%EB%8B%A4, so the parameter "가나다" is percent-encoded as UTF-8.
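The percent-encoded value in that URL can be reproduced outside the container. The following is a minimal sketch, not code from the attached application: the class name PercentEncodingDemo and the helper encode are hypothetical, and only java.net.URLEncoder from the JDK is used. The string is written with Unicode escapes so the result does not depend on the source file's own encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of the attached KoreanChar application.
class PercentEncodingDemo {

    // Percent-encode text for use in a query string with the named charset.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Only reached if the charset name is unknown to this JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String msg = "\uAC00\uB098\uB2E4"; // the three Hangul syllables from the report

        // UTF-8 uses three bytes per syllable, giving nine %XX escapes.
        System.out.println(encode(msg, "UTF-8"));      // %EA%B0%80%EB%82%98%EB%8B%A4

        // ISO-8859-1 cannot represent Hangul, so each character becomes '?'.
        System.out.println(encode(msg, "ISO-8859-1")); // %3F%3F%3F
    }
}
```

Both values match the console output listed under "Steps to Reproduce".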

However, in B.jsp the parameter is automatically decoded as ISO-8859-1, so "가나다" is displayed as broken characters. If I re-encode the broken string as ISO-8859-1 and decode it as UTF-8, I get the original characters "가나다" back.
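The wrong decoding and the manual workaround described here can be sketched in plain Java. This is an illustration under assumptions, not the code of B.jsp: the class MojibakeDemo and the methods misdecode and repair are hypothetical names, and the container's behaviour is simulated with String and byte[] conversions only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; simulates the effect described in the report.
class MojibakeDemo {

    // Simulate a container that receives UTF-8 bytes but decodes them as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the report: turn the broken string back into its
    // raw bytes via ISO-8859-1, then decode those bytes as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String msg = "\uAC00\uB098\uB2E4"; // the three Hangul syllables from the report
        String garbled = misdecode(msg);

        // Nine single-byte characters instead of three syllables.
        System.out.println(garbled.length());            // 9
        System.out.println(repair(garbled).equals(msg)); // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, which is why the workaround recovers the original text.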

Even when I set the following system properties, it does not work properly.

<system-properties>
     <property name="org.apache.catalina.connector.URI_ENCODING" value="UTF-8"/>
     <property name="org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING" value="true"/>
</system-properties>


Version-Release number of selected component (if applicable):


How reproducible:
I attached KoreanChar.zip.

Steps to Reproduce:
1. Import & deploy it using JBDS or Eclipse
2. http://localhost:8080/KoreanChar/ 
   ==>console :
         Original Msg : 가나다
         Encode OriginalMsg by ISO: %3F%3F%3F
         Encode OriginalMsg by UTF: %EA%B0%80%EB%82%98%EB%8B%A4
         Decode unicodeString by ISO : 이주호
         Decode unicodeString by UTF : 가나다
         getCharacterEncoding :null
         getContentType :null

3. Enter 가나다 and submit.
4. See the broken characters.
  ===> Page shows :
         getCharacterEncoding :null
         getContentType :null
         ------------------------------------------------
         #### Original Msg : 가나다######
         #### Encode OriginalMsg by ISO : %EA%B0%80%EB%82%98%EB%8B%A4######
         #### Decode EncodedMsgbyISO by UTF가나다###### 
 
Add the system properties to standalone.xml:

<system-properties>
     <property name="org.apache.catalina.connector.URI_ENCODING" value="UTF-8"/>
     <property name="org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING" value="true"/>
</system-properties>

Then test again. => Same result.


Actual results:
         getCharacterEncoding :null
         getContentType :null
         ------------------------------------------------
         #### Original Parameter Msg : 가나다######
        

Expected results:
         getCharacterEncoding :null
         getContentType :null
         ------------------------------------------------
         #### Original Parameter Msg : 가나다
        

Additional info:
Comment 1 Rémy Maucherat 2013-10-23 04:06:50 EDT
This is only about a GET request that would have its URI badly decoded. This works.

It seems you tried to set all possible configuration options and UTF-8 is mentioned just about everywhere, but actually USE_BODY_ENCODING_FOR_QUERY_STRING likely overrides everything (your GET request has no body, so there is no body charset to take, and the HTTP default is not UTF-8).

Encodings in the URI are not a very good idea unless you like problems ...
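The override described in Comment 1 can be modeled roughly as follows. This is a simplified sketch of the described behaviour, not JBoss Web source code: the class QueryCharsetModel, the method queryCharset, and its parameters are all hypothetical, and the ISO-8859-1 fallback is an assumption based on the report, where getCharacterEncoding returns null and the parameter is decoded as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of how the query-string charset might be chosen.
class QueryCharsetModel {

    // useBodyEncoding : value of USE_BODY_ENCODING_FOR_QUERY_STRING
    // uriEncoding     : value of URI_ENCODING, or null if unset
    // requestEncoding : what request.getCharacterEncoding() would return, or null
    static Charset queryCharset(boolean useBodyEncoding, String uriEncoding, String requestEncoding) {
        if (useBodyEncoding) {
            // Assumption: the body encoding wins and URI_ENCODING is ignored.
            // A GET without a charset falls back to the default.
            return requestEncoding != null
                    ? Charset.forName(requestEncoding)
                    : StandardCharsets.ISO_8859_1;
        }
        return uriEncoding != null
                ? Charset.forName(uriEncoding)
                : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // Both properties set, as in the report: URI_ENCODING has no effect.
        System.out.println(queryCharset(true, "UTF-8", null));  // ISO-8859-1

        // Only URI_ENCODING set: the query string is decoded as UTF-8.
        System.out.println(queryCharset(false, "UTF-8", null)); // UTF-8
    }
}
```

Under this model, setting both properties on a GET request with no declared charset gives the same ISO-8859-1 decoding as setting neither.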
Comment 2 jooho lee 2013-10-23 04:39:41 EDT
Hi Remy,

Basically, to use international languages, I think setting the encoding is a must to avoid character problems. As you mentioned, the HTTP default is not UTF-8, so it sometimes causes problems with languages such as Chinese, Korean and so on. Hence, options like this, which forcibly override the charset, are normally used.

However, even when I tested with a JSP that contains charset=UTF-8 and pageEncoding=utf8, the problem occurred. Although I set the system properties URI_ENCODING and USE_BODY_ENCODING_FOR_QUERY_STRING to override the charset once again, the result was the same.

Actually, I didn't test POST, but I am not sure why you think encoding in the URI is not a very good approach. As I mentioned above, it is common to encode URIs for international words.

Moreover, I think it is definitely a bug that a parameter coming from the previous page is forcibly decoded as ISO-8859-1 even though the charset is defined as UTF-8 at the top of the file.
Comment 3 Rémy Maucherat 2013-10-23 09:01:21 EDT
Yes, you set everything you can, but that's counterproductive. So drop USE_BODY_ENCODING_FOR_QUERY_STRING.
Comment 4 Martin Velas 2014-08-04 02:54:51 EDT
Using the configuration proposed by Rémy (setting only the org.apache.catalina.connector.URI_ENCODING property to UTF-8), I obtained the expected, correctly encoded output:

Results: #### Original Parameter Msg : 가나다######

The issue was verified against EAP 6.3.0.ER10.
