Bug 832235 - JBoss : charset problem when sending data from forms to MySQL
Product: OpenShift Origin
Classification: Red Hat
Component: Containers
Priority: medium
Severity: medium
Assigned To: Dan Mace
libra bugs
Keywords: SupportQuestion, Triaged
Reported: 2012-06-14 19:37 EDT by Clément HÉLY
Modified: 2015-05-14 18:55 EDT (History)
Doc Type: Bug Fix
JBoss AS 7.1 MySQL Hibernate
Last Closed: 2012-06-18 16:31:23 EDT
Type: Bug

Attachments:
A full example of the application behaving correctly with regards to Unicode (32.36 KB, application/zip), 2012-06-18 13:57 EDT, Dan Mace

Description Clément HÉLY 2012-06-14 19:37:22 EDT
On my app running on JBoss AS 7.1 @ OpenShift, I'm trying to save data received from a form (using the POST method) in a MySQL database. The insertion works, but accented characters (such as 'é', 'à', 'è', ...) are corrupted along the way and show up as "é" or "çÃ" (for instance) in the DB.
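
The "é" pattern described above is what typically appears when UTF-8 bytes are decoded with a single-byte charset such as ISO-8859-1. The following is a minimal sketch of that mechanism using only the JDK; the class and method names are hypothetical and are not taken from the reporter's application.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the reporter's application.
class MojibakeDemo {

    /** Encodes the text as UTF-8, then decodes those bytes with the wrong charset (ISO-8859-1). */
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "é" written as a Unicode escape so the source file encoding does not matter.
        String original = "\u00e9";
        String corrupted = misdecode(original);

        // The single character becomes two: one per UTF-8 byte (0xC3, 0xA9).
        System.out.println(original.length() + " char -> " + corrupted.length() + " chars");
        for (char c : corrupted.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The two resulting code points, U+00C3 and U+00A9, render as "Ã" and "©", which matches the first corrupted sample quoted in the report.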

I've checked the charset used by the database: all the tables are in utf8_general_ci. My forms have the attribute accept-charset="UTF-8". All my JSPs are saved in UTF-8.
I force my servlet to convert the data received from forms to UTF-8 strings, and at this point accented characters are not corrupted.

I tried to add these 2 lines in the persistence.xml :
<property name="hibernate.connection.useUnicode" value="true"/>
<property name="hibernate.connection.characterEncoding" value="UTF-8" />

And these 2 others in standalone.xml :
    <property name="org.apache.catalina.connector.URI_ENCODING" value="UTF-8"/>
    <property name="org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING" value="true"/>

But it still fails.

Since it works perfectly on my local instance of JBoss AS 7.1, there must be some parameter in the OpenShift configuration that prevents some Unicode characters from being saved in the database.

Additional info:
Comment 1 Dan Mace 2012-06-15 14:46:42 EDT
I have been unable to reproduce this. With a stock jbossas-7 app, using a @Stateless JAX-RS controller with an injected EntityManager, I persisted a model into a test MySQL table with the unicode characters reported, and they persisted uncorrupted.

I'd like to more accurately reproduce the reporter's environment. I'm going to need more detail on the specific setup in the original report. I need:

- Sample form code used
- Sample controller code on the servlet side

The original report contains lots of helpful details, but is too vague on the specifics of the servlet implementation (e.g., stateful session bean? stateless? jax-rs controller? how is the form data being processed in the handler? etc.)

Once I get some more specifics, I'll attempt to reproduce once again.
Comment 2 Clément HÉLY 2012-06-15 19:11:57 EDT
Thanks for the fast answer.

Here are a few code samples :

All the forms have the same structure :

<form action="createMeal" method="post" onsubmit="return newMealValidation()" accept-charset="UTF-8">
	<div class="control-group">
	<label class="control-label" for="name"><strong>Nom : </strong></label>
	<div class="controls">
		<input id="name" name="name" type="text" />
	... [other fields] ...
        <input type="submit" class="btn btn-info" value="Créer" />

And here is the servlet that receives the data from the above form :

public class InsertNewMealServlet extends HttpServlet {
	protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
		String name = new String(request.getParameter("name").getBytes(), "UTF-8");
		String description = new String(request.getParameter("description").getBytes(), "UTF-8");
		float price = Float.parseFloat(request.getParameter("price"));
		Type type = DaoFactory.getInstance().getTypeDao().findType(Long.parseLong(request.getParameter("type")));
		Meal meal = new Meal(name, description, price, type);
		// ...
	}
}

As you can see, I get data from the POST request, and try to force the UTF-8 charset before persisting it.

The MealDao.save(Meal meal) method uses the persist() method from EntityManager. I get the EntityManager instance from EntityManagerFactory.createEntityManager().
The following line is used to instantiate this EntityManagerFactory :


And here is a sample from persistence.xml :

<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="2.0">
	<persistence-unit name="myapp" transaction-type="RESOURCE_LOCAL">
			<property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect" />
			<property name="hibernate.show_sql" value="true" />
			<property name="hibernate.format_sql" value="true" />
			<property name="hibernate.hbm2ddl.auto" value="update" />
			<property name="hibernate.connection.useUnicode" value="true"/>
			<property name="hibernate.connection.characterEncoding" value="UTF-8" />

And finally, the datasource is declared in standalone.xml as follows:

<datasource jndi-name="java:jboss/datasources/MysqlDS" enabled="${mysql.enabled}" use-java-context="true" pool-name="MysqlDS">

Here's what I think can be useful. Don't hesitate to tell me if you need more info or more code from my application. Thanks again for your help
Comment 3 Dan Mace 2012-06-18 13:57:31 EDT
Created attachment 592713 [details]
A full example of the application behaving correctly with regards to Unicode
Comment 4 Dan Mace 2012-06-18 13:58:20 EDT
I think we may have missed something very basic. I've attached an example project which demonstrates a successful use case for this issue. The zip also contains the DDL necessary to create the schema used in the test code.

Basically, what I think is missing from all our experiments is a call to HttpServletRequest#setCharacterEncoding for the request. Once the correct encoding is set, the inbound UTF characters are correctly decoded on subsequent calls to HttpServletRequest#getParameter.

Please test once more using this fix and let me know how it works out. My suspicion is that your original example is working due to some coincidence. If the setCharacterEncoding call fixes the problem, we can assume there's no bug here.
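
One plausible mechanism for such a coincidence, offered here as an assumption that the thread does not confirm, is the no-argument String#getBytes() call in the servlet sample: it encodes with the JVM's default charset, which can differ between machines. The sketch below makes that default an explicit parameter so both cases can be compared; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics the re-decoding idiom from the servlet sample.
class DefaultCharsetCoincidence {

    /** Mirrors new String(param.getBytes(), "UTF-8"), with the JVM default charset made explicit. */
    static String reDecode(String param, Charset jvmDefault) {
        return new String(param.getBytes(jvmDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What getParameter would return if the container decoded the UTF-8 bytes of "é" as ISO-8859-1.
        String fromContainer =
                new String("\u00e9".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        // Default ISO-8859-1: the original bytes come back, so the text is repaired.
        System.out.println(reDecode(fromContainer, StandardCharsets.ISO_8859_1).equals("\u00e9"));

        // Default UTF-8: the round trip changes nothing, so the corruption stays.
        System.out.println(reDecode(fromContainer, StandardCharsets.UTF_8).equals(fromContainer));

        // The value that the no-argument getBytes() would use on this machine.
        System.out.println("This JVM's default charset: " + Charset.defaultCharset());
    }
}
```

If the two environments ran with different default charsets, the same servlet code would repair the text in one and leave it corrupted in the other. Setting the request encoding before reading parameters avoids depending on the default at all.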
Comment 5 Dan Mace 2012-06-18 14:09:05 EDT
I still would like to understand why the default request encoding seems to be UTF-8 on the reporter's local JBoss instance, and why it may differ from the version deployed on OpenShift. Although there may not be a bug per se, there may be a piece of configuration we aren't properly exposing to the user which would allow them to set the default encoding.
Comment 6 Clément HÉLY 2012-06-18 14:23:43 EDT
Well... I don't know why I missed that... But it works now that I added the HttpServletRequest#setCharacterEncoding call!

Indeed, there is no bug. Even if, as you say, it is strange that the default encoding differs between OpenShift and my local instance.

Anyway, thanks a lot for your help. And sorry for the loss of time.
Comment 7 Dan Mace 2012-06-18 14:25:01 EDT
I am still not convinced there's no problem here. You should be able to set the default request encoding to get picked up from the client via standalone.xml. I am going to explore that further.
Comment 8 Dan Mace 2012-06-18 15:59:11 EDT
I can't find any official JBoss documentation suggesting there is any way to set the default POST request charset reliably (or at all, for that matter). I am going to have to say that creating an encoding filter is the appropriate way to handle this consistently. Unless the client sends the charset specification in the form post (e.g. "application/x-www-form-urlencoded;charset=utf-8"), you're going to end up decoding in ISO-8859-1.
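
The fallback behaviour described above can be illustrated with a simplified, JDK-only sketch. It is not JBossWeb's actual parsing code: the class, the helper names, and the sample value are hypothetical, and real containers handle more cases. It only shows how the charset named in the Content-Type header, or the ISO-8859-1 fallback when none is named, changes the decoded form value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Locale;

// Hypothetical helper; a simplification, not JBossWeb's actual parsing code.
class FormBodyDecoding {

    static final String FALLBACK_CHARSET = "ISO-8859-1";

    /** Returns the charset named in a Content-Type header, or the fallback when none is given. */
    static String charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    return p.substring("charset=".length()).trim().replace("\"", "");
                }
            }
        }
        return FALLBACK_CHARSET;
    }

    /** Percent-decodes one form value using the charset implied by the Content-Type header. */
    static String decodeValue(String rawValue, String contentType) throws UnsupportedEncodingException {
        return URLDecoder.decode(rawValue, charsetOf(contentType));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "Créer" as a browser would percent-encode it when submitting from a UTF-8 page.
        String raw = "Cr%C3%A9er";

        // No charset in the header: the fallback applies and the value comes out corrupted.
        System.out.println(decodeValue(raw, "application/x-www-form-urlencoded"));

        // Charset stated in the header: the value is decoded correctly.
        System.out.println(decodeValue(raw, "application/x-www-form-urlencoded;charset=utf-8"));
    }
}
```

Calling HttpServletRequest#setCharacterEncoding before the first getParameter call plays the same role as the explicit charset here, which is why a filter that sets it for every request gives consistent results.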

Even if you could configure the client to send the correct headers to drive the server configuration, it wouldn't be reliable.

Without some specific evidence that the JBossWeb configuration supports any sort of encoding defaults configuration which can override the client headers, I am not going to be able to attribute this to a bug.

Here is some Tomcat reference documentation which is related (but again, not directly applicable to JBossWeb):


If anybody can find some official JBoss documentation on the subject, we can revisit the issue. In the meantime, a Filter to set the encoding is the solution.
