Bug 832235

Summary: JBoss : charset problem when sending data from forms to MySQL
Product: OKD
Reporter: Clément HÉLY <clementhely>
Component: Containers
Assignee: Dan Mace <dmace>
Status: CLOSED NOTABUG
QA Contact: libra bugs <libra-bugs>
Severity: medium
Priority: medium
Version: 2.x
Keywords: SupportQuestion, Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Environment: JBoss AS 7.1 MySQL Hibernate
Last Closed: 2012-06-18 20:31:23 UTC
Type: Bug

Attachments:
A full example of the application behaving correctly with regards to Unicode (flags: none)

Description Clément HÉLY 2012-06-14 23:37:22 UTC
On my app running on JBoss AS 7.1 @ OpenShift, I'm trying to save data received from a form (using the POST method) in a MySQL database. The insertion works well, but accented characters (like 'é', 'à', 'è', ...) are corrupted during the process and appear like "é" or "çÃ" (for instance) in the DB.


I've checked the charset used by the database: all the tables are in utf8_general_ci. My forms have the attribute accept-charset="UTF-8". All my JSPs are saved as UTF-8.
I force my servlet to convert data received from forms to UTF-8 strings, and at this point accented characters are not corrupted.

I tried to add these 2 lines in the persistence.xml :
<property name="hibernate.connection.useUnicode" value="true"/>
<property name="hibernate.connection.characterEncoding" value="UTF-8" />

And these 2 others in standalone.xml :
<system-properties>
    <property name="org.apache.catalina.connector.URI_ENCODING" value="UTF-8"/>
    <property name="org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING" value="true"/>
</system-properties>


But it still fails.


Since it works perfectly on my local instance of JBoss AS 7.1, there must be some parameter in the OpenShift configuration that prevents some Unicode characters from being saved in the database.



Additional info :
https://openshift.redhat.com/community/forums/openshift/jboss-mysql-html-forms-encoding-problem

Comment 1 Dan Mace 2012-06-15 18:46:42 UTC
I have been unable to reproduce this. With a stock jbossas-7 app, using a @Stateless JAX-RS controller with an injected EntityManager, I persisted a model into a test MySQL table with the unicode characters reported, and they persisted uncorrupted.

I'd like to more accurately reproduce the reporter's environment. I'm going to need more detail on the specific setup in the original report. I need:

- Sample form code used
- Sample controller code on the servlet side

The original report contains lots of helpful details, but is too vague on the specifics of the servlet implementation (e.g., stateful session bean? stateless? jax-rs controller? how is the form data being processed in the handler? etc.)

Once I get some more specifics, I'll attempt to reproduce once again.

Comment 2 Clément HÉLY 2012-06-15 23:11:57 UTC
Thanks for the fast answer.

Here are a few code samples :

All the forms have the same structure :

<form action="createMeal" method="post" onsubmit="return newMealValidation()" accept-charset="UTF-8">
	<div class="control-group">
	<label class="control-label" for="name"><strong>Nom : </strong></label>
	<div class="controls">
		<input id="name" name="name" type="text" />
	</div>
	</div>
	... [other fields] ...
        <input type="submit" class="btn btn-info" value="Créer" />
</form>



And here is the servlet that receives the data from the above form :

@WebServlet("/management/createMeal")
public class InsertNewMealServlet extends HttpServlet
{
	protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
	{
		String name = new String(request.getParameter("name").getBytes(), "UTF-8");
		String description = new String(request.getParameter("description").getBytes(), "UTF-8");
		float price = Float.parseFloat(request.getParameter("price"));
		Type type = DaoFactory.getInstance().getTypeDao().findType(Long.parseLong(request.getParameter("type")));

		Meal meal = new Meal(name, description, price, type);
		DaoFactory.getInstance().getMealDao().save(meal);
		response.sendRedirect("listAllMeals");
	}
}


As you can see, I get data from the POST request, and try to force the UTF-8 charset before persisting it.
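
One possible reason this re-encoding behaves differently from one machine to another: String#getBytes() with no argument uses the JVM's default charset, so its result depends on the platform. Here is a minimal sketch of that effect, assuming the container has already decoded the UTF-8 form bytes as ISO-8859-1 (an assumption for illustration, not something confirmed in this report). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReencodeDemo {
    // Simulates a container that decoded UTF-8 form bytes as ISO-8859-1 (assumed behaviour).
    public static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Same idea as new String(s.getBytes(), "UTF-8"), but with the charset
    // used by getBytes made explicit instead of taken from the platform default.
    public static String reencode(String s, Charset bytesCharset) {
        return new String(s.getBytes(bytesCharset), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String received = misdecode("Cr\u00e9er");
        // Undoing the wrong decoding only works when getBytes uses ISO-8859-1.
        System.out.println(reencode(received, StandardCharsets.ISO_8859_1));
        // With UTF-8 as the byte charset, the round trip changes nothing.
        System.out.println(reencode(received, StandardCharsets.UTF_8));
        // The no-argument getBytes() uses this charset, which varies by platform.
        System.out.println(Charset.defaultCharset());
    }
}
```

If the two environments report different default charsets, the same servlet code could appear to work in one and fail in the other.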


The MealDao.save(Meal meal) method uses the persist() method from EntityManager. I get the EntityManager instance from EntityManagerFactory.createEntityManager().
The following line is used to instantiate this EntityManagerFactory :

Persistence.createEntityManagerFactory("myapp");


And here is a sample from persistence.xml :

<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="2.0">
	<persistence-unit name="myapp" transaction-type="RESOURCE_LOCAL">
		<provider>org.hibernate.ejb.HibernatePersistence</provider>
		<non-jta-data-source>java:jboss/datasources/MysqlDS</non-jta-data-source>
		<class>com.youfood.server.entity.Meal</class>
		<properties>
			<property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect" />
			<property name="hibernate.show_sql" value="true" />
			<property name="hibernate.format_sql" value="true" />
			<property name="hibernate.hbm2ddl.auto" value="update" />
			<property name="hibernate.connection.useUnicode" value="true"/>
			<property name="hibernate.connection.characterEncoding" value="UTF-8" />
		</properties>
	</persistence-unit>
</persistence>



And finally, the datasource is declared in standalone.xml as follows :

<datasource jndi-name="java:jboss/datasources/MysqlDS" enabled="${mysql.enabled}" use-java-context="true" pool-name="MysqlDS">
  <connection-url>jdbc:mysql://${env.OPENSHIFT_DB_HOST}:${env.OPENSHIFT_DB_PORT}/${env.OPENSHIFT_GEAR_NAME}</connection-url>
  <driver>mysql</driver>
  <security>
     <user-name>${env.OPENSHIFT_DB_USERNAME}</user-name>
     <password>${env.OPENSHIFT_DB_PASSWORD}</password>
  </security>
</datasource>




Here's what I think can be useful. Don't hesitate to tell me if you need more info or more code from my application. Thanks again for your help.

Comment 3 Dan Mace 2012-06-18 17:57:31 UTC
Created attachment 592713 [details]
A full example of the application behaving correctly with regards to Unicode

Comment 4 Dan Mace 2012-06-18 17:58:20 UTC
I think we may have missed something very basic. I've attached an example project which demonstrates a successful use case for this issue. The zip also contains the DDL necessary to create the schema used in the test code.

Basically, what I think is missing from all our experiments is a call to HttpServletRequest#setCharacterEncoding for the request. Once the correct encoding is set, the inbound UTF characters are correctly decoded on subsequent calls to HttpServletRequest#getParameter.
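
The effect of the request encoding on parameter decoding can be illustrated without a servlet container. The following is a minimal sketch that uses java.net.URLDecoder as a stand-in for the container's form-body parsing. The container's actual parser is not shown in this report, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodeDemo {
    // Decodes one URL-encoded form value using the named charset.
    public static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Percent-encoded UTF-8 bytes of "Cr\u00e9er", as a UTF-8 form would submit them.
        String raw = "Cr%C3%A9er";
        // Decoding with the wrong charset yields the corrupted two-character sequence.
        System.out.println(decode(raw, "ISO-8859-1"));
        // Decoding with UTF-8 recovers the accented character.
        System.out.println(decode(raw, "UTF-8"));
    }
}
```

In a servlet, the corresponding step is calling request.setCharacterEncoding("UTF-8") before the first call to getParameter, since the encoding has to be set before the parameters are read.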

Please test once more using this fix and let me know how it works out. My suspicion is that your original example is working due to some coincidence. If the setCharacterEncoding call fixes the problem, we can assume there's no bug here.

Comment 5 Dan Mace 2012-06-18 18:09:05 UTC
I would still like to understand why the default request encoding seems to be UTF-8 on the reporter's local JBoss instance, and why it may differ from the version deployed on OpenShift. Although there may not be a bug per se, there may be a piece of configuration we aren't properly exposing to the user which would allow them to set the default encoding.

Comment 6 Clément HÉLY 2012-06-18 18:23:43 UTC
Well... I don't know why I missed that... But it works now that I added the HttpServletRequest#setCharacterEncoding call!

Indeed, there is no bug, even if, as you say, it is strange that the default encoding differs between OpenShift and my local instance.


Anyway, thanks a lot for your help. And sorry for taking up your time.

Comment 7 Dan Mace 2012-06-18 18:25:01 UTC
I am still not convinced there's no problem here. You should be able to configure, via standalone.xml, the default encoding applied to requests coming from the client. I am going to explore that further.

Comment 8 Dan Mace 2012-06-18 19:59:11 UTC
I can't find any official JBoss documentation suggesting there is any way to set the default POST request charset reliably (or at all, for that matter). I am going to have to say that creating an encoding filter is the appropriate way to handle this consistently. Unless the client sends the charset specification in the form post (e.g. "application/x-www-form-urlencoded;charset=utf-8"), you're going to end up decoding in ISO-8859-1.

Even if you could configure the client to send the correct headers to drive the server configuration, it wouldn't be reliable.

Without some specific evidence that the JBossWeb configuration supports any sort of encoding defaults configuration which can override the client headers, I am not going to be able to attribute this to a bug.

Here is some Tomcat reference documentation which is related (but again, not directly applicable to JBossWeb):

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

If anybody can find some official JBoss documentation on the subject, we can revisit the issue. In the meantime, a Filter to set the encoding is the solution.
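
The charset decision described in this comment can be modelled in plain Java. This is a simplified sketch and not JBossWeb's actual implementation: RequestCharsetDemo and resolve are hypothetical names, and the header parsing is deliberately naive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RequestCharsetDemo {
    // Returns the charset named in a Content-Type header value,
    // or the given fallback when the header carries no charset parameter.
    public static Charset resolve(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    // Throws if the name is illegal or unsupported on this JVM.
                    return Charset.forName(name);
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The client states the charset: it is used as-is.
        System.out.println(resolve("application/x-www-form-urlencoded;charset=utf-8",
                StandardCharsets.ISO_8859_1));
        // No charset from the client: the fallback applies.
        System.out.println(resolve("application/x-www-form-urlencoded",
                StandardCharsets.ISO_8859_1));
    }
}
```

An encoding filter that calls setCharacterEncoding("UTF-8") when the request carries no encoding of its own effectively changes the fallback in this model from ISO-8859-1 to UTF-8.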