Bug 1088956

Summary: MalformedByteSequenceException in Namespace test on Windows
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Katerina Odabasi <kanovotn>
Component: RESTEasy
Assignee: Ron Sigal <rsigal>
Status: CLOSED CURRENTRELEASE
QA Contact: Katerina Odabasi <kanovotn>
Severity: medium
Priority: unspecified
Version: 6.3.0
CC: kanovotn, rsigal, rsvoboda, weli
Target Milestone: DR9
Flags: smumford: needinfo+
Target Release: EAP 6.4.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
In a previous release of JBoss EAP 6, when encoding was not specified in the body of a client request, RESTeasy returned a response in the encoding of the server, not in the encoding of the original request. This issue has been resolved in this release by setting UTF-8 as the default encoding if no encoding is requested by the client.
Story Points: ---
Type: Bug
Regression: ---
Documentation: ---

Description Katerina Odabasi 2014-04-17 13:56:47 UTC
Description of problem:
Testcase org.jboss.resteasy.test.xxe.namespace.TestNamespace fails on Windows w2k12r2 with the following exception:

[org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.]
 	at org.jboss.resteasy.plugins.providers.jaxb.CollectionProvider.readFrom(CollectionProvider.java:149) [resteasy-jaxb-provider-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.interception.MessageBodyReaderContextImpl.proceed(MessageBodyReaderContextImpl.java:106) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
	at org.jboss.resteasy.plugins.interceptors.encoding.GZIPDecodingInterceptor.read(GZIPDecodingInterceptor.java:63) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.interception.MessageBodyReaderContextImpl.proceed(MessageBodyReaderContextImpl.java:109) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.MessageBodyParameterInjector.inject(MessageBodyParameterInjector.java:169) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.MethodInjectorImpl.injectArguments(MethodInjectorImpl.java:136) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:159) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.ResourceMethod.invokeOnTarget(ResourceMethod.java:269) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:227) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.ResourceMethod.invoke(ResourceMethod.java:216) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.SynchronousDispatcher.getResponse(SynchronousDispatcher.java:542) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:524) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:126) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:208) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:55) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
 	at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:50) [resteasy-jaxrs-2.3.8.Final-redhat-1.jar:]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-api_3.0_spec-1.0.2.Final-redhat-1.jar:1.0.2.Final-redhat-1]
 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:295) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:231) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169) [jboss-as-web-7.4.0.Final-redhat-8.jar:7.4.0.Final-redhat-8]
 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:145) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:340) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:653) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926) [jbossweb-7.4.0.Final-redhat-1.jar:7.4.0.Final-redhat-1]
 	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
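
The message "Invalid byte 2 of 3-byte UTF-8 sequence" is what a UTF-8 parser reports when a lead byte in the range 0xE0 to 0xEF is followed by a byte that is not a continuation byte. The sketch below reproduces the same class of failure with JDK classes only. It does not go through Xerces or RESTEasy; the class name StrictUtf8Check is hypothetical, and ISO-8859-1 is only an assumed stand-in for whatever single-byte charset was actually involved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of RESTEasy or Xerces.
class StrictUtf8Check {

    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00e8 is the letter e with a grave accent.
        String title = "La R\u00e8gle du Jeu";

        // In ISO-8859-1 that letter is the single byte 0xE8, which UTF-8
        // treats as the first byte of a three-byte sequence. The next byte,
        // 'g' (0x67), is not a continuation byte, so strict decoding fails.
        byte[] singleByte = title.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = title.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(singleByte));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```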


Version-Release number of selected component (if applicable):
6.3.0.ER1

How reproducible:
always

Steps to Reproduce:
1. Set the system encoding to cp1250 before running the test.
2. Run the test case org.jboss.resteasy.test.xxe.namespace.TestNamespace:
git clone git://git.app.eng.bos.redhat.com/jbossqe-eap-resteasy-ts.git resteasy-ts
cd resteasy-ts
git checkout 9dd5e7fa4f143de3b4b37ee9b020b10d4985ba74
mvn -s settings.xml -Dproductized -Peap-productization-maven-repository -Dversions-test.org.jboss.resteasy=2.3.8.Final-redhat-1 -Djboss.home=PATH_TO_EAP -Djboss730 clean verify -fn -pl :resteasy-jaxb-provider -Dtest=TestNamespace -Darquillian

Replace PATH_TO_EAP with the path to the EAP installation.

Additional info:
A request whose data has a specified encoding (UTF-8 in this case) is sent to the server.
The response from the server contains malformed characters:
INFO  [stdout] (http-/127.0.0.1:8080-4) MovieResource(map): title = La R?gle du Jeu

This happens because the header in the server's response does not contain an encoding.
As a workaround, set @Produces("*/*;charset=UTF-8") on the called resource.
To apply the workaround, uncomment the @Produces lines for each resource.

I think that if the request specifies an encoding for its body, the server should recognize it and return the response in the same encoding.
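
One way a title such as "La Règle du Jeu" can end up as "La R?gle du Jeu" is an encoding step through a charset that has no mapping for the accented letter: String.getBytes(Charset) then substitutes the charset's replacement byte, which is '?' for the charsets used here. The sketch below shows that effect with JDK classes only. Which step of the test run actually performed the lossy conversion is not established in this report. The class name LossyRoundTrip is hypothetical, and US-ASCII is used as a stand-in so that the example does not depend on cp1250 being installed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the test suite.
class LossyRoundTrip {

    // Encodes the text with the given charset and decodes it again.
    // Characters the charset cannot represent come back as '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // \u00e8 is the letter e with a grave accent.
        String title = "La R\u00e8gle du Jeu";

        // Lossy: US-ASCII has no mapping for the accented letter.
        System.out.println(roundTrip(title, StandardCharsets.US_ASCII));

        // Lossless: UTF-8 can represent every character in the title.
        System.out.println(roundTrip(title, StandardCharsets.UTF_8).equals(title));

        // The code page named in the steps above, if this JVM provides it.
        if (Charset.isSupported("windows-1250")) {
            Charset cp1250 = Charset.forName("windows-1250");
            System.out.println(roundTrip(title, cp1250).equals(title));
        }
    }
}
```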

Comment 1 Katerina Odabasi 2014-05-28 14:12:42 UTC
The issue is also reproducible on the upstream testsuite on Branch_2_3 by setting the LC_ALL and LANG environment variables to some cp* encoding and running the testsuite with the command:

mvn clean verify -pl :resteasy-jaxb-provider,:resteasy-jaxrs,:resteasy-test-arquillian -DfailIfNoTests=false -Darquillian -Dtest=TestNamespace -Djboss.home=path_to_eap_installation

I found RFC 2616 [1], from which I understand that the encoding specified in the request should be followed.

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4.1

Comment 2 Weinan Li 2014-05-28 14:18:34 UTC
Hi Katerina, could you please report this bug in JIRA? Otherwise the RESTEasy development team won't fix it.

Comment 3 Katerina Odabasi 2014-05-28 15:11:06 UTC
Hi Weinan, according to our instructions, bugzilla is the only tool for tracking EAP issues. If you need any additional tickets for the development team, kindly open it yourself.

Comment 4 Weinan Li 2014-06-09 15:04:07 UTC
Hi Ron, could you please check this issue? It seems to be a development-related bug.

Comment 5 Ron Sigal 2014-06-10 18:08:11 UTC
Hey Wei,

I almost missed this issue. It went to a Thunderbird folder I rarely look at.

I'll take a look. Fortunately, I have a working Windows laptop. I guess we'll have to test on Android as well, some day. ;)

-Ron

Comment 6 Ron Sigal 2014-06-22 05:16:30 UTC
This also works:

      @POST
      @Path("text/accepts")
      @Consumes("text/plain")
      public String textAccepts(String movie)
      {
         return movie;
      }

      ...

      ClientRequest request = new ClientRequest(generateURL("/text/accepts"));
      Map<String, String> params = new HashMap<String, String>();
      params.put("charset", "UTF-16");
      MediaType mt = new MediaType("text", "plain", params);
      request.body(mt, "La Règle du Jeu");
      request.accept(mt);
      ClientResponse<?> response = request.post();

Note that the JAX-RS 2.0 spec says "When writing responses, implementations SHOULD respect application-supplied character set metadata and SHOULD use UTF-8 if a character set is not specified by the application or if the application specifies a character set that is unsupported."

So, I think that

      ClientRequest request = new ClientRequest(generateURL("/text/accepts"));
      Map<String, String> params = new HashMap<String, String>();
      params.put("charset", "UTF-16");
      MediaType mt = new MediaType("text", "plain", params);
      request.body(mt, "La Règle du Jeu");
//      request.accept(mt);
      ClientResponse<?> response = request.post();

should result in the result being encoded in UTF-8 rather than UTF-16.
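
The rule quoted from the JAX-RS 2.0 spec (respect the application-supplied charset, otherwise fall back to UTF-8, including when the named charset is unsupported) can be sketched with JDK classes only. This is an illustration of the rule, not RESTEasy's implementation: the class name CharsetResolver and the simplified parameter parsing are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration; not RESTEasy code.
class CharsetResolver {

    // Returns the charset named in the media type's "charset" parameter,
    // or UTF-8 if the parameter is missing, illegal, or unsupported.
    static Charset resolve(String mediaType) {
        if (mediaType == null) {
            return StandardCharsets.UTF_8;
        }
        String[] parts = mediaType.split(";");
        // parts[0] is the type/subtype; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length())
                        .trim().replace("\"", "");
                try {
                    if (Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalArgumentException e) {
                    // Illegal charset name: fall through to the default.
                }
                return StandardCharsets.UTF_8;
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/plain;charset=UTF-16"));
        System.out.println(resolve("text/plain"));
        System.out.println(resolve("text/plain; charset=no-such-charset"));
    }
}
```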

Actually, I think that Katerina's comment "I think, that if request has specified encoding in the body it should be recognized by the server and response should be returned in the same encoding." makes sense, and might be worth adding to the spec. But, for now, I don't think the spec supports the idea.

Comments?

Comment 7 Katerina Odabasi 2014-06-25 14:10:25 UTC
Hi Ron,

The JAX-RS 2.0 spec also says that keywords like SHOULD etc. are to be interpreted as described in RFC 2119. 
RFC 2119 says:
"SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course."

So I think the jax-rs spec actually supports the idea. What do you think?

Comment 8 Ron Sigal 2014-06-25 17:22:02 UTC
Good point, Katerina. Let's see what Bill Burke says.

Comment 9 Scott Mumford 2014-07-04 00:50:41 UTC
Set the Target Release to TBD EAP 6 so that this bug appears in Release Notes dashboards and can be included as required.

Comment 10 Scott Mumford 2014-07-04 00:52:21 UTC
Ron, Katerina, could someone please supply some information about this issue for inclusion in the 6.3.0 Release Notes (Known Issues).

Comment 11 Katerina Odabasi 2014-07-04 08:27:11 UTC
Hi Scott, a brief summary:

When an encoding is specified in the body of the client request, RESTEasy returns the response in the encoding of the server, not in the encoding of the original request. To receive the response in the specified encoding, the request must set the Accept header with request.accept(mediaType), or the resource must use the @Produces annotation.

Comment 12 Scott Mumford 2014-07-07 21:14:39 UTC
Thanks Katerina.

I've added a release note text and marked this for inclusion in the 6.3.0 Release Notes. 

I've also changed the Target Release to 6.4.0 to ensure this gets picked up in our Release Note filters. Feel free to return it to TBD 6 after the 6.3.0 GA.

(Also clearing NEEDINFO)

Comment 13 Ron Sigal 2014-08-06 00:15:38 UTC
Hi Katerina,

I'm having a reversal of thinking on this issue. As you pointed out, the client isn't saying anything to the server about charsets. The JAXB implementation is guessing the charset, and, in Windows, it's guessing wrong. As I mentioned in Comment 6, everything works fine if the client tells the server which charset it's sending and which charset it expects in return.

I'm thinking that it's reasonable to expect the client to take that responsibility, and that it would be sufficient for me to just fix the tests.

What do you think?

-Ron

Comment 14 Katerina Odabasi 2014-08-13 07:41:34 UTC
Hi Ron,

I looked at RESTEASY-1066 and Stuart is suggesting there:

"Looking at the Resteasy DefaultTextPlain provider if no charset is provided then it just uses String.getBytes() which is problematic because this is platform dependent, and may provide something completely bogus in some locales. I think it should either use String.getBytes("ISO-8859-1") to follow the defaults recommended by the HTTP spec, or use UTF-8 (and set the charset in the Content-Type header) and follow the JAX-RS spec recommendation."

OK, determining the encoding from the charset of the entity seems to go beyond the spec.
(I thought we could extract the media type from the request body with request.getBodyContentType() and give it to the server.)

But we should return consistent results on all platforms. Therefore, without a charset, it should return ISO-8859-1 or UTF-8, as Stuart suggests. Does that sound reasonable?
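
The platform dependence of String.getBytes() mentioned in the quoted comment can be seen by comparing it with the overloads that take an explicit charset. On the Java 6 runtime shown in the stack trace, the no-argument form uses the JVM's default charset, which follows the platform locale. A minimal sketch, with the hypothetical class name ExplicitEncoding:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not RESTEasy code.
class ExplicitEncoding {

    // Renders bytes as lowercase hex so the encodings can be compared.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00e8 is the letter e with a grave accent.
        String s = "\u00e8";

        // Depends on the JVM's default charset, so it can differ per machine.
        System.out.println("default charset:      " + Charset.defaultCharset());
        System.out.println("getBytes():           " + toHex(s.getBytes()));

        // Explicit charsets give the same bytes on every platform.
        System.out.println("getBytes(ISO-8859-1): " + toHex(s.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("getBytes(UTF-8):      " + toHex(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```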

Comment 15 Ron Sigal 2014-08-14 00:31:38 UTC
Great point, Katerina. I forgot about what Stuart said on RESTEASY-1056 when I wrote my last comment. In fact, I've noticed that the Resteasy text oriented providers, including the JAXB providers, are inconsistent with respect to charsets. I'll sort that out, make sure UTF-8 is the default, and fix TestNamespace.

Comment 16 Ron Sigal 2014-08-14 00:34:38 UTC
Meant RESTEASY-1066.

Comment 17 Ron Sigal 2014-08-17 19:19:07 UTC
I have modified the DefaultTextPlain and StringTextStar in resteasy-jaxrs, as well as providers/jaxb, so that responses will default to using UTF-8.

The changes to Branch_2_3 are covered by pull request https://github.com/resteasy/Resteasy/pull/551.
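
For readers who want a picture of what "default to UTF-8" means on the writing side, here is a rough sketch that adds charset=UTF-8 to a Content-Type value when none is present and leaves an existing charset untouched. It is not taken from the pull request; the class name ContentTypeDefaults and the string-based handling are assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not the code from the pull request.
class ContentTypeDefaults {

    // Returns the content type unchanged if it already names a charset,
    // otherwise appends charset=UTF-8.
    static String withDefaultCharset(String contentType) {
        for (String part : contentType.split(";")) {
            if (part.trim().toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return contentType;
            }
        }
        return contentType + ";charset=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(withDefaultCharset("text/plain"));
        System.out.println(withDefaultCharset("text/plain; charset=UTF-16"));
    }
}
```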

Comment 18 Ron Sigal 2014-08-21 22:38:05 UTC
Pull request #551 merged, RESTEASY-1066 closed.

Comment 21 Katerina Odabasi 2014-11-26 15:17:16 UTC
Verified in EAP 6.4.0.DR11.