Bug 801296 - Storing a byte array via Memcached client fails on Windows
Summary: Storing a byte array via Memcached client fails on Windows
Keywords:
Status: VERIFIED
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: unspecified
Version: 6.0.0,6.4.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 6.0.0
Assignee: Tristan Tarrant
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-03-08 08:30 UTC by Martin Gencur
Modified: 2023-03-02 08:28 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
A byte array's storage was occasionally misinterpreted by the server as an invalid command. This was caused by the memcached server not always consuming the CR/LF delimiter which marks the end of a client request. The server then attempted to decode the delimiter as the header of the following request and incorrectly reported this as a bad message. This is now fixed and running the server operates as expected.
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments (Terms of Use)
Server log (30.29 KB, text/x-log)
2012-03-08 08:30 UTC, Martin Gencur
no flags Details
Surefire report (7.52 KB, text/plain)
2012-03-08 08:32 UTC, Martin Gencur
no flags Details
TRACE log solaris 10 x64 java6 (429.97 KB, text/x-log)
2012-03-09 13:50 UTC, Tomas Sykora
no flags Details
pcap of testSetGetObject method in distributuon mode (37.79 KB, application/vnd.tcpdump.pcap)
2012-05-06 08:42 UTC, Tomas Sykora
no flags Details
pcap of testSetGetObject method in replication mode (39.34 KB, application/vnd.tcpdump.pcap)
2012-05-06 08:43 UTC, Tomas Sykora
no flags Details
pcap of testSetGetObject method in local mode (18.40 KB, application/vnd.tcpdump.pcap)
2012-05-06 08:44 UTC, Tomas Sykora
no flags Details
server "trace" log from the same run as next pcap (1) (241.63 KB, text/x-log)
2012-05-07 13:48 UTC, Tomas Sykora
no flags Details
pcap of testSetGetObject method in local mode - the same run as server log (1) (19.46 KB, application/vnd.tcpdump.pcap)
2012-05-07 13:50 UTC, Tomas Sykora
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker ISPN-2053 0 Major Resolved Memcached server occasionally does not consume CR/LF after data block 2018-09-12 17:59:52 UTC

Description Martin Gencur 2012-03-08 08:30:56 UTC
Created attachment 568541 [details]
Server log

Description of problem:

The testSetGetObject() test method fails on Windows 2008 32b and Windows 2008 R2 64bit. The test method can be found at https://svn.devel.redhat.com/repos/jboss-qa/edg/edg-functional-tests/trunk/core-tests/src/test/java/org/jboss/test/cluster/datagrid/memcached/MemcachedClientTestCase.java and looks like this:

@Test
    public void testSetGetObject() throws Exception {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        ObjectOutputStream oo = new ObjectOutputStream(bout);
        oo.writeObject(new TestSerializable("CONTENT"));
        oo.flush();

        byte[] byteData = bout.toByteArray();
        mc.writeln("set " + KEY_A + " 0 0 " + byteData.length);
        mc.flush();
        mc.write(byteData);
        mc.write("\r\n".getBytes(ENCODING));
        mc.flush();
        assertEquals("STORED", mc.readln());
        byte[] bytesBack = mc.getBytes(KEY_A);
        assertEquals(byteData.length, bytesBack.length);
        ObjectInputStream oin = new ObjectInputStream(new ByteArrayInputStream(bytesBack));
        TestSerializable ts = (TestSerializable) oin.readObject();
        assertEquals("CONTENT", ts.getContent());
    }

The test method fails at line assertEquals(byteData.length, bytesBack.length); where I get NullPointerException


Attaching server.log and surefire report.


How reproducible:

Run functional tests in core-tests submodule.

Comment 1 Martin Gencur 2012-03-08 08:32:15 UTC
Created attachment 568542 [details]
Surefire report

The surefire report says that it fails on line 132, please ignore that as I added some more lines before running the test. It actually fails on line specified in the description of this BZ

Comment 2 Tristan Tarrant 2012-03-09 09:02:25 UTC
Could you please rerun this with trace logs enabled for org.infinispan.server.memcached

Comment 3 Tomas Sykora 2012-03-09 13:50:15 UTC
Created attachment 568906 [details]
TRACE log solaris 10 x64 java6

I found this issue on Solaris too.
 
I attached trace log from run on Solaris 10 x64 (Hudson slave dev20-01-solaris10-x86) with this java:

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)

Comment 4 Tristan Tarrant 2012-03-09 16:56:23 UTC
The problem is that the server is receiving a null command for the preceding set operation, and therefore it breaks when performing the get. I will investigate this further

Comment 5 Tristan Tarrant 2012-03-20 16:20:53 UTC
Would it be possible to get pcaps using wireshark of test ? Also could you try running just that single test ?

Comment 6 mark yarborough 2012-04-04 13:15:05 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
CCFR - Tristan

Comment 7 Tristan Tarrant 2012-04-04 13:22:05 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-CCFR - Tristan+When running the server on Windows and Solaris occasionally the storage of a byte array is misinterpreted by the server as an invalid command. This does not happen under RHEL.

Comment 8 Misha H. Ali 2012-04-04 15:52:54 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-When running the server on Windows and Solaris occasionally the storage of a byte array is misinterpreted by the server as an invalid command. This does not happen under RHEL.+When running the server on Windows and Solaris, a byte array's storage is occasionally misinterpreted by the server as an invalid command. This does not occur when the server runs on Red Hat Enterprise Linux.

Comment 9 Tomas Sykora 2012-05-06 08:42:42 UTC
Created attachment 582391 [details]
pcap of testSetGetObject method in distributuon mode

Comment 10 Tomas Sykora 2012-05-06 08:43:36 UTC
Created attachment 582392 [details]
pcap of testSetGetObject method in replication mode

Comment 11 Tomas Sykora 2012-05-06 08:44:06 UTC
Created attachment 582393 [details]
pcap of testSetGetObject method in local mode

Comment 12 Tomas Sykora 2012-05-06 08:51:59 UTC
Hi Tristan,

I have added three small gifts for you :-)

You can see packet capture of MemcachedClientTestCase.testSetGetObject test method. I have run and capture packets in distribution, replication and local mode. I successfully opened attached files in wireshark-gnome so I suppose it could be fine for you too. 

I hope this will help you. In any case ask more/detailed info.
Once again sorry for my delay I was blocked by more urgent tasks until this time.

Configuration: 

OS:
Microsoft Windows Server 2008 R2 enterprise
version 6.1.7601 Service Pack 1 Build 7601
x64-based PC

Java:
jdk1.6.0_27

Note: 
I was not able to reproduce this problem on the same system 32 bit architecture system. Test was run more than 5 times in all modes. Maybe only bad probability... But on x64 win failed every time in all modes when I was running test manually. (In Hudson - it depends - unpredictable behaviour)

Note2: 
We can see this problem on Solaris machines too. But I can't pcap loopback on solaris lo0 interface. That is not possible according to internet-info.

Comment 13 Tristan Tarrant 2012-05-06 14:12:02 UTC
The only pcap that makes sense in Wireshark is the local mode one (which should be enough).

The sequence of messages in the pcap is as follows (binary data removed for clarity):

REQUEST set a 0 0 136
srJcom.jboss.datagrid.test.memcached.MemcachedClientTestCase$TestSerializable?XT3LcontenttLjava/lang/String;xptCONTENT
RESPONSE STORED

REQUEST get a
RESPONSE ERROR
RESPONSE VALUE a 0 136
srJcom.jboss.datagrid.test.memcached.MemcachedClientTestCase$TestSerializable?XT3LcontenttLjava/lang/String;xptCONTENT
END

As you can see the server is attempting to reply correctly to the "get a" command, but for some reason a spurious "ERROR" response happens before the "VALUE" response causing the client to get confused.

Galder, any insight ?

Comment 14 Tristan Tarrant 2012-05-06 14:16:31 UTC
Tomas, ideally it would be great if the attached pcap and "trace" server log could come from the same run (the server.log above tell us that the server is processing a set for a 144 byte payload, whereas the pcap shows a 136 byte payload (that may be because of serialization differences).

Comment 15 Tomas Sykora 2012-05-07 13:48:00 UTC
Created attachment 582669 [details]
server "trace" log from the same run as next pcap (1)

Comment 16 Tomas Sykora 2012-05-07 13:50:36 UTC
Created attachment 582672 [details]
pcap of testSetGetObject method in local mode - the same run as server log (1)

Hi Tristan,

In case of any problems let me know I would try to provide any other logs or information. It was run on the same configuration as last time and both logs are from the same one run of that method.

Comment 17 Galder Zamarreño 2012-05-08 11:17:57 UTC
Tomas, can't find https://svn.devel.redhat.com/repos/jboss-qa/edg/edg-functional-tests/trunk/core-tests/src/test/java/org/jboss/test/cluster/datagrid/memcached/MemcachedClientTestCase.java, where is that?

Can you try a few things separately? I don't know what mc.writeln really does, but bearing in mind that, try this:

1. Switch the following line:

mc.writeln("set " + KEY_A + " 0 0 " + byteData.length);

for:

mc.writeln("set " + KEY_A + " 0 0 " + byteData.length + " ");

See if that works.

2. If that doesn't work, try using something else other than writeln, in case this is a plattform dependent method:

// convert the following to UTF-8:
"set " + KEY_A + " 0 0 " + byteData.length + "\r\n"

And the call:
mc.write(set);

3. If that doesn't work, try:

// convert the following to UTF-8:
"set " + KEY_A + " 0 0 " + byteData.length + " \r\n"

mc.write(set);

Cheers,

Comment 18 Tristan Tarrant 2012-05-08 11:33:12 UTC
Galder, the tests now live here: https://svn.devel.redhat.com/repos/jboss-qa/jdg/jdg-functional-tests/trunk/remote/

Comment 19 mark yarborough 2012-05-09 13:23:17 UTC
Per Tristan triage, moved from ER8 to ER9 target release.

Comment 20 Tomas Sykora 2012-05-14 11:30:27 UTC
Galder Hi!
Thank you for your suggestions. I have carefully tried all of them on same configuration as last time. 

If I add " " space (2 cases):
1) mc.writeln("set " + KEY_A + " 0 0 " + byteData.length + " ");
2) to UTF-8: "set " + KEY_A + " 0 0 " + byteData.length + " \r\n"

it doesn't work. We can see this error on console:

07:07:44,326 ERROR [org.infinispan.server.memcached.MemcachedDecoder] (HotRodServerWorker-1-1) ISPN005003: Exception rep
orted: java.io.StreamCorruptedException: Unable to parse noreply optional argument

and this in surefire report:

org.junit.ComparisonFailure: expected:<[STORED]> but was:<[CLIENT_ERROR bad command line format: Unable to parse noreply optional argument]>

at this this line:	
assertEquals("STORED", mc.readln());


last case:

// convert the following to UTF-8:
"set " + KEY_A + " 0 0 " + byteData.length + "\r\n"

unfortunately doesn't work either. Same error as Martin mentioned.
NullPointerException at line assertEquals(byteData.length, bytesBack.length);



Please see related links: 

(for write and writeln methods)

https://svn.devel.redhat.com/repos/jboss-qa/jdg/jdg-functional-tests/trunk/remote/common-utils/src/main/java/com/jboss/datagrid/test/utils/memcached/MemcachedHelper.java

(for testSetGetObject method)

https://svn.devel.redhat.com/repos/jboss-qa/jdg/jdg-functional-tests/trunk/remote/core-tests/src/test/java/com/jboss/datagrid/test/memcached/MemcachedClientTestCase.java 

If any log/info needed I will provide it asap.

Comment 21 Tristan Tarrant 2012-05-16 21:13:26 UTC
Tried with ER9 on Windows x64 and I could not reproduce the problem.

Comment 22 Tomas Sykora 2012-05-18 08:32:31 UTC
ER9 on Solaris10 x86 + sunjdk6+7 and Windows x64 sunjdk7 --> I can still reproduce this issue occasionally.

It seems to be unpredictable.
(For proper verification we need to run this method MORE times in a row to be sure about this.)

Comment 23 Tomas Sykora 2012-05-23 07:12:16 UTC
ER10: Can't replicate this anymore. Setting as verified.

Comment 24 Misha H. Ali 2012-06-05 23:55:36 UTC
This bug is flagged for addition to JDG 6 GA. Flagging a NEEDINFO for assignee for information about the fix and to correct the identified problem (technical notes box), if required.

Comment 25 Tristan Tarrant 2012-06-06 09:06:10 UTC
Since this is not an issue anymore, removing the technical notes.

Comment 26 Tristan Tarrant 2012-06-06 09:06:10 UTC
Deleted Technical Notes Contents.

Old Contents:
When running the server on Windows and Solaris, a byte array's storage is occasionally misinterpreted by the server as an invalid command. This does not occur when the server runs on Red Hat Enterprise Linux.

Comment 27 Tristan Tarrant 2012-06-06 09:54:57 UTC
The issue was caused by the memcached server not always consuming the CR/LF delimiter which marks the end of a client request. The server then attempted to decode the delimiter as the header of the following request and incorrectly reported this as a bad message

Comment 28 Misha H. Ali 2012-06-07 03:17:20 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When running the server on Windows and Solaris, a byte array's storage was occasionally misinterpreted by the server as an invalid command. This was caused by the memcached server not always consuming the CR/LF delimiter which marks the end of a client request. The server then attempted to decode the delimiter as the header of the following request and incorrectly reported this as a bad message. This is now fixed and running the server on Windows and Solaris now operates as expected.

Comment 29 Tristan Tarrant 2012-06-07 09:31:42 UTC
Misha, I updated the tech note: the problem actually happened on all platforms (just not as frequently on RHEL), so I removed all references to Windows and Solaris

Comment 30 Tristan Tarrant 2012-06-07 09:31:42 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-When running the server on Windows and Solaris, a byte array's storage was occasionally misinterpreted by the server as an invalid command. This was caused by the memcached server not always consuming the CR/LF delimiter which marks the end of a client request. The server then attempted to decode the delimiter as the header of the following request and incorrectly reported this as a bad message. This is now fixed and running the server on Windows and Solaris now operates as expected.+A byte array's storage was occasionally misinterpreted by the server as an invalid command. This was caused by the memcached server not always consuming the CR/LF delimiter which marks the end of a client request. The server then attempted to decode the delimiter as the header of the following request and incorrectly reported this as a bad message. This is now fixed and running the server operates as expected.


Note You need to log in before you can comment on or make changes to this bug.