Description of problem:
A Hot Rod C++ client continuously performs get and put operations against a 2-node JDG cluster. When one node is killed, the client sometimes fails to fail over and logs an error message such as "Failed to connect (host: 127.0.1.1 port: 11222) Operation now in progress".

How reproducible:
The customer is the same as in Bug 1228026. Their client is Nginx/LuaJIT, and a reproducing environment is attached to that bug as "hotrodwrapper.zip". One Siege session is 500 requests from 100 concurrent users (about 50 req/s on my machine), and a node is killed during the session. About once per 10 sessions, HTTP 500 is returned, which means a retry failed.

Steps to Reproduce:
1. Prepare a 2-node JDG cluster and the environment from "hotrodwrapper.zip" in Bug 1228026.
2. Run "make siege". This runs one Siege session (siege -r5 -c100 -lsiege.log -i -furls.txt).
3. Kill one node during the session.
4. Check the error log, ./nginx/logs/error.log, for the error message.

Actual results:
Siege reports that HTTP 500 was returned, and error.log contains the following line.

2015/08/11 16:53:39 [error] 31974#0: *600 [lua] hotrod.lua:20: Failed to connect (host: 127.0.1.1 port: 11222) Operation now in progress, client: 127.0.0.1, server: , request: "GET /hotrod/default/foo/foovalue HTTP/1.1", host: "127.0.0.1:8000"

Expected results:
No HTTP 500 responses and no error messages.

Additional info:
The error message was generated when "connect" failed. The customer said that "send" also fails sometimes.
Created attachment 1067483
hotrodwrapper.cpp

Replace hotrodwrapper.cpp in "hotrodwrapper.zip" from Bug 1228026 with this attachment, and modify the "run" target in the Makefile as follows.

run: main
	LD_LIBRARY_PATH=$(LUAJIT_LIB):$(HOTROD_LIB) HOTROD_LOG_LEVEL="TRACE" ./$(PROGRAM) >run.log 2>&1

Then "make run" runs the standalone main function. This program appears to fail to fail over every time.

TRACE [Socket.cpp:107] Trying to connect to 127.0.0.1 (127.0.0.1).
DEBUG [Socket.cpp:134] Attempting connection to 127.0.0.1:11222
DEBUG [Socket.cpp:147] Failed to connect to 127.0.0.1:11222
terminate called after throwing an instance of 'infinispan::hotrod::TransportException'
  what():  Failed to connect (host: 127.0.0.1 port: 11222) Operation now in progress
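For context on the message text: "Operation now in progress" is the Linux strerror() string for EINPROGRESS, which connect() returns on a non-blocking socket when the connection cannot be completed immediately. That suggests the error is being reported before the real outcome of the connection attempt is known. The following is a minimal sketch of the usual POSIX handling, not the code in Socket.cpp and not the eventual fix. The function name connect_with_timeout and its parameters are hypothetical and are not part of the Hot Rod C++ client API.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (assumption for illustration only).
// Returns a connected, non-blocking socket fd, or -1 with errno set to the
// real cause of the failure (for example ECONNREFUSED or ETIMEDOUT).
int connect_with_timeout(const char* ip, uint16_t port, int timeout_ms) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Non-blocking mode makes connect() return immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        errno = EINVAL;
        return -1;
    }

    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
        return fd;  // connected immediately
    }
    if (errno != EINPROGRESS) {
        int saved = errno;  // a genuine, immediate failure
        close(fd);
        errno = saved;
        return -1;
    }

    // EINPROGRESS is not a failure: wait until the socket becomes writable.
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLOUT;
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0) {
        int saved = (rc == 0) ? ETIMEDOUT : errno;
        close(fd);
        errno = saved;
        return -1;
    }

    // The final result of the connection attempt is stored in SO_ERROR.
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) err = errno;
    if (err != 0) {
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}
```

With handling of this kind, a killed node would be expected to surface as a definite error such as "Connection refused" rather than "Operation now in progress", so the caller could move on to the next server in its list.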
Alan Field <afield> updated the status of jira HRCPP-197 to Coding In Progress
Waiting to hear if Osamu's customer is satisfied with the fix.
(In reply to Alan Field from comment #5)
> Waiting to hear if Osamu's customer is satisfied with the fix.

As mailed, the customer will test with 6.5.1.GA. Since the reproducer works well now, this BZ can be closed.
Verified with JDG 6.5.1 CR1
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.