Bug 1255213 - Retry failed saying "Failed to connect" under high load
Summary: Retry failed saying "Failed to connect" under high load
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: CPP Client
Version: 6.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: CR1
Target Release: 6.5.1
Assignee: Alan Field
QA Contact: Alan Field
URL:
Whiteboard:
Depends On:
Blocks: 1258047 1259639
 
Reported: 2015-08-20 02:02 UTC by Osamu Nagano
Modified: 2025-02-10 03:48 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Clones: 1258047 1259639
Environment:
Last Closed: 2025-02-10 03:48:03 UTC
Type: Bug
Embargoed:


Attachments
hotrodwrapper.cpp (4.08 KB, text/x-csrc)
2015-08-27 02:16 UTC, Osamu Nagano


Links
Red Hat Issue Tracker HRCPP-197: C++ client failover does not work consistently (Priority: Major, Status: Resolved, Last Updated: 2017-07-11 02:43:25 UTC)

Description Osamu Nagano 2015-08-20 02:02:10 UTC
Description of problem:
Against a 2-node JDG cluster, a Hot Rod C++ client keeps performing get and put operations. Then one node is killed, and the client sometimes fails to fail over, leaving an error message like "Failed to connect (host: 127.0.1.1 port: 11222) Operation now in progress".
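
For context, the client side of this scenario is essentially a tight get/put loop over a remote cache configured with both cluster nodes. A minimal sketch of such a loop, assuming the JDG 6.x era Hot Rod C++ client API (header paths, the getCache() signature, the pointer-returning get(), and the second server's port are assumptions; details vary between releases):

  #include "infinispan/hotrod/ConfigurationBuilder.h"
  #include "infinispan/hotrod/RemoteCacheManager.h"
  #include "infinispan/hotrod/RemoteCache.h"
  #include <iostream>
  #include <memory>
  #include <string>

  using namespace infinispan::hotrod;

  int main() {
      // Register both cluster nodes so the client has a failover
      // target when one of them is killed.
      ConfigurationBuilder builder;
      builder.addServer().host("127.0.0.1").port(11222);
      builder.addServer().host("127.0.0.1").port(11322); // second node's port is an assumption

      RemoteCacheManager cacheManager(builder.build(), false);
      cacheManager.start();
      RemoteCache<std::string, std::string> cache =
          cacheManager.getCache<std::string, std::string>(false);

      // Keep operating get and put; killing one node mid-loop should be
      // absorbed by the client's transparent failover.
      for (int i = 0; i < 100000; ++i) {
          std::string key("foo");
          std::string value("foovalue");
          cache.put(key, value);
          // In this client generation, get() returns a heap-allocated copy.
          std::unique_ptr<std::string> rv(cache.get(key));
          if (!rv) {
              std::cerr << "unexpected miss for " << key << std::endl;
          }
      }

      cacheManager.stop();
      return 0;
  }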


How reproducible:
The customer is the same as in Bug 1228026. Their client is Nginx/LuaJIT, and a reproducing environment is attached there as "hotrodwrapper.zip". One Siege session is 500 requests from 100 concurrent users (about 50 req/s on my machine), and a node is killed during the session. About once in 10 sessions, an HTTP 500 is returned, which means a retry failed.


Steps to Reproduce:
1. Prepare a 2-node JDG cluster and the environment from "hotrodwrapper.zip" in Bug 1228026.
2. Run "make siege".  This runs one Siege session (siege -r5 -c100 -lsiege.log -i -furls.txt).
3. Kill one node during the session.
4. Check the error log, ./nginx/logs/error.log, for the error message.


Actual results:
Siege reports HTTP 500 responses, and error.log contains the following line.

  2015/08/11 16:53:39 [error] 31974#0: *600 [lua] hotrod.lua:20: Failed to connect (host: 127.0.1.1 port: 11222) Operation now in progress, client: 127.0.0.1, server: , request: "GET /hotrod/default/foo/foovalue HTTP/1.1", host: "127.0.0.1:8000"


Expected results:
No HTTP 500 responses and no error messages.


Additional info:
The error message was generated when "connect" failed. According to the customer, "send" also fails sometimes.
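
While the client-side retry is broken, one mitigation at the application layer is to guard each operation and retry it once, giving the client a chance to reach the surviving node. A rough sketch under the same API assumptions as above; the helper name is invented, and the catch relies on the assumption that the client's infinispan::hotrod::TransportException derives from std::exception:

  #include "infinispan/hotrod/RemoteCache.h"
  #include <exception>
  #include <iostream>
  #include <string>

  using namespace infinispan::hotrod;

  // Hypothetical guard: retry a put once when the transport layer fails,
  // covering both the failing "connect" and the occasional "send".
  static bool putWithRetry(RemoteCache<std::string, std::string>& cache,
                           const std::string& key, const std::string& value) {
      for (int attempt = 1; attempt <= 2; ++attempt) {
          try {
              cache.put(key, value);
              return true;
          } catch (const std::exception& e) {
              // The first failure may have hit the killed node; a second
              // attempt should land on the surviving one.
              std::cerr << "put attempt " << attempt << " failed: "
                        << e.what() << std::endl;
          }
      }
      return false;
  }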

Comment 2 Osamu Nagano 2015-08-27 02:16:33 UTC
Created attachment 1067483 [details]
hotrodwrapper.cpp

Replace hotrodwrapper.cpp in "hotrodwrapper.zip" from Bug 1228026 with the attachment, and modify the "run" target in the Makefile as follows.

  run: main
          LD_LIBRARY_PATH=$(LUAJIT_LIB):$(HOTROD_LIB) HOTROD_LOG_LEVEL="TRACE" ./$(PROGRAM) >run.log 2>&1

Then "make run" will run the standalone main function.  It seems this program always fails to failover.

  TRACE [Socket.cpp:107] Trying to connect to 127.0.0.1 (127.0.0.1).
  DEBUG [Socket.cpp:134] Attempting connection to 127.0.0.1:11222
  DEBUG [Socket.cpp:147] Failed to connect to 127.0.0.1:11222
  terminate called after throwing an instance of 'infinispan::hotrod::TransportException'
    what():  Failed to connect (host: 127.0.0.1 port: 11222) Operation now in progress

Comment 3 JBoss JIRA Server 2015-08-28 15:52:30 UTC
Alan Field <afield> updated the status of jira HRCPP-197 to Coding In Progress

Comment 5 Alan Field 2015-09-08 20:11:21 UTC
Waiting to hear if Osamu's customer is satisfied with the fix.

Comment 6 Osamu Nagano 2015-09-09 00:36:24 UTC
(In reply to Alan Field from comment #5)
> Waiting to hear if Osamu's customer is satisfied with the fix.

As mentioned by email, the customer will test with 6.5.1.GA. Since the reproducer works well now, this BZ can be closed.

Comment 7 Alan Field 2015-09-09 13:09:52 UTC
Verified with JDG 6.5.1 CR1

Comment 10 Red Hat Bugzilla 2025-02-10 03:48:03 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.

