Bug 147807 - rhr2 network tests fail with "connection reset by peer" while doing ab (ApacheBench) through ssh
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ready Certification Tests
Classification: Retired
Component: rhr2
Version: 1.0
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Rob Landry
QA Contact: Rob Landry
URL:
Whiteboard:
Depends On:
Blocks: 143442
Reported: 2005-02-11 16:00 UTC by Syl DES
Modified: 2007-04-18 17:19 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-05-11 16:39:35 UTC
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2005:419
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Hardware Certification Suite bug fix update
Last Updated: 2005-05-11 04:00:00 UTC

Description Syl DES 2005-02-11 16:00:04 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3)
Gecko/20041105 Firefox/1.0RC1 (Debian package 0.99+1.0RC1-4)

Description of problem:
Although the network tests worked perfectly with the previous version of
rhr2 (rhr2-rhel4-1.0-10) on our platform, the change made to the
ApacheBench (ab) command line now causes this error after about 40
minutes:
"Read from remote host xxxx: Connection reset by peer"

Systems are RHEL4 RC1.

The iptables service is stopped on both machines, and they are
connected directly with a crossover cable (both 100Mb cards), so there
is no NAT and no firewall.

Here is my attempt to explain the problem:
The command line in the previous version of the tests was:
ssh -l root -x xxxx 'ab -c 218 -k -n 256 xxxx/httptest.file'

and the new command line is:

ssh -l root -x xxxx 'ab -c 30 -k -n 2000 xxxx/httptest.file'

The number of requests (-n option) is larger, so the open ssh link has
to stay alive longer while the test completes.
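A rough back-of-the-envelope estimate shows how much longer the new command keeps the ssh session open. This is a minimal sketch that assumes the 12MB httptest.file size given in comment 2, the nominal 100Mb/s link speed, and no protocol overhead, so the result is only a lower bound; estimate_seconds is a hypothetical helper, not part of rhr2.

```shell
#!/bin/sh
# Hypothetical helper: lower bound on ab run time in seconds.
# Assumptions: every response is the full file, the link runs at its
# nominal rate, and HTTP/TCP overhead is ignored.
estimate_seconds() {
    requests=$1    # value of ab's -n option
    file_mb=$2     # size of httptest.file in megabytes
    link_mbit=$3   # link speed in megabits per second
    # megabytes -> megabits (x8), then divide by the link rate
    echo $(( requests * file_mb * 8 / link_mbit ))
}

echo "old command (-n 256):  $(estimate_seconds 256 12 100) s"
echo "new command (-n 2000): $(estimate_seconds 2000 12 100) s"
```

Under these assumptions the old command needs about 4 minutes (245 s), while the new one needs at least 32 minutes (1920 s), which is consistent with the 30-40 minute window in which the connection is reset.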

But it seems that a timeout occurs after 30-40 minutes (because of the
lack of keyboard/display interaction), which closes the ssh connection.
Normally, ApacheBench displays a progress message every 10% of
completed requests (a side effect that should normally prevent ssh
from timing out), but the network connection on the Apache side is so
heavily loaded that these messages are not received.

I found a workaround: adding the "-v 4" option (very verbose mode) to
the ab command line lets the tests complete successfully. With -v 4, ab
sends so many messages through the ssh link that some of them get
through (despite the network load) and are displayed, which keeps ssh
from timing out.
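A minimal sketch of the workaround applied to the new command line from this report. build_ab_cmd is a hypothetical helper, and xxxx stands for the host name elided in the report; the actual ssh invocation is left commented out because it needs root ssh access to the target machine.

```shell
#!/bin/sh
# Hypothetical helper: the new rhr2 ab command line with the "-v 4"
# (very verbose) workaround added, so ab keeps writing output back
# through the ssh session while the test runs.
build_ab_cmd() {
    host=$1
    printf '%s\n' "ab -c 30 -k -n 2000 -v 4 ${host}/httptest.file"
}

HOST=xxxx   # placeholder for the elided host name
build_ab_cmd "$HOST"
# Actual run (requires root ssh access to $HOST):
# ssh -l root -x "$HOST" "$(build_ab_cmd "$HOST")"
```

Only the -v 4 option differs from the new command line quoted above; the concurrency (-c 30), keep-alive (-k), and request count (-n 2000) are unchanged.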

I saw that for some other users the new ab command line with "-c 30
-k -n 2000" resolved their problems, but that is not the case for us:
it worked perfectly before (no timeout).
(See bug 145570, bug 146826, bug 139965.)

Version-Release number of selected component (if applicable):
rhr2-rhel4-1.0-14a

How reproducible:
Always

Steps to Reproduce:
1. Install a machine with the needed packages
2. Run the redhat-ready tests manually
3. Choose the NETWORK tests
4. ApacheBench fails to complete

Actual Results:  After about 40 minutes, ssh/ab fails with "Read from
remote host xxxx: Connection reset by peer"

Expected Results:  ApacheBench should complete normally

Additional info:

Comment 1 Richard Li 2005-02-11 16:29:47 UTC
Thank you for the detailed bug report. We will investigate this issue and fix
it. We will accept certifications on -10 or -14a or with the -v4 patch.

Comment 2 Syl DES 2005-02-17 16:48:11 UTC
I made a mistake in the previous bug report: the Ethernet adapter on
the machine where ab is executed is a Gigabit adapter. However, as a
result of auto-negotiation, the adapter speed is downgraded to 100Mb/s
(ethtool shows this). The transmitted file (httptest.file) is 12MB in
size. So I don't think the Gigabit adapter is the reason the tests
fail.

Maybe this helps: I tried swapping the roles of the two machines and
got the same "Connection reset by peer" error.

When I launch the ab command manually (without going through an ssh
link), ApacheBench completes successfully, which suggests that ssh is
timing out somewhere...

Comment 3 Richard Li 2005-02-17 16:51:08 UTC
Just to clarify: the -v 4 option still enables the test to succeed in either case?

Comment 4 Syl DES 2005-02-18 13:52:15 UTC
Yes, the -v 4 option enables the test to succeed in either case.

Comment 5 Richard Li 2005-02-18 16:34:44 UTC
Thanks! I've added -v 4 to CVS, and this fix will go out in the next
errata release.

Comment 6 Syl DES 2005-02-22 11:54:37 UTC
Ok, thanks!

Comment 7 Richard Li 2005-05-11 16:39:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-419.html

