Bug 181484 - rhr lmbench lat_udp fails to send packets on ia32e with RHEL3 32/64 bit
Status: CLOSED WONTFIX
Product: Red Hat Ready Certification Tests
Classification: Retired
Component: other
Version: 2
Hardware: ia32e Linux
Priority: medium
Severity: medium
Assigned To: Will Woods
QA Contact: Rob Landry
Depends On:
Blocks: 191897
Reported: 2006-02-14 11:22 EST by Matt Zywusko
Modified: 2008-07-17 18:03 EDT
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2008-07-17 18:03:21 EDT

Attachments
strace log as requested. (1.56 MB, text/plain)
2006-02-23 14:56 EST, Matt Zywusko
new strace of lat_udp failure (42.28 KB, text/plain)
2006-09-07 14:46 EDT, Gary Case

Description Matt Zywusko 2006-02-14 11:22:13 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1

Description of problem:
When running the rhr NETWORK2 suite of tests with RHEL3u4 and RHEL3u6 on an Intel IA32E (EM64T) system, in both 32-bit and 64-bit modes, the lat_udp test in the lmbench package fails to generate any packets to send to the server.

We used strace to determine that the test was not making any network calls. We used tcpdump on both the client and the server to confirm that no packets were being sent or received when the tool was run manually.

Other systems do not appear to exhibit this problem. Obtaining the latest version of lmbench from sourceforge and running the test on the failing platform allowed the test to pass.

Version-Release number of selected component (if applicable):
2.0.4-2

How reproducible:
Always

Steps to Reproduce:
1. Set up 2 servers running RHEL3u4 or RHEL3u6 on an IA32E blade in either 32-bit or 64-bit mode.
2. Download and install the rhr2 lmbench suite.
3. Run /usr/lib/lmbench/bin/<platform>/lat_udp -s on the server.
4. Run /usr/lib/lmbench/bin/<platform>/lat_udp <server_ip> on the client.
5. Wait for the "recv timeout" failure message on the server.
  

Actual Results:  The lat_udp program failed to generate, send or receive any UDP packets when run with version 2.0.4-2.

Expected Results:  The latency of UDP traffic should have been calculated after the client and server exchanged network traffic.
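
For anyone who wants to cross-check basic UDP round trips independently of lmbench, here is a minimal Python sketch of the kind of exchange described above: the client sends a small datagram, the server echoes it, and the client times the round trip. This is not lat_udp's implementation. The function names, the 4-byte payload, and the 1 second timeout are all assumptions made for illustration.

```python
import socket
import threading
import time


def serve_echo(sock, count):
    """Echo `count` datagrams back to whoever sent them (hypothetical helper)."""
    for _ in range(count):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)


def measure_rtt(server_addr, count=100, timeout=1.0):
    """Return a list of round-trip times in seconds for `count` exchanges.

    Raises socket.timeout if a reply does not arrive within `timeout`.
    """
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for seq in range(count):
            payload = seq.to_bytes(4, "big")  # assumed payload: a sequence number
            start = time.perf_counter()
            client.sendto(payload, server_addr)
            reply, _ = client.recvfrom(64)
            rtts.append(time.perf_counter() - start)
            if reply != payload:
                raise RuntimeError("unexpected reply")
    return rtts


def loopback_demo(count=100):
    """Run server and client in one process over 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    worker = threading.Thread(target=serve_echo, args=(server, count), daemon=True)
    worker.start()
    try:
        return measure_rtt(server.getsockname(), count)
    finally:
        worker.join(timeout=2.0)
        server.close()
```

To test between two machines, run serve_echo on the server with a socket bound to a fixed port and call measure_rtt((server_ip, port)) on the client. If tcpdump shows these datagrams but none from lat_udp, the problem is more likely in the benchmark than in the network path.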

Additional info:

lmbench version 3.05a has been proven to work as expected. I had a conversation with Mike Gahagan at Red Hat who suggested I file this bug and ask for the rhr2 lmbench RPM to be updated so we can continue our certification testing. This platform's certification testing is blocked until this is resolved.
Comment 1 Will Woods 2006-02-14 16:57:24 EST
Updating to lmbench 3 is a good idea but not prudent for a single bugfix. I will
investigate the source of this particular problem.
Comment 2 Will Woods 2006-02-17 15:23:57 EST
I can't reproduce this on our test machines. Can you confirm that lat_udp wasn't sending packets? 

What does the output of strace -o lat_udp.log /usr/lib/lmbench/bin/<platform>/lat_udp <server_ip> 
show?

Are you sure the server is running? Does it work properly if you try using another (non-x86_64) machine 
as the client/server?
Comment 3 Matt Zywusko 2006-02-22 08:52:44 EST
I don't have that server up and running anymore at this moment so I don't have 
the strace output. I'll try to work on getting that. 

The server was running - I didn't touch it when I moved to lmbench v3 and 
everything worked fine. 

When we ran 'strace lat_udp <ip_addr>' on the failed client, we simply saw the 
timer tick. tcpdump on both the client and server showed no packets from 
strace but we saw the typical arp messages from _other_ systems on the net. 
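
A quick way to check an strace log for this symptom is to count the socket-related system calls in it. The sketch below is only an illustration. The list of syscall names is an assumption about what a UDP client would normally call, and the regular expression handles plain strace output with an optional PID column (as written by strace -f), but not timestamp prefixes such as those added by -t.

```python
import re
from collections import Counter

# Syscalls a UDP client would be expected to make (illustrative, not exhaustive).
NETWORK_SYSCALLS = {
    "socket", "bind", "connect",
    "send", "sendto", "sendmsg",
    "recv", "recvfrom", "recvmsg",
}

# A plain strace line starts with the syscall name followed by "(".
# With -f it may be preceded by a PID column: "1234 " or "[pid 1234] ".
LINE_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_network_syscalls(lines):
    """Count occurrences of each network-related syscall in strace output."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(1) in NETWORK_SYSCALLS:
            counts[match.group(1)] += 1
    return counts


def never_sent(counts):
    """True if the trace contains no send-type call at all."""
    return not any(counts[name] for name in ("send", "sendto", "sendmsg"))
```

For example, `count_network_syscalls(open("lat_udp.log"))` on the log requested in comment 2 would show whether the client ever reached a send call. A trace that contains only timer calls would make never_sent() return True.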

The included packages worked without any problems on all our other platforms:

RHEL4 32&64bit AMD & EM64T
RHEL3 32&64bit AMD


Comment 4 Matt Zywusko 2006-02-23 14:56:08 EST
Created attachment 125131
strace log as requested.
Comment 5 Rainer Koenig 2006-03-22 11:30:15 EST
I can verify this bug on exactly one machine with an nVIDIA network chip. Other
machines with a Broadcom NIC don't show this problem. Even the same hardware
that has 2 network interfaces (1 Broadcom, 1 nVIDIA) shows it only with the
nVIDIA NIC.

In my case lat_udp does send out packets; at least an ethereal session on the
server always catches some packets, sometimes more, sometimes less. On
average, every second run of lat_udp fails with "Recv timed out".

I also built the lmbench-3.05-a5 from source, but even there lat_udp fails in
every second attempt.

My test network connection is quite simple, there is the client, a Gigabit
switch and the server and no other machines that produce any traffic. It's
practically an isolated network, but even there the test fails.

On the other hand I'm a bit puzzled why the failure of this lat_udp test is
causing the whole NETWORK2 test to fail. UDP is known as a protocol where data
can get lost, so UDP errors are a part of life.

Comment 6 Will Woods 2006-04-06 17:36:37 EDT
Re: Comment #4: That strace log is successful, so it doesn't help me much. I'd
really like to see a log where the packets aren't being sent, so I can have some
clue where to start looking to find out why they aren't being sent.


Re: Comment #5: I agree that the udp test should allow a certain amount of
packet loss, but in our tests we experienced no packet loss at all. It seemed
reasonable that (on an otherwise quiet network) two machines should be able to
exchange ~7MB of UDP data without dropping packets, and our tests confirmed it,
so we let it go.
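
To illustrate what "allowing a certain amount of packet loss" could look like, here is a hedged Python sketch. It is not how lat_udp or the NETWORK2 test behaves. The function names and the 1% default budget are assumptions chosen only for the example. A timed-out exchange is counted as lost instead of failing the whole run, and the run passes if the loss fraction stays within the budget.

```python
import socket


def run_with_loss_budget(exchange, count, max_loss=0.01):
    """Call exchange(seq) `count` times; it returns True when a reply arrived.

    Returns (passed, loss_fraction). max_loss=0.01 is an arbitrary example value.
    """
    lost = sum(1 for seq in range(count) if not exchange(seq))
    loss = lost / count
    return loss <= max_loss, loss


def make_udp_exchange(sock, server_addr):
    """Build an exchange(seq) callable for a UDP echo server (hypothetical helper).

    `sock` should already have a timeout set via sock.settimeout().
    """
    def exchange(seq):
        payload = seq.to_bytes(4, "big")
        sock.sendto(payload, server_addr)
        try:
            reply, _ = sock.recvfrom(64)
        except socket.timeout:
            return False  # count as lost instead of aborting the run
        # A late echo of an earlier datagram will not match and is
        # also counted as a loss in this simple sketch.
        return reply == payload
    return exchange
```

With a budget of zero this reduces to the strict behavior described above, where a single missing reply fails the test.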

In addition, I find it strange that it only fails on certain hardware for you -
doesn't that indicate a hardware problem rather than a test problem?

If you are still seeing this behavior, and you are sure that this is a problem
with the lat_udp test, please file another bug. Since your symptoms are
different we should track that problem separately.
Comment 7 Gary Case 2006-09-07 14:44:41 EDT
Egenera has provided an strace of a failed NETWORK2 lat_udp test. I'm attaching
it to the ticket. 
Comment 8 Gary Case 2006-09-07 14:46:54 EDT
Created attachment 135799
new strace of lat_udp failure
Comment 9 Rob Landry 2008-07-17 18:03:21 EDT
rhr2 has been deprecated, closing these remaining bugs as WONTFIX.  Future bugs
against the "hts" test suite should be opened against the "Red Hat Hardware
Certification Program" product selecting either "Test Suite (harness)" or "Test
Suite (tests)" components.
