Red Hat Bugzilla – Bug 181484
rhr lmbench lat_udp fails to send packets on ia32e with RHEL3 32/64 bit
Last modified: 2008-07-17 18:03:21 EDT
Description of problem:
When running the rhr NETWORK2 test suite with RHEL3u4 and RHEL3u6 on an Intel IA32E (EM64T) system in both 32-bit and 64-bit mode, the lat_udp test from the lmbench package fails to generate any packets to send to the server.
We used strace to determine that the test was not making any calls, and tcpdump on both the client and the server to confirm that no packets were being sent or received when the tool was run manually.
Other systems do not appear to exhibit this problem. With the latest version of lmbench from SourceForge, the test passes on the failing platform.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Set up two servers running RHEL3u4 or RHEL3u6 on IA32E blades, in either 32-bit or 64-bit mode.
2. Download and install the rhr2 lmbench suite on both.
3. Run /usr/lib/lmbench/bin/<platform>/lat_udp -s on the server.
4. Run /usr/lib/lmbench/bin/<platform>/lat_udp <server_ip> on the client.
5. Wait for the "recv timeout" failure message on the server.
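The failure in step 5 is a receive timeout in a UDP ping-pong between client and server. As a minimal sketch of that pattern, assuming a simple one-datagram echo exchange (this is not lmbench's code; open_udp, wait_for_datagram, ping_pong, the 0.2 s timeout and the "ping" payload are all hypothetical), the following Python runs the exchange on loopback and shows how a missing datagram surfaces as a timeout rather than a hang:

```python
import socket
import time

# Hypothetical value for illustration only; not taken from lmbench.
TIMEOUT_S = 0.2


def open_udp(timeout_s=TIMEOUT_S):
    """Bind a UDP socket to 127.0.0.1 on a kernel-chosen port with a receive timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.settimeout(timeout_s)
    return sock


def wait_for_datagram(sock, bufsize=64):
    """Return (data, addr), or None if nothing arrived before the timeout."""
    try:
        return sock.recvfrom(bufsize)
    except socket.timeout:
        return None  # analogue of a "recv timeout" failure


def ping_pong(n_packets=5, payload=b"ping"):
    """Echo n_packets datagrams client -> server -> client on loopback.

    Returns (replies_received, mean_round_trip_seconds or None).
    """
    server = open_udp()
    client = open_udp()
    replies, total = 0, 0.0
    try:
        for _ in range(n_packets):
            start = time.perf_counter()
            client.sendto(payload, server.getsockname())
            request = wait_for_datagram(server)
            if request is None:
                continue  # request lost: count it as a failed round trip
            data, addr = request
            server.sendto(data, addr)  # server side echoes the datagram back
            if wait_for_datagram(client) is not None:
                replies += 1
                total += time.perf_counter() - start
    finally:
        server.close()
        client.close()
    return replies, (total / replies if replies else None)


if __name__ == "__main__":
    idle = open_udp()
    # Nobody sends to this socket, so the receive times out.
    print("idle socket:", "timed out" if wait_for_datagram(idle) is None else "got data")
    idle.close()
    print("ping-pong (replies, mean RTT):", ping_pong())
```

Binding the two sockets to real interface addresses on separate hosts instead of 127.0.0.1, and counting the timeouts, would be one way to check whether a particular NIC drops datagrams independently of lmbench.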
Actual Results: With lmbench version 2.0.4-2, the lat_udp program did not generate, send, or receive any UDP packets.
Expected Results: The latency of UDP traffic should have been calculated after the client and server exchanged network traffic.
lmbench version 3.05a has been verified to work as expected. I had a conversation with Mike Gahagan at Red Hat, who suggested I file this bug and ask for the rhr2 lmbench RPM to be updated so we can continue our certification testing. This platform's certification testing is blocked until this is resolved.
Updating to lmbench 3 is a good idea but not prudent for a single bugfix. I will
investigate the source of this particular problem.
I can't reproduce this on our test machines. Can you confirm that lat_udp wasn't sending packets?
What does the output of strace -o lat_udp.log /usr/lib/lmbench/bin/<platform>/lat_udp <server_ip> show?
Are you sure the server is running? Does it work properly if you try using another (non-x86_64) machine
as the client/server?
I don't have that server up and running at the moment, so I don't have
the strace output. I'll try to get it.
The server was running: I didn't touch it when I moved to lmbench v3, and
everything worked fine.
When we ran 'strace lat_udp <ip_addr>' on the failed client, we simply saw the
timer tick. tcpdump on both the client and the server showed no packets from the
traced lat_udp run, but we did see the usual ARP messages from _other_ systems on the network.
The included packages worked without any problems on all our other platforms:
RHEL4 32&64bit AMD & EM64T
RHEL3 32&64bit AMD
Created attachment 125131
strace log as requested.
I can verify this bug on exactly one machine with an nVIDIA network chip. Other
machines with a Broadcom NIC don't show this problem. Even the same hardware
with two network interfaces (one Broadcom, one nVIDIA) shows it only with the
nVIDIA interface.
In my case lat_udp does send out packets: an Ethereal session on the
server always catches some, sometimes more, sometimes fewer. On
average, every second run of lat_udp fails with "Recv timed out".
I also built lmbench-3.05-a5 from source, but lat_udp still fails on
every second attempt.
My test network is simple: the client, a Gigabit switch, and the server,
with no other machines producing traffic. It's practically an isolated
network, yet the test still fails.
On the other hand, I'm a bit puzzled why the failure of this lat_udp test
causes the whole NETWORK2 test to fail. UDP is known to be a protocol that
can lose data, so UDP errors are a part of life.
Re: Comment #4: That strace log shows a successful run, so it doesn't help me much. I'd
really like to see a log where the packets aren't being sent, so I have some
clue where to start looking to find out why they aren't being sent.
Re: Comment #5: I agree that the udp test should allow a certain amount of
packet loss, but in our tests we experienced no packet loss at all. It seemed
reasonable that (on an otherwise quiet network) two machines should be able to
exchange ~7MB of UDP data without dropping packets, and our tests confirmed it,
so we let it go.
In addition, I find it strange that it only fails on certain hardware for you.
Doesn't that indicate a hardware problem rather than a test problem?
If you are still seeing this behavior, and you are sure that this is a problem
with the lat_udp test, please file another bug. Since your symptoms are
different we should track that problem separately.
Egenera has provided an strace of a failed NETWORK2 lat_udp test. I'm attaching
it to the ticket.
Created attachment 135799
new strace of lat_udp failure
rhr2 has been deprecated, so these remaining bugs are being closed as WONTFIX. Future bugs
against the "hts" test suite should be opened against the "Red Hat Hardware
Certification Program" product, selecting either the "Test Suite (harness)" or "Test
Suite (tests)" component.