Bug 322661 - NETWORK issue from Egenera pBlade
Status: CLOSED CURRENTRELEASE
Product: Red Hat Hardware Certification Program
Classification: Red Hat
Component: Test Suite (tests)
Version: 5
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Greg Nichols
https://hardware.redhat.com/show.cgi?...
Reported: 2007-10-07 22:55 EDT by Xu Bo
Modified: 2008-07-16 17:58 EDT

Doc Type: Bug Fix
Last Closed: 2007-12-17 10:49:03 EST

Description Xu Bo 2007-10-07 22:55:52 EDT
Description of problem:
When ethtool reports an unexpected network speed value, the Python script does
not fall back to the default of 100 Mb/s and can instead fail with a
divide-by-zero error.

Version-Release number of selected component (if applicable):

hts-5.0-48

How reproducible:

Every time

Steps to Reproduce:
1. Install hts 
2. Run network portion of hts on a system that reports bad ethtool data
3. Network.py errors out.
  
Actual results:

Python error:
copy from  /tmp/tmpjNWxlYnfsdir/172.30.192.159/6 to /var/www/html/httptest.file
copy from  /tmp/tmpjNWxlYnfsdir/172.30.192.159/7 to /var/www/html/httptest.file
copy from  /tmp/tmpjNWxlYnfsdir/172.30.192.159/8 to /var/www/html/httptest.file
copy from  /tmp/tmpjNWxlYnfsdir/172.30.192.159/9 to /var/www/html/httptest.file
Traceback (most recent call last):
  File "./network.py", line 432, in ?
    returnValue = networkTest.do(sys.argv)
  File "/usr/lib/python2.4/site-packages/hts/test.py", line 225, in do
    return self.run()
  File "./network.py", line 412, in run
    returnValue = self.nfsTest()
  File "./network.py", line 372, in nfsTest
    print "%u mbit received in %u sec ( %e mbit/s)" % (mbit, rxtime, mbit/rxtime)
ZeroDivisionError: float division
...finished running ./network.py, exit code=1

Expected results:

Completed test with either a PASS or FAIL result.
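
A minimal sketch of how the throughput report could be guarded so that the test
reaches a PASS or FAIL result instead of raising ZeroDivisionError. The helper
name format_throughput is hypothetical and is not part of network.py; the
format string is the one shown in the traceback above.

```python
def format_throughput(mbit, rxtime):
    """Build the report line, tolerating a zero or negative elapsed time.

    Hypothetical helper: network.py prints this line inline in nfsTest().
    """
    if rxtime <= 0:
        # No measurable elapsed time, so a rate cannot be computed.
        return "%u mbit received in %u sec (rate unavailable)" % (mbit, rxtime)
    return "%u mbit received in %u sec ( %e mbit/s)" % (mbit, rxtime, mbit / rxtime)


if __name__ == "__main__":
    print(format_throughput(80.0, 4.0))
    print(format_throughput(80.0, 0.0))
```

With this shape the caller can decide separately whether an unavailable rate
counts as a FAIL, rather than having the script exit with a traceback.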

Additional info:
Bad ethtool output that's generated by this system:
[root@REDHAT-HTS-RHEL5-1 network]# ethtool eth0
Settings for eth0:
        Supported ports: [ ]
        Supported link modes:   
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised auto-negotiation: No
        Speed: Unknown! (0)
        Duplex: Half
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x08061924 (134617380)
        Link detected: yes
Comment 1 Xu Bo 2007-10-07 22:58:32 EDT
Forcibly setting the interface speed to 100 will work around the problem, so it
looks like all we need is better error handling (and then working with the
vendor to correct the real problem).

    def getInterfaceSpeed(self):       
        self.interfaceSpeed = 100
        return
        #skip the rest of this, trying to work around bad
        #data returning from ethtool
        for interfaceString in (self.interface, "p%s" % self.interface):
            ethtoolCommand = "ethtool %s | fgrep \"Speed\"" % interfaceString
            pipe = os.popen(ethtoolCommand)
            line = pipe.readline()
            pipe.close()
            if line:
                pattern = re.compile("\d+")
                match = pattern.search(line)
                if match:
                    self.interfaceSpeed = string.atoi(match.group())
                    return
        # otherwise
        self.interfaceSpeed = 100
        print "interface speed is %u" % self.interfaceSpeed
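
One detail worth noting about the parsing code above: the pattern "\d+" matches
the "0" in a line such as "Speed: Unknown! (0)", so the speed would be set to 0
rather than to the default of 100. Whether that is the root cause here is not
established in this report. The sketch below shows one way the parsing could
reject non-positive values; the function name parse_interface_speed and the
constant DEFAULT_SPEED_MBIT are hypothetical and not taken from network.py.

```python
import re

# Assumption: 100 mirrors the default used by getInterfaceSpeed() above.
DEFAULT_SPEED_MBIT = 100


def parse_interface_speed(line, default=DEFAULT_SPEED_MBIT):
    """Extract a positive speed from an ethtool "Speed" line.

    Falls back to `default` when the line is empty, has no digits,
    or carries a non-positive value such as "Unknown! (0)".
    """
    match = re.search(r"\d+", line or "")
    if match:
        speed = int(match.group())
        if speed > 0:
            return speed
    return default


if __name__ == "__main__":
    for sample in ("Speed: 1000Mb/s", "Speed: Unknown! (0)", ""):
        print("%r -> %d" % (sample, parse_interface_speed(sample)))
```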
Comment 2 Greg Nichols 2007-10-08 09:42:41 EDT
Fixed in R4
Comment 3 Gary Case 2007-10-09 13:57:30 EDT
This is the patch I used to get around the speed detection:

 diff -Naurp network.py.orig network.py
--- network.py.orig     2007-10-05 18:21:52.000000000 -0400
+++ network.py  2007-10-05 18:23:58.000000000 -0400
@@ -193,6 +193,10 @@ class NetworkTest(Test):        
         return returnValue == 0
     
     def getInterfaceSpeed(self):       
+        self.interfaceSpeed = 100
+        return
+        #skip the rest of this, trying to work around bad 
+        #data returning from ethtool
         for interfaceString in (self.interface, "p%s" % self.interface):
             ethtoolCommand = "ethtool %s | fgrep \"Speed\"" % interfaceString
             pipe = os.popen(ethtoolCommand)
Comment 4 Gary Case 2007-10-09 17:43:43 EDT
Hm...

There's something else going on here, but I don't know what it is yet. After
going through the ethtool output, I found that all Xen dom0s I tested (i686,
x86_64, ia64 and this Egenera i686 blade) output link status and nothing else.
Link speed is never displayed. This means my original hypothesis was wrong.

That being the case, you'd think that the forced self.interfaceSpeed = 100 line
wouldn't be necessary, as no system reports speed correctly in the scripts. But
on these blades, if you use the unmodified script, the dd command that creates
/var/www/html/httptest.file never stops, eventually filling up the entire hard
drive. So, I put the patch back in and that portion of the test runs.

Shortly after that portion of the test finishes, the system begins the NFS test
and immediately displays errors that are not present when run with the PAE kernel:

nfs: server 172.30.192.193 not responding, still trying

This repeats over and over until the test is killed. After some trial and error,
I determined that the problem is caused by the mount protocol. If you use UDP,
mounts hang after issuing a simple 'ls' command. If you switch to TCP, commands
work as expected. After modifying the network.py script further, changing the
protocol on the nfsopts line to tcp:

       nfsopts="rw,intr,rsize=12288,wsize=12288,tcp"

I can obtain a successful run of the NETWORK test.

Now we need to determine what is causing dd to run out of control and why UDP
mounts are unsuccessful.
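
For comparing the two transports while debugging, the option string could be
built from a protocol parameter instead of being edited by hand. This is only a
sketch: build_nfs_opts is a hypothetical helper, and the assumption that the
unmodified script uses the same options with "udp" in place of "tcp" is not
confirmed by anything shown in this report.

```python
def build_nfs_opts(proto="udp", rsize=12288, wsize=12288):
    """Return an NFS mount option string for the given transport.

    Hypothetical helper; the rsize/wsize defaults come from the
    nfsopts line quoted above.
    """
    if proto not in ("udp", "tcp"):
        raise ValueError("unsupported NFS transport: %r" % (proto,))
    return "rw,intr,rsize=%d,wsize=%d,%s" % (rsize, wsize, proto)


if __name__ == "__main__":
    print(build_nfs_opts("udp"))
    print(build_nfs_opts("tcp"))
```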
Comment 5 Greg Nichols 2007-10-09 19:38:07 EDT
A little background: The reason the network test uses NFS mounted via UDP is to
test UDP.
Comment 6 Gary Case 2007-10-09 22:08:11 EDT
Well then, I would say that it's definitely doing its job. I wonder what's so
different about Egenera's architecture that would cause UDP traffic to fail? I'm
waiting to hear more from Egenera.
