From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.2; Linux) (KHTML, like Gecko)

Description of problem:
I've been working with a customer on an 8-way x440 connected w/ onboard tg3 to gigabit ethernet. We started with RHEL3-U1 and upgraded to the RHEL3-U2 errata kernel (2.4.21-15.EL) to resolve SCSI RAID performance issues (bug #104633).

After rebooting, scp (using blowfish) gets around 20-25MB/sec, and things run as well as expected. However, shortly after starting Oracle9i, network performance begins to suffer. System load on the box is negligible. Even after killing off Oracle, scp performance is terrible, ranging from 6MB/sec to as low as 1.8MB/sec. So far we've seen it happen about 3 times. Bringing the interface down, removing the module, and then bringing the interface back up does not help.

I've verified that the problem isn't traffic on the network: I plugged my laptop (w/ gigabit) directly into the system both before and after we started to see the problem, and the throughput over the direct link matched what we saw over the network in both cases (fine before, terrible after).

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-15.EL

How reproducible:
3 of 3 times reproduced.

Steps to Reproduce:
1. Boot linux-2.4.21-15.EL.smp kernel
2. Do some scp tests, note performance level
3. Start oracle
4. Wait a while (within 30 minutes)
5. Stop oracle
6. Re-run scp tests

Actual Results: Low performance noted (1.5-6MB/s)

Expected Results: Normal performance noted (20-25MB/s)

Additional info:
This regards a customer issue, so quick feedback would be greatly appreciated. :)
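For anyone repeating steps 2 and 6, a raw TCP measurement can help separate the network path from scp's cipher overhead. The following is only a minimal sketch under stated assumptions: the helper names `start_sink` and `measure_throughput` are hypothetical (they do not come from the report), and the numbers it prints will not match scp/blowfish figures directly. It is meant for before/after comparison on the same pair of hosts.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes handed to each sendall()/recv() call


def start_sink(host="127.0.0.1", port=0):
    """Start a one-shot TCP sink that discards its input; return the bound port.

    Hypothetical helper: run this on the receiving machine.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    bound_port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        with conn:
            # Read until the sender shuts down its side, then close.
            while conn.recv(CHUNK):
                pass
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return bound_port


def measure_throughput(host, port, total_bytes=32 * 1024 * 1024):
    """Send total_bytes to host:port and return the observed rate in MB/s."""
    payload = b"\0" * CHUNK
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
        # Signal end of data, then wait for the sink to close so the
        # timing covers delivery and not just local buffering.
        sock.shutdown(socket.SHUT_WR)
        sock.recv(1)
    elapsed = max(time.monotonic() - start, 1e-9)
    return sent / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Loopback demo only; point host/port at the remote sink for a real test.
    port = start_sink()
    print(f"{measure_throughput('127.0.0.1', port):.1f} MB/s")
```

Running it once after boot and again after stopping Oracle gives two figures to compare. Over loopback it only confirms that the script works, since no NIC or driver is involved.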
If quick feedback is necessary, you would be wise to go through support (presuming you have an up-to-date entitlement) at https://www.redhat.com/apps/support/ - Bugzilla is less for support than for bug fixes.
Ah, good point. I'm just a kernel dev helping out on this issue, and Bugzilla is the standard interface for me. I'll advise the customer to take the support path (although the immediacy has passed; at the moment they're happy falling back to the RHEL3-U1 kernel). However, I'll continue to follow and work this bug as I normally would in my testing and development context. Thanks.
So RHEL3-U1 did not show the slowdown? Thanks to your detailed report, it is clear that the tg3 driver itself is not the problem; it seems to be something generic, perhaps something to do with memory pressure. Is there no way to reproduce this other than running Oracle?
Correct, we never saw the slowdown w/ RHEL3-U1. I have not been able to reproduce the issue outside of the customer's setup (which I no longer have access to), but I haven't really tried too hard. If I get a chance later, I'd like to try to write a few test cases that may strain the system in the same way (large number of open connections, lots of processes, etc).
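One of the test cases mentioned above (a large number of open connections) could start from something like the following. This is a rough sketch, not a known reproducer: the report does not characterize Oracle's actual workload, the helper names `start_holder`, `open_connections`, and `close_all` are hypothetical, and it covers only the open-connections part, not the "lots of processes" part.

```python
import socket
import threading


def start_holder(host="127.0.0.1", port=0, backlog=128):
    """Accept connections and keep them open.

    Returns (bound_port, accepted), where accepted is a list that the
    background thread fills with server-side sockets.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    bound_port = srv.getsockname()[1]
    accepted = []

    def serve():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                break  # listening socket was closed
            accepted.append(conn)  # hold the connection open

    threading.Thread(target=serve, daemon=True).start()
    return bound_port, accepted


def open_connections(host, port, count):
    """Open count TCP connections to host:port and return the sockets."""
    return [socket.create_connection((host, port)) for _ in range(count)]


def close_all(socks):
    """Close every socket in the list."""
    for s in socks:
        s.close()


if __name__ == "__main__":
    port, _accepted = start_holder()
    socks = open_connections("127.0.0.1", port, 200)
    print(f"holding {len(socks)} connections")
    close_all(socks)
```

The connection count is bounded by the per-process file descriptor limit, so larger runs would need that limit raised or the work split across several processes. Throughput would be measured separately while the connections are held and again after they are closed.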
Need additional info from IBM.
IBM: we need details here to proceed further.
I'm still trying to reproduce this issue; however, I've run into unrelated hardware problems. I'll update this bug as soon as I know more.
I have not been able to reproduce this issue, so I'm going to close this. It can be reopened if it is ever seen again.