From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040514

Description of problem:
This issue was raised after the "low memory" bug was fixed, so preliminary information can be found in the earlier report, BZ #121029. We tried both kernel-2.4.21-15.5.EL and kernel-2.4.21-15.11.EL, with the same bad results: the machine hangs in less than one hour.

The following comments and results were copied from BZ #121029. The tar file can be extracted from BZ #121029.

Trace files after HANG (with "echo m > /proc/sysrq-trigger")

The attached compressed tar file contains trace files from a dbgen hang on an NS5160. This last test was run with traces taken every 30s, including "echo m > /proc/sysrq-trigger". The machine "broke" after 40 minutes. The tar file includes:
- meminfo.sh: script that takes the traces
- meminfo.txt: output from meminfo.sh
- top.txt: output of the "top" command run during the test
- messages: /var/log/messages saved after rebooting the machine

Version-Release number of selected component (if applicable):
kernel-2.4.21-15.11.EL

How reproducible:
Always

Steps to Reproduce:
1. Get the "dbgen" test, which has been sent to Red Hat.
2. Run this "dbgen" test.
3. More detailed information can be found in BZ #121029.

Actual Results:
The machine hangs every time this test is performed.

Expected Results:
The machine should be loaded but should still be alive.

Additional info:
As the SCSI card could be suspected (see BZ #121029), here is its description:
- SCSI Adapter = Adaptec

Content of /proc/pci:
  Bus 4, device 1, function 0:
    SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 1).
      IRQ 54.
      Master Capable. Latency=64. Min Gnt=40.Max Lat=25.
      I/O at 0xc400 [0xc4ff].
      Non-prefetchable 64 bit memory at 0xfa6fe000 [0xfa6fefff].
  Bus 4, device 1, function 1:
    SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (#2) (rev 1).
      IRQ 55.
      Master Capable. Latency=64. Min Gnt=40.Max Lat=25.
      I/O at 0xc800 [0xc8ff].
      Non-prefetchable 64 bit memory at 0xfa6ff000 [0xfa6fffff].
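
For readers who want to reproduce the kind of trace collection described in this report (a memory snapshot every 30s together with "echo m > /proc/sysrq-trigger"), here is a minimal sketch in Python. It is not the meminfo.sh from the attachment, whose contents are not shown in this report; the function names parse_meminfo and sample, and their parameters, are hypothetical and chosen only for illustration.

```python
import os
import time
from pathlib import Path


def parse_meminfo(text):
    """Parse 'Key: value [kB]' lines from /proc/meminfo-style text.

    Lines that do not carry a single integer value (such as the
    multi-column 'Mem:' summary printed by 2.4 kernels) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) in (1, 2) and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def sample(out_path, interval=30.0, count=1,
           meminfo_path="/proc/meminfo", sysrq_path=None):
    """Append `count` timestamped meminfo snapshots to `out_path`.

    If `sysrq_path` is given (assumed to be /proc/sysrq-trigger, which
    requires root), write 'm' to it before each snapshot so the kernel
    also logs its own memory report.
    """
    with open(out_path, "a") as out:
        for i in range(count):
            if sysrq_path is not None:
                # Same effect as: echo m > /proc/sysrq-trigger
                Path(sysrq_path).write_text("m\n")
            text = Path(meminfo_path).read_text()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"=== {stamp} ===\n{text}\n")
            # Push data to disk right away: the machine may hang
            # before a buffered write would otherwise be flushed.
            out.flush()
            os.fsync(out.fileno())
            if i + 1 < count:
                time.sleep(interval)
```

A one-hour run at the interval used in this report would be sample("meminfo.txt", interval=30, count=120, sysrq_path="/proc/sysrq-trigger"), run as root. Flushing and syncing after every snapshot is a deliberate choice here, since the last samples before a hang are the ones most likely to matter.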
Hi Tim, Are you aware of any such known problem? Could you reassign this issue to the right people on your team? Thanks.
Pierre: As you saw in a mail message from Tim in response to your report of a problem with the LSI22320-R adapters on your IA64 platforms (on both RHEL2.1-U4 and RHEL3-U1), we've got a pre-beta RHEL 3 U3 kernel (call it "U3betaRC" for U3betaReleaseCandidate) available for you to test that may also address the issue that you've reported in this bugzilla. Location of the kernel: ftp://people.redhat.com/tburke/.pre_u3 We'll be waiting for your feedback.... Sue
I see from the syslogs that you have storage on mpt fusion, QLogic, and aic7xxx adapters. I gather that your system is installed on the mpt fusion disks, and the dbgen test is running exclusively on the aic7xxx disks. Are the QLogic disks idle? Please describe your storage configuration, and whether there is anything running other than dbgen. If you could run sysreport and post the results that would be helpful as well. Thanks.
The other problem with LSI22320-R adapters (IT #43391) prevented us from going any further with testing the new kernel Tim provided to us. However, I have asked for the further information you requested in your note above (Comment #3).
Created attachment 101637 [details] sysreport result

The various dbgen processes write to various file systems. Some of them are on a SCSI disk subsystem (SR0812 - Chaparral, accessed through an Adaptec SCSI adapter (aic7xxx driver)); others are on a Fibre Channel disk subsystem (FDA2300 - NEC iStorage, accessed through QLogic QLA2340 adapters (qla2300 driver)). In the attached sysreport, the file systems used by the dbgen processes are not present because we had started another test on this server.
A workaround has been found for the LSI22320-R adapters boot problem (IT #43391), and BZ #127385 has been opened to get it fixed in RHEL3-U3. But we're still waiting for a new RHEL3-betaU3 version to check whether it improves the current defect.
This issue has been fixed in the RHEL3-U4 kernel (beta versions). It can be closed now. We'll open another report if the same problem shows up again in the G.A. version, but we don't expect such a regression.
Thank you for the information, Pierre. I will revert the state of this bug to MODIFIED, since U4 is not yet released. It will automatically be changed to CLOSED/ERRATA by the Errata System when U4 becomes available on RHN.
Larry's fix was committed to the RHEL3 U4 patch pool on 18-Oct-2004 (in kernel version 2.4.21-22.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html