From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1a) Gecko/20020702
Description of problem:
It seems that HPT370A on Adaptec AHA 1200A causes both rh7.3 vanilla and
upgraded kernel-2.4.18-10 to deadlock under heavy disk IO.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Do nothing. Deadlock in 1 to 16 days.
2. Run rpm --rebuild kernel-2.4.18-10.src.rpm in a loop. Deadlock in 1 to 2 days.
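For reference, the loop in step 2 can be sketched roughly as below. This is only an illustration, not the exact script that was used: the function name stress_loop and its MAX_RUNS argument are made up for this example. Each pass prints a timestamp, so the last line on the console gives a rough idea of when the machine locked up.

```shell
#!/bin/sh
# Sketch only: run a command repeatedly to keep the disks busy.
# stress_loop and MAX_RUNS are hypothetical names, not from the report.
# Usage: stress_loop MAX_RUNS COMMAND [ARGS...]
#   MAX_RUNS=0 means loop until interrupted (or until the machine hangs).
stress_loop() {
    max_runs=$1
    shift
    run=0
    while [ "$max_runs" -eq 0 ] || [ "$run" -lt "$max_runs" ]; do
        run=$((run + 1))
        # Timestamp each pass so the last line shows roughly when the box hung.
        echo "run $run started $(date '+%Y-%m-%d %H:%M:%S')"
        # Keep looping even if one pass fails; just note the exit status.
        "$@" || echo "run $run: command exited with status $?"
    done
    echo "completed $run runs"
}

# Example matching step 2 (loops until interrupted):
# stress_loop 0 rpm --rebuild kernel-2.4.18-10.src.rpm
```

Redirecting the output to a file on a disk that is not behind the suspect controller (or to a serial console) is probably wise, since a log on the affected disks may never get flushed before the deadlock.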
Actual Results: Deadlock. No OOPS or anything enlightening in any logs; a
varying number of disk LEDs stay constantly on.
Expected Results: Normal operation.
Every piece of HW has been changed except three of the hard disks
(PSU, NICs, SDRAMs, UDMA cables, motherboard & CPU and, most recently, the
AHA1200A, changed from revision A to revision B).
Original HW configuration:
intel celeron 800MHz
abit sa6r hpt370 on-board
adaptec 1200a rev a (had its own irq)
4 x 20GB ibm disks
at-2500tx nic (rtl8139c)
HW configuration now:
intel celeron 1.2GHz
adaptec 1200a rev b (shares irq with promise card)
promise ultra133 tx2
3 x 20GB ibm disks
1 x 60GB ibm disk (replaced one 20GB disk because smartctl reported 163
2 x at-2500tx (rtl8139c)
I did order a new promise card to replace the adaptec altogether, but since its
delivery date might be as much as 3 weeks from now, I compiled a 2.4.20-pre9
kernel for the beast and I'm now running rpm --rebuild kernel-2.4.18-10 in a
loop again.
Correction to step 2 under Steps to Reproduce:
deadlock in 1 hour to 2 days, not 1 to 2 days.
So far it seems that 2.4.20-pre9 solves this problem, which I have been hunting
for nearly four months.
After installing 2.4.20-pre9 the beast has survived seven runs of rpm --rebuild
kernel-2.4.18-10 and five consecutive full backups.
Deadlock soon after backup was the original problem ... which I had managed to
forget while trying to find some HW fault instead of upgrading the kernel myself.
Well, now I have extraneous HW to build more servers :)
Still running tests, but with kernel-2.4.18-10 the system would have deadlocked
several times already.
Well, sometimes you just climb up the tree the wrong way.
The machine has now completed several full backups and kernel rebuilds under
2.4.20-pre9. Thus I think I have found the cause at last, unless the following
week(s) tell me otherwise.
So far I think this long-standing problem is finally solved.