Description of problem: The customer uses a qla2300 HBA card to connect to a storage box. when one of memory chips becomes fault, the qla2300 card reports waring to /var/log/message and doesn't work. After replaced a new memory chip, the card become work then. The customer think that it's a big issue that HBA card doesn't work, so it's not enough to report warning message only one time. He think it will be good that HBA card can report warning when each i/o request arrived. The error message like this: > Uhhuh. NMI received. Dazed and confused, but trying to continue > > cpqphp: power fault interrupt > > cpqphp: power fault bit 1 set > > You probably have a hardware problem with your RAM chips qla2300 > > 0000:06:02.0: qla2xxx_eh_abort scsi(1:0:0:26): cmd_timeout_in_sec=0x1e. > > qla2300 0000:06:02.0: Parity error -- HCCR=ffff. > > qla2300 0000:06:02.0: Mailbox command timeout occured. Issuing ISP abort. > > qla2300 0000:06:02.0: Performing ISP error recovery - ha= f6f9c258. > > cpqphp: power fault interrupt > > cpqphp: power fault bit 0 set > > qla2300 0000:06:01.0: qla2xxx_eh_abort scsi(0:0:0:26): cmd_timeout_in_sec=0x1e. > > qla2300 0000:06:01.0: Parity error -- HCCR=ffff. > > qla2300 0000:06:01.0: Mailbox command timeout occured. Issuing ISP abort. > > qla2300 0000:06:01.0: Performing ISP error recovery - ha= f7e0c258. Version-Release number of selected component (if applicable): RHEL AS 4.4 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
This is a hardware problem. The operating system cannot be expected to continue to function properly when there are disk controller errors.