Bug 250630 - [rhts] ia64 mpt boot failure messages
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Red Hat Kernel Manager
QA Contact: Martin Jenner
http://rhts.lab.boston.redhat.com/cgi...
Keywords: Reopened
Reported: 2007-08-02 11:52 EDT by Don Zickus
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2007-09-10 11:05:49 EDT

Description Don Zickus 2007-08-02 11:52:19 EDT
Description of problem:

In an effort to clean up boot messages, the following message was noticed on
hp-frosty:

mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
mptscsih: ioc0: Issue of TaskMgmt failed!

Please see the URL above for full machine info and the error message from dmesg.

Version-Release number of selected component (if applicable):
kernel-2.6.18-37.el5

How reproducible:
always

Steps to Reproduce:
1. Boot the latest 5.1 distro with the kernel.
  
Actual results:


Expected results:
The boot messages should not contain words like 'warning', 'fail', or 'error'.
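
A check along these lines could be sketched as follows. This is only a minimal illustration, not a tool used by RHTS: the function name find_suspect_lines and the word list are assumptions made for the example, and the third sample line is a made-up clean message added for contrast. The two flagged lines are the ones quoted in this report.

```python
import re

# Words that, per the expected results, should not show up in a clean boot log.
# The list itself is an assumption for this sketch; extend it as needed.
SUSPECT_WORDS = ("warning", "fail", "error")
SUSPECT_RE = re.compile("|".join(SUSPECT_WORDS), re.IGNORECASE)


def find_suspect_lines(log_text):
    """Return the log lines that contain any suspect word (case-insensitive)."""
    return [line for line in log_text.splitlines() if SUSPECT_RE.search(line)]


if __name__ == "__main__":
    sample = "\n".join([
        # The two messages quoted in this report.
        "mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!",
        "mptscsih: ioc0: Issue of TaskMgmt failed!",
        # Hypothetical clean line, included only for contrast.
        "SCSI subsystem initialized",
    ])
    for line in find_suspect_lines(sample):
        print(line)
```

Substring matching is deliberately loose here, so 'fail' also catches 'failed' and 'failure'. Saved dmesg output could be passed to the function as a single string.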

Additional info:
Comment 1 Doug Chapman 2007-08-02 13:20:41 EDT
From what I can tell, this error message is due to bad hardware.  It is on the
scsi channel where there are no drives, so in this case it isn't causing an issue
(if there were drives on that channel, they would be unreachable).  I will try to
fix the hardware.
Comment 2 Jeff Burke 2007-08-02 13:38:43 EDT
Doug,
 I think you may have missed the point of this BZ. We are trying to clean up
errors during boot.  Under "Normal" circumstances a system should _not_ report
"ERROR" or "FAILED!" if there is not a real issue.

 Having these types of messages during a normal boot is only going to cause
unneeded support calls. Or worse, people will start to ignore them and may miss
a real issue.

 Please create a patch that has valid messages to reflect what is actually
happening.

mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
mptscsih: ioc0: Issue of TaskMgmt failed!
Comment 3 Doug Chapman 2007-08-02 13:47:48 EDT
(In reply to comment #2)
> Doug,
>  I think you may have missed the point of this BZ. We are trying to clean up
> errors during boot.  Under "Normal" circumstances a system should _not_ report
> "ERROR" or "FAILED!" if there is not a real issue.
> 
>  Having these types of messages during a normal boot is only going to cause
> unneeded support calls. Or worse, people will start to ignore them and may miss
> a real issue.
> 

Jeff,

It most certainly _is_ a real issue.  It is broken hardware.  It isn't causing
the system problems in this case just because there are no scsi drives on the
bad scsi channel.

This system (a zx6000) is technically not even supported anymore.  If you ask me
the right answer is to remove it from RHTS and retire the system.

Comment 4 Jeff Burke 2007-08-02 14:06:00 EDT
Doug,
    I did not say that it was not a real issue. I said "_if_ there is not a
real issue."

    I may have misunderstood your comment #1, "From what I can tell this
error message is due to bad hardware." You did not say definitively that it was a
hardware issue. You said "From what I can tell". So I said "_if_ there is not a
real issue."

    Regardless, a system is either supported or not. Can you please work with Ron
P and Matt B to determine if that hardware should remain in RHTS?

Jeff
Comment 5 Doug Chapman 2007-08-02 14:33:58 EDT
Matt, Ron,

Any concern about removing this from RHTS?  My reasons for pulling it are:

1) the hardware errors as described above
2) system is no longer supported by HP
3) no longer supported by Red Hat past RHEL3
4) we still have an rx2600 in rhts which is identical other than model name

- Doug
Comment 6 Matt Brodeur 2007-08-02 14:54:49 EDT
(In reply to comment #5)
> Matt, Ron,
> 
> Any concern about removing this from RHTS?  My reasons for pulling it are:
> 
> 1) the hardware errors as described above
> 2) system is no longer supported by HP
> 3) no longer supported by Red Hat past RHEL3
> 4) we still have an rx2600 in rhts which is identical other than model name

Is the other rx2600 supported by HP or RH for anything current?  I'll note that
both machines seem to work Just Fine with RHEL5.

If we pull it from RHTS I might recommend moving to the errata pool for as long
as it continues working.  They're pretty desperate for ia64.

   - Matt
Comment 7 Doug Chapman 2007-08-02 15:01:25 EDT
Matt,

Really, any of HP's ia64 servers _should_ work just fine with RHEL5; they just
were never certified.  The rx2600 is officially supported on RHEL4, but HP has no
plans to certify it for RHEL5 since it was replaced by the rx2620 a few years
ago.

As far as the errata pool, I agree that is a good idea if they are in need of
hardware and can make use of it.  As I mentioned before, since the bad scsi
channel does not contain any disks, it should not cause any real issues. However,
this means you will not be able to connect any external storage, since that is
the bad channel.

Comment 8 Matt Brodeur 2007-08-03 18:13:52 EDT
(In reply to comment #7)
> 
> As far as the errata pool I agree that is a good idea if they are in need of
> hardware and can make use of it.

frosty has been pulled from RHTS and moved over to the errata team.


Comment 9 Jarod Wilson 2007-09-01 22:40:54 EDT
Was it ever determined if this was indeed a case of faulty hardware, or if we
need to actually fix something?
Comment 10 Doug Chapman 2007-09-02 17:58:32 EDT
I have no reason to believe it was anything other than bad hardware.  We are not
seeing it on another system are we?

Comment 11 Jarod Wilson 2007-09-10 11:05:49 EDT
(In reply to comment #10)
> I have no reason to believe it was anything other than bad hardware.  We are not
> seeing it on another system are we?

Not that I'm aware of, no. I'm just going through, trying to resolve as many
rhel5 kernel bugs as possible. On the belief that it was indeed bad hardware,
closing NOTABUG.
