Bug 250630 - [rhts] ia64 mpt boot failure messages
Summary: [rhts] ia64 mpt boot failure messages
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Red Hat Kernel Manager
QA Contact: Martin Jenner
URL: http://rhts.lab.boston.redhat.com/cgi...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-08-02 15:52 UTC by Don Zickus
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-09-10 15:05:49 UTC
Target Upstream Version:
Embargoed:



Description Don Zickus 2007-08-02 15:52:19 UTC
Description of problem:

In an effort to clean up boot messages, the following message was noticed on
hp-frosty:

mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
mptscsih: ioc0: Issue of TaskMgmt failed!

Please see the URL above for full machine info and the error message from dmesg.

Version-Release number of selected component (if applicable):
kernel-2.6.18-37.el5

How reproducible:
always

Steps to Reproduce:
1. Boot the latest 5.1 distro with the kernel listed above.
  
Actual results:


Expected results:
The boot messages should not contain words like 'warning', 'fail', or 'error'.
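
A minimal sketch of how such a check could be automated, assuming the boot log
is available as plain text (for example, saved dmesg output). The keyword list
is taken from the wording above; the function name find_suspect_lines is
hypothetical and is not part of RHTS or any existing tool.

```python
import re

# Keywords from the "Expected results" wording; matching is case-insensitive
# and by substring, so "failed" and "ERROR" are caught as well.
SUSPECT = re.compile(r"warning|fail|error", re.IGNORECASE)


def find_suspect_lines(log_text):
    """Return the lines of a boot log that contain a suspect keyword."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


# The two messages reported in this bug, used as sample input.
sample = """\
mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
mptscsih: ioc0: Issue of TaskMgmt failed!
"""

for line in find_suspect_lines(sample):
    print(line)
```

Substring matching is deliberately broad and will also flag harmless lines that
merely mention one of the words, so the output is a list to review rather than
a verdict.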

Additional info:

Comment 1 Doug Chapman 2007-08-02 17:20:41 UTC
From what I can tell this error message is due to bad hardware.  It is on the
scsi channel where there are no drives so in this case it isn't causing an issue
(if there were drives on that channel they would be unreachable).  I will try to
fix the hardware.


Comment 2 Jeff Burke 2007-08-02 17:38:43 UTC
Doug,
 I think you may have missed the point on this BZ. We are trying to clean up
errors during boot.  Under "Normal" circumstances a system should _not_ report
"ERROR" or "FAILED!" if there is not a real issue.

 Having those types of messages during a normal boot is only going to cause
unneeded support calls. Or worse, people will start to ignore them and may miss
a real issue.

 Please create a patch that has valid messages to reflect what is actually
happening.

mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
mptscsih: ioc0: Issue of TaskMgmt failed!

Comment 3 Doug Chapman 2007-08-02 17:47:48 UTC
(In reply to comment #2)
> Doug,
>  I think you may have missed the point on this BZ. We are trying to clean up
> errors during boot.  Under "Normal" circumstances a system should _not_ report
> "ERROR" or "FAILED!" if there is not a real issue.
>
>  Having those types of messages during a normal boot is only going to cause
> unneeded support calls. Or worse, people will start to ignore them and may miss
> a real issue.
> 

Jeff,

It most certainly _is_ a real issue.  It is broken hardware.  It isn't causing
the system problems in this case just because there are no scsi drives on the
bad scsi channel.

This system (a zx6000) is technically not even supported anymore.  If you ask me,
the right answer is to remove it from RHTS and retire the system.



Comment 4 Jeff Burke 2007-08-02 18:06:00 UTC
Doug,
    I did not say that it was not a real issue. I said "_if_ there is not a
real issue."

    I may have misunderstood your comment #1: "From what I can tell this
error message is due to bad hardware." You did not say definitively that it was a
hardware issue. You said "From what I can tell". So I said "_if_ there is not a
real issue."

    Regardless, a system is either supported or not. Can you please work with Ron
P. and Matt B. to determine whether that hardware should remain in RHTS?

Jeff


Comment 5 Doug Chapman 2007-08-02 18:33:58 UTC
Matt, Ron,

Any concern about removing this from RHTS?  My reasons for pulling it are:

1) the hardware errors as described above
2) system is no longer supported by HP
3) no longer supported by Red Hat past RHEL3
4) we still have an rx2600 in rhts which is identical other than model name

- Doug


Comment 6 Matt Brodeur 2007-08-02 18:54:49 UTC
(In reply to comment #5)
> Matt, Ron,
> 
> Any concern about removing this from RHTS?  My reasons for pulling it are:
> 
> 1) the hardware errors as described above
> 2) system is no longer supported by HP
> 3) no longer supported by Red Hat past RHEL3
> 4) we still have an rx2600 in rhts which is identical other than model name

Is the other rx2600 supported by HP or RH for anything current?  I'll note that
both machines seem to work Just Fine with RHEL5.

If we pull it from RHTS I might recommend moving it to the errata pool for as
long as it continues working.  They're pretty desperate for ia64.

   - Matt


Comment 7 Doug Chapman 2007-08-02 19:01:25 UTC
Matt,

Really, any of HP's ia64 servers _should_ work just fine with RHEL5; they just
were never certified.  The rx2600 is officially supported on RHEL4, but HP has no
plans to certify it for RHEL5 since it was replaced by the rx2620 a few years
ago.

As far as the errata pool, I agree that is a good idea if they are in need of
hardware and can make use of it.  As I mentioned before, since the bad scsi
channel does not contain any disks, it should not cause any real issues. However,
this means you will not be able to connect any external storage, since that is
the bad channel.



Comment 8 Matt Brodeur 2007-08-03 22:13:52 UTC
(In reply to comment #7)
> 
> As far as the errata pool I agree that is a good idea if they are in need of
> hardware and can make use of it.

frosty has been pulled from RHTS and moved over to the errata team.




Comment 9 Jarod Wilson 2007-09-02 02:40:54 UTC
Was it ever determined if this was indeed a case of faulty hardware, or if we
need to actually fix something?

Comment 10 Doug Chapman 2007-09-02 21:58:32 UTC
I have no reason to believe it was anything other than bad hardware.  We are not
seeing it on another system, are we?



Comment 11 Jarod Wilson 2007-09-10 15:05:49 UTC
(In reply to comment #10)
> I have no reason to believe it was anything other than bad hardware.  We are not
> seeing it on another system are we?

Not that I'm aware of, no. I'm just going through, trying to resolve as many
rhel5 kernel bugs as possible. On the belief that it was indeed bad hardware,
closing NOTABUG.

