Bug 246130 - [rhts] ia64 acpi complains of errors
Summary: [rhts] ia64 acpi complains of errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Prarit Bhargava
QA Contact: Martin Jenner
URL: http://rhts.lab.boston.redhat.com/cgi...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-06-28 18:23 UTC by Don Zickus
Modified: 2008-05-21 14:45 UTC (History)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 14:45:12 UTC
Target Upstream Version:
Embargoed:


Attachments
patch against kernel-2.6.18-36.el5.ia64 (869 bytes, patch)
2007-07-24 14:35 UTC, George Beshers
cleanup ACPI MADT no IOSAPIC warning (1.25 KB, patch)
2007-12-19 23:31 UTC, George Beshers


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0314 0 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5.2 2008-05-20 18:43:34 UTC

Description Don Zickus 2007-06-28 18:23:08 UTC
Description of problem:
In my quest to rid the boot up of warnings, failures, and errors, I have run
into an acpi issue on ia64

ACPI: Error parsing MADT - no IOSAPIC entries

Version-Release number of selected component (if applicable):
kernel-2.6.18-32.el5

How reproducible:
boot the kernel on altix3

See the url above to the full dmesg log

I would like to either fix this error message or tone it down to something
meaningful to prevent customer support issues from cropping up.

Comment 1 Prarit Bhargava 2007-06-28 18:51:31 UTC
IIRC, this is an error that can be resolved by incorporating John Keller's
upstream ACPI work.

I've pinged jpk to see if he knows what the proper solution is...

P.

Comment 2 Prarit Bhargava 2007-07-02 13:56:24 UTC
Just emailed back-and-forth with jpk.  There is no fix for this particular issue.

P.

Comment 3 Don Zickus 2007-07-17 19:50:11 UTC
Is there a particular reason why this can not be fixed?  Having "Error" pop up
during boot up might be cause for customer support calls and as such Jeff and I
have been trying to rid all boot up messages that say 'Error, Fail, Warning'.

If this is not a 'true' error, can we tone down the language ie

ACPI: parsing MADT - no IOSAPIC entries detected (skipping)

or something without the word Error if this is an acceptable path a production
machine can take?

I know this is a complete nitpick and only distros care about stuff like this,
but people have to be aware of what might happen when they start flinging these
key words around on those 'not in the know'


Comment 4 Prarit Bhargava 2007-07-17 23:31:17 UTC
dzickus, the problem then becomes diagnosing systems which actually do have this
as an error (think the xw4550 in my cube) and which won't boot without acpi=off.

P.

Comment 5 Prarit Bhargava 2007-07-17 23:33:15 UTC
>I know this is a complete nitpick and only distros care about stuff like this,
>but people have to be aware of what might happen when they start flinging these
>key words around on those 'not in the know'

That's why we have Knowledge Base articles & Release Notes :)

P.

Comment 6 George Beshers 2007-07-18 13:55:34 UTC
There is a PV (SGIism for BZ) open on this and I had tried to
get some action on it when Don first raised the question.  This
is the response I got:

> 
> 
> RedHat was asking about this.  If there is no planned upgrade
> to the PROM perhaps I should just submit a patch to cleanup
> the error message upstream?
> 

  AFAIK, there is no planned upgrade or fix that will
  get rid of this error message. We can't make use of
  the IOSAPIC entries in the MADT on SN.

  Cleanup patch is probably the correct thing to do...


Comment 7 Don Zickus 2007-07-18 14:03:50 UTC
In regards to comment #4,

Are the ones failing pre-production hardware, or post-production?  How are we
supposed to distinguish between a legitimate error and a false positive?

*sigh* I really don't want to have to convince Jeff to create blacklist entries
for these errors in his tests.  At the same time I don't want to brush over this
error when it is real.  

Would it be better to use acpi=off on these test machines to remove the error?


Comment 8 George Beshers 2007-07-18 14:59:05 UTC
The SGI machines are post-production, RedHat certified systems.

The only thing I can suggest is a test for the box being an sn
(hence SGI) system before the error message is printed.

Prarit is likely to have a clearer understanding of this code
than I do, but I will take a look this afternoon.


Comment 9 Prarit Bhargava 2007-07-18 15:01:22 UTC
> Would it be better to use acpi=off on these test machines to remove the error?

No, it would not.  This error is pointing out that one table is bad/doesn't
exist.  The other tables are okay to use.

P.


Comment 10 George Beshers 2007-07-24 14:35:33 UTC
Created attachment 159852
patch against kernel-2.6.18-36.el5.ia64

As you can see it is really straightforward.

I have not tested it on anything but altix systems.

Comment 11 Don Zickus 2007-07-24 14:43:55 UTC
Ok, this patch looks reasonable (aside from the grammar - have not --> have no).
 Would it be possible to push something like this upstream? Do you think Tony
would go for it?

Comment 12 George Beshers 2007-07-24 15:35:27 UTC
Mildly skeptical.

Mostly as I am guessing he will push back to avoid making a precedent
of not getting the ACPI information correct.

However, let me see if anyone in SGI thinks differently first.


Comment 13 Prarit Bhargava 2007-07-24 15:38:44 UTC
Maybe, maybe not George.  The argument we're making is that SGI doesn't support
MADT tables at all.  So I can't see why Tony would argue with this.  IMO the
code currently doesn't distinguish between "broken" MADT and "not supported" MADT
-- so that's a bug.

P.
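
The distinction raised in comment 13 can be sketched in user-space C. This is only an illustration, not the attached patch: classify_iosapic(), iosapic_message() and the enum are hypothetical names, the platform string is passed in by hand instead of coming from the kernel, and the quieter wording is the one suggested in comment 3.

```c
#include <string.h>

/* Hypothetical classification of the "no IOSAPIC entries" case. */
enum madt_iosapic_status {
	MADT_IOSAPIC_OK,          /* at least one IOSAPIC entry was parsed */
	MADT_IOSAPIC_UNSUPPORTED, /* platform is known not to provide entries */
	MADT_IOSAPIC_BROKEN       /* entries expected but missing: real error */
};

/*
 * platform:      platform name, e.g. "sn2" for SGI Altix (assumed input)
 * iosapic_count: number of IOSAPIC entries found while parsing the MADT
 */
static enum madt_iosapic_status
classify_iosapic(const char *platform, int iosapic_count)
{
	if (iosapic_count > 0)
		return MADT_IOSAPIC_OK;
	if (strcmp(platform, "sn2") == 0)
		return MADT_IOSAPIC_UNSUPPORTED;
	return MADT_IOSAPIC_BROKEN;
}

/* Map each case to the text that would be logged (empty when all is well). */
static const char *iosapic_message(enum madt_iosapic_status status)
{
	switch (status) {
	case MADT_IOSAPIC_OK:
		return "";
	case MADT_IOSAPIC_UNSUPPORTED:
		return "ACPI: parsing MADT - no IOSAPIC entries detected (skipping)";
	default:
		return "ACPI: Error parsing MADT - no IOSAPIC entries";
	}
}
```

With this split, a machine that really should have IOSAPIC entries still logs the original "Error" line, so the diagnostic value mentioned in comment 4 is kept.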

Comment 14 George Beshers 2007-12-19 23:31:24 UTC
Created attachment 290080
cleanup ACPI MADT no IOSAPIC warning

Comment 15 Don Zickus 2008-01-03 19:42:03 UTC
posted to rhkl on 12/19

Comment 18 Bryan Stillwell 2008-01-10 22:45:04 UTC
I've seen this message show up on an HP rx3600 when booting a domU in rhel5.1.

Comment 19 Don Zickus 2008-01-11 20:02:01 UTC
in 2.6.18-66.el5dz_test
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 20 Bryan Stillwell 2008-01-11 20:23:58 UTC
Don,

The 2.6.18-66.el5 kernel doesn't fix this problem for me on the rx3600/domU. 
This is probably because the domU isn't an SGI sn2 machine like the patch checks
for with ia64_platform_is("sn2").

Bryan

Comment 21 Don Zickus 2008-01-14 01:26:29 UTC
Bryan, this patch was specific to SGI sn2 machines because they don't have
ACPI tables (or they are not set up correctly).  The point of the patch was to eliminate
false positives.  Now if HP is seeing these errors how do I know they are not
real hardware issues?

Yeah, I know the bugzilla description probably wasn't specific enough to mention
I was really aiming at SGI boxes.  I think Prarit pointed out that HP uses ACPI
correctly, such that if a box raised this error then it needs immediate
hardware attention.  Prarit?  George?

-Don


Comment 22 George Beshers 2008-01-14 13:40:46 UTC
On Altix the ACPI tables were extended to handle NUMA
and some SGI specific stuff in ways that did not become
standard.   I have been told that due to resource
limitations there is really no chance that these
are getting fixed.

CC'ing Doug Chapman to speak to HP.

Comment 23 Bryan Stillwell 2008-01-14 17:09:47 UTC
Don,

I don't believe I've ever seen this problem show up on the actual hardware, just
within a Xen domU.  Does Xen even handle ACPI on ia64?

Bryan

Comment 24 Don Zickus 2008-01-14 18:18:36 UTC
Doug, maybe you can help me on this as you are more involved with xen on ia64
than I am. 

-Don


Comment 25 Doug Chapman 2008-01-14 19:36:27 UTC
We do need ACPI for xen; you can't boot ia64 at all without ACPI.  However, it
does appear that the ACPI tables in a pv ia64 xen guest do not create IOSAPIC
entries (just like how SGI doesn't create them).  So, perhaps the right way to
go is to modify George's patch:

+		if (ia64_platform_is("sn2"))

becomes

+		if (ia64_platform_is("sn2") || ia64_platform_is("xen"))
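
As the list of exempt platforms grows, the same check could be written against a small table. The following is a user-space sketch under stated assumptions: platform_is() is a stand-in for the kernel's ia64_platform_is(), and current_platform and iosapic_entries_optional() are hypothetical names that do not appear in either patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the platform name the kernel knows at boot. */
static const char *current_platform = "xen";

/* User-space stand-in for ia64_platform_is(): exact name comparison. */
static bool platform_is(const char *name)
{
	return strcmp(current_platform, name) == 0;
}

/*
 * True when the running platform is known to ship a MADT without
 * IOSAPIC entries on purpose ("sn2" and "xen", per this thread).
 */
static bool iosapic_entries_optional(void)
{
	static const char *const exempt[] = { "sn2", "xen" };
	size_t i;

	for (i = 0; i < sizeof(exempt) / sizeof(exempt[0]); i++)
		if (platform_is(exempt[i]))
			return true;
	return false;
}
```

For two platforms the plain || form above is arguably clearer; a table only pays off if more exemptions are added later.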

Comment 26 Don Zickus 2008-01-14 19:49:31 UTC
Unfortunately, I have already taken George's patch and I would push to see a new
bz for the xen piece.  I have a question though, in the SGI case they knew they
didn't create IOSAPIC entries and had no plans to address it either.  In the xen
case, is this an oversight, a xen bug, or is this intentional by xen-source?

It would be nice to have an explanation for the behavior before committing a
similar patch to SGIs.

-Don


Comment 27 Doug Chapman 2008-01-14 20:13:04 UTC
Don,

I chatted with Alex Williamson back at HP on this (who knows much more about
ia64-xen) and he says that not having IOSAPIC entries in xen was intentional and
he agrees with doing something similar to my suggestion in comment #25.

Since George's patch is already in the kernel I am happy to submit an
incremental patch.  Can I submit a patch against this BZ or would that cause
confusion?  If so I am happy to file another BZ to submit the patch against.



Comment 28 Don Zickus 2008-01-14 20:36:18 UTC
Thanks, Doug.  Post an incremental patch and use this bz. I'll set it back to
POST.  Make sure to explain in the description that xen did this intentionally.

Thanks.
-Don


Comment 29 George Beshers 2008-01-14 21:18:41 UTC
Consider this verified as of -67 on Altix platforms :).

George


Comment 30 Don Zickus 2008-01-21 17:27:13 UTC
in 2.6.18-71.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 32 Bryan Stillwell 2008-01-23 20:21:25 UTC
Don,

The -71.el5 kernel appears to fix the error message on ia64/xen domUs too.

Thanks,
Bryan

Comment 34 errata-xmlrpc 2008-05-21 14:45:12 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


