246130 – [rhts] ia64 acpi complains of errors

Summary: [rhts] ia64 acpi complains of errors

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.0
Hardware:	ia64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Prarit Bhargava
QA Contact:	Martin Jenner
Docs Contact:
URL:	http://rhts.lab.boston.redhat.com/cgi...
Whiteboard:
Depends On:
Blocks:

Reported:	2007-06-28 18:23 UTC by Don Zickus
Modified:	2008-05-21 14:45 UTC
CC List:	5 users
Fixed In Version:	RHBA-2008-0314
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-05-21 14:45:12 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
patch against kernel-2.6.18-36.el5.ia64 (869 bytes, patch) 2007-07-24 14:35 UTC, George Beshers	no flags
cleanup ACPI MADT no IOSAPIC warning (1.25 KB, patch) 2007-12-19 23:31 UTC, George Beshers	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0314	0	normal	SHIPPED_LIVE	Updated kernel packages for Red Hat Enterprise Linux 5.2	2008-05-20 18:43:34 UTC

Description Don Zickus 2007-06-28 18:23:08 UTC

Description of problem:
In my quest to rid the boot process of warnings, failures, and errors, I have run
into an ACPI issue on ia64:

ACPI: Error parsing MADT - no IOSAPIC entries

Version-Release number of selected component (if applicable):
kernel-2.6.18-32.el5

How reproducible:
boot the kernel on altix3

See the URL above for the full dmesg log.

I would like to either fix this error message or tone it down to something
meaningful to prevent customer support issues from cropping up.

Comment 1 Prarit Bhargava 2007-06-28 18:51:31 UTC

IIRC, this is an error that can be resolved by incorporating John Keller's
upstream ACPI work.

I've pinged jpk to see if he knows what the proper solution is...

P.

Comment 2 Prarit Bhargava 2007-07-02 13:56:24 UTC

Just emailed back-and-forth with jpk.  There is no fix for this particular issue.

P.

Comment 3 Don Zickus 2007-07-17 19:50:11 UTC

Is there a particular reason why this cannot be fixed?  Having "Error" pop up
during boot might prompt customer support calls, and as such Jeff and I
have been trying to rid the boot of all messages that say 'Error, Fail, Warning'.

If this is not a 'true' error, can we tone down the language, i.e.:

ACPI: parsing MADT - no IOSAPIC entries detected (skipping)

or something without the word Error, if this is an acceptable path for a production
machine to take?

I know this is a complete nitpick and only distros care about stuff like this,
but people have to be aware of what might happen when they start flinging these
keywords around in front of those 'not in the know'.

Comment 4 Prarit Bhargava 2007-07-17 23:31:17 UTC

dzickus, the problem then becomes diagnosing systems which actually do have this
as an error (think the xw4550 in my cube) and which won't boot without acpi=off.

P.

Comment 5 Prarit Bhargava 2007-07-17 23:33:15 UTC

>I know this is a complete nitpick and only distros care about stuff like this,
>but people have to be aware of what might happen when they start flinging these
>key words around on those 'not in the know'

That's why we have Knowledge Base articles & Release Notes :)

P.

Comment 6 George Beshers 2007-07-18 13:55:34 UTC

There is a PV (SGIism for BZ) open on this and I had tried to
get some action on it when Don first raised the question.  This
is the response I got:

> 
> 
> RedHat was asking about this.  If there is no planned upgrade
> to the PROM perhaps I should just submit a patch to cleanup
> the error message upstream?
> 

  AFAIK, there is no planned upgrade or fix that will
  get rid of this error message. We can't make use of the
  IOSAPIC entries in the MADT on SN.

  Cleanup patch is probably the correct thing to do...

Comment 7 Don Zickus 2007-07-18 14:03:50 UTC

In regards to comment #4,

Are the failing machines pre-production or post-production hardware?  How are we
supposed to distinguish between a legitimate error and a false positive?

*sigh* I really don't want to have to convince Jeff to create blacklist entries
for these errors in his tests.  At the same time I don't want to brush over this
error when it is real.  

Would it be better to use acpi=off on these test machines to remove the error?

Comment 8 George Beshers 2007-07-18 14:59:05 UTC

The SGI machines are post-production, RedHat certified systems.

The only thing I can suggest is a test for the box being an sn
(hence SGI) system before the error message is printed.

Prarit is likely to have a clearer understanding of this code
than I do, but I will take a look this afternoon.

Comment 9 Prarit Bhargava 2007-07-18 15:01:22 UTC

> Would it be better to use acpi=off on these test machines to remove the error?

No, it would not.  This error is pointing out that one table is bad/doesn't
exist.  The other tables are okay to use.

P.

Comment 10 George Beshers 2007-07-24 14:35:33 UTC

Created attachment 159852 [details]
patch against kernel-2.6.18-36.el5.ia64

As you can see, it is really straightforward.

I have not tested it on anything but altix systems.

Comment 11 Don Zickus 2007-07-24 14:43:55 UTC

Ok, this patch looks reasonable (aside from the grammar - have not --> have no).
 Would it be possible to push something like this upstream? Do you think Tony
would go for it?

Comment 12 George Beshers 2007-07-24 15:35:27 UTC

Mildly skeptical.

Mostly because I am guessing he will push back to avoid setting a precedent
of not getting the ACPI information correct.

However, let me see if anyone in SGI thinks differently first.

Comment 13 Prarit Bhargava 2007-07-24 15:38:44 UTC

Maybe, maybe not, George.  The argument we're making is that SGI doesn't support
MADT tables at all.  So I can't see why Tony would argue with this.  IMO the
code currently doesn't distinguish between a "broken" MADT and a "not supported"
MADT, so that's a bug.

P.

Comment 14 George Beshers 2007-12-19 23:31:24 UTC

Created attachment 290080 [details]
cleanup ACPI MADT no IOSAPIC warning

Comment 15 Don Zickus 2008-01-03 19:42:03 UTC

posted to rhkl on 12/19

Comment 18 Bryan Stillwell 2008-01-10 22:45:04 UTC

I've seen this message show up on an HP rx3600 when booting a domU in rhel5.1.

Comment 19 Don Zickus 2008-01-11 20:02:01 UTC

in 2.6.18-66.el5dz_test
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 20 Bryan Stillwell 2008-01-11 20:23:58 UTC

Don,

The 2.6.18-66.el5 kernel doesn't fix this problem for me on the rx3600/domU. 
This is probably because the domU isn't an SGI sn2 machine like the patch checks
for with ia64_platform_is("sn2").

Bryan

Comment 21 Don Zickus 2008-01-14 01:26:29 UTC

Bryan, this patch was specific to SGI sn2 machines because they don't have
ACPI tables (or don't have them set up correctly).  The point of the patch was to eliminate
false positives.  Now if HP is seeing these errors how do I know they are not
real hardware issues?

Yeah, I know the bugzilla description probably wasn't specific enough to mention
I was really aiming at SGI boxes.  I think Prarit pointed out that HP uses ACPI
correctly, such that if a box raised this error then it needs immediate
hardware attention.  Prarit?  George?

-Don

Comment 22 George Beshers 2008-01-14 13:40:46 UTC

On Altix the ACPI tables were extended to handle NUMA
and some SGI specific stuff in ways that did not become
standard.   I have been told that due to resource
limitations there is really no chance that these
are getting fixed.

CC'ing Doug Chapman to speak to HP.

Comment 23 Bryan Stillwell 2008-01-14 17:09:47 UTC

Don,

I don't believe I've ever seen this problem show up on the actual hardware, just
within a Xen domU.  Does Xen even handle ACPI on ia64?

Bryan

Comment 24 Don Zickus 2008-01-14 18:18:36 UTC

Doug, maybe you can help me on this, as you are more involved with xen on ia64
than I am.

-Don

Comment 25 Doug Chapman 2008-01-14 19:36:27 UTC

We do need ACPI for xen; you can't boot ia64 at all without ACPI.  However, it
does appear that the ACPI tables in a pv ia64 xen guest do not create IOSAPIC
entries (just like how SGI doesn't create them).  So, perhaps the right way to
go is to modify George's patch:

+		if (ia64_platform_is("sn2"))

becomes

+		if (ia64_platform_is("sn2") || ia64_platform_is("xen"))

Comment 26 Don Zickus 2008-01-14 19:49:31 UTC

Unfortunately, I have already taken George's patch and I would push to see a new
bz for the xen piece.  I have a question though, in the SGI case they knew they
didn't create IOSAPIC entries and had no plans to address it either.  In the xen
case, is this an oversight, a xen bug, or is this intentional by xen-source?

It would be nice to have an explanation for the behavior before committing a
similar patch to SGI's.

-Don

Comment 27 Doug Chapman 2008-01-14 20:13:04 UTC

Don,

I chatted with Alex Williamson back at HP on this (who knows much more about
ia64-xen) and he says that not having IOSAPIC entries in xen was intentional and
he agrees with doing something similar to my suggestion in comment #25.

Since George's patch is already in the kernel I am happy to submit an
incremental patch.  Can I submit a patch against this BZ or would that cause
confusion?  If so I am happy to file another BZ to submit the patch against.

Comment 28 Don Zickus 2008-01-14 20:36:18 UTC

Thanks, Doug.  Post an incremental patch and use this bz. I'll set it back to
POST.  Make sure to explain in the description that xen did this intentionally.

Thanks.
-Don

Comment 29 George Beshers 2008-01-14 21:18:41 UTC

Consider this verified as of -67 on Altix platforms :).

George

Comment 30 Don Zickus 2008-01-21 17:27:13 UTC

in 2.6.18-71.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 32 Bryan Stillwell 2008-01-23 20:21:25 UTC

Don,

The -71.el5 kernel appears to fix the error message on ia64/xen domUs too.

Thanks,
Bryan

Comment 34 errata-xmlrpc 2008-05-21 14:45:12 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
