Bug 246130 - [rhts] ia64 acpi complains of errors
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
5.0
ia64 Linux
medium Severity medium
Assigned To: Prarit Bhargava
Martin Jenner
http://rhts.lab.boston.redhat.com/cgi...
: Reopened
 
Reported: 2007-06-28 14:23 EDT by Don Zickus
Modified: 2008-05-21 10:45 EDT
5 users

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 10:45:12 EDT


Attachments
patch against kernel-2.6.18-36.el5.ia64 (869 bytes, patch)
2007-07-24 10:35 EDT, George Beshers
cleanup ACPI MADT no IOSAPIC warning (1.25 KB, patch)
2007-12-19 18:31 EST, George Beshers

Description Don Zickus 2007-06-28 14:23:08 EDT
Description of problem:
In my quest to rid the boot process of warnings, failures, and errors, I have run
into an ACPI issue on ia64:

ACPI: Error parsing MADT - no IOSAPIC entries

Version-Release number of selected component (if applicable):
kernel-2.6.18-32.el5

How reproducible:
Boot the kernel on altix3.

See the URL above for the full dmesg log.

I would like to either fix this error message or tone it down to something
meaningful to prevent customer support issues from cropping up.
Comment 1 Prarit Bhargava 2007-06-28 14:51:31 EDT
IIRC, this is an error that can be resolved by incorporating John Keller's
upstream ACPI work.

I've pinged jpk to see if he knows what the proper solution is...

P.
Comment 2 Prarit Bhargava 2007-07-02 09:56:24 EDT
Just emailed back-and-forth with jpk.  There is no fix for this particular issue.

P.
Comment 3 Don Zickus 2007-07-17 15:50:11 EDT
Is there a particular reason why this cannot be fixed?  Having "Error" pop up
during boot might be cause for customer support calls, and as such Jeff and I
have been trying to rid the boot of all messages that say 'Error', 'Fail', or 'Warning'.

If this is not a 'true' error, can we tone down the language, i.e.

ACPI: parsing MADT - no IOSAPIC entries detected (skipping)

or something without the word Error if this is an acceptable path a production
machine can take?

I know this is a complete nitpick and only distros care about stuff like this,
but people have to be aware of what might happen when they start flinging these
key words around on those 'not in the know'.
Comment 4 Prarit Bhargava 2007-07-17 19:31:17 EDT
dzickus, the problem then becomes diagnosing systems which actually do have this
as an error (think the xw4550 in my cube) and which won't boot without acpi=off.

P.
Comment 5 Prarit Bhargava 2007-07-17 19:33:15 EDT
>I know this is a complete nitpick and only distros care about stuff like this,
>but people have to be aware of what might happen when they start flinging these
>key words around on those 'not in the know'

That's why we have Knowledge Base articles & Release Notes :)

P.
Comment 6 George Beshers 2007-07-18 09:55:34 EDT
There is a PV (SGIism for BZ) open on this and I had tried to
get some action on it when Don first raised the question.  This
is the response I got:

> 
> 
> RedHat was asking about this.  If there is no planned upgrade
> to the PROM perhaps I should just submit a patch to cleanup
> the error message upstream?
> 

  AFAIK, there is no planned upgrade or fix that will
  get rid of this error message. We can't make use of the
  IOSAPIC entries in the MADT on SN.

  Cleanup patch is probably the correct thing to do...
Comment 7 Don Zickus 2007-07-18 10:03:50 EDT
In regards to comment #4,

Are the ones failing pre-production hardware or post-production?  How are we
supposed to distinguish between a legitimate error and a false positive?

*sigh* I really don't want to have to convince Jeff to create blacklist entries
for these errors in his tests.  At the same time I don't want to brush over this
error when it is real.  

Would it be better to use acpi=off on these test machines to remove the error?
Comment 8 George Beshers 2007-07-18 10:59:05 EDT
The SGI machines are post-production, Red Hat certified systems.

The only thing I can suggest is a test for the box being an sn
(hence SGI) system before the error message is printed.

Prarit is likely to have a clearer understanding of this code
than I do, but I will take a look this afternoon.
Comment 9 Prarit Bhargava 2007-07-18 11:01:22 EDT
> Would it be better to use acpi=off on these test machines to remove the error?

No, it would not.  This error is pointing out that one table is bad/doesn't
exist.  The other tables are okay to use.

P.
Comment 10 George Beshers 2007-07-24 10:35:33 EDT
Created attachment 159852
patch against kernel-2.6.18-36.el5.ia64

As you can see, it is really straightforward.

I have not tested it on anything but altix systems.
Comment 11 Don Zickus 2007-07-24 10:43:55 EDT
Ok, this patch looks reasonable (aside from the grammar - have not --> have no).
 Would it be possible to push something like this upstream? Do you think Tony
would go for it?
Comment 12 George Beshers 2007-07-24 11:35:27 EDT
Mildly skeptical.

Mostly because I am guessing he will push back to avoid setting a precedent
of not getting the ACPI information correct.

However, let me see if anyone in SGI thinks differently first.
Comment 13 Prarit Bhargava 2007-07-24 11:38:44 EDT
Maybe, maybe not, George.  The argument we're making is that SGI doesn't support
MADT tables at all.  So I can't see why Tony would argue with this.  IMO the
code currently doesn't distinguish between a "broken" MADT and a "not supported"
MADT, so that's a bug.

P.
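To make the distinction in comment #13 concrete, here is a minimal sketch in C, assuming the only inputs are the number of IOSAPIC entries parsed from the MADT and whether the platform is known to omit them. The enum, both functions, and the softer message wording (modelled loosely on the suggestion in comment #3) are hypothetical illustrations; they are not kernel identifiers and are not the patch attached to this bug.

```c
#include <stddef.h>

/* Hypothetical three-way result; not a kernel type. */
enum madt_iosapic_status {
    MADT_IOSAPIC_OK,            /* at least one IOSAPIC entry was parsed */
    MADT_IOSAPIC_NOT_SUPPORTED, /* platform never provides them (e.g. SGI sn2) */
    MADT_IOSAPIC_BROKEN         /* entries were expected, but none were found */
};

/* entries_parsed: number of IOSAPIC entries found in the MADT.
 * platform_omits_iosapic: nonzero if the platform is known not to
 * provide them (assumed to come from some platform check). */
static enum madt_iosapic_status
classify_iosapic(int entries_parsed, int platform_omits_iosapic)
{
    if (entries_parsed > 0)
        return MADT_IOSAPIC_OK;
    if (platform_omits_iosapic)
        return MADT_IOSAPIC_NOT_SUPPORTED;
    return MADT_IOSAPIC_BROKEN;
}

/* Message to log for each case; NULL means log nothing. */
static const char *
iosapic_message(enum madt_iosapic_status status)
{
    switch (status) {
    case MADT_IOSAPIC_OK:
        return NULL;
    case MADT_IOSAPIC_NOT_SUPPORTED:
        /* Hypothetical softer wording for the expected case. */
        return "ACPI: no IOSAPIC entries in MADT (not used on this platform)";
    case MADT_IOSAPIC_BROKEN:
    default:
        /* The original message, kept for genuinely broken tables. */
        return "ACPI: Error parsing MADT - no IOSAPIC entries";
    }
}
```

Keeping the "broken" case separate means a machine like the xw4550 from comment #4 would still report the real error, while platforms that never supply IOSAPIC entries stop producing a false positive.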
Comment 14 George Beshers 2007-12-19 18:31:24 EST
Created attachment 290080
cleanup ACPI MADT no IOSAPIC warning
Comment 15 Don Zickus 2008-01-03 14:42:03 EST
posted to rhkl on 12/19
Comment 18 Bryan Stillwell 2008-01-10 17:45:04 EST
I've seen this message show up on an HP rx3600 when booting a domU in rhel5.1.
Comment 19 Don Zickus 2008-01-11 15:02:01 EST
in 2.6.18-66.el5dz_test
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 20 Bryan Stillwell 2008-01-11 15:23:58 EST
Don,

The 2.6.18-66.el5 kernel doesn't fix this problem for me on the rx3600/domU. 
This is probably because the domU isn't an SGI sn2 machine like the patch checks
for with ia64_platform_is("sn2").

Bryan
Comment 21 Don Zickus 2008-01-13 20:26:29 EST
Bryan, this patch was specific to SGI sn2 machines because they don't have
ACPI tables (or don't set them up correctly).  The point of the patch was to eliminate
false positives.  Now if HP is seeing these errors how do I know they are not
real hardware issues?

Yeah, I know the bugzilla description probably wasn't specific enough to mention
I was really aiming at SGI boxes.  I think Prarit pointed out that HP uses ACPI
correctly, such that if a box raised this error then it needs immediate
hardware attention.  Prarit?  George?

-Don
Comment 22 George Beshers 2008-01-14 08:40:46 EST
On Altix the ACPI tables were extended to handle NUMA
and some SGI specific stuff in ways that did not become
standard.   I have been told that due to resource
limitations there is really no chance that these
are getting fixed.

CC'ing Doug Chapman to speak to HP.
Comment 23 Bryan Stillwell 2008-01-14 12:09:47 EST
Don,

I don't believe I've ever seen this problem show up on the actual hardware, just
within a Xen domU.  Does Xen even handle ACPI on ia64?

Bryan
Comment 24 Don Zickus 2008-01-14 13:18:36 EST
Doug, maybe you can help me on this as you are more involved with xen on ia64
than I am. 

-Don
Comment 25 Doug Chapman 2008-01-14 14:36:27 EST
We do need ACPI for xen; you can't boot ia64 at all without ACPI.  However, it
does appear that the ACPI tables in a pv ia64 xen guest do not create IOSAPIC
entries (just like how SGI doesn't create them).  So, perhaps the right way to
go is to modify George's patch:

+		if (ia64_platform_is("sn2"))

becomes

+		if (ia64_platform_is("sn2") || ia64_platform_is("xen"))
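A minimal sketch of how the condition in comment #25 gates the message, written as standalone C so it compiles outside the kernel. `current_platform`, `platform_is()`, `iosapic_absence_expected()` and `report_iosapic_result()` are hypothetical stand-ins: in the kernel the check is done with ia64_platform_is(), and the exact structure of George's patch is not reproduced here. The default platform name "dig" is an assumption used only as a non-sn2, non-xen example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the running platform's name. */
static const char *current_platform = "dig";

/* Simplified stand-in for ia64_platform_is(): true when the running
 * platform's name matches the given string. */
static bool platform_is(const char *name)
{
    return strcmp(name, current_platform) == 0;
}

/* True when a MADT without IOSAPIC entries is expected: SGI sn2
 * (George's patch) and, per comment #25, the pv ia64 Xen guest. */
static bool iosapic_absence_expected(void)
{
    return platform_is("sn2") || platform_is("xen");
}

/* Hypothetical reporting hook.  Prints the original error only when no
 * IOSAPIC entries were parsed and that absence is not expected.
 * Returns 1 if the error was printed, 0 otherwise. */
static int report_iosapic_result(int entries_parsed)
{
    if (entries_parsed < 1 && !iosapic_absence_expected()) {
        printf("ACPI: Error parsing MADT - no IOSAPIC entries\n");
        return 1;
    }
    return 0;
}
```

With `current_platform` set to "sn2" or "xen", `report_iosapic_result(0)` prints nothing and returns 0; with any other name it prints the original error and returns 1. Extending the check by one `||` term, as Doug proposes, leaves the error intact on platforms such as HP hardware where comment #21 suggests it indicates a real problem.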
Comment 26 Don Zickus 2008-01-14 14:49:31 EST
Unfortunately, I have already taken George's patch and I would push to see a new
bz for the xen piece.  I have a question though, in the SGI case they knew they
didn't create IOSAPIC entries and had no plans to address it either.  In the xen
case, is this an oversight, a xen bug, or is this intentional by xen-source?

It would be nice to have an explanation for the behavior before committing a
similar patch to SGI's.

-Don
Comment 27 Doug Chapman 2008-01-14 15:13:04 EST
Don,

I chatted with Alex Williamson back at HP on this (who knows much more about
ia64-xen) and he says that not having IOSAPIC entries in xen was intentional and
he agrees with doing something similar to my suggestion in comment #25.

Since George's patch is already in the kernel I am happy to submit an
incremental patch.  Can I submit a patch against this BZ or would that cause
confusion?  If so I am happy to file another BZ to submit the patch against.

Comment 28 Don Zickus 2008-01-14 15:36:18 EST
Thanks, Doug.  Post an incremental patch and use this bz. I'll set it back to
POST.  Make sure to explain in the description that xen did this intentionally.

Thanks.
-Don
Comment 29 George Beshers 2008-01-14 16:18:41 EST
Consider this verified as of -67 on Altix platforms :).

George
Comment 30 Don Zickus 2008-01-21 12:27:13 EST
in 2.6.18-71.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 32 Bryan Stillwell 2008-01-23 15:21:25 EST
Don,

The -71.el5 kernel appears to fix the error message on ia64/xen domUs too.

Thanks,
Bryan
Comment 34 errata-xmlrpc 2008-05-21 10:45:12 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
