Description of problem:
In my quest to rid the boot-up of warnings, failures, and errors, I have run into an ACPI issue on ia64:

ACPI: Error parsing MADT - no IOSAPIC entries

Version-Release number of selected component (if applicable):
kernel-2.6.18-32.el5

How reproducible:
Boot the kernel on altix3. See the URL above for the full dmesg log.

I would like to either fix this error message or tone it down to something meaningful to prevent customer support issues from cropping up.
IIRC, this is an error that can be resolved by incorporating John Keller's upstream ACPI work. I've pinged jpk to see if he knows what the proper solution is... P.
Just emailed back-and-forth with jpk. There is no fix for this particular issue. P.
Is there a particular reason why this cannot be fixed? Having "Error" pop up during boot-up might be cause for customer support calls, and as such Jeff and I have been trying to get rid of all boot-up messages that say 'Error, Fail, Warning'. If this is not a 'true' error, can we tone down the language, i.e.

ACPI: parsing MADT - no IOSAPIC entries detected (skipping)

or something without the word Error, if this is an acceptable path a production machine can take? I know this is a complete nitpick and only distros care about stuff like this, but people have to be aware of what might happen when they start flinging these key words around on those 'not in the know'
dzickus, the problem then becomes diagnosing systems which actually do have this as an error (think the xw4550 in my cube) and which won't boot without acpi=off. P.
> I know this is a complete nitpick and only distros care about stuff like this,
> but people have to be aware of what might happen when they start flinging these
> key words around on those 'not in the know'

That's why we have Knowledge Base articles & Release Notes :)

P.
There is a PV (SGIism for BZ) open on this and I had tried to get some action on it when Don first raised the question. This is the response I got:

> RedHat was asking about this. If there is no planned upgrade
> to the PROM perhaps I should just submit a patch to cleanup
> the error message upstream?

AFAIK, there is no planned upgrade or fix that will get rid of this error message. We can't make use of the IOSAPIC entries in the MADT on SN. Cleanup patch is probably the correct thing to do...
In regards to comment #4, are the failing ones pre-production hardware, or post-production? How are we supposed to distinguish b/w a legitimate error and a false positive? *sigh* I really don't want to have to convince Jeff to create blacklist entries for these errors in his tests. At the same time I don't want to brush over this error when it is real. Would it be better to use acpi=off on these test machines to remove the error?
The SGI machines are post-production, RedHat certified systems. The only thing I can suggest is a test for the box being an sn (hence SGI) system before the error message is printed. Prarit is likely to have a clearer understanding of this code than I do, but I will take a look this afternoon.
> Would it be better to use acpi=off on these test machines to remove the error?

No, it would not. This error is pointing out that one table is bad/doesn't exist. The other tables are okay to use.

P.
Created attachment 159852 [details]
patch against kernel-2.6.18-36.el5.ia64

As you can see it is really straightforward. I have not tested it on anything but Altix systems.
Ok, this patch looks reasonable (aside from the grammar - have not --> have no). Would it be possible to push something like this upstream? Do you think Tony would go for it?
Mildly skeptical. Mostly as I am guessing he will push back to avoid making a precedent of not getting the ACPI information correct. However, let me see if anyone in SGI thinks differently first.
Maybe, maybe not, George. The argument we're making is that SGI doesn't support MADT tables at all. So I can't see why Tony would argue with this. IMO the code currently doesn't distinguish between "broken MADT" and "not supported MADT" -- so that's a bug. P.
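To make the "broken" versus "not supported" distinction concrete, here is a minimal userspace sketch. It is not the attached patch and not kernel code: the enum, both function names, and the wording of the non-error message are hypothetical, chosen only to illustrate the idea that a platform known to omit IOSAPIC entries should not log the word "Error".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome categories; not taken from the kernel source. */
enum madt_status {
    MADT_OK,           /* at least one IOSAPIC entry was found */
    MADT_UNSUPPORTED,  /* platform never provides IOSAPIC entries */
    MADT_BROKEN        /* entries were expected but none were found */
};

/*
 * Classify the result of a MADT scan.
 * iosapic_count: number of IOSAPIC entries found (assumed input).
 * platform_lacks_iosapic: true on platforms known to omit them.
 */
enum madt_status classify_madt(int iosapic_count, bool platform_lacks_iosapic)
{
    if (iosapic_count > 0)
        return MADT_OK;
    return platform_lacks_iosapic ? MADT_UNSUPPORTED : MADT_BROKEN;
}

/* Return the message to log, or NULL when nothing needs to be logged. */
const char *madt_message(enum madt_status status)
{
    switch (status) {
    case MADT_BROKEN:
        /* The message quoted in this bug report. */
        return "ACPI: Error parsing MADT - no IOSAPIC entries";
    case MADT_UNSUPPORTED:
        /* Hypothetical wording, for illustration only. */
        return "ACPI: no IOSAPIC entries in MADT (not used on this platform)";
    default:
        return NULL;
    }
}
```

With this split, a machine that really should have IOSAPIC entries still gets the original error, while a platform that never provides them gets an informational line.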
Created attachment 290080 [details] cleanup ACPI MADT no IOSAPIC warning
posted to rhkl on 12/19
I've seen this message show up on an HP rx3600 when booting a domU in rhel5.1.
in 2.6.18-66.el5dz_test You can download this test kernel from http://people.redhat.com/dzickus/el5
Don, The 2.6.18-66.el5 kernel doesn't fix this problem for me on the rx3600/domU. This is probably because the domU isn't an SGI sn2 machine like the patch checks for with ia64_platform_is("sn2"). Bryan
Bryan, this patch was specific to SGI sn2 machines because they don't have ACPI tables (or they are not set up correctly). The point of the patch was to eliminate false positives. Now if HP is seeing these errors, how do I know they are not real hardware issues? Yeah, I know the bugzilla description probably wasn't specific enough to mention I was really aiming at SGI boxes. I think Prarit pointed out that HP uses ACPI correctly, such that if a box raised this error then it needs immediate hardware attention. Prarit? George? -Don
On Altix the ACPI tables were extended to handle NUMA and some SGI specific stuff in ways that did not become standard. I have been told that due to resource limitations there is really no chance that these are getting fixed. CC'ing Doug Chapman to speak to HP.
Don, I don't believe I've ever seen this problem show up on the actual hardware, just within a Xen domU. Does Xen even handle ACPI on ia64? Bryan
Doug, maybe you can help me on this as you are more involved with xen on ia64 than I am. -Don
We do need ACPI for xen; you can't boot ia64 at all without ACPI. However, it does appear that the ACPI tables in a pv ia64 xen guest do not create IOSAPIC entries (just like how SGI doesn't create them). So, perhaps the right way to go is to modify George's patch:

    + if (ia64_platform_is("sn2"))

becomes

    + if (ia64_platform_is("sn2") || ia64_platform_is("xen"))
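As a rough illustration of the check being proposed, the following userspace sketch keeps the exempt platforms in one table so that another name can be added without growing the `if` condition. It is only an assumption-laden sketch: `platform_is()` is a stand-in for the kernel's `ia64_platform_is()` that takes the current platform name as an explicit argument, and `should_report_missing_iosapic()` is a hypothetical helper, not something from George's patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Platforms that, per this thread, provide no IOSAPIC entries in the MADT. */
static const char *const no_iosapic_platforms[] = { "sn2", "xen" };

/*
 * Userspace stand-in for the kernel's ia64_platform_is(); the current
 * platform name is passed in explicitly so the logic can be tested.
 */
static bool platform_is(const char *current, const char *name)
{
    return strcmp(current, name) == 0;
}

/* True when a MADT without IOSAPIC entries should be reported as an error. */
bool should_report_missing_iosapic(const char *current_platform)
{
    size_t n = sizeof(no_iosapic_platforms) / sizeof(no_iosapic_platforms[0]);

    for (size_t i = 0; i < n; i++) {
        if (platform_is(current_platform, no_iosapic_platforms[i]))
            return false;  /* expected on this platform, not an error */
    }
    return true;
}
```

Called with "sn2" or "xen" the helper returns false, so the error would be suppressed; any other platform name still reports it.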
Unfortunately, I have already taken George's patch and I would push to see a new bz for the xen piece. I have a question though: in the SGI case they knew they didn't create IOSAPIC entries and had no plans to address it either. In the xen case, is this an oversight, a xen bug, or is this intentional by xen-source? It would be nice to have an explanation for the behavior before committing a patch similar to SGI's. -Don
Don, I chatted with Alex Williamson back at HP on this (who knows much more about ia64-xen) and he says that not having IOSAPIC entries in xen was intentional and he agrees with doing something similar to my suggestion in comment #25. Since George's patch is already in the kernel I am happy to submit an incremental patch. Can I submit a patch against this BZ or would that cause confusion? If so I am happy to file another BZ to submit the patch against.
Thanks, Doug. Post an incremental patch and use this bz. I'll set it back to POST. Make sure to explain in the description that xen did this intentionally. Thanks. -Don
Consider this verified as of -67 on Altix platforms :). George
in 2.6.18-71.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
Don, The -71.el5 kernel appears to fix the error message on ia64/xen domUs too. Thanks, Bryan
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html