Bug 451593
Summary: | Multiple outstanding ptc.g instruction support | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Luming Yu <luyu> | ||||
Component: | kernel | Assignee: | Luming Yu <luyu> | ||||
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 5.3 | CC: | bjorn.helgaas, dchapman, dzickus, fenghua.yu, glen.foster, jane.lv, peterm | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | ia64 | ||||||
OS: | All | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2009-01-20 19:58:27 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 454962 | ||||||
Attachments: |
|
Description
Luming Yu
2008-06-16 02:31:33 UTC
Created attachment 309737 [details]
a back port
Do we know if this has passed muster in RH kernel? If so, can we get a devel_ack? This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. HP is also interested in this fix/patch since we have 256-way servers, too. Luming, The patch attached no longer applies to the recent RHEL5 kernels. Can you update it with something that will apply to -103 or -104 so we can test this? Also, if you have a kernel rpm already built can you please point me to it? thanks, - Doug To comment#7, The brew with the patch applies to -105 : http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1436575 Thanks, Luming It sounds like testing this patch only makes sens on Tukwilla systems. I don't have access to any tukwilla systems yet so I can't test it. in kernel-2.6.18-115.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 This bug has been marked for inclusion in the Red Hat Enterprise Linux 5.3 Release Notes. To aid in the development of relevant and accurate release notes, please fill out the "Release Notes" field above with the following 4 pieces of information: Cause: What actions or circumstances cause this bug to present. Consequence: What happens when the bug presents. Fix: What was done to fix the bug. Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore') I think there are three patches in RHEL5.3s4 related to this issue, and I don't understand this one: linux-2.6-ia64-set-default-max_purges-1-regardless-of-pal-return.patch This is not upstream. It doesn't follow the coding style (missing spaces in "if"). It adds a bunch of boot-time noise (an extra line for every processor). The extra output seems to be two run-on sentences and contains a typo ("outsanding" instead of "outstanding"): cpu_init: PAL max_purges is overridden to 1 PALO is required for multiple outsanding ptc.g The number of outstanding TLB purges is a system-wide parameter, not a per-processor parameter, so if we print anything, we should print it exactly once, not once per processor. When there's no PALO, I don't see why we should ignore the max_purges value from PAL_VM_SUMMARY. >linux-2.6-ia64-set-default-max_purges-1-regardless-of-pal-return.patch
This patch is to address the concern of regression that could be introduced by the "Multiple outstanding ptc.g" patch.
"Comparing with old kernel , the patch would not cause any behavior changes on old platforms as to ptc.g instruction support.
Because, by default, the patch uses field max_purges in vm_info_2 from PAL : PAL_VM_SUMMARY.(please check kernel code cpu_init and Figure 11-74. Layout of vm_info_2 Return value in IA64 Software Developer's Manual vol. 2). On any platforms without multiple outstanding ptc.g instruction support , PAL_VM_SUMMARY should return 1 for max_purges .
The only chance that the patch could introduce regression on old platforms is that PAL_VM_SUMMARY doesn't return 1. Frankly speaking, we couldn't call this case "regression". Because it is broken firmware problem that we can just workaround it "with nptcg=1" or ask for hardware support."
I understand you don't want to suddenly start doing multiple outstanding purges on platforms in the field where that hasn't been tested. I think you should propose that for the upstream kernel, too, so this doesn't remain a RHEL-only quirk. I don't think there's any point in even checking the max_purges returned from PAL_VM_SUMMARY. You might just as well do this: if (ia64_pal_vm_summary(NULL, &vmi) == 0) { max_ctx = (1U << (vmi.pal_vm_info_2_s.rid_size - 3)) - 1; /* * Multiple outstanding purges haven't been validated on * already-shipping platforms, so it's too risky to enable * them based on PAL. We will enable them if a PALO table * exists. */ setup_ptcg_sem(1, NPTCG_FROM_PAL); ... That's much easier to understand. If you really want to print something, I think it'd be reasonable to print it *once*. I'm strongly opposed to the current behavior, which adds a whole bunch of useless and confusing console output on large systems where nothing is wrong. When a PALO exists, I think we also should print the max_purges in it. I agree that upstream needs similar patch. the good point of the current patch is that we don't see any error messages if "vmi.pal_vm_info_2_s.max_purges == 1" that is what old platform should return from their PAL calls. > the good point of the current patch is that we don't see any error
> messages if "vmi.pal_vm_info_2_s.max_purges == 1" that is what old
> platform should return from their PAL calls.
This is not true. I see this error message on a shipping BL860c.
PAL_VM_SUMMARY returns max_purges == 0, which is legal per the spec.
There is nothing useful the user can do as a result of this message.
I think we need:
1) A message when we parse a PALO table, saying how many outstanding
purges we will support, and
2) A message if there's no PALO but PAL_VM_SUMMARY returns
max_purges > 1. In this case, the platform claims to support
multiple purges, but we avoid them because they haven't been tested,
and we should tell the user we turned off this functionality.
You are right,I seem to miss this point : 0 is legal, and it means max_purges=1. (IA64 SPEC says so.) Yes, we should avoid those message when PAL return 0 here. We probably also need to provide info to user in dmesg as you said above. But I'm afraid we can do anything right now for 5.3. Although there is defect in dmesg, linux-2.6-ia64-set-default-max_purges-1-regardless-of-pal-return.patch still is good at preventing any possible functional regression.. Probably, we can polish dmesg in 5.4... An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html *** Bug 474321 has been marked as a duplicate of this bug. *** |