Bug 503048
Summary: | LS21 do not boot RT enabled kernels (not APIC issue) - ibm-ls21-7972-01.rhts.bos.redhat.com | | |
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | David Sommerseth <davids> |
Component: | realtime-kernel | Assignee: | Clark Williams <williams> |
Status: | CLOSED ERRATA | QA Contact: | David Sommerseth <davids> |
Severity: | high | Docs Contact: | |
Priority: | low | | |
Version: | 1.1 | CC: | bhu, lgoncalv, ovasik, pzijlstr, sassmann |
Target Milestone: | 1.1.5 | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2009-07-14 19:12:08 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
Description
David Sommerseth
2009-05-28 14:44:02 UTC
Created attachment 345780
Attempt to fix free_bootmem for ls21
void __init setup_node_bootmem()
{
    ...
    printk(KERN_INFO "Bootmem setup node %d %016lx-%016lx\n", nodeid, start, end);
    ...
    memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));
    ...
}
And:
void __init
free_bootmem(unsigned long addr, unsigned long size)
{
    free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
}
The log contains the message:
Bootmem setup node 1 0000000000000000-000000007ffa5000
Given this code, free_bootmem() acts on NODE_DATA(0) even though the node that was actually set up is NODE_DATA(1), so this patch should fix the issue.
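A minimal user-space model of the failure, assuming (this is not stated in the report) that NODE_DATA(nid) reads a per-node pointer table that stays NULL for any node that was never set up, as x86_64's node_data[] does. Only node 1 is initialised, matching the log line, so the lookup hardcoded to node 0 finds nothing. The names model_setup_node_bootmem, model_free_bootmem_node0 and node_storage are hypothetical, and the structures are heavily simplified stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space model; NOT the kernel's real definitions. */
#define MAX_NUMNODES 2

struct pglist_data {
    unsigned long node_start; /* first address owned by the node */
    unsigned long node_end;   /* one past the last owned address */
};

/* Assumed to behave like x86_64's node_data[]: NULL = node never set up. */
static struct pglist_data *node_data[MAX_NUMNODES];
static struct pglist_data node_storage[MAX_NUMNODES]; /* hypothetical backing store */

#define NODE_DATA(nid) (node_data[(nid)])

/* Stand-in for setup_node_bootmem(): only the node reported in the log
 * ("Bootmem setup node 1 ...") gets a pg_data_t. */
static void model_setup_node_bootmem(int nodeid, unsigned long start,
                                     unsigned long end)
{
    assert(nodeid >= 0 && nodeid < MAX_NUMNODES);
    node_data[nodeid] = &node_storage[nodeid];
    node_data[nodeid]->node_start = start;
    node_data[nodeid]->node_end = end;
}

/* Stand-in for the original free_bootmem(), which hardcodes node 0.
 * Returns 0 if node 0 exists, or -1 at the point where the real kernel
 * would presumably dereference a NULL node_data[0]. */
static int model_free_bootmem_node0(unsigned long addr, unsigned long size)
{
    (void)addr; /* the address is never used to choose a node */
    (void)size;
    if (NODE_DATA(0) == NULL)
        return -1;
    return 0;
}
```

After model_setup_node_bootmem(1, 0x0UL, 0x7ffa5000UL), calling model_free_bootmem_node0() for any address returns -1: the block belongs to node 1, but the hardcoded lookup never looks there.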
Created attachment 345803
Fixed version of the last patch
There was a missing hunk in the earlier patch. I am compiling a kernel right now to test it.
Created attachment 345835
Force free_bootmem to use the current node
Force free_bootmem to use the current node
David Sommerseth noticed that an LS21 (7972) machine did not boot with the
MRG RT V1 kernel 2.6.24.7-117.el5rt. He was able to get a kernel backtrace:
Bootmem setup node 1 0000000000000000-000000007ffa5000
PANIC: early exception rip ffffffff814ba27d error 0 cr2 73c8
Pid: 0, comm: swapper Not tainted 2.6.24.7-117.el5rt #1
Call Trace:
[<ffffffff814ba27d>] ? free_bootmem+0x14/0x22
[<ffffffff814bc584>] ? sparse_init+0x11d/0x196
[<ffffffff814b6178>] ? paging_init+0x41/0x94
[<ffffffff814acdb9>] ? setup_arch+0x471/0x4e4
[<ffffffff814a68c0>] ? start_kernel+0x76/0x329
[<ffffffff814a6119>] ? _sinittext+0x119/0x120
RIP free_bootmem+0x14/0x22
According to setup_node_bootmem(), the message "Bootmem setup node 1 ..." tells
us that bootmem setup is running for node 1:
printk(KERN_INFO "Bootmem setup node %d %016lx-%016lx\n", nodeid, ...);
...
memset(NODE_DATA(nodeid), 0, sizeof(pg_data_t));
But free_bootmem() has node 0 hardcoded:
free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
This leads to the early panic observed by David.
This is a simple fix that forces free_bootmem() to act on the current node.
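The patch body itself is not reproduced in this report, so the following is only a hypothetical illustration of what "act on the current node" can mean: instead of hardcoding node 0, pick the set-up node whose address range contains the block being freed. The names model_node, model_setup_node and model_free_bootmem_node are invented for this sketch and are not kernel APIs; the attached patch may select the node differently.

```c
#include <assert.h>

/* Hypothetical user-space model; not the attached patch. */
#define MAX_NUMNODES 2

struct model_node {
    int online;          /* set once the node's bootmem has been set up */
    unsigned long start; /* first address of the node's range */
    unsigned long end;   /* one past the last address */
};

static struct model_node nodes[MAX_NUMNODES];

/* Stand-in for setup_node_bootmem(): record the node's address range. */
static void model_setup_node(int nid, unsigned long start, unsigned long end)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    nodes[nid].online = 1;
    nodes[nid].start = start;
    nodes[nid].end = end;
}

/* Return the set-up node whose range fully contains [addr, addr + size),
 * or -1 if no node owns it, instead of assuming node 0. */
static int model_free_bootmem_node(unsigned long addr, unsigned long size)
{
    for (int nid = 0; nid < MAX_NUMNODES; nid++) {
        if (!nodes[nid].online)
            continue; /* skip nodes that were never set up, e.g. node 0 here */
        if (addr >= nodes[nid].start && addr + size <= nodes[nid].end)
            return nid;
    }
    return -1;
}
```

With only node 1 set up over 0x0-0x7ffa5000, as in the log, freeing a block at 0x100000 resolves to node 1, while an address outside every node's range yields -1 rather than a lookup on a node that does not exist.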
---
Added to kernel -118 for testing.

In RHTS job 70101 (http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=70101), the problematic LS21 blade (ibm-ls21-7972-01.rhts.bos.redhat.com) booted and ran three different -122-based kernels. In RHTS job 70720 (http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=70720), the same box booted and tested two -126-based kernels.

The attached patch is present as mrg-rt.git commit 16314b176e5ba7d2a8769c4a0ed8b89e4bd6813d in the 2.6.24.7-126 SRPM.

An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1157.html