Bug 681796 - Pass "noefi acpi_rsdp=X" to the second kernel
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kexec-tools
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Cong Wang
QA Contact: Chao Ye
Depends On: 723670
Blocks: 743047
Reported: 2011-03-03 04:38 EST by Cong Wang
Modified: 2013-09-29 22:22 EDT

Fixed In Version: kexec-tools-2_0_0-202_el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 13:19:05 EST

Attachments
Proposed patch (913 bytes, patch)
2011-08-01 01:06 EDT, Cong Wang

Description Cong Wang 2011-03-03 04:38:23 EST
Description of problem:

EFI is not necessary for kdump, but the second kernel cannot boot if noefi is
specified, because it then cannot find the RSDP. Here is a good explanation:
http://lists.infradead.org/pipermail/kexec/2010-March/003889.html

Once the new kernel parameter "acpi_addr=" is introduced, we will additionally
have to pass the correct RSDP address to the second kernel.

Expected results:
The second kernel should boot successfully on EFI machines.

Additional info:

if [ -f /sys/firmware/efi/systab ]
then
    if grep -q '^ACPI20=' /sys/firmware/efi/systab
    then
        acpi_addr=$(awk -F'=' '/^ACPI20=/ {print $2}' /sys/firmware/efi/systab)
    else
        acpi_addr=$(awk -F'=' '/^ACPI=/ {print $2}' /sys/firmware/efi/systab)
    fi
    # omit
    # pass "noefi acpi_addr=$acpi_addr" to the second kernel here...
fi
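
The snippet above can be wrapped into a small function for experimenting against a sample file. This is only a sketch and not the attached patch: the function name efi_acpi_args and its two arguments are hypothetical, and the parameter name defaults to acpi_addr as in the description, although the bug title uses acpi_rsdp.

```shell
#!/bin/sh
# Sketch: print the extra parameters for the second (kdump) kernel.
#   $1: path to an EFI systab file (default: /sys/firmware/efi/systab)
#   $2: kernel parameter name (hypothetical default "acpi_addr";
#       the bug title uses "acpi_rsdp")
efi_acpi_args() {
    systab=${1:-/sys/firmware/efi/systab}
    param=${2:-acpi_addr}

    # Not an EFI boot (or file unreadable): print nothing.
    [ -r "$systab" ] || return 0

    # Prefer the ACPI 2.0 entry, fall back to the ACPI 1.0 entry.
    addr=$(awk -F'=' '/^ACPI20=/ {print $2; exit}' "$systab")
    if [ -z "$addr" ]; then
        addr=$(awk -F'=' '/^ACPI=/ {print $2; exit}' "$systab")
    fi

    # Emit the parameters only when an address was actually found.
    if [ -n "$addr" ]; then
        echo "noefi ${param}=${addr}"
    fi
    return 0
}

# Example with a made-up sample file (the addresses are illustrative only):
#   printf 'ACPI20=0x7f6fe014\nACPI=0x7f6fd000\n' > /tmp/systab.sample
#   efi_acpi_args /tmp/systab.sample acpi_rsdp
```

Taking the systab path as an argument keeps the logic testable on a machine that was not booted through EFI; on a real EFI system the defaults would be used.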
Comment 2 Vivek Goyal 2011-06-14 09:35:30 EDT
Do we still need this fix? I thought that Matthew Garrett had agreed to experiment with booting the first kernel in physical mode and never transitioning into virtual mode, to make sure kexec and kdump work.
Comment 3 Matthew Garrett 2011-06-14 09:46:33 EDT
I spent a while experimenting. The conclusion is that physical mode (both our implementation of it and any other implementation I've been able to come up with) simply doesn't work for some firmware.
Comment 4 Vivek Goyal 2011-06-14 10:46:39 EDT
Should we then take the issue back to vendors and ask them to fix the firmware?
Comment 5 Matthew Garrett 2011-06-14 10:57:12 EDT
I don't think there's any way that upstream are going to take any patches to default to physical mode, so continuing to do so ourselves means we'll be the only OS that behaves this way. I don't think firmware vendors are going to consider it a high priority.
Comment 6 Takao Indoh 2011-06-14 11:10:23 EDT
(In reply to comment #3)
> I spent a while experimenting. The conclusion is that physical mode (both our
> implementation of it and any other implementation I've been able to come up
> with) simply doesn't work for some firmware.

As you said, one problem with the current physical-mode patch is that the EFI
page table maps only the EFI area. I'm therefore working on a patch that maps
the whole of memory, so that every virtual address is the same as its physical
address. But do you mean that it does not work on some firmware even if I
map the whole of memory?
Comment 7 Matthew Garrett 2011-06-14 11:16:03 EDT
Yes, some systems seem to fail in efivars even in that case.
Comment 8 Matthew Garrett 2011-07-05 11:46:43 EDT
I'm about to post a patch to change the kernel default back to virtual mode. What's the status of this bug?
Comment 9 Cong Wang 2011-07-06 02:55:02 EDT
This BZ is for kexec-tools, so we need to push the kernel patch first.
Comment 10 Takao Indoh 2011-07-20 15:18:27 EDT
(In reply to comment #9)
> This BZ is for kexec-tools, so we need to push the kernel patch first.

FYI, I opened bz723670 for the kernel patch.
Comment 11 Cong Wang 2011-08-01 01:06:37 EDT
Created attachment 516065
Proposed patch
Comment 12 Cong Wang 2011-08-01 01:10:54 EDT
Brew build:
https://brewweb.devel.redhat.com/taskinfo?taskID=3528906

Takao, could you help to test the above patch which is included in this build?
Thanks!
Comment 13 Takao Indoh 2011-08-02 17:08:30 EDT
(In reply to comment #12)
> Brew build:
> https://brewweb.devel.redhat.com/taskinfo?taskID=3528906
> 
> Takao, could you help to test the above patch which is included in this build?

I tested using your patch and found a problem. The system resets suddenly
during the second kernel's boot. Here is the console log.

(snipped)
ftrace: allocating 20699 entries in 82 pages
DMAR: Host address width 44
DMAR: DRHD base: 0x000000fd000000 flags: 0x1
IOMMU fd000000: ver 1:0 cap c90780106f0462 ecap f020fe
DMAR: No RMRR found
DMAR: No ATSR found
IOAPIC id 0 under DRHD base 0xfd000000
Enabled Interrupt-remapping
Setting APIC routing to cluster x2apic
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU           E7540  @ 2.00GHz stepping 06
Performance Events: PEBS fmt1+, Nehalem events, Broken BIOS detected, complain to your hardware vendor.
[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 53003c)
Intel PMU driver.
... version:                3
... bit width:              48
... generic registers:      4
... value mask:             0000ffffffffffff
... max period:             000000007fffffff
... fixed-purpose events:   3
... event mask:             000000070000000f
NMI watchdog enabled, takes one hw-pmu counter.
Booting Node   0, Processors  #1
(system reset here)

When I changed "nr_cpus=1" to "maxcpus=1", kdump worked. So I think your
acpi_rsdp patch itself is OK, but kdump does not work because of "nr_cpus=1".

What I used is:
kexec-tools: 2.0.0-196.el6.bz681796.x86_64
kernel: 2.6.32-131.0.15.el6.x86_64 with acpi_rsdp patch(bz723670) and
Matthew's three patches[1]

I should use the latest kernel, but I cannot because the on-site engineer
cannot access RH internal resources (brew, git tree, etc.).

[1]
http://post-office.corp.redhat.com/archives/rhkernel-list/2011-July/msg00150.html
http://post-office.corp.redhat.com/archives/rhkernel-list/2011-July/msg00151.html
http://post-office.corp.redhat.com/archives/rhkernel-list/2011-July/msg00152.html
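
For anyone reproducing the workaround described in this comment, the swap of "nr_cpus=1" for "maxcpus=1" can be sketched as a small helper that rewrites a kernel command line string. This is an illustration only: the helper name swap_nr_cpus is hypothetical, and the report does not say in which file or variable the kdump command line was changed.

```shell
#!/bin/sh
# Sketch: replace nr_cpus=N with maxcpus=N in a kernel command line string.
# Only whole parameters are rewritten (at the start of the string or after
# whitespace), so an unrelated token such as foo_nr_cpus=1 is left alone.
swap_nr_cpus() {
    printf '%s\n' "$1" |
        sed -E 's/(^|[[:space:]])nr_cpus=([0-9]+)/\1maxcpus=\2/g'
}

# Example (the command line below is illustrative, not taken from this report):
#   swap_nr_cpus "irqpoll nr_cpus=1 reset_devices"
```

The helper works on a plain string, so its result can be inspected before it is written back to whatever configuration holds the second kernel's command line.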
Comment 16 Chao Ye 2011-09-05 02:27:54 EDT
Tested on ibm-x3550m3-02.rhts.eng.nay.redhat.com under EFI boot:
============================================================
[root@ibm-x3550m3-02 ~]# rpm -q kernel kexec-tools
kernel-2.6.32-195.el6.x86_64
kexec-tools-2.0.0-199.el6.x86_64
[root@ibm-x3550m3-02 ~]# grep -v ^# /etc/kdump.conf 

[root@ibm-x3550m3-02 ~]# service kdump restart
Stopping kdump:[  OK  ]
Starting kdump:[  OK  ]
[root@ibm-x3550m3-02 ~]# touch /etc/kdump.conf 
[root@ibm-x3550m3-02 ~]# service kdump restart
Stopping kdump:[  OK  ]
Detected change(s) the following file(s):
  
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-195.el6.x86_64kdump.img
Starting kdump:[  OK  ]
[root@ibm-x3550m3-02 ~]# echo c > /proc/sysrq-trigger
----------------------------------------------------------------------------------------------------------
SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff813250a6>] sysrq_handle_crash+0x16/0x20
PGD 2783e8067 PUD 278175067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 5 
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 vfat fat bnx2 cdc_ether usbnet mii microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 6154, comm: bash Not tainted 2.6.32-195.el6.x86_64 #1 IBM System x3550 M3 -[7944I21]-/69Y4438     
RIP: 0010:[<ffffffff813250a6>]  [<ffffffff813250a6>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff880478141e18  EFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000f87
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff880478141e18 R08: ffffffff81c00500 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81afac00 R14: 0000000000000286 R15: 0000000000000007
FS:  00007fce5bb3b700(0000) GS:ffff880287420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000277565000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 6154, threadinfo ffff880478140000, task ffff880477888080)
Stack:
 ffff880478141e68 ffffffff81325362 ffff880477888080 ffff880200000000
<0> 0000000d80df4018 0000000000000002 ffff880478cadf00 00007fce5bb41000
<0> 0000000000000002 fffffffffffffffb ffff880478141e98 ffffffff8132541e
Call Trace:
 [<ffffffff81325362>] __handle_sysrq+0x132/0x1a0
 [<ffffffff8132541e>] write_sysrq_trigger+0x4e/0x50
 [<ffffffff811da9ae>] proc_reg_write+0x7e/0xc0
 [<ffffffff811760f8>] vfs_write+0xb8/0x1a0
 [<ffffffff810d4602>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81176b01>] sys_write+0x51/0x90
 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
Code: d0 88 81 63 f4 fc 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 ad 21 77 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47 
RIP  [<ffffffff813250a6>] sysrq_handle_crash+0x16/0x20
 RSP <ffff880478141e18>
CR2: 0000000000000000
<===========================Hang


Tested on ibm-x3550m3-02.rhts.eng.nay.redhat.com with patch applied under EFI boot:
============================================================
[root@ibm-x3550m3-02 ~]# touch /etc/kdump.conf 
[root@ibm-x3550m3-02 ~]# service kdump restart
Stopping kdump:[  OK  ]
Detected change(s) the following file(s):
  
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-195.el6.x86_64kdump.img
Starting kdump:[  OK  ]
[root@ibm-x3550m3-02 ~]# echo c > /proc/sysrq-trigger
----------------------------------------------------------------------------------------------------------

Making device-mapper control node
Scanning logical volumes
  Reading all physical volumes.  This may take a while...
  Found volume group "vg_ibmx3550m302" using metadata type lvm2
Activating logical volumes
  3 logical volume(s) in volume group "vg_ibmx3550m302" now active
Free memory/Total memory (free %): 74212 / 116052 ( 63.9472 )
Saving to the local filesystem /dev/mapper/vg_ibmx3550m302-lv_root
e2fsck 1.41.12 (17-May-2010)
/dev/mapper/vg_ibmx3550m302-lv_root: recovering journal
Clearing orphaned inode 1182921 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 1182898 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 1182897 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 1182838 (uid=0, gid=0, mode=0100600, size=4096)
/dev/mapper/vg_ibmx3550m302-lv_root: clean, 85903/3276800 files, 734376/13107200 blocks
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: 
Free memory/Total memory (free %): 73296 / 116052 ( 63.1579 )
Loading SELINUX policy
type=1404 audit(1315204028.109:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1315204028.561:3): policy loaded auid=4294967295 ses=4294967295
Copying data                       : [ 75 %] 
<===============================Vmcore saved
Comment 18 Chao Ye 2011-09-19 01:50:35 EDT
Based on comment#13 and comment#16, change status to VERIFIED.
Comment 19 errata-xmlrpc 2011-12-06 13:19:05 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1532.html
