Bug 681796
| Summary: | Pass "noefi acpi_rsdp=X" to the second kernel | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Cong Wang <amwang> |
| Component: | kexec-tools | Assignee: | Cong Wang <amwang> |
| Status: | CLOSED ERRATA | QA Contact: | Chao Ye <cye> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.0 | CC: | jfeeney, phan, qcai, rkhan, tindoh, tmuneda, vgoyal |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kexec-tools-2_0_0-202_el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-12-06 18:19:05 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 723670 | | |
| Bug Blocks: | 743047 | | |
| Attachments: | | | |
Description
Cong Wang
2011-03-03 09:38:23 UTC
Do we still need this fix? I thought that Matthew Garrett had agreed to experiment with booting the first kernel in physical mode and never transitioning into virtual mode, to make sure kexec and kdump work.

I spent a while experimenting. The conclusion is that physical mode (both our implementation of it and any other implementation I've been able to come up with) simply doesn't work for some firmware.

Should we then take the issue back to the vendors and ask them to fix the firmware?

I don't think there's any way that upstream are going to take any patches to default to physical mode, so continuing to do so ourselves means we'll be the only OS that behaves this way. I don't think firmware vendors are going to consider it a high priority.

(In reply to comment #3)
> I spent a while experimenting. The conclusion is that physical mode (both our
> implementation of it and any other implementation I've been able to come up
> with) simply doesn't work for some firmware.

As you said, one problem with the current physical-mode patch is that the EFI page table maps only the EFI area, so I'm working on a patch to map the whole of memory so that every virtual address is the same as its physical address. But do you mean that it does not work on some firmware even if I map the whole of memory?

Yes, some systems seem to fail in efivars even in that case. I'm about to post a patch to change the kernel default back to virtual mode.

What's the status of this bug?

This BZ is for kexec-tools, so we need to push the kernel patch first.

(In reply to comment #9)
> This BZ is for kexec-tools, so we need to push the kernel patch first.

FYI, I opened bz723670 for the kernel patch.

Created attachment 516065
Proposed patch
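The attached patch is not reproduced in this report. As a rough illustration of the idea in the summary, the sketch below builds a second-kernel command line that carries `noefi acpi_rsdp=X`. It assumes that the RSDP address is taken from the `ACPI20=` (or, failing that, `ACPI=`) entry of `/sys/firmware/efi/systab`; that path, the function names and the sample command line are assumptions made for the example, not details taken from the patch.

```python
def parse_rsdp(systab_text):
    """Return the RSDP address from EFI systab text, preferring ACPI20 over ACPI."""
    entries = {}
    for line in systab_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            entries[key] = value
    return entries.get("ACPI20") or entries.get("ACPI")


def rsdp_from_sysfs(path="/sys/firmware/efi/systab"):
    """Read the systab file; return None when not booted via EFI (assumed path)."""
    try:
        with open(path) as f:
            return parse_rsdp(f.read())
    except OSError:
        return None


def append_efi_args(cmdline, rsdp):
    """Append 'noefi acpi_rsdp=<addr>' to a kernel command line.

    Any existing noefi/acpi_rsdp= arguments are dropped first so the
    result never carries duplicates. With no RSDP the line is unchanged.
    """
    args = cmdline.split()
    if rsdp is None:
        return " ".join(args)
    args = [a for a in args if a != "noefi" and not a.startswith("acpi_rsdp=")]
    args += ["noefi", "acpi_rsdp=" + rsdp]
    return " ".join(args)
```

On an EFI-booted machine, `append_efi_args("irqpoll reset_devices", rsdp_from_sysfs())` would produce the arguments for the kdump kernel; on a BIOS-booted machine the command line is returned as it was.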
Brew build: https://brewweb.devel.redhat.com/taskinfo?taskID=3528906

Takao, could you help to test the above patch which is included in this build? Thanks!

(In reply to comment #12)
> Brew build:
> https://brewweb.devel.redhat.com/taskinfo?taskID=3528906
>
> Takao, could you help to test the above patch which is included in this build?

I tested using your patch and I found a problem. The system is reset suddenly during 2nd kernel boot. Here is the console log.

(snipped)
ftrace: allocating 20699 entries in 82 pages
DMAR: Host address width 44
DMAR: DRHD base: 0x000000fd000000 flags: 0x1
IOMMU fd000000: ver 1:0 cap c90780106f0462 ecap f020fe
DMAR: No RMRR found
DMAR: No ATSR found
IOAPIC id 0 under DRHD base 0xfd000000
Enabled Interrupt-remapping
Setting APIC routing to cluster x2apic
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU E7540 @ 2.00GHz stepping 06
Performance Events: PEBS fmt1+, Nehalem events, Broken BIOS detected, complain to your hardware vendor.
[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 53003c)
Intel PMU driver.
... version: 3
... bit width: 48
... generic registers: 4
... value mask: 0000ffffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 000000070000000f
NMI watchdog enabled, takes one hw-pmu counter.
Booting Node 0, Processors #1
(system reset here)

When I changed "nr_cpus=1" to "maxcpus=1", kdump works. So, I think your acpi_rsdp patch itself is ok, but kdump does not work due to "nr_cpus=1".

What I used is:
kexec-tools: 2.0.0-196.el6.bz681796.x86_64
kernel: 2.6.32-131.0.15.el6.x86_64 with acpi_rsdp patch (bz723670) and Matthew's three patches [1]

I should use the latest kernel but I cannot because on-site engineer cannot access RH internal resources, brew, git tree, etc.
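The workaround reported in this comment amounts to swapping one argument on the kdump kernel command line. A minimal sketch of that substitution is below; the helper name and the sample command line are hypothetical, and where the arguments are actually configured depends on the kexec-tools setup in use.

```python
def swap_param(cmdline, old, new):
    """Replace each exact occurrence of the argument `old` with `new`."""
    return " ".join(new if arg == old else arg for arg in cmdline.split())


# Hypothetical kdump arguments, used only to show the substitution.
args = "irqpoll nr_cpus=1 reset_devices"
print(swap_param(args, "nr_cpus=1", "maxcpus=1"))
```

Matching whole arguments rather than substrings keeps an unrelated value such as `nr_cpus=16` untouched.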
[1] http://post-office.corp.redhat.com/archives/rhkernel-list/2011-July/msg00150.html
http://post-office.corp.redhat.com/archives/rhkernel-list/2011-July/msg00151.html
http://post-office.corp.redhat.com/archives/rhkernel-list/2011-July/msg00152.html

Tested on ibm-x3550m3-02.rhts.eng.nay.redhat.com under EFI boot:
============================================================
[root@ibm-x3550m3-02 ~]# rpm -q kernel kexec-tools
kernel-2.6.32-195.el6.x86_64
kexec-tools-2.0.0-199.el6.x86_64
[root@ibm-x3550m3-02 ~]# grep -v ^# /etc/kdump.conf
[root@ibm-x3550m3-02 ~]# service kdump restart
Stopping kdump:[ OK ]
Starting kdump:[ OK ]
[root@ibm-x3550m3-02 ~]# touch /etc/kdump.conf
[root@ibm-x3550m3-02 ~]# service kdump restart
Stopping kdump:[ OK ]
Detected change(s) the following file(s):
/etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-195.el6.x86_64kdump.img
Starting kdump:[ OK ]
[root@ibm-x3550m3-02 ~]# echo c > /proc/sysrq-trigger
----------------------------------------------------------------------------------------------------------
SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff813250a6>] sysrq_handle_crash+0x16/0x20
PGD 2783e8067 PUD 278175067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 5
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 vfat fat bnx2 cdc_ether usbnet mii microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 6154, comm: bash Not tainted 2.6.32-195.el6.x86_64 #1 IBM System x3550 M3 -[7944I21]-/69Y4438
RIP: 0010:[<ffffffff813250a6>] [<ffffffff813250a6>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff880478141e18 EFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000f87
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff880478141e18 R08: ffffffff81c00500 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81afac00 R14: 0000000000000286 R15: 0000000000000007
FS: 00007fce5bb3b700(0000) GS:ffff880287420000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000277565000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 6154, threadinfo ffff880478140000, task ffff880477888080)
Stack:
ffff880478141e68 ffffffff81325362 ffff880477888080 ffff880200000000
<0> 0000000d80df4018 0000000000000002 ffff880478cadf00 00007fce5bb41000
<0> 0000000000000002 fffffffffffffffb ffff880478141e98 ffffffff8132541e
Call Trace:
[<ffffffff81325362>] __handle_sysrq+0x132/0x1a0
[<ffffffff8132541e>] write_sysrq_trigger+0x4e/0x50
[<ffffffff811da9ae>] proc_reg_write+0x7e/0xc0
[<ffffffff811760f8>] vfs_write+0xb8/0x1a0
[<ffffffff810d4602>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff81176b01>] sys_write+0x51/0x90
[<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
Code: d0 88 81 63 f4 fc 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 ad 21 77 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47
RIP [<ffffffff813250a6>] sysrq_handle_crash+0x16/0x20
RSP <ffff880478141e18>
CR2: 0000000000000000
<===========================Hang

Tested on ibm-x3550m3-02.rhts.eng.nay.redhat.com with patch applied under EFI boot:
============================================================
[root@ibm-x3550m3-02 ~]# touch /etc/kdump.conf
[root@ibm-x3550m3-02 ~]# service kdump restart
Stopping kdump:[ OK ]
Detected change(s) the following file(s):
/etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-195.el6.x86_64kdump.img
Starting kdump:[ OK ]
[root@ibm-x3550m3-02 ~]# echo c > /proc/sysrq-trigger
----------------------------------------------------------------------------------------------------------
Making device-mapper control node
Scanning logical volumes
Reading all physical volumes. This may take a while...
Found volume group "vg_ibmx3550m302" using metadata type lvm2
Activating logical volumes
3 logical volume(s) in volume group "vg_ibmx3550m302" now active
Free memory/Total memory (free %): 74212 / 116052 ( 63.9472 )
Saving to the local filesystem /dev/mapper/vg_ibmx3550m302-lv_root
e2fsck 1.41.12 (17-May-2010)
/dev/mapper/vg_ibmx3550m302-lv_root: recovering journal
Clearing orphaned inode 1182921 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 1182898 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 1182897 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 1182838 (uid=0, gid=0, mode=0100600, size=4096)
/dev/mapper/vg_ibmx3550m302-lv_root: clean, 85903/3276800 files, 734376/13107200 blocks
EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts:
Free memory/Total memory (free %): 73296 / 116052 ( 63.1579 )
Loading SELINUX policy
type=1404 audit(1315204028.109:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1315204028.561:3): policy loaded auid=4294967295 ses=4294967295
Copying data : [ 75 %]
<===============================Vmcore saved

Based on comment#13 and comment#16, change status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1532.html