Bug 668825
| Summary: | Server cannot boot with kernel-2.6.32-85 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Yvugenfi <yvugenfi> |
| Component: | kernel | Assignee: | bob picco <bpicco> |
| Status: | CLOSED ERRATA | QA Contact: | Zhang Kexin <kzhang> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.3 | CC: | arozansk, bpicco, dzickus, kzhang, moshiro, mst, mzywusko, pbunyan, peterm, yugzhang |
| Target Milestone: | rc | Flags: | bpicco: needinfo- |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | ptam | | |
| Fixed In Version: | kernel-2.6.32-112.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-05-23 20:35:40 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yvugenfi@redhat.com
2011-01-11 17:52:46 UTC
Created attachment 472861
sosreport output
Hi Yan,

Is there any console output? A sosreport won't show me what is going on with the hang.

Cheers,
Don

Created attachment 472906
Boot output - first screen shot
Created attachment 472907
Boot output - second screen shot
Created attachment 472908
Boot output with "nousb" - first screen shot
Created attachment 472909
Boot output with "nousb" - second screen shot
In page2_nousb.JPG, the following message can be seen:

EFI Variables Facility v0.08 2004-May-17
BUG: unable to handle kernel paging request at 00000000ffc0004c

I guess that efivars_init() (drivers/firmware/efivars.c) tried to access an EFI variable via the get_next_variable runtime service, and the call failed.

The address 0xffc0004c seems to be a physical address. In EFI physical mode, only EFI runtime code/data can be accessed by physical address. According to var/log/dmesg in the sosreport, the following ranges are runtime code/data. 0xffc0004c lies outside all of them, hence the page fault.

[runtime service code]
range=[0x000000007d5e6000-0x000000007d604000) (0MB)
range=[0x000000007d641000-0x000000007d65f000) (0MB)
range=[0x000000007f60c000-0x000000007f614000) (0MB)
range=[0x000000007f63f000-0x000000007f68f000) (0MB)

[runtime service data]
range=[0x000000007f5ef000-0x000000007f601000) (0MB)
range=[0x000000007f601000-0x000000007f60c000) (0MB)
range=[0x000000007f614000-0x000000007f63f000) (0MB)

Does anybody know what address 0xffc0004c is? Yan, could you get /proc/iomem from this server?
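To make the range argument above concrete, here is a minimal Python sketch (not part of the original thread) that checks the faulting address against the runtime code/data ranges copied from the dmesg output. It assumes the ranges are half-open, as the `range=[a-b)` notation suggests; the helper name `covering_range` is hypothetical.

```python
# Runtime ranges copied from the sosreport dmesg output.
# Each tuple is a half-open [start, end) interval, like "range=[a-b)".
RUNTIME_CODE = [
    (0x7D5E6000, 0x7D604000),
    (0x7D641000, 0x7D65F000),
    (0x7F60C000, 0x7F614000),
    (0x7F63F000, 0x7F68F000),
]
RUNTIME_DATA = [
    (0x7F5EF000, 0x7F601000),
    (0x7F601000, 0x7F60C000),
    (0x7F614000, 0x7F63F000),
]


def covering_range(addr, ranges):
    """Return the first [start, end) range that contains addr, or None."""
    for start, end in ranges:
        if start <= addr < end:
            return (start, end)
    return None


fault_addr = 0xFFC0004C  # from "unable to handle kernel paging request"
print(covering_range(fault_addr, RUNTIME_CODE + RUNTIME_DATA))  # None
print(hex(covering_range(0x7D5E6100, RUNTIME_CODE)[0]))  # 0x7d5e6000
```

Under the comment's reasoning, an address for which this returns None is not reachable during a physical-mode firmware call, so touching it faults.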
00000000-00000fff : reserved
00001000-0006bfff : System RAM
0006c000-0006cfff : ACPI Non-volatile Storage
0006d000-0009efff : System RAM
0009f000-0009ffff : ACPI Non-volatile Storage
00100000-7d5e5fff : System RAM
  01000000-014cda67 : Kernel code
  014cda68-01ba586f : Kernel data
  01ce3000-01f9e077 : Kernel bss
  02000000-0a0fffff : Crash kernel
7d5e6000-7d603fff : reserved
7d604000-7d640fff : System RAM
7d641000-7d65efff : reserved
7d65f000-7d7dafff : System RAM
7d7db000-7d88afff : reserved
7d88b000-7f5eefff : System RAM
7f5ef000-7f6defff : reserved
7f6df000-7f7defff : ACPI Non-volatile Storage
7f7df000-7f7fefff : ACPI Tables
7f7ff000-7f7fffff : System RAM
7f800000-7fffffff : RAM buffer
80000000-8fffffff : PCI MMCONFIG 0 [00-ff]
  80000000-8fffffff : reserved
    80000000-8fffffff : pnp 00:0a
90000000-901fffff : PCI Bus 0000:0b
  90000000-901fffff : 0000:0b:00.0
92000000-95ffffff : PCI Bus 0000:10
  92000000-93ffffff : 0000:10:00.0
    92000000-93ffffff : bnx2
  94000000-95ffffff : 0000:10:00.1
    94000000-95ffffff : bnx2
96000000-96ffffff : PCI Bus 0000:06
  96000000-96ffffff : PCI Bus 0000:07
    96000000-96ffffff : 0000:07:00.0
      96000000-965fffff : efifb
97000000-978fffff : PCI Bus 0000:06
  97000000-978fffff : PCI Bus 0000:07
    97000000-977fffff : 0000:07:00.0
    97800000-97803fff : 0000:07:00.0
97900000-979fffff : PCI Bus 0000:0b
  97900000-9790ffff : 0000:0b:00.0
    97900000-9790ffff : mpt
  97910000-97913fff : 0000:0b:00.0
    97910000-97913fff : mpt
97a00000-97a03fff : 0000:00:16.0
  97a00000-97a03fff : ioatdma
97a04000-97a07fff : 0000:00:16.1
  97a04000-97a07fff : ioatdma
97a08000-97a0bfff : 0000:00:16.2
  97a08000-97a0bfff : ioatdma
97a0c000-97a0ffff : 0000:00:16.3
  97a0c000-97a0ffff : ioatdma
97a10000-97a13fff : 0000:00:16.4
  97a10000-97a13fff : ioatdma
97a14000-97a17fff : 0000:00:16.5
  97a14000-97a17fff : ioatdma
97a18000-97a1bfff : 0000:00:16.6
  97a18000-97a1bfff : ioatdma
97a1c000-97a1ffff : 0000:00:16.7
  97a1c000-97a1ffff : ioatdma
97a21000-97a213ff : 0000:00:1d.7
  97a21000-97a213ff : ehci_hcd
97a21400-97a217ff : 0000:00:1a.7
  97a21400-97a217ff : ehci_hcd
97a21800-97a218ff : 0000:00:1f.3
fc000000-fcffffff : pnp 00:0a
fe710000-fe711fff : pnp 00:0a
fe800000-fe9fffff : pnp 00:0a
fea00000-feafffff : pnp 00:0a
feb00000-febfffff : pnp 00:0a
fec00000-fec00fff : IOAPIC 0
fec80000-fec80fff : IOAPIC 1
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : pnp 00:05
fed1c000-fed1ffff : reserved
  fed1c000-fed1ffff : pnp 00:0a
fee00000-feefffff : pnp 00:0a
  fee00000-fee00fff : Local APIC
ff800000-ffffffff : reserved
  ffc00000-ffffffff : pnp 00:0a
100000000-27fffffff : System RAM

(In reply to comment #12)
> Anybody knows what address 0xffc0004c is? Yan, could you get /proc/iomem in
> this server?

I looked at the dmesg output on Wednesday or Thursday, and I want to go over it again. The attribute of the EFI memory descriptor indicates that the region is used by EFI at runtime, and its memory type is MMIO. This was my concern when I asked you about my patch back in the middle of November.
You didn't believe it to be an issue. Now I think it definitely is. My patch for commit 1deea99897b17206b3069b0e5ede7dafa068d117 has caused this problem: the commit corrected your use of the EFI memory type, but it exposed a larger issue. I should have looked at this far more closely. The page table entries for MMIO also need to be uncached. IBM's EFI seems to use this MMIO location in the EFI RTL, and I'm willing to bet the same could be true of other EFI implementations. I think you also need to look at the EFI virtual mode code, which I partly did.

This analysis needs to be reverified; it was done very quickly. Unfortunately I won't have much time to look further until possibly Tuesday next week. I'm on the hook for another issue, due at close of business Monday, in a totally new and frustrating area for me (not kernel).

bob

Okay. I verified this. This IBM machine has three EFI memory descriptors for MMIO. Two are uncached and one is cached(?). The region in question is uncached.

EFI: mem187: type=11, attr=0x8000000000000001, range=[0x00000000ff800000-0x0000000100000000) (8MB)
EFI: mem189: type=11, attr=0x8000000000000001, range=[0x0000000080000000-0x0000000090000000) (256MB)
EFI: mem190: type=11, attr=0x8000000000000000, range=[0x00000000fed1c000-0x00000000fed20000) (0MB)

I've done a brew build of the patch:
https://brewweb.devel.redhat.com/taskinfo?taskID=3043894
It has been boot tested on Dell only. UEFI has to be installed in the lab (at least it has been so far), and I'm not physically there today and, with the weather, maybe not tomorrow either.

thanx,
bob

Created attachment 474116
EFI RTL physical mode patch
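The descriptor listing above can be decoded mechanically. The Python sketch below is a hypothetical illustration, not the contents of attachment 474116: it uses the UEFI-specification values for EfiMemoryMappedIO (type 11), the cacheability bits (EFI_MEMORY_UC is bit 0) and EFI_MEMORY_RUNTIME (bit 63), finds which descriptor covers the faulting address 0xffc0004c, and applies an assumed policy of mapping runtime MMIO uncached, following the remark that the page table entries need to be uncached for MMIO. All helper names are made up for the example.

```python
# Hypothetical decoder for the "EFI: memNNN" dmesg lines.
# Constant values follow the UEFI specification.
EFI_MEMORY_MAPPED_IO = 11        # EfiMemoryMappedIO memory type
EFI_MEMORY_UC = 0x1              # uncacheable
EFI_MEMORY_WC = 0x2              # write-combining
EFI_MEMORY_WT = 0x4              # write-through
EFI_MEMORY_WB = 0x8              # write-back
EFI_MEMORY_RUNTIME = 1 << 63     # 0x8000000000000000: used by runtime services

CACHE_BITS = {"UC": EFI_MEMORY_UC, "WC": EFI_MEMORY_WC,
              "WT": EFI_MEMORY_WT, "WB": EFI_MEMORY_WB}

# (name, type, attribute, start, end); ranges are half-open [start, end).
DESCRIPTORS = [
    ("mem187", 11, 0x8000000000000001, 0xFF800000, 0x100000000),
    ("mem189", 11, 0x8000000000000001, 0x80000000, 0x90000000),
    ("mem190", 11, 0x8000000000000000, 0xFED1C000, 0xFED20000),
]


def describe(desc):
    """Decode a descriptor's memory type and attribute bits."""
    name, mtype, attr, _start, _end = desc
    return {
        "name": name,
        "mmio": mtype == EFI_MEMORY_MAPPED_IO,
        "runtime": bool(attr & EFI_MEMORY_RUNTIME),
        "caching": [label for label, bit in CACHE_BITS.items() if attr & bit],
    }


def owning_descriptor(addr, descriptors):
    """Return the descriptor whose range contains addr, or None."""
    for desc in descriptors:
        if desc[3] <= addr < desc[4]:
            return desc
    return None


def physical_mode_mapping(desc):
    """Assumed policy: map runtime regions, and map runtime MMIO uncached."""
    info = describe(desc)
    if not info["runtime"]:
        return None  # not flagged for use by runtime services
    return "uncached" if info["mmio"] else "cached"


hit = owning_descriptor(0xFFC0004C, DESCRIPTORS)
print(describe(hit))
# {'name': 'mem187', 'mmio': True, 'runtime': True, 'caching': ['UC']}
print(physical_mode_mapping(hit))  # uncached
```

The faulting address falls inside mem187, a runtime MMIO region that is not among the runtime code/data ranges. Note that mem190 decodes with an empty caching list: no cacheability bit is set at all, which fits the "cached(?)" uncertainty in the comment.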
I tested your patch on a Fujitsu PRIMEQUEST (UEFI/x86_64). It booted normally and kdump worked.

I verified this patch on boiler.eng.lab.tlv.redhat.com (IBM UEFI blade), thanks to Michael. It booted twice. The second boot was to test an NVRAM update by efibootmgr, which I know little about, but the boot timeout remained.

Reassigning to Picco, as he has a solution to the regression. Expect a patch posting this morning.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Patch(es) available in kernel-2.6.32-112.el6

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html