Bug 668825 - Server cannot boot with kernel-2.6.32-85
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64 Linux
Priority: medium, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: bob picco
Zhang Kexin
ptam
:
Depends On:
Blocks:
 
Reported: 2011-01-11 12:52 EST by Yan Vugenfirer
Modified: 2014-04-21 22:15 EDT

See Also:
Fixed In Version: kernel-2.6.32-112.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-23 16:35:40 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
bpicco: needinfo-


Attachments
sosreport output (1.66 MB, application/octet-stream)
2011-01-11 12:54 EST, Yan Vugenfirer
Boot output - first screen shot (129.41 KB, image/jpeg)
2011-01-11 17:11 EST, Yan Vugenfirer
Boot output - second screen shot (131.89 KB, image/jpeg)
2011-01-11 17:12 EST, Yan Vugenfirer
Boot output with "nousb" - first screen shot (128.23 KB, image/jpeg)
2011-01-11 17:14 EST, Yan Vugenfirer
Boot output with "nousb" - second screen shot (96.31 KB, image/jpeg)
2011-01-11 17:15 EST, Yan Vugenfirer
EFI RTL physical mode patch (1.52 KB, text/plain)
2011-01-18 13:04 EST, bob picco


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2011:0542
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update
Last Updated: 2011-05-19 07:58:07 EDT

Description Yan Vugenfirer 2011-01-11 12:52:46 EST
Description of problem:
When booting the server with kernel-2.6.32-85 or later, the boot process hangs.

Version-Release number of selected component (if applicable):
kernel-2.6.32-84 - boots OK
Starting from kernel-2.6.32-85 there are hangs.

How reproducible:
Each time.


Steps to Reproduce:
1. yum install -i kernel-firmware-2.6.32-85.el6.noarch.rpm
2. yum install -i kernel-2.6.32-85.el6.x86_64.rpm
3. reboot
  
Actual results:
Kernel is not booted

Expected results:
Kernel should boot

Additional info:
Will add sosreport file from the server.
Comment 2 Yan Vugenfirer 2011-01-11 12:54:52 EST
Created attachment 472861 [details]
sosreport output
Comment 3 Don Zickus 2011-01-11 15:44:13 EST
Hi Yan,

Is there any console output?  A sosreport won't show me what is going on with the hang.

Cheers,
Don
Comment 4 Yan Vugenfirer 2011-01-11 17:11:21 EST
Created attachment 472906 [details]
Boot output - first screen shot
Comment 5 Yan Vugenfirer 2011-01-11 17:12:50 EST
Created attachment 472907 [details]
Boot output - second screen shot
Comment 6 Yan Vugenfirer 2011-01-11 17:14:00 EST
Created attachment 472908 [details]
Boot output with "nousb" - first screen shot
Comment 7 Yan Vugenfirer 2011-01-11 17:15:31 EST
Created attachment 472909 [details]
Boot output with "nousb" - second screen shot
Comment 12 Takao Indoh 2011-01-13 18:00:31 EST
In the page2_nousb.JPG, the following message can be seen.

EFI Variables Facility v0.08 2004-May-17
BUG: unable to handle kernel paging request at 00000000ffc0004c.

I guess that efivars_init() (drivers/firmware/efivars.c) tried to access an EFI
variable via a runtime service (get_next_variable) and it failed.

The address 0xffc0004c seems to be a physical address. In EFI physical mode,
only EFI runtime code/data can be accessed by physical address. According
to var/log/dmesg in the sosreport, the following ranges are runtime
code/data. 0xffc0004c is outside these ranges, hence the page fault.

[runtime service code]
range=[0x000000007d5e6000-0x000000007d604000) (0MB)
range=[0x000000007d641000-0x000000007d65f000) (0MB)
range=[0x000000007f60c000-0x000000007f614000) (0MB)
range=[0x000000007f63f000-0x000000007f68f000) (0MB)

[runtime service data]
range=[0x000000007f5ef000-0x000000007f601000) (0MB)
range=[0x000000007f601000-0x000000007f60c000) (0MB)
range=[0x000000007f614000-0x000000007f63f000) (0MB)

Does anybody know what address 0xffc0004c is? Yan, could you get /proc/iomem from this server?
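
The range check described above can be sketched as follows. This is only an illustration in Python, not kernel code: the ranges and the faulting address are copied from this comment, and the helper name in_runtime_ranges is hypothetical.

```python
# Half-open [start, end) ranges copied from the dmesg excerpt in this comment.
RUNTIME_CODE = [
    (0x7d5e6000, 0x7d604000),
    (0x7d641000, 0x7d65f000),
    (0x7f60c000, 0x7f614000),
    (0x7f63f000, 0x7f68f000),
]
RUNTIME_DATA = [
    (0x7f5ef000, 0x7f601000),
    (0x7f601000, 0x7f60c000),
    (0x7f614000, 0x7f63f000),
]

FAULT_ADDR = 0xffc0004c  # address from the "unable to handle kernel paging request" line


def in_runtime_ranges(addr, ranges):
    """Return True if addr lies inside any half-open [start, end) range."""
    return any(start <= addr < end for start, end in ranges)


covered = in_runtime_ranges(FAULT_ADDR, RUNTIME_CODE + RUNTIME_DATA)
print(f"{FAULT_ADDR:#x} inside runtime code/data ranges: {covered}")
```

The faulting address lies far above the highest listed range end (0x7f68f000), which is consistent with the page fault described above.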
Comment 13 Michael S. Tsirkin 2011-01-14 01:34:29 EST
00000000-00000fff : reserved
00001000-0006bfff : System RAM
0006c000-0006cfff : ACPI Non-volatile Storage
0006d000-0009efff : System RAM
0009f000-0009ffff : ACPI Non-volatile Storage
00100000-7d5e5fff : System RAM
  01000000-014cda67 : Kernel code
  014cda68-01ba586f : Kernel data
  01ce3000-01f9e077 : Kernel bss
  02000000-0a0fffff : Crash kernel
7d5e6000-7d603fff : reserved
7d604000-7d640fff : System RAM
7d641000-7d65efff : reserved
7d65f000-7d7dafff : System RAM
7d7db000-7d88afff : reserved
7d88b000-7f5eefff : System RAM
7f5ef000-7f6defff : reserved
7f6df000-7f7defff : ACPI Non-volatile Storage
7f7df000-7f7fefff : ACPI Tables
7f7ff000-7f7fffff : System RAM
7f800000-7fffffff : RAM buffer
80000000-8fffffff : PCI MMCONFIG 0 [00-ff]
  80000000-8fffffff : reserved
    80000000-8fffffff : pnp 00:0a
90000000-901fffff : PCI Bus 0000:0b
  90000000-901fffff : 0000:0b:00.0
92000000-95ffffff : PCI Bus 0000:10
  92000000-93ffffff : 0000:10:00.0
    92000000-93ffffff : bnx2
  94000000-95ffffff : 0000:10:00.1
    94000000-95ffffff : bnx2
96000000-96ffffff : PCI Bus 0000:06
  96000000-96ffffff : PCI Bus 0000:07
    96000000-96ffffff : 0000:07:00.0
      96000000-965fffff : efifb
97000000-978fffff : PCI Bus 0000:06
  97000000-978fffff : PCI Bus 0000:07
    97000000-977fffff : 0000:07:00.0
    97800000-97803fff : 0000:07:00.0
97900000-979fffff : PCI Bus 0000:0b
  97900000-9790ffff : 0000:0b:00.0
    97900000-9790ffff : mpt
  97910000-97913fff : 0000:0b:00.0
    97910000-97913fff : mpt
97a00000-97a03fff : 0000:00:16.0
  97a00000-97a03fff : ioatdma
97a04000-97a07fff : 0000:00:16.1
  97a04000-97a07fff : ioatdma
97a08000-97a0bfff : 0000:00:16.2
  97a08000-97a0bfff : ioatdma
97a0c000-97a0ffff : 0000:00:16.3
  97a0c000-97a0ffff : ioatdma
97a10000-97a13fff : 0000:00:16.4
  97a10000-97a13fff : ioatdma
97a14000-97a17fff : 0000:00:16.5
  97a14000-97a17fff : ioatdma
97a18000-97a1bfff : 0000:00:16.6
  97a18000-97a1bfff : ioatdma
97a1c000-97a1ffff : 0000:00:16.7
  97a1c000-97a1ffff : ioatdma
97a21000-97a213ff : 0000:00:1d.7
  97a21000-97a213ff : ehci_hcd
97a21400-97a217ff : 0000:00:1a.7
  97a21400-97a217ff : ehci_hcd
97a21800-97a218ff : 0000:00:1f.3
fc000000-fcffffff : pnp 00:0a
fe710000-fe711fff : pnp 00:0a
fe800000-fe9fffff : pnp 00:0a
fea00000-feafffff : pnp 00:0a
feb00000-febfffff : pnp 00:0a
fec00000-fec00fff : IOAPIC 0
fec80000-fec80fff : IOAPIC 1
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : pnp 00:05
fed1c000-fed1ffff : reserved
  fed1c000-fed1ffff : pnp 00:0a
fee00000-feefffff : pnp 00:0a
  fee00000-fee00fff : Local APIC
ff800000-ffffffff : reserved
  ffc00000-ffffffff : pnp 00:0a
100000000-27fffffff : System RAM
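
To locate the faulting address 0xffc0004c from comment 12 in a listing like the one above, a small lookup can help. The sketch below is an assumption-laden illustration: it embeds only a few lines of the listing, assumes two spaces of indentation per nesting level, and uses the hypothetical helper names parse_iomem and regions_containing. Note that /proc/iomem ranges are inclusive at both ends.

```python
# A few lines copied from the /proc/iomem listing above (not the full file).
IOMEM_EXCERPT = """\
00100000-7d5e5fff : System RAM
7d5e6000-7d603fff : reserved
fee00000-feefffff : pnp 00:0a
  fee00000-fee00fff : Local APIC
ff800000-ffffffff : reserved
  ffc00000-ffffffff : pnp 00:0a
100000000-27fffffff : System RAM
"""


def parse_iomem(text):
    """Parse 'start-end : name' lines into (depth, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue
        span, name = line.split(" : ", 1)
        # Nesting depth is inferred from leading spaces (two per level).
        depth = (len(span) - len(span.lstrip())) // 2
        start, end = (int(part, 16) for part in span.strip().split("-"))
        entries.append((depth, start, end, name.strip()))
    return entries


def regions_containing(addr, entries):
    """Return (depth, name) for every region whose inclusive range holds addr."""
    return [(depth, name) for depth, start, end, name in entries
            if start <= addr <= end]


hits = regions_containing(0xffc0004c, parse_iomem(IOMEM_EXCERPT))
for depth, name in hits:
    print("  " * depth + name)
```

With this excerpt, the address falls in the top-level "reserved" region at ff800000-ffffffff and its nested "pnp 00:0a" child, not in System RAM.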
Comment 14 bob picco 2011-01-14 07:34:34 EST
(In reply to comment #12)

I looked at dmesg Wed or Thu and want to look again. The attribute in the EFI
memory descriptor indicates that the region is EFI runtime, and the memory type is MMIO.
This was my concern when I asked you about my patch back in the middle of November.
You didn't believe it to be an issue. Now I think it definitely is. My patch
for commit 1deea99897b17206b3069b0e5ede7dafa068d117 has caused this problem. The
commit corrected your use of EFI memory type but it exposed a larger issue. I should
have looked at this far more closely. Also, the page table entries need to be uncached for MMIO.

IBM EFI seems to use this MMIO location in EFI RTL. I'm willing to bet this
could be true of other EFI implementations. I think you need to look at the EFI
virtual mode code, which I have looked at a little.

This analysis needs to be reverified. It was done very quickly.

Unfortunately I won't have much time to look further until possibly Tuesday
next week. I'm on the hook for another issue which is due by close of business
Monday and is in a totally new and frustrating area for me (not kernel).

bob
Comment 15 bob picco 2011-01-18 13:02:55 EST
Okay. I verified this.

This IBM machine has three EFI memory descriptors for MMIO. Two are uncached and
one is cached(?). The region in question is uncached.
EFI: mem187: type=11, attr=0x8000000000000001, range=[0x00000000ff800000-0x0000000100000000) (8MB)
EFI: mem189: type=11, attr=0x8000000000000001, range=[0x0000000080000000-0x0000000090000000) (256MB)
EFI: mem190: type=11, attr=0x8000000000000000, range=[0x00000000fed1c000-0x00000000fed20000) (0MB)
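
For readers decoding the descriptor lines above, here is a rough Python sketch. The attribute bit values and the meaning of type 11 (EfiMemoryMappedIO) are taken from the UEFI specification; the regular expression, the helper name decode, and the dictionary layout are assumptions made for illustration. The faulting address is the one reported in comment 12.

```python
import re

# Attribute bits and memory type as defined in the UEFI specification.
EFI_MEMORY_UC = 0x1
EFI_MEMORY_WC = 0x2
EFI_MEMORY_WT = 0x4
EFI_MEMORY_WB = 0x8
EFI_MEMORY_RUNTIME = 0x8000000000000000
EFI_MEMORY_MAPPED_IO = 11  # EfiMemoryMappedIO

CACHE_BITS = [(EFI_MEMORY_UC, "UC"), (EFI_MEMORY_WC, "WC"),
              (EFI_MEMORY_WT, "WT"), (EFI_MEMORY_WB, "WB")]

# The three descriptor lines quoted in this comment.
LINES = [
    "EFI: mem187: type=11, attr=0x8000000000000001, range=[0x00000000ff800000-0x0000000100000000) (8MB)",
    "EFI: mem189: type=11, attr=0x8000000000000001, range=[0x0000000080000000-0x0000000090000000) (256MB)",
    "EFI: mem190: type=11, attr=0x8000000000000000, range=[0x00000000fed1c000-0x00000000fed20000) (0MB)",
]

PATTERN = re.compile(
    r"EFI: (mem\d+): type=(\d+), attr=(0x[0-9a-f]+), "
    r"range=\[(0x[0-9a-f]+)-(0x[0-9a-f]+)\)"
)


def decode(line):
    """Decode one descriptor line into a dict (hypothetical layout)."""
    name, mtype, attr, start, end = PATTERN.search(line).groups()
    attr = int(attr, 16)
    return {
        "name": name,
        "is_mmio": int(mtype) == EFI_MEMORY_MAPPED_IO,
        "runtime": bool(attr & EFI_MEMORY_RUNTIME),
        "cache": [label for bit, label in CACHE_BITS if attr & bit],
        "start": int(start, 16),
        "end": int(end, 16),  # exclusive, as printed with ")"
    }


descriptors = [decode(line) for line in LINES]
FAULT_ADDR = 0xffc0004c
covering = [d for d in descriptors if d["start"] <= FAULT_ADDR < d["end"]]
for d in descriptors:
    print(d["name"], "runtime" if d["runtime"] else "", d["cache"] or "no cache attribute")
```

Under these definitions, mem187 and mem189 carry the runtime and UC bits, mem190 carries the runtime bit with no cacheability bit set, and mem187 is the descriptor that covers the faulting address.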

I've done a brew build of the patch:
https://brewweb.devel.redhat.com/taskinfo?taskID=3043894
which is boot tested on DELL only. UEFI has to be installed in the lab
(at least it has been so far). I'm not physically there today and, with the
weather, maybe not tomorrow either.


thanx,

bob
Comment 16 bob picco 2011-01-18 13:04:53 EST
Created attachment 474116 [details]
EFI RTL physical mode patch
Comment 17 Takao Indoh 2011-01-18 17:58:18 EST
I tested your patch on Fujitsu PRIMEQUEST(UEFI/x86_64). It booted normally and kdump worked.
Comment 18 bob picco 2011-01-19 08:48:54 EST
I verified this patch on boiler.eng.lab.tlv.redhat.com (IBM UEFI blade) thanx to Michael. It booted twice. The second boot was to test the NVRAM update by efibootmgr, which I know little about, but the boot timeout remained.
Comment 19 Peter Martuccelli 2011-01-20 08:40:36 EST
Reassigning to Picco as he has a solution to the regression.  Expect a patch posting this morning.
Comment 21 RHEL Product and Program Management 2011-01-20 08:50:19 EST
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 29 Aristeu Rozanski 2011-02-03 11:46:25 EST
Patch(es) available on kernel-2.6.32-112.el6
Comment 37 errata-xmlrpc 2011-05-23 16:35:40 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
