Bug 233938

Summary:

x86_64 crash session on RHEL5 fails with read error during initialization

Product:

Red Hat Enterprise Linux 5

Reporter:

Eugene Teo (Security Response) <eteo>

Component:

crash

Assignee:

Dave Anderson <anderson>

Status:

CLOSED NOTABUG

Severity:

medium

Priority:

medium

Version:

5.0

CC:

eteo

Hardware:

x86_64

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2007-03-26 15:17:22 UTC

Attachments:

Description	Flags
crash -d7 ./usr/lib/debug/lib/modules/2.6.9-22.0.1.ELsmp/vmlinux vmcore	none

Description Eugene Teo (Security Response) 2007-03-26 04:06:35 UTC

Description of problem:

When I run crash on RHEL5 with a RHEL4U2 kernel namelist and dumpfile, it fails
to start with the following error:

# crash ./usr/lib/debug/lib/modules/2.6.9-22.0.1.ELsmp/vmlinux vmcore 
...
crash: read error: kernel virtual address: 1020385a004  type: "tss_struct ist array"

The vmcore file is from an 8-way server. I'm running crash on a ProLiant DL360
G4p that also has 8 CPUs. This looks like BZ154566, but that was resolved in
crash 3.10-13.10. The vmcore file I got from my customer is incomplete. I am not
sure if that is causing the problem. I have requested a full kernel crash dump,
but it looks like it may take a while. Will you be able to verify?

Version-Release number of selected component (if applicable):
crash-4.0-3.14

How reproducible:
Always

Steps to Reproduce:
1. Run crash on x86_64 RHEL5 with x86_64 kernel namelist and kernel dumpfile.
  
Actual results:
crash fails during initialization with a "tss_struct ist array" read error.

Expected results:
crash session should come up normally.

Additional info:
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5 (Tikanga)
# uname -a
Linux dl360g4p.gsslab.rdu.redhat.com 2.6.18-8.1.1.el5xen #1 SMP Mon Feb 26 20:51:53 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
# file vmcore
vmcore: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'nux'

I have attached a crash -d7 log as well.

Comment 1 Eugene Teo (Security Response) 2007-03-26 04:06:35 UTC

Created attachment 150868
crash -d7 ./usr/lib/debug/lib/modules/2.6.9-22.0.1.ELsmp/vmlinux vmcore

Comment 2 Dave Anderson 2007-03-26 14:55:51 UTC

Thanks for the "-d7" log -- that's usually my first request...

If we pull out just the dumpfile memory accesses, we see this:

<readmem: ffffffff804d51d0, KVADDR, "xtime", 16, (FOE), 9ef570>
<readmem: ffffffff803cc1a0, KVADDR, "system_utsname", 390, (ROE), 9efb5c>
<readmem: ffffffff803cc180, KVADDR, "linux_banner", 8, (FOE), 7fff58fe6c48>
<readmem: ffffffff80315dc2, KVADDR, "accessible check", 8, (ROE|Q), 7fff58fe68c8>
<readmem: ffffffff80315dc2, KVADDR, "readstring characters", 574, (ROE|Q), 7fff58fe58b0>
<readmem: ffffffff804d3080, KVADDR, "cpu_pda entry", 128, (FOE), a20540>
<readmem: ffffffff804d3100, KVADDR, "cpu_pda entry", 128, (FOE), a20540>
<readmem: ffffffff804d3180, KVADDR, "cpu_pda entry", 128, (FOE), a20540>
<readmem: ffffffff804d3200, KVADDR, "cpu_pda entry", 128, (FOE), a20540>
<readmem: ffffffff804d3280, KVADDR, "cpu_pda entry", 128, (FOE), a20540>
<readmem: ffffffff804d3300, KVADDR, "cpu_pda entry", 128, (FOE), a20540>
<readmem: ffffffff804d3380, KVADDR, "cpu_pda entry", 128, (FOE), a20540>
<readmem: ffffffff804d3400, KVADDR, "cpu_pda entry", 128, (FOE), a20540>
<readmem: 10010000084, KVADDR, "tss_struct ist array", 56, (FOE), 9fb090>
<readmem: 1020385a004, KVADDR, "tss_struct ist array", 56, (FOE), 9fb0c8>
crash: read error: kernel virtual address: 1020385a004  type: "tss_struct ist array"

The last kernel virtual address access at 1020385a004 failed.
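Pulling the dumpfile reads out of a long -d7 log by hand is tedious. Below is a minimal Python sketch of one way to do it, assuming every trace line has the `<readmem: ADDR, ADDRTYPE, "description", SIZE, (FLAGS), BUFFER>` shape seen above. The names `dumpfile_accesses` and `ReadmemCall` are hypothetical helpers, not part of crash.

```python
import re
from typing import List, NamedTuple

# Matches the leading part of a readmem trace line, e.g.
# <readmem: 1020385a004, KVADDR, "tss_struct ist array", 56, (FOE), 9fb0c8>
# Only the fields up to SIZE are required, so a line wrapped after the
# size field still matches.
READMEM_RE = re.compile(r'<readmem: ([0-9a-fA-F]+), (\w+), "([^"]*)", (\d+),')


class ReadmemCall(NamedTuple):
    addr: int       # address passed to readmem()
    addr_type: str  # e.g. KVADDR
    desc: str       # the "type" string crash prints on a read error
    size: int       # number of bytes requested


def dumpfile_accesses(log_text: str) -> List[ReadmemCall]:
    """Return the readmem() calls found in a crash -d7 log, in log order."""
    calls = []
    for line in log_text.splitlines():
        m = READMEM_RE.search(line)
        if m:
            addr, addr_type, desc, size = m.groups()
            calls.append(ReadmemCall(int(addr, 16), addr_type, desc, int(size)))
    return calls


if __name__ == "__main__":
    sample = (
        '<readmem: 10010000084, KVADDR, "tss_struct ist array", 56, (FOE), 9fb090>\n'
        "some other debug output\n"
        '<readmem: 1020385a004, KVADDR, "tss_struct ist array", 56, (FOE), 9fb0c8>\n'
    )
    for call in dumpfile_accesses(sample):
        print(f"{call.addr:x} {call.desc!r} {call.size}")
```

The last entry returned is the read that failed, which is where the analysis should start.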

The x86_64 has two "unity-mapped" virtual address spaces, one beginning
at ffffffff80000000 (__START_KERNEL_map) and the second one beginning
at 10000000000.  The first one maps the kernel's static text and data,
and the second one maps all physical memory into virtual memory.  In both
cases, the base address can be stripped off, and that leaves the physical
memory address.  So the largest kernel text/data virtual address read
was at ffffffff804d51d0 ("xtime"), or 4d51d0 physical.  The last two reads
were generic kernel virtual address accesses: the first, at 10010000084
(10000084 physical), succeeded, while the second, at 1020385a004
(20385a004 physical), failed.
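The base-stripping step above can be sketched in Python. The constants are assumptions for the RHEL4 (2.6.9) x86_64 kernel in this report: __START_KERNEL_map at ffffffff80000000, the physical-memory map at 10000000000, and a 512MB kernel text/data window below the module area. The function name `unity_kvtop` is hypothetical; crash does its own translation internally.

```python
# Assumed layout of the RHEL4 (2.6.9) x86_64 kernel discussed here.
START_KERNEL_MAP = 0xFFFFFFFF80000000  # __START_KERNEL_map: kernel text/data
KERNEL_MAP_SIZE = 512 << 20            # assumption: window below the module area
PAGE_OFFSET = 0x0000010000000000       # start of the unity map of all RAM


def unity_kvtop(vaddr: int, phys_mem_size: int) -> int:
    """Strip the unity-map base from vaddr, leaving the physical address.

    phys_mem_size bounds the physical-memory map. Addresses outside both
    windows (vmalloc, modules, user space) raise ValueError, since they
    need a real page-table walk rather than a subtraction.
    """
    if START_KERNEL_MAP <= vaddr < START_KERNEL_MAP + KERNEL_MAP_SIZE:
        return vaddr - START_KERNEL_MAP
    if PAGE_OFFSET <= vaddr < PAGE_OFFSET + phys_mem_size:
        return vaddr - PAGE_OFFSET
    raise ValueError(f"{vaddr:#x} is not unity-mapped")


if __name__ == "__main__":
    ram = 16 << 30  # assumption: the ~16GB estimate made in this comment
    for va in (0xFFFFFFFF804D51D0, 0x10010000084, 0x1020385A004):
        print(f"{va:x} -> {unity_kvtop(va, ram):x}")
```

Run as-is, it should print the same three translations worked out by hand above: 4d51d0, 10000084 and 20385a004.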

The netdump format is as simple as it gets: a page-sized ELF header,
followed by the contents of physical memory.  So the dumpfile size
should equal the size of physical memory plus one page for the ELF
header data.
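Under that simplified model, both the expected file size and the file offset of any physical address follow directly. This Python sketch assumes a 4KB page, physical memory starting at 0 with no holes, and a header of exactly one page; a real ELF vmcore describes its memory with PT_LOAD program headers, so treat these helpers (hypothetical names) as an approximation of the layout described here, not of crash's reader.

```python
PAGE_SIZE = 4096  # assumption: x86_64 page size; the ELF header takes one page


def expected_vmcore_size(phys_mem_size: int) -> int:
    """Length of a complete netdump vmcore: one header page plus all of RAM."""
    return PAGE_SIZE + phys_mem_size


def vmcore_offset(paddr: int) -> int:
    """File offset holding physical address paddr (the header page comes first)."""
    return PAGE_SIZE + paddr


if __name__ == "__main__":
    print(expected_vmcore_size(16 << 30))  # complete dump of a 16GB system
    print(hex(vmcore_offset(0x10000084)))  # where the successful ist read lives
```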

Since the last fatal read attempt was at 20385a004 physical, the dumpfile
would have to be over 8GB (0x200000000) in length.  The "level4_pgt" page
table addresses shown in the full -d7 log are all in the 15GB region, so
I'm guessing that this system has ~16GB.  So I'm presuming that the
vmcore-incomplete is too small; just do an "ls -l" on it.
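The "ls -l" check can be expressed as a small Python helper that asks whether a given read falls inside the file. It keeps the one-page-header assumption from the netdump description above; `read_fits` and `check_vmcore` are hypothetical names. `os.path.getsize` returns the same byte count that `ls -l` shows.

```python
import os

PAGE_SIZE = 4096  # assumption: one-page ELF header, as in the netdump layout


def read_fits(vmcore_size: int, paddr: int, length: int) -> bool:
    """True if physical range [paddr, paddr + length) lies inside the file."""
    return PAGE_SIZE + paddr + length <= vmcore_size


def check_vmcore(path: str, paddr: int, length: int) -> bool:
    """Same check against a file on disk, using its size as ls -l reports it."""
    return read_fits(os.path.getsize(path), paddr, length)


if __name__ == "__main__":
    size = 4_800_000_000  # assumption: "4.8GB" taken as 4.8e9 bytes
    print(read_fits(size, 0x10000084, 56))   # the ist read that succeeded
    print(read_fits(size, 0x20385A004, 56))  # the ist read that failed
```

With a file of about 4.8GB, the first ist read lands well inside it and the second lands past 8GB, which matches the read error in the log.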

Comment 3 Eugene Teo (Security Response) 2007-03-26 15:01:52 UTC

Thanks for the analysis. I learnt a lot. Yes, the incomplete vmcore is only
4.8GB and I was expecting a 16GB vmcore.

Comment 4 Dave Anderson 2007-03-26 15:17:22 UTC

Yep, that's unfortunate...

Even if the crash code were hacked to skip the "ist" (interrupt stack)
initialization, it's doubtful that it would get much further, given
that the dumpfile contains only about a quarter of the physical memory.

For 32-bit x86 systems, you can often analyze vmcore-incomplete
files as long as they contain at least all of "lowmem", i.e., the
first 896MB.  You wouldn't be able to access module data, since
that typically gets vmalloc'd out of highmem, but the crash session
will initialize, and since all kernel stacks are in lowmem, you
could get backtraces for all tasks.  In fact, most commands work
just fine, since kernel static data, slab memory, etc. come out of
lowmem.  Highmem only contains user memory and vmalloc'd kernel
memory (mostly for modules).
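For comparison with the x86_64 case, here is a Python sketch of the 32-bit rule of thumb. It assumes the default i386 3G/1G split (PAGE_OFFSET at c0000000) with 896MB of directly mapped lowmem, and reuses the one-page netdump header assumption; kernels built with a different split or vmalloc reserve will have different boundaries. Both function names are hypothetical.

```python
from typing import Optional

# Assumed i386 layout with the default 3G/1G split.
PAGE_OFFSET_32 = 0xC0000000
LOWMEM_LIMIT = 896 << 20  # 896MB of directly mapped "lowmem"


def lowmem_paddr(vaddr: int) -> Optional[int]:
    """Physical address of a directly mapped lowmem kernel address, else None.

    None covers user addresses and the vmalloc area above lowmem, which
    cannot be translated by simple subtraction.
    """
    if PAGE_OFFSET_32 <= vaddr < PAGE_OFFSET_32 + LOWMEM_LIMIT:
        return vaddr - PAGE_OFFSET_32
    return None


def covers_lowmem(vmcore_size: int, page_size: int = 4096) -> bool:
    """Rule of thumb: the truncated dump must hold the header plus all of lowmem."""
    return vmcore_size >= page_size + LOWMEM_LIMIT


if __name__ == "__main__":
    print(hex(lowmem_paddr(0xC0100000)))  # a lowmem kernel address
    print(lowmem_paddr(0xF8000000))       # just past lowmem: not direct-mapped
    print(covers_lowmem(1 << 30))         # a 1GB vmcore-incomplete
```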

But on 64-bit systems, kernel data gets allocated from all over the
physical memory map, so even though this failure is just "ist" related,
crash would invariably bump into another piece of critical data if
that were ignored.