
Bug 1785126

Summary: dump-guest-memory failed due to Python Exception <class 'gdb.MemoryError'>
Product: Red Hat Enterprise Linux 8
Reporter: Xueqiang Wei <xuwei>
Component: gdb
Assignee: Kevin Buettner <kevinb>
gdb sub component: system-version
QA Contact: Michal Kolar <mkolar>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: chayang, coli, dsmith, gdb-bugs, jinzhao, juzhang, keiths, kevinb, lersek, marcandre.lureau, mcermak, ohudlick, qe-baseos-tools-bugs, virt-maint
Version: 8.2
Keywords: Bugfix, Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: gdb-8.2-13.el8
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1777751
: 1847164 1886602
Environment:
Last Closed: 2021-05-18 15:46:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1777751, 1842691
Bug Blocks: 1847164
Attachments:
Description                      Flags
Patch for dump-guest-memory.py   none

Comment 1 Xueqiang Wei 2019-12-19 08:45:18 UTC
Versions:
Host:
kernel-4.18.0-167.el8.x86_64
kernel-debuginfo-common-x86_64-4.18.0-167.el8.x86_64
kernel-debuginfo-4.18.0-167.el8.x86_64
kernel-debug-4.18.0-167.el8.x86_64
gdb-headless-8.2-8.el8.x86_64
gdbm-1.18-1.el8.x86_64
gdbm-libs-1.18-1.el8.x86_64
gdb-8.2-8.el8.x86_64
gcc-gdb-plugin-8.3.1-5.el8.x86_64
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc
spice-server-0.14.2-1.el8.x86_64
seavgabios-bin-1.12.0-5.module+el8.2.0+4793+b09dd2fb.noarch
seabios-1.12.0-5.module+el8.2.0+4793+b09dd2fb.x86_64
seabios-bin-1.12.0-5.module+el8.2.0+4793+b09dd2fb.noarch
edk2-ovmf-20190829git37eef91017ad-4.el8.noarch

Guest:
kernel-4.18.0-160.el8.x86_64


Did not hit it with seabios.
Hit it with edk2.

Comment 2 Xueqiang Wei 2020-01-15 06:38:28 UTC
With the same environment mentioned in Comment 1, tested with dump-guest-core=off.

If set dump-guest-core=off: 1. hit this issue with seabios  2. hit this issue with edk2

(gdb) source /usr/share/qemu-kvm/dump-guest-memory.py 
(gdb) set height 0
(gdb) dump-guest-memory /tmp/vmcore X86_64
guest RAM blocks:
target_start     target_end       host_addr        message count
---------------- ---------------- ---------------- ------- -----
0000000000000000 00000000000a0000 00007f9613e00000 added       1
00000000000c0000 00000000000ca000 00007f9613ec0000 added       2
00000000000ca000 00000000000cd000 00007f9613eca000 joined      2
00000000000cd000 00000000000e8000 00007f9613ecd000 joined      2
00000000000e8000 00000000000f0000 00007f9613ee8000 joined      2
00000000000f0000 0000000000100000 00007f9613ef0000 joined      2
0000000000100000 0000000080000000 00007f9613f00000 joined      2
00000000f4000000 00000000f8000000 00007f960fc00000 added       3
00000000f8000000 00000000fc000000 00007f960ba00000 added       4
00000000fd010000 00000000fd012000 00007f971ba00000 added       5
00000000fffc0000 0000000100000000 00007f971be00000 added       6
0000000100000000 0000000180000000 00007f9693e00000 added       7
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x7f970f87c000: 
Error occurred in Python command: Cannot access memory at address 0x7f970f87c000
(gdb) bt
#0  0x00007f9732829016 in ppoll () at /lib64/libc.so.6
#1  0x0000560d590351c5 in ppoll
    (__ss=0x0, __timeout=0x7ffd99034da0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  0x0000560d590351c5 in qemu_poll_ns
    (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=1675788861)
    at util/qemu-timer.c:334
#3  0x0000560d590360d5 in os_host_main_loop_wait (timeout=1675788861) at util/main-loop.c:233
#4  0x0000560d590360d5 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#5  0x0000560d58cfda57 in main_loop () at vl.c:1981
#6  0x0000560d58cfda57 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4820
(gdb)

Comment 3 Kevin Buettner 2020-02-11 01:00:26 UTC
Created attachment 1662350
Patch for dump-guest-memory.py

Comment 4 Kevin Buettner 2020-02-11 01:05:41 UTC
I was unable to create an environment for reproducing the bug on either RHEL 8.1 or RHEL 8.2. That said, I believe the problem exists in both 8.1 and 8.2; I just wasn't able to use virt-manager to create suitable VMs for testing.

I WAS able to reproduce the problem using Fedora 31.  Using virt-manager on Fedora 31, I created two VMs, one running RHEL 8.1 with firmware set to BIOS and the other with firmware set to OVMF (UEFI x86_64: /usr/share/edk2/ovmf/OVMF_CODE.fd). I then made these VMs run stand-alone, without the libvirt framework.  With that in place, I followed the instructions in the "Steps to reproduce" section of the bug report. I observed that the dump-guest-memory.py script was able to create a guest core dump from within GDB using a core dump from running the RHEL 8.1 BIOS VM.  When I tried the same thing using a core dump obtained from running the RHEL 8.1 OVMF VM, I observed the python exception noted in the bug report.

However, when I attempt to dump the guest memory from GDB that's attached to the running QEMU process (which is running the RHEL 8.1 OVMF VM), the dump-guest-memory script works as expected. After poking around some more, I confirmed that the memory region in question is absent from the core file, but was originally accessible when the process was running.

Again using GDB, I debugged the Linux kernel on the virtualization host. This kernel was responsible for making the core dump of QEMU running the RHEL 8.1 OVMF VM.  Specifically, I looked at vma_dump_size() in fs/binfmt_elf.c to see why the memory region in question was not being dumped. When I stepped through this chunk of code for the problematic memory region...

	/* Dump segments that have been written to.  */
	if (vma->anon_vma && FILTER(ANON_PRIVATE))
		goto whole;
	if (vma->vm_file == NULL)
		return 0;

...I found that the "return 0" statement was being taken, which in turn causes the region in question to not be dumped.
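
For illustration only: a region skipped this way should still be described in the core file by a PT_LOAD program header, but with a file size of 0 and a nonzero memory size. The sketch below is a minimal check of that, assuming a little-endian ELF64 core and fewer than 0xffff program headers; the names load_segments and classify are hypothetical and are not part of GDB, QEMU, or the attached patch.

```python
import struct

PT_LOAD = 1
# ELF64 file header and program header layouts (little-endian).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def load_segments(core):
    """Return (vaddr, filesz, memsz) for each PT_LOAD header in an ELF64 image.

    `core` is any bytes-like object (bytes, or an mmap of the core file).
    Simplification: the extended-numbering case (e_phnum == 0xffff) is not handled.
    """
    ehdr = ELF64_EHDR.unpack_from(core, 0)
    ident, phoff, phentsize, phnum = ehdr[0], ehdr[5], ehdr[9], ehdr[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    segments = []
    for i in range(phnum):
        fields = ELF64_PHDR.unpack_from(core, phoff + i * phentsize)
        p_type, vaddr, filesz, memsz = fields[0], fields[3], fields[5], fields[6]
        if p_type == PT_LOAD:
            segments.append((vaddr, filesz, memsz))
    return segments


def classify(segments, addr):
    """Say whether `addr` has contents in the file, was omitted, or is unmapped."""
    for vaddr, filesz, memsz in segments:
        if vaddr <= addr < vaddr + filesz:
            return "present"   # bytes for this address are stored in the core
        if vaddr <= addr < vaddr + memsz:
            return "omitted"   # mapped in the process, but not written to the core
    return "unmapped"
```

Mapping the core file with mmap and calling classify(load_segments(m), 0x7f970f87c000) would be one way to check the address from the error message; an "omitted" result would match the behavior described here.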

I took another look at the memory region (from within GDB attached to the running QEMU process running RHEL 8.1 OVMF) and found that it was all 0. It seems likely that the region in question had been allocated, but never written to, so I think it makes sense for the kernel to not dump this region.

That being the case, the memory region in question is not available to GDB or the dump-guest-memory.py script, running within GDB. Thus, the behavior of GDB, throwing the exception for inaccessible memory, is correct.

I've made a minor modification to dump-guest-memory.py which causes it to write out zeroes for inaccessible memory.  As noted above, in this case, the memory is inaccessible due to the kernel choosing to not include it in the QEMU core dump.  I've added this patch as an attachment.
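
For illustration only, the general shape of such a change can be sketched as follows. This is not the attached patch: read_or_zero and copy_region are hypothetical names, and the reader and exception type are passed in so the sketch runs outside GDB. Inside GDB one would presumably pass gdb.selected_inferior().read_memory and gdb.MemoryError.

```python
def read_or_zero(read_memory, addr, size, error_type):
    """Read `size` bytes at `addr`; substitute zeroes if the read fails.

    `read_memory(addr, size)` must return a bytes-like object and raise
    `error_type` when the memory is inaccessible.
    """
    try:
        return bytes(read_memory(addr, size))
    except error_type:
        # The region is missing from the core file; emit zero-filled bytes.
        return bytes(size)


def copy_region(read_memory, out, start, end, error_type, chunk_size=4096):
    """Copy [start, end) to the file-like `out` in chunks, zero-filling gaps."""
    addr = start
    while addr < end:
        size = min(chunk_size, end - addr)
        out.write(read_or_zero(read_memory, addr, size, error_type))
        addr += size
```

Handling the failure per chunk, rather than per region, keeps any readable part of a region intact and zero-fills only the chunks that cannot be read.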

In conclusion, this problem is not a GDB bug. It might be argued that it's a kernel bug, but as described earlier I think the kernel's behavior in this case makes sense. Therefore, I think the bug is actually in the dump-guest-memory.py script, which doesn't handle the case of an allocated (zeroed) but never-written-to region not being placed in the core dump.

Comment 5 Kevin Buettner 2020-02-11 04:10:06 UTC
(In reply to Xueqiang Wei from comment #2)
> With the same environment mentioned in Comment 1, tested with
> dump-guest-core=off.
> 
> If set dump-guest-core=off: 1. hit this issue with seabios  2. hit this
> issue with edk2

FWIW, dump-guest-memory.py won't work correctly with "dump-guest-core=off".  When the "off" setting is used, QEMU calls madvise(...,...,MADV_DONTDUMP) for the guest memory regions.  So, when you cause a core dump to occur, those pages won't be dumped, leading to all of the guest's memory being inaccessible.
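
For illustration only, the same kernel mechanism can be exercised from Python. This is a sketch of the madvise() advice itself, not of QEMU's code; exclude_from_core_dumps is a hypothetical name, and the sketch assumes Python 3.8 or later on Linux (elsewhere it reports that the advice is unavailable).

```python
import mmap


def exclude_from_core_dumps(length):
    """Map anonymous memory and ask the kernel to leave it out of core dumps.

    Returns (mapping, excluded). `excluded` is False where the Python build
    or the platform does not expose MADV_DONTDUMP.
    """
    region = mmap.mmap(-1, length)  # anonymous mapping
    advice = getattr(mmap, "MADV_DONTDUMP", None)
    if advice is None or not hasattr(region, "madvise"):
        return region, False
    region.madvise(advice)  # applies to the whole mapping
    return region, True
```

The mapping stays fully usable by the running process after this call; only a later core dump of the process omits its contents, which is why a script reading guest memory from such a core file finds it inaccessible.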

Comment 6 Laszlo Ersek 2020-02-11 08:56:18 UTC
Per Kevin Buettner's analysis in <https://bugzilla.redhat.com/show_bug.cgi?id=1785126#c4>, the host kernel's process coredump logic is what decides to omit the never-written-to region from the qemu process coredump. The python script should expect and handle this possibility.

Kevin proposed a patch in <https://bugzilla.redhat.com/show_bug.cgi?id=1785126#c3>; we should upstream it and backport it.

Moving this BZ back to qemu-kvm.

Comment 7 Marc-Andre Lureau 2020-02-14 17:47:24 UTC
Kevin, can you send the patch to qemu-devel? thanks

Comment 8 Kevin Buettner 2020-02-14 19:51:46 UTC
(In reply to Marc-Andre Lureau from comment #7)
> Kevin, can you send the patch to qemu-devel? thanks

Yes, I can (and will) do that.

Comment 9 Kevin Buettner 2020-02-14 23:32:02 UTC
(In reply to Marc-Andre Lureau from comment #7)
> Kevin, can you send the patch to qemu-devel? thanks

Done.  See:

https://lists.nongnu.org/archive/html/qemu-devel/2020-02/msg03875.html

Comment 12 Kevin Buettner 2020-03-05 00:55:46 UTC
I've done further investigations and have found that there is a bug in BFD and GDB.

I've filed an upstream bug which may be found here:

https://sourceware.org/bugzilla/show_bug.cgi?id=25631

A patch series fixing this bug can be found starting here:

https://sourceware.org/ml/gdb-patches/2020-03/msg00106.html

(Though it's likely that several iterations will be necessary for the series to be approved.)

I've tested a GDB build using my patches against a QEMU core file and found that the original dump-guest-memory script works as expected.

Comment 13 Laszlo Ersek 2020-03-05 13:56:55 UTC
Thank you, Kevin -- does that mean we should move this BZ (and bug 1777751) to a different component? Thanks.

Comment 14 Kevin Buettner 2020-03-05 14:04:01 UTC
(In reply to Laszlo Ersek from comment #13)
> Thank you, Kevin -- does that mean we should move this BZ (and bug 1777751)
> to a different component? Thanks.

Yes, they should both be moved to gdb.

Comment 15 Laszlo Ersek 2020-03-05 15:44:01 UTC
Moving to gdb per <https://bugzilla.redhat.com/show_bug.cgi?id=1785126#c14>. Thanks!

Comment 17 Keith Seitz 2020-06-15 19:37:14 UTC
Moving to 8.4.

Comment 22 Michal Kolar 2020-11-17 20:36:48 UTC
Reproduced against gdb-8.2-12.el8 and verified against both gdb-8.2-13.el8 and gdb-8.2-14.el8.

Comment 25 Michal Kolar 2020-11-23 09:02:08 UTC
Verified against gdb-8.2-14.el8.

Comment 27 errata-xmlrpc 2021-05-18 15:46:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (gdb bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1836