Bug 1714162 - [Hyper-V][RHEL7.6] kexec-tools: kdump saves vmcore failed with enabled dynamic memory and login graphical mode
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kexec-tools
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Kairui Song
QA Contact: Emma Wu
URL:
Whiteboard:
Depends On: 1718771
Blocks: 1661416
Reported: 2019-05-27 09:31 UTC by HuijingHei
Modified: 2020-09-21 08:27 UTC (History)

Fixed In Version: kexec-tools-2.0.15-33.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1718771
Environment:
Last Closed: 2019-08-06 12:55:18 UTC
Target Upstream Version:
Embargoed:




Links
System                             | ID             | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution)  | 4370361        | 0       | None     | None   | None    | 2019-08-24 17:25:18 UTC
Red Hat Product Errata             | RHBA-2019:2134 | 0       | None     | None   | None    | 2019-08-06 12:55:40 UTC

Description HuijingHei 2019-05-27 09:31:43 UTC
Description of problem:
On a Gen2 VM on Hyper-V with dynamic memory enabled, log in to graphical mode and trigger kdump: kdump fails to save the vmcore and leaves a vmcore-incomplete file.

Version-Release number of selected component (if applicable):
kernel 3.10.0-957.el7.x86_64
<Host> hyper-v windows
<Hyper-V Virtual Machine>
Generation: Gen2
Secure Boot: Disabled
The number of virtual CPUs: 4
Virtual memory: 4096MB (Dynamic memory enabled )

How reproducible: 80%


Steps to Reproduce:
1. Create a Hyper-V virtual machine as described above.

2. Install RHEL7.6(kernel: 3.10.0-957.el7.x86_64) with Software Selection: [Server with GUI], and Kdump: enabled

3. Reboot OS after installation.

4. Log in to RHEL in graphical mode and execute the following command.
   # echo c > /proc/sysrq-trigger

Actual results:
--- console log ---
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
Checking for memory holes                         : [  0.0 %] /
Checking for memory holes                         : [100.0 %] |
Excluding unnecessary pages                       : [100.0 %] \
Copying data                                      : [ 86.3 %] -           eta: 0s
[    6.633284] traps: makedumpfile[1237] general protection ip:7f682e343d69 sp:7ffd54fee178 error:0 in libc-2.17.so[7f682e1f0000+1c2000]
/lib/kdump-lib-initramfs.sh: line 86:  1237 Segmentation fault      $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
kdump: saving vmcore failed
[FAILED] Failed to start Kdump Vmcore Save Service.
--------------------
After the above console log, the guest reboots and a vmcore-incomplete file is present. Sometimes kdump also fails to start.

Expected results:
kdump should save the vmcore successfully.


Additional info:
1. After changing the VM memory to 8G, the issue does not occur.
2. RHEL8 does not have the same issue.

Comment 2 Kairui Song 2019-05-27 15:32:33 UTC
This looks like a similar issue with:
https://bugzilla.redhat.com/show_bug.cgi?id=1644600

On the same VM, if you use latest RHEL-8.1 instead, is it still reproducible?

Comment 3 HuijingHei 2019-05-28 05:44:24 UTC
(In reply to Kairui Song from comment #2)
> This looks like a similar issue with:
> https://bugzilla.redhat.com/show_bug.cgi?id=1644600

For rhel7.6, using set-vmmemory can also result in vmcore-incomplete and similar console logs. Seems the same issue on rhel8.0(kexec-tools-2.0.15-21.el7_6.3.x86_64)

> 
> On the same VM, if you use latest RHEL-8.1 instead, is it still reproducible?
No, the issue does not exist on RHEL-8.1(20190523.0) with kexec-tools-2.0.19-3.el8.x86_64

Comment 5 David Hildenbrand 2019-05-29 12:19:41 UTC
Upstream: PG_offline essentially replaces PG_balloon.

However, in RHEL7, PG_balloon is still needed for other purposes: balloon compaction

We could

a) Backport b1123ea6d3b3d ("mm: balloon: use general non-lru movable page feature") and friends, to free up PG_balloon

b) Introduce a new MAPCOUNT value for PG_offline downstream, letting it co-exist with PG_balloon

c) Let it remain broken in RHEL7

I *guess* b) would be more feasible than a). I suspect that a) is quite involved.

Comment 6 Kairui Song 2019-06-10 08:01:03 UTC
(In reply to David Hildenbrand from comment #5)
> Upstream: PG_offline essentially replaces PG_balloon.
> 
> However, in RHEL7, PG_balloon is still needed for other purposes: balloon
> compaction
> 
> We could
> 
> a) Backport b1123ea6d3b3d ("mm: balloon: use general non-lru movable page
> feature") and friends, to free up PG_balloon
> 
> b) Introduce a new MAPCOUNT value for PG_offline downstream, letting it
> co-exist with PG_balloon
> 
> c) Let it remain broken in RHEL7
> 
> I *guess* b) would be more feasible than a). I suspect that a) is quite
> involved.

Thanks, I agree that plan b) is a feasible solution. I've cloned a bug for the kernel fix, bz1718771. Will you implement it for RHEL-7?

Comment 7 David Hildenbrand 2019-06-11 10:37:18 UTC
The kexec-tools backport should be pretty easy I assume.

I'll have a look at the 7.7? backport and let you know when I run into issues.

Comment 8 David Hildenbrand 2019-06-18 13:16:33 UTC
Testing with virtio-balloon without balloon compaction, not with Hyper-V, leaving that to the experts. To test with virtio-balloon, a special kernel build is required (CONFIG_BALLOON_COMPACTION=n).

-> Task info: https://brewweb.devel.redhat.com/taskinfo?taskID=22226003

[cloud-user@rhel7 ~]$ uname -a
Linux rhel7 3.10.0-1057.el7.test.x86_64 #1 SMP Tue Jun 18 07:21:55 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux

[root@rhel7 cloud-user]# grep "BALLOON_COMPACTION" /boot/config-3.10.0-1057.el7.test.x86_64
# CONFIG_BALLOON_COMPACTION is not set



1. Start a guest with 8GB of memory, modified kernel and custom built "makedumpfile" installed.

[cloud-user@rhel7 ~]$ cat /proc/meminfo 
MemTotal:        8008712 kB
MemFree:         7683636 kB
MemAvailable:    7630212 kB
Buffers:            2088 kB

2. Inflate the balloon (notice that the crashkernel area also consumes memory)

[dhildenb@virtlab412 ~]$ echo "balloon 700" | sudo nc -U /var/tmp/monitor
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) balloon 700
[dhildenb@virtlab412 ~]$ echo "info balloon" | sudo nc -U /var/tmp/monitor
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) info balloon
balloon: actual=700

[cloud-user@rhel7 ~]$ cat /proc/meminfo 
MemTotal:         336904 kB
MemFree:          136264 kB
MemAvailable:      23600 kB
Buffers:             724 kB

3. Modify /etc/kdump.conf to display verbose information when dumping

-> core_collector makedumpfile -l --message-level 31 -d 31
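
For readers unfamiliar with the "-d 31" value in the core_collector line above: the dump level is a bitmask in which each bit selects a page type to leave out of the dump. The sketch below decodes a dump level using the bit meanings listed in the makedumpfile(8) man page. The dictionary and function names are hypothetical, chosen only for illustration.

```python
# Bit meanings follow the dump-level table in makedumpfile(8).
# DUMP_LEVEL_BITS and excluded_page_types are illustrative names, not part of any tool.
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "non-private cache pages",
    4: "private cache pages",
    8: "user process data pages",
    16: "free pages",
}


def excluded_page_types(dump_level):
    """Return the page types that a given -d value (0..31) excludes."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in DUMP_LEVEL_BITS.items() if dump_level & bit]


if __name__ == "__main__":
    # -d 31 sets all five bits, so every excludable page type is dropped.
    for name in excluded_page_types(31):
        print(name)
```

Offline pages are not part of this bitmask; in the report further down they appear as their own "Offline pages" line.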

4. Restart kdump

[guest] $ systemctl restart kdump

5. Trigger a kernel crash

[guest] $ echo 1 > /proc/sys/kernel/sysrq
[guest] $ echo c > /proc/sysrq-trigger


Guest restarts into kdump kernel and performs the dump. Being quick to capture the output:

Original pages  : 0x00000000001f7514
  Excluded pages   : 0x00000000001e9150
    Pages filled with zero  : 0x0000000000006bad
    Non-private cache pages : 0x000000000000370a
    Private cache pages     : 0x000000000000000f
    User process data pages : 0x0000000000002a10
    Free pages              : 0x000000000000807a
    Hwpoison pages          : 0x0000000000000000
    Offline pages           : 0x00000000001d4400
  Remaining pages  : 0x000000000000e3c4
  (The number of pages is reduced to 2%.)
Memory Hole     : 0x0000000000048aec
--------------------------------------------------
Total pages     : 0x0000000000240000
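
The counters in this report can be cross-checked against each other. The sketch below copies the hexadecimal values from the output above and verifies that the per-type exclusions add up to "Excluded pages", that original minus excluded equals "Remaining pages", and that original plus "Memory Hole" equals "Total pages". The variable and function names are hypothetical, and the integer truncation used for the percentage is an assumption about how the tool rounds.

```python
# Hexadecimal page counts copied from the makedumpfile report above.
original_pages = 0x1F7514
excluded_pages = 0x1E9150
remaining_pages = 0xE3C4
memory_hole = 0x48AEC
total_pages = 0x240000

excluded_by_type = {
    "zero": 0x6BAD,
    "non_private_cache": 0x370A,
    "private_cache": 0xF,
    "user_data": 0x2A10,
    "free": 0x807A,
    "hwpoison": 0x0,
    "offline": 0x1D4400,
}


def report_is_consistent():
    """Check the three relations between the report's counters."""
    return (
        sum(excluded_by_type.values()) == excluded_pages
        and original_pages - excluded_pages == remaining_pages
        and original_pages + memory_hole == total_pages
    )


# Assumption: the tool truncates the percentage to an integer.
percent_remaining = remaining_pages * 100 // original_pages

if __name__ == "__main__":
    print(report_is_consistent(), percent_remaining)
```

With these values all three relations hold, and the truncated percentage agrees with the "reduced to 2%" line.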


8008712 kB - 336904 kB = 7671808 kB == 1917952 pages (4 kB each) == 0x1D4400 pages

-> All inflated pages (offline) got excluded.
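
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes 4 kB base pages (the usual x86_64 page size); the function and constant names are hypothetical.

```python
PAGE_SIZE_KB = 4  # assumption: 4 kB base pages on x86_64

# MemTotal values from the two /proc/meminfo snapshots above.
mem_total_before_kb = 8008712
mem_total_after_kb = 336904


def inflated_pages(before_kb, after_kb, page_size_kb=PAGE_SIZE_KB):
    """Convert the drop in MemTotal into a number of ballooned pages."""
    delta_kb = before_kb - after_kb
    if delta_kb % page_size_kb:
        raise ValueError("memory delta is not a whole number of pages")
    return delta_kb // page_size_kb


pages = inflated_pages(mem_total_before_kb, mem_total_after_kb)

if __name__ == "__main__":
    print(pages, hex(pages))
```

The result matches the "Offline pages" counter in the makedumpfile report, which is what shows that every inflated page was excluded.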


A similar approach will also work for testing under Hyper-V (inflate the balloon differently: enable dynamic memory).

In contrast to RHEL8, this will *not* work with
- XEN balloon - XEN patch to mark pages offline is not included
- virtio-balloon (with CONFIG_BALLOON_COMPACTION=y) - Pages are marked PageOffline() and PageBalloon() -> kdump cannot handle this yet.

So this really is only to fix Hyper-V.

Comment 10 Kairui Song 2019-06-19 03:23:44 UTC
Hi David,

Thanks for the work! I've backported your "[PATCH] exclude pages that are logically offline". But to get the patch merged, we need three acks and the blocker flag for RHEL-7.7.

Can you also give devel_ack to the kernel bug? The dependency issue arises because that bug was cloned, and clones get a default dependency. I'll fix that.

Comment 24 errata-xmlrpc 2019-08-06 12:55:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2134

