Bug 1714162
| Summary: | [Hyper-V][RHEL7.6] kexec-tools: kdump saves vmcore failed with enabled dynamic memory and login graphical mode | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | HuijingHei <hhei> |
| Component: | kexec-tools | Assignee: | Kairui Song <kasong> |
| Status: | CLOSED ERRATA | QA Contact: | Emma Wu <xiawu> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.6 | CC: | bhsharma, bhu, boyang, dhildenb, hhei, kasong, ldu, leiwang, ruyang, xialiu, xiaofwan, xiawu, xuli, yacao, yzheng |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kexec-tools-2.0.15-33.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1718771 | Environment: | |
| Last Closed: | 2019-08-06 12:55:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1718771 | | |
| Bug Blocks: | 1661416 | | |
Description
HuijingHei
2019-05-27 09:31:43 UTC
This looks like a similar issue with:
https://bugzilla.redhat.com/show_bug.cgi?id=1644600

On the same VM, if you use latest RHEL-8.1 instead, is it still reproducible?

(In reply to Kairui Song from comment #2)
> This looks like a similar issue with:
> https://bugzilla.redhat.com/show_bug.cgi?id=1644600

For rhel7.6, with set-vmmemory can also result to vmcore-incomplete and similar console logs. Seems the same issue on rhel8.0(kexec-tools-2.0.15-21.el7_6.3.x86_64)

> On the same VM, if you use latest RHEL-8.1 instead, is it still reproducible?

No, the issue does not exist on RHEL-8.1(20190523.0) with kexec-tools-2.0.19-3.el8.x86_64

Upstream: PG_offline essentially replaces PG_balloon.
However, in RHEL7, PG_balloon is still needed for other purposes: balloon compaction
We could
a) Backport b1123ea6d3b3d ("mm: balloon: use general non-lru movable page feature") and friends, to free up PG_balloon
b) Introduce a new MAPCOUNT value for PG_offline downstream, letting it co-exist with PG_balloon
c) Let it remain broken in RHEL7
I *guess* b) would be more feasible than a). I suspect that a) is quite involved.
(In reply to David Hildenbrand from comment #5)
> Upstream: PG_offline essentially replaces PG_balloon.
>
> However, in RHEL7, PG_balloon is still needed for other purposes: balloon
> compaction
>
> We could
>
> a) Backport b1123ea6d3b3d ("mm: balloon: use general non-lru movable page
> feature") and friends, to free up PG_balloon
>
> b) Introduce a new MAPCOUNT value for PG_offline downstream, letting it
> co-exist with PG_balloon
>
> c) Let it remain broken in RHEL7
>
> I *guess* b) would be more feasible than a). I suspect that a) is quite
> involved.

Thanks, I agree plan b is a feasible solution. I've cloned a bug for the kernel fix, bz1718771; will you implement it for RHEL-7? The kexec-tools backport should be pretty easy, I assume.

I'll have a look at the 7.7? backport and let you know when I run into issues.

Testing with virtio-balloon without balloon compaction, not with Hyper-V, leaving that to the experts.

To test with virtio-balloon, a special kernel build is required (CONFIG_BALLOON_COMPACTION=n).
-> Task info: https://brewweb.devel.redhat.com/taskinfo?taskID=22226003

[cloud-user@rhel7 ~]$ uname -a
Linux rhel7 3.10.0-1057.el7.test.x86_64 #1 SMP Tue Jun 18 07:21:55 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@rhel7 cloud-user]# grep "BALLOON_COMPACTION" /boot/config-3.10.0-1057.el7.
config-3.10.0-1057.el7.test.x86_64
[root@rhel7 cloud-user]# grep "BALLOON_COMPACTION" /boot/config-3.10.0-1057.el7.test.x86_64
# CONFIG_BALLOON_COMPACTION is not set

1. Start a guest with 8GB of memory, the modified kernel, and a custom-built "makedumpfile" installed.

[cloud-user@rhel7 ~]$ cat /proc/meminfo
MemTotal: 8008712 kB
MemFree: 7683636 kB
MemAvailable: 7630212 kB
Buffers: 2088 kB

2. Inflate the balloon (notice that the crashkernel area also consumes memory)

[dhildenb@virtlab412 ~]$ echo "balloon 700" | sudo nc -U /var/tmp/monitor
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) balloon 700
[dhildenb@virtlab412 ~]$ echo "info balloon" | sudo nc -U /var/tmp/monitor
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) info balloon
balloon: actual=700

[cloud-user@rhel7 ~]$ cat /proc/meminfo
MemTotal: 336904 kB
MemFree: 136264 kB
MemAvailable: 23600 kB
Buffers: 724 kB

3. Modify /etc/kdump.conf to display verbose information when dumping
-> core_collector makedumpfile -l --message-level 31 -d 31

4. Restart kdump
[guest] $ systemctl restart kdump

5. Trigger a kernel crash
[guest] $ echo 1 > /proc/sys/kernel/sysrq
[guest] $ echo c > /proc/sysrq-trigger

Guest restarts into kdump kernel and performs the dump. Being quick to capture the output:

Original pages : 0x00000000001f7514
Excluded pages : 0x00000000001e9150
Pages filled with zero : 0x0000000000006bad
Non-private cache pages : 0x000000000000370a
Private cache pages : 0x000000000000000f
User process data pages : 0x0000000000002a10
Free pages : 0x000000000000807a
Hwpoison pages : 0x0000000000000000
Offline pages : 0x00000000001d4400
Remaining pages : 0x000000000000e3c4
(The number of pages is reduced to 2%.)
Memory Hole : 0x0000000000048aec
--------------------------------------------------
Total pages : 0x0000000000240000

8008712 kB - 336904 kB = 7671808 kB == 1917952 pages == 0x1D4400 pages
-> All inflated pages (offline) got excluded.

A similar approach will also work for testing under Hyper-V (inflate the balloon differently - enable dynamic memory).

In contrast to RHEL8, this will *not* work with
- XEN balloon
  - XEN patch to mark pages offline is not included
- virtio-balloon (with CONFIG_BALLOON_COMPACTION=y)
  - Pages are marked PageOffline() and PageBalloon() -> kdump cannot handle this yet.

So this really is only to fix Hyper-V.

Hi David,

Thanks for the work! I've backported your "[PATCH] exclude pages that are logically offline". But to get the patch merged we need three acks and the blocker flag for RHEL-7.7. Can you also give devel_ack to the kernel bug? The dependency issue is because that bug was cloned, which adds a default dependency; I'll fix that.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2134
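
For anyone re-running the virtio-balloon test above, the page accounting in the makedumpfile report can be cross-checked with a short script. This is only a sketch using the figures quoted in this report: it assumes a 4 KiB page size (which is what the 1917952-page figure implies), and the variable and key names are illustrative rather than taken from makedumpfile.

```python
# Figures copied from the /proc/meminfo and makedumpfile output in this report.
PAGE_KB = 4  # assumption: 4 KiB pages (x86_64 default)

mem_total_before_kb = 8008712  # MemTotal before inflating the balloon
mem_total_after_kb = 336904    # MemTotal after "balloon 700"

report = {
    "original": 0x1F7514,
    "excluded": 0x1E9150,
    "zero": 0x6BAD,
    "non_private_cache": 0x370A,
    "private_cache": 0x000F,
    "user_data": 0x2A10,
    "free": 0x807A,
    "hwpoison": 0x0,
    "offline": 0x1D4400,
    "remaining": 0xE3C4,
    "memory_hole": 0x48AEC,
    "total": 0x240000,
}

# Memory taken by the balloon, expressed in pages.
inflated_kb = mem_total_before_kb - mem_total_after_kb
inflated_pages = inflated_kb // PAGE_KB

# The per-category exclusion counts should add up to "Excluded pages".
excluded_sum = sum(
    report[key]
    for key in ("zero", "non_private_cache", "private_cache",
                "user_data", "free", "hwpoison", "offline")
)

checks = {
    "offline pages == inflated pages": report["offline"] == inflated_pages,
    "categories sum to excluded": excluded_sum == report["excluded"],
    "original - excluded == remaining":
        report["original"] - report["excluded"] == report["remaining"],
    "original + hole == total":
        report["original"] + report["memory_hole"] == report["total"],
}

# Integer percentage of pages left in the dump, as makedumpfile reports it.
percent = 100 * report["remaining"] // report["original"]

for name, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")
print(f"inflated: {inflated_kb} kB = {inflated_pages} pages = {inflated_pages:#x}")
print(f"remaining: {percent}% of original pages")
```

If the "offline pages == inflated pages" check fails on another setup, the first things to look at are the page size assumption and whether MemTotal was sampled before and after the balloon had fully settled.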