Bug 1181649
Summary: makedumpfile: User process data pages are not excluded appropriately.
Product: Red Hat Enterprise Linux 7
Component: kexec-tools
Version: 7.2
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: Tetsuo Handa <penguin-kernel>
Assignee: kdump team <kdump-team-bugs>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: anderson, bhe, fernando, mhuang, penguin-kernel
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-03-30 08:05:13 UTC
Description
Tetsuo Handa
2015-01-13 14:29:22 UTC
I don't maintain makedumpfile, but I'm a member of the "kdump team" and noticed this BZ go by. I do own/maintain the crash utility, and since your results were certainly surprising, I thought I'd just try to reproduce it in a current RHEL7 environment.

But running on a 3.10.0-221.el7 kernel with kexec-tools-2.0.7-14.el7, I could not reproduce this. I tested it with the same sequence of commands shown in your description, and the resultant dumpfile sizes are essentially the same, and the excluded page type counts are all in the same ballpark. I also tried it with lzo compression, which is typically much better than using the traditional zlib compression, and we use it by default now. Here are my results:

Using zlib compression:

memfill 0:
kdump: dump target is /dev/mapper/rhel_dell--per410--01-root
kdump: saving to /sysroot//var/crash/127.0.0.1-2015.01.14-11:37:50/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
STEP [Excluding unnecessary pages] : 0.117448 seconds
STEP [Excluding unnecessary pages] : 0.117090 seconds
STEP [Copying data ] : 4.732822 seconds
STEP [Copying data ] : 0.135796 seconds
Original pages : 0x00000000003f6511
Excluded pages : 0x00000000003d9b0a
Pages filled with zero : 0x000000000000b0ae
Cache pages : 0x0000000000001e8c
Cache pages + private : 0x0000000000000001
User process data pages : 0x000000000006749e
Free pages : 0x0000000000365731
Hwpoison pages : 0x0000000000000000
Remaining pages : 0x000000000001ca07
(The number of pages is reduced to 2%.)
Memory Hole : 0x0000000000039aef
--------------------------------------------------
Total pages : 0x0000000000430000
kdump: saving vmcore complete

memfill 1:
kdump: dump target is /dev/mapper/rhel_dell--per410--01-root
kdump: saving to /sysroot//var/crash/127.0.0.1-2015.01.14-15:36:13/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
STEP [Excluding unnecessary pages] : 0.114320 seconds
STEP [Excluding unnecessary pages] : 0.113896 seconds
STEP [Copying data ] : 4.505355 seconds
STEP [Copying data ] : 0.103653 seconds
Original pages : 0x00000000003f6511
Excluded pages : 0x00000000003db0d8
Pages filled with zero : 0x000000000000b725
Cache pages : 0x0000000000001e34
Cache pages + private : 0x0000000000000001
User process data pages : 0x0000000000066e41
Free pages : 0x0000000000366d3d
Hwpoison pages : 0x0000000000000000
Remaining pages : 0x000000000001b439
(The number of pages is reduced to 2%.)
Memory Hole : 0x0000000000039aef
--------------------------------------------------
Total pages : 0x0000000000430000
kdump: saving vmcore complete

And the two dumpfile sizes from above are relatively close:

$ du -sh 127.0.0.1-2015.01.14-11:37:50/vmcore
444M 127.0.0.1-2015.01.14-11:37:50/vmcore
# du -sh 127.0.0.1-2015.01.14-15:36:13/vmcore
422M 127.0.0.1-2015.01.14-15:36:13/vmcore
#

Using lzo compression (makedumpfile -l), the compression is much more effective, but the results are still the same:

memfill 0:
kdump: dump target is /dev/mapper/rhel_dell--per410--01-root
kdump: saving to /sysroot//var/crash/127.0.0.1-2015.01.14-15:51:59/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
STEP [Excluding unnecessary pages] : 0.115406 seconds
STEP [Excluding unnecessary pages] : 0.115366 seconds
STEP [Copying data ] : 1.130029 seconds
STEP [Copying data ] : 0.034387 seconds
Original pages : 0x00000000003f6511
Excluded pages : 0x00000000003da16b
Pages filled with zero : 0x000000000000af6c
Cache pages : 0x0000000000001e3a
Cache pages + private : 0x0000000000000001
User process data pages : 0x00000000000670c7
Free pages : 0x00000000003662fd
Hwpoison pages : 0x0000000000000000
Remaining pages : 0x000000000001c3a6
(The number of pages is reduced to 2%.)
Memory Hole : 0x0000000000039aef
--------------------------------------------------
Total pages : 0x0000000000430000
kdump: saving vmcore complete

memfill 1:
kdump: dump target is /dev/mapper/rhel_dell--per410--01-root
kdump: saving to /sysroot//var/crash/127.0.0.1-2015.01.14-15:57:57/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
STEP [Excluding unnecessary pages] : 0.114234 seconds
STEP [Excluding unnecessary pages] : 0.114295 seconds
STEP [Copying data ] : 1.061543 seconds
STEP [Copying data ] : 0.036045 seconds
Original pages : 0x00000000003f6511
Excluded pages : 0x00000000003daf44
Pages filled with zero : 0x000000000000b6f7
Cache pages : 0x0000000000001e2d
Cache pages + private : 0x0000000000000042
User process data pages : 0x0000000000067295
Free pages : 0x0000000000366749
Hwpoison pages : 0x0000000000000000
Remaining pages : 0x000000000001b5cd
(The number of pages is reduced to 2%.)
Memory Hole : 0x0000000000039aef
--------------------------------------------------
Total pages : 0x0000000000430000
kdump: saving vmcore complete

Note that the dumpfile size is reduced to about a quarter of the size of the zlib dumpfiles:

# du -sh 127.0.0.1-2015.01.14-15:51:59/vmcore 127.0.0.1-2015.01.14-15:57:57/vmcore
98M 127.0.0.1-2015.01.14-15:51:59/vmcore
95M 127.0.0.1-2015.01.14-15:57:57/vmcore
#

I checked the makedumpfile sources, and see that user-space, page cache, and free pages will be checked for and filtered first. Only then are zero-filled pages checked. So the zero-fill check should never even "see" the memfill 0 or 1 memory pages, because they would be recognized as user-space pages first.
Furthermore, given that the memfill program fills 1610612736 bytes (0x60000 pages) with 0/1, the dumpfile statistics should show at least that many user process pages. And in my tests above, the user process page counts were as expected, regardless of whether they were filled with 0 or 1:

User process data pages : 0x000000000006749e
User process data pages : 0x0000000000066e41
User process data pages : 0x00000000000670c7
User process data pages : 0x0000000000067295

The strange thing about your results for memfill 0 is that you see 0x615e2 zero-filled pages, and only 0x30c4 user space pages, which doesn't make sense:

> ---------- messages for filled with 0 case ----------
> kdump: saving vmcore
> STEP [Excluding unnecessary pages] : 0.017042 seconds
> STEP [Excluding unnecessary pages] : 0.011780 seconds
> STEP [Copying data ] : 1.856561 seconds
>
> Original pages : 0x00000000000711bf
> Excluded pages : 0x0000000000068d16
> Pages filled with zero : 0x00000000000615e2
> Cache pages : 0x00000000000008b6
> Cache pages + private : 0x0000000000000000
> User process data pages : 0x00000000000030c4
> Free pages : 0x0000000000003dba
> Hwpoison pages : 0x0000000000000000
> Remaining pages : 0x00000000000084a9
> (The number of pages is reduced to 7%.)
> Memory Hole : 0x000000000000ee41

It almost looks like the user-space pages were not recognized as such, and were subsequently recognized as zero-filled pages?
The same thing goes for your memfill 1 case, because again, why is the user space page count so low?:

> ---------- messages for filled with 1 case ----------
> kdump: saving vmcore
> STEP [Excluding unnecessary pages] : 0.014904 seconds
> STEP [Excluding unnecessary pages] : 0.013190 seconds
> STEP [Copying data ] : 20.808219 seconds
>
> Original pages : 0x00000000000711bf
> Excluded pages : 0x0000000000009959
> Pages filled with zero : 0x0000000000001d8b
> Cache pages : 0x0000000000000b35
> Cache pages + private : 0x0000000000000000
> User process data pages : 0x000000000000360d
> Free pages : 0x0000000000003a8c
> Hwpoison pages : 0x0000000000000000
> Remaining pages : 0x0000000000067866
> (The number of pages is reduced to 91%.)
> Memory Hole : 0x000000000000ee41
> --------------------------------------------------
> Total pages : 0x0000000000080000
>
> kdump: saving vmcore complete

So it appears that all of your memfill pages were captured in both dumpfiles instead of being filtered as user space pages. And given that, it makes sense that the "memfill 1" dump would be much larger in size, because the 0x615e2 zero-fill pages all share/point to a single zero-filled dumpfile page, whereas all the "memfill 1" pages would each have their own compressed page in the dumpfile.

Thank you for testing.

I confirmed that updating the kexec-tools package to 2.0.7-13.el7 fixes this problem in RHEL 7. Note that the results below used the default configuration (i.e. the size difference is smaller than in #1 because the compression option is enabled).
kexec-tools-2.0.4-32.el7_0.5.x86_64.rpm
33720 /var/crash/127.0.0.1-2015.01.15-10:49:03 (filled with 0 case)
48812 /var/crash/127.0.0.1-2015.01.15-10:50:21 (filled with 1 case)

kexec-tools-2.0.7-13.el7.x86_64.rpm
25288 /var/crash/127.0.0.1-2015.01.15-10:58:48 (filled with 0 case)
25056 /var/crash/127.0.0.1-2015.01.15-10:59:45 (filled with 1 case)

kexec-tools-2.0.0-280.el6.x86_64.rpm
24096 /var/crash/127.0.0.1-2015-01-15-11:32:25 (filled with 0 case)
38196 /var/crash/127.0.0.1-2015-01-15-11:33:23 (filled with 1 case)

Therefore, I'd like to wait for an updated kexec-tools package in RHEL 6.
(Would you change product selection from RHEL 7 to RHEL 6?)

> Therefore, I'd like to wait for updated kexec-tools package in RHEL 6.
> (Would you change product selection from RHEL 7 to RHEL 6?)

As I mentioned in comment #1, I am not the makedumpfile maintainer and therefore I prefer not to get in the way. I also prefer not to get involved in bugzilla flag/version modifications. I would suggest closing this bugzilla and opening a new RHEL6 bugzilla, and including the updated details that you see on a RHEL6 system.

That being said, I also tried this test on a RHEL6 machine running 2.6.32-504.el6 along with kexec-tools-2.0.0-280.el6, and I see something similar:

66M /var/crash/127.0.0.1-2015-01-15-11:11:36/vmcore (memfill 0)
81M /var/crash/127.0.0.1-2015-01-15-11:29:50/vmcore (memfill 1)

The size difference is fairly trivial, but the "memfill 1" dumpfile is almost 25% larger, which is kind of interesting.

However, what is more interesting is that in RHEL6, there is the possibility of user pages getting written to the dumpfile when transparent hugepages are enabled. When that happens, I suspect that the user-page filtering does not work because the page flags used to identify user pages are only seen in the first "head" page, and not seen in any of the "tail" pages.

For example, using the memfill program, I can read many of the user pages in the "malloc" region.
Here 7f10ee800000 and 7f10ee801000 are in that region. I cannot read the first one, but I can read the second and subsequent pages in the transparent hugepage:

crash> rd 7f10ee800000
rd: page excluded: user virtual address: 7f10ee800000 type: "64-bit UVADDR"
crash> rd 7f10ee801000
7f10ee801000: 0101010101010101 ........
crash> rd 7f10ee802000
7f10ee802000: 0101010101010101 ........
crash> rd 7f10ee803000
7f10ee803000: 0101010101010101 ........
crash>

Makedumpfile recognizes that 7f10ee800000 is a user page by the page flags. That page also has the "head" flag set, because it is the first 4K page in a 2MB transparent hugepage:

crash> vtop 7f10ee800000
VIRTUAL PHYSICAL
7f10ee800000 115000000
PML: 11e1eb7f0 => 11ee1e067
PUD: 11ee1e218 => 11e7b2067
PMD: 11e7b2ba0 => 80000001150000e7
PAGE: 115000000 (2MB)
PTE PHYSICAL FLAGS
80000001150000e7 115000000 (PRESENT|RW|USER|ACCESSED|DIRTY|PSE|NX)
VMA START END FLAGS FILE
ffff88011db84250 7f10ee6fa000 7f114e6fb000 100073
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0003c98000 115000000 ffff88011e118fd9 7f10ee800 1 c0000000104068 uptodate,lru,active,head,swapbacked
crash>

But the subsequent page(s) in the hugepage only have the "tail" flag set, so the RHEL6 version of makedumpfile (makedumpfile-1.3.5 based) apparently didn't recognize them as user-space pages:

crash> vtop 7f10ee801000
VIRTUAL PHYSICAL
7f10ee801000 115001000
PML: 11e1eb7f0 => 11ee1e067
PUD: 11ee1e218 => 11e7b2067
PMD: 11e7b2ba0 => 80000001150000e7
PAGE: 115000000 (2MB)
PTE PHYSICAL FLAGS
80000001150000e7 115000000 (PRESENT|RW|USER|ACCESSED|DIRTY|PSE|NX)
VMA START END FLAGS FILE
ffff88011db84250 7f10ee6fa000 7f114e6fb000 100073
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0003c98038 115001000 0 7f33ae502 0 c0000000008000 tail
crash>

Now, I see the same kernel page-flag behavior in RHEL7, so it would seem that the version of makedumpfile used in RHEL7 (makedumpfile-1.5.6 based) has support to recognize "tail" pages of transparent hugepages.
But a quick check of the sources shows that RHEL6 also seems to have head/tail recognition code in place as well, so I don't know exactly why it's not recognizing the tail pages. Maybe there was a patch to that code area that was only put in RHEL7? I have no idea. In any case, I leave that to the makedumpfile maintainer to check...

Correction: with respect to my speculation about page.flags, as it turns out, they are not even used for filtering user pages. In both RHEL6 and RHEL7 versions of makedumpfile, the user page check does not look at the page.flags, but rather the contents of the page.mapping address:

    /*
     * Exclude the data page of the user process.
     */
    else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
        && isAnon(mapping)) {
            if (clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle))
                    pfn_user++;
    }

And in both RHEL6 and RHEL7, isAnon() is identical:

    static inline int
    isAnon(unsigned long mapping)
    {
            return ((unsigned long)mapping & PAGE_MAPPING_ANON) != 0;
    }

where PAGE_MAPPING_ANON is the same as the kernel:

    #define PAGE_MAPPING_ANON (1)

where the 1-bit in the page.mapping address is "borrowed" for use as a flag.

So looking at the RHEL6 examples in my last comment, you can see the 1-bit set in the "MAPPING" address of the first 4K page of the transparent hugepage:

PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0003c98000 115000000 ffff88011e118fd9 7f10ee800 1 c0000000104068 uptodate,lru,active,head,swapbacked

But the second and subsequent 4K pages in the hugepage have NULL page.mapping pointers:

PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0003c98038 115001000 0 7f33ae502 0 c0000000008000 tail

So upon first glance, that would seem to be the reason that the second and subsequent user pages were not filtered.
However, in RHEL7, the page.mapping values are similar in nature, for example, here for 7fe54f000000 (head) and 7fe54f001000 (tail):

crash> vtop 7fe54f000000
VIRTUAL PHYSICAL
7fe54f000000 3f2800000
PML: 42293d7f8 => 41f9a0067
PUD: 41f9a0ca8 => 42288b067
PMD: 42288b3c0 => 80000003f28000e7
PAGE: 3f2800000 (2MB)
PTE PHYSICAL FLAGS
80000003f28000e7 3f2800000 (PRESENT|RW|USER|ACCESSED|DIRTY|PSE|NX)
VMA START END FLAGS FILE
ffff88041f9646c0 7fe52a1e5000 7fe58a1e6000 100073
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea000fca0000 3f2800000 ffff88041f8f4f01 7fe54f000 1 2fffff00084068 uptodate,lru,active,head,swapbacked
crash> vtop 7fe54f001000
VIRTUAL PHYSICAL
7fe54f001000 3f2801000
PML: 42293d7f8 => 41f9a0067
PUD: 41f9a0ca8 => 42288b067
PMD: 42288b3c0 => 80000003f28000e7
PAGE: 3f2800000 (2MB)
PTE PHYSICAL FLAGS
80000003f28000e7 3f2800000 (PRESENT|RW|USER|ACCESSED|DIRTY|PSE|NX)
VMA START END FLAGS FILE
ffff88041f9646c0 7fe52a1e5000 7fe58a1e6000 100073
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea000fca0040 3f2801000 0 0 0 2fffff00008000 tail
crash>

But the second and subsequent pages get filtered in RHEL7:

crash> rd 7fe54f000000
rd: page excluded: user virtual address: 7fe54f000000 type: "64-bit UVADDR"
crash> rd 7fe54f001000
rd: page excluded: user virtual address: 7fe54f001000 type: "64-bit UVADDR"
crash> rd 7fe54f002000
rd: page excluded: user virtual address: 7fe54f002000 type: "64-bit UVADDR"
crash> rd 7fe54f003000
rd: page excluded: user virtual address: 7fe54f003000 type: "64-bit UVADDR"
crash>

So I don't even understand how it works in RHEL7? ;-(
> So I don't even understand how it works in RHEL7? ;-(
As it turns out, it does not work in RHEL7 either...
For a simplified test, I did kdump with core_collector set to "cp" to
create a copy of /proc/vmcore. Then I tested makedumpfile with the
vmcore copy. And as it turns out, it is possible that user pages
may not get filtered when the user data area is assigned a transparent
hugepage. Here's a RHEL7 example:
crash> sys | grep RELEASE
RELEASE: 3.10.0-221.el7.x86_64
crash> rd -u 7fcc14c00000
rd: page excluded: user virtual address: 7fcc14c00000 type: "64-bit UVADDR"
crash> rd -u 7fcc14c01000
7fcc14c01000: 0101010101010101 ........
crash> vtop 7fcc14c00000
VIRTUAL PHYSICAL
7fcc14c00000 40fc00000
PML: 41d9ee7f8 => 4128b4067
PUD: 4128b4980 => 423ac0067
PMD: 423ac0530 => 800000040fc000e7
PAGE: 40fc00000 (2MB)
PTE PHYSICAL FLAGS
800000040fc000e7 40fc00000 (PRESENT|RW|USER|ACCESSED|DIRTY|PSE|NX)
VMA START END FLAGS FILE
ffff880412ae9e60 7fcc14b2a000 7fcc74b2b000 100073
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea00103f0000 40fc00000 ffff8800cab0d581 7fcc14c00 1 2fffff00084068 uptodate,lru,active,head,swapbacked
crash> vtop 7fcc14c01000
VIRTUAL PHYSICAL
7fcc14c01000 40fc01000
PML: 41d9ee7f8 => 4128b4067
PUD: 4128b4980 => 423ac0067
PMD: 423ac0530 => 800000040fc000e7
PAGE: 40fc00000 (2MB)
PTE PHYSICAL FLAGS
800000040fc000e7 40fc00000 (PRESENT|RW|USER|ACCESSED|DIRTY|PSE|NX)
VMA START END FLAGS FILE
ffff880412ae9e60 7fcc14b2a000 7fcc74b2b000 100073
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea00103f0040 40fc01000 0 2 0 2fffff00008000 tail
crash>
In theory, this issue has been addressed in the upstream version of
makedumpfile:
commit e8b4f93b3260defe86f5e13ca7536c07f2e32914
Author: Atsushi Kumagai <kumagai-atsushi.nec.co.jp>
Date: Thu Aug 21 08:55:54 2014 +0900
[PATCH v4] Exclude unnecessary hugepages.
There are 2 types of hugepages in the kernel, the both should be
excluded as user pages.
1. Transparent huge pages (THP)
All the pages are anonymous pages (at least for now), so we should
just get how many pages are in the corresponding hugepage.
It can be gotten from the page->lru.prev of the second page in the
hugepage.
2. Hugetlbfs pages
The pages aren't anonymous pages but kind of user pages, we should
exclude also these pages in any way.
Luckily, it's possible to detect these pages by looking the
page->lru.next of the second page in the hugepage. This idea came
from the kernel's PageHuge().
The number of pages can be gotten in the same way as THP.
Changelog:
v4:
- Cleaned up according to Petr's and Baoquan's comments.
v3:
- Cleaned up according to Petr's comments.
- Fix misdetection of hugetlb pages.
v2:
- Rebased to "Generic multi-page exclusion".
Signed-off-by: Atsushi Kumagai <kumagai-atsushi.nec.co.jp>
> In theory, this issue has been addressed in the upstream version of
> makedumpfile:
>
> commit e8b4f93b3260defe86f5e13ca7536c07f2e32914
> Author: Atsushi Kumagai <kumagai-atsushi.nec.co.jp>
> Date: Thu Aug 21 08:55:54 2014 +0900

FYI, I built and tested the upstream version of makedumpfile from git://git.code.sf.net/p/makedumpfile/code, and verified that all 4k pages in a transparent hugepage are filtered.

And the git commit above is included in the makedumpfile-1.5.7 release at http://sourceforge.net/projects/makedumpfile/files/makedumpfile/1.5.7

The most recent RHEL7 version of kexec-tools is kexec-tools-2.0.7-15.el7, which was built on 1/13/15 and has been updated to include makedumpfile-1.5.7. So this issue will be fixed in the RHEL7.1 kexec-tools errata.

(In reply to Dave Anderson from comment #9)
> The most recent RHEL7 version of kexec-tools is kexec-tools-2.0.7-15.el7,
> which was built on 1/13/15, has been updated to include makedumpfile-1.5.7.
> So this issue will be fixed in the RHEL7.1 kexec-tools errata.

I see. This issue will be fixed in the RHEL7.1 GA release. Thank you for your time.

kexec-tools-2.0.4-32.el7_0.5.x86_64.rpm includes
makedumpfile: version 1.5.4 (released on 3 Jul 2013)

kexec-tools-2.0.7-13.el7.x86_64.rpm includes
makedumpfile: version 1.5.7 (released on 18 Sep 2014)

Now, I'd like to wait for a fix for RHEL 6. Should I open a new entry?

> Now, I'd like to wait for a fix for RHEL 6. Should I open a new entry?
Yes.
(In reply to Dave Anderson from comment #11)
> > Now, I'd like to wait for a fix for RHEL 6. Should I open a new entry?
>
> Yes.

Hi, everyone.

This issue is the same as bz1068674, and we plan to solve it in rhel6.7.

This is a huge page filtering bug and has been fixed in rhel7.1, so I am closing it as CURRENTRELEASE. Please add a comment or reopen it if you have any concerns.