Bug 1342744 - crashdumping does not work with modern kernels.
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kexec-tools
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Assignee: Pratyush Anand
QA Contact: Fedora Extras Quality Assurance
Reported: 2016-06-04 15:24 UTC by Oleg Drokin
Modified: 2016-07-13 04:24 UTC

Last Closed: 2016-07-13 04:24:22 UTC
Type: Bug


Attachments
rename all page.count references to page.refcount (2.27 KB, patch)
2016-06-30 21:22 UTC, Oleg Drokin

Description Oleg Drokin 2016-06-04 15:24:53 UTC
Description of problem:
Crashdumping a modern kernel (e.g. 4.7.0-rc1) produces only the dmesg, not the actual core file.

I think this started some revisions ago, but I can only test with 4.7, so I am filing this against rawhide for now.

Dropping the dumprd to a console and running makedumpfile manually, I see this:

kdump:/# makedumpfile -l -d 31 /proc/vmcore /kdumproot/var/crash///192.168.10.192-2016-06-04-10:40:46/vmcore-incomplete
The kernel version is not supported.
The makedumpfile operation may be incomplete.
get_mem_map: Can't distinguish the memory type.

makedumpfile Failed.


It looks like the devel branch of upstream makedumpfile on sourceforge also lacks support for the newest kernels,
but I think what is really missing here is a fallback.
How about doing something like a gzip of /proc/vmcore when makedumpfile fails, instead of bailing out completely?
I had good results with:
gzip -c /proc/vmcore >/kdumproot/var/crash///192.168.10.192-2016-06-04-10:40:46/vmcore-incomplete.gz

This saves space, and the dump is still usable after gunzip if it is really needed.
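
The fallback proposed above could look roughly like the following. This is only a sketch, not what kexec-tools ships: the function name `save_vmcore`, the `MAKEDUMPFILE` override variable, and the way the `-incomplete` suffix is dropped on success are assumptions made for illustration. Only the `makedumpfile -l -d 31` invocation and the `gzip -c` fallback are taken from the commands shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: save_vmcore SRC DEST
# Tries makedumpfile first; if it fails, falls back to a plain gzip of SRC.
# MAKEDUMPFILE may be set to override the binary (an assumption, handy for testing).
save_vmcore() {
    src=$1
    dest=$2

    # Filtered, compressed dump, same options as in the report above.
    if ${MAKEDUMPFILE:-makedumpfile} -l -d 31 "$src" "${dest}-incomplete"; then
        mv "${dest}-incomplete" "$dest"
        return 0
    fi

    echo "makedumpfile failed, falling back to gzip of $src" >&2
    rm -f "${dest}-incomplete"

    # Unfiltered fallback: larger, but still a usable core after gunzip.
    gzip -c "$src" > "${dest}-incomplete.gz" &&
        mv "${dest}-incomplete.gz" "${dest}.gz"
}
```

In the kdump environment this would be invoked along the lines of `save_vmcore /proc/vmcore "$target_dir/vmcore"`, where `$target_dir` stands for whatever crash directory the dump script has already chosen.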

Comment 1 Oleg Drokin 2016-06-30 21:22:36 UTC
Created attachment 1174736
rename all page.count references to page.refcount

Apparently the problem is that struct page's _count member was renamed to _refcount in recent kernels, which leaves makedumpfile highly confused.

I use this simple patch to fix the situation for myself, but of course it does not
work for older kernels, so something better is needed.

Also, a last-resort fallback that just dumps the whole /proc/vmcore when nothing else works is likely still desirable, to future-proof the system a bit.

Comment 2 Pratyush Anand 2016-07-01 06:54:30 UTC
The makedumpfile upstream devel branch already has a patch that resolves this:

commit 2c21d4656e8d3c2af2b1e14809d076941ae69e96
Author: Vitaly Kuznetsov <vkuznets>
Date:   Fri Jun 17 18:41:26 2016 +0900

    [PATCH v2] Support _count -> _refcount rename in struct page
    
    _count member was renamed to _refcount in linux commit 0139aa7b7fa12
    ("mm: rename _count, field of the struct page, to _refcount") and this
    broke makedumpfile. The reason for making the change was to find all users
    accessing it directly and not through the recommended API. I tried
    suggesting to revert the change but failed, I see no other choice than to
    start supporting both _count and _refcount in makedumpfile.
    
    Signed-off-by: Vitaly Kuznetsov <vkuznets>
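
The idea in the commit message, accepting either member name, can be illustrated with a small lookup over VMCOREINFO-style text. This is a sketch and not the actual makedumpfile patch: the helper name `page_refcount_offset` is hypothetical, and it assumes the input is a text file of `KEY=value` lines in which the offset appears as `OFFSET(page._refcount)=N` on newer kernels or `OFFSET(page._count)=N` on older ones.

```shell
#!/bin/sh
# Hypothetical helper: page_refcount_offset FILE
# Prints the struct page reference-count offset found in a vmcoreinfo-style
# text file, preferring the new member name and falling back to the old one.
page_refcount_offset() {
    for name in _refcount _count; do
        # Print only the value part of the matching OFFSET(...) line, if any.
        val=$(sed -n "s/^OFFSET(page\.${name})=//p" "$1" | head -n 1)
        if [ -n "$val" ]; then
            echo "$val"
            return 0
        fi
    done
    # Neither name is present: the caller has to treat this as unsupported.
    return 1
}
```

Trying the new name first and the old name second keeps a single tool working on kernels from both sides of the rename, which is the property the one-line rename patch attached to this bug lacks.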

Comment 5 Dave Young 2016-07-13 04:24:22 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=14879746

Merged into the rawhide branch.

