Bug 691632

Summary:	readmem: Can't read the dump memory(/proc/vmcore). Cannot allocate memory
Product:	Red Hat Enterprise Linux 6	Reporter:	Chao Ye <cye>
Component:	kexec-tools	Assignee:	Cong Wang <amwang>
Status:	CLOSED ERRATA	QA Contact:	Kernel Dump QE <kernel-dump-qe>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	6.1	CC:	czhang, gli, gouyang, phan, rkhan
Target Milestone:	rc	Keywords:	Regression
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	kexec-tools-2_0_0-179_el6	Doc Type:	Bug Fix
Last Closed:	2011-05-19 14:16:14 UTC	Type:	---

Description Chao Ye 2011-03-29 05:05:09 UTC

Description of problem:
Kdump failed to save the vmcore: makedumpfile aborted right after "Excluding unnecessary pages" reached 100%. From the console log:
==========================================================
mdadm: No arrays found in config file or automatically 
Free memory/Total memory (free %): 60404 / 113108 ( 53.4038 ) 
Scanning logical volumes 
  Reading all physical volumes.  This may take a while... 
  Found volume group "vg_amddinar03" using metadata type lvm2 
Activating logical volumes 
  3 logical volume(s) in volume group "vg_amddinar03" now active 
Free memory/Total memory (free %): 59420 / 113108 ( 52.5339 ) 
Saving to the local filesystem UUID=b970a0e6-6816-4fc1-8303-50daedaa22e7 
e2fsck 1.41.12 (17-May-2010) 
/dev/mapper/vg_amddinar03-lv_root: recovering journal 
Clearing orphaned inode 1703962 (uid=0, gid=0, mode=0100600, size=4096) 
Clearing orphaned inode 1703947 (uid=0, gid=0, mode=0100600, size=4096) 
Clearing orphaned inode 1703946 (uid=0, gid=0, mode=0100600, size=4096) 
Clearing orphaned inode 1703945 (uid=0, gid=0, mode=0100600, size=4096) 
/dev/mapper/vg_amddinar03-lv_root: clean, 99587/3276800 files, 1076613/13107200 blocks 
EXT4-fs (dm-0): mounted filesystem with ordered data mode 
Free memory/Total memory (free %): 57428 / 113108 ( 50.7727 ) 
Loading SELINUX policy 
type=1404 audit(1301367761.578:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 
type=1403 audit(1301367762.349:3): policy loaded auid=4294967295 ses=4294967295 
Checking for memory holes          : [  0 %]
Checking for memory holes          : [100 %]
Excluding unnecessary pages        : [ 29 %]
Excluding unnecessary pages        : [ 85 %]
Excluding unnecessary pages        : [100 %]
readmem: Can't read the dump memory(/proc/vmcore). Cannot allocate memory 
readmem: type_addr: 0, addr:ffff88000004b4c0, size:4 
_exclude_free_page: Can't get nr_zones. 
create_2nd_bitmap: Can't exclude unnecessary pages. 
dropping to initramfs shell 
exiting this shell will reboot your system 
/ #

Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-174.el6

How reproducible:
found on amd-dinar-03.lab.bos.redhat.com

Steps to Reproduce:
1. Install RHEL6.1-20110323.1
2. Trigger crash via kernel panic
  
Actual results:
failed to save vmcore

Expected results:
kdump works fine

Additional info:
https://beaker.engineering.redhat.com/recipes/138291

Comment 3 Cong Wang 2011-03-29 05:26:07 UTC

I bet this is due to:

        vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
        if (!vaddr)
                return -ENOMEM;

It is really confusing that it returns ENOMEM for read(2)...

Did you see any kernel messages? If not, try adding "ignore_loglevel" to the second kernel's command line.
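
To illustrate where the "Cannot allocate memory" text on the console comes from: it is the standard message for ENOMEM, so a read(2) on /proc/vmcore that fails with -ENOMEM would be reported that way by any tool that prints strerror(errno). A minimal user-space sketch, assuming glibc's message text; the helper name describe_read_error is hypothetical and is not makedumpfile's actual code:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a failed read(2) in the same shape as the
 * console line in the report. 'err' is the errno value left by read(2).
 */
static int describe_read_error(char *buf, size_t len, const char *path, int err)
{
    return snprintf(buf, len, "readmem: Can't read the dump memory(%s). %s",
                    path, strerror(err));
}
```

Calling describe_read_error(buf, sizeof buf, "/proc/vmcore", ENOMEM) should reproduce the line seen in the log, which is why a failed ioremap_cache() in the kernel looks like an out-of-memory condition from user space.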

Comment 5 Cong Wang 2011-03-29 05:42:36 UTC

Might be related to:

commit 23da4563659382c6da6ac1fdf70099fe12010894
Author: Neil Horman <nhorman>
Date:   Thu Oct 14 18:54:30 2010 -0400

    [kdump] kexec: accelerate vmcore copies by marking oldmem in /proc/vmcore as cached

which was merged in kernel-2.6.32-85.el6~35. Does kernel-2.6.32-84.el6 have this problem on that machine?

Comment 6 Cong Wang 2011-03-29 06:18:08 UTC

readmem: type_addr: 0, addr:ffff880000040d40, size:4 

BIOS-provided physical RAM map: 
 BIOS-e820: 0000000000000100 - 00000000000a0000 (usable) 
 BIOS-e820: 0000000000100000 - 00000000cffa8000 (usable) 
...
user-defined physical RAM map: 
 user: 0000000000000000 - 0000000000001000 (reserved) 
 user: 0000000000001000 - 00000000000a0000 (usable) 
...
Seems 640K related...
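
The 640K guess can be checked with a little arithmetic. The sketch below assumes the x86_64 direct mapping of physical memory starts at 0xffff880000000000 and that pages are 4 KiB; both constants and all names here are assumptions made for illustration, not values taken from this report:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET_ASSUMED 0xffff880000000000ULL /* assumed direct-map base */
#define PAGE_SHIFT_ASSUMED  12                    /* assumed 4 KiB pages */
#define LOW_640K_END        0xa0000ULL            /* end of the low usable e820 range */

/* Translate a direct-map virtual address to a physical address. */
static uint64_t direct_map_to_phys(uint64_t vaddr)
{
    return vaddr - PAGE_OFFSET_ASSUMED;
}

/* Page frame number of a physical address. */
static uint64_t phys_to_pfn(uint64_t paddr)
{
    return paddr >> PAGE_SHIFT_ASSUMED;
}

/* True if the physical address lies within the first 640 KiB. */
static bool in_low_640k(uint64_t paddr)
{
    return paddr < LOW_640K_END;
}
```

Under those assumptions, ffff880000040d40 maps to physical 0x40d40 (pfn 0x40), and the address from the original report, ffff88000004b4c0, maps to 0x4b4c0 (pfn 0x4b). Both fall below 0xa0000, i.e. inside the first 640K.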

Comment 9 Cong Wang 2011-04-01 02:32:41 UTC

*** Bug 690362 has been marked as a duplicate of this bug. ***

Comment 11 Chao Ye 2011-04-21 01:59:16 UTC

Tested on amd-dinar-03.lab.bos.redhat.com with latest build:
======================================================================
[root@amd-dinar-03 ~]# rpm -q kernel kexec-tools
kernel-2.6.32-131.0.1.el6.x86_64
kexec-tools-2.0.0-186.el6.x86_64
[root@amd-dinar-03 ~]# tail /etc/kdump.conf 
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell

ext4 /dev/mapper/vg_amddinar03-lv_root
core_collector makedumpfile -c --message-level 1 -d 31
default shell
[root@amd-dinar-03 ~]# service kdump restart
Stopping kdump:[  OK  ]
Detected change(s) the following file(s):
  
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-131.0.1.el6.x86_64kdump.img
Your running kernel is using more than 70% of the amount of space you reserved for kdump, you should consider increasing your crashkernel reservation[WARNING]
Starting kdump:[  OK  ]
[root@amd-dinar-03 ~]# insmod /mnt/tests/kernel/kdump/crash-crasher/crasher/crasher.ko
[root@amd-dinar-03 ~]# echo 0 > /proc/crasher
--------------------------------------------------------------------

Free memory/Total memory (free %): 187912 / 241096 ( 77.9407 )
Scanning logical volumes
  Reading all physical volumes.  This may take a while...
  Found volume group "vg_amddinar03" using metadata type lvm2
Activating logical volumes
  3 logical volume(s) in volume group "vg_amddinar03" now active
Free memory/Total memory (free %): 186948 / 241096 ( 77.5409 )
Saving to the local filesystem /dev/mapper/vg_amddinar03-lv_root
e2fsck 1.41.12 (17-May-2010)
/dev/mapper/vg_amddinar03-lv_root: recovering journal
Clearing orphaned inode 2895036 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 2895000 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 2894999 (uid=0, gid=0, mode=0100600, size=4096)
Clearing orphaned inode 2894990 (uid=0, gid=0, mode=0100600, size=4096)
/dev/mapper/vg_amddinar03-lv_root: clean, 122390/3276800 files, 8423377/13107200 blocks
EXT4-fs (dm-0): mounted filesystem with ordered data mode
Free memory/Total memory (free %): 185768 / 241096 ( 77.0515 )
Loading SELINUX policy
type=1404 audit(1303351095.630:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1303351096.448:3): policy loaded auid=4294967295 ses=4294967295
Copying data                       : [100 %] 
Saving core complete
md: stopping all md devices.
sd 0:0:1:0: [sda] Synchronizing SCSI cache
Restarting system.
machine restart


Changing status to VERIFIED.
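
For reference, the "-d 31" in the core_collector line above is a bitmask of page types that makedumpfile leaves out of the dump. The sketch below decodes it using the bit values listed in the makedumpfile man page (1 zero page, 2 cache page, 4 cache private, 8 user data, 16 free page); the table and function names are hypothetical. The free page bit is presumably what leads into the _exclude_free_page path that failed in the original report:

```c
#include <stddef.h>
#include <stdio.h>

/* Dump-level bits as documented for makedumpfile -d; names are illustrative. */
static const struct {
    int bit;
    const char *name;
} dl_bits[] = {
    { 0x01, "zero page" },
    { 0x02, "cache page" },
    { 0x04, "cache private" },
    { 0x08, "user data" },
    { 0x10, "free page" },
};

/* Write a comma-separated list of excluded page types into buf; return its length. */
static size_t describe_dump_level(int level, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof dl_bits / sizeof dl_bits[0]; i++) {
        if (!(level & dl_bits[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? ", " : "", dl_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output would be truncated: stop here */
        used += (size_t)n;
    }
    return used;
}
```

With level 31 every bit is set, so all five page types are excluded, which is the most aggressive filtering and the smallest resulting vmcore.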

Comment 12 errata-xmlrpc 2011-05-19 14:16:14 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0736.html