Bug 438799 - [5.2][xen] Dom0 kernel reset when starting kdump service
Summary: [5.2][xen] Dom0 kernel reset when starting kdump service
Keywords:
Status: CLOSED DUPLICATE of bug 433554
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-03-25 12:25 UTC by Qian Cai
Modified: 2008-03-25 12:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-03-25 12:44:22 UTC
Target Upstream Version:
Embargoed:


Description Qian Cai 2008-03-25 12:25:52 UTC
Description of problem:
When starting the kdump service in Dom0, the following errors occurred, and then
the box reset.

[root@ibm-alishan ~]# PCI-DMA: Out of SW-IOMMU space for 61440 bytes at device
0000:00:08.0
ata3: timeout waiting for ADMA IDLE, stat=0x400
ata3: timeout waiting for ADMA LEGACY, stat=0x400
ata3.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x2 frozen
ata3.00: cmd 61/e8:00:15:5b:c0/01:00:01:00:00/40 tag 0 ncq 249856 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/e8:08:2d:56:c0/01:00:01:00:00/40 tag 1 ncq 249856 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/08:10:fd:59:c0/00:00:01:00:00/40 tag 2 ncq 4096 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/10:18:05:5a:c0/01:00:01:00:00/40 tag 3 ncq 139264 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/a8:20:fd:5c:c0/00:00:01:00:00/40 tag 4 ncq 86016 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/e8:28:a5:5d:c0/01:00:01:00:00/40 tag 5 ncq 249856 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/e8:30:45:54:c0/01:00:01:00:00/40 tag 6 ncq 249856 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/e8:38:15:58:c0/01:00:01:00:00/40 tag 7 ncq 249856 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/e8:40:8d:5f:c0/01:00:01:00:00/40 tag 8 ncq 249856 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/30:48:75:61:c0/00:00:01:00:00/40 tag 9 ncq 24576 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: cmd 61/e8:50:a5:61:c0/01:00:01:00:00/40 tag 10 ncq 249856 out
         res 50/00:a8:fd:5c:c0/00:00:01:00:00/40 Emask 0x40 (internal error)
ata3.00: status: { DRDY }
ata3: timeout waiting for ADMA LEGACY clear and IDLE, stat=0x0
a

The same problem sometimes also occurs during normal I/O (e.g., installing an
RPM package), which makes it look like a hardware problem. However, with the
RHEL 5.1 versions of kernel and kernel-xen (2.6.18-53.el5), the kdump service
starts without any problem. With kernel and kernel-xen 2.6.18-86.el5, the
problem reproduces reliably (more than 5 times). The failure occurs during the
"mkdumprd" stage of the kdump init script:

/etc/init.d/kdump start ---> /sbin/mkdumprd

Shell debug output from mkdumprd:

...
+ '[' 'touch /sysroot/fastboot' == -n ']'
+ echo 'touch /sysroot/fastboot'
+ emit 'echo Switching to new root and running init.'
+ NONL=
+ '[' 'echo Switching to new root and running init.' == -n ']'
+ echo 'echo Switching to new root and running init.'
+ emit 'exec switch_root /sysroot /sbin/init'
+ NONL=
+ '[' 'exec switch_root /sysroot /sbin/init' == -n ']'
+ echo 'exec switch_root /sysroot /sbin/init'
+ chmod +x /tmp/initrd.PP3494/init
+ cd /tmp/initrd.PP3494
+ findall .
+ echo nash-find .
+ cpio --quiet -c -o
+ /sbin/nash --force --quiet
+ '[' -n 1 ']'
+ gzip -9
PCI-DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:08.0
ata3: timeout waiting for ADMA IDLE, stat=0x1440
ata3: timeout waiting for ADMA LEGACY, stat=0x1440
+ rm -rf /tmp/initrd.PP3494 /tmp/initrd.img.Us3495
ata3.00: exception Emask 0x0 SAct 0x3fffe SErr 0x0 action 0x2 frozen
ata3.00: cmd 61/e8:08:cd:77:bf/01:00:01:00:00/40 tag 1 ncq 249856 out
         res 50/00:e8:6d:79:c2/00:01:01:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
...
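
To compare failures across runs, the SW-IOMMU messages in the excerpts above
can be pulled out of a captured log. This is only a minimal sketch: the function
name swiotlb_failures is hypothetical, and it assumes each message sits on a
single line in the form shown in the mkdumprd trace.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "BYTES DEVICE" for every "Out of SW-IOMMU space" message found.
swiotlb_failures() {
    sed -n 's/.*PCI-DMA: Out of SW-IOMMU space for \([0-9][0-9]*\) bytes at device \([0-9a-f:.]*\).*/\1 \2/p'
}
```

For example, "dmesg | swiotlb_failures" would list the request size and PCI
device for each failed mapping. A message that was wrapped across two lines by
the terminal, like the first excerpt in this report, is not matched.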

Version-Release number of selected component (if applicable):
RHEL5.2-Server-20080320.0 (x86_64)
kernel-2.6.18-el5
kernel-xen-2.6.18-el5xen
kexec-tools-1.102pre-15.el5

How reproducible:
Always (at least more than 5 times) on ibm-alishan.rhts.boston.redhat.com

Steps to Reproduce:
1. Configure Dom0 with crashkernel=128M@32M
2. service kdump start
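
Before step 2, it can help to confirm that the crashkernel reservation from
step 1 is really present on the boot command line. The sketch below is an
assumption-laden illustration: parse_crashkernel is a hypothetical helper, and
the report does not say which boot loader line (hypervisor or Dom0 kernel)
carried the parameter, so the caller has to supply the right string.

```shell
# Hypothetical helper: find a crashkernel=SIZE@OFFSET token in a boot
# command-line string and print "SIZE OFFSET"; return 1 if none is found.
parse_crashkernel() {
    local cmdline=$1 token value
    for token in $cmdline; do
        case $token in
            crashkernel=*@*)
                value=${token#crashkernel=}
                # Text before the first "@" is the size, the rest the offset.
                echo "${value%%@*} ${value#*@}"
                return 0
                ;;
        esac
    done
    return 1
}
```

With the setting from step 1, parse_crashkernel "ro crashkernel=128M@32M"
prints "128M 32M". The other options in that string are placeholders, not taken
from the affected machine.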

Additional info:
This machine looks badly broken; the Dom0 kernel can also reset during the init stage:
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth0:
Determining IP information for eth0...

Comment 1 Chris Lalancette 2008-03-25 12:44:22 UTC
This looks like it is probably a dup of 433554; we are working on it there, so
I'll close it here.  If this issue continues after we resolve that one, please
re-open this BZ.

Chris Lalancette

*** This bug has been marked as a duplicate of 433554 ***
