Bug 623181
Summary: | [RHEL6] Dumping core of RHEL6 i386 PV guest immediately after it is created got error
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | Yufang Zhang <yuzhang>
Component: | xen
Assignee: | Xen Maintainance List <xen-maint>
Status: | CLOSED NOTABUG
QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | medium
Priority: | low
Version: | 5.6
CC: | ddutile, drjones, xen-maint
Target Milestone: | rc
Target Release: | ---
Hardware: | All
OS: | Linux
Doc Type: | Bug Fix
Story Points: | ---
Last Closed: | 2010-08-13 09:18:22 UTC
Type: | ---
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---
Created attachment 438204
xend.log
Additional info: Cannot reproduce this bug with a RHEL6 snapshot8 (20100722.0) i386 PV guest.

Re-tested this bug on an i386 host and didn't hit this problem.

I can't reproduce this at all. I've tried matching everything described here (host versions HV/userspace, guest kernel version, memory config, etc.), rebooting before each try, but it always works for me.

Am I missing an ingredient to get it to reproduce? Are you still able to reproduce it every time? What about with later guest kernels like -63?

(In reply to comment #4)
> I can't reproduce this at all. I've tried matching everything described here
> (host versions HV/userspace, guest kernel version, memory config, etc.),
> rebooting before each try, but it always works for me.
>
> Am I missing an ingredient to get it to reproduce? Are you still able to
> reproduce it every time? What about with later guest kernels like -63?

I think I know the origin of this bug: dumping core for the VM immediately after it is created. Using the following command, I can always reproduce this bug:

# xm cr /tmp/xm-test.conf && xm dump-core vm1
Using config file "/tmp/xm-test.conf".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Started domain vm1
Dumping core of domain: vm1 ...
Error: Failed to dump core: (1, 'Internal error', 'p2m_size < nr_pages -1 (0 < 1ffff')
Usage: xm dump-core [-L|--live] [-C|--crash] <Domain> [Filename]

Dump core for a specific domain.

If you wait for a while before dumping core for the VM, you may not hit the problem. How long you need to wait depends on the memory size of the VM. For example, for a 1024M VM, the following command does not hit the problem:

# xm cr /tmp/xm-test.conf && sleep 1 && xm dump-core vm1

But the following command does hit the problem:

# xm cr /tmp/xm-test.conf && sleep 0.5 && xm dump-core vm1

However, this problem only exists for a RHEL6 i386 guest on a RHEL5.6 Xen x86_64 host. We didn't hit this problem for other guests.
We didn't hit this problem with previous versions of the RHEL6 i386 guest either. I will test this problem with the new kernel.

Still hit this problem after updating the guest kernel to -63.

Changed the summary of this bug for clarification.

Re-tested this bug with the reproducer in comment #5. Both RHEL5 (i386 and x86_64) and RHEL6 (i386 and x86_64) guests can hit this problem with the "# xm cr && xm dump-core" command. Furthermore, previous versions of RHEL6 PV guests can also hit this problem with the reproducer.

Based on comment 8 we now see this bug is addressing the inability to capture a core early in the boot. I think there are many reasons that wouldn't work. It might be an interesting exercise to try to find out the earliest point a core can be captured by setting xen up to auto-dump guests and then booting a kernel that panics early, but I don't see that exercise as being a high priority, mainly because even if it's not as early as we might like it to be, we probably wouldn't be able to fix it. IMO the priority of the bug should be very low, and likely it will be closed as WONTFIX.

Tried to trigger a kernel panic at boot time to check whether this problem has any impact on dumping core automatically for a crashed guest. Tested with the following two scenarios:
(1) Edit the guest grub config to point to a wrong root filesystem.
(2) In a RHEL6 x86_64 PV guest, downgrade the kernel package to -59, which triggers a crash on a RHEL5.6 host.
In either scenario, the core file is generated automatically when the guest crashes. No error output is found in xend.log. So it seems that such crashes do not happen early enough to hit the problem.

Nice work. Thanks for those extra tests. I think that's satisfactory for dump-core. I'm closing this as NOTABUG.

Just an added note:
(1) Hooking up the necessary hypervisor callback to dump when panic() is invoked happens relatively early in kernel boot, but it does take a wee bit of time.
(2) For HVM guests w/ pv-hvm, that time is extended until xen-platform-pci (the virtual xenbus PCI device) is configured, at which point xen dumps will occur when a panic() occurs. Until then, you'll get the same scenario as early PV kernels.
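
The reproducer earlier in this thread shows that a dump issued right after `xm cr` fails, while the same dump after a short wait succeeds. For test scripts that need a dump soon after guest creation, one option is to retry instead of guessing a fixed sleep. The following is only a minimal sketch under stated assumptions: it assumes `xm dump-core` exits with a non-zero status when it prints the error shown above (not confirmed in this report), and the function name, `attempts`, and `delay` are hypothetical values chosen for illustration.

```python
import subprocess
import time


def dump_core_with_retry(domain, attempts=10, delay=0.5,
                         run=subprocess.run, sleep=time.sleep):
    """Retry `xm dump-core <domain>` until it succeeds or attempts run out.

    Returns the number of attempts used on success and raises RuntimeError
    otherwise. Assumption: a failed dump yields a non-zero exit status.
    `run` and `sleep` are injectable so the logic can be tested without Xen.
    """
    last_error = ""
    for attempt in range(1, attempts + 1):
        result = run(["xm", "dump-core", domain],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        # Keep the most recent error text for the final exception message.
        last_error = (result.stderr or result.stdout or "").strip()
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(
        f"xm dump-core {domain} failed after {attempts} attempts: {last_error}"
    )
```

Calling `dump_core_with_retry("vm1")` directly after `xm cr` would keep retrying at half-second intervals. Since the thread notes that the required wait depends on guest memory size, the retry budget would need tuning per guest.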
Created attachment 438203
xm dmesg logs

Description of problem:
When we try to dump core of a RHEL 6 i386 PV guest with memory size >= 1G, the xm dump-core command fails with error output.

Version-Release number of selected component (if applicable):
kernel-xen-devel-2.6.18-210.el5
xen-3.0.3-115.el5
xen-debuginfo-3.0.3-115.el5
xen-devel-3.0.3-115.el5
kernel-xen-2.6.18-210.el5
xen-libs-3.0.3-115.el5
RHEL 6 PV guest: snapshot 10 (20100807.0) kernel-2.6.32-59.1.el6.i386

How reproducible:
Not always, but quite easy to reproduce

Steps to Reproduce:
1. Reboot the host to get a fresh environment
2. Create a RHEL6 i386 PV guest with memory=1024 maxmem=2048
3. Try to dump core of the PV guest via the command xm dump-core

Actual results:
# xm dump-core vm1
Dumping core of domain: vm1 ...
Error: Failed to dump core: (1, 'Internal error', 'p2m_size < nr_pages -1 (0 < 3ffff')
Usage: xm dump-core [-L|--live] [-C|--crash] <Domain> [Filename]

Dump core for a specific domain.

Expected results:
Core is dumped successfully

Additional info:
(1) Cannot reproduce this bug when the guest memory size is set to 512M
(2) RHEL6 x86_64 PV guest didn't hit this problem
(3) Other PV guests (RHEL5, RHEL4) didn't hit this problem
(4) Once a core dump of another PV guest has succeeded, this bug cannot be reproduced on the same host unless you reboot the host and repeat Step 1 to Step 3.
(5) xm dmesg is attached. xend.log will be uploaded soon.
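
The numbers in the error text can be decoded to sanity-check them against the guest configuration. The sketch below reads the message as two hexadecimal values printed in the order of the comparison (`p2m_size`, then `nr_pages - 1`) and assumes a 4 KiB x86 page size. Both readings are assumptions drawn from the message format, not confirmed in this report, and the helper name `decode_p2m_error` is hypothetical.

```python
import re

PAGE_SIZE = 4096  # bytes; assumed x86 page size

# Matches the check text as it appears in the xm error output.
_PATTERN = re.compile(
    r"p2m_size < nr_pages -1 \(([0-9a-fA-F]+) < ([0-9a-fA-F]+)"
)


def decode_p2m_error(message):
    """Return (p2m_size, nr_pages, nr_pages_mib) parsed from the error text."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("message does not contain the p2m_size/nr_pages check")
    p2m_size = int(match.group(1), 16)
    # The message prints nr_pages - 1, so add one back.
    nr_pages = int(match.group(2), 16) + 1
    nr_pages_mib = nr_pages * PAGE_SIZE // (1024 * 1024)
    return p2m_size, nr_pages, nr_pages_mib


if __name__ == "__main__":
    msg = "(1, 'Internal error', 'p2m_size < nr_pages -1 (0 < 3ffff')"
    print(decode_p2m_error(msg))  # (0, 262144, 1024)
```

Under these assumptions, `3ffff` corresponds to 0x40000 pages, or 1024 MiB, which matches the `memory=1024` setting in the steps above, while the reported `p2m_size` is 0. The `1ffff` value in the later reproducer decodes the same way to 512 MiB.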