Bug 819284

Summary: RHEL5 Xen: hypervisor conring_size too small when loglvl=all, missing logs
Product: Red Hat Enterprise Linux 5 Reporter: Pasi Karkkainen <pasik>
Component: xen Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED WONTFIX QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 5.8 CC: drjones, leiwang, lersek, moli, qguan, qwan, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-05-09 06:52:54 UTC Type: Bug

Description Pasi Karkkainen 2012-05-06 09:29:06 UTC
Description of problem:

When the Xen hypervisor (xen.gz) is booted with the following command-line options, lines go missing from the Xen dmesg:

"dom0_mem=2048M loglvl=all guest_loglvl=all iommu=1"

The default conring buffer is too small to hold every line, so "xm dmesg" cannot show the messages from the beginning of the hypervisor boot. This happens on systems with many CPUs and/or an IOMMU.

Version-Release number of selected component (if applicable):
RHEL 5.8 / 2.6.18-308.1.1.el5.

How reproducible:
Always.

Steps to Reproduce:
1. Add the following options to grub.conf: "dom0_mem=2048M loglvl=all guest_loglvl=all iommu=1".
2. Reboot the system.
3. Check "xm dmesg" and notice that many messages are missing from the beginning. You can compare the "xm dmesg" output with the serial console output.
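
The comparison in step 3 can be sketched in Python. This is only an illustration, not a tool shipped with RHEL5 or Xen: the function name lost_boot_lines is hypothetical, and it assumes both captures are available as text with identical line contents (no extra timestamps added by the serial logger). Because the ring can wrap mid-line, the first "xm dmesg" line is matched as a suffix of a serial line.

```python
from typing import List, Optional, Tuple


def lost_boot_lines(serial_log: str, xm_dmesg: str) -> Tuple[List[str], Optional[str]]:
    """Return (lines missing entirely from xm dmesg, the line cut mid-way or None)."""
    serial_lines = serial_log.splitlines()
    ring_lines = xm_dmesg.splitlines()
    if not ring_lines:
        return serial_lines, None
    first = ring_lines[0]
    rest = ring_lines[1:3]  # a couple of following lines to confirm the alignment
    for i, line in enumerate(serial_lines):
        # The first ring line may be only the tail of a longer serial line.
        if line.endswith(first) and serial_lines[i + 1:i + 1 + len(rest)] == rest:
            truncated = line if line != first else None
            return serial_lines[:i], truncated
    raise ValueError("could not align xm dmesg with the serial capture")


# Synthetic example data (not real Xen output).
serial = "(XEN) first\n(XEN) second line\n(XEN) third\n(XEN) fourth\n"
ring = "nd line\n(XEN) third\n(XEN) fourth\n"
lost, cut = lost_boot_lines(serial, ring)
print(lost)  # ['(XEN) first']
print(cut)   # (XEN) second line
```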
  
Actual results:
"xm dmesg" begins with the following lines:

301 base: 0xfed00000
(XEN) [VT-D]dmar.c:468: Host address width 40
(XEN) [VT-D]dmar.c:477: found ACPI_DMAR_DRHD
(XEN) [VT-D]dmar.c:336: dmaru->address = fed90000
(XEN) [VT-D]dmar.c:293: found IOAPIC: bdf = 0:1e.1
(XEN) [VT-D]dmar.c:293: found IOAPIC: bdf = 0:13.0
(XEN) [VT-D]dmar.c:345: found INCLUDE_ALL
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1a.7
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1d.7
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1a.0
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1a.1
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1a.2
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1d.0
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1d.1
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1d.2
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1a.0
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1a.1
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1d.0
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1d.1
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1d.2
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1a.7
(XEN) [VT-D]dmar.c:481: found ACPI_DMAR_RMRR
(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1d.7
(XEN) [VT-D]dmar.c:485: found ACPI_DMAR_ATSR
(XEN) [VT-D]dmar.c:274: found bridge: bdf = 0:1.0  sec = 1  sub = 1
(XEN) [VT-D]dmar.c:274: found bridge: bdf = 0:3.0  sec = 2  sub = 2
(XEN) [VT-D]dmar.c:274: found bridge: bdf = 0:7.0  sec = 3  sub = 5
(XEN) [VT-D]dmar.c:274: found bridge: bdf = 0:9.0  sec = 6  sub = 6
(XEN) [VT-D]dmar.c:274: found bridge: bdf = 0:a.0  sec = 7  sub = 7
(XEN) Intel VT-d has been enabled
...

So clearly the messages from the beginning of the boot are missing.


Expected results:
All the information should be shown, including the messages from the beginning of the boot.

Additional info:

This happens because conring_size is too small in the RHEL5 Xen hypervisor. Upstream Xen has an option called "conring_size=" that can be used to set (grow) the conring buffer size, which avoids this problem.
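
To illustrate why a too-small ring loses the start of the boot log, here is a toy model in Python. It is not the Xen implementation; the class name ConsoleRing and the 32-byte size are made up for the example. It shows only the general behaviour of a fixed-size ring that overwrites its oldest bytes, which is also why the surviving output can begin in the middle of a line.

```python
class ConsoleRing:
    """Fixed-size byte ring that overwrites its oldest contents when full."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buf = bytearray(size)
        self.written = 0  # total bytes ever written

    def write(self, data: bytes) -> None:
        for b in data:
            self.buf[self.written % self.size] = b
            self.written += 1

    def read(self) -> bytes:
        """Return the retained bytes, oldest first."""
        if self.written <= self.size:
            return bytes(self.buf[:self.written])
        start = self.written % self.size  # index of the oldest retained byte
        return bytes(self.buf[start:] + self.buf[:start])


ring = ConsoleRing(32)  # deliberately tiny, hypothetical size
for i in range(5):
    ring.write(f"(XEN) boot message {i}\n".encode())

# Only the last 32 bytes survive, so the output starts mid-line.
print(ring.read().decode(), end="")
```

A larger ring retains more of the early output, which is what growing the buffer with "conring_size=" achieves upstream.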

Comment 1 Laszlo Ersek 2012-05-07 08:00:49 UTC
Generally, whenever we want to do anything with the Xen dmesg, we capture it over the serial console (sometimes with "sync_console" on the hv command line), and ask customers to do the same, since that's the only "sure" way to save it (e.g. if there are boot problems).

If you redirect the messages to "/var/log/xen/console/hypervisor.log" by setting XENCONSOLED_LOG_HYPERVISOR=yes in "/etc/sysconfig/xend", does the truncation still occur? (I would guess so; by the time xend starts in the boot process, we must have lost the first messages.)

On the surface this seems to be an easy convenience backport, but a quick hg blame + grep identified the following patches:

http://xenbits.xensource.com/hg/xen-unstable.hg/rev/19543
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/20130
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/20133
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/20374
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/21038
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/21225

These patches appear a bit turbulent for functionality I'd expect to be "simple" and "convenience".

Xen team, thoughts?

Comment 2 Andrew Jones 2012-05-09 06:52:54 UTC
I thought about doubling the ring size in the HV once because it appears that the tools already support 32k. However, we dropped the idea since the benefit wasn't worth the risk. As Laszlo said, we can already capture all logs by using serial. I'm closing this as WONTFIX. It's too late in RHEL5's lifecycle to churn much code for minimal feature gain.
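
For a rough sense of what doubling the ring would buy, the arithmetic can be sketched in Python. The 16 KiB starting size is an assumption inferred from "doubling" to 32k, not a figure stated in this report, and the helper lines_that_fit is hypothetical. The sample lines are taken from the "xm dmesg" output in the description.

```python
from typing import List

# A few lines copied from the xm dmesg output in the description.
SAMPLE_LINES = [
    "(XEN) [VT-D]dmar.c:468: Host address width 40",
    "(XEN) [VT-D]dmar.c:477: found ACPI_DMAR_DRHD",
    "(XEN) [VT-D]dmar.c:287: found endpoint: bdf = 0:1a.7",
    "(XEN) [VT-D]dmar.c:274: found bridge: bdf = 0:1.0  sec = 1  sub = 1",
]


def lines_that_fit(ring_bytes: int, lines: List[str]) -> int:
    """Estimate how many lines of average sample length fit in the ring."""
    avg = sum(len(line) + 1 for line in lines) / len(lines)  # +1 for the newline
    return int(ring_bytes // avg)


# 16 KiB is an assumed current size; 32 KiB is the doubled size discussed above.
for size in (16 * 1024, 32 * 1024):
    print(size, lines_that_fit(size, SAMPLE_LINES))
```

The estimate scales linearly with the ring size, so doubling the ring roughly doubles the number of boot lines retained.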