Red Hat Bugzilla – Bug 591784
RHEL 6 x86_64 beta VM's don't boot correctly in RHEL 6 x86_64 beta
Last modified: 2013-01-09 06:14:58 EST
Description of problem:
On servers running RHEL 6 beta x86_64, no installation of RHEL6 beta x86_64 as a virtual machine will boot successfully.
Installation of the virtual machines is flawless; however, no subsequent boot (including the first) makes it through to completion.
In every case, the boot hangs shortly into the boot process, sometimes pegging a CPU at 100% and sometimes not at all (the behaviour is inconsistent).
RHEL 5.4 x86_64 VMs on the same hosts work fine.
Screenshots showing the boot failures are attached, along with the qemu log file and the portion of /var/log/messages from when the VM was booted.
Version-Release number of selected component (if applicable):
$ rpm -qa | grep qemu
$ rpm -qa | grep libvirt
How reproducible:
Every time (unfortunately).
Steps to Reproduce:
1. Install RHEL 6 beta x86_64 as a virtual machine through virt-manager (GUI).
Any installation type will do (e.g. Minimal).
2. Reboot as normal at the end of installation; optionally press Escape during startup to view the startup log.
3. Hang occurs here during startup.
Actual results:
Hang during the bootup process.

Expected results:
RHEL 6 VMs should start and function normally.
When installing the VM, the OS type was set to "Linux", and the OS Version was set to "Red Hat Enterprise Linux 6".
Using different types of backend storage for the VM makes no difference (i.e. local disk, iSCSI, network block device, etc.).
Created attachment 413644 [details]
Showing hang during startup.
Created attachment 413645 [details]
top showing cpu pegged.
Created attachment 413646 [details]
qemu log file.
Created attachment 413647 [details]
All of /var/log/messages from when the VM was rebooted.
As a thought, I'm open to further suggestions on how to collect relevant info and log messages.
The VMs in question don't get far enough into the boot process to write to /var/log/messages on the VM disk, so no useful information is there (I checked, just in case).
Useful additional info: when starting with the kernel option "init=/bin/bash" to bypass the init scripts, the kernel loads fine and gives a working bash prompt at the appropriate place.
Looks like the cause is somewhere in the init script chain.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion in a major release.
What happens if you remove 'rhgb' and/or 'quiet' from the boot arguments? You might also try attaching a serial console to the virtual machine and getting kernel messages.
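A sketch of one way to do that (the serial-console kernel arguments are the standard ones; GUEST_NAME is a placeholder for the actual domain name):

```
# On the guest's kernel command line (press 'e' in GRUB at boot), remove
# 'rhgb' and 'quiet' and add a serial console:
#   console=ttyS0,115200 console=tty0
# Then attach to the guest's serial console from the host:
virsh console GUEST_NAME
```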
Thanks Bill. Using the serial console allows more to be seen, but there's no smoking gun.
Bits of interest:
+ Using init=/bin/bash and dropping through directly works every time. Not useful for much other than enabling/disabling scripts.
+ Disabling *all* of the init service scripts also allows the boot to complete every time on a "Minimal" installation; however, even that doesn't work when done with a "Desktop" installation.
After noticing that disabling all the init scripts on a Minimal installation allowed the boot to get to the end and function, I suspected the problem to be one of the init scripts.
So I then went through the process of enabling them one at a time to see which one caused it. None of them did individually. :(
Leaving them all on, then disabling only "udev-post", "lvm2-monitor" and "network" sometimes allows a VM to function. Even that is inconsistent though, with VMs still hanging roughly 1/3 of the time.
This is all on a server that's running RHEL 5.4 VMs with no issue, so I have no idea what the cause of this really is at this stage.
+ A screenshot of the latest interesting error message on a VM console during boot. This is from a "Desktop" installation, with all of the init scripts disabled.
+ The complete serial console output from the same boot, showing it getting up to the same point.
Maybe there's a smoking gun in the serial console output I'm not seeing?
Created attachment 414110 [details]
Graphical console output while serial console redirection is in place.
Created attachment 414111 [details]
Serial console log file
Avi, would having remote root access to these boxes (via ssh) help?
Thanks Avi. Details for remote login have just been emailed to you. :)
I saw something like this, and it appeared to be related to an interaction between the virtio_balloon driver and KVM. After that driver loaded (via start_udev in rc.sysinit), /proc/meminfo showed about 170MB instead of 4GB for MemTotal.
If I put the virtio_balloon driver in the blacklist config file (in the guest vm, after booting with "init=/bin/bash") so it didn't load, then all was well.
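Eric's check and workaround can be sketched as guest-side shell commands (a sketch only: MODPROBE_D is parameterized here so the snippet can be dry-run outside a guest; on a real guest it is simply /etc/modprobe.d):

```shell
# Check how much memory the guest actually sees after virtio_balloon loads:
grep MemTotal /proc/meminfo

# Workaround: blacklist virtio_balloon so udev never loads it.
# MODPROBE_D defaults to the real modprobe config directory.
MODPROBE_D=${MODPROBE_D:-/etc/modprobe.d}
echo "blacklist virtio_balloon" >> "$MODPROBE_D/blacklist.conf"

# Verify the entry was written:
grep virtio_balloon "$MODPROBE_D/blacklist.conf"
```

A reboot (or rebuilding the initramfs, depending on where the module loads from) is needed for the blacklist to take effect.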
Thanks Eric, initial cursory testing of blacklisting the virtio_balloon driver looks promising.
Another symptom that had been occurring was that *sometimes* a VM would show an Out Of Memory (OOM) error during init, with processes being automatically killed to free up RAM (and strange subsequent errors in the boot log as a consequence).
When manually entering (via init=/bin/bash) a VM showing those symptoms, top generally displayed just under 90MB of RAM.
Just tried fresh installations of RHEL 6 VMs here, and with the virtio_balloon driver blacklisted after the install (prior to reboot), things worked perfectly.
Then going into the same VMs and removing the blacklist entry caused them to peg the CPU at 100% and hang during boot every time.
I'll test this in more depth today.
I've run through the install and boot process with just under a hundred individual VMs today, mostly using kickstart, and blacklisting virtio_balloon is definitely the make-or-break factor here.
With virtio_balloon still being loaded, they *all* fail at some point during boot or shortly afterwards.
With virtio_balloon blacklisted, no problems are encountered (related to this bug anyway).
I experience exactly the same problem with Fedora 12, Fedora 13 and Ubuntu 10.04 as a guest. Is there any way to disable the virtio balloon in qemu-kvm via libvirt (in the XML)?
host machine: redhat-release-6-188.8.131.52.el6.x86_64
Hi Mikolaj, are you using kickstart scripts for building the VMs?
I haven't looked into using libvirt to disable the balloon driver, but adding this %post installation snippet to kickstart scripts for building VMs works here:
# Post installation script
echo "blacklist virtio_balloon" >> /etc/modprobe.d/blacklist.conf
Hope that helps. :)
I can echo Justin's findings exactly. With virtio_balloon, the machine crashes as soon as the module is loaded. Without it, the machines install and boot flawlessly.
I wonder if it is related to the issue I see in virsh dumpxml, where somebody is forgetting to convert from MB to KB, or vice versa:
[root@kvm0 ~]# virsh dumpxml pxe | grep -i mem
From the configured XML file:
[root@kvm0 ~]# grep -i mem /etc/libvirt/qemu/pxe.xml
Running VMs have the correct memory value, but times 1024. Virt-manager shows the exact same thing. I don't have enough memory to run a VM where the configured memory times 1024 is less than the amount of memory on the server, so I can't test to see if that would run properly. If libvirt thinks that the running memory is "Out of bounds" for the amount of physical memory available, I could see this causing the issue with the virtio_balloon kernel module.
I see that in Ubuntu 10.04 (libvirt-bin 0.7.5-5ubuntu27) qemu-kvm is started without:
parameter like it is on RHEL6 (libvirt-0.7.6-2.el6.x86_64). Justin, thanks for the tip; I had figured that out, but I would like to know how to pass the option:
to qemu-kvm via libvirt or to remove above ``-device virtio...'' default completely. Is this possible?
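For reference, newer libvirt releases support disabling the balloon device directly in the domain XML via a <memballoon> element (whether the libvirt versions discussed here already support it should be verified against the libvirt documentation); a sketch:

```xml
<!-- In the <devices> section of the domain XML (edit with: virsh edit GUEST). -->
<!-- model='none' tells libvirt not to create a balloon device at all. -->
<memballoon model='none'/>
```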
As Eric notes in comment #16 the problem is with the balloon driver. There was an unexpected units change in the QEMU balloon monitor from kilobytes to bytes, and the corresponding change to libvirt missed the release.
Disabling the guest balloon driver is probably the easiest quick workaround that I know of.
The real fix is tracked in bug 566261.
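The symptom sizes are consistent with such a units mixup; a hypothetical numeric sketch (values illustrative, not taken from this bug's logs) of a kilobyte balloon target being read as a byte count:

```python
# Hypothetical illustration of the KB-vs-bytes mismatch described above:
# libvirt sends the balloon target in kibibytes; a QEMU that now expects
# bytes reads the same number as a byte count.

guest_ram_kib = 4 * 1024 * 1024        # a 4 GiB guest, expressed in KiB
target_sent = guest_ram_kib            # the number put on the monitor wire

# Correct interpretation (KiB -> bytes):
correct_bytes = target_sent * 1024     # 4 GiB

# Buggy interpretation (the same number taken as bytes directly):
buggy_bytes = target_sent              # only 4 MiB!

print(correct_bytes // 2**30, "GiB vs", buggy_bytes // 2**20, "MiB")
# prints: 4 GiB vs 4 MiB
```

A target that small would explain the guest ballooning itself down until the OOM killer fires during boot.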
*** This bug has been marked as a duplicate of bug 566261 ***
*** Bug 601782 has been marked as a duplicate of this bug. ***