Red Hat Bugzilla – Bug 591784
RHEL 6 x86_64 beta VM's don't boot correctly in RHEL 6 x86_64 beta
Last modified: 2013-01-09 06:14:58 EST
Description of problem:
On servers running RHEL 6 beta x86_64, no installation of RHEL6 beta x86_64 as a virtual machine will boot successfully.
Installation of the virtual machines is flawless; however, no subsequent boot (including the first) makes it through to completion.
In every case, the boot hangs shortly into the boot process, sometimes pegging a CPU at 100% and sometimes not at all (the behaviour is inconsistent).
RHEL 5.4 x86_64 VMs on the same hosts work fine.
Screenshots showing the boot failures are attached, along with the qemu log file and the portion of /var/log/messages from when the VM was booted.
Version-Release number of selected component (if applicable):
$ rpm -qa | grep qemu
$ rpm -qa | grep libvirt
How reproducible:
Every time (unfortunately).
Steps to Reproduce:
1. Install RHEL 6 beta x86_64 as a virtual machine through virt-manager (GUI).
Any installation type will do (e.g. Minimal).
2. Reboot as normal at the end of installation; optionally press Escape during startup to view the startup log.
3. Hang occurs here during startup.
Actual results:
Hang during the bootup process.

Expected results:
RHEL 6 VMs should start and function normally.
When installing the VM, the OS type was set to "Linux", and the OS Version was set to "Red Hat Enterprise Linux 6".
Using different types of backend storage for the VM makes no difference (i.e. local disk, iSCSI, network block device, etc.).
Created attachment 413644 [details]
Showing hang during startup.
Created attachment 413645 [details]
top showing cpu pegged.
Created attachment 413646 [details]
qemu log file.
Created attachment 413647 [details]
All of /var/log/messages from when the VM was rebooted.
As a thought, I'm open to further suggestions on how to collect relevant info and log messages.
The VMs in question don't get far enough into the boot process to write to /var/log/messages on the VM disk, so no useful information is there (I checked, just in case).
Useful additional info: when starting with the kernel option "init=/bin/bash" to bypass the init scripts, the kernel loads fine and gives a working bash prompt at the appropriate place.
Looks like the cause is somewhere in the init script chain.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion in a major release.
What happens if you remove 'rhgb' and/or 'quiet' from the boot arguments? You might also try attaching a serial console to the virtual machine and getting kernel messages.
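A sketch of one way to do that (the serial-console kernel arguments are the standard ones; GUEST_NAME is a placeholder for the actual domain name):

```
# On the guest's kernel command line (press 'e' in GRUB at boot), remove
# 'rhgb' and 'quiet' and add a serial console:
#   console=ttyS0,115200 console=tty0
# Then attach to the guest's serial console from the host:
virsh console GUEST_NAME
```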
Thanks Bill. Using the serial console allows more to be seen, but there's no smoking gun.
Bits of interest:
+ Using init=/bin/bash and dropping through directly works every time. Not useful for much other than enabling/disabling scripts.
+ Disabling *all* of the init service scripts also allows the boot to complete every time on a "Minimal" installation; however, even that doesn't work when done with a "Desktop" installation.
After noticing that disabling all the init scripts on a Minimal installation allowed the boot to get to the end and function, I suspected the problem to be one of the init scripts.
So I then went through the process of enabling them one at a time to see which one caused it. None of them did individually. :(
Leaving them all on, then disabling only "udev-post", "lvm2-monitor" and "network" sometimes allows a VM to function. Even that is inconsistent though, with VMs still hanging roughly 1/3 of the time.
This is all on a server that's running RHEL 5.4 VMs with no issue, so I have no idea what the cause of this really is at this stage.
+ A screenshot of the latest interesting error message on a VM console during boot. This is from a "Desktop" installation, with all of the init scripts disabled.
+ The complete serial console output from the same boot, showing it getting up to the same point.
Maybe there's a smoking gun in the serial console output I'm not seeing?
Created attachment 414110 [details]
Graphical console output while serial console redirection is in place.
Created attachment 414111 [details]
Serial console log file
Avi, would having remote root access to these boxes (via ssh) help?
Thanks Avi. Details for remote login have just been emailed to you. :)
I saw something like this, and it appeared to be related to an interaction between the virtio_balloon driver and KVM. After that driver loaded (via start_udev in rc.sysinit), /proc/meminfo showed about 170MB instead of 4GB for MemTotal.
If I put the virtio_balloon driver in the blacklist config file (in the guest vm, after booting with "init=/bin/bash") so it didn't load, then all was well.
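Eric's check and workaround can be sketched as guest-side shell commands (a sketch only: MODPROBE_D is parameterized here so the snippet can be dry-run outside a guest; on a real guest it is simply /etc/modprobe.d):

```shell
# Check how much memory the guest actually sees after virtio_balloon loads:
grep MemTotal /proc/meminfo

# Workaround: blacklist virtio_balloon so udev never loads it.
# MODPROBE_D defaults to the real modprobe config directory.
MODPROBE_D=${MODPROBE_D:-/etc/modprobe.d}
echo "blacklist virtio_balloon" >> "$MODPROBE_D/blacklist.conf"

# Verify the entry was written:
grep virtio_balloon "$MODPROBE_D/blacklist.conf"
```

A reboot (or rebuilding the initramfs, depending on where the module loads from) is needed for the blacklist to take effect.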
Thanks Eric, initial cursory testing of blacklisting the virtio_balloon driver looks promising.
Another symptom that had been occurring was that *sometimes* a VM would show an Out Of Memory (OOM) error during init, with processes being automatically killed to free up RAM (and strange subsequent errors in the boot log as a consequence).
When manually entering (via init=/bin/bash) a VM showing those symptoms, top generally displayed just under 90MB of RAM.
Just tried fresh installations of RHEL 6 VMs here, and with the virtio_balloon driver blacklisted after the install (prior to reboot), things worked perfectly.
Then going into the same VMs and removing the blacklist entry caused them to peg the CPU at 100% and hang during boot every time.
I'll test this in more depth today.
I've run through the install and boot process with just under a hundred individual VMs today, mostly using kickstart, and blacklisting virtio_balloon is definitely the make-or-break factor here.
With virtio_balloon still being loaded, they *all* fail at some point during boot or shortly afterwards.
With virtio_balloon blacklisted, no problems are encountered (related to this bug anyway).
I experience exactly the same problem with Fedora 12, Fedora 13 and Ubuntu 10.04 as a guest. Is there any way to disable the virtio balloon in qemu-kvm via libvirt (in the XML)?
host machine: redhat-release-6-188.8.131.52.el6.x86_64
Hi Mikolaj, are you using kickstart scripts for building the VMs?
I haven't looked into using libvirt to disable the balloon driver, but adding this %post installation snippet to kickstart scripts for building VMs works here:
# Post installation script
echo "blacklist virtio_balloon" >> /etc/modprobe.d/blacklist.conf
Hope that helps. :)
I can echo Justin's findings exactly. With virtio_balloon, the machine crashes as soon as the module is loaded. Without it, the machines install and boot flawlessly.
I wonder if it is related to the issue I see in virsh dumpxml, where somebody is forgetting to convert from MB to KB, or vice versa:
[root@kvm0 ~]# virsh dumpxml pxe | grep -i mem
From the configured XML file:
[root@kvm0 ~]# grep -i mem /etc/libvirt/qemu/pxe.xml
Running VMs have the correct memory value, but times 1024. Virt-manager shows the exact same thing. I don't have enough memory to run a VM where the configured memory times 1024 is less than the amount of memory on the server, so I can't test to see if that would run properly. If libvirt thinks that the running memory is "Out of bounds" for the amount of physical memory available, I could see this causing the issue with the virtio_balloon kernel module.
I see that in Ubuntu 10.04 (libvirt-bin 0.7.5-5ubuntu27) qemu-kvm is started without:
parameter like it is on RHEL6 (libvirt-0.7.6-2.el6.x86_64). Justin, thanks for the tip; I had figured that out, but I would like to know how to pass the option:
to qemu-kvm via libvirt or to remove above ``-device virtio...'' default completely. Is this possible?
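For reference, newer libvirt releases support disabling the balloon device directly in the domain XML via a <memballoon> element (whether the libvirt versions discussed here already support it should be verified against the libvirt documentation); a sketch:

```xml
<!-- In the <devices> section of the domain XML (edit with: virsh edit GUEST). -->
<!-- model='none' tells libvirt not to create a balloon device at all. -->
<memballoon model='none'/>
```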
As Eric notes in comment #16 the problem is with the balloon driver. There was an unexpected units change in the QEMU balloon monitor from kilobytes to bytes, and the corresponding change to libvirt missed the release.
Disabling the guest balloon driver is probably the easiest quick workaround that I know of.
The real fix is tracked in bug 566261.
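The symptom sizes are consistent with such a units mixup; a hypothetical numeric sketch (values illustrative, not taken from this bug's logs) of a kilobyte balloon target being read as a byte count:

```python
# Hypothetical illustration of the KB-vs-bytes mismatch described above:
# libvirt sends the balloon target in kibibytes; a QEMU that now expects
# bytes reads the same number as a byte count.

guest_ram_kib = 4 * 1024 * 1024        # a 4 GiB guest, expressed in KiB
target_sent = guest_ram_kib            # the number put on the monitor wire

# Correct interpretation (KiB -> bytes):
correct_bytes = target_sent * 1024     # 4 GiB

# Buggy interpretation (the same number taken as bytes directly):
buggy_bytes = target_sent              # only 4 MiB!

print(correct_bytes // 2**30, "GiB vs", buggy_bytes // 2**20, "MiB")
# prints: 4 GiB vs 4 MiB
```

A target that small would explain the guest ballooning itself down until the OOM killer fires during boot.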
*** This bug has been marked as a duplicate of bug 566261 ***
*** Bug 601782 has been marked as a duplicate of this bug. ***