Bug 591784
Summary: | RHEL 6 x86_64 beta VM's don't boot correctly in RHEL 6 x86_64 beta | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Justin Clift <justin> |
Component: | qemu-kvm | Assignee: | Karen Noel <knoel> |
Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | urgent | Docs Contact: | |
Priority: | low | ||
Version: | 6.0 | CC: | alex.williamson, berrange, hagberg, kai, mikolaj, notting, tudor.georgescu, virt-maint |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-06-07 15:34:09 UTC | Type: | --- |
Attachments: |
Description
Justin Clift
2010-05-13 06:38:45 UTC
Created attachment 413644 [details]
Showing hang during startup.
Created attachment 413645 [details]
top showing cpu pegged.
Created attachment 413646 [details]
qemu log file.
Created attachment 413647 [details]
All of /var/log/messages from when the VM was rebooted.
As a thought, I'm open to further suggestions on how to collect relevant info and log messages. The VMs in question don't get far enough into the boot process to write to /var/log/messages on the VM disk, so no useful information is there. (I checked, just in case.)

Useful additional info. When starting with the kernel option "init=/bin/bash" to bypass the init scripts, the kernel loads fine and gives a working bash prompt at the appropriate place. Looks like the cause is somewhere in the init script chain.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.

What happens if you remove 'rhgb' and/or 'quiet' from the boot arguments? You might also try attaching a serial console to the virtual machine and getting kernel messages.

Thanks Bill. Using the serial console allows more to be seen, but there's no smoking gun. Bits of interest:
+ Using init=/bin/bash and dropping through directly works every time. Not useful for much other than enabling/disabling scripts.
+ Disabling *all* of the init service scripts also allows the boot to complete every time on a "Minimal" installation, however even that doesn't work if done with a "Desktop" installation.

After noticing that disabling all the init scripts on a Minimal installation allowed the boot to get to the end and function, I suspected the problem to be one of the init scripts. So I then went through the process of enabling them one at a time to see which caused it. None of them individually. :(

Leaving them all on, then disabling only "udev-post", "lvm2-monitor" and "network" sometimes allows a VM to function. Even that is inconsistent though, with VMs still hanging roughly 1/3 of the time.
This is all on a server that's running RHEL 5.4 vm's with no issue, so no idea what the cause of this really is at this stage.

Attaching:
+ A screenshot of the latest interesting error message on a vm console during boot. This is from a "Desktop" installation, with all of the init scripts disabled.
+ The complete serial console output from the same boot, showing it getting up to the same point.

Maybe there's a smoking gun in the serial console output I'm not seeing?

Created attachment 414110 [details]
Graphical console output while serial console redirection is in place.
Created attachment 414111 [details]
Serial console log file
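
For anyone reproducing the suggestion above (dropping 'rhgb' and 'quiet' and attaching a serial console), the kernel argument change can be sketched roughly as follows. The helper name debug_cmdline, the sample command line and the ttyS0/115200 console settings are assumptions for illustration only; they are not taken from the affected guests.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a kernel command line for boot debugging.
# Drops 'rhgb' and 'quiet', then appends console arguments.
# ttyS0 at 115200 baud is an assumed setting; adjust it to match the VM.
debug_cmdline() {
    out=""
    for arg in $1; do
        case "$arg" in
            rhgb|quiet) ;;                 # skip args that hide boot messages
            *) out="${out:+$out }$arg" ;;  # keep everything else, in order
        esac
    done
    # The last console= entry becomes /dev/console; both receive kernel messages.
    printf '%s console=tty0 console=ttyS0,115200\n' "$out"
}

# Example with a made-up command line:
debug_cmdline "ro root=/dev/mapper/vg-root rhgb quiet"
```

The resulting string would be pasted onto the kernel line at the boot loader prompt, and the serial device on the host side still has to be attached to the guest separately.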
Avi, would having remote root access to these boxes (via ssh) help?

Yes please.

Thanks Avi. Details for remote login have just been emailed to you. :)

I saw something like this, and it appeared to be related to interaction between the virtio_balloon driver and kvm. After that driver loaded (via start_udev in rc.sysinit), /proc/meminfo showed about 170 MB instead of the 4 GB for MemTotal. If I put the virtio_balloon driver in the blacklist config file (in the guest VM, after booting with "init=/bin/bash") so it didn't load, then all was well.

Thanks Eric, initial cursory testing of blacklisting the virtio_balloon driver looks promising. Another symptom that had been occurring was that *sometimes* a VM would show Out Of Memory (OOM) errors during init, with processes being automatically killed to free up RAM (and strange subsequent errors in the boot log as a consequence of that). When manually entering (via init=/bin/bash) a VM showing those symptoms, top generally displayed just under 90 MB of RAM.

Just tried fresh installations of RHEL 6 VMs here, and with the virtio_balloon driver blacklisted after the install (prior to reboot) things worked perfectly. Then going into the same VMs and removing the blacklist entry caused them to peg the CPU at 100% and hang during boot every time. I'll test this in more depth today.

I've run through the install and boot process with just under a hundred individual VMs today, mostly using kickstart, and blacklisting virtio_balloon is definitely the make-or-break thing here. With virtio_balloon still being loaded, they *all* fail at some point during boot or shortly afterwards. With virtio_balloon blacklisted, no problems are encountered (related to this bug anyway).

I experience exactly the same problem with Fedora 12, Fedora 13 and Ubuntu 10.04 as a guest. Is there any way to disable virtio balloon in qemu-kvm via libvirt (in xml)?
host machine: redhat-release-6-6.0.0.24.el6.x86_64

Hi Mikolaj, are you using kickstart scripts for building the VMs? I haven't looked into using libvirt to disable the balloon driver, but adding this %post installation snippet to kickstart scripts for building VMs works here:

    # Post installation script
    %post
    echo "blacklist virtio_balloon" >> /etc/modprobe.d/blacklist.conf
    %end

Hope that helps. :)

I can echo Justin's finding exactly. With virtio_balloon, the machine crashes as soon as the module is loaded. Without it, the machines install and boot flawlessly. I wonder if it is related to the issue I see in virsh dumpxml, where somebody is forgetting to convert from MB->KB, or vice versa.

From libvirt:

    [root@kvm0 ~]# virsh dumpxml pxe | grep -i mem
    <memory>524288</memory>
    <currentMemory>536870912</currentMemory>

From the configured XML file:

    [root@kvm0 ~]# grep -i mem /etc/libvirt/qemu/pxe.xml
    <memory>524288</memory>
    <currentMemory>524288</currentMemory>

Running VMs report the configured memory value multiplied by 1024. Virt-manager shows the exact same thing. I don't have enough memory to run a VM where the configured memory times 1024 is less than the amount of memory on the server, so I can't test to see if that would run properly. If libvirt thinks that the running memory is "Out of bounds" for the amount of physical memory available, I could see this causing the issue with the virtio_balloon kernel module.

I see that in Ubuntu 10.04 (libvirt-bin 0.7.5-5ubuntu27) qemu-kvm is started without the parameter

    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

which is passed on RHEL6 (libvirt-0.7.6-2.el6.x86_64). Justin, thanks for the tip. I figured that out, but I would like to know how to pass the option

    -balloon none

to qemu-kvm via libvirt, or how to remove the above ``-device virtio...'' default completely. Is this possible?

As Eric notes in comment #16 the problem is with the balloon driver.
There was an unexpected units change in the QEMU balloon monitor from kilobytes to bytes, and the corresponding change to libvirt missed the release. Disabling the guest balloon driver is probably the easiest quick workaround that I know of. The real fix is tracked in bug 566261.

*** This bug has been marked as a duplicate of bug 566261 ***

*** Bug 601782 has been marked as a duplicate of this bug. *** |
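
The kilobytes-versus-bytes mix-up described in the closing comment can be checked against the numbers quoted earlier in this report. The sketch below is only arithmetic on those two values; the helper name kib_to_bytes is made up for illustration, and the factor of 1024 is assumed from the observation in the thread that the running value is the configured one "times 1024".

```shell
#!/bin/sh
# Values quoted earlier in this report.
configured_kb=524288     # <currentMemory> in /etc/libvirt/qemu/pxe.xml
reported=536870912       # <currentMemory> from 'virsh dumpxml pxe'

# Hypothetical helper: convert a count of 1024-byte units to bytes.
kib_to_bytes() {
    echo $(( $1 * 1024 ))
}

# If the monitor answers in bytes but the caller still labels the number as
# kilobytes, the reported value is exactly 1024 times the configured one.
if [ "$(kib_to_bytes "$configured_kb")" -eq "$reported" ]; then
    echo "reported value equals the configured value expressed in bytes"
else
    echo "values differ by something other than a kilobyte/byte mix-up"
fi
```

With the values from this report the first branch is taken, which is consistent with a byte count being reported under a kilobyte label.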