Description of problem:
Apologies if this has been reported, but I'm struggling to find a similar case. I'm testing OpenStack components on both Ubuntu and RHEL/CentOS virtual machines and am finding that spinning up a qemu instance w/ the -no-kvm flag on one of these RHEL/CentOS virtual machines results in the instance taking several minutes to boot. If I boot w/ the -nographic flag, I can see that the instance is blocked on "Booting from Hard Disk..." or "GRUB loading, please wait..." After waiting for some time the instance finally boots and is then usable. I am not having this issue when booting instances on Ubuntu 12.04.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.415.el6_5.3.x86_64
qemu-kvm-0.12.1.2-2.415.el6.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Boot an instance w/ the -no-kvm flag, for example:
   /usr/libexec/qemu-kvm -no-kvm -drive file=cirros-0.3.1-x86_64-disk.img -nographic
   (note that OpenStack nova generates a much more complicated qemu command)

Actual results:
Instance blocks for several minutes and then boots.

Expected results:
Boots immediately.

Additional info:
Downgrading to qemu-kvm-0.12.1.2-2.355.el6_4.9.x86_64 results in instances booting as expected. The issue appears to have been introduced after this package.
I forgot to mention -- qemu-kvm seems to peg the CPU while the instance is waiting to boot:

  PID  USER  PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+    COMMAND
13932  root  20   0  372m  21m  4812 R  100.0  0.5  3:29.72  qemu-kvm

Please let me know what additional information I can provide to help resolve this issue.
Matt, thanks for taking the time to enter a bug report with us. We appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, we're not able to guarantee the timeliness or suitability of a resolution for issues entered here because this is not a mechanism for requesting support. If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain it receives the proper attention and prioritization to assure a timely resolution. For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto
Matt: -no-kvm might work, but it's not supported in RHEL [1]. I'll keep this bug open for a while as we try to figure out the reason for the behavior regression, but it's of low priority.

1. https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Administration_Guide/ch19s02.html
Hi, Matt

Would you please post the complete "much more complicated qemu command"? I just tried qemu-kvm-0.12.1.2-2.355.el6_4.9.x86_64 and qemu-kvm-0.12.1.2-2.415.el6.x86_64, and both show the same phenomenon: the guest has still failed to boot up after a few minutes (10 minutes so far).

My command line:

/usr/libexec/qemu-kvm -cpu SandyBridge -M rhel6.4.0 -no-kvm -m 1G \
  -smp 1,sockets=1,cores=1,threads=1 -name rhel6.5-64 \
  -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults \
  -monitor stdio -rtc base=utc,clock=host,driftfix=slew \
  -no-kvm-pit-reinjection -no-shutdown \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
  -drive file=/mnt/rhel6.5-64-qzhang.qcow2,if=none,id=drive-disk0,format=qcow2,cache=none,id=disk0 \
  -device virtio-blk-pci,drive=drive-disk0,bus=pci.0,addr=0x5,scsi=off,id=disk-0 \
  -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
  -device scsi-cd,drive=drive-ide0-1-0,bus=scsi0.0,id=scsi-cdrom,bootindex=1 \
  -netdev tap,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:11,bus=pci.0,addr=0x3 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
  -chardev socket,id=channel1,path=/tmp/helloworld11,server,nowait \
  -device virtserialport,chardev=channel1,name=port1,bus=virtio-serial0.0,id=port1 \
  -chardev socket,id=channel2,path=/tmp/helloworld12,server,nowait \
  -device virtserialport,chardev=channel2,name=port2,bus=virtio-serial0.0,id=port2 \
  -device usb-tablet,id=input0 -vnc :12 -vga std

Thanks,
Qunfang
Hello Qunfang,

I don't have the command generated by OpenStack nova on hand at the minute; I'll try to obtain it and update this bug when I do. That said, simply running the following minimal command does demonstrate the issue (for me):

/usr/libexec/qemu-kvm -no-kvm -drive file=cirros-0.3.1-x86_64-disk.img -nographic

On a side note, I am pulling that cirros image from http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img; however, I have tested other images to ensure the issue is not specific to the cirros image.

Did your instances eventually boot?

Regards,
Matt
Hi Qunfang,

Apologies for the delayed reply. I think I may be chasing two different issues here.

Initially, I was spawning qemu instances through OpenStack nova and noticed that my tests were failing because the qemu instances were not booting in a timely fashion. I then resorted to booting instances w/ qemu directly (so I could see the console immediately) using the minimal qemu flags documented in my original post, and it was there that I saw instances taking several minutes to boot when using packages later than qemu-kvm-0.12.1.2-2.355.el6_4.9.x86_64. When using qemu-kvm-0.12.1.2-2.355.el6_4.9.x86_64, instances booted immediately.

I have just gone back to spawning instances through OpenStack nova and it appears that the instances never actually boot there, irrespective of how long I leave them. Nova believes the instances spawn successfully, but I can't get any console output from them and they do not become available on the network. Here are the qemu flags being passed by nova:

/usr/bin/qemu-system-x86_64 -name instance-00000007 -S -M rhel6.5.0 \
  -cpu Opteron_G5,+bmi1,+perfctr_nb,+perfctr_core,+topoext,+nodeid_msr,+tce,+lwp,+wdt,+skinit,+ibs,+osvw,+cr8legacy,+extapic,+cmp_legacy,+fxsr_opt,+mmxext,+monitor,+ht,+vme \
  -no-kvm -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 \
  -uuid a4334aea-1497-442e-8882-9cb793bdb87a \
  -smbios type=1,manufacturer=Red Hat Inc.,product=OpenStack Nova,version=2013.2-5.el6,serial=96fc4fd3-db98-f1ba-22d7-4a49c1e05085,uuid=a4334aea-1497-442e-8882-9cb793bdb87a \
  -nodefconfig -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000007.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -drive file=/var/lib/nova/instances/a4334aea-1497-442e-8882-9cb793bdb87a/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -netdev tap,fd=34,id=hostnet0 \
  -device rtl8139,netdev=hostnet0,id=net0,mac=fa:16:3e:1e:dc:ab,bus=pci.0,addr=0x3 \
  -chardev file,id=charserial0,path=/var/lib/nova/instances/a4334aea-1497-442e-8882-9cb793bdb87a/console.log \
  -device isa-serial,chardev=charserial0,id=serial0 \
  -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 \
  -device usb-tablet,id=input0 -vnc 192.168.0.54:0 -k en-us -vga cirrus \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

To complicate matters further, I had some tests pass on a CentOS/RHEL VM running qemu-img-0.12.1.2-2.415.el6_5.3.x86_64. Upon further inspection, I noticed that this VM was on a host w/ "AMD Opteron(tm) Processor 4170 HE" CPUs, while the failing tests were being run on a hypervisor w/ "AMD Opteron(tm) Processor 4332 HE" CPUs. As qemu on the successful tests was using "-cpu Opteron_G3", I tried adding the following to /etc/nova/nova.conf on the host where I was unable to spawn nova VMs:

libvirt_cpu_mode=custom
libvirt_cpu_model=Opteron_G3

After restarting nova-compute, I booted an instance and it spawned successfully. Sadly, leaving those nova.conf overrides out and reverting to qemu-kvm-0.12.1.2-2.355.el6_4.9.x86_64 does not make qemu boot instances correctly through nova on the "AMD Opteron(tm) Processor 4332 HE" machine.

I'm going to work around this by setting my test instances to always use -cpu Opteron_G3, but please do let me know if you are able to replicate any of this strange behaviour and require any additional information from me.

Regards,
Matt
Hi, Matt

Thanks a lot for your update. I took two days' leave last week and will dive into it soon.
Reproduced the issue on qemu-kvm-rhev-0.12.1.2-2.415.el6.x86_64 with the command line in comment 8.

(1) qemu-kvm-rhev-0.12.1.2-2.415.el6.x86_64
    RHEL6.5 image: could not boot up after 16 minutes.
    Image provided in comment 8: guest boots up after hanging for about 5 minutes.

(2) qemu-kvm-rhev-0.12.1.2-2.355.el6_4.9.x86_64
    RHEL6.5 image: guest boots up (with "-m 1G" added to the command line).
    Image provided in comment 8: guest boots up.

Acking this first while we wait for the developer's investigation. Will bisect later to see which version introduced the problem.
Hi, Ademar

With the command line and image in comment 8, this bug reproduces on qemu-kvm >= qemu-kvm-0.12.1.2-2.379.el6.x86_64 and does NOT reproduce on versions <= qemu-kvm-378. However, with a RHEL guest image the issue does not reproduce reliably.

Thanks,
Qunfang
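For reference, the manual bisect over builds can be sketched as a loop like the one below. This is a dry run that only echoes the commands rather than executing them; only the three package NVRs named in this report are listed, and the intermediate builds would need to be filled in from the actual errata stream (the -2.378.el6 NVR is inferred from "qemu-kvm-378" and may not be the exact build name).

```shell
# Dry-run sketch of the bisect: echo the commands a manual bisect would run.
# NVRs are the ones mentioned in this report; -2.378.el6 is an assumption.
for nvr in 0.12.1.2-2.355.el6_4.9 0.12.1.2-2.378.el6 0.12.1.2-2.379.el6; do
    echo "yum downgrade -y qemu-kvm-${nvr}"
    echo "time /usr/libexec/qemu-kvm -no-kvm -drive file=cirros-0.3.1-x86_64-disk.img -nographic"
done
```

Dropping the echoes (and checking the timing of each boot) would walk the versions in order and show the jump in boot time between -378 and -379.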
Thanks for the bisect, Qunfang! In -379 we had the following changes:

- I/O throttling (Fam)
- Serial (16550 UART) PCI hotplug (Gerd)
- QAPI error handling fixes (Luiz)

I/O throttling appears to be the most intrusive change, so I'm reassigning this to Fam. Fam: if you have any reason to believe the culprit is serial hotplug, please reassign it to Gerd (or Luiz, if you think it's a QAPI problem -- unlikely).
Matt, Ademar,

I did some tracing. This is a special case where QEMU aio, TCG (-no-kvm) and a simple-minded guest (BIOS or GRUB) slow each other down.

With this version of qemu-kvm in -no-kvm mode, the response time for an IDE PIO request can be ~30ms. To make things worse, when the guest starts, GRUB requests only one sector at a time, which means that to load a kernel image, a great number of requests need to be completed sequentially. That's why we see qemu-kvm "blocked", when it is actually "busy" loading the GRUB stage images and the kernel image.

The reason the response time is so long is that TCG and AIO run in the same thread -- the main loop. AIO status is polled only after a TCG execution cycle ends. So, although QEMU gets the data from the host very quickly, the request is not completed until the next TCG cycle resumes the guest CPU and it gets notified.

Upstream doesn't suffer from this, because TCG and I/O are in separate threads.

Thanks,
Fam
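A back-of-envelope calculation shows how the ~30ms per-sector latency adds up to the multi-minute delays reported. The kernel image size below is an assumption (not from this report); only the 512-byte sector size and ~30ms response time come from the analysis above:

```shell
# Back-of-envelope only: kernel size is an assumed example;
# 30ms is the observed IDE PIO response time, 512 bytes per GRUB request.
kernel_bytes=$((4 * 1024 * 1024))   # assumed 4 MiB kernel image
sector_bytes=512                    # one sector per GRUB read request
latency_ms=30                       # ~30ms per request in -no-kvm mode
sectors=$((kernel_bytes / sector_bytes))
total_s=$((sectors * latency_ms / 1000))
echo "${sectors} sequential requests -> ~${total_s} s to load the kernel"
# prints: 8192 sequential requests -> ~245 s to load the kernel
```

Roughly four minutes for the kernel alone, before counting the GRUB stage images, which lines up with the "several minutes" boot delay in the original report.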