Bug 837925

Summary: Fedora 16 and 17 guests hang during boot
Product: [Fedora] Fedora
Reporter: Jason Brooks <jbrooks>
Component: qemu
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 17
CC: abaron, acathrow, amit.shah, bazulay, berrange, cfergeau, crobinso, dougsland, dwmw2, dyasny, herrold, iheim, itamar, knoel, mgoldboi, oschreib, pbonzini, rjones, scottt.tw, virt-maint, xuhj
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 839156
Environment:
Last Closed: 2012-07-28 01:22:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 822145    
Attachments:
Description                             Flags
vdsm log from F17 VM                    none
domain xml                              none
vm definition from virt-manager test    none

Description Jason Brooks 2012-07-05 20:18:40 UTC
Description of problem:

On oVirt 3.1, Fedora 16 and Fedora 17 guests hang during boot. F16 guests will install normally from the DVD iso, and then hang on first boot. F17 guests hang when booting from the DVD iso.

I booted my F16 VM in rescue mode and started each of the services wanted by the default runlevel individually. getty.target appears to be causing the hang. All the other services start without an issue, but right after "systemctl start getty.target" the system CPU usage grows quickly to 99% and the VM stops responding.

Version-Release number of selected component (if applicable):

4.10.0-4.fc17

I chose vdsm as the affected component because a user on the list mentioned a similar-sounding issue with F16 and F17 guests, manipulating vdsm directly: http://lists.ovirt.org/pipermail/users/2012-June/002555.html.

How reproducible:

Install an F16 guest on oVirt 3.1 and attempt to boot, or attempt to install an F17 guest from the DVD install image.


Steps to Reproduce:
1.
2.
3.
  
Actual results:

Guest hangs during boot.


Expected results:

Guest boots normally.

Additional info:

Comment 1 Itamar Heim 2012-07-07 11:54:09 UTC
does it reproduce with virt-manager?

Comment 2 Jason Brooks 2012-07-08 17:48:25 UTC
(In reply to comment #1)
> does it reproduce with virt-manager?

I just tried this -- I connected to my node with virt-manager running on a separate machine, using the vdsm@rhev credentials. The F17 DVD image booted normally. From oVirt, booting from the same F17 DVD iso results in a hang.

Comment 3 Andrew Cathrow 2012-07-08 19:03:04 UTC
Can we get the logs from VDSM when the VM is started?

Comment 4 Jason Brooks 2012-07-08 20:16:36 UTC
Created attachment 596917
vdsm log from F17 VM

Comment 5 Andrew Cathrow 2012-07-08 20:40:42 UTC
Created attachment 596918
domain xml

Attaching extracted domain xml from vdsm log.

We need to try to reproduce using this xml to see what breaks.

Comment 6 Itamar Heim 2012-07-09 03:02:30 UTC
Jason, was the virt-manager attempt similar to the vdsm configuration (spicevmc, usb, cpus, virtio devices, etc.)?

Comment 7 Jason Brooks 2012-07-09 04:51:08 UTC
Created attachment 596986
vm definition from virt-manager test

Comment 8 Xu He Jie 2012-07-09 15:41:43 UTC
Which version of qemu did you use?

I can reproduce this: with qemu versions older than 1.1, there is a hang when the guest OS connects to the virtio console.

vdsm adds this to the VM definition:
<console type='pty'>
    <target type='virtio' port='0'/>
</console>

If the host has not opened this pty, the guest hangs when it runs 'getty' on this console.

I think it's a qemu bug, and it has already been fixed in qemu. You can try upgrading your qemu to 1.1.0.
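
To check whether a given domain definition contains the console device described above, the XML can be inspected directly. The following is a minimal sketch, not part of vdsm or libvirt: the function name pty_virtio_consoles and the surrounding <domain>/<devices> wrapper in SAMPLE are assumptions made for illustration. Only the <console> element itself is taken from this report.

```python
import xml.etree.ElementTree as ET


def pty_virtio_consoles(domain_xml):
    """Return the port numbers of pty-backed virtio consoles in a domain XML.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(domain_xml)
    ports = []
    for console in root.iter("console"):
        # Only pty-backed consoles were reported to trigger the hang.
        if console.get("type") != "pty":
            continue
        target = console.find("target")
        if target is not None and target.get("type") == "virtio":
            ports.append(target.get("port"))
    return ports


# The <console> element is the one quoted in this comment; the wrapper
# elements around it are an assumed minimal domain skeleton.
SAMPLE = """
<domain type='kvm'>
  <devices>
    <console type='pty'>
      <target type='virtio' port='0'/>
    </console>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(pty_virtio_consoles(SAMPLE))
```

An empty list means no such console is defined, so this particular hang would not be expected from that definition.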

Comment 9 Jason Brooks 2012-07-09 16:37:55 UTC
(In reply to comment #8)
> Which version of qemu did you use?

I was using the current qemu from F17, which is 1.0-17.

> 
> I think it's a qemu bug, and it has already been fixed in qemu. You can try
> upgrading your qemu to 1.1.0.

I built qemu-1.1.0-4.fc18.src.rpm for F17 and used it to update the qemu pkgs on my node, and F17 booted from the install DVD as expected. So, yay! I'll test now with F16, too.

Comment 10 Jason Brooks 2012-07-09 17:50:39 UTC
(In reply to comment #9)

> I'll test now with F16, too.

With the qemu 1.1 packages, F16 works, as well.

Comment 11 Amit Shah 2012-07-11 06:14:57 UTC
systemd spawns a getty on each available serial port, and the virtio-console port is one of them.  If no listener is connected to that port on the host, output from the guest is throttled until a listener connects.  This is desirable for virtio-serial ports, but not for console ports, whose output should be discarded in such cases.

With the current guest code, the guest just keeps spinning in a while() loop till it hears back from the host after writing data on a console port.  In this case, since there's no listener connected, the guest never hears back, and just keeps spinning.

Fix is to simply not throttle console ports, and discard the data if no listener is connected.  This is done in qemu upstream commit ed8e5a85a1741147ce06932b478a509ce3407061.

This only affects the case where the console chardev is of type 'pty'.  'unix' and 'tcp' chardevs will just throttle with the current code.

With the fix mentioned above, however, the default behaviour for 'unix' and 'tcp' chardevs will change as well to not throttle if the port is a console port.
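
The difference between throttling and discarding can be illustrated with a toy model. This is only a sketch of the behaviour described in this comment, not the qemu or guest kernel code: the function guest_write, its parameters, and the max_spins cap (which stands in for an otherwise endless loop) are all assumptions made for illustration.

```python
def guest_write(port_is_console, listener_connected,
                discard_for_consoles, max_spins=1000):
    """Toy model of a guest writing to a virtio port.

    Returns (outcome, spins). The guest spins until the host responds;
    max_spins stands in for "forever" so the example terminates.
    """
    spins = 0
    while spins < max_spins:
        if listener_connected:
            # A host-side listener consumes the data and responds.
            return ("delivered", spins)
        if port_is_console and discard_for_consoles:
            # Fixed behaviour: the host drops console output and responds.
            return ("discarded", spins)
        # Throttled: no response from the host, so the guest keeps spinning.
        spins += 1
    return ("hung", spins)


if __name__ == "__main__":
    # Old behaviour: console port, no listener, no discard.
    print(guest_write(True, False, discard_for_consoles=False))
    # Fixed behaviour: console output is discarded instead of throttled.
    print(guest_write(True, False, discard_for_consoles=True))
    # Plain virtio-serial ports are still throttled after the fix.
    print(guest_write(False, False, discard_for_consoles=True))
```

In this model, only console ports gain the discard path, which matches the statement that throttling remains desirable for virtio-serial ports.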

Comment 12 Dan Kenigsberg 2012-07-12 07:43:48 UTC
Could you backport this fix to Fedora 17, so oVirt-3.1 can enjoy it?

Comment 13 Amit Shah 2012-07-12 08:57:15 UTC
(In reply to comment #12)
> Could you backport this fix to Fedora 17, so oVirt-3.1 can enjoy it?

Sure, please clone it for F16 (or re-assign), and the maintainer will take care of it.

Comment 14 Dan Kenigsberg 2012-07-12 13:10:48 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > Could you backport this fix to Fedora 17, so oVirt-3.1 can enjoy it?
> 
> Sure, please clone it for F16 (or re-assign), and the maintainer will take
> care of it.

If fixed in qemu, there's nothing to do in Vdsm. Moving.

Comment 15 Ofer Schreiber 2012-07-18 14:24:49 UTC
Any update on this?
It currently blocks the oVirt 3.1 release.

Comment 16 Cole Robinson 2012-07-18 14:30:08 UTC
I'll do a build for this today

Comment 17 Fedora Update System 2012-07-18 22:39:28 UTC
qemu-1.0-18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/qemu-1.0-18.fc17

Comment 18 Fedora Update System 2012-07-19 08:54:14 UTC
Package qemu-1.0-18.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing qemu-1.0-18.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-10744/qemu-1.0-18.fc17
then log in and leave karma (feedback).

Comment 19 Fedora Update System 2012-07-19 14:44:08 UTC
qemu-0.15.1-6.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/qemu-0.15.1-6.fc16

Comment 20 Fedora Update System 2012-07-28 01:22:07 UTC
qemu-1.0-18.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 21 Fedora Update System 2012-07-28 01:25:42 UTC
qemu-0.15.1-6.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.