Bug 1205529 - Race prevents qemu from getting kernel output
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1204627
Reported: 2015-03-25 06:44 UTC by Stef Walter
Modified: 2015-04-02 09:30 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-04-01 14:43:50 UTC
Type: Bug
Embargoed:



Description Stef Walter 2015-03-25 06:44:01 UTC
Description of problem:

We regularly see qemu fail to capture the kernel boot output when the guest is Fedora 22.


Version-Release number of selected component (if applicable):

Kernel 4.0.0-0.rc3.git0.1.fc22.x86_64 on an x86_64

In my case qemu is also running on a Fedora 22 host. We have also seen this with RHEL 7.x hosts.

qemu-2.2.0-5.fc22.x86_64


How reproducible:

One out of 10 boots.

Steps to Reproduce:
1. Run the Cockpit CI suite.

Actual results:

The entirety of the boot output is:

Fedora release 22 (Twenty Two)
Kernel 4.0.0-0.rc3.git0.1.fc22.x86_64 on an x86_64 (ttyS0)

m11 login: 


Expected results:

The entire boot output, including kernel messages, systemd initialization, etc. This is what we see the other 9 out of 10 times.

Comment 1 Stef Walter 2015-03-25 09:38:20 UTC
This breaks Cockpit development.

Comment 2 Richard W.M. Jones 2015-03-25 13:15:12 UTC
Is there a reproducer which isn't "Run the Cockpit CI suite"?

I can pretty much guarantee that no one will investigate this bug
without considerably more information, like the qemu command line
being used and how you're expecting to get the console messages
and so forth.  Ideally I'd want to see a qemu command line which
can be run that demonstrates the loss of console messages
intermittently, eg:

 $ qemu-kvm -nodefaults -nographic -m 1024 -kernel /boot/vmlinuz-XXX -append "console=ttyS0" -serial stdio

FWIW here is a simple libguestfs-based test you can try:

 $ libguestfs-test-tool

We have never seen intermittent lost console messages however.

Comment 3 Stef Walter 2015-03-25 13:18:32 UTC
> I can pretty much guarantee that no one will investigate this bug
> without considerably more information,

Indeed, and I wanted to see what kind of information to provide. Thanks for the notes, that's a good place to get started.

Comment 4 Stef Walter 2015-03-26 06:27:46 UTC
I've started trying to 'tee' the output from qemu. This may have created a heisenbug situation, where the way tee reads from the file descriptor makes the bug go away. Will keep you posted.

In the meantime, this is the sort of qemu command line we're running:

qemu-kvm -m 1024 -drive if=virtio,file=/data/src/cockpit/test/run/cockpit-fedora-22-x86_64-root,index=0,serial=ROOT,snapshot=on -kernel /data/src/cockpit/test/run/cockpit-fedora-22-x86_64-kernel -initrd /data/src/cockpit/test/run/cockpit-fedora-22-x86_64-initrd -append 'root=/dev/vda console=ttyS0 quiet ' -nographic -net nic,model=virtio,macaddr=52:54:00:9e:00:00 -net bridge,vlan=0,br=cockpit0 -device virtio-scsi-pci,id=hot -monitor unix:path=/data/src/cockpit/test/run/machine-lKrTWb.mon,server,nowait

Comment 5 Stef Walter 2015-04-01 14:43:50 UTC
We continue to see this behavior off and on. We had to refactor our test suite so we didn't depend on qemu console output.

But again, that doesn't help you debug this ... so I can close this for now. Sorry about that.

Comment 6 Marius Vollmer 2015-04-02 09:30:27 UTC
I have a very unreliable reproducer that I was meaning to upload and link to...

   http://files.cockpit-project.org/~mvo/bootlog-reproducer.tar.xz

(Warning, 600 MB.)

Instructions:

 Untar it and cd into the directory.
 $ sudo ./vm-prep
 $ ./check-example

This will very occasionally time out while waiting for a certain boot message.  You might try this:
 
 $ while ./check-example; do true; done
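If a failure rate is more useful than stopping at the first failure, the loop above could be replaced by something like the following. This is only a sketch and not part of the reproducer tarball: `count_failures` is a hypothetical helper, and it assumes `./check-example` exits non-zero when it times out, as the `while` loop above implies.

```shell
# Hypothetical helper (not part of the reproducer): run a command a fixed
# number of times and report how many of those runs failed.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A non-zero exit status counts as one failed run.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Example, assuming ./check-example exits non-zero on a timeout:
#   count_failures 50 ./check-example
```

A fixed run count makes it easier to compare rates between hosts or guest images than an open-ended loop does.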

At this point, my personal hunch is that it is usually the Fedora 22 guest itself that fails to output boot messages, but we have definitely also seen breakage with a Fedora 21 image.

With Fedora 22, we always see the final "<hostname> login: " output, but sometimes none of the "[ OK ] Starting BlitzGewitter" etc. messages.  With Fedora 21, we sometimes saw no output at all.  This is what made us think that the breakage happens in qemu.
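To tell these two symptoms apart automatically, a captured console log could be classified roughly as below. This is a sketch under assumptions: `classify_boot_log` is a hypothetical helper, the log is assumed to be the serial console output saved to a file, and a systemd "[ OK ]" status line is taken as the sign that boot messages got through.

```shell
# Hypothetical helper: classify a captured serial console log.
#   empty     - no non-blank output at all (the Fedora 21 symptom)
#   truncated - login prompt present, but no "[ OK ]" status lines
#               (the Fedora 22 symptom)
#   complete  - at least one "[ OK ]" status line is present
#   unknown   - some output, but neither of the markers above
classify_boot_log() {
    log=$1
    if ! grep -q '[^[:space:]]' "$log"; then
        echo empty
    # The pattern allows extra spaces or colour escapes around "OK".
    elif grep -q '\[[^]]*OK[^]]*\]' "$log"; then
        echo complete
    elif grep -q 'login:' "$log"; then
        echo truncated
    else
        echo unknown
    fi
}

# Example, assuming the console output was saved to console.log:
#   classify_boot_log console.log
```

Counting how often each class shows up per guest image would give some evidence on whether the guest or qemu is dropping the output.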

