Description of problem:

[I'm still analyzing this problem, so at the moment this bug is just a placeholder.  In particular I don't know if this is a qemu bug, kernel bug, or something else entirely.]

When you output lots of text to the emulated ISA serial port from a guest, the output hangs.  It can be restarted by pressing any key.

An easy way to reproduce this if you have libguestfs >= 1.37.1 is:

  $ virt-builder fedora-25
  $ virt-rescue -a fedora-25.img -i

At the ><rescue> prompt, list a long directory, such as:

  ><rescue> ls -l /usr/bin

Every few times you do this, the directory listing will stop somewhere at random in the middle.  It can be resumed by pressing any key.

The strace (of virt-rescue) when this happens is really odd.  Here is an example, where times are shown on the left and I have added line numbers:

1:  1490297056.676881 poll([{fd=0, events=POLLIN}, {fd=1, events=POLLOUT}, {fd=8, events=POLLIN}], 3, -1) = 2 ([{fd=1, revents=POLLOUT}, {fd=8, revents=POLLIN}])
2:  1490297056.676920 read(8, "\r\n", 4089) = 2
3:  1490297056.676953 write(1, " b2sum\r\n", 9) = 9
4:  1490297056.676985 poll([{fd=0, events=POLLIN}, {fd=1, events=0}, {fd=8, events=POLLIN}], 3, -1) = 1 ([{fd=8, revents=POLLIN}])
5:  1490297056.681065 read(8, "-rwxr-xr-x 1 root root 40760 Mar 9 09:43 base32\r\n-rwxr-xr-x 1 root root 40768 Mar 9 09:43 base64\r\n-rwxr-xr-x 1 root root 32512 Mar 9 09:43 basename\r\n-rwxr-xr-x 1 root root 1132544 Feb 10 06:55 bash\r\nlrwxrwxrwx 1 root root 10 Feb 10 06:55 bashbug -> bashbug-64", 4096) = 278
6:  1490297056.681256 poll([{fd=0, events=POLLIN}, {fd=1, events=POLLOUT}, {fd=8, events=POLLIN}], 3, -1) = 1 ([{fd=1, revents=POLLOUT}])
    1490297056.681305 write(1, "-rwxr-xr-x 1 root root 40760 Mar 9 09:43 base32\r\n-rwxr-xr-x 1 root root 40768 Mar 9 09:43 base64\r\n-rwxr-xr-x 1 root root 32512 Mar 9 09:43 basename\r\n-rwxr-xr-x 1 root root 1132544 Feb 10 06:55 bash\r\nlrwxrwxrwx 1 root root 10 Feb 10 06:55 bashbug -> bashbug-64", 278) = 278
7:  1490297056.681354 poll([{fd=0, events=POLLIN}, {fd=1, events=0}, {fd=8, events=POLLIN}], 3, -1) = 1 ([{fd=0, revents=POLLIN}])
8:  1490297060.120147 read(0, "\r", 4096) = 1
9:  1490297060.120191 poll([{fd=0, events=POLLIN}, {fd=1, events=0}, {fd=8, events=POLLIN|POLLOUT}], 3, -1) = 1 ([{fd=8, revents=POLLOUT}])
10: 1490297060.120239 write(8, "\r", 1) = 1
11: 1490297060.120281 poll([{fd=0, events=POLLIN}, {fd=1, events=0}, {fd=8, events=POLLIN}], 3, -1) = 1 ([{fd=8, revents=POLLIN}])
12: 1490297060.120585 read(8, "\r\n", 4096) = 2
13: 1490297060.124040 poll([{fd=0, events=POLLIN}, {fd=1, events=POLLOUT}, {fd=8, events=POLLIN}], 3, -1) = 2 ([{fd=1, revents=POLLOUT}, {fd=8, revents=POLLIN}])
14: 1490297060.124098 read(8, "-rwxr-xr-x 1 root root 7215 Feb 10 06:55 bashbug-64\r\n-rwxr-xr-x 1 root root 26 Feb 10 06:55 bg\r\n-rwxr-xr-x 1 root root 44712 Mar 16 09:21 bootctl\r\nlrwxrwxrwx 1 root root 5 Feb 10 07:23 bunzip2 -> bzip2\r\n-rwxr-xr-x 1 root root 69272 Mar 16 09:21 busctl\r\nlr", 4094) = 278
15: 1490297060.124304 write(1, "\r\n-rwxr-xr-x 1 root root 7215 Feb 10 06:55 bashbug-64\r\n-rwxr-xr-x 1 root root 26 Feb 10 06:55 bg\r\n-rwxr-xr-x 1 root root 44712 Mar 16 09:21 bootctl\r\nlrwxrwxrwx 1 root root 5 Feb 10 07:23 bunzip2 -> bzip2\r\n-rwxr-xr-x 1 root root 69272 Mar 16 09:21 busctl\r\nlr", 280) = 280

Line 6 is the last message printed before the hang.  Line 7 is the poll which hangs.  Line 8 is me pressing the [Return] key about 4 seconds later.  Line 11 is where the listing resumes normally.
Note that the poll on line 7 is correctly waiting for POLLIN on fd=8.  We know at this point that there must be more data for virt-rescue to read from qemu, but nothing is read.  The poll on line 11 has an identical set of poll events, including POLLIN on fd=8, and that one returns immediately with a read indication on fd=8, as it should.

Version-Release number of selected component (if applicable):
qemu-2.8.0-2.fc26.x86_64
kernel-4.11.0-0.rc3.git0.2.virtio_reverts.fc26.x86_64

How reproducible:
100%

Steps to Reproduce:
1. See above.
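For context, in the strace above fd=0/fd=1 are virt-rescue's own stdin/stdout and fd=8 is presumably the Unix socket connected to qemu's serial console chardev.  The following is a stripped-down, hypothetical sketch of this kind of poll-based relay loop (not the actual virt-rescue code); it only illustrates why a poll that never returns POLLIN on fd=8 means qemu simply never wrote anything to the console socket:

/* Illustrative sketch of a poll(2)-based console relay (assumption:
 * simplified stand-in for virt-rescue's loop, not its real code). */
#include <poll.h>
#include <unistd.h>

static void relay(int sock)                    /* sock would be fd=8 above */
{
    char buf[4096];

    for (;;) {
        struct pollfd fds[3] = {
            { .fd = 0,    .events = POLLIN },  /* keystrokes from the user */
            { .fd = 1,    .events = 0      },  /* POLLOUT only when output is queued */
            { .fd = sock, .events = POLLIN },  /* guest output arriving via qemu */
        };

        if (poll(fds, 3, -1) < 0)              /* this is the call that appears to hang */
            break;

        if (fds[0].revents & POLLIN) {         /* forward keystrokes to the guest */
            ssize_t n = read(0, buf, sizeof buf);
            if (n <= 0) break;
            write(sock, buf, n);
        }
        if (fds[2].revents & POLLIN) {         /* echo guest output to the terminal */
            ssize_t n = read(sock, buf, sizeof buf);
            if (n <= 0) break;
            write(1, buf, n);
        }
    }
}

So the block at line 7 of the strace only shows the relay sleeping in poll; it tells us qemu stopped writing to the console socket, not that virt-rescue stopped reading.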
Created attachment 1265899 [details]
strace of qemu before and after a hang

Here is an strace of qemu around a hang.  The serial line is resumed from the hang when I press the "x" key on the keyboard, which shows up as this line in the trace:

1490302045.249234 recvmsg(11, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="x", iov_len=8}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 1

Apart from that it's a bit more difficult to see any useful info from this trace.
A note for anyone trying to reproduce this: virt-rescue's '-i' option is new and was added after Fedora 26 (in fact it is only in Rawhide), which is a bit awkward, but the Rawhide libguestfs package should likely be installable directly on Fedora 26.
I'm fairly sure this is not directly a qemu problem, because I went all the way back to qemu 2.5 and the bug is still reproducible there.  To test against another qemu, do:

  $ export LIBGUESTFS_BACKEND=direct
  $ export LIBGUESTFS_HV=~/d/qemu/x86_64-softmmu/qemu-system-x86_64
  $ virt-rescue -a fedora-25.img -i
(All tests done on baremetal)

Fedora 25: HANGING

Fedora 24: WORKS

RHEL 7.4:  WORKS
  qemu-kvm-rhev-2.8.0-6.el7.x86_64
  kernel-3.10.0-567.el7.x86_64
I spent all of last night and the whole of today individually upgrading packages from F24 -> F25 and retesting for the bug (which is very rare), and I have satisfied myself that the problem appears in glib2:

  glib2-2.48.2-1.fc24.x86_64 (good) -> glib2-2.50.3-1.fc25.x86_64 (bad)

I have not yet narrowed it down to a particular commit; working on that next.
I bisected this to:

e4ee3079c5afc3c1c3d2415f20c3e8605728f074 is the first bad commit
commit e4ee3079c5afc3c1c3d2415f20c3e8605728f074
Author: Frediano Ziglio <fziglio>
Date:   Mon Jan 25 14:59:24 2016 +0000

    Do not wake up main loop if change is from same thread

    This reduce the frequency the loop is waked up adding and removing
    file descriptors or timeouts.
    Considering that to support recursion events are removed from list
    and added again this reduce iteration number a lot.

    Signed-off-by: Frediano Ziglio <fziglio>

    https://bugzilla.gnome.org/show_bug.cgi?id=761102

:040000 040000 bb42e86b5a2c962322a84b52c939f16df2c43034 8d31e66a35fb3ae1e72fa7a34ad861bfd332bd84 M	glib
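For readers who don't want to dig through the glib commit: the change makes the main-context wakeup conditional on which thread made the modification.  The following toy sketch only paraphrases the idea -- the struct, field and function names are illustrative stand-ins, not glib's internal types:

/* Toy paraphrase of the bisected glib change (assumption: illustrative
 * only, the real code is inside glib's gmain.c and uses private types).
 * Before the change, any modification of a GMainContext (adding an fd,
 * attaching a source, ...) signalled the context's wakeup object, forcing
 * the poll() in the loop to return and re-run prepare/query.  After the
 * change, the wakeup is skipped when the modification comes from the
 * thread that owns the context, on the assumption that that thread will
 * re-run prepare/query itself before polling again. */
#include <glib.h>
#include <unistd.h>

struct toy_context {        /* stand-in for GMainContext internals */
    GThread *owner;         /* set by g_main_context_acquire()      */
    int      wakeup_fd;     /* stand-in for the GWakeup object      */
};

static void toy_context_changed(struct toy_context *ctx)
{
    if (ctx->owner != g_thread_self()) {
        /* another thread may be sitting in poll(): poke it awake so it
           re-runs prepare/query and sees the new source/fd */
        guint64 one = 1;
        write(ctx->wakeup_fd, &one, sizeof one);   /* eventfd-style wakeup */
    }
    /* else: the same thread owns the context, so no wakeup is sent --
       this is the case an unowned main loop can trip over */
}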
Usually this indicates that the program is expecting the loop to run even when there are no events.  The fact you are expecting an event on fd=8 and poll is stuck means either a bug in kernel, that you already handled pending events or that some other thread should cause an event on this file descriptor.

What is fd=8?
If you're looking at the strace in comment 0, then that's from virt-rescue, which is the other side of the connection. Comment 1 contains the strace from qemu which uses glib2.
> The fact you are expecting an event on fd=8 and poll is stuck means either a
> bug in kernel, that you already handled pending events or that some other
> thread should cause an event on this file descriptor.

But how would you justify the bisection result then?  A bug in the kernel, triggered by glib, is so unlikely that I'd rule it out.
There was a non-public email where I described other bisections that I had done on this bug.  Reproducing here to make things clearer:

> Now, I have tried to attack this problem by bisecting both
> qemu from current HEAD back to qemu 2.5.  Also by separately
> bisecting the kernel from current HEAD back to kernel 4.7.0.
>
> In neither case can I find a 'good' working version.
>
> Although I was bisecting qemu and the kernel separately, for
> completeness I also tried kernel 4.7 with qemu 2.5.  That's hang-y too!
>
> Since all of this started recently, perhaps during Fedora 26
> timeframe, I suppose that means it must be some other component.  Host
> kernel or glibc spring to mind here.
>
> All of the above testing is done on baremetal on a very fast 16 core
> server (but with no -smp).

I'm fairly sure it cannot be the kernel at fault.  Although the bisection pointed at glib, it is still possible that qemu is misusing the glib main loop API somehow, and the glib patch exposed an existing bug in qemu.

A thought: is there any way to turn on debugging in the glib main loop (similar to lockdep in the kernel)?
(In reply to Richard W.M. Jones from comment #10)
> A thought: is there any way to turn on debugging in the glib
> main loop (similar to lockdep in the kernel)?

Examining the code says no, just a debug #define + environment variable which prints lots of debugging messages.
(In reply to Paolo Bonzini from comment #9)
> > The fact you are expecting an event on fd=8 and poll is stuck means either a
> > bug in kernel, that you already handled pending events or that some other
> > thread should cause an event on this file descriptor.
>
> But how would you justify the bisection result then?

This optimization has caused several of these "issues".  Basically the glib loop does prepare + poll(2) + check + dispatch for every iteration.  If one dispatch is not enough to handle an event (for instance the first dispatch only sets a flag and another call is needed to finish the work), or if you expect more events, the loop can appear stuck.

Note from the qemu strace (comment 1, first attachment) that the file descriptor list didn't change, yet poll is not detecting any events.  Looking (I have to say very quickly) at the qemu serial code there is no addition of events to the loop, so this seems to confirm there are no pending events.

> A bug in the kernel, triggered by glib, is so unlikely that I'd rule it out.

I think so too, but I never exclude possibilities during early analysis.  It is not clear which event should wake up the loop when serial data arrives from the guest.  Being an old-style serial port, I suspect there should be a KVM exit with I/O.  Maybe the strace is just showing the main thread but some different event should come from a different thread?

What is this VM doing?  Which devices are attached to it?
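For reference, one iteration of a GMainContext driven by hand -- the prepare/poll/check/dispatch cycle mentioned above, and roughly what qemu does instead of calling g_main_loop_run() -- looks something like the following simplified sketch (not qemu's actual code):

/* One hand-driven iteration of a GMainContext: prepare / query / poll /
 * check / dispatch.  Simplified sketch -- the real qemu code grows the
 * fd array dynamically and substitutes its own ppoll()-based poll. */
#include <glib.h>

static void iterate_once(GMainContext *context)
{
    GPollFD fds[64];
    gint max_priority, timeout, nfds;

    g_main_context_prepare(context, &max_priority);
    nfds = g_main_context_query(context, max_priority, &timeout,
                                fds, G_N_ELEMENTS(fds));

    g_poll(fds, nfds, timeout);      /* the loop only "wakes up" here if some
                                        fd fires or the context's wakeup
                                        object is signalled */

    if (g_main_context_check(context, max_priority, fds, nfds))
        g_main_context_dispatch(context);
}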
(In reply to Frediano Ziglio from comment #12)
> Maybe the strace is just showing the main thread but some different
> event should come from a different thread?

That's a good point.  I was only tracing a single thread.  To trace a multithreaded process you apparently need to use -p PID -f which I will attach shortly.

> What is this VM doing?  Which devices are attached to it?

The VM is running virt-rescue.  You can reproduce the exact conditions very easily if you have a Rawhide machine (see comment 0).  The qemu command line (generated by libvirt) is:

LC_ALL=C PATH=/usr/libexec/python2-sphinx:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/rjones/.local/bin:/home/rjones/bin HOME=/home/rjones USER=rjones LOGNAME=rjones QEMU_AUDIO_DRV=none TMPDIR=/var/tmp /usr/bin/qemu-kvm -name guest=guestfs-mykkcee059jvtksl,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/home/rjones/.config/libvirt/qemu/lib/domain-1-guestfs-mykkcee059jv/master-key.aes -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 500 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 5916b294-6bd3-46b7-9723-19be39d89eaf -display none -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/home/rjones/.config/libvirt/qemu/lib/domain-1-guestfs-mykkcee059jv/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -no-acpi -boot strict=on -kernel /var/tmp/.guestfs-1000/appliance.d/kernel -initrd /var/tmp/.guestfs-1000/appliance.d/initrd -append 'panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm-256color guestfs_rescue=1' -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/tmp/libguestfsJwydC2/scratch.1,format=raw,if=none,id=drive-scsi0-0-0-0,cache=unsafe -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file=/tmp/libguestfsJwydC2/overlay2,format=qcow2,if=none,id=drive-scsi0-0-1-0,cache=unsafe -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 -chardev socket,id=charserial0,path=/run/user/1000/libguestfsbUO182/console.sock -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/run/user/1000/libguestfsbUO182/guestfsd.sock -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x4 -msg timestamp=on
(In reply to Richard W.M. Jones from comment #13)
> (In reply to Frediano Ziglio from comment #12)
> > Maybe the strace is just showing the main thread but some different
> > event should come from a different thread?
>
> That's a good point.  I was only tracing a single thread.  To
> trace a multithreaded process you apparently need to use -p PID -f
> which I will attach shortly.

Unfortunately that either slows down or serializes the emulation enough that the bug goes away.
(In reply to Richard W.M. Jones from comment #14)
> (In reply to Richard W.M. Jones from comment #13)
> > (In reply to Frediano Ziglio from comment #12)
> > > Maybe the strace is just showing the main thread but some different
> > > event should come from a different thread?
> >
> > That's a good point.  I was only tracing a single thread.  To
> > trace a multithreaded process you apparently need to use -p PID -f
> > which I will attach shortly.
>
> Unfortunately that either slows down or serializes the emulation
> enough that the bug goes away.

Yes, but at least it can help us understand the code path.  I'm reading the code and it seems that IRQs are involved, but it's not clear how they get to the main loop, and it also depends on other factors (like the machine type).  Even a small trace taken while you are getting some data from the serial port could help.
Created attachment 1266714 [details]
multithreaded strace of qemu

OK, attached is a multithreaded strace of qemu.  I am typing the command "ls -l /usr/bin" on the serial console, which is followed by the contents of /usr/bin being listed to the serial console.  As noted in preceding comments, this strace does not demonstrate the hang.
It's possible to attach gdb to the qemu process and get stack traces when it hangs.  This one is from the latest qemu from git, 37c4a85cd256a.

Thread 3 (Thread 0x7ff65ffca700 (LWP 3764)):
#0  0x00007ff669fa8837 in ioctl () at /lib64/libc.so.6
#1  0x0000560a03e76f8b in kvm_vcpu_ioctl (cpu=0x560a051fc7e0, type=44672) at /home/rjones/d/qemu/kvm-all.c:2154
#2  0x0000560a03e7689f in kvm_cpu_exec (cpu=0x560a051fc7e0) at /home/rjones/d/qemu/kvm-all.c:1992
#3  0x0000560a03e5ce42 in qemu_kvm_cpu_thread_fn (arg=0x560a051fc7e0) at /home/rjones/d/qemu/cpus.c:1087
#4  0x00007ff66fe5c36d in start_thread () at /lib64/libpthread.so.0
#5  0x00007ff669fb2e0f in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7ff661ad8700 (LWP 3761)):
#0  0x00007ff669fad7a9 in syscall () at /lib64/libc.so.6
#1  0x0000560a04345721 in qemu_futex_wait (f=0x560a04e20e44 <rcu_call_ready_event>, val=4294967295) at /home/rjones/d/qemu/include/qemu/futex.h:26
#2  0x0000560a04345824 in qemu_event_wait (ev=0x560a04e20e44 <rcu_call_ready_event>) at util/qemu-thread-posix.c:399
#3  0x0000560a0435d2cf in call_rcu_thread (opaque=0x0) at util/rcu.c:249
#4  0x00007ff66fe5c36d in start_thread () at /lib64/libpthread.so.0
#5  0x00007ff669fb2e0f in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7ff672afdc80 (LWP 3760)):
#0  0x00007ff669fa6dc6 in ppoll () at /lib64/libc.so.6
#1  0x0000560a0433fdf9 in qemu_poll_ns (fds=0x560a05f9b270, nfds=74, timeout=33648614890000) at util/qemu-timer.c:334
#2  0x0000560a04340e0c in os_host_main_loop_wait (timeout=33648614890000) at util/main-loop.c:252
#3  0x0000560a04340ecf in main_loop_wait (nonblocking=0) at util/main-loop.c:506
#4  0x0000560a03fbbf44 in main_loop () at vl.c:1898
#5  0x0000560a03fc3cf4 in main (argc=67, argv=0x7ffc524d50a8, envp=0x7ffc524d52c8) at vl.c:4709

I believe that Thread 3 = vCPU, Thread 2 = the RCU callback thread (call_rcu_thread), Thread 1 = main loop.

What's interesting (I think) is that it's not actually using glib's g_poll, but a replacement function:

http://git.qemu-project.org/?p=qemu.git;a=blob;f=util/qemu-timer.c;h=82d56507a2bdc2416667c7d863798e91298904af;hb=HEAD#l315

(CONFIG_PPOLL is defined.)  It's also happily casting GPollFD to struct pollfd.  Hmmm.  They're actually compatible (apart from signedness) but it still doesn't seem like a good idea.

Is there a way to get from here to the SerialState struct used by qemu to store the state of the serial ports?
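For reference, the CONFIG_PPOLL branch of that replacement boils down to something like the following (a paraphrase, see the linked qemu-timer.c for the real thing; the function name here is just a stand-in).  The GPollFD -> struct pollfd cast works because both structs are laid out as { int fd; short events; short revents; }, apart from the signedness difference mentioned above:

/* Paraphrase of qemu's ppoll()-based g_poll replacement (assumption:
 * simplified from memory of the linked file, not a verbatim copy). */
#define _GNU_SOURCE            /* for ppoll() */
#include <glib.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>

static int poll_ns(GPollFD *fds, unsigned nfds, int64_t timeout_ns)
{
    if (timeout_ns < 0)
        return ppoll((struct pollfd *)fds, nfds, NULL, NULL);

    struct timespec ts = {
        .tv_sec  = timeout_ns / 1000000000LL,
        .tv_nsec = timeout_ns % 1000000000LL,
    };
    return ppoll((struct pollfd *)fds, nfds, &ts, NULL);
}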
I'm reassigning this back to qemu as it does indeed appear to be a race condition in qemu and nothing to do with glib2 (the change to glib2 appears to have exposed the bug, that's all).

Stefan Hajnoczi diagnosed the problem and came up with the following patch (to qemu) as a solution:

diff --git a/chardev/char.c b/chardev/char.c
index 3df1163..4d66bb6 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -1059,6 +1059,8 @@ guint qemu_chr_fe_add_watch(CharBackend *be, GIOCondition cond,
     tag = g_source_attach(src, NULL);
     g_source_unref(src);
 
+    qemu_notify_event();
+
     return tag;
 }
This seems even more like a glib2 bug to me.  The g_wakeup_signal that Frediano removed (well, made conditional) is doing exactly the same thing as qemu_notify_event.  It seems like glib is culling some g_wakeup_signal calls that were actually necessary.
Ok, I think I found out what's going on.

glib is expecting QEMU to use g_main_context_acquire around accesses to GMainContext.  However QEMU is not doing that; instead it is taking its own mutex.  The bug should be fixed by adding g_main_context_acquire and g_main_context_release in the two implementations of os_host_main_loop_wait.

In all fairness, the docs do say "You must be the owner of a context before you can call g_main_context_prepare(), g_main_context_query(), g_main_context_check(), g_main_context_dispatch()".  However, until now nothing bad happened if you didn't do that.  I'm not sure of glib's policy regarding backwards-compatibility, but I think you should consider undoing this change.
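A minimal sketch of the proposal (hedged: the patch actually posted upstream, linked in the next comment, ended up somewhat different) is to take ownership of the context around the whole cycle:

/* Sketch of the proposed fix: own the GMainContext for the whole
 * prepare/query/poll/check/dispatch cycle, so glib knows that a
 * g_source_attach() from another thread must signal the wakeup and
 * break the poll.  Simplified stand-in for os_host_main_loop_wait,
 * not the code that was actually merged. */
#include <glib.h>

static void main_loop_wait_once(GMainContext *context)
{
    GPollFD fds[64];
    gint max_priority, timeout, nfds;

    g_main_context_acquire(context);     /* become the context owner */

    g_main_context_prepare(context, &max_priority);
    nfds = g_main_context_query(context, max_priority, &timeout,
                                fds, G_N_ELEMENTS(fds));
    g_poll(fds, nfds, timeout);          /* qemu really calls its ppoll wrapper */
    if (g_main_context_check(context, max_priority, fds, nfds))
        g_main_context_dispatch(context);

    g_main_context_release(context);     /* drop ownership */
}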
Somewhat different patch posted upstream: https://www.mail-archive.com/qemu-devel@nongnu.org/msg441005.html
Richard, can you clone the bug to glib, since bug 761102 has been reopened?

> glib docs say "You must be the owner of a context before you can call
> g_main_context_prepare(), g_main_context_query(), g_main_context_check(),
> g_main_context_dispatch()."  They do not say that you need to be the owner of a
> context between those calls.
>
> In particular, if you release the context after g_main_context_query returns
> and reacquire it before g_main_context_check, it would cause the bug again:
>
>     thread A                    thread B
>     ------------                ------------
>     for (;;) {
>       acquire
>       prepare
>       query
>       release
>       ppoll()
>                                 g_source_attach
>       acquire
>       check
>       dispatch
>       release
>     }
Cloned to bug 1438539.
qemu-2.7.1-5.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-96a7189749
qemu-2.7.1-5.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-96a7189749
qemu-2.7.1-5.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.