Bug 992966 - [abrt] qemu-system-x86-1.4.2-4.fc19: scsi_dma_restart_bh: Process /usr/bin/qemu-system-x86_64 was killed by signal 6 (SIGABRT)
Summary: [abrt] qemu-system-x86-1.4.2-4.fc19: scsi_dma_restart_bh: Process /usr/bin/qe...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 19
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:ce3c9166c88b1a5b9a3fcb4d047...
Depends On:
Blocks:
 
Reported: 2013-08-05 10:00 UTC by David Gibson
Modified: 2015-02-17 16:34 UTC
CC: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-17 16:34:50 UTC
Type: ---
Embargoed:


Attachments
  File: backtrace (99.49 KB, text/plain), 2013-08-05 10:01 UTC, David Gibson
  File: cgroup (517 bytes, text/plain), 2013-08-05 10:01 UTC, David Gibson
  File: core_backtrace (970 bytes, text/plain), 2013-08-05 10:01 UTC, David Gibson
  File: dso_list (8.40 KB, text/plain), 2013-08-05 10:01 UTC, David Gibson
  File: environ (94 bytes, text/plain), 2013-08-05 10:01 UTC, David Gibson
  File: limits (1.29 KB, text/plain), 2013-08-05 10:01 UTC, David Gibson
  File: maps (53.46 KB, text/plain), 2013-08-05 10:01 UTC, David Gibson
  File: open_fds (6.21 KB, text/plain), 2013-08-05 10:02 UTC, David Gibson
  File: proc_pid_status (938 bytes, text/plain), 2013-08-05 10:02 UTC, David Gibson
  sparsify debugging output (53.65 KB, text/plain), 2013-08-15 07:08 UTC, David Gibson
  libvirt log (1.72 KB, text/plain), 2013-08-15 07:09 UTC, David Gibson

Description David Gibson 2013-08-05 10:00:56 UTC
Description of problem:
Attempting to run virt-sparsify on a fresh RHEL 6.4 install.  After some time, the process seemed to freeze, and virt-manager showed the guestfs VM as paused.  I manually unpaused it, and it made a small amount of forward progress before pausing again.  I repeated this for some time, then the guestfs qemu crashed with the attached core.

Version-Release number of selected component:
qemu-system-x86-1.4.2-4.fc19

Additional info:
reporter:       libreport-2.1.5
backtrace_rating: 4
cmdline:        /usr/bin/qemu-system-x86_64 -machine accel=kvm -name guestfs-radorwbctnktvu5t -S -machine pc-i440fx-1.4,accel=kvm,usb=off -m 500 -smp 1,sockets=1,cores=1,threads=1 -uuid 2a9b6621-437c-4e0f-ae68-68f92a0b6d7a -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/guestfs-radorwbctnktvu5t.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot -no-shutdown -no-acpi -kernel /var/tmp/.guestfs-0/kernel.20473 -initrd /var/tmp/.guestfs-0/initrd.20473 -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=xterm-256color' -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/tmp/sparsify04fe47.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file=/var/tmp/libguestfs1OyZuh/snapshot1,if=none,id=drive-scsi0-0-1-0,format=qcow2,cache=unsafe -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0 -chardev socket,id=charserial0,path=/var/tmp/libguestfs1OyZuh/console.sock -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/tmp/libguestfs1OyZuh/guestfsd.sock -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
crash_function: scsi_dma_restart_bh
executable:     /usr/bin/qemu-system-x86_64
kernel:         3.9.9-302.fc19.x86_64
runlevel:       N 5
uid:            107
var_log_messages: Aug  5 19:29:24 voom abrt[22277]: Saved core dump of pid 20503 (/usr/bin/qemu-system-x86_64) to /var/tmp/abrt/ccpp-2013-08-05-19:29:22-20503 (1260548096 bytes)
xsession_errors: 

Truncated backtrace:
Thread no. 1 (8 frames)
 #4 scsi_dma_restart_bh at hw/scsi-bus.c:101
 #5 aio_bh_poll at async.c:69
 #6 aio_poll at aio-posix.c:148
 #7 aio_ctx_dispatch at async.c:166
 #10 glib_select_poll at main-loop.c:209
 #11 os_host_main_loop_wait at main-loop.c:236
 #12 main_loop_wait at main-loop.c:416
 #13 main_loop at vl.c:2001
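
A full backtrace can be recovered from the saved core with gdb
(a sketch: the abrt dump path is taken from var_log_messages
above, and the "coredump" filename inside that directory is an
assumption about abrt's layout):

  # Install matching qemu debug symbols, then open the abrt-saved core.
  debuginfo-install qemu-system-x86
  gdb /usr/bin/qemu-system-x86_64 \
      /var/tmp/abrt/ccpp-2013-08-05-19:29:22-20503/coredump \
      -ex 'thread apply all bt' -ex quit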

Comment 1 David Gibson 2013-08-05 10:01:05 UTC
Created attachment 782738 [details]
File: backtrace

Comment 2 David Gibson 2013-08-05 10:01:10 UTC
Created attachment 782739 [details]
File: cgroup

Comment 3 David Gibson 2013-08-05 10:01:19 UTC
Created attachment 782740 [details]
File: core_backtrace

Comment 4 David Gibson 2013-08-05 10:01:33 UTC
Created attachment 782741 [details]
File: dso_list

Comment 5 David Gibson 2013-08-05 10:01:38 UTC
Created attachment 782742 [details]
File: environ

Comment 6 David Gibson 2013-08-05 10:01:47 UTC
Created attachment 782743 [details]
File: limits

Comment 7 David Gibson 2013-08-05 10:01:54 UTC
Created attachment 782744 [details]
File: maps

Comment 8 David Gibson 2013-08-05 10:02:13 UTC
Created attachment 782745 [details]
File: open_fds

Comment 9 David Gibson 2013-08-05 10:02:19 UTC
Created attachment 782746 [details]
File: proc_pid_status

Comment 10 Richard W.M. Jones 2013-08-05 10:26:01 UTC
(In reply to David Gibson from comment #0)
> Description of problem:
> Attempting to run virt-sparsify on a fresh RHEL 6.4 install.  After some
> time, the process seemed to freeze, and virt-manager showed the guestfs VM
> as paused.  I manually unpaused it, and it made a small amount of forward
> progress before pausing again.   Repeated this for some time, then the
> guestfs qemu crashed with attached core.

qemu should never crash, so there's a bug of some sort here.

However, the original reason for virt-sparsify "pausing" is
likely that you ran out of disk space.  See:
https://bugzilla.redhat.com/show_bug.cgi?id=745576
http://libguestfs.org/virt-sparsify.1.html#environment-variables
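
A sketch of the documented workaround, per the
environment-variables section linked above: point TMPDIR at a
filesystem with enough free space before running virt-sparsify
(indisk.img, outdisk.img, and /mnt/big are placeholder names):

  # Put virt-sparsify's large temporary file on a roomier filesystem.
  mkdir -p /mnt/big/tmp
  TMPDIR=/mnt/big/tmp virt-sparsify indisk.img outdisk.img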

Comment 11 David Gibson 2013-08-06 01:24:47 UTC
I don't think it's out of disk space.  I hit the freezing problem earlier when the disk genuinely was full, and changed TMPDIR to work around it.  This time, at the point it froze, none of my filesystems showed as close to full.  I also wouldn't expect it to make forward progress when manually unpaused if it were out of disk space; each manual unpause gained several percent of extra progress.

Comment 12 Richard W.M. Jones 2013-08-06 07:26:03 UTC
It seems a bit of a mystery what could be causing qemu to pause.

Libguestfs never pauses the appliance.  As far as I know, libvirt
won't pause a guest without being asked to.  The only thing I know
that could cause a guest to pause is ENOSPC, but that appears to
be ruled out (comment 11).

Are there any errors in the libvirt log file
(~/.cache/libvirt/qemu/log/guestfs-radorwbctnktvu5t.log)?

Are there any errors (e.g. I/O errors) in dmesg?

It would be helpful if you could try to reproduce this with
full debugging enabled at the same time:
http://libguestfs.org/guestfs-faq.1.html#debugging-libguestfs
This will create a huge amount of output, which you should
capture to a file:
virt-sparsify [...] |& tee /tmp/log
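
Concretely, a sketch of enabling that debugging via the
LIBGUESTFS_DEBUG and LIBGUESTFS_TRACE variables described in the
linked FAQ (disk names are placeholders):

  # Turn on appliance debug output and API tracing, then capture
  # everything virt-sparsify prints to a log file.
  export LIBGUESTFS_DEBUG=1
  export LIBGUESTFS_TRACE=1
  virt-sparsify indisk.img outdisk.img |& tee /tmp/log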

Comment 13 David Gibson 2013-08-15 07:08:08 UTC
OK, I tried this again with debugging enabled as suggested.

Again I got repeated pauses.  This time I didn't get a qemu crash, but after pausing and resuming quite a lot of times it seemed to stop making forward progress.

At the time of the pause I ran these commands:

[root@voom base-images]# cp sparsify.log sparsify.log-pause1
[root@voom base-images]# du -h /var/tmp/sparsify127052.qcow2 
87G	/var/tmp/sparsify127052.qcow2
[root@voom base-images]# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/dm-1              225G  165G   60G  74% /
devtmpfs               3.7G     0  3.7G   0% /dev
tmpfs                  3.7G  144K  3.7G   1% /dev/shm
tmpfs                  3.7G  968K  3.7G   1% /run
tmpfs                  3.7G     0  3.7G   0% /sys/fs/cgroup
tmpfs                  3.7G   98M  3.6G   3% /tmp
/dev/sda1              488M  126M  327M  28% /boot
/dev/dm-1              225G  165G   60G  74% /home
mcgurkus:/volume1/ISO  2.7T  1.2T  1.6T  44% /var/lib/libvirt/images/mcgurkus-ISO

When it seemed to stop making forward progress despite attempting to unpause, I ran these commands:

[root@voom base-images]# killall virt-sparsify
[root@voom base-images]# du -h /var/tmp/sparsify127052.qcow2 
91G	/var/tmp/sparsify127052.qcow2
[root@voom base-images]# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/dm-1              225G  169G   56G  76% /
devtmpfs               3.7G     0  3.7G   0% /dev
tmpfs                  3.7G  144K  3.7G   1% /dev/shm
tmpfs                  3.7G  956K  3.7G   1% /run
tmpfs                  3.7G     0  3.7G   0% /sys/fs/cgroup
tmpfs                  3.7G   98M  3.6G   3% /tmp
/dev/sda1              488M  126M  327M  28% /boot
/dev/dm-1              225G  169G   56G  76% /home
mcgurkus:/volume1/ISO  2.7T  1.2T  1.6T  44% /var/lib/libvirt/images/mcgurkus-ISO
[root@voom base-images]# rm /var/tmp/sparsify127052.qcow2 
rm: remove regular file ‘/var/tmp/sparsify127052.qcow2’? y
[root@voom base-images]# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/dm-1              225G   74G  147G  34% /
devtmpfs               3.7G     0  3.7G   0% /dev
tmpfs                  3.7G  144K  3.7G   1% /dev/shm
tmpfs                  3.7G  956K  3.7G   1% /run
tmpfs                  3.7G     0  3.7G   0% /sys/fs/cgroup
tmpfs                  3.7G   98M  3.6G   3% /tmp
/dev/sda1              488M  126M  327M  28% /boot
/dev/dm-1              225G   74G  147G  34% /home
mcgurkus:/volume1/ISO  2.7T  1.2T  1.6T  44% /var/lib/libvirt/images/mcgurkus-ISO

It turns out that between the first pause and the point where it stopped making progress, nothing further was output in the debug trace; that is, sparsify.log and sparsify.log-pause1 compared equal.

I'll attach the final sparsify.log and /var/log/libvirt/qemu/guestfs-i5ttt07coy5v8du0.log
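
(A sketch of tracking the temporary file's growth against free
space while reproducing, using the paths from the run above:)

  # Print the temporary qcow2's size and root-fs free space every 30s.
  watch -n 30 'du -h /var/tmp/sparsify127052.qcow2; df -h /'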

Comment 14 David Gibson 2013-08-15 07:08:59 UTC
Created attachment 786828 [details]
sparsify debugging output

Comment 15 David Gibson 2013-08-15 07:09:39 UTC
Created attachment 786829 [details]
libvirt log

Comment 16 Richard W.M. Jones 2013-08-15 07:44:08 UTC
The pause, or whatever it is, happens while filling free space:

  mount -o  /dev/vg_rhel6u4serverx8664/lv_root /sysroot/
  [ 1654.502299] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
  guestfsd: main_loop: proc 1 (mount) took 0.13 seconds
  libguestfs: trace: mount = 0
  Fill free space in /dev/vg_rhel6u4serverx8664/lv_root with zero ...
  libguestfs: trace: zero_free_space "/"
  guestfsd: main_loop: new request, len 0x30
  random filename: /sysroot//ufy633xl.qwa

I'm no closer to understanding why this happens.  It doesn't
appear to be running out of disk space on the rootfs.

I wonder if there's some kind of maximum file size in play (on
the qcow2 file, on files in general, or a quota).

For these types of huge images, we will be able to do better
in future once discard is fully supported.
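
(Conceptually, the zero-free-space step in the trace above
amounts to the following inside the appliance; an illustrative
sketch with an assumed filename, not the actual guestfsd
implementation, which uses a random filename as the trace shows:)

  # Fill the mounted filesystem's free space with zeroes, then
  # delete the fill file; dd stops when it hits ENOSPC.
  dd if=/dev/zero of=/sysroot/zerofill bs=1M || true
  sync
  rm -f /sysroot/zerofill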

Comment 17 Richard W.M. Jones 2013-08-15 08:01:00 UTC
(In reply to Richard W.M. Jones from comment #16)
> For these type of huge images, we will be able to do better
> in future once discard is fully supported.

To continue on this theme: you've got a 128 GB disk of which
only about 3.5 GB is being used.  This is really the worst
case for virt-sparsify, since it has to write a temporary file
of around 124 GB (of zeroes!).  Although there is enough
free space, for some reason (perhaps a quota or a maximum file
size somewhere) it only managed 91 GB.

In-place sparsification is what's needed in this case.
http://libguestfs.org/guestfs-faq.1.html#why-doesnt-virt-sparsify-work-on-the-disk-image-in-place%2d

There is also a non-upstream qemu patch which makes writing
files full of zeroes much more disk-efficient:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03747.html
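
(virt-sparsify later gained an --in-place mode that uses discard
instead of a temporary copy; a sketch, assuming a libguestfs new
enough to have it, with a placeholder image path:)

  # Sparsify the image in place; no huge temporary file is written.
  virt-sparsify --in-place /var/lib/libvirt/images/disk.img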

Comment 18 David Gibson 2013-08-27 01:06:15 UTC
I realise it needs to write a huge intermediate file; I just can't see why it's failing to do so.

I've worked around it by using a smaller disk (since I realised I can give the guest more space by adding more disks later).  It's still weird, though.

Comment 19 Cole Robinson 2013-09-03 17:48:10 UTC
I see this thread talking about a crash in aio_bh_poll_sync and trim:

https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg03267.html

Not sure if it's related. Any thoughts Paolo, Stefan?

Comment 20 Stefan Hajnoczi 2013-09-23 14:50:03 UTC
(In reply to Cole Robinson from comment #19)
> I see this thread talking about a crash in aio_bh_poll_sync and trim:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg03267.html
> 
> Not sure if it's related. Any thoughts Paolo, Stefan?

Cole: I'm not sure they are related.

Richard: The QEMU and qcow2 file size limits are around 2^63 bytes, so this case should not hit the limit.

David: What is the output of "virsh dominfo <domain>" and "virsh domblkerror <domain>" when the guest has paused for the first time?
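
(Concretely, using the domain name from the original qemu
command line above:)

  # Check the paused domain's state and the block I/O error, if
  # any, that caused the pause.
  virsh dominfo guestfs-radorwbctnktvu5t
  virsh domblkerror guestfs-radorwbctnktvu5t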

Comment 21 Paolo Bonzini 2013-11-27 13:24:21 UTC
Can this be reproduced with a newer QEMU?

BTW, for QEMU 1.8, you will be able to write zeroes efficiently to a qcow2 file using virtio-scsi and the BLKZEROOUT ioctl (from the guest of course).  The BLKZEROOUT ioctl can be used unconditionally, even on older QEMU; it just won't be as effective.

This should have the same effect as the patch in https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03747.html.
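
(A sketch of issuing BLKZEROOUT from inside the guest via
blkdiscard; the --zeroout flag assumes a guest util-linux new
enough to provide it, and the device path and length are
placeholders:)

  # Zero the first 1 GiB of the appliance disk through the
  # BLKZEROOUT ioctl rather than writing zero pages.
  blkdiscard --zeroout --offset 0 --length $((1024*1024*1024)) /dev/sdb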

Comment 22 Fedora End Of Life 2015-01-09 19:17:31 UTC
This message is a notice that Fedora 19 is now at end of life.
Fedora has stopped maintaining and issuing updates for Fedora 19.
It is Fedora's policy to close all bug reports from releases that
are no longer maintained.  Approximately four weeks from now, this
bug will be closed as EOL if it remains open with a Fedora
'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were
not able to fix it before Fedora 19 reached end of life.  If you
would still like to see this bug fixed and are able to reproduce
it against a later version of Fedora, you are encouraged to change
the 'version' to a later Fedora version before this bug is closed,
as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 23 Fedora End Of Life 2015-02-17 16:34:50 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

