Bug 659301
| Field | Value |
|---|---|
| Summary | [qemu] Qemu is sending ENOSPC events and filling the LV way beyond VM limitations |
| Product | Red Hat Enterprise Linux 6 |
| Component | qemu-kvm |
| Version | 6.1 |
| Hardware | Unspecified |
| OS | Linux |
| Status | CLOSED DUPLICATE |
| Severity | urgent |
| Priority | urgent |
| Reporter | Moran Goldboim <mgoldboi> |
| Assignee | Kevin Wolf <kwolf> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | abaron, acathrow, bazulay, cpelland, danken, iheim, imansano, jwest, kwolf, lcapitulino, lihuang, mkenneth, szhou, tburke, virt-maint |
| Target Milestone | rc |
| Target Release | 6.1 |
| Doc Type | Bug Fix |
| Bug Blocks | 580951 |
| Last Closed | 2011-02-01 15:30:18 UTC |
Have you checked the image with qemu-img check?

```
qemu-img check:
. . .
Leaked cluster 32761 refcount=1 reference=0
Leaked cluster 32762 refcount=1 reference=0
Leaked cluster 32763 refcount=1 reference=0
Leaked cluster 32764 refcount=1 reference=0
Leaked cluster 32765 refcount=1 reference=0
Leaked cluster 32766 refcount=1 reference=0
Leaked cluster 32767 refcount=1 reference=0

16545 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
```

16k clusters means about 1 GB worth of leaked clusters. Wow. That alone is bad enough (did you have lots of I/O errors on your storage backend?), but it doesn't explain disk usage of 200 GB and more.

The huge disk is a direct result of repeated ENOSPC events: for each one, vdsm extends the LV by 1 GB. The question reduces to why qemu generates (so many) ENOSPC events, even after the LV is far larger than the qcow2 virtual size. Moran, how easy is it to reproduce? Is it only a function of the storage? Is there a slight chance that the image is accessed by two VMs at once?

Please attach gdb to the qemu-kvm instance and set a breakpoint on bdrv_mon_event. The next time an ENOSPC occurs, the breakpoint will trigger. Please provide the backtrace from there. Also, if you have a VM in this state and I may log into your machine to debug this, please contact me.

Also, just to be sure: if many ENOSPC events are triggered at the same time (the log snippet posted above seems to show this case), VDSM will collapse them and increase the image size only once, right? qemu emits the event once for every request that was in flight and failed.

(In reply to comment #8)
> Also, just to be sure: if many ENOSPC events are triggered at the same time
> (the log snippet posted above seems to show this case), VDSM will collapse them
> and increase the image size only once, right? qemu emits the event once for
> every request that was in flight and failed.

Doh, bug 568098 is still alive, and this one might be a dup. vdsm cannot really collapse multiple events - it processes the first event on sight. If further events arrive after the first was processed, they are treated as unrelated ones. Maybe the timing of events would allow collapsing them in vdsm, but this is reminiscent of solving races with sleep, which is against my religion.

I can reproduce 100%:

qemu-kvm-0.12.1.2-2.113.el6_0.3.x86_64
lvm2-2.02.72-8.el6_0.2.x86_64
vdsm-4.9-25.el6.x86_64
kernel-2.6.32-71.7.1.el6.x86_64

Using iSCSI storage (enterprise-class array - IBM DS3300). Logs attached from vdsm and libvirt.

Created attachment 464892 [details]
VDSM and libvirt logs
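
(Editorial aside.) A quick check of the "16k clusters means about 1 GB" estimate a few comments up, assuming the qcow2 default cluster size of 64 KiB (the actual cluster size of this image is not shown in the bug):

```python
# Back-of-the-envelope check of the leaked-cluster estimate.
# Assumption: the image uses the qcow2 default cluster size of 64 KiB.
leaked_clusters = 16545                       # from the qemu-img check output
cluster_size = 64 * 1024                      # bytes per cluster (assumed default)
leaked_bytes = leaked_clusters * cluster_size
print(leaked_bytes / 2**30)                   # ~1.01 GiB tied up in leaked clusters
```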
What's the size of the vDisk? As you stated, it might be that multiple ENOSPC events per I/O cause the system to increase the storage way too much (bz568098). From the above vdsm.log:

```
libvirtEventLoop::INFO::2010-12-05 16:38:51,966::libvirtvm::860::vds.vmlog.8124aa09-3306-4aaf-b3a5-892dc01e221e::(_onAbnormalStop) abnormal vm stop device virtio-disk0 error enospc
libvirtEventLoop::DEBUG::2010-12-05 16:38:51,966::storage_mailbox::99::Storage.SPM.Messages.Extend::(__init__) new extend msg created: domain: 30d6c222-dba6-4970-a95c-1f270719fd5c, volume: dc2f029b-54ed-47cb-8bec-173f362a4468
libvirtEventLoop::DEBUG::2010-12-05 16:38:51,966::vm::1284::vds.vmlog.8124aa09-3306-4aaf-b3a5-892dc01e221e::(_lvExtend) _lvExtend: hda: apparentsize 15032385536 req 15360   >>>>>>>>>> What's req 15360 mean?
libvirtEventLoop::INFO::2010-12-05 16:38:51,967::libvirtvm::860::vds.vmlog.8124aa09-3306-4aaf-b3a5-892dc01e221e::(_onAbnormalStop) abnormal vm stop device virtio-disk0 error enospc
libvirtEventLoop::DEBUG::2010-12-05 16:38:51,967::storage_mailbox::99::Storage.SPM.Messages.Extend::(__init__) new extend msg created: domain: 30d6c222-dba6-4970-a95c-1f270719fd5c, volume: dc2f029b-54ed-47cb-8bec-173f362a4468
libvirtEventLoop::DEBUG::2010-12-05 16:38:51,968::vm::1284::vds.vmlog.8124aa09-3306-4aaf-b3a5-892dc01e221e::(_lvExtend) _lvExtend: hda: apparentsize 15032385536 req 15360
libvirtEventLoop::INFO::2010-12-05 16:38:51,968::libvirtvm::860::vds.vmlog.8124aa09-3306-4aaf-b3a5-892dc01e221e::(_onAbnormalStop) abnormal vm stop device virtio-disk0 error enospc
libvirtEventLoop::DEBUG::2010-12-05 16:38:51,968::storage_mailbox::99::Storage.SPM.Messages.Extend::(__init__) new extend msg created: domain: 30d6c222-dba6-4970-a95c-1f270719fd5c, volume: dc2f029b-54ed-47cb-8bec-173f362a4468
libvirtEventLoop::DEBUG::2010-12-05 16:38:51,969::vm::1284::vds.vmlog.8124aa09-3306-4aaf-b3a5-892dc01e221e::(_lvExtend) _lvExtend: hda: apparentsize 15032385536 req 15360
```

req 15360 means a request for extension of the said LV to 15360 MiB.

We're certainly seeing multiple events come in. The vDisk is set to 20 GB and starts off at 1 GB. After a minute or so into the install it needs to do the first extend, and you see the LV go from 1 GB to 2 GB; a few seconds later it's 3 GB. Then the VM gets paused (see the notification in the RHEV-M UI), but watching the LV, it keeps on growing, even past the 20 GB total allocation. I stopped the VM and killed the disk at 31 GB (logs attached).

Created attachment 464902 [details]
2nd run - vdsm + libvirt logs. Here we see multiple events coming in
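
(Editorial aside.) The numbers in the _lvExtend lines above line up with the 1 GB-per-event extension described earlier: the apparent size of 15032385536 bytes is exactly 14336 MiB, and "req 15360" asks for that plus one chunk. A small check, assuming vdsm's extension chunk is 1 GiB:

```python
# Relate the logged apparent size to the logged extension request.
# Assumption: vdsm extends in 1 GiB chunks, as stated earlier in the bug.
MiB = 1024 * 1024
apparent_size = 15032385536          # bytes, from the _lvExtend log line
current_mib = apparent_size // MiB   # 14336 MiB (14 GiB) current apparent size
requested_mib = current_mib + 1024   # 15360 MiB -> matches "req 15360"
print(current_mib, requested_mib)
```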
(In reply to comment #9)
> (In reply to comment #8)
> > Also, just to be sure: If many ENOSPC events are triggered at the same time
> > (the log snippet posted above seems to show this case), VDSM will collapse them
> > and increase the image size only once, right? qemu emits the event once for
> > every request that was in flight and failed.
>
> Doh, bug 568098 is still alive, and this one might be a dup. vdsm cannot really
> collapse multiple events - it processes the first event on sight. If further
> events arrive after the first was processed, they are addressed as unrelated
> ones. Maybe the timing of events allows to collapse events in vdsm, but this is
> reminiscent of solving races with sleep, which is against my religion.

The bug you mentioned has been handled with very low priority because we were told that VDSM does cope with the fact that multiple events are emitted. If this isn't true, I'm not surprised that things break... Anyway, these events _are_ independent events, so somebody must do the collapsing. It should be very easy for VDSM to do, because as long as it hasn't continued the VM, it definitely knows that it should resize only once. A stopped VM doesn't issue any more requests that could fail. Anyway, adding Luiz to CC.

Kevin, please explain how vdsm can reliably collapse the events. There is an inherent race between handling the first event (which includes extending the LV and continuing qemu) and the reception of further events.

True, but this isn't any different when you try to collapse them in qemu. How do you relate a block error event to a stop event today? What you really get is a number of block error events and one stop event. It should be safe to ignore any block error that isn't followed by a stop (you can double-check by querying the VM state). But maybe what you really should do is to look at the high watermark even after an ENOSPC and use that to check whether the image size needs to be further increased?

Created attachment 467701 [details]
bt
attached is the requested log.
```
(gdb) bt
#0  bdrv_mon_event (bdrv=0x2aa7490, action=BDRV_ACTION_STOP, error=28, is_read=0) at block.c:1436
#1  0x000000000041dd33 in virtio_blk_handle_rw_error (opaque=0x7f886801c1f0, ret=-28) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:77
#2  virtio_blk_rw_complete (opaque=0x7f886801c1f0, ret=-28) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:93
#3  0x0000000000479ade in multiwrite_user_cb (opaque=0x7f886804bc60, ret=<value optimized out>) at block.c:1907
#4  multiwrite_cb (opaque=0x7f886804bc60, ret=<value optimized out>) at block.c:1926
#5  0x000000000048d13d in qcow_aio_write_cb (opaque=0x7f8870019010, ret=-28) at block/qcow2.c:648
#6  0x000000000047eb68 in posix_aio_process_queue (opaque=0x2aa3300) at posix-aio-compat.c:460
#7  0x000000000047ec68 in posix_aio_read (opaque=0x2aa3300) at posix-aio-compat.c:501
#8  0x000000000040b4af in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4234
#9  0x0000000000428c2a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2133
#10 0x000000000040e5cb in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4444
#11 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6601

(gdb) frame 5
#5  0x000000000048d13d in qcow_aio_write_cb (opaque=0x7f8870019010, ret=-28) at block/qcow2.c:648
648         acb->common.cb(acb->common.opaque, ret);
(gdb) p *acb
$1 = {common = {pool = 0x8501e0, bs = 0x2aa7490, cb = 0x479a90 <multiwrite_cb>, opaque = 0x7f886804bc60, next = 0x7f887000fb00},
  sector_num = 5606528, qiov = 0x7f8868028220, buf = 0x7f88681e5800 "", orig_buf = 0x7f88681df600,
  remaining_sectors = 511, cur_nr_sectors = 511, cluster_offset = 3178561536, cluster_data = 0x0, hd_aiocb = 0x0,
  hd_iov = {iov_base = 0x7f88681e5800, iov_len = 261632}, hd_qiov = {iov = 0x7f8870019078, niov = 1, nalloc = -1, size = 261632},
  bh = 0x0, l2meta = {offset = 2870542336, cluster_offset = 3178561536, n_start = 0, nb_available = 511, nb_clusters = 4,
    depends_on = 0x0, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0x0, le_prev = 0x2aa7ec8}},
  next_depend = {le_next = 0x0, le_prev = 0x7f88700a4130}}

(gdb) frame 6
#6  0x000000000047eb68 in posix_aio_process_queue (opaque=0x2aa3300) at posix-aio-compat.c:460
460         acb->common.cb(acb->common.opaque, ret);
(gdb) p *acb
$3 = {common = {pool = 0x84eea0, bs = 0x2aa8120, cb = 0x48cee0 <qcow_aio_write_cb>, opaque = 0x7f8870019010, next = 0x7f8870028bf0},
  aio_fildes = 11, {aio_iov = 0x7f8870019078, aio_ioctl_buf = 0x7f8870019078}, aio_niov = 1, aio_nbytes = 261632,
  ev_signo = 12, aio_offset = 3178561536, node = {tqe_next = 0x0, tqe_prev = 0x85c240}, aio_type = 2, ret = -28, active = 1,
  next = 0x7f8870019740, async_context_id = 0}
(gdb)
```

qcow2 acb:
> l2meta = {offset = 2870542336, cluster_offset = 3178561536, n_start = 0,

file acb:
> aio_ioctl_buf = 0x7f8870019078}, aio_niov = 1, aio_nbytes = 261632,
> ev_signo = 12, aio_offset = 3178561536

These few values are the interesting ones. The guest was trying to access offset 2870542336 on the virtual hard disk (2.8 GB), and qcow2 translated this into an access at offset 3178561536 in the host LV (3.1 GB). This looks perfectly sane. This is another strong indication that VDSM tries to increase the size multiple times until it reaches really big sizes, even though qemu only wanted to access something around 3 GB.

Idan, if you see another issue with IDE, please use the bug for reporting it.
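
(Editorial aside.) Kevin's earlier suggestion, to size the extension from the high watermark rather than growing once per event, could look roughly like the sketch below. This is not VDSM's actual code; it assumes the libvirt Python bindings, where virDomain.blockInfo() reports (capacity, allocation, physical) and, for a qcow2 image on a block device, 'allocation' is the highest offset written so far.

```python
import libvirt

GiB = 1024 ** 3
CHUNK = 1 * GiB   # headroom to keep above the watermark (assumed to match vdsm's 1 GiB chunk)

def lv_growth_needed(dom: libvirt.virDomain, disk_path: str) -> int:
    """Bytes the backing LV must grow by, or 0 if it is already large enough."""
    capacity, allocation, physical = dom.blockInfo(disk_path, 0)
    # Aim one chunk past the watermark, but never far past the virtual disk size.
    target = min(allocation + CHUNK, capacity + CHUNK)
    return max(0, target - physical)
```

Because the target is derived from the current watermark, a burst of ENOSPC events caused by the same stalled writes all resolve to the same target, so the LV is extended at most once instead of once per event.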
Are there more reports of this bug? It seems that in all the cases so far, the issue is that a real ENOSPC was reported multiple times. In that case, this is solvable through vdsm and arguably not a bug in qemu terms (not worth opening it for discussion). If there are other cases that are not caused by 'normal' operation, please report them; otherwise we'll close the issue.

Nobody seems to have found a case where it's not caused by missing event collapsing, so I'm closing this as a duplicate of the existing request.

*** This bug has been marked as a duplicate of bug 568098 ***
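
(Editorial aside.) A minimal sketch of the event collapsing that the closing comment refers to, assuming the libvirt Python bindings and a hypothetical extend_lv callable; the real fix on the VDSM side is what bug 568098 tracks. The idea follows Kevin's observation: while the VM is stopped it issues no new I/O, so all block-error events belonging to one pause can be folded into a single extend-and-resume.

```python
import threading
import libvirt

class EnospcCollapser:
    """Fold the burst of per-request ENOSPC events into one extend per VM pause."""

    def __init__(self, extend_lv):
        self._extend_lv = extend_lv     # hypothetical callable that grows the backing LV
        self._lock = threading.Lock()
        self._busy = False

    def on_io_error(self, conn, dom, srcpath, devalias, action, reason, opaque):
        # Signature of the VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON callback.
        if reason != "enospc":
            return
        if dom.state()[0] != libvirt.VIR_DOMAIN_PAUSED:
            return                      # stale event: the VM has already been resumed
        with self._lock:
            if self._busy:
                return                  # another event from the same pause: drop it
            self._busy = True
        try:
            self._extend_lv(srcpath)    # grow the LV once for this pause
            dom.resume()                # let the guest retry the failed writes
        finally:
            with self._lock:
                self._busy = False
```

The handler would be registered with conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON, collapser.on_io_error, None). Duplicate enospc events delivered while an extend is in flight are dropped; if the guest still lacks space after resuming, the next pause simply triggers a fresh extend.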
Created attachment 464245 [details]
vdsm+libvirt logs

Description of problem:
Qemu is sending ENOSPC events and filling the LV way beyond the VM's limits, until the LV is full. Two VMs with configured qcow2 virtio disks of 10 GB and 40 GB got to 213 GB and 240 GB.

Topology: strong host and strong FC storage; qcow2, virtio.

Version-Release number of selected component (if applicable):
libvirt-0.8.1-28.el6.x86_64
vdsm-4.9-28.el6.x86_64
device-mapper-multipath-0.4.9-32.el6.x86_64
lvm2-2.02.72-8.el6_0.2.x86_64
qemu-kvm-0.12.1.2-2.120.el6.x86_64

How reproducible:

Steps to Reproduce:
1. Create a VM from blank.
2. Go through the installation (the VM will suspend in the middle of it).
3. Long after the VM is suspended, the LV extension keeps proceeding.

Actual results:

Expected results:

Additional info:

```
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Penryn -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -name rhel6 -uuid 2485e335-f253-4a1f-b337-adb998475016 -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/rhel6.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=2010-11-2T9:57:17 -boot nc -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f,if=none,id=drive-virtio-disk0,format=qcow2,serial=93-acf1-9531f204f9c2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:10:96:01,bus=pci.0,addr=0x3 -chardev socket,id=channel0,path=/var/lib/libvirt/qemu/channels/rhel6.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=0,chardev=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet,id=input0 -vnc 0:1,password -k en-us -vga cirrus

block I/O error in device 'drive-virtio-disk0': No space left on device (28)
block I/O error in device 'drive-virtio-disk0': No space left on device (28)
block I/O error in device 'drive-virtio-disk0': No space left on device (28)
. . .
```
```
[root@rhev-i32c-04 ~]# lvs
  LV                                   VG                                   Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  2df964b7-5dc1-48c5-9aab-fa1777ba9d1c a1c08089-faca-4949-9332-7410082e6d9c -wi-a- 213.00g
  4708ed10-de15-4403-b170-a928db205c5f a1c08089-faca-4949-9332-7410082e6d9c -wi-a- 283.00g
  ids                                  a1c08089-faca-4949-9332-7410082e6d9c -wi-a-  64.00m
  inbox                                a1c08089-faca-4949-9332-7410082e6d9c -wi-a-  64.00m
  leases                               a1c08089-faca-4949-9332-7410082e6d9c -wi-ao   2.00g
  master                               a1c08089-faca-4949-9332-7410082e6d9c -wi-ao   1.00g
  metadata                             a1c08089-faca-4949-9332-7410082e6d9c -wi-a- 256.00m
  outbox                               a1c08089-faca-4949-9332-7410082e6d9c -wi-a-  64.00m
  lv_home                              vg_rhevi32c04                        -wi-ao  21.29g
  lv_root                              vg_rhevi32c04                        -wi-ao  50.00g
  lv_swap                              vg_rhevi32c04                        -wi-ao  64.95g
```

The libvirt log is filled with:

```
06:18:37.037: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
06:18:37.037: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
06:18:37.038: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
06:18:37.038: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
06:18:37.038: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
06:18:37.040: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
06:18:37.040: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
06:18:37.040: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
06:18:37.040: debug : remoteRelayDomainEventIOErrorReason:274 : Relaying domain io error rhel6 2 /rhev/data-center/6e4a0f57-646b-4624-9662-6e8a06b36973/a1c08089-faca-4949-9332-7410082e6d9c/images/1aa4a5f6-0594-4d93-acf1-9531f204f9c2/4708ed10-de15-4403-b170-a928db205c5f virtio-disk0 1 enospc
```