Bug 720903 - The guest cannot be resumed without any error info when there is overcommit to storage
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Libvirt Maintainers
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-13 08:04 UTC by dyuan
Modified: 2012-05-16 08:12 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-13 13:47:50 UTC
Target Upstream Version:



Description dyuan 2011-07-13 08:04:59 UTC
Description of problem:
When the storage is overcommitted, the guest is paused automatically and cannot be resumed, and no error information is shown to the user.

There is no error information in /var/log/messages or /var/log/libvirt/libvirtd.log.

The error is reported only in qemu/$guest.log when the guest is paused:
# cat /var/log/libvirt/qemu/guest.log
block I/O error in device 'drive-virtio-disk0': No space left on device (28)

When I try to resume the guest, the same error appears in qemu/$guest.log again.

Version-Release number of selected component (if applicable):
libvirt-0.9.3-2.el6.x86_64
qemu-kvm-0.12.1.2-2.169.el6.x86_64
kernel-2.6.32-167.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Prepare a small partition for this testing.

# fdisk -l /dev/sda11

Disk /dev/sda11: 11 MB, 11517952 bytes
64 heads, 32 sectors/track, 10 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

# mkfs.ext3 /dev/sda11

2. Define, build and start the pool

<pool type='fs'>
  <name>mypool</name>
  <source>
    <device path='/dev/sda11'/>
    <format type='auto'/>
  </source>
  <target>
    <path>/var/lib/libvirt/images/mypool</path>
  </target>
</pool>

# virsh pool-define mypool.xml

# virsh pool-build mypool

# virsh pool-start mypool

3. Check the pool is working fine.

# df -h
/dev/sda11              11M  1.1M  9.0M  11% /var/lib/libvirt/images/mypool

4. Prepare the following xml to create volume in the pool.

# cat vol-disk-template.xml
<volume>
  <name>disk1.img</name>
  <capacity unit='M'>100</capacity>
  <allocation unit='M'>0</allocation>
  <target>
    <path>/var/lib/libvirt/images/mypool/disk1.img</path>
    <format type='raw'/>
  </target>
</volume>

# virsh vol-create mypool vol-disk-template.xml

5. Attach the volume to an existing guest as a second disk, then start the guest.
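
Step 5 does not show the commands used. A minimal sketch of one way to do it is below, assuming the domain is named "guest" (as in the virsh list output) and the new disk shows up as vdb (as in step 6). The --persistent option and its behaviour on a shut-off guest vary between libvirt versions, so treat this as an illustration rather than the exact commands used in the report.

```shell
# Assumed names: domain "guest", target device vdb.
DOMAIN=guest
VOLUME=/var/lib/libvirt/images/mypool/disk1.img
TARGET=vdb

if command -v virsh >/dev/null 2>&1; then
    # Add the volume to the guest as an additional disk.
    virsh attach-disk "$DOMAIN" "$VOLUME" "$TARGET" --persistent
    # Start the guest so the new disk is visible inside it.
    virsh start "$DOMAIN"
else
    echo "virsh not found; skipping attach of $VOLUME as $TARGET" >&2
fi
```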

6. In the guest, write a large amount of data (50M or more) to the second disk.

# fdisk /dev/vdb

# mkfs.ext3 /dev/vdb1

# mount /dev/vdb1 /mnt

# dd if=/dev/zero of=/mnt/test.img bs=1M count=50

7. virsh list --all
 Id Name                 State
----------------------------------
  6 guest                paused


Expected result:
There should be a message telling the user that no space is available, and the guest can be resumed successfully.

Actual result:
When the storage is overcommitted, the guest is paused automatically and cannot be resumed, and no error information is shown to the user.

Additional info:

Comment 2 Dave Allan 2011-07-13 13:47:50 UTC
(In reply to comment #0)

Thank you for the detailed bug report--it's very helpful.

> and get the error report in qemu/$guest.log when the guest is paused
> # cat /var/log/libvirt/qemu/guest.log
> block I/O error in device 'drive-virtio-disk0': No space left on device (28)

That's where the error is intended to be reported.  There is also an event emitted that provides the same information as the guest.log, that the VM was stopped because of an i/o error: event VIR_DOMAIN_EVENT_SUSPENDED, detail VIR_DOMAIN_EVENT_SUSPENDED_IOERROR

> Expected result:
> There should be some message that prompt user no available space, and the guest
> can be resumed successfully.

The guest has been configured to pause on i/o error, which is why it does not resume, or rather it resumes and then immediately suspends again.  It can't continue because it's out of disk space.  If you extend the underlying storage, the guest will stay running.
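
A rough sketch of that check-then-resume sequence follows. The domain name "guest" and the pool path come from the report; the has_free_space helper and the REQUIRED_KB threshold are hypothetical and only illustrate the idea. "virsh domstate --reason" should show why the guest is paused, provided the installed libvirt supports that option.

```shell
# Assumed names: domain "guest", pool mounted at the path from the report.
DOMAIN=guest
POOL_PATH=/var/lib/libvirt/images/mypool
REQUIRED_KB=102400   # hypothetical threshold: 100 MiB, the volume's full capacity

# Hypothetical helper: succeeds when the file system holding $1
# has at least $2 KiB available.
has_free_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$2" ]
}

if command -v virsh >/dev/null 2>&1; then
    # Show the current state and the reason the guest is paused.
    virsh domstate "$DOMAIN" --reason
    if has_free_space "$POOL_PATH" "$REQUIRED_KB"; then
        virsh resume "$DOMAIN"
    else
        echo "not enough free space under $POOL_PATH; extend the storage first" >&2
    fi
fi
```

Resuming before the storage has been extended just triggers the same I/O error and the guest is suspended again, which is why the sketch checks free space first.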

Comment 3 Dave Allan 2011-07-13 14:01:26 UTC
BTW, to control the behavior on i/o error, see:

http://libvirt.org/formatdomain.html#elementsDisks

In particular:

The optional error_policy attribute controls how the hypervisor will behave on an error, possible values are "stop", "ignore", and "enospace".
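
For illustration, the attribute goes on the disk's driver element. The sketch below writes such a disk definition to a file; the file name disk-vdb.xml is hypothetical, the target device vdb is assumed from the reproduction steps, and the policy value is one of those quoted above.

```shell
ERROR_POLICY=stop        # one of the values quoted above: stop, ignore, enospace
DISK_XML=disk-vdb.xml    # hypothetical file name

# Write a disk definition whose driver element carries error_policy.
cat > "$DISK_XML" <<EOF
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' error_policy='$ERROR_POLICY'/>
  <source file='/var/lib/libvirt/images/mypool/disk1.img'/>
  <target dev='vdb' bus='virtio'/>
</disk>
EOF
```

The same driver attribute can be added to the existing disk entry in the domain XML, for example through "virsh edit guest".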

