Bug 720903

Summary: The guest cannot be resumed without any error info when there is overcommit to storage
Product: Red Hat Enterprise Linux 6 Reporter: dyuan
Component: libvirt Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 6.2 CC: dallan, mzhan, nzhang, rwu, weizhan, zpeng
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2011-07-13 13:47:50 UTC Type: ---

Description dyuan 2011-07-13 08:04:59 UTC
Description of problem:
The guest is paused automatically and cannot be resumed, and no error information is reported, when the storage is overcommitted.

No error information appears in /var/log/messages or /var/log/libvirt/libvirtd.log.

The error is reported in qemu/$guest.log when the guest is paused:
# cat /var/log/libvirt/qemu/guest.log
block I/O error in device 'drive-virtio-disk0': No space left on device (28)

When I try to resume the guest, the same error appears in qemu/$guest.log again.
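The trailing (28) in the log line is the errno value of the failed write. As a small illustration that is not part of the original report, Python's standard library can map the number back to its symbolic name on the host:

```python
import errno
import os

# errno value copied from the guest.log line above
code = 28

# errno.errorcode maps numeric values to symbolic names on this platform
name = errno.errorcode[code]
message = os.strerror(code)

print(f"{code} -> {name}: {message}")
```

On a Linux host this should identify the error as ENOSPC, i.e. the backing filesystem is full rather than the image being damaged.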

Version-Release number of selected component (if applicable):
libvirt-0.9.3-2.el6.x86_64
qemu-kvm-0.12.1.2-2.169.el6.x86_64
kernel-2.6.32-167.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Prepare a small partition for this testing.

# fdisk -l /dev/sda11

Disk /dev/sda11: 11 MB, 11517952 bytes
64 heads, 32 sectors/track, 10 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

# mkfs.ext3 /dev/sda11

2. Define, build and start the pool

<pool type='fs'>
  <name>mypool</name>
  <source>
    <device path='/dev/sda11'/>
    <format type='auto'/>
  </source>
  <target>
    <path>/var/lib/libvirt/images/mypool</path>
  </target>
</pool>

# virsh pool-define mypool.xml

# virsh pool-build mypool

# virsh pool-start mypool

3. Check that the pool is working.

# df -h
/dev/sda11              11M  1.1M  9.0M  11% /var/lib/libvirt/images/mypool

4. Prepare the following XML to create a volume in the pool.

# cat vol-disk-template.xml
<volume>
  <name>disk1.img</name>
  <capacity unit='M'>100</capacity>
  <allocation unit='M'>0</allocation>
  <target>
    <path>/var/lib/libvirt/images/mypool/disk1.img</path>
    <format type='raw'/>
  </target>
</volume>

# virsh vol-create mypool vol-disk-template.xml

5. Attach the volume to an existing guest as its second disk, then start the guest.

6. In the guest, try to write some data (larger than 50M in size) to the second disk.

# fdisk /dev/vdb

# mkfs.ext3 /dev/vdb1

# mount /dev/vdb1 /mnt

# dd if=/dev/zero of=/mnt/test.img bs=1M count=50

7. Check the guest state on the host.

# virsh list --all
 Id Name                 State
----------------------------------
  6 guest                paused
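
The volume in step 4 declares a capacity of 100M with an allocation of 0, while the pool in step 3 has about 9M available, so the storage is overcommitted from the moment the volume is created. The sketch below reproduces the same effect with an ordinary sparse file in a temporary directory. The file location is illustrative (it is not the real pool), and the sketch assumes a filesystem that supports sparse files and reports st_blocks in 512-byte units, as Linux does.

```python
import os
import tempfile

CAPACITY = 100 * 1024 * 1024  # 100M, as in <capacity unit='M'>100</capacity>

with tempfile.TemporaryDirectory() as tmpdir:
    # Illustrative path only; the real volume lives in the libvirt pool.
    path = os.path.join(tmpdir, "disk1.img")

    # Extending an empty file creates a hole: no data blocks are written yet.
    with open(path, "wb") as f:
        f.truncate(CAPACITY)

    st = os.stat(path)
    apparent = st.st_size           # size the guest sees as the disk capacity
    allocated = st.st_blocks * 512  # bytes actually backed by host storage

print(f"apparent size: {apparent} bytes")
print(f"allocated:     {allocated} bytes")
```

The gap between the two numbers is the space the host has promised but not reserved. Writes from the guest fill that gap until the host filesystem runs out, which is the point where the I/O error above is raised.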


Expected result:
There should be a message informing the user that no space is available, and the guest can be resumed successfully.

Actual result:
The guest is paused automatically and cannot be resumed, and no error information is reported, when the storage is overcommitted.

Additional info:

Comment 2 Dave Allan 2011-07-13 13:47:50 UTC
(In reply to comment #0)

Thank you for the detailed bug report--it's very helpful.

> and get the error report in qemu/$guest.log when the guest is paused
> # cat /var/log/libvirt/qemu/guest.log
> block I/O error in device 'drive-virtio-disk0': No space left on device (28)

That's where the error is intended to be reported.  libvirt also emits an event that provides the same information as guest.log, namely that the VM was stopped because of an i/o error: event VIR_DOMAIN_EVENT_SUSPENDED, detail VIR_DOMAIN_EVENT_SUSPENDED_IOERROR
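
For anyone who wants to watch for that event, here is a minimal sketch using the libvirt Python bindings. It is an illustration, not a tested monitoring tool. The connection URI qemu:///system and the function names are assumptions, and the two numeric constants are copied from libvirt's public enums so that the helper can be used without the bindings installed; with the bindings available, libvirt.VIR_DOMAIN_EVENT_SUSPENDED and libvirt.VIR_DOMAIN_EVENT_SUSPENDED_IOERROR should be preferred.

```python
# Values from libvirt's virDomainEventType and
# virDomainEventSuspendedDetailType enums (assumed stable across releases).
VIR_DOMAIN_EVENT_SUSPENDED = 3
VIR_DOMAIN_EVENT_SUSPENDED_IOERROR = 2


def is_io_error_pause(event: int, detail: int) -> bool:
    """Return True when a lifecycle event means 'paused because of an I/O error'."""
    return (event == VIR_DOMAIN_EVENT_SUSPENDED
            and detail == VIR_DOMAIN_EVENT_SUSPENDED_IOERROR)


def watch_for_io_errors(uri: str = "qemu:///system") -> None:
    """Print a line whenever a domain is paused on I/O error (runs forever)."""
    import libvirt  # imported here so the helper above works without the bindings

    # The default event loop must be registered before opening the connection.
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.openReadOnly(uri)

    def on_lifecycle(conn, dom, event, detail, opaque):
        if is_io_error_pause(event, detail):
            print(f"{dom.name()} was paused because of an I/O error")

    # Passing None as the domain subscribes to events from all domains.
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, on_lifecycle, None)

    while True:
        libvirt.virEventRunDefaultImpl()
```

Calling watch_for_io_errors() on the host and then repeating step 6 should print one line when the guest is paused, and another each time a resume attempt fails for the same reason.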

> Expected result:
> There should be some message that prompt user no available space, and the guest
> can be resumed successfully.

The guest has been configured to pause on i/o error, which is why it does not resume, or rather it resumes and then immediately suspends again.  It can't continue because it's out of disk space.  If you extend the underlying storage, the guest will stay running.
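
A rough sketch of that sequence, assuming the pool is mounted at a known path on the host: check the free space first and only then ask virsh to resume the guest. The function names and the needed_bytes threshold are hypothetical; how the storage is extended is left out because it depends on the setup.

```python
import shutil
import subprocess


def free_bytes(path: str) -> int:
    """Free space, in bytes, on the filesystem that contains path."""
    return shutil.disk_usage(path).free


def resume_if_space(domain: str, pool_path: str, needed_bytes: int) -> bool:
    """Resume the domain only if the pool has at least needed_bytes free.

    needed_bytes is a caller-chosen threshold, not something libvirt reports.
    """
    if free_bytes(pool_path) < needed_bytes:
        # Resuming now would only hit the same ENOSPC error and pause again.
        return False
    subprocess.run(["virsh", "resume", domain], check=True)
    return True
```

For the setup in comment #0 this would be called as resume_if_space("guest", "/var/lib/libvirt/images/mypool", ...) after the filesystem has been grown or files have been removed from it.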

Comment 3 Dave Allan 2011-07-13 14:01:26 UTC
BTW, to control the behavior on i/o error, see:

http://libvirt.org/formatdomain.html#elementsDisks

In particular:

The optional error_policy attribute controls how the hypervisor will behave on an error, possible values are "stop", "ignore", and "enospace".
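
As a hedged illustration of where the attribute goes: error_policy is set on the disk's driver element in the domain XML. The disk definition below is an assumption reconstructed from comment #0 (the report does not show the actual disk XML), and set_error_policy is a hypothetical helper, not a libvirt API.

```python
import xml.etree.ElementTree as ET

# Assumed disk definition; the path and target come from the steps in comment #0.
DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/mypool/disk1.img'/>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

# Values listed in the documentation quoted above.
ALLOWED_POLICIES = {"stop", "ignore", "enospace"}


def set_error_policy(disk_xml: str, policy: str) -> str:
    """Return the disk XML with error_policy set on its <driver> element."""
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unsupported error_policy: {policy!r}")
    disk = ET.fromstring(disk_xml.strip())
    driver = disk.find("driver")
    if driver is None:
        raise ValueError("disk definition has no <driver> element")
    driver.set("error_policy", policy)
    return ET.tostring(disk, encoding="unicode")


result = set_error_policy(DISK_XML, "enospace")
print(result)
```

The resulting XML could then be merged into the guest definition with virsh edit. A running guest may need to be restarted before the new policy applies.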