Bug 1569614 - IOERROR pause code lost after resuming a VM while I/O error is still present
Summary: IOERROR pause code lost after resuming a VM while I/O error is still present
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.5
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: rc
: ---
Assignee: Jiri Denemark
QA Contact: yanqzhan@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1526025
Reported: 2018-04-19 15:04 UTC by Markus Armbruster
Modified: 2018-06-26 15:56 UTC
22 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Under certain circumstances, resuming a paused guest generated redundant "VIR_DOMAIN_PAUSED_UNKNOWN" error messages in the libvirt log. This update corrects the event sending order when resuming guests, which prevents these errors from being logged.
Clone Of: 1566153
Environment:
Last Closed: 2018-05-29 21:51:27 UTC
Target Upstream Version:



Comment 2 Markus Armbruster 2018-04-19 15:18:55 UTC
When a VM is paused due to a storage error, libvirt emits a
corresponding life cycle event with the VIR_DOMAIN_PAUSED_IOERROR reason,
followed by a VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON event. The state of
the VM is also set accordingly:

  # virsh -r domstate 2 --reason
  paused (I/O error)

When the VM is then resumed manually while the I/O error still persists, it is immediately paused again. However, in that case the life cycle event carries the VIR_DOMAIN_PAUSED_UNKNOWN reason, and the I/O error is no longer reported when querying the VM state:

  # virsh -r domstate 2 --reason
  paused (unknown)

Additionally, the events arrive in an unexpected order:

- IO_ERROR event
- RESUME event
- PAUSED event

That means the real pause reason is lost.

This happens because libvirt gets confused by the QMP events it receives from qemu-kvm after the resume: first BLOCK_IO_ERROR, then RESUME, then STOP.

I guess libvirt would handle this correctly if qemu-kvm sent them in the more natural order: RESUME, BLOCK_IO_ERROR, STOP.

Perhaps we can fix qemu-kvm to do that (bug 1566153), in which case making libvirt cope with the current order won't be necessary. This bug tracks possible libvirt work in case we can't fix qemu-kvm, or in case libvirt needs to cope with unfixed versions of qemu-kvm.

For detailed reproducers see bug 1566153.
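A minimal Python sketch of why the QMP event order matters. Everything in it is hypothetical: the `DomainModel` class, its `on_qmp_event` handler and the `replay` helper are illustrative names, and the handler rules (a BLOCK_IO_ERROR records the I/O error as the pause reason only if the VM is still running when it is processed; a bare STOP falls back to the unknown reason) are an assumed simplification, not libvirt's actual code. Replaying both orderings from the "paused (I/O error)" state reproduces the symptoms described in comment 2.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical, simplified model of how a management layer might turn QEMU
# QMP events into a domain state and emitted events. The handler rules are
# assumptions for illustration, not libvirt's actual implementation.


@dataclass
class DomainModel:
    state: str = "running"   # "running" or "paused"
    reason: str = ""         # pause reason, meaningful only while paused
    emitted: List[str] = field(default_factory=list)

    def on_qmp_event(self, name: str) -> None:
        if name == "BLOCK_IO_ERROR":
            # Assumed rule: the I/O error becomes the pause reason only if
            # the VM is running when the error is processed.
            if self.state == "running":
                self.state, self.reason = "paused", "I/O error"
                self.emitted.append("PAUSED(I/O error)")
            self.emitted.append("IO_ERROR")
        elif name == "RESUME":
            if self.state == "paused":
                self.state, self.reason = "running", ""
                self.emitted.append("RESUMED")
        elif name == "STOP":
            # A bare STOP carries no cause, so the reason falls back to unknown.
            if self.state == "running":
                self.state, self.reason = "paused", "unknown"
                self.emitted.append("PAUSED(unknown)")
        else:
            raise ValueError(f"unhandled QMP event: {name}")


def replay(qmp_events: List[str]) -> Tuple[str, List[str]]:
    """Start from 'paused (I/O error)', as after the first failure, and
    replay the QMP events that follow a manual resume."""
    dom = DomainModel(state="paused", reason="I/O error")
    for name in qmp_events:
        dom.on_qmp_event(name)
    return f"{dom.state} ({dom.reason})", dom.emitted


# Order currently sent by qemu-kvm: the pause reason is lost.
print(*replay(["BLOCK_IO_ERROR", "RESUME", "STOP"]))
# paused (unknown) ['IO_ERROR', 'RESUMED', 'PAUSED(unknown)']

# "Natural" order: the I/O error is kept as the pause reason.
print(*replay(["RESUME", "BLOCK_IO_ERROR", "STOP"]))
# paused (I/O error) ['RESUMED', 'PAUSED(I/O error)', 'IO_ERROR']
```

In this model, the current order delivers the I/O error while the VM is still considered paused, so only the IO_ERROR event is emitted and the later STOP is recorded with an unknown reason. In the natural order, the error arrives after RESUME and becomes the pause reason, and the trailing STOP is ignored because the VM is already paused.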

Comment 3 Jiri Denemark 2018-05-29 21:51:27 UTC
QEMU fixed the order of emitted events and no additional libvirt work is needed.

